Crystallizing Knowledge with a Learning Machine

UConn research was the cover story in a recent edition of the journal CrystEngComm.

Transforming a new drug from a set of liquid ingredients in a lab to a pill in a box can be an exercise in complex chemistry. To better understand how drug ingredients crystallize, UConn researchers mined a vast collection of experimental data provided by Pfizer. They reported their findings in the Feb. 28 cover story of the journal CrystEngComm.

Many medicines are taken as pills in solid crystalline form. But figuring out the best way to coax a drug into solid form is a tricky problem. A drug ingredient could be dissolved in many different solvents, and many different procedures might get it to crystallize. Processing conditions, such as temperature and pressure, can also have a profound effect. With so many variables that could change the outcome, machine learning may be the best way to attack such a complicated problem.

Pfizer formed a collaboration with UConn materials scientist Serge Nakhmanson and his colleagues in the Department of Materials Science and Engineering to evaluate how useful machine learning approaches could be. Data mining, they hoped, could help identify the best way to get a pharmaceutical compound to crystallize. Using Pfizer's data and relevant expertise, the UConn materials team tested three different computer algorithms. These algorithms are called machine learning because the computer uses them to build mathematical models of the data, find patterns, and then "learn" from those patterns to make accurate predictions.

Nakhmanson's graduate student, Ayana Ghosh, found that the Random Forest Regression (RFR) algorithm provided the most accurate crystallization predictions. RFR was also the only algorithm able to identify traits that make pharmaceutical molecules easier to crystallize. For example, if a molecule weighs less than a certain amount and has a certain number of hydrogen bonds, the probability that it can be crystallized successfully increases.
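To make the idea concrete, here is a minimal sketch of random forest regression in plain Python. It is not the UConn team's code and uses none of Pfizer's data: the dataset is invented, with a hypothetical molecular weight, a hypothetical hydrogen-bond count, and a deliberately irrelevant descriptor, and the rule linking them to a "crystallization score" is made up for illustration. The forest averages many regression trees, each grown on a bootstrap sample of the data, and ranks features by how much their splits reduce prediction error. That ranking is one common way such a model can point to the traits that matter.

```python
import random
from statistics import mean


def sse(ys):
    """Sum of squared deviations from the mean: the impurity of a regression node."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, rng, max_features, min_leaf, depth, importances):
    """Grow one regression tree; a leaf is a float, an internal node is a tuple."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return mean(y)
    parent, best = sse(y), None
    # Only a random subset of features is considered at each node.
    for f in rng.sample(range(len(X[0])), max_features):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent - sse([y[i] for i in left]) - sse([y[i] for i in right])
            if best is None or gain > best[0]:
                best = (gain, f, t, left, right)
    if best is None or best[0] <= 0:
        return mean(y)
    gain, f, t, left, right = best
    importances[f] += gain  # credit the feature with the error its split removed
    children = [build_tree([X[i] for i in idx], [y[i] for i in idx], rng,
                           max_features, min_leaf, depth - 1, importances)
                for idx in (left, right)]
    return (f, t, children[0], children[1])


def predict_tree(node, x):
    # Walk down the tree until a leaf (a plain float) is reached.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=20, max_features=2, min_leaf=3, max_depth=6, seed=0):
    """Grow n_trees trees on bootstrap samples; return them with normalized importances."""
    rng = random.Random(seed)
    importances = [0.0] * len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng,
                                max_features, min_leaf, max_depth, importances))
    total = sum(importances) or 1.0
    return trees, [v / total for v in importances]


def predict_forest(trees, x):
    # The forest's prediction is the average of the individual trees' predictions.
    return mean(predict_tree(t, x) for t in trees)


def make_data(n, seed=1):
    """Invented toy data; not real molecules or measurements."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        weight = rng.uniform(150, 600)  # hypothetical molecular weight, g/mol
        hbonds = rng.randint(0, 6)      # hypothetical hydrogen-bond count
        noise = rng.random()            # descriptor with no real effect
        # Made-up rule: lighter molecules with more hydrogen bonds score higher.
        score = (1.0 if weight < 350 else 0.0) + 0.2 * hbonds + rng.gauss(0, 0.1)
        X.append([weight, hbonds, noise])
        y.append(score)
    return X, y


X, y = make_data(120)
trees, importances = fit_forest(X, y)
for name, imp in zip(["weight", "hbonds", "noise"], importances):
    print(f"{name}: {imp:.2f}")
print(predict_forest(trees, [250, 5, 0.5]), predict_forest(trees, [500, 0, 0.5]))
```

On this toy data the irrelevant descriptor should receive the smallest importance, and the light, hydrogen-bond-rich molecule should score well above the heavy one. Libraries such as scikit-learn offer a production version of the same technique in `RandomForestRegressor`, whose `feature_importances_` attribute reports impurity-based importances of this kind.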

“This is precisely the sort of information that a synthetic chemist would need in order to decide how to make a new drug in the form of a pill,” says Nakhmanson. “The RFR machine learning technique is really helpful in addressing which parameters are important for crystallization and which ones are not.”