Using Big Data to Identify Genetic, Neural Bases for Substance Use Disorder

UConn researcher Jinbo Bi is using machine learning to better understand the genetic basis for substance use disorder.

Substance use disorders (SUDs) represent a broad category of mental health disorders that encompasses dependence on any substance from alcohol to nicotine to opioids. In 2018, 20.3 million people in the United States had an SUD according to the National Center for Drug Abuse Statistics.

One of the challenges for researchers studying SUDs is that different underlying mechanisms or pathways may cause someone to become addicted to a certain substance, and the specific neurological and genetic factors that account for these heterogeneous clinical manifestations are poorly understood.

Professor Jinbo Bi in the Department of Computer Science and Engineering at the University of Connecticut has received a $1.7 million grant from the National Institute on Drug Abuse to develop machine learning algorithms to help identify SUD subcategories based on clinical, neuroimaging, and genetic data.

This work will help advance scientific understanding of the underlying mechanisms of SUDs in the hopes of eventually leveraging this knowledge to develop better treatments for addiction.

Bi will work with Chiang-shan Li, a neuroscientist at Yale University, and Henry Kranzler, a pharmacogeneticist at the University of Pennsylvania, to examine large-scale public databases.

The team’s previous project leveraged genetic data from 12,000 people gathered during other studies of alcohol and drug dependence. They used this information to generate SUD subtypes that can be differentiated not only on the basis of clinical symptoms but also genetic associations.

The problem with this kind of work is that observable clinical symptoms are the endpoint, not the cause. When researchers work backwards from symptoms to identify how genes influence SUDs, the effects they find are often weak and inconsistent.

Neural features, however, can be a useful tool for identifying the underlying causes of psychiatric disorders including SUDs. The researchers hope to identify specific neuroimaging features that serve as biomarkers and can help them further identify SUD subtypes for specific substances.

This work goes beyond traditional statistical analyses by emphasizing the patterns that can be discovered in big data.

Bi’s team will use information from the UK Biobank Project, a large biomedical database with data from 500,000 participants. The UK Biobank Project collects information on lifestyle, cognition, biomarkers, imaging, genetics, physical activity, and other useful metrics. Bi’s team will combine MRI and genetic data, focusing on nicotine and alcohol addiction, the two most common SUDs.

The multiple modalities of MRI can show structural and functional differences between brains with and without an SUD. They may also elucidate specific differences between someone addicted to alcohol and someone addicted to nicotine.

By combining this neuroimaging information with genetic data, the researchers will create an innovative machine learning model capable of identifying heritable SUDs based on genes and neural features, furthering the understanding of the genetic basis of addiction.

“The machine learning tools developed for this project will provide an innovative and reliable foundation to enhance the aggregation and analysis of multidimensional data, and to meet the diagnostic and predictive challenges in mental health research,” Bi says.

The team has previously worked with the Human Connectome Project, which is dedicated to mapping the connections between the neural pathways that underlie brain function and human behavior. This project will be a continuation of that work.


Bi is the Frederick H. Leonhardt Professor of Computer Science and associate head of the department. She holds a Ph.D. from Rensselaer Polytechnic Institute. Her research interests include artificial intelligence, machine learning, data mining, pattern recognition, optimization, computer vision, bioinformatics, medical informatics, and drug discovery.
