Dmitry Korkin Leads $1.3 Million Project to Model the Predictability and Stability of Protein Isoforms


Marketing Communications


Dmitry Korkin, professor in the Department of Computer Science, has launched a $1.3 million bioinformatics project funded by the National Institutes of Health to develop computational tools that predict the stability and function of proteins produced through alternative splicing of genetic code.


Dmitry Korkin


The four-year project combines high-level computing and laboratory experiments to probe a critical but poorly understood aspect of gene expression: One gene can give rise to more than one type of protein, and those diverse proteins could be inactive molecules, unstable molecules, or variants that play a role in human disease.

“Humans are complex beings that can produce as many as 200 different proteins from a single gene, depending on how code from that gene is spliced together,” Korkin says. “This research will use deep machine learning algorithms to predict what that alternative splicing of genetic code means for the proteins that get produced and their function in cells, tissues, and diseases.”

Gene expression in multi-cellular organisms is a complex process that starts with DNA, a long molecule found in cells. Segments of DNA contain instructions, or genes. During a step called transcription, other molecules copy segments of a gene. The copies undergo additional steps to become messenger RNA (mRNA), which provides instructions used to make specific proteins in a process called translation. 

Alternative splicing occurs between transcription and translation, when copies of DNA code are assembled in different ways into precursors of mRNA. The alternative splicing leads to mRNA that is translated into protein variants known as isoforms. 

As a result, a limited number of genes can lead to many protein isoforms. Scientists estimate that the human genome contains about 20,500 genes that code for about 300,000 protein isoforms. 
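The arithmetic behind those figures can be illustrated with a toy sketch. The Python example below is a simplified illustration, not the project's method: it assumes a hypothetical gene with four exons (the names `E1` to `E4` are invented) and models only one kind of alternative splicing, in which optional exons are either kept or skipped while exon order is preserved. It then works out the average number of isoforms per gene implied by the estimates above.

```python
from itertools import combinations


def splice_variants(exons, optional):
    """Enumerate transcripts formed by keeping or skipping each optional exon.

    Toy model (an assumption for illustration): exon order is preserved and
    exons not listed as optional are always kept.
    """
    variants = []
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            variants.append(tuple(e for e in exons if e not in skipped))
    return variants


# Hypothetical gene: four exons, two of which may be skipped.
exons = ["E1", "E2", "E3", "E4"]
optional = ["E2", "E3"]

variants = splice_variants(exons, optional)
for v in variants:
    print("-".join(v))

# Average isoforms per gene implied by the estimates quoted in the article.
genes = 20_500
isoforms = 300_000
avg = isoforms / genes
print(round(avg, 1))
```

In this toy model, two optional exons already yield four distinct transcripts from one gene, and each additional optional exon doubles the count. The estimates quoted above work out to roughly 15 isoforms per gene on average.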


"Ultimately, this technology could lead to more precise diagnostics and personalized medical treatments." - Dmitry Korkin, Professor, Department of Computer Science

Little is known about the stability and function of protein isoforms, according to Korkin, who is principal investigator (PI) on the NIH-funded project. In addition to developing a stability predictor and a method to characterize the function of isoforms, Korkin and co-PI Gloria Sheynkman will develop a method to study the effects of alternative splicing on interactions between proteins. Sheynkman is an assistant professor of molecular physiology and biological physics at the University of Virginia School of Medicine who has previously collaborated with Korkin. 

The Sheynkman lab will design and carry out experiments using state-of-the-art proteogenomic and CRISPR approaches across different cell lines and human tissues, both to confirm the computational predictions made by Korkin’s group and to provide new data for training the group’s models.

“A unique aspect of our research is our holistic approach,” Korkin says. “The computational findings will lead to lab experiments, and data from those experiments will be used to refine the computational tools.” 

Two PhD students at WPI will participate in the research, and undergraduate students will have opportunities to get involved as well. Korkin, who is on sabbatical until July 2023 but continues to work on the project, says the research will also be incorporated into his teaching.

“Bioinformatics is a relatively new science, and it is evolving,” Korkin says. “It makes sense to take what we’re learning from research and introduce it in the classroom. In fact, many students find the field exciting because we are discussing cutting-edge discoveries.”