As a kid growing up in Hyderabad, India, Siddhartha Pullannagari found himself drawn to technology and its possibilities. He liked the idea of researching a problem and applying that knowledge to create a technological solution to address it. It’s ultimately what led him to the United States to study computer science with the intention of getting involved in app development.
Pullannagari, now a senior at the University of Missouri–St. Louis, hadn’t initially considered health care as an area to deploy those skills. But he made it a point to attend hackathons around the country, at places like Harvard, MIT and the University of California, Berkeley, and he kept noticing health care as one possible track for exploration.
“It always intrigued me,” Pullannagari says. “Especially AI in health care.”
His curiosity was similarly piqued last year when Associate Professor Sharlee Climer told him about a Biological Data Science course she was teaching in the fall semester, designed to give students an introduction to key areas of biological data science and provide them hands-on experience processing and analyzing genetic data.
Pullannagari eagerly enrolled, joining five graduate students and nine other undergraduates in the class. Over the course of the semester, they downloaded data sets associated with a disease of their choosing from the Gene Expression Omnibus, a public functional genomics data repository, and learned how to clean and process the data, built network models that captured correlation patterns and tested those patterns for association with a trait of interest – such as the presence of a disease. They then validated their results with independent data.
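The network-building step can be pictured with a small sketch. It is an illustration under stated assumptions, not the course's actual code: the gene names and expression values are invented, Pearson correlation stands in for whatever correlation measure the class used, and the 0.9 cutoff is arbitrary. Each gene is a node, and two genes are linked when their expression levels move together strongly across samples.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical expression levels: gene name -> values across six samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],  # tracks geneA closely
    "geneC": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],   # unrelated to the others
}


def build_network(expression, threshold=0.9):
    """Return (gene1, gene2, r) edges whose |r| meets the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


edges = build_network(expression)
for g1, g2, r in edges:
    print(f"{g1} -- {g2} (r = {r:.3f})")
```

With this toy data, only geneA and geneB end up connected. In a real analysis the resulting patterns would then be tested for association with the trait and checked against independent data, as the students did.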
“I had built apps that were basically cutting down the gap between doctors and patients and using AI so they could have better care,” Pullannagari says. “But this is a completely different side of it. It’s network modeling, which I’ve been really interested to try. There’s been a boom recently in how it can be used for lead creating and lead generating. I thought, ‘Why not explore this in health care? Let’s see where we would go with this.’”
Pullannagari and his classmates followed the same approach that Climer has used in her own research, which is often focused on genetic data associated with Alzheimer’s.
But they used it to investigate data sets connected to a variety of other conditions. They were able to identify and validate significant associations involving gene expression levels in cells lining the colon for colon cancer patients, blood samples for rheumatoid arthritis patients and skin biopsies for psoriasis patients.
There were other relevant findings involving gene expression levels in blood samples for lupus patients and microRNA levels in blood samples for cancer patients.
Some of Climer's students have been continuing their analyses this semester and are examining traits such as DNA methylation levels for COVID-19, DNA methylation levels for hepatocellular carcinoma and gene expression levels for ALS.
“Each of the validated patterns,” Climer says, “represents a biomarker signature with potential to identify individuals exhibiting a subtype of the given trait.”
One size doesn't fit all
The word “subtype” is an important one because conditions and diseases, including most types of cancer, diabetes, rheumatoid arthritis, lupus, Alzheimer’s and countless others, are heterogeneous.
That means they have multiple forms, often arising from varying genetic, molecular or environmental factors, such as diet, exercise and exposure to toxins. In general, complex diseases have multiple genetic variants working together in complicated biological processes, possibly augmented by those environmental factors, that create a distinctive multifactorial genetic and environmental signature for each subtype.
It all can lead to different clinical presentations. One subtype might show up earlier in a person’s life than another or seem to act more aggressively, increasing the urgency to diagnose the problem and begin treatment.
Precision medicine aims to account for those differences while creating a more tailored approach to health that customizes disease prevention and treatment based on each patient’s unique genetic profile, environment and lifestyle.
But Climer says common tests for diseases often still don’t account for those differences. “The methods that other people are using are really good if there’s only one type – if it’s homogenous – but they don’t work when there are subtypes,” she says. “The correlation measures don’t work because they’re looking for the correlations for everybody, not just for a small group.”
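A toy calculation shows how a strong relationship confined to a small group can all but vanish when measured across everyone. The sketch below makes several assumptions: the cohort of 100 patients and its values are invented, and Pearson correlation is used only as a familiar example of a whole-cohort measure, not as the specific method Climer is describing.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical subtype: 10 patients in whom two genes rise together.
subtype_a = [float(v) for v in range(1, 11)]
subtype_b = list(subtype_a)

# Remaining 90 patients: every pairing of 10 levels of gene A with
# 9 levels of gene B, so neither gene says anything about the other.
rest_a = [float(a) for a in range(1, 11) for b in range(1, 10)]
rest_b = [float(b) for a in range(1, 11) for b in range(1, 10)]

subtype_r = pearson(subtype_a, subtype_b)
cohort_r = pearson(subtype_a + rest_a, subtype_b + rest_b)

print(f"correlation within the subtype:  {subtype_r:.2f}")
print(f"correlation across all patients: {cohort_r:.2f}")
```

Within the subtype the two genes are perfectly correlated, but across the full cohort the correlation drops to roughly 0.11, low enough that a whole-cohort analysis would likely overlook it.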
The statistical measures used to evaluate a test’s effectiveness often rely on examining the true positive, true negative, false positive and/or false negative rates. But if the test is looking for a single biomarker associated with a subtype of a disease that represents only about 10% of all cases of that condition, the true positive rates will be inherently small, rendering the test ineffective.
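The arithmetic behind that ceiling is short. The sketch below assumes a hypothetical group of 1,000 patients with the condition and a best case in which the biomarker flags every patient with the subtype and no one else; the numbers are illustrative, not drawn from any study.

```python
# Hypothetical numbers chosen only to illustrate the arithmetic.
total_cases = 1000        # patients who have the condition
subtype_fraction = 0.10   # share of cases belonging to the subtype

# Best case: the biomarker is present in every subtype patient.
true_positives = round(total_cases * subtype_fraction)

# Every other patient with the condition lacks the biomarker,
# so the test misses them.
false_negatives = total_cases - true_positives

# True positive rate (sensitivity) = TP / (TP + FN).
sensitivity = true_positives / (true_positives + false_negatives)

print(f"true positives:  {true_positives}")
print(f"false negatives: {false_negatives}")
print(f"true positive rate: {sensitivity:.0%}")
```

Even a biomarker that identifies its own subtype perfectly would score a true positive rate of only 10% when judged against all cases of the condition.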
The computational tools Climer has developed – and the ones Pullannagari and his fellow students learned to deploy – are geared not only toward recognizing a combination of biomarkers associated with a disease but also toward measuring how closely those biomarkers are correlated with particular subtypes, which reveal themselves as clusters in the network models.