Cracking the code for better care

Associate Professor Sharlee Climer and her computer science students are unlocking clues in genetic data that can improve precision medicine.

By Steve Walentik

As a kid growing up in Hyderabad, India, Siddhartha Pullannagari found himself drawn to technology and its possibilities. He liked the idea of researching a problem and applying that knowledge to create a technological solution to address it. It’s ultimately what led him to the United States to study computer science with the intention of getting involved in app development.

Pullannagari, now a senior at the University of Missouri–St. Louis, hadn’t initially considered health care as an area to deploy those skills. But he made it a point to start going to hackathons around the country, at places like Harvard, MIT and the University of California, Berkeley, and he kept noticing it as one possible track for exploration.

“It always intrigued me,” Pullannagari says. “Especially AI in health care.”

His curiosity was similarly piqued last year when Associate Professor Sharlee Climer told him about a Biological Data Science course she was teaching in the fall semester, designed to give students an introduction to key areas of biological data science and provide them hands-on experience processing and analyzing genetic data.

Pullannagari eagerly enrolled, joining five graduate students and nine other undergraduates in the class. Over the course of the semester, they downloaded data sets associated with a disease of their choosing from the Gene Expression Omnibus, a public functional genomics data repository. They learned how to clean and process the data, built network models that captured correlation patterns and tested those patterns for association with a trait of interest – such as the presence of a disease. They then validated their results with independent data.
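For readers curious what a "network model of correlation patterns" might look like in code, here is a minimal Python sketch. It is not the course's software or Climer's method: the gene names and expression values are made up, plain Pearson correlation with a fixed 0.8 cutoff stands in for whatever measures the class actually used, and the helper names (`pearson`, `build_network`, `clusters`) are hypothetical. Genes whose levels rise and fall together get linked, and linked groups are read off as clusters.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def build_network(expr, threshold=0.8):
    """Link two genes when |correlation| meets the (assumed) threshold."""
    edges = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges[g1].add(g2)
            edges[g2].add(g1)
    return edges


def clusters(edges):
    """Return connected groups of genes using a depth-first search."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(edges[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical toy data: one list of per-sample expression levels per gene.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 4.8, 6.1],
    "geneC": [6.0, 5.1, 3.9, 3.0, 2.2, 0.9],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}

groups = clusters(build_network(expression))
print(groups)
```

In this toy data, three of the genes track one another closely (one of them in the opposite direction), so they land in a single cluster, while the fourth stands alone. A real analysis would go on to test each cluster for association with a trait and then check it against independent data, as the students did.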

“I had built apps that were basically cutting down the gap between doctors and patients and using AI so they could have better care,” Pullannagari says. “But this is a completely different side of it. It’s network modeling, which I’ve been really interested to try. There’s been a boom recently in how it can be used for lead creating and lead generating. I thought, ‘Why not explore this in health care? Let’s see where we would go with this.’”

Pullannagari and his classmates followed the same approach that Climer has used in her own research, which is often focused on genetic data associated with Alzheimer’s.

But they used it to investigate data sets connected to a variety of other conditions. They were able to identify and validate significant associations involving gene expression levels in cells lining the colon for colon cancer patients, blood samples for rheumatoid arthritis patients and skin biopsies for psoriasis patients.

There were other relevant findings involving gene expression levels in blood samples for lupus patients and microRNA levels in blood samples for cancer patients.

Some of Climer's students have been continuing their analyses this semester and are examining traits such as DNA methylation levels for COVID-19, DNA methylation levels for hepatocellular carcinoma and gene expression levels for ALS.

“Each of the validated patterns,” Climer says, “represents a biomarker signature with potential to identify individuals exhibiting a subtype of the given trait.”

One size doesn't fit all

The word “subtype” is an important one because conditions and diseases, including most types of cancer, diabetes, rheumatoid arthritis, lupus, Alzheimer’s and countless others, are heterogeneous.

That means they have multiple forms, often arising from varying genetic, molecular or environmental factors, such as diet, exercise and exposure to toxins. In general, complex diseases have multiple genetic variants working together in complicated biological processes, possibly augmented by those environmental factors, that create a distinctive multifactorial genetic and environmental signature for each subtype.

It all can lead to different clinical presentations. One subtype might show up earlier in a person’s life than another or seem to act more aggressively, increasing the urgency to diagnose the problem and begin treatment.

Precision medicine aims to account for those differences while creating a more tailored approach to health that customizes disease prevention and treatment based on each patient’s unique genetic profile, environment and lifestyle.

But Climer says common tests for diseases often still don’t account for those differences. “The methods that other people are using are really good if there’s only one type – if it’s homogenous – but they don’t work when there are subtypes,” she says. “The correlation measures don’t work because they’re looking for the correlations for everybody, not just for a small group.”
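Climer's point about correlation can be illustrated with a small Python simulation. This is a sketch on synthetic data, not her tools or her metrics: the two "genes," the 500 samples, the 50-sample subtype and the noise levels are all assumptions chosen for illustration. Two genes are made to move together only within the subtype, and an ordinary Pearson correlation is computed first across everybody and then within the subtype alone.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


random.seed(0)      # fixed seed so the sketch is repeatable
n_samples = 500     # assumed cohort size
n_subtype = 50      # assumed subtype: the first 10% of samples

gene_a, gene_b = [], []
for i in range(n_samples):
    a = random.gauss(0, 1)
    if i < n_subtype:
        # Within the subtype, gene B closely tracks gene A.
        b = a + random.gauss(0, 0.1)
    else:
        # Everywhere else, the two genes are unrelated.
        b = random.gauss(0, 1)
    gene_a.append(a)
    gene_b.append(b)

r_all = pearson(gene_a, gene_b)
r_subtype = pearson(gene_a[:n_subtype], gene_b[:n_subtype])

print(f"correlation across all samples: {r_all:.2f}")
print(f"correlation within the subtype: {r_subtype:.2f}")
```

The within-subtype relationship is nearly perfect by construction, yet the figure computed over the whole cohort is diluted by the 90% of samples where no relationship exists. In real data, of course, nobody knows in advance which samples belong to the subtype, which is the harder problem her methods are meant to address.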

The statistical measures used to evaluate a test’s effectiveness often rely on examining the true positive, true negative, false positive and/or false negative rates. But if the test is looking for a single biomarker associated with a subtype of a disease that represents only about 10% of all cases of that condition, the true positive rates will be inherently small, rendering the test ineffective.
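The arithmetic behind that point is simple enough to sketch in Python. The numbers here are hypothetical: 1,000 people with the condition, a subtype that accounts for 10% of them, and a marker assumed to flag every case in that subtype and none outside it.

```python
def true_positive_rate(true_positives, false_negatives):
    """Share of actual cases that the test correctly flags (sensitivity)."""
    return true_positives / (true_positives + false_negatives)


total_cases = 1000            # assumed number of people with the condition
subtype_fraction = 0.10       # subtype share, as in the 10% example above
detection_in_subtype = 1.0    # assumption: marker flags every subtype case

subtype_cases = round(total_cases * subtype_fraction)
flagged = round(subtype_cases * detection_in_subtype)

# Judged against every case of the condition, most cases count as misses.
rate_overall = true_positive_rate(flagged, total_cases - flagged)

# Judged only against the subtype it targets, the marker misses nothing.
rate_in_subtype = true_positive_rate(flagged, subtype_cases - flagged)

print(f"true positive rate, all cases:    {rate_overall:.0%}")
print(f"true positive rate, subtype only: {rate_in_subtype:.0%}")
```

Under these assumptions, a marker that works flawlessly for its own subtype still scores only 10% when measured against the condition as a whole, which is why it would look ineffective by the usual yardstick.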

The computational tools Climer has developed – and the ones Pullannagari and his fellow students learned to deploy – are geared not only toward recognizing a combination of biomarkers associated with a disease but also toward measuring how closely correlated those biomarkers are with particular subtypes, which reveal themselves as clusters in the network models.

“Identifying the various subtypes is essential for advancing science on multiple fronts,” Climer says. “First, the specific analytes present in the biomarker pattern of each subtype offer insights that can help formulate hypotheses regarding the pathogenesis for that group, as well as highlight potential drug development targets. Second, the capacity to categorize individuals into subtypes during drug trials can be critical; a drug may work effectively for one subtype but may fail due to the inclusion of other subtypes. Lastly, the realization of successful precision medicine hinges on the ability to accurately diagnose an individual’s subtype and tailor treatment to their specific needs.”

A better way

Climer has advocated for a new approach to biomarker research to improve disease prevention and treatment.

“She came up with a general method for estimating networks of interacting or associating elements that blew away all the alternative analyses,” said Alan Templeton, the Charles Rebstock Professor of Biology Emeritus at Washington University in St. Louis and one of Climer’s PhD advisors. “Genomics allows us to look at literally millions of genetic variants at a time, but most analyses could only analyze them one by one. Even looking at pairs was extremely difficult. But genes rarely work in isolation; they interact with one another and with the environment to produce the traits that people have.

“This is the reality that all geneticists know, but this reality was a computationally difficult problem – indeed, seemingly impossible – so the field was dominated by single-factor analyses with multiple factor data bases. Sharlee came up with an algorithm that could estimate networks of interacting or associated elements that broke this field wide open.”

In 2024, Climer highlighted her methods in a presentation to the Hope Center Neurogenetics and Transcriptomics Group at Washington University in St. Louis.

“To get people to come, I titled it: ‘We’re using the wrong statistics for precision medicine,’” Climer says. “I had a packed room, and I showed them, ‘These popular association testing methods are wrong, and these prevalent correlation measures are wrong.’ And then I explained how I’m doing it. I’ve developed tools for capturing subtypes by using data distributions for association testing and correlation metrics that evaluate each type of alignment. I convinced them all of it. But I still don’t see the change out there.

“The current methods are obviously wrong, and yet editors are reluctant to send my papers out. It seems like they’re looking for incremental advancements, not for a whole different way to approach things.”

She’s hoping the results from last fall’s biological data science course can help change that.

Seeing is believing

Pullannagari has continued working with Climer through an independent study course this semester and has been creating visualizations for each of the validated results produced by him or his classmates. He’ll then work to develop a manuscript that demonstrates the effectiveness of Climer’s methods across a variety of conditions or diseases.

“They show how the code or the approach that we took is universal and would be useful in all of the use cases that we have already done,” Pullannagari says. “It provides a broad spectrum to show how it can be used in different sides of biology, especially for gene expression and other diseases. It works perfectly fine.”

Climer intends to submit the paper for journal publication with each of the students serving as a co-author.

“I think we’ll get something good because there are quite a few really strong results,” Climer says. “What we have in the discovery set is almost identical in the validation, and that’s completely independent data. There’s got to be something there. It can’t just be by chance.”

The work could find an audience among other computer science researchers or clinicians who might use specific results to guide their research on a particular disease or a drug that might alleviate it.

Pullannagari feels fortunate to have had the opportunity to be part of the work, and he’s moved into other areas of health care-connected study while serving as a research assistant investigating lung cancer drug binding affinity at UMSL’s Center for Nanoscience.

“It’s been amazing,” he says. “I would really want to explore this field. I didn’t expect it to be this impactful and exciting to work on.”