Biologists at Cold Spring Harbor Laboratory (CSHL) are using a mathematical approach developed in CSHL Assistant Professor David McCandlish's lab to find solutions to a diverse set of biological problems. Originally created as a way to understand interactions between different mutations in proteins, the tool is now being used by McCandlish and his collaborators to learn about the complexities of gene expression and the chromosomal mutations associated with cancer. McCandlish says, "This is one of the things that's really fascinating about mathematical research, is sometimes you can see connections between topics, which on the surface they seem so different, but at a mathematical level, they might be using some of the same technical ideas."

All of these questions involve mapping the likelihood of different variations on a biological theme: which combinations of mutations are most likely to arise in a particular protein, for example, or which chromosome mutations are most often found together in the same cancer cell. McCandlish explains that these are problems of density estimation, a statistical tool that predicts how often an event happens. Density estimation can be relatively straightforward, such as charting the distribution of heights within a group of people. But when dealing with complex biological sequences, such as the hundreds or thousands of amino acids that are strung together to build a protein, predicting the probability of each potential sequence becomes astonishingly complex.
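To give a rough sense of why sequence data makes density estimation hard, here is a minimal Python sketch. It is not the field-theoretic method from the study; it is the simplest possible estimator (observed counts plus a uniform pseudocount), and the function name `estimate_density`, the six-letter RNA sequences, and the pseudocount value are all made up for illustration.

```python
from collections import Counter

def estimate_density(observed, alphabet, length, pseudocount=1.0):
    """Naive density estimate over all sequences of a given length.

    Hypothetical helper for illustration only: every possible sequence
    gets `pseudocount` imaginary observations so that unseen sequences
    still receive a nonzero probability.
    """
    counts = Counter(observed)
    n_possible = len(alphabet) ** length  # size of the sequence space
    total = len(observed) + pseudocount * n_possible

    def prob(seq):
        # Counter returns 0 for sequences that were never observed.
        return (counts[seq] + pseudocount) / total

    return prob, n_possible

# Toy data (invented): four observed 6-letter RNA sequences.
observed = ["GUAAGU", "GUAAGU", "GUGAGU", "GUAAGA"]
prob, n_possible = estimate_density(observed, "ACGU", 6)

print(n_possible)        # number of possible 6-letter sequences
print(prob("GUAAGU"))    # estimate for a sequence seen twice
print(prob("CCCCCC"))    # estimate for a sequence never seen
```

Even for six letters there are thousands of possible sequences and only a handful of observations, so almost every estimate is driven by the pseudocount rather than by data. For a protein hundreds of amino acids long, the space is vastly larger, which is why smarter ways of sharing information between similar sequences are needed.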

McCandlish explains the problem his team is using math to address: "Sometimes if you make, say one mutation to a protein, it doesn't do anything. The protein works fine. And if you make a second mutation, it still works fine, but then if you put the two of them together, now you've got a broken protein. We've been trying to come up with methods to model not just interactions between pairs of mutations, but between three or four or any number of mutations."
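The two-mutation case McCandlish describes can be put in numbers with a short Python sketch. This is a textbook-style calculation under an assumed additive model, not the team's software; the function name `pairwise_epistasis` and the activity values are hypothetical.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from an additive expectation.

    Assumes (for illustration) that, without any interaction, the
    effects of two single mutations simply add up.
    """
    expected = f_wt + (f_a - f_wt) + (f_b - f_wt)
    return f_ab - expected

# Invented activity values on a 0-to-1 scale:
# each single mutant works almost as well as the unmutated protein,
# but the double mutant is nearly broken.
epsilon = pairwise_epistasis(f_wt=1.0, f_a=0.95, f_b=0.97, f_ab=0.10)
print(round(epsilon, 2))
```

A value near zero would mean the two mutations act independently; a large negative value, as in this toy case, means the combination is far worse than either mutation alone would suggest. Extending this idea to three, four, or more mutations at once is where the number of possible interaction terms grows very quickly.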

The methods they have developed can be used to interpret data from experiments that measure how hundreds of thousands of different combinations of mutations impact the function of a protein.

The study, reported in the Proceedings of the National Academy of Sciences, began with conversations with two CSHL colleagues, Fellow Jason Sheltzer and Associate Professor Justin Kinney. They worked with McCandlish to apply his methods to gene expression and the evolution of cancer mutations. Software released by McCandlish's team will enable other researchers to use these same approaches in their own work. He says he hopes it will be applied to a variety of biological problems.

More information: Wei-Chia Chen et al, Field-theoretic density estimation for biological sequence space with applications to 5′ splice site diversity and aneuploidy in cancer, Proceedings of the National Academy of Sciences (2021). DOI: 10.1073/pnas.2025782118

Journal information: Proceedings of the National Academy of Sciences