In previous posts, we’ve alluded to the ever-expanding wealth of Big Biological Data, and the increasing capacity of biomedical informatics to convert this data into knowledge, cures, and cash. Here, I’d like to clarify the source of this approach’s power. Rather than relying on strong individual signals to reveal the causes and answers to disease, bioinformaticians are unearthing the complex webs of weak associations that underlie biological (mal)function.
The need for such methods is illustrated by the “missing heritability problem”. As Gregor Mendel was lucky enough to find and rigorous enough to observe, many traits such as plant seed color are passed from parent to offspring in a predictable manner. With the advent of molecular biology, it became clear that these traits are determined by variants in parental DNA, called alleles, which are inherited by the cells that make up the next generation. However, many other traits such as height, diabetes and Crohn’s Disease, though clearly heritable, can’t be traced to single allele and neatly predicted by a high schooler’s Punnett Square. For instance, a casual glance around one’s social network will confirm that parental height often corresponds to their child’s chance at making the basketball team. Tall parents beget tall children, seems simple enough. Yet height is determined by at least 40 different genes, which when combined still only explain 5% of the height variance of tens of thousands of people! How is it that 40 supposedly clear signals can’t pinpoint inheritance patterns we can plainly see? In the past decade, it’s become clear that most complex traits can’t be understood by finding a few smoking guns, but rather by connecting hundreds of scattered embers. Thus, to understand complex diseases, we must untangle the weak, noisy contributions of many, many genes.
Believe it or not, this is the type of problem that twoXAR’s software architect Carl worked on at NASA. To study extraterrestrial objects, NASA scientists record their electromagnetic emissions using instruments such as radio telescopes. As these objects are really frickin’ far away, radio signals they emit are extremely weak and noisy. However, what this data lacks in clarity, it makes up for in abundance. The concept goes like this: if a signal is even slightly more consistent than random noise, over lots and lots (and lots) of measurements, its pattern will manifest. All you need then is some clever algorithms to detect it. Fortunately, Carl’s and his ilk are some pretty clever folks.
When seven leading geneticists were interviewed about how to solve the missing heritability problem, one common theme that emerged was the need for more data, and more different types of it. Here at twoXAR, we’ve taken that concept to heart by querying multiple measurements, databases and tissue types in our search for protein networks linked to disease, and hiring folks like Carl to help build effective telescopes.