How Much Dirt is Too Much Dirt — Quality Metrics in Gene Expression Analysis

At twoXAR we bring together a lot of disparate data to rapidly identify disease treatments. It’s through these different data that we gain our predictive power. However, more data isn’t always better — not if the new data is of poor quality. In other words, quantity doesn’t trump quality; as the common data science saying goes, bad data in = bad data out. Because of this, we check the quality of our input data at multiple levels; some of this is a manual process, but we automate as much as possible.

In July’s post, (ML)²: Myths and Legends of Machine Learning, I touched on the messiness of real world data and mentioned quality control checks; here, I will expand on that with an example of one of the checks we use for gene expression data…
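To give a flavor of that kind of check, here is a minimal sketch of one common sample-level screen: flagging samples that correlate poorly with the rest of the experiment. The cutoff and the pandas-based workflow below are illustrative assumptions, not our actual pipeline.

```python
import numpy as np
import pandas as pd

def flag_outlier_samples(expr: pd.DataFrame, min_median_corr: float = 0.8) -> list:
    """Flag samples whose median correlation with the other samples is low.

    expr: genes x samples matrix of log-scale expression values.
    min_median_corr: illustrative cutoff; real thresholds are dataset-specific.
    """
    corr = expr.corr(method="spearman")              # sample-by-sample correlations
    corr = corr.mask(np.eye(len(corr), dtype=bool))  # drop self-correlations
    median_corr = corr.median(axis=0)                # NaNs are skipped by default
    return median_corr[median_corr < min_median_corr].index.tolist()

# Toy data: 100 genes, four consistent samples, and one sample of pure noise
rng = np.random.default_rng(0)
base = rng.normal(8, 2, size=100)
expr = pd.DataFrame({f"s{i}": base + rng.normal(0, 0.5, 100) for i in range(4)})
expr["s4"] = rng.normal(8, 2, 100)
print(flag_outlier_samples(expr))  # ['s4']
```

A sample that barely correlates with its replicates usually signals a technical problem (a failed hybridization, a swapped label, degraded RNA), which is exactly the kind of dirt a check like this is meant to catch before it pollutes downstream predictions.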

READ THE FULL POST AT MEDIUM.COM

(ML)²: Myths and Legends of Machine Learning


Skepticism is (and should be) a vital part of any science; statistics and data science are no exception. Statistician George Box nicely summed it up when he said, “All models are wrong, but some are useful.” Box reminds us that statistical models are just that: models. A simplified representation of the real world will always have shortcomings. But we shouldn’t forget the last bit of Box’s saying: “some [models] are useful”. Although challenging, carefully constructed statistical models can be extremely…

READ THE FULL POST AT MEDIUM.COM 

How machines are able to help you find a parking spot, a great place to stay, and the next medication you might take

These three accomplishments are all possible today because of machine learning.

Machine learning continues to disrupt markets and transform people’s everyday lives. Yet, the public is far removed from the actual technology that drives these changes. To many, the idea of machine learning may conjure images of complex mathematical formulas and sentient robots. In fact, many of the general ideas behind machine learning are approachable to a wider audience…

READ THE FULL POST AT MEDIUM.COM

Synergizing against breast cancer

I was about twelve when I found out my grandmother had breast cancer. My parents did a good job of shielding me from the worst of the details, but there is no way to avoid the fear that comes from a loved one being diagnosed with cancer. As a kid, there wasn’t much I could do, but my grandmother loves to tell the story of me trying to comfort her by telling her I was going to do research to help cure her cancer. Little did I know at the time that treating cancer is not as simple as taking a pill once a day and that even identifying the right medicine is akin to finding a needle in a haystack.

Over the next seventeen years, as I pursued undergraduate and graduate studies in biology and genetics, I filled in those knowledge gaps, but felt no closer to changing the status quo of breast cancer…

READ THE FULL POST AT MEDIUM.COM

Seeing the power of AI in drug development

Today we announced our collaboration with Santen, a world leader in the development of innovative ophthalmology treatments. Scientists at twoXAR will use our proprietary computational drug discovery platform to discover, screen and prioritize novel drug candidates with potential application in glaucoma. Santen will then develop and commercialize drug candidates arising from the collaboration. This collaboration is an exciting example of how artificial intelligence-driven approaches can move beyond supporting existing hypotheses and lead the discovery of new drugs. Combining twoXAR’s unique capabilities with Santen’s experience in ophthalmic product development and commercialization… 

READ THE FULL POST AT MEDIUM.COM

Consider Your Biases

In the wake of Donald Trump’s victory over Hillary Clinton, pundits and politicians alike have wondered, “How did we not predict this?” Theories range from misrepresentative polling to journalistic bias to confirmation bias, fueled by the echo chambers of social media. These fervent debates about bias in politics had me reflecting on the role that bias plays in science and in R&D. Sampling bias, expectancy bias, publication bias… all hazards of the profession, and yet science is held up against other disciplines as relatively bias-free by virtue of its data-centric approach.

Biopharma R&D has rapidly evolved over the last few years — it is more collaborative, demands greater speed to respond to competition, and challenges many notions of “conventional” drug discovery. In my reflections, I was curious whether this rapid evolution was a harbinger of biases not conventionally associated with science — and wanted to understand how we at twoXAR aim to stay aware and ahead of such biases.

READ THE FULL POST AT MEDIUM.COM

Validating DUMA Independently

When a new technology is independently validated, it is an exciting moment for both the researcher and the validator.

Some time ago we used our DUMA drug discovery platform to find new potential drug treatments for Parkinson’s disease. After processing over 25,000 drugs with our system, we identified a handful of promising candidates for further study. We noticed that one of our highest-ranked predictions was already under study at an NIH Udall Center of Excellence in Parkinson’s Disease Research at Michigan State University.

We decided to be good citizens of the research community and provide our findings to the research team at Michigan State University. We prepared a 5-page PDF that summarized our computational prediction. When DUMA ranks a drug highly for efficacy, it also provides the supporting evidence it used to make that prediction. This can include (see the sketch after this list):

  • Calculated proteins of significant interest in the disease state,
  • How the drug interacts with those proteins or their binding neighbors,
  • Drugs with similar molecular substructures that have similar effects, and
  • Protective evidence found in clinical medical records.
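
Purely for illustration, a report like that might be captured in a structure along these lines; every field name below is a hypothetical stand-in, not DUMA’s actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class DrugPrediction:
    """Hypothetical container for one ranked prediction and its evidence.

    All field names are illustrative assumptions, not DUMA's real schema.
    """
    drug_name: str
    efficacy_rank: int                                             # rank among the ~25,000 drugs screened
    disease_proteins: list[str] = field(default_factory=list)     # proteins of interest in the disease state
    protein_interactions: list[str] = field(default_factory=list) # drug-protein / binding-neighbor links
    similar_drugs: list[str] = field(default_factory=list)        # substructure matches with similar effects
    clinical_evidence: list[str] = field(default_factory=list)    # protective signals from medical records
```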

We emailed our report to Dr. Tim Collier and figured that was the end of it. Much to our surprise, we found ourselves on a phone call the next day with Tim and his colleague Dr. Katrina Paumier. Tim told us that we had independently validated work that had been going on for years.

While reviewing the report, Tim and Katrina asked a number of questions about how we came up with the prediction we presented. We explained a bit about DUMA and how it can screen large databases of drugs and return predictions within minutes. They told us they had another promising drug under study and asked us to run it through DUMA. We returned the results on this new drug right away. It turned out that DUMA also predicted this second candidate to be highly effective in treating Parkinson’s disease. Once again our evidence matched their data, independently validating that they were on the right track with their second candidate.

Finally, Tim asked us to run one more drug through our system. He didn’t tell us much about this particular molecule, and we let DUMA process the data we collected on it. This time the prediction ranked the candidate relatively low. We informed Tim that our system gave a low to moderate indication of efficacy, and supplied the evidence DUMA had used to assign this ranking. This once again matched his own data about the compound.

Our work with Michigan State University continues today. We are working with Tim to provide novel compounds for further study, and we have collaborated on combining the power of the DUMA drug discovery system with the expertise of Parkinson’s research labs.

This is the way heritability is found: not with a bang, but with many, many whispers

In previous posts, we’ve alluded to the ever-expanding wealth of Big Biological Data, and the increasing capacity of biomedical informatics to convert this data into knowledge, cures, and cash. Here, I’d like to clarify the source of this approach’s power. Rather than relying on strong individual signals to reveal the causes and answers to disease, bioinformaticians are unearthing the complex webs of weak associations that underlie biological (mal)function.

The need for such methods is illustrated by the “missing heritability problem”. As Gregor Mendel was lucky enough to find and rigorous enough to observe, many traits such as plant seed color are passed from parent to offspring in a predictable manner. With the advent of molecular biology, it became clear that these traits are determined by variants in parental DNA, called alleles, which are inherited by the cells that make up the next generation. However, many other traits such as height, diabetes and Crohn’s disease, though clearly heritable, can’t be traced to a single allele or neatly predicted by a high schooler’s Punnett square. For instance, a casual glance around one’s social network will confirm that parental height often corresponds to a child’s chances of making the basketball team. Tall parents beget tall children; seems simple enough. Yet height is influenced by at least 40 different genes, which, when combined, still explain only about 5% of the height variance across tens of thousands of people! How is it that 40 supposedly clear signals can’t pinpoint inheritance patterns we can plainly see? In the past decade, it’s become clear that most complex traits can’t be understood by finding a few smoking guns, but rather by connecting hundreds of scattered embers. Thus, to understand complex diseases, we must untangle the weak, noisy contributions of many, many genes.
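To see why those 40 signals are so hard to use, consider a toy simulation: give each of 40 “genes” a small but genuinely real effect on height, drown their sum in everything else that shapes the trait, and no single gene rises far above statistical dust, even though together they account for roughly that 5% figure. Every number below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_people, n_genes = 10_000, 40

# Allele counts (0, 1, or 2) for 40 height-associated "genes"
genotypes = rng.binomial(2, 0.5, size=(n_people, n_genes))
effects = rng.uniform(0.1, 0.5, size=n_genes)   # cm per allele; invented numbers
noise = rng.normal(0, 6, size=n_people)         # environment plus unmeasured genes
height = 170 + genotypes @ effects + noise

# Variance in height explained by each gene on its own (squared correlation)
r2 = [np.corrcoef(genotypes[:, j], height)[0, 1] ** 2 for j in range(n_genes)]
print(f"strongest single gene explains {max(r2):.2%} of the variance")
print(f"all 40 genes together explain {1 - noise.var() / height.var():.1%}")
```

Run this and the strongest single gene explains well under 1% of the variance, while all 40 together land near 5%: real signals, each individually indistinguishable from noise without enormous samples.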

Believe it or not, this is the type of problem that twoXAR’s software architect Carl worked on at NASA. To study extraterrestrial objects, NASA scientists record their electromagnetic emissions using instruments such as radio telescopes. As these objects are really frickin’ far away, the radio signals they emit are extremely weak and noisy. However, what this data lacks in clarity, it makes up for in abundance. The concept goes like this: if a signal is even slightly more consistent than random noise, then over lots and lots (and lots) of measurements its pattern will manifest. All you need then are some clever algorithms to detect it. Fortunately, Carl and his ilk are some pretty clever folks.
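Signal averaging shows the idea in miniature: averaging n repeated measurements shrinks the noise by a factor of √n, so a tone sitting twenty times below the noise floor eventually surfaces. The signal shape and noise level here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 500)
signal = 0.05 * np.sin(2 * np.pi * 5 * t)   # weak 5 Hz tone buried in unit noise

def recovered_snr(n_measurements: int) -> float:
    """Average n noisy recordings and compare the result to the true signal."""
    recordings = signal + rng.normal(0, 1.0, size=(n_measurements, t.size))
    averaged = recordings.mean(axis=0)
    residual_noise = averaged - signal       # what averaging failed to remove
    return signal.std() / residual_noise.std()

for n in (1, 100, 10_000):
    print(f"{n:>6} measurements -> SNR ~ {recovered_snr(n):.2f}")
```

With one recording the tone is invisible; with ten thousand it stands clearly above what is left of the noise. Swap “recordings” for patients and “tone” for a gene’s weak contribution to disease, and you have the bioinformatic version of the same trick.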

When seven leading geneticists were interviewed about how to solve the missing heritability problem, one common theme that emerged was the need for more data, and a greater variety of it. Here at twoXAR, we’ve taken that concept to heart by querying multiple measurements, databases and tissue types in our search for protein networks linked to disease, and by hiring folks like Carl to help build effective telescopes.