Synergizing against breast cancer

I was about twelve when I found out my grandmother had breast cancer. My parents did a good job of shielding me from the worst of the details, but there is no way to avoid the fear that comes from a loved one being diagnosed with cancer. As a kid, there wasn’t much I could do, but my grandmother loves to tell the story of me trying to comfort her by telling her I was going to do research to help cure her cancer. Little did I know at the time that treating cancer is not as simple as taking a pill once a day and that even identifying the right medicine is akin to finding a needle in a haystack.

Over the next seventeen years, as I pursued undergraduate and graduate studies in biology and genetics, I filled in those knowledge gaps, but felt no closer to changing the status quo of breast cancer…


Seeing the power of AI in drug development

Today we announced our collaboration with Santen, a world leader in the development of innovative ophthalmology treatments. Scientists at twoXAR will use our proprietary computational drug discovery platform to discover, screen and prioritize novel drug candidates with potential application in glaucoma. Santen will then develop and commercialize drug candidates arising from the collaboration. This collaboration is an exciting example of how artificial intelligence-driven approaches can move beyond supporting existing hypotheses and lead the discovery of new drugs. Combining twoXAR’s unique capabilities with Santen’s experience in ophthalmic product development and commercialization… 


Consider Your Biases

In the wake of Donald Trump’s victory over Hillary Clinton, pundits and politicians alike have wondered, “How did we not predict this?” Theories range from misrepresentative polling to journalistic bias to confirmation bias, fueled by the echo chambers of social media. These fervent debates about bias in politics had me reflecting on the role that bias plays in science and in R&D. Sampling bias, expectancy bias, publication bias… all hazards of the profession, and yet science is held up against other disciplines as relatively bias-free by virtue of its data-centric approach.

Biopharma R&D has rapidly evolved over the last few years — it is more collaborative, demands greater speed to respond to competition, and challenges many notions of “conventional” drug discovery. In my reflections, I was curious whether this rapid evolution was a harbinger of biases not conventionally associated with science — and wanted to understand how we at twoXAR aim to stay aware and ahead of such biases.


Validating DUMA Independently

Independent scientific validation of a new technology is an exciting moment for both the researcher and the validator.

Some time ago we used our DUMA drug discovery platform to find new potential drug treatments for Parkinson’s disease. After processing over 25,000 drugs with our system, we identified a handful of promising candidates for further study. We noticed that one of our highest-ranked predictions was already under study at an NIH Udall Center of Excellence in Parkinson’s Disease Research at Michigan State University.

We decided to be good citizens to the research community and provide our findings to the research team at Michigan State University. We prepared a 5-page PDF that summarized our computational prediction. When DUMA highly ranks a drug for efficacy it also provides the supporting evidence it used to make that prediction. This can include:

  • Calculated proteins of significant interest in the disease state,
  • How the drug interacts with those proteins or their binding neighbors,
  • Drugs with similar molecular substructures that have similar effects, and
  • Protective evidence found in clinical medical records.
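To make that evidence list concrete, here is a rough sketch in Python of what such a prediction report might look like as a data structure. The field names and values are invented for illustration; this is not DUMA’s actual schema.

```python
from dataclasses import dataclass

# Hypothetical shape of an evidence-backed efficacy prediction; every name
# and value here is illustrative, not DUMA's real output format.
@dataclass
class EfficacyPrediction:
    drug: str
    rank: int
    disease_proteins: list   # proteins of significant interest in the disease state
    interactions: dict       # protein -> how the drug interacts (direct / binding neighbor)
    similar_drugs: list      # drugs with similar substructures and similar effects
    clinical_evidence: list  # protective signals found in clinical records

report = EfficacyPrediction(
    drug="candidate-123",
    rank=4,
    disease_proteins=["P1", "P2"],
    interactions={"P1": "direct binding", "P2": "binds a neighbor of P2"},
    similar_drugs=["drug-A"],
    clinical_evidence=["reduced incidence in a treated cohort"],
)
print(report.drug, report.rank)
```

Bundling the rank together with its supporting evidence is what makes a report like the 5-page PDF possible: the prediction and its justification travel as one object.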

We emailed our report to Dr. Tim Collier and figured that was the end of it. Much to our surprise we found ourselves on a phone call the next day with Tim and his colleague Dr. Katrina Paumier. Tim told us that we had independently validated work that had been going on for years.

As part of their review of the report, Tim and Katrina asked a number of questions about how we came up with the prediction we presented. We explained a bit about DUMA and how it can screen large databases of drugs and return predictions within minutes. They told us they had another promising drug under study and asked us to run it through DUMA. We returned the results on this new drug right away. It turned out this second candidate was also predicted by DUMA to be highly effective in treating Parkinson’s disease. Once again our evidence matched their data, independently confirming that they were on the right track with their second candidate.

Finally, Tim asked us to run one more drug through our system. He didn’t tell us much about this particular molecule, and we let DUMA process the data we collected on it. The prediction ranked this candidate relatively lower. We informed Tim that our system gave a low-to-moderate indication of efficacy, and supplied the evidence DUMA had used to assign this ranking. This once again matched his own data about the compound.

Our work with Michigan State University continues today. We are working with Tim to provide novel compounds for further study, combining the power of the DUMA drug discovery system with the expertise of Parkinson’s research labs.

This is the way heritability is found: not with a bang, but with many, many whispers

In previous posts, we’ve alluded to the ever-expanding wealth of Big Biological Data, and the increasing capacity of biomedical informatics to convert this data into knowledge, cures, and cash. Here, I’d like to clarify the source of this approach’s power. Rather than relying on strong individual signals to reveal the causes and answers to disease, bioinformaticians are unearthing the complex webs of weak associations that underlie biological (mal)function.

The need for such methods is illustrated by the “missing heritability problem”. As Gregor Mendel was lucky enough to find and rigorous enough to observe, many traits such as plant seed color are passed from parent to offspring in a predictable manner. With the advent of molecular biology, it became clear that these traits are determined by variants in parental DNA, called alleles, which are inherited by the cells that make up the next generation. However, many other traits such as height, diabetes and Crohn’s Disease, though clearly heritable, can’t be traced to a single allele and neatly predicted by a high schooler’s Punnett Square. For instance, a casual glance around one’s social network will confirm that parents’ heights often correspond to their child’s chance at making the basketball team. Tall parents beget tall children: simple enough. Yet height is influenced by at least 40 different genes, which when combined still explain only 5% of the height variance of tens of thousands of people! How is it that 40 supposedly clear signals can’t pinpoint inheritance patterns we can plainly see? In the past decade, it’s become clear that most complex traits can’t be understood by finding a few smoking guns, but rather by connecting hundreds of scattered embers. Thus, to understand complex diseases, we must untangle the weak, noisy contributions of many, many genes.
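To see how many weak signals can add up where no single one suffices, here is a toy simulation of a height-like trait influenced by 40 small-effect loci plus a large environmental component. All numbers are invented for illustration.

```python
import random

random.seed(0)
N_LOCI, N_PEOPLE = 40, 5000

# Each person carries 0, 1 or 2 copies of the "tall" allele at each of 40 loci,
# with each copy adding a tiny amount to the trait.
genotypes = [[random.randint(0, 2) for _ in range(N_LOCI)] for _ in range(N_PEOPLE)]
# Trait = small genetic contribution + much larger environmental noise.
heights = [0.5 * sum(g) + random.gauss(0, 6) for g in genotypes]

def corr(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Any single locus explains almost none of the variance...
single_r2 = corr([g[0] for g in genotypes], heights) ** 2
# ...but the combined allele count across all 40 loci explains far more.
combined_r2 = corr([sum(g) for g in genotypes], heights) ** 2
print(f"one locus r^2: {single_r2:.3f}, all 40 loci r^2: {combined_r2:.3f}")
```

Each locus on its own looks like noise, yet summed together the loci recover a clearly visible genetic signal, which is exactly why single-gene hunts for complex traits come up empty.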

Believe it or not, this is the type of problem that twoXAR’s software architect Carl worked on at NASA. To study extraterrestrial objects, NASA scientists record their electromagnetic emissions using instruments such as radio telescopes. As these objects are really frickin’ far away, the radio signals they emit are extremely weak and noisy. However, what this data lacks in clarity, it makes up for in abundance. The concept goes like this: if a signal is even slightly more consistent than random noise, then over lots and lots (and lots) of measurements, its pattern will manifest. All you need then is some clever algorithms to detect it. Fortunately, Carl and his ilk are some pretty clever folks.

When seven leading geneticists were interviewed about how to solve the missing heritability problem, one common theme that emerged was the need for more data, and more different types of it. Here at twoXAR, we’ve taken that concept to heart by querying multiple measurements, databases and tissue types in our search for protein networks linked to disease, and hiring folks like Carl to help build effective telescopes.

What We Do: The A,B,C’s of twoXAR, Part III

So far, we’ve told you about various pieces of the twoXAR puzzle: our goal of identifying new drugs to treat human disease, our machine learning methods for drug classification, and our overall vision to improve lives more quickly and efficiently through data science. Now, I’d like to connect the dots by walking you through our drug discovery process for one of our major disease targets, Type II Diabetes—a disease that as of 2012 affects one out of every ten people in the United States.

So, where to begin? Step 1 in data science: collect data! Our first task is to identify the molecular differences between health and disease through published gene expression profiling databases. While over 99% of the human genome is shared by all members of the species, everybody’s cells exhibit differences in the expression of this genome: the transcribing of DNA instructions into RNA messengers, which are ultimately translated into protein products that execute biological functions. In other words: if the genome is a giant cookbook, then each cell type opens up a different set of pages to photocopy recipes (RNA), and follows those instructions to create a specific set of foods (proteins), which the cell will then use. Importantly, the particular recipes copied, and the number of copies made, are often changed during disease, leading to aberrant protein accumulation, which disrupts normal cellular function. Gene expression profiling studies identify these differences for thousands of distinct RNA instructions, creating a global picture of the ways in which diseased cells have gone awry.
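As a rough sketch of what “identifying molecular differences” means computationally, here is a bare-bones differential expression screen on an invented three-gene, three-replicate toy dataset. Real pipelines handle thousands of genes and correct for multiple testing; the gene names and numbers below are purely illustrative.

```python
import statistics

# Toy expression table: gene -> (healthy replicates, disease replicates).
# All values and gene choices are invented for illustration.
expression = {
    "INS":  ([10.1, 9.8, 10.3],  [6.0, 5.7, 6.2]),    # down in disease
    "TNF":  ([4.9, 5.2, 5.0],    [9.1, 8.8, 9.4]),    # up in disease
    "ACTB": ([12.0, 11.9, 12.1], [12.0, 12.2, 11.8]), # unchanged housekeeping gene
}

def differential_genes(expr, min_t=4.0):
    """Flag genes whose mean expression shifts strongly relative to its
    variability (a bare-bones two-sample t-statistic)."""
    hits = []
    for gene, (healthy, disease) in expr.items():
        se = (statistics.variance(healthy) / len(healthy)
              + statistics.variance(disease) / len(disease)) ** 0.5
        t = (statistics.mean(disease) - statistics.mean(healthy)) / se
        if abs(t) >= min_t:
            hits.append((gene, round(t, 1)))
    return hits

print(differential_genes(expression))
```

The stable housekeeping gene drops out while the two shifted genes survive, which is the whole point of the screen: it separates recipes whose copy counts change in disease from the vast background that doesn’t.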

The proteins that are made using these disease-altered RNA recipes are therefore attractive targets for drug intervention. However, proteins rarely act alone: they team up with a large network of fellow proteins in order to do their jobs in the cell. Thus, drugs that target the “coworkers” of disease-associated proteins may also prove to be effective therapies in the clinic. We therefore made use of a marvelously curated database of known protein-protein interactions to identify the buddies of proteins whose RNA instructions are significantly altered in Type II Diabetes patients, compared to healthy subjects. What’s more, this database also provides lists of drugs that interact with the disease-associated proteins and their friends.
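The “coworkers” idea can be sketched in a few lines: expand the disease-linked proteins to their direct interaction partners, then collect every drug known to hit anything in that expanded set. The tiny network and drug table below are invented for illustration and are not drawn from the database we use.

```python
# Invented protein-protein interaction map: protein -> known partners.
ppi = {
    "IRS1":   {"PIK3CA", "GRB2"},
    "PIK3CA": {"IRS1", "AKT1"},
    "GRB2":   {"IRS1"},
    "AKT1":   {"PIK3CA"},
}
# Invented drug table: drug -> proteins it is known to interact with.
drug_targets = {
    "drug-X": {"IRS1"},  # hits a disease protein directly
    "drug-Y": {"AKT1"},  # hits a "coworker", not the disease protein itself
    "drug-Z": {"EGFR"},  # outside the disease network entirely
}

def candidate_drugs(disease_proteins):
    # Disease proteins plus their direct interaction partners.
    network = set(disease_proteins)
    for p in disease_proteins:
        network |= ppi.get(p, set())
    # Any drug touching the expanded network is a candidate.
    return {d for d, targets in drug_targets.items() if targets & network}

print(candidate_drugs({"IRS1", "PIK3CA"}))
```

Note that drug-Y makes the cut without touching a disease-associated protein at all; it qualifies purely through a coworker, which is exactly the class of candidate a naive target-by-target search would miss.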

Here’s our pot of gold: these drugs have the potential to compensate or correct for the changes caused by disease! The catch is, the gold is actually a massive stack of hay, with a few dozen golden needles buried inside. The human brain can’t make sense of thousands of data points from RNA profiles, protein networks and drug interactions alone. And no pharmaceutical company will ever pour R&D funds into investigating these connections one by one. Enter: the machine.

We developed some super sweet computer algorithms to map and simplify these large, diverse and unbiased datasets into a single biological model of disease. This model is then used to quantify each drug’s relevance to Type II Diabetes through machine learning. Machine learning provides a rigorous, unbiased way to predict which factors are relevant to a disease. As humans, it’s impossible to sort through a pile of data looking for relevant hits without our prior beliefs creeping up on us. A computer, on the other hand, doesn’t privilege or discriminate against any correlations it finds. Thus, machine learning enables the discovery of drug-disease connections that researchers may have never considered on their own.
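As a toy stand-in for that scoring step, here is how several evidence features might be collapsed into a single relevance ranking. The feature names, numbers, and weights below are all invented; the real system learns its weighting from data rather than hard-coding it.

```python
# Invented evidence features per drug:
# (network proximity, expression reversal, structural similarity), each in [0, 1].
features = {
    "bromocriptine": (0.9, 0.8, 0.6),
    "NADH":          (0.7, 0.9, 0.4),
    "drug-unknown":  (0.8, 0.7, 0.7),
    "drug-weak":     (0.1, 0.2, 0.3),
}
# Hand-picked weights for illustration only; a trained model would fit these
# from examples of known drug-disease relationships.
weights = (0.5, 0.3, 0.2)

def score(feats):
    """Weighted sum of evidence features -> one relevance score."""
    return sum(w * f for w, f in zip(weights, feats))

ranked = sorted(features, key=lambda d: score(features[d]), reverse=True)
print(ranked)
```

The useful property is that every drug lands on one comparable axis, so a candidate no human has ever connected to the disease (“drug-unknown” here) can still outrank well-studied ones if its evidence adds up.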

After the machine has made its predictions, we then use our human knowledge to verify its performance. Sprinkled throughout the haystack are a few known golden needles: chemical compounds that are already being used or developed to treat Type II Diabetes. What did our brilliant but ignorant machine think of these drugs?

[Figure: Top 25 treatments for Type II Diabetes ranked by twoXAR’s algorithm]

Our model successfully predicted the relevance of more than thirty drugs known to affect Type II Diabetes (see Figure above, blue bars), including both currently used clinical therapies and promising candidates that have shown significant effects in animal studies. For example, Bromocriptine, a top hit identified by our model, is an FDA-approved therapy that improves blood sugar levels and other hallmarks of diabetes. Meanwhile, NADH, another drug that was highly ranked by our algorithm, improves glucose tolerance and insulin sensitivity in both diet- and age-induced models of Type II Diabetes in mice. Our model also highly ranks many common treatments such as insulin therapy.
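One simple way to quantify this kind of validation is a top-k enrichment check: are the known drugs concentrated near the top of the ranking more often than chance would predict? The ranking and the set of “known” drugs below are invented for illustration; the real check used our actual model output.

```python
# Invented illustration: 200 ranked drugs (best first), 7 of them "known"
# treatments, most of which happen to sit near the top of the list.
ranked = [f"drug-{i:03d}" for i in range(200)]
known = {f"drug-{i:03d}" for i in (1, 4, 7, 12, 19, 150, 180)}

def top_k_enrichment(ranked, known, k=20):
    """How over-represented are known drugs in the top k, versus a random
    ordering of the same list?"""
    hits = sum(1 for d in ranked[:k] if d in known)
    observed = hits / k
    expected = len(known) / len(ranked)  # what a random shuffle would give
    return observed / expected

print(f"{top_k_enrichment(ranked, known):.1f}x enrichment in the top 20")
```

A ratio well above 1 means the model is pulling known answers toward the top far more than luck allows, which in turn lends credibility to the unknown candidates ranked alongside them.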

These successes validate the predictive power of our methods, and tantalizingly hint at the therapeutic potential of those highly ranked candidates whose effects on Type II Diabetes are currently unknown (depicted, appropriately, as “golden needles” in the chart above). Now it’s time to start sewing…


What We Do: The A,B,C’s of twoXAR, Part II

As you may recall, in a previous post, I introduced some of the biomedical informatics concepts that twoXAR uses to find new drug treatments for disease. In this post, I’d like to offer more specifics about how our work fits into the wide, yet sub-divided, landscape of biomedical informatics.

What often comes to mind when people think of the use of computers in the medical field is identifying patterns in clinical data. This branch of biomedical informatics, known as clinical informatics, enables researchers, physicians, and policy makers to extract new information from electronic healthcare records, such as predicting disease outbreaks or discovering adverse drug effects. One of the more notable clinical informatics studies revealed that Vioxx was responsible for an increased risk of heart attacks, which subsequently resulted in its withdrawal from the market.

Another branch of biomedical informatics is computational molecular biology. In this field, researchers process genomic sequences to help map and understand the underlying structure of our DNA. This knowledge allows other scientists to make associations based on our genes, such as uncovering the links between heredity and disease. The work here includes powerful data processing techniques to coalesce, order, and organize the jumble of data that comes out of gene detection instruments like gene expression microarrays. The most famous computational molecular biology effort was the Human Genome Project, the first time society was able to sequence the entire DNA of a human being.

The field that best encompasses the twoXAR technology is translational informatics. As with clinical informatics and computational molecular biology, translational informatics uses computer science approaches such as machine learning, data mining and other statistical analysis techniques to gain new insights through computation. The biggest difference between clinical and translational informatics is that clinical informatics is concerned with finding new insights from patient data, while translational informatics is focused on translating new scientific discoveries into functional solutions for humans.

At twoXAR, we are using computational methods to find new drug treatments that have never been identified or used before. After rigorous clinical trials, we will bring these new therapeutics to your doctor so she is able to prescribe a new course of drug therapy that will produce safer and more effective results than existing treatments.

In a subsequent post I’ll talk a little more about how we use machine learning and data mining to discover new drug therapies.


Identifying Solutions for Neurodegenerative and Psychiatric Diseases

I suppose you could call me one of the men behind the curtain at twoXAR. I’ve been here all along, you just haven’t heard much from me. I’m a Data Scientist on the team, and I’m a graduate student at MIT where I’m developing techniques for studying the molecular mechanism of neurodegenerative diseases. My passion is rooted in this research, and I’m excited to tell you about it and how I’m using it to develop our company.

I share the Andrews’ excitement for blazing new trails. My work both yields insights into how diseases exert their detrimental effects and charts new territory in technology development, which involves a lot of hacking and tinkering. My focus is on neurodegenerative diseases because very few effective drugs have been found to treat them. twoXAR is interested in them for the same reason.

In my opinion, it is the lack of mechanistic understanding of these diseases that makes drug discovery for them so difficult. This is also true of psychiatric diseases like autism and schizophrenia. twoXAR’s approach is a promising alternative to experimentally testing individual molecular targets, a search academia has pursued for many earnest years with very poor results. As an experimental scientist, I dream that the data I work so hard to generate can be used not only by me, but also by other scientists, doctors, pharmaceutical companies, and beyond, who can leverage them to better effect. (Side note from Andrew A.: We’ll have a post soon about our take on the benefits of capitalism for medicine and society as a whole.)

As you already know, twoXAR has developed an algorithm that can discover new drugs in a computer rather than a wet lab. However, that algorithm still relies on gene expression data from patients and healthy individuals, so we’re using the same hard data from flesh-and-blood samples. Instead of generating them ourselves, we borrow data from the scientists who actually processed these samples. For our algorithm to work accurately, these data must be carefully inspected and annotated, which is where I come in. My role is to comb published scientific studies and identify the data sets best suited to our algorithms for the diseases we’ve chosen to focus on.
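As a hypothetical sketch of that curation step, the screen might look something like the following: keep only studies with both disease and control samples, enough replicates per group, and a supported assay platform. The field names, study IDs, and thresholds are all invented for illustration.

```python
# Invented candidate studies, shaped loosely like entries from a public
# expression-data repository. None of these IDs or numbers are real.
datasets = [
    {"id": "study-A", "disease_n": 30, "control_n": 28, "platform": "microarray"},
    {"id": "study-B", "disease_n": 4,  "control_n": 3,  "platform": "microarray"},
    {"id": "study-C", "disease_n": 25, "control_n": 0,  "platform": "microarray"},
    {"id": "study-D", "disease_n": 40, "control_n": 35, "platform": "unsupported"},
]

def usable(ds, min_per_group=10, platforms=("microarray", "rna-seq")):
    """A study is usable only with both groups present, adequately sized,
    and measured on a platform our pipeline can ingest."""
    return (ds["disease_n"] >= min_per_group
            and ds["control_n"] >= min_per_group
            and ds["platform"] in platforms)

print([ds["id"] for ds in datasets if usable(ds)])
```

Automatable checks like these only narrow the field; the careful human inspection and annotation described above is what actually qualifies a data set for use.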

We’ll get into the nitty gritty of my work for twoXAR in a later post. For now, I’ll jump on the bandwagon Desiree started and let you guess which of these three facts is false. It’s another round of Two Truths… and a Lie!

  1. I went as an enzyme to last year’s Biology department Halloween party.

  2. In my off-time, I enjoy a solid game of table tennis and a good cookie.

  3. This picture captures me injecting brains into a tube.