How Much Dirt is Too Much Dirt — Quality Metrics in Gene Expression Analysis

At twoXAR we bring together a lot of disparate data to rapidly identify disease treatments. It’s through these different data that we gain our predictive power. However, more data isn’t always better, especially if the new data is of poor quality. Quantity doesn’t trump quality; as the common data science saying goes, bad data in = bad data out. Because of this, we check the quality of our input data at multiple levels. Some of this is a manual process, but we automate as much as possible.

In July’s post, (ML)²: Myths and Legends of Machine Learning, I touched on the messiness of real-world data and mentioned quality control checks. Here, I will expand on that with an example of one of the checks we use for gene expression data…

READ THE FULL POST AT MEDIUM.COM
