A deeper dive into data from our colorectal cancer screening program


Blood-based cancer screening has the potential to overcome barriers posed by existing approaches. Making cancer screening more convenient and less invasive using a low-volume blood draw may help improve adherence rates and help doctors save more lives by catching more cancers at their earliest, most treatable stages.

In Freenome’s recent poster presentation at the American College of Gastroenterology’s annual meeting, we shared initial data on our prototype blood test for colorectal cancer (CRC) screening. Beyond the encouraging preliminary results, the poster was an important proof of concept for our machine-learning-based approach to decoding the novel biology of cell-free DNA (cfDNA).

In a follow-up paper just posted to bioRxiv, Machine learning enables detection of early-stage cancer by whole-genome sequencing of plasma cell-free DNA, we dive deeper into the data to explain our approach and what cfDNA can teach us about both human physiology and pathophysiology.

Chief Medical Officer, Girish Putcha (at right), and ML research engineer, Nathan Wan, presenting data at the American College of Gastroenterology meeting in Philadelphia.

Focusing on a single analyte in 817 low-volume, retrospective samples, our machine-learning-based approach was able to:

  • Achieve a classifier performance of 85% sensitivity at 85% specificity that, to our knowledge, exceeds that of any other cfDNA-only based test, particularly in early-stage (I and II) disease

  • Reveal contributions from both tumor and non-tumor derived signals

  • Demonstrate the importance of novel cross-validation methods for mitigating the effects of various confounders, such as the source of patient samples or the batch in which they were processed

A deeper dive into the data

The diagnostic performance of the prototype test was shown in the form of a receiver operating characteristic (ROC) curve, which plots the false positive rate (or 1 minus the specificity) on the horizontal axis (the abscissa, for the mathematically minded reader) against the true positive rate (or sensitivity) on the vertical axis (the ordinate).

The area under the ROC curve (AUROC, frequently abbreviated as the AUC and also referred to as the “c-statistic”) is a summary statistic that is, as the name suggests, an estimate of the area under the ROC curve. A “perfect” test has an AUC of 1.0, while one with no discriminatory power whatsoever (i.e., equivalent to a random coin flip) has an AUC of 0.5.
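To make these definitions concrete, here is a minimal sketch of how an ROC curve and its AUC can be computed from classifier scores. The labels and scores below are hypothetical toy data, not Freenome's:

```python
# Build ROC points and compute AUC from binary labels and classifier scores.
# Toy data for illustration only.

def roc_points(labels, scores):
    """Return (fpr, tpr) points, sweeping the decision threshold from high to low."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal estimate of the area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
print(auc(roc_points(labels, scores)))  # -> 0.75
```

A classifier that ranked every cancer sample above every non-cancer sample would score 1.0; shuffled scores would hover around 0.5.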

Non-linear relationship between total number of samples in training and AUC in test set

This plot shows the performance of the classifier improving as we increased the number of samples available for training.

ROC curves also clearly demonstrate the tradeoff between sensitivity and specificity (an increase in sensitivity is typically accompanied by a decrease in specificity, and vice versa), because one can “move” along the curve by selecting different decision thresholds. For example, our classifier’s sensitivity at 85% specificity is 85%, but were we to select a specificity of 90%, the sensitivity would be 74%.
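Moving along the curve amounts to picking the ROC point that satisfies a target specificity and reading off the corresponding sensitivity. The ROC points below are hypothetical, chosen only to mirror the operating points quoted above:

```python
# Hypothetical (fpr, tpr) points on an ROC curve, for illustration.
points = [(0.0, 0.0), (0.05, 0.5), (0.10, 0.74), (0.15, 0.85), (0.30, 0.95), (1.0, 1.0)]

def sensitivity_at_specificity(points, spec):
    """Best achievable TPR among ROC points meeting the specificity target."""
    max_fpr = 1.0 - spec  # specificity constraint expressed as a cap on FPR
    return max(tpr for fpr, tpr in points if fpr <= max_fpr + 1e-12)

print(sensitivity_at_specificity(points, 0.90))  # -> 0.74
print(sensitivity_at_specificity(points, 0.85))  # -> 0.85
```

Which operating point is “right” depends on the clinical context: for a screening test, the cost of a missed cancer must be weighed against the burden of false-positive follow-up procedures.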

Addressing confounders: Going beyond standard approaches

While machine learning can help identify otherwise hidden patterns in high-dimensional data sets, it is also easily susceptible to learning irrelevant associations. Therefore, our data scientists need to be ever-vigilant to ensure that what our platform is learning is actually meaningful biology, and not merely artifacts related to how, when, and where the blood samples were collected, processed, and analyzed.

One way we have measured our performance more rigorously is to use cross-validation procedures that stratify by confounders directly. We do this by repeatedly splitting the data into different groups, or “folds,” for training and testing so that samples sharing a technical or site-specific variable never appear in both the training and test folds. These cross-validation procedures allow us to measure the possible effects of well-known confounders, helping ensure that the test performance we report is accurate and generalizable.
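One common form of such a procedure is leave-one-group-out splitting, where every fold holds out all samples from one group (for example, one collection site), so that group identity can never leak from training into testing. The sketch below is a generic illustration of this idea, with made-up sample IDs and sites, not Freenome's actual pipeline:

```python
# Confounder-aware cross-validation: each fold holds out every sample from
# one collection site, so site-specific signal cannot leak into training.
from collections import defaultdict

def leave_one_group_out(sample_ids, groups):
    """Yield (train_ids, test_ids) folds, holding out one group per fold."""
    by_group = defaultdict(list)
    for sid, g in zip(sample_ids, groups):
        by_group[g].append(sid)
    for held_out in sorted(by_group):
        test = by_group[held_out]
        train = [sid for sid, g in zip(sample_ids, groups) if g != held_out]
        yield train, test

samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
sites = ["A", "A", "B", "B", "C", "C"]
for train, test in leave_one_group_out(samples, sites):
    print(train, test)
```

If the classifier's cross-validated performance collapses under this scheme but not under plain k-fold splitting, that is a strong hint it was partly learning the confounder rather than the biology.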

What sets Freenome’s work in this area apart from similar studies is the range of cross-validation techniques performed. Typically, only k-fold cross-validation is performed, stratifying the folds by sample alone, which leaves the analysis susceptible to confounders, a subset of which are “batch effects.” In the bioRxiv paper, we demonstrate the importance of performing multiple types of confounder analyses. In future work, we will also attempt to directly mitigate effects from a broad range of confounders. This will provide better tools for analyzing technical bias and variation, and should lead to a more specific and robust test that survives validation.

The multi-analyte future

Our preliminary data suggests that our classifier’s performance is not attributable to tumor-derived signals alone—i.e., DNA from sources other than the tumor, possibly immune cells, is improving our predictions. Between now and our next planned readout in 2019, we will continue to refine our machine learning algorithms to discover more of the underlying biology, and will begin to explore the incorporation of multiple analytes (e.g., proteins, cfRNA, methylated DNA) into our platform.

We believe that additional biomarkers could add important new information beyond what is available from cfDNA alone, further improving the accuracy and predictive power of our AI genomics platform. (Here's a link to select data from our multi-analyte proof-of-concept study, presented earlier in 2018.) More about that exciting work from our research team in the coming year.

Girish Putcha, MD, PhD, Chief Medical Officer