Much of the recent research in early cancer detection has centered on cell-free DNA, a concept that most people, clinicians included, have never heard of. But its existence makes intuitive sense. Cells, like anything else, don’t simply vanish when they die. They break down into the raw materials of which they were composed. Some of those components are fragments of nucleic acids from the cells’ DNA and RNA. Think of them as cellular garbage blowing along before being swept up by the blood’s street-sweeping system.
Such real-time information on cellular turnover in the body is, of course, of great interest to anyone studying a disease like cancer. To this point, much of the work on cell-free DNA (cfDNA) in cancer has centered on detecting tiny traces of mutated DNA shed into the bloodstream by cancerous cells. The problem with using this technique for early cancer detection is that, during the early stages of a cancer, the tumor-derived fraction of total cfDNA in the blood is vanishingly small: less than a tenth to a hundredth of 1% (that is, below 0.1% to 0.01%). If you do the math (and we did, in this Perspective Paper), you start to realize some serious limitations on tumor DNA as an early detection tool.
To reliably detect cancer (at around 95% sensitivity), you’d need to draw 15 tubes of blood, which is not something most patients will be eager to sign up for as part of routine blood work. That’s one problem. The other is that there is only so much you can say with confidence even when you do detect a mutation.
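To get a feel for why blood volume becomes the bottleneck, here is a back-of-the-envelope sketch in Python. It is not the calculation from the Perspective Paper. It assumes a single tracked mutation, a perfect assay, and Poisson sampling of fragments. The figures for genome equivalents per mL of plasma and plasma per tube are illustrative placeholders, not measured values.

```python
import math

# Illustrative assumptions (not taken from the article or the Perspective Paper):
GENOME_EQUIVALENTS_PER_ML = 1000.0  # hypothetical cfDNA copies of one locus per mL of plasma
PLASMA_ML_PER_TUBE = 4.0            # hypothetical plasma yield from one blood tube
TUMOR_FRACTION = 1e-4               # 0.01% of cfDNA is tumor-derived


def detection_probability(tumor_fraction, copies_per_ml, plasma_ml, min_mutant_copies=1):
    """Probability that the sample contains at least `min_mutant_copies`
    mutant fragments at one locus, under Poisson sampling."""
    expected = tumor_fraction * copies_per_ml * plasma_ml
    p_below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(min_mutant_copies)
    )
    return 1.0 - p_below


def plasma_needed(tumor_fraction, copies_per_ml, target=0.95):
    """Plasma volume (mL) at which the chance of sampling at least one
    mutant fragment reaches `target`."""
    return -math.log(1.0 - target) / (tumor_fraction * copies_per_ml)


volume = plasma_needed(TUMOR_FRACTION, GENOME_EQUIVALENTS_PER_ML)
tubes = math.ceil(volume / PLASMA_ML_PER_TUBE)
print(f"plasma needed: {volume:.1f} mL, about {tubes} tubes under these assumptions")
```

The exact tube count moves with every assumption (cfDNA yield, number of mutations tracked, assay error, how many mutant copies are required for a confident call), but the shape of the problem does not: at a fixed tumor fraction, the only way to raise sensitivity is to sample more plasma.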
Why? Because not every mutation, even in known cancer-driver genes, represents a malignancy. For example, in the human eyelid, multiple cancer genes are under strong positive selection (found in 18% to 32% of cells), including most of the key drivers of cutaneous squamous cell carcinomas.1 That’s up to nearly a third of cells carrying cancer-associated mutations in otherwise physiologically normal skin. If cells from that epidermal layer were to shed DNA into the bloodstream, a test based solely on calling mutations might incorrectly identify those somatic mutations as cancerous, leading to poor specificity (false-positive results).
Recognizing that identifying circulating tumor DNA (ctDNA) in the blood was a classic needle-in-a-haystack problem, some of us at Freenome started to wonder about the rest of the haystack: the other 99.99% of the cell-free DNA circulating in the blood. Where does it come from, and what can it teach us about the health status of an individual?
Our partners at the Medical University of Graz in Austria are doing some leading-edge work in this area. In 2016 graduate student Peter Ulz and Professors Ellen Heitzer and Michael Speicher published a study in Nature Genetics, “Inferring expressed genes by whole-genome sequencing of plasma DNA,” showing how patterns of gene expression could be inferred from cell-free DNA. To make their results more accessible, we’ve put together a short summary of the paper here.
Their study was based on the knowledge that cfDNA that resists degradation long enough for analysis consists primarily of sequences that were bound within, and protected by, nucleosomes, the DNA-protein complexes within which DNA is organized. Because nucleosomes stick to DNA very tightly, they block other cellular components, like the proteins that transcribe DNA to RNA, from accessing the DNA to which they’re bound. Consequently, nucleosome binding in the cell is a dynamic process: nucleosomes bind, unbind, and move around as necessary to allow or restrict access to the underlying DNA.
Because nucleosomes must move around to allow the cell’s transcriptional machinery access to DNA, patterns in this nucleosome positioning, or “nucleosome footprints,” at a particular gene vary depending on whether the gene is actively expressed or not. In particular, the “beginning” of an actively expressed gene (the transcriptional start site, or TSS) tends to be less tightly packaged within nucleosomes to allow transcription to occur more readily. Given their lack of protection by nucleosomes, the TSSs of actively expressed genes were expected to be under-represented in cfDNA.
To test this hypothesis, the study authors compared cfDNA sequencing coverage between transcriptionally silent and highly transcribed genes. After establishing that silent and highly transcribed genes had different coverage patterns, and that larger changes in transcription level led to larger changes in coverage, they then assessed the sensitivity and accuracy of gene-expression predictions based on cfDNA sequencing-coverage analysis.
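The coverage comparison can be pictured with a small Python sketch. This is not the study’s pipeline. The function name, the bin sizes, and the coverage numbers are all hypothetical, chosen only to show how a dip in read depth at the TSS can be turned into a single number per gene.

```python
from statistics import mean


def tss_relative_coverage(coverage, tss_index, core=2, flank=10):
    """Mean read depth in the bins around the TSS divided by mean depth
    in the flanking bins. Values well below 1 indicate a coverage dip,
    the pattern expected at the TSS of an actively expressed gene."""
    core_bins = coverage[tss_index - core: tss_index + core + 1]
    flank_bins = (
        coverage[tss_index - flank: tss_index - core]
        + coverage[tss_index + core + 1: tss_index + flank + 1]
    )
    return mean(core_bins) / mean(flank_bins)


# Synthetic per-bin read depth for two made-up genes, with the TSS at index 10.
silent_gene = [30] * 21                                   # flat: nucleosomes protect the TSS
expressed_gene = [30] * 8 + [20, 12, 8, 12, 20] + [30] * 8  # dip: TSS is nucleosome-depleted

silent_ratio = tss_relative_coverage(silent_gene, tss_index=10)
expressed_ratio = tss_relative_coverage(expressed_gene, tss_index=10)
print(f"silent: {silent_ratio:.2f}, expressed: {expressed_ratio:.2f}")
```

In real data the dip is far noisier than in this toy example, which is one reason the comparison is made across groups of genes with known expression levels and not one gene at a time.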
Finally, they confirmed the technique of nucleosome footprint analysis by determining whether blood samples from patients with cancer were informative for expressed cancer driver genes. They were, as predicted. Again, you can find a helpful review of their foundational work here.
The most important implication of this work is that nucleosome footprints—inferred from cfDNA sequencing coverage and analyzed through machine-learning techniques—can be used to develop classifiers to sensitively and accurately predict expression of certain genes from cfDNA alone, both in healthy individuals and those with cancer.
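As a rough illustration of what such a classifier might look like, the Python sketch below fits a one-feature logistic regression to synthetic data. Logistic regression is a stand-in chosen for brevity, not the method used in the study or at Freenome. The single feature (read depth at the TSS relative to its flanks) and every training value are invented for the example.

```python
import math


def fit_logistic(xs, ys, lr=1.0, epochs=2000):
    """Fit P(expressed | x) = sigmoid(w * (x - center) + b) by plain
    gradient descent on the mean log loss. Returns (w, b, center)."""
    center = sum(xs) / len(xs)  # centering the feature speeds up convergence
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * (x - center) + b)))
            grad_w += (p - y) * (x - center)
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, center


def predict_expressed(model, x):
    """Predicted probability that a gene with relative TSS coverage x is expressed."""
    w, b, center = model
    return 1.0 / (1.0 + math.exp(-(w * (x - center) + b)))


# Synthetic training set: relative TSS coverage and a label (1 = expressed, 0 = silent).
train_x = [0.45, 0.50, 0.55, 0.60, 0.95, 1.00, 1.05, 1.10]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

model = fit_logistic(train_x, train_y)
for x in (0.5, 1.0):
    print(f"relative coverage {x:.2f} -> P(expressed) = {predict_expressed(model, x):.3f}")
```

A real classifier would draw on many coverage features per gene and would be validated against measured expression, but the principle is the same: the model learns that lower coverage at the TSS goes with higher odds that the gene is switched on.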
Nucleosome footprints are known to vary by cell type, and prior research has demonstrated that, in healthy individuals (theoretically more representative of early-stage patients with a low tumor fraction), most circulating cfDNA is derived from immune cells.2,3 Freenome is currently investigating whether the techniques outlined above can similarly be used to infer epigenetic changes in immune cells, giving us valuable insights into cancer’s interaction with the rest of the body.
The Medical University of Graz continues to be a leader in this important research, and we’re excited to deepen our collaboration with support from the Christian Doppler Research Organization.
As Freenome continues to move toward a systems-biology approach to cancer detection, our artificial intelligence platform is the key to realizing the true clinical potential of cell-free DNA. Only through advanced machine-learning techniques can we hope to determine clinical significance from subtle correlations among billions and billions of data sets. Inferring gene-expression patterns in this way will provide important clues to cancer’s underlying biology, leading to new and noninvasive ways to detect and monitor tumor activity over time, and, eventually, helping physicians disrupt tumor formation altogether.
Imran Haque, PhD, Chief Scientific Officer, Freenome