Where Tech Meets Bio

Cancer as a Data Problem: What AI Is Doing in Oncology

We track what has moved from promise to proximate execution, from AI-assisted candidate design with 2026 trial targets to agentic workflows that aim to handle multi-step oncology research tasks

BiopharmaTrend
Feb 27, 2026

Cancer can be viewed as a data problem because a tumor is an evolving population of cells, each accumulating mutations, signaling to neighbors, evading immune surveillance, and adapting to treatment. The challenge of modeling that system has historically outrun the tools available to do it, but computational methods have been catching up.

Transformer architectures trained on biological data are beginning to predict drug response, generate therapeutic hypotheses, and identify which patients are likely to benefit from which treatments, tasks that previously required years of wet-lab iteration. This is part of a broader push that includes early attempts at virtual cell models. Some of that work is still early, though a handful of results have made it far enough through validation to be worth paying attention to.

  • Google Research, Google DeepMind, and Yale spent much of 2025 scaling C2S-Scale, a language model that reads single-cell RNA data as text; the 27-billion-parameter version, released in April, was followed in October by wet-lab validation of a model-generated hypothesis about making immune-“cold” tumors visible to T cells.

  • A collaboration between Microsoft Research, Providence Health, and the University of Washington took a complementary approach: GigaTIME, published in Cell in December, converts routine pathology slides into virtual immune-protein maps, surfacing over 1,200 significant associations across 14,256 patients.

  • At Davos in January, Demis Hassabis put Isomorphic Labs’ first trials, primarily oncology candidates, at the end of 2026; the company followed this month with IsoDDE, a general-purpose drug design engine that reportedly doubles AlphaFold 3’s accuracy and is already deployed across its oncology programs.
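The idea of reading single-cell RNA data as text, mentioned in the first item above, can be illustrated with a minimal sketch. It assumes a rank-ordering scheme in which a cell's genes are listed from highest to lowest expression and unexpressed genes are dropped; the function name, the toy counts, and the `top_k` cutoff are hypothetical and are not taken from the C2S-Scale release.

```python
def cell_to_sentence(expression, top_k=5):
    """Turn one cell's gene-expression counts into a space-separated string of gene names.

    Genes are sorted from highest to lowest count; ties are broken alphabetically
    so the output is deterministic. Genes with zero counts are dropped.
    """
    expressed = [(gene, count) for gene, count in expression.items() if count > 0]
    ranked = sorted(expressed, key=lambda item: (-item[1], item[0]))
    return " ".join(gene for gene, _ in ranked[:top_k])


# Toy counts for a single hypothetical T cell (values are made up for illustration).
toy_cell = {"CD3E": 12, "GAPDH": 40, "CD8A": 7, "ACTB": 40, "FOXP3": 0, "GZMB": 3}
sentence = cell_to_sentence(toy_cell)
print(sentence)  # ACTB GAPDH CD3E CD8A GZMB
```

Once a cell is a string of gene names, it can be tokenized and fed to a language model like any other text, which is the property the text-based framing relies on.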

Role of artificial intelligence in the cancer treatment continuum. Source: Current AI technologies in cancer diagnostics and treatment

Not all of it is language-model work. At the detection end, Linköping University researchers showed that a 32-sensor electronic nose can distinguish ovarian cancer in blood plasma with 97% accuracy in ten minutes—no biomarker panel, just pattern recognition across volatile compounds.

Further upstream, MIT and Microsoft developed CleaveNet, which designs peptide sensors that coat nanoparticles, circulate through the body, get cleaved by tumor-associated proteases, and shed fragments detectable in urine.

AI agents are gaining traction as well: a recent Nature article by an international team of authors, including S. Azizi from Google DeepMind, highlighted the potential of LLM-based agents in tackling complex problems across oncology research.

Taken together, that is roughly a year of developments spanning diagnostics, molecular sensing, therapeutic modeling, and clinical timelines. The question worth asking isn’t whether AI is entering oncology (it already has) but what it’s actually doing there now, and why cancer turned out to be such a productive target.


📊 The Data Angle

Cancer is a highly complex disease, driven by a multitude of interacting processes within the patient’s body. To capture this complexity, the cancer research community has generated vast amounts of molecular and phenotypic data aimed at comprehensively characterizing the hallmarks of cancer. Breakthroughs in high-throughput technologies have accelerated the production of omics data, ushering in the era of “big data” in oncology. Big data in this context is defined as datasets with two key properties:

  1. They contain sufficiently rich information to yield novel insights into fundamental biological and clinical questions.

  2. Their storage and analysis require computational infrastructure beyond what is typically available to an individual researcher.

A paradigmatic example is The Cancer Genome Atlas (TCGA), which comprises ~2.5 petabytes of raw data (roughly 2,500 times the capacity of a 1 TB laptop drive, a common size in 2026) and necessitates specialized infrastructure for data management and analysis. Its impact has been profound: from its launch in 2008 through March 2022, TCGA was cited in at least 10,242 scientific publications and referenced in 11,054 NIH grants according to PubMed searches.
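The laptop comparison depends heavily on what counts as a standard laptop. The sketch below works through the arithmetic in decimal units for a few assumed drive sizes; the sizes are illustrative assumptions, not figures from the source.

```python
TCGA_PETABYTES = 2.5   # raw-data figure quoted in the text
TB_PER_PB = 1000       # decimal units: 1 PB = 1,000 TB


def laptops_needed(dataset_pb, laptop_tb):
    """Return how many laptop drives of the given size the dataset would fill."""
    return dataset_pb * TB_PER_PB / laptop_tb


# Assumed laptop drive sizes in terabytes (hypothetical, for comparison only).
for laptop_tb in (0.5, 1, 2, 5):
    ratio = laptops_needed(TCGA_PETABYTES, laptop_tb)
    print(f"{laptop_tb} TB laptop -> about {ratio:,.0f}x")
```

Under these assumptions, a multiple of about 500 would correspond to a 5 TB drive, while a 1 TB drive puts the dataset at about 2,500 times a single laptop.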

Such large-scale datasets are foundational for effective AI applications since they ensure robust training of algorithms for cancer diagnostics, early detection, therapy development, and treatment optimization.

One illustrative example of treating cancer as a data problem comes from Noetik, a San Francisco-based biotech. Starting with non-small cell lung cancer, the company has assembled a spatial multi-omics atlas of over 1,000 cases, combining protein mapping, H&E staining, spatial transcriptomics, and whole exome sequencing, to train its transformer-based AI engine, OCTO. With roughly 10% of the dataset linked to real clinical outcomes such as immunotherapy response, Noetik aims to identify which drugs work for which patients, facilitating more targeted cancer therapies.


1️⃣ Cancer Diagnostics

AI has become widely adopted in medical imaging and digital pathology—the two primary branches of AI-driven cancer diagnostics, together accounting for approximately 80% of AI applications in oncology.

In radiographic imaging, AI improves tumor detection, classification, and treatment planning by converting qualitative image characteristics into quantitative radiomic features (e.g., size, shape, texture). Deep learning models can match or surpass human performance in specific tasks, including:

  • Malignancy risk estimation. Using 16,077 lung nodules from the National Lung Screening Trial, a CNN-based model developed by Venkadesh et al. achieved strong performance, with an AUC of 0.93.

  • Tumor screening. In a study of 461,818 women, the 260,739 who were screened with AI support (by 119 radiologists) had a breast cancer detection rate 17.6% higher than the control group. Similarly, Google DeepMind’s breast cancer model, trained on mammograms, reduced false positives by 5.7%.

  • Deep-learning image reconstruction. Because radiation dose and image noise are inversely related, dose reduction is limited by the noise it introduces. Deep neural networks can convert low-dose, high-noise CT images into images of high-dose quality. In 2019, GE Healthcare received FDA clearance for its deep-learning reconstruction engine.
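To make the notion of quantitative radiomic features (size, shape, texture) concrete, here is a minimal sketch that derives a handful of them from a toy grayscale image and a binary tumor mask. The function name, the feature choices, and the toy data are illustrative assumptions; real radiomics pipelines compute far richer feature sets on 3D volumes.

```python
import math


def radiomic_features(image, mask, pixel_area_mm2=1.0):
    """Compute a few toy radiomic features for the region where mask == 1."""
    rows, cols = len(mask), len(mask[0])
    values = []
    perimeter_edges = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            values.append(image[r][c])
            # Count each pixel side that borders background or the image edge.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or not mask[nr][nc]:
                    perimeter_edges += 1
    if not values:
        raise ValueError("mask selects no pixels")
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    area = n * pixel_area_mm2
    perimeter = perimeter_edges * math.sqrt(pixel_area_mm2)
    return {
        "area_mm2": area,                                     # size
        "perimeter_mm": perimeter,
        "compactness": 4 * math.pi * area / perimeter ** 2,   # shape: lower = more irregular
        "mean_intensity": mean,
        "intensity_variance": variance,                       # crude texture proxy
    }


# Toy 4x4 image with a 2x2 "lesion" in the middle (values are made up).
image = [[0, 0, 0, 0], [0, 10, 20, 0], [0, 30, 40, 0], [0, 0, 0, 0]]
mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
features = radiomic_features(image, mask)
print(features)
```

Each region of interest is reduced to a fixed-length numeric vector, which is what allows a downstream classifier or risk model to consume it.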

In digital pathology, AI assists with diagnostic support and biomarker discovery through analysis of gigapixel whole-slide images. Models like Prov-GigaPath and CHIEF use self-supervised learning and have achieved state-of-the-art results. Prov-GigaPath led in 25 of 26 tasks, including a 23.5% AUROC (a measure of diagnostic accuracy) improvement for EGFR mutation prediction in TCGA, while CHIEF outperformed competitors by 36% in cancer cell detection and tumor origin identification.
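Since AUROC is the metric behind several of the figures quoted here, a short sketch of what it measures may help. AUROC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as half. The function name and the toy labels and scores below are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = mutation present, 0 = absent; scores are made-up model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
result = auroc(labels, scores)
print(round(result, 3))  # 0.889
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes. The pairwise loop is quadratic, which is fine for illustration but not for large cohorts.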

⭐ Aiforia

Aiforia is a Helsinki-based digital pathology company providing AI-powered image analysis solutions for clinical and veterinary diagnostics. Its CE-IVD marked clinical suites cover multiple cancer types, including breast, lung, prostate, and colorectal. Last October, Aiforia was selected by Institut Curie as a partner for AI-assisted cancer diagnostics. The company later expanded its AI platform by integrating a Vision Transformer architecture into its proprietary Foundation Engine. Over the past six months, Aiforia also launched two new clinical suites: Gastric Cancer and Lymph Node Metastasis.

⭐ PathAI

PathAI is a Boston-based AI pathology company whose flagship platform, AISight Dx, serves as an image management system for AI-powered histopathology workflows. In June 2025, AISight Dx received FDA clearance for use in primary diagnosis in clinical settings. The platform has since been deployed through partnerships with institutions such as Austrian pathology labs, Utrecht UMC, University Hospital Zurich, and, most recently, a nationwide network of Labcorp labs. In December 2025, PathAI’s AIM-MASH AI Assist became the first AI-powered tool to receive FDA qualification for use in MASH clinical trials.


Related: Digital Pathology: Slides, AI, & Challenges (Illia Terpylo and BiopharmaTrend, August 14, 2025)

2️⃣ Cancer Genomics & Early Detection

Precision oncology is being transformed by the integration of AI and genomics, leading to advances in mutation detection and biomarker discovery. Next-generation sequencing (NGS) has deepened our understanding of tumor biology by identifying somatic and germline pathogenic variants, while multiomics enables more comprehensive characterization of tumor heterogeneity and therapeutic targets. Key mutations in genes like BRCA1/2, KRAS, and TP53 remain central to cancer initiation and progression.

Building on these genomic foundations, AI tools have proven effective at detecting genetic mutations from imaging and histopathological data, cutting the time and cost of traditional genetic testing. Shao et al. used a CNN on PET/CT scans to identify EGFR mutations in lung adenocarcinoma (AUROC: 0.73), Cao et al. applied transfer learning to boost colorectal cancer mutation detection from an AUROC of 0.65 to 0.93, and Aburass and colleagues achieved an F1 score of 0.831 for gene mutation classification using a hybrid ensemble on Kaggle’s Personalized Medicine Dataset. Together, these models highlight AI’s growing capacity to extract clinically meaningful genetic insights across diverse cancer types.


Related: Five Genomics Watchpoints for 2026 (BiopharmaTrend, Feb 20)

Beyond mutation detection, AI is equally reshaping biomarker discovery, accelerating the field’s shift toward non-invasive, molecularly specific markers through precise analysis of histopathological images. An ML framework applied to glioblastoma liquid biopsies identified 42 genes and seven enriched pathways linked to 12-month survival, while Liu et al. uncovered two previously unknown T-cell exhaustion subtypes with distinct clinical profiles. In ovarian cancer, Kawakami et al.’s random forest classifier incorporating 32 biomarkers from standard blood tests achieved 92.4% accuracy in differentiating malignant from benign tumors.

⭐ Freenome

Freenome is a biotech based in Brisbane, California, focused on early cancer detection through blood-based testing. Its lead program targets colorectal cancer and is supported by the large-scale PREEMPT CRC study with over 40,000 participants. The company is also advancing early lung cancer detection through the PROACT LUNG study. Most recently, Freenome launched the Vallania study to support the development of blood-based screening tests for multiple cancers.

In November, the company inked a deal with Roche worth over $200M to develop and commercialize cancer screening tests outside the US. Freenome has also announced expanded AI and DL initiatives, accelerated by NVIDIA, to further improve personalized multi-cancer detection.

⭐ DELFI Diagnostics

DELFI Diagnostics, based in Palo Alto, is also focused on early cancer detection. Its platform is built on proprietary fragmentomics technology, which analyzes cell-free DNA patterns in simple blood samples using AI and NGS. While conventional liquid biopsy methods rely on detecting known cancer-associated mutations, DELFI’s platform analyzes millions of cell-free DNA fragmentation data points to identify cancer-associated patterns across the genome.

Its approach is supported by a landmark 2019 Nature publication and an ongoing clinical trial aimed at validating its blood-based lung cancer screening test.
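The general idea of reading fragmentation patterns rather than specific mutations can be sketched as a per-window ratio of short to long cell-free DNA fragments. The window size, the length thresholds, the function name, and the toy fragments below are illustrative assumptions; they are not a description of DELFI's proprietary pipeline.

```python
from collections import defaultdict

BIN_SIZE = 5_000_000        # assumed genomic window size in base pairs
SHORT = range(100, 151)     # assumed "short" fragment lengths (100-150 bp)
LONG = range(151, 221)      # assumed "long" fragment lengths (151-220 bp)


def fragment_profile(fragments):
    """Return {(chrom, bin_index): short/long ratio} for cell-free DNA fragments.

    Each fragment is a (chrom, start, length) tuple. Bins with no long
    fragments are skipped to avoid dividing by zero.
    """
    counts = defaultdict(lambda: [0, 0])   # [short_count, long_count] per bin
    for chrom, start, length in fragments:
        key = (chrom, start // BIN_SIZE)
        if length in SHORT:
            counts[key][0] += 1
        elif length in LONG:
            counts[key][1] += 1
    return {key: s / l for key, (s, l) in counts.items() if l > 0}


# Toy fragments (made-up positions and lengths) spanning two windows on chr1.
toy_fragments = [
    ("chr1", 1_000_000, 167), ("chr1", 1_200_000, 170), ("chr1", 2_500_000, 140),
    ("chr1", 6_000_000, 130), ("chr1", 6_100_000, 145), ("chr1", 7_000_000, 166),
]
profile = fragment_profile(toy_fragments)
print(profile)
```

In a full workflow, the genome-wide vector of ratios would be passed to a classifier trained to separate cancer from non-cancer samples; that step is omitted here.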


3️⃣ Cancer Drug Discovery & Development
