Can AI Diagnose Better Than Humans?
With Microsoft’s agentic AI orchestrator solving NEJM cases at 85% accuracy, we briefly profile 22 companies developing diagnostic AI across imaging, genomics, and pathology
A week ago, Microsoft released the AI Diagnostic Orchestrator (MAI-DxO), a system in which five large language model agents act as a panel of virtual doctors working through diagnostic cases together. To evaluate its capabilities, the Orchestrator was tested on 304 medical case studies sourced from the New England Journal of Medicine. Paired with OpenAI’s latest o3 model, MAI-DxO solved 85.5% of them, roughly four times the success rate reported for experienced clinicians working under identical constraints.
This launch echoes the words of medical futurist Bertalan Meskó, who once called artificial intelligence “the stethoscope of the 21st century”. But AI diagnostics didn’t start with LLMs. The field goes back to the 1950s, with rule-based systems and early diagnostic logic.
In this article: How Diagnostic AI Got Here — Tradition vs. Modernity — Molecular Diagnostics — Cancer Diagnostics — Infectious Diseases — Rare Genetic Diseases — Clinical Imaging — Cardiovascular Diseases — Dermatology — Respiratory Diseases — Eye Diseases — Digital Pathology — Prospects & Challenges
How Diagnostic AI Got Here
AI in medical diagnostics began in the 1950s with foundational AI concepts, which transitioned into practical applications by the 1970s. When INTERNIST-1, the first AI program for doctors, debuted in 1971, it could only rank likely diagnoses from a limited rule set. Five years later, MYCIN moved the field forward by recommending antibiotics for blood infections, hinting at AI’s potential to save lives.
The 1980s and 1990s refined those early ideas: DXplain (1986) broadened differential lists for general practitioners, and CorSage (1989) calculated individual coronary-artery-disease risk, an early taste of personalized medicine.
In the 2000s, deep-learning models began reading images faster and, in some tasks, more accurately than human specialists. By 2017, Arterys had earned the first FDA clearance for cloud-based AI cardiac MRI analysis, cutting reporting time from 30 to 60 minutes to less than a minute.
Fast-forward to 2025: the FDA has now green-lit a total of 1,247 AI-enabled medical devices, 149 of them this year.
Tradition vs. Modernity
How useful is AI for diagnostics in practice? In the npj Digital Medicine article "Transforming diagnosis through artificial intelligence", L. D’Adderio and D. W. Bates presented findings from a detailed five-year study examining AI adoption for stroke care at three major UK stroke centers. They highlight an AI-driven shift in diagnostic processes, particularly during the critical 'door-to-treatment' phase, which runs from the first scan to the final treatment decision.
Traditionally, diagnosis relies on iterative human assessment, hypothesis generation, and refinement to arrive at a definitive diagnostic label. AI, however, can now give clinicians an immediate initial diagnostic label, often before a thorough clinical assessment has taken place.
Earlier examples date back to 2017, when IBM’s Watson took 10 minutes to review the genetic data of tumor cells and provide treatment recommendations. The same task took human experts 160 hours.

D’Adderio and Bates note a key limitation: AI primarily offers binary predictions about diagnoses rather than guiding clinicians through diagnostic reasoning.