14 Startups in AI Protein Design: Platforms, Specialists, Modular Tools
How the 2024 Nobel Prize in Chemistry, 200 million predicted structures, and a $1 billion startup round shape the new fold of protein design
A Nobel, a venom cure, and a 200-million-protein database—by late 2024, AI protein design had earned its recognition. David Baker of the University of Washington received the Nobel Prize in Chemistry for structure-guided protein engineering, sharing the honor with DeepMind’s Demis Hassabis and John Jumper, whose work on AlphaFold cracked the folding code. Only months later, Baker’s team—partnering with the Digital Biotechnology Lab at the Technical University of Denmark—reported AI-designed proteins that neutralize snake-venom toxins.
In this article: A Brief on Protein Structure — From Manual to Computational — End-to-End AI Workflows — The Startup Layer — All-in-One Protein Designers — Specialized AI Systems — Modular Tools — The Unfolded Problems
“By lowering the cost of protein medicines,” Baker said, “we’re taking considerable steps toward a future where everyone can get the treatments they deserve.” What once lived in theoretical models now shapes real therapeutics, and the pace looks to be picking up.
A Brief on Protein Structure
Proteins were never designed—yet they run the cell’s command center. Made from 20 alpha-amino acids, they assemble into enzymes, receptors, scaffolds, and transporters. Each function traces back to a linear amino-acid sequence, which determines how the chain folds. The origin of proteins likely lies in early Earth’s prebiotic chemistry, though the exact path from primordial soup to functional polymer remains unsettled.
The first classification attempts appeared in the late 1700s. In 1789, Antoine Fourcroy grouped albumin, fibrin, gelatin, and gluten as members of a broader category then referred to as “albumins”. The word “protein” entered scientific vocabulary in 1838 via Jöns Jacob Berzelius. A century later, structural biology advanced with the 1958 X-ray crystallographic model of myoglobin resolved by John Kendrew, revealing how folded shape governs protein behavior. Just a few years later, scientific illustrator Irving Geis translated this atomic model into a detailed visual form—his 1961 rendering of myoglobin helped establish the visual language of molecular biology and remains one of the earliest acts of protein design by hand.
A protein’s shape forms hierarchically—starting as a chain of amino acids (primary structure), folding into local patterns like helices and sheets (secondary), packing into a compact 3D structure (tertiary), and sometimes joining with other chains to work as a complex (quaternary):

Determining these shapes has relied on atomic-resolution techniques—X-ray crystallography, cryo-EM, and NMR. Structural data from these methods, archived in the Protein Data Bank (PDB), not only defined the field of structural biology but now trains AI systems that design proteins from scratch.
From Manual to Computational
For most of the last century, protein design meant hours at the bench and days at the diffraction beamline. Structural biologists tinkered with single residues through site-directed mutagenesis, hoping each swap would clarify how shape controls function. Progress moved by inches and every unexpected result sent teams back to the notebook.
In 1998, Rosetta arrived. Developed in the Baker Lab, the computational suite for macromolecular modeling used physics-based energy functions and stochastic search algorithms to predict three-dimensional protein and nucleic-acid structures. Folding pathways could be simulated, energies scored, and entirely synthetic backbones proposed—but each run devoured CPU time, and every model still faced wet-lab reality checks.
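The stochastic-search idea at Rosetta’s core can be sketched as a toy Metropolis Monte Carlo loop. The energy function below is a placeholder stand-in, not Rosetta’s actual score, which combines van der Waals, solvation, hydrogen-bonding, and torsional terms:

```python
import math
import random

def toy_energy(angles):
    """Placeholder physics-style score over backbone torsions: lower is better,
    with a minimum when every angle is zero."""
    return sum(1.0 - math.cos(a) for a in angles)

def metropolis_search(n_angles=10, steps=5000, temp=0.1, seed=0):
    """Stochastic search: perturb one torsion at a time, always accept
    downhill moves, accept uphill moves with Boltzmann probability."""
    rng = random.Random(seed)
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    energy = toy_energy(angles)
    for _ in range(steps):
        proposal = angles[:]
        proposal[rng.randrange(n_angles)] += rng.gauss(0, 0.3)  # small random kick
        e_new = toy_energy(proposal)
        if e_new < energy or rng.random() < math.exp((energy - e_new) / temp):
            angles, energy = proposal, e_new
    return angles, energy
```

The same accept/reject skeleton, scaled up to full atomic detail and far costlier energy terms, is why each Rosetta run devoured so much CPU time.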
Unpredictable links between sequence and activity pushed many labs toward directed evolution. Vast mutant libraries, screened in microtiter plates or droplet devices, turned chance into strategy. The approach worked well enough to earn Frances Arnold a share of the 2018 Nobel Prize in Chemistry.
Bench experiments, high-performance computing, directed evolution—all offered different routes, but all collided with the same obstacle: time. Generating, expressing, and validating candidates was a cost sink.
End-to-End AI Workflows
Now a single workflow can thread three models in sequence: RFDiffusion sketches the backbone, ProteinMPNN proposes a matching sequence (both from the Baker Lab at the University of Washington), and DeepMind’s AlphaFold judges the final result.
RFDiffusion treats each protein as a cloud of atoms condensing out of simulated noise; training on PDB entries teaches it the geometry of feasible folds, so the model can draft shapes never catalogued before. ProteinMPNN receives that scaffold and sweeps through sequence space, scoring chains likely to adopt the target geometry until it outputs a FASTA sequence that a synthesis lab can produce. AlphaFold then closes the loop, predicting whether the designed chain will fold as intended.
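A rough sketch of how such a workflow might be glued together. All three stages below are toy stand-ins with hypothetical names; real runs involve separate GPU jobs, model weights, and file formats for each tool:

```python
import random

def generate_backbone(length, seed=0):
    """Stand-in for RFDiffusion: returns a toy backbone as CA coordinates."""
    rng = random.Random(seed)
    return [(i * 3.8, rng.uniform(-1, 1), rng.uniform(-1, 1)) for i in range(length)]

def design_sequence(backbone, seed=0):
    """Stand-in for ProteinMPNN: one amino-acid letter per backbone position."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACDEFGHIKLMNPQRSTVWY") for _ in backbone)

def predict_confidence(sequence):
    """Stand-in for AlphaFold: a pLDDT-like score in [0, 100].
    Toy heuristic only; real predictors run a neural network."""
    return 100 * sum(c in "AILMFWV" for c in sequence) / len(sequence)

def design_pipeline(length=60, threshold=30.0):
    """Thread the three stages: backbone -> sequence -> fold check."""
    backbone = generate_backbone(length)
    sequence = design_sequence(backbone)
    score = predict_confidence(sequence)
    return sequence, score, score >= threshold
```

The value of the pattern is the interface, not the stubs: each stage consumes the previous stage’s output, so any model in the chain can be swapped out independently.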
After the 2020 release of AlphaFold2, a single GPU could predict a structure in under ten minutes with backbone RMSD commonly below two Å, a leap that pushed CASP14 (the protein-structure-prediction competition) GDT scores above ninety; the May 2024 AlphaFold3 update extends evaluation to protein–ligand and protein–DNA complexes. Four years after its initial release, the AlphaFold database had passed 200 million models—effectively mapping most reference sequences—so designers now use the predictor as a gatekeeper: if a candidate chain folds convincingly in silico it advances to the bench, and if it collapses they refine upstream assumptions rather than waste weeks on expression trials.
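The gatekeeper step can be pictured as a triage over predicted per-residue confidence (pLDDT) profiles. The 80 cutoff below is an illustrative choice for this sketch, not a field standard:

```python
def triage_candidates(plddt_profiles, threshold=80.0):
    """Split designs into bench-ready vs back-to-the-drawing-board,
    using mean per-residue pLDDT as the in-silico fold check.

    plddt_profiles: dict mapping design name -> list of per-residue scores.
    """
    advance, refine = [], []
    for name, plddt in plddt_profiles.items():
        mean_confidence = sum(plddt) / len(plddt)
        (advance if mean_confidence >= threshold else refine).append(name)
    return advance, refine
```

Only the designs in the first list go on to synthesis and expression trials; everything else loops back to the generative models upstream.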

Madani and MacKinnon (Recursion, 2023) divided structure prediction models into two major categories by their reliance on multiple sequence alignments (MSAs)—data structures that capture co-evolutionary patterns across homologous protein sequences. MSA-dependent models like AlphaFold2 and RoseTTAFold achieve high structural accuracy but require evolutionary context, limiting their applicability to natural or well-characterized sequences.
In contrast, single-sequence models like OmegaFold, HelixFold-Single, and Meta’s ESMFold bypass the need for MSAs by applying language modeling directly to individual sequences, enabling faster inference (up to ~60x on short chains) and extending usability to orphan and engineered proteins, albeit with some loss in accuracy on canonical targets.
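A designer choosing between the two families might route targets by MSA depth, roughly as follows. The homolog threshold is an illustrative assumption of this sketch, not a figure from the cited survey:

```python
def choose_predictor(n_homologs, latency_critical=False):
    """Route a target to an MSA-based or single-sequence structure predictor.

    n_homologs: how many homologous sequences an MSA search returned.
    latency_critical: prefer fast single-sequence inference when True.
    """
    # Orphan/engineered proteins have thin MSAs, so evolutionary context
    # adds little; single-sequence models are also much faster to run.
    if n_homologs < 30 or latency_critical:
        return "single-sequence (e.g. ESMFold, OmegaFold)"
    return "MSA-based (e.g. AlphaFold2, RoseTTAFold)"
```

Natural proteins with deep alignments keep the accuracy edge of MSA-based models; designed or orphan sequences fall through to the language-model route.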
All these methods are gaining traction in drug development, enzyme engineering, and synthetic biology, where custom protein functions are increasingly required.
The Startup Layer of AI Protein Design
Proteins are moving from natural workhorses to programmable materials, tailored for medicine and industry. Generative models draft novel molecules on demand, and prompt-based tools keep widening the frontier of protein engineering.
Today’s AI-backed protein-modeling landscape comprises three distinct but interconnected currents.