16
Folds
Esben Lorentzen and Casper Bruun Jensen
We begin with a negative prediction about predictions.1 Contrary to widespread claims, artificial intelligence will not revolutionise drug design. However, by analogy with the Human Genome Project (HGP), which failed to deliver on grand medical promises but took genomics into uncharted territory, we also predict that over the next five years, high-quality structural models generated with machine learning programs for a wide range of molecular-biology systems will transform the biomedical sciences.
When Gilles Deleuze (1993: 10) called origami, the Japanese art of folding paper, ‘the model for the sciences of matter,’ he evoked proteins. For decades, these heterogeneous and variable folded structures of amino acids have primarily been determined or ‘solved’ experimentally by growing crystals – a haphazard process often likened to an occult science – and shooting x-rays at them to obtain diffraction patterns.2 Over the last few years, however, a new deep learning system, AlphaFold, has enabled structural predictions for almost any protein (Jumper et al. 2021). As predictions about accelerated drug discovery shape medical and political hopes and visions, protein structures are folded into biosocial futures.
The HGP, an immense international collaboration that aimed to sequence the entire human genome, sparked enormous excitement in the early 2000s. The genome – the total set of an organism’s genetic information – is coded as strings of nucleotides (A, G, T, or C) within DNA molecules. In humans, these nucleotides are distributed across 23 pairs of chromosomes, and their sequences provide ‘instructions’ for making the proteins that keep us alive. The instructions for building something as complex as a living organism are quite long. A single copy of the human genome contains an estimated 3 billion base pairs of DNA.
In 1999, the head of the National Human Genome Research Institute proclaimed that personalised genetic tests and treatments for cancer, heart disease and other common diseases would be available within a decade. With a map of the genome at hand, along with new sequencing technologies and databases, it would soon be possible to ‘mine miracles’ (Collins 1999). President Bill Clinton followed suit, declaring that the HGP would ‘revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases.’3 At the time, the HGP was a decade old. Celera Genomics, headed by J. Craig Venter, began as the HGP’s competitor, seeking to gain control over the potential ‘pharmaceutical treasure trove’4 of genome patents. In the end, though, these competing groups joined forces and presented the first complete genome map on the fiftieth anniversary of Francis Crick and James D. Watson’s publication of the double-helix structure of DNA.
In the dry words of Isabelle Stengers (2020: 218), the HGP proved ‘of questionable fecundity,’ and nowhere more so than in the ‘vastly oversold’5 promises of medical miracles. But while it failed to revolutionise medical treatment, technological advances associated with massively parallel sequencing and growing sequence databases opened new scientific horizons. As in the exciting days when the Dutch cloth merchant and father of microbiology van Leeuwenhoek6 (1632–1723) built the microscopes that first made it possible to see living things (‘animalcules’) in rainwater, saliva, urine and semen, the new sequencing technologies made microorganisms and their previously unknown realms partially visible. Thus, a vast chasm yawned between expectations and outcomes. The expectation was that the elements were basically known, and all that was needed was some hard work to fill in the details on the map. The outcome was a totally different landscape, populated by myriad mysterious actors – microorganisms – the existence of which was neither anticipated nor understood. No surprise, then, that the grand medical promises failed!
Recent breakthroughs in protein structure prediction through AI-based methods have led to a fresh round of promises about an imminent, monumental leap in drug discovery and disease treatments. Founded in 2010, DeepMind Technologies Ltd7 initially trained neural networks on simple computer games. By 2016, AlphaGo had become the first machine learning system to defeat top-level players in the complex board game Go. Since then, similar techniques have been used for games, language models, chatbots, and sports predictions. But by far the most impressive success has been AlphaFold, a deep learning system that predicts protein structures (Jumper et al. 2021). Chains of amino acids fold into three-dimensional (3D) protein structures; because these structures are crucial to determining the function of proteins, scientists have dedicated great effort to resolving them experimentally. Methods such as x-ray crystallography, nuclear magnetic resonance and cryo-electron microscopy are, unfortunately, costly, slow and unreliable. Since an astonishing variety of proteins exists, it is small wonder that researchers have looked to computational models for help.8 For around three decades, a competition with the snappy title ‘Critical Assessment of Techniques for Protein Structure Prediction’ (CASP) has invited teams to predict 3D protein structures given information only about the linear (1D) sequence of amino acids.9 AlphaFold won this tournament in 2018, but the real breakthrough came in 2020, when AlphaFold2 blew away the competition and achieved a near-perfect median score of 92.4 out of 100. By the summer of 2022, the AlphaFold team, together with the European Bioinformatics Institute, had made more than 200 million protein structures available for download.10
A flurry of comments hailed these results as unprecedented and revolutionary. A 2021 Nature11 editorial began with an unnamed researcher declaring: ‘I didn’t think we would get to this point in my lifetime.’ Venki Ramakrishnan, winner of the Nobel Prize in chemistry for structural studies of the protein-synthesising ribosome, evoked ‘stunning advances’ happening ‘decades before many people in the field would have predicted,’12 and looked forward to a fundamental transformation of structural biology. AlphaFold2 also proved able to accurately predict the structures of protein complexes, despite being trained primarily on single-chain proteins.13 While the system is currently restricted to proteins and does not provide direct insights into their interactions with small molecules (e.g., drugs), a recent preprint introduced RoseTTAFold All-Atom, which encompasses proteins, nucleic acids, ions, and small-molecule ligands (Krishna et al. 2023).
As when the HGP came to fruition, these developments are leading to medical predictions that stretch far beyond basic research. Before long, the claims go, accurate protein structures will ‘open the door to deep learning-based design of protein-small molecule assemblies’ and thereby broadly ‘impact’ (Krishna et al. 2023) if not ‘accelerate’ drug discovery (Callaway 2020). However – this is our negative prediction – it is unlikely that these advances will fundamentally alter the landscape of drug discovery, especially because a lack of knowledge about protein structures is rarely the major hurdle to making effective drugs.14 As the HGP made abundantly clear, only a few diseases result from a single faulty gene or protein. Rather, the majority involve complex interactions among many components across multiple systems. Even where disease is caused by a single mutation, the effect is often a subtle change in protein structure, dynamics, or stability, rather than a distinct, ‘druggable’ target (Leslie et al. 2022).
Today, thousands of scientists use the sequence databases that emerged from the HGP as daily research tools. The microorganisms have become partly visible, and it is now possible to pose new problems and design new inquiries. In this sense, the HGP initiated an infrastructural revolution in bioscience. Perhaps the science has since become less ‘occult,’ but it is no less complex or fascinating. Analogously, we predict that AlphaFold and other deep learning systems will engender another deep infrastructural transformation of bioscience. There are too many uncertainties to guess the exact scientific implications – unless you are truly in the business of fortune telling – but the availability of comprehensive databases with high-quality structural protein information will fundamentally alter research project development and experimental design. There will certainly be many new folds between science and society. Biomedical futures will keep on folding. But we should not expect to ‘mine medical miracles’ anytime soon.