AlphaFold might be an exception

Reading: AI and Biology, Derek Lowe, In the Pipeline, Science, 19 September 2024.

In which Derek Lowe argues that AlphaFold’s success is an exception in AI applied to the life sciences. The problems he notes are:

  1. Data quality: the Protein Data Bank (PDB) is described as a growing, quality-controlled source of data for AI, unlike pretty much anything else. Next best thing? “Gene sequence data would probably be the first place to look - it’s still shaggy compared to the PDB”.
  2. Data availability: “you're not going to be able to use a big pile of solid information in these [non-protein-folding] areas, because those piles don't yet exist.”
  3. Focus: protein folding is a constrained problem with a small language to learn and a well-defined, closed result (a 3D folded protein structure). “You could say the same thing about RNA and DNA structure, but we don't really have enough good structural data in those areas for the magic to happen yet.”
  4. Data completeness: we know a lot about proteins. “[...] but that is not the case for many other biological problems, where we keep running into things, into whole classes of things, that we never knew even existed.” 

The advice is to look for areas where the four problems above are already largely solved.

If you prefer a more cautiously optimistic view, try the 1hr+ Gradient Dissent podcast episode from “Iso” (Isomorphic Labs, Google), Accelerating Drug discovery with AI:

We're not done. In fact, it's [AlphaFold 2] opened up the door for really amazing breakthroughs, and it's really fundamentally solved the problem of predicting protein structure. But we feel there's probably on the order of 10 or so AlphaFold-like challenges that need to be solved in order to actually be able to resolve this really complex set of questions, scientific questions around how do we design a drug that is going to solve the ultimate problem and have a whole bunch of other desirable properties as it's administered