Sanne Abeln appointed Professor of AI Technology for Life

New research group links biology and computing science

Utrecht University has appointed Sanne Abeln as Professor of AI Technology for Life. Abeln and her research group will develop artificial intelligence (AI) technology aimed at gaining more insight into complex biological systems, such as cells, organisms and ecosystems. In this way, Abeln and her colleagues form a bridge between the university's Department of Biology and Department of Information and Computing Sciences. Abeln will start on 1 April.

Portrait of Sanne Abeln

Major developments are currently taking place in the field of AI. Take ChatGPT, the ‘intelligent’ chatbot that can hold human-like conversations and write convincing texts on almost any topic imaginable. ChatGPT is an AI application trained on huge amounts of text. Based on all that text, ChatGPT can predict what text is likely to come next in response to a user's question.

Large amounts of data

In biology, things are also happening fast. Abeln: "Thanks to new methods and techniques, large amounts of new biological data can be collected very quickly, such as genetic data and data about proteins."

One might expect that all this data could be used to train AI applications, so that those applications can make predictions about biological questions. But according to Abeln, it is not straightforward to apply the most powerful AI methods directly to biological data.

Suppose a researcher wants to train an AI model to predict, based on genetic data, whether a particular animal will get sick. The researcher would need data from large numbers of animals of the same species, both animals that got sick and animals that did not. Abeln points out that ‘phenotypic’ traits, that is, observable traits such as whether an animal is sick or not, are often missing from datasets, and that datasets often contain too few examples. It is also very expensive to create good, complete datasets with the number of examples needed. Abeln and her colleagues are therefore looking for smart solutions that will allow current AI algorithms to be trained on biological data with fewer examples.

Complex living systems

Abeln also emphasizes that living systems are very complex, and that it is not yet clear how these systems actually work. For example, a particular mutation in the DNA of a cancer cell can give the cell properties that make it divide faster. But for many types of mutations, researchers do not yet understand the exact effects on the cell and its environment.

Abeln: "Biological data are therefore substantially different from the data for which current AI applications have been developed. We know what an English text should look like, but we have no intuitive idea of what a certain DNA sequence means. That makes it difficult to assess whether a prediction made by an AI is meaningful. You really need biological expertise to understand it."

Explainable AI

There is already an AI application that is making a big impact in biology: AlphaFold. To function properly, proteins must fold into a specific three-dimensional structure. With AlphaFold, it is possible to predict this structure based on the sequence of amino acids that make up the protein. Before AlphaFold existed, it was common practice to determine the structure of a protein using expensive and time-consuming experimental techniques.

Even though AlphaFold can predict the structure of a protein, biologists do not actually understand how the application does so. According to Abeln, AlphaFold has therefore not yet led to new insights into how proteins fold. Abeln: "In our new group, we are going to work on AI applications that not only make better predictions, but for which we can also explain how each prediction was made. On what aspects of the data does an AI model base its predictions? This is called explainable AI. A better understanding of how predictions are made will lead to a better understanding of complex biological systems."

Multidisciplinary

An example of a topic the group will focus on is the interaction between plants and microorganisms. Ultimately, this could lead to an AI application that predicts which combination of microorganisms in the soil will lead to the optimal growth and development of a particular plant.

Members of the new chair group will collaborate with and contribute to the Utrecht Bioinformatics Center. Abeln's team will be multidisciplinary, with some researchers focusing more on computing science and others more on biology. Abeln herself studied mathematics and computing science, received her PhD in bioinformatics at Oxford and then did a postdoc in biophysics. That broad experience now comes in handy, she explains. Abeln: "I actually still use all these disciplines in my research. You have to know something about a pretty large number of topics to be able to approach these kinds of research questions with nuance."