Press notes 04/05/2026

A new artificial intelligence tool that can generate millions of new molecules

The CoCoGraph system enables the creation of realistic chemical compounds that comply with the laws of chemistry. The URV research paves the way for the development of new drugs and materials.

Finding and developing new molecules is one of the great research endeavours of modern chemistry. From the development of new drugs to the creation of more sustainable materials, everything depends on finding new combinations of atoms with useful properties. Now, a research team from the Universitat Rovira i Virgili (URV) has developed an artificial intelligence tool capable of generating millions of new molecules which, although still unknown to science, comply with the laws of chemistry and could therefore be realistic possibilities. The research results have been published in the journal Nature Machine Intelligence.

The system, called CoCoGraph, works in a similar way to generative artificial intelligence tools for text or images, such as ChatGPT or DALL-E. “These models create new content that looks very much like the real thing. Our algorithm does the same, but with molecules,” explains Roger Guimerà, an ICREA Research Professor in the Department of Chemical Engineering at the URV.

Unlike other AI tools, however, the model does not yet respond to specific instructions. For the moment it simply carries out the more basic task of generating plausible molecules, that is, structures that comply with the rules of chemistry.

Nevertheless, the task is enormous. Even when the system is given just one molecular formula (for example, that of paracetamol), it can construct a vast number of atomic combinations, although only a small fraction of these combinations turns out to be viable in reality.

“The number of possible molecules is immense; it is estimated that there could be up to 10⁶⁰ different ones, which is far more than the number of water molecules in the ocean,” explains Guimerà. In contrast, the number of known molecules is only a tiny fraction of this figure. The sheer enormity of the number of possible new molecules means that finding ones that are actually useful is like looking for a needle in a giant haystack.
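To get a feel for the comparison with the ocean, the back-of-the-envelope calculation below may help. It is only a rough sketch: the ocean mass used here (about 1.4 × 10²¹ kg) is an assumed approximate figure that does not come from the research, and the variable names are purely illustrative.

```python
AVOGADRO = 6.02214076e23      # molecules per mole (exact SI value)
WATER_MOLAR_MASS_G = 18.015   # grams per mole of H2O
OCEAN_MASS_KG = 1.4e21        # ASSUMPTION: rough estimate of the oceans' total mass

# Convert the ocean mass to grams, then to moles, then to molecules.
ocean_molecules = OCEAN_MASS_KG * 1000 / WATER_MOLAR_MASS_G * AVOGADRO

# Upper estimate of possible molecules quoted in the article.
possible_molecules = 1e60

ratio = possible_molecules / ocean_molecules

print(f"Water molecules in the ocean (rough): {ocean_molecules:.1e}")
print(f"Possible molecules per ocean water molecule: {ratio:.1e}")
```

Under these assumptions the ocean holds somewhere between 10⁴⁶ and 10⁴⁷ water molecules, so the estimated chemical universe would be larger by a factor of more than a trillion.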

Roger Guimerà, Manuel Ruiz-Botella and Marta Sales-Pardo, researchers in the Department of Chemical Engineering at the URV, led the research.

How the model works

To generate these new molecules, CoCoGraph uses a diffusion model, a technique common in image generation. The process involves progressively “disordering” a real molecule and training the system to learn how to reconstruct it.

“We start with a real molecule, break the bonds and create new ones at random. The model learns to reverse this process and reconstruct coherent structures,” comments Marta Sales-Pardo, a researcher in the Department of Chemical Engineering who also took part in the research.
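The “disordering” step described above can be illustrated with a small sketch. It assumes, purely for illustration, that a molecule is stored as a list of bonds between numbered atoms (a double bond appears twice) and that scrambling is done by swapping the ends of two randomly chosen bonds. Whether CoCoGraph uses exactly this move is an assumption; the function names are hypothetical and no learning is involved here, only the random scrambling.

```python
import random
from collections import Counter

def degree(bonds):
    """Count how many bond ends touch each atom (a double bond is listed twice)."""
    counts = Counter()
    for a, b in bonds:
        counts[a] += 1
        counts[b] += 1
    return counts

def swap_step(bonds, rng):
    """Break two random bonds (a-b, c-d) and reconnect them as (a-d, c-b)."""
    i, j = rng.sample(range(len(bonds)), 2)
    a, b = bonds[i]
    c, d = bonds[j]
    if a == d or c == b:
        return list(bonds)  # would bond an atom to itself, so skip this step
    new_bonds = list(bonds)
    new_bonds[i] = (a, d)
    new_bonds[j] = (c, b)
    return new_bonds

def scramble(bonds, steps, seed=0):
    """Apply several random swap steps; a toy stand-in for the noising process."""
    rng = random.Random(seed)
    for _ in range(steps):
        bonds = swap_step(bonds, rng)
    return bonds

# Toy molecule (hypothetical, heavy atoms only): a chain 0-1-2-3-4-5
# with a double bond between atoms 2 and 3.
molecule = [(0, 1), (1, 2), (2, 3), (2, 3), (3, 4), (4, 5)]
noisy = scramble(molecule, steps=20)

# Every atom keeps the same number of bonds it started with.
print(degree(noisy) == degree(molecule))
```

Because each swap removes one bond end from an atom and gives it another, the number of bonds per atom never changes. This toy version does not check that the molecule stays in one piece, and a generative model would still have to learn how to run the process in reverse.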

Unlike images, however, molecules are discrete structures, which makes the problem much more complex from a mathematical point of view.

Always-valid molecules

One of the main innovations of the model is that it directly incorporates the basic rules of chemistry. For example, each atom always maintains the correct number of bonds, and this guarantees that 100% of the molecules generated are chemically valid, unlike the impossible structures that can be produced by other models.
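As a minimal sketch of what “the correct number of bonds” means, the check below compares each atom's bond count with a simplified table of typical limits. The table, the data layout and the function names are assumptions made for illustration, not the rules built into CoCoGraph; charged atoms and unusual valences are ignored.

```python
# ASSUMPTION: simplified table of the usual maximum number of bonds per element.
MAX_BONDS = {"H": 1, "C": 4, "N": 3, "O": 2}

def bond_counts(n_atoms, bonds):
    """Sum bond orders per atom; bonds are (atom_i, atom_j, order) triples."""
    counts = [0] * n_atoms
    for i, j, order in bonds:
        counts[i] += order
        counts[j] += order
    return counts

def is_chemically_valid(elements, bonds):
    """True if no atom exceeds its limit (spare slots count as implicit hydrogens)."""
    counts = bond_counts(len(elements), bonds)
    return all(c <= MAX_BONDS[e] for e, c in zip(elements, counts))

# Formaldehyde, H2C=O, heavy atoms only: one carbon double-bonded to one oxygen.
formaldehyde = (["C", "O"], [(0, 1, 2)])

# An impossible structure: an oxygen asked to form three single bonds.
broken = (["O", "C", "C", "C"], [(0, 1, 1), (0, 2, 1), (0, 3, 1)])

print(is_chemically_valid(*formaldehyde))  # True
print(is_chemically_valid(*broken))        # False
```

A model that only ever proposes changes preserving these counts never needs such a filter afterwards, which is the advantage the article describes.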

Furthermore, the system is more efficient: it uses fewer parameters, requires less computing power and can generate molecules more quickly.

The research team has compared CoCoGraph with other state-of-the-art models and analysed 36 physicochemical properties of the generated molecules, such as solubility and structural complexity. The result is that, for approximately two-thirds of these properties, the molecules generated are chemically more realistic than those from other models.

Verification by the scientific community

To check how plausible these molecules were, the team conducted an experiment with 121 chemistry experts from the URV itself. Each participant was shown twenty pairs of molecules (one real and one generated by the new AI) and had to identify which was the real one.

The results showed that the experts were wrong in approximately 4 out of 10 cases, meaning they often confused the generated molecules with the real ones. “This means that many of the molecules we generate are very convincing,” explains Sales-Pardo.
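To put these figures in perspective, the short calculation below works out the approximate scale of the experiment. It assumes that every expert judged all twenty pairs and treats “4 out of 10” as a rough rate of 0.4, so the resulting counts are indicative only and are not figures reported by the team.

```python
experts = 121
pairs_per_expert = 20
error_rate = 0.4  # ASSUMPTION: "approximately 4 out of 10", a rough figure

# Total number of real-versus-generated judgements, if everyone saw all pairs.
total_judgements = experts * pairs_per_expert

# Approximate number of wrong answers at the reported rate.
approx_wrong = round(total_judgements * error_rate)

# Number of wrong answers expected from pure guessing (a 50% error rate).
chance_wrong = total_judgements // 2

print(total_judgements)  # 2420
print(approx_wrong)      # 968
print(chance_wrong)      # 1210
```

An error rate of about 40% is therefore not far from the 50% that coin-flipping would give, which is why the generated molecules can be described as convincing.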

Although the model cannot yet design molecules with a specific function, promising tests have already been carried out. For example, researchers have identified molecules with properties similar to paracetamol from among the millions generated. They have also explored techniques to partially modify an existing molecule, a kind of chemical “tweak”, to create new variants with similar characteristics.

These approaches could be useful in the future for optimising drugs or developing new materials.

The first step towards an AI that designs bespoke molecules

The research team is clear that this is only the beginning. The main medium- to long-term goal is to be able to ask the artificial intelligence for a molecule with specific properties; for example, for a molecule that is soluble, non-toxic and useful for a specific application.

“For the moment, we are only generating molecules. The next step will be to apply specific objectives to this process,” says Manuel Ruiz-Botella, a doctoral student who also participated in the research.

If successful, the technology could transform fields such as chemistry, pharmacology and materials science and accelerate the discovery of new solutions in a chemical universe that is still practically unexplored.

Bibliographic reference: Ruiz-Botella, M., Sales-Pardo, M. & Guimerà, R. A collaborative constrained graph diffusion model for the generation of realistic synthetic molecules. Nat Mach Intell (2026). https://doi.org/10.1038/s42256-026-01229-5
