Overview of SPARROW and its role within the molecular design cycle. Each molecule in a candidate set, comprising molecular ideas from any combination of algorithmic or expert sources, is annotated with its anticipated properties and potential synthetic routes. These annotations can make use of quantitative structure–property relationship models with or without uncertainty quantification, as well as computer-aided synthesis planning tools or human experts. SPARROW then weighs the utility of every candidate against their synthetic costs, not one-by-one, but as a batch, and selects an optimal subset of candidates for synthesis and testing. In the depicted retrosynthetic graph, orange circles represent reaction nodes. Pink, blue and green circles represent target compounds, intermediates and buyable compounds, respectively. Credit: Nature Computational Science (2024). DOI: 10.1038/s43588-024-00639-y

The use of AI to streamline drug discovery is exploding. Researchers are deploying machine-learning models to help them identify molecules, among billions of options, that might have the properties they are seeking to develop new medicines.

But there are so many variables to consider—from the price of materials to the risk of something going wrong—that even when scientists use AI, weighing the costs of synthesizing the best candidates is no easy task.

The myriad challenges involved in identifying the best and most cost-efficient molecules to test are one reason new medicines take so long to develop, as well as a key driver of high prescription drug prices.

To help scientists make cost-aware choices, MIT researchers have developed an algorithmic framework that automatically identifies optimal molecular candidates, minimizing synthetic cost while maximizing the likelihood that candidates have the desired properties. The algorithm also identifies the materials and experimental steps needed to synthesize these molecules.

Their quantitative framework, known as Synthesis Planning and Rewards-based Route Optimization Workflow (SPARROW), considers the costs of synthesizing a batch of molecules at once, since multiple candidates can often be derived from some of the same chemical compounds. Moreover, this unified approach captures key information on molecular design, property prediction, and synthesis planning from online repositories and widely used AI tools.

The paper is published in the journal Nature Computational Science.

Beyond helping pharmaceutical companies discover new drugs more efficiently, SPARROW could be used in applications like the invention of new agrichemicals or the discovery of specialized materials for organic electronics.

"The selection of compounds is very much an art at the moment—and at times it is a very successful art. But because we have all these other models and predictive tools that give us information on how molecules might perform and how they might be synthesized, we can and should be using that information to guide the decisions we make," says Connor Coley, the Class of 1957 Career Development Assistant Professor in the MIT departments of Chemical Engineering and Electrical Engineering and Computer Science, and senior author of a paper on SPARROW.

Coley is joined on the paper by lead author Jenna Fromer.

Complex cost considerations

In a sense, whether a scientist should synthesize and test a certain molecule boils down to a question of the synthetic cost versus the value of the experiment. However, determining cost and determining value are each tough problems on their own.

For instance, an experiment might require expensive materials or it could have a high risk of failure. On the value side, one might consider how useful it would be to know the properties of this molecule or whether those predictions carry a high level of uncertainty.

At the same time, pharmaceutical companies increasingly use batch synthesis to improve efficiency. Instead of testing molecules one at a time, they use combinations of chemical building blocks to test multiple candidates at once. However, this means the chemical reactions must all require the same experimental conditions. This makes estimating cost and value even more challenging.

SPARROW tackles this challenge by considering the shared intermediary compounds involved in synthesizing molecules and incorporating that information into its cost-versus-value function.

"When you think about this optimization game of designing a batch of molecules, the cost of adding on a new structure depends on the molecules you have already chosen," Coley says.
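Coley's point about marginal cost can be illustrated with a small sketch. The molecule names, compound names, and prices below are invented for illustration and do not come from the paper; the only idea borrowed from the article is that a compound shared between two routes is paid for once.

```python
# Hypothetical toy data: the compounds (purchased materials and intermediates)
# that each candidate's route needs, with made-up costs in arbitrary units.
ROUTE_COMPOUNDS = {
    "mol_A": {"bb1", "bb2", "int1"},
    "mol_B": {"bb2", "int1", "bb3"},
    "mol_C": {"bb4", "bb5"},
}
COMPOUND_COST = {
    "bb1": 10.0, "bb2": 5.0, "bb3": 8.0,
    "bb4": 20.0, "bb5": 15.0, "int1": 12.0,
}


def batch_cost(batch):
    """Cost of a batch, counting every shared compound only once."""
    needed = set().union(*(ROUTE_COMPOUNDS[m] for m in batch))
    return sum(COMPOUND_COST[c] for c in needed)


def marginal_cost(candidate, batch):
    """Extra cost of adding one candidate to an already chosen batch."""
    return batch_cost(set(batch) | {candidate}) - batch_cost(batch)


print(marginal_cost("mol_B", []))         # mol_B on its own
print(marginal_cost("mol_B", ["mol_A"]))  # mol_B once mol_A is already chosen
print(marginal_cost("mol_C", ["mol_A"]))  # mol_C shares nothing with mol_A
```

In this toy setup, mol_B is much cheaper to add once mol_A is in the batch, because two of its three compounds are already accounted for, while mol_C costs the same either way.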

The framework also considers things like the costs of starting materials, the number of reactions that are involved in each synthetic route, and the likelihood those reactions will be successful on the first try.

To utilize SPARROW, a scientist provides a set of molecular compounds they are thinking of testing and a definition of the properties they are hoping to find.

From there, SPARROW collects information on the molecules and their synthetic pathways and then weighs the value of each one against the cost of synthesizing a batch of candidates. It automatically selects the best subset of candidates that meet the user's criteria and finds the most cost-effective synthetic routes for those compounds.
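The article does not spell out SPARROW's objective function, so the following is only a toy sketch of the general idea, under stated assumptions: each candidate has a reward, each reaction has a cost and a success probability, shared reactions are paid for once, and a hypothetical weight (COST_WEIGHT) trades cost against expected reward. All names and numbers are invented, and the exhaustive search over subsets is a stand-in that would not scale to realistic candidate sets.

```python
from itertools import combinations
from math import prod

# Hypothetical inputs (arbitrary units), not taken from the paper.
REWARD = {"mol_A": 10.0, "mol_B": 8.0, "mol_C": 5.0}
ROUTE = {  # reactions used by each candidate's route
    "mol_A": {"r1", "r2"},
    "mol_B": {"r1", "r3"},
    "mol_C": {"r4", "r5", "r6"},
}
REACTION_COST = {"r1": 4.0, "r2": 2.0, "r3": 3.0, "r4": 6.0, "r5": 5.0, "r6": 4.0}
REACTION_SUCCESS = {"r1": 0.9, "r2": 0.8, "r3": 0.7, "r4": 0.6, "r5": 0.9, "r6": 0.8}
COST_WEIGHT = 0.5  # assumed trade-off between cost and expected reward


def score(batch):
    """Expected reward of a batch minus its weighted, de-duplicated cost."""
    expected_reward = sum(
        REWARD[m] * prod(REACTION_SUCCESS[r] for r in ROUTE[m]) for m in batch
    )
    reactions = set().union(*(ROUTE[m] for m in batch))
    cost = sum(REACTION_COST[r] for r in reactions)
    return expected_reward - COST_WEIGHT * cost


def select_batch(candidates):
    """Score every subset of candidates and return the best one."""
    names = sorted(candidates)
    subsets = (
        subset
        for size in range(len(names) + 1)
        for subset in combinations(names, size)
    )
    return max(subsets, key=score)


best = select_batch(ROUTE)
print(best, round(score(best), 2))
```

Because the whole batch is scored at once, mol_B earns its place through the reaction it shares with mol_A, even though it scores poorly alone, and mol_C is left out because its long, unshared route costs more than its expected reward.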

"It does all this optimization in one step, so it can really capture all of these competing objectives simultaneously," Fromer says.

More information: Jenna C. Fromer et al, An algorithmic framework for synthetic cost-aware decision making in molecular design, Nature Computational Science (2024). DOI: 10.1038/s43588-024-00639-y
