The overall architecture for predicting the properties of ionizable lipids begins with obtaining transfection efficiency datasets for various ionizable lipids. Subsequently, the BalMol block is used to smooth the distribution of labels and molecular features for balancing the data of LNPs. Finally, the TransLNP model is employed to predict transfection efficiency. Credit: Briefings in Bioinformatics (2024). DOI: 10.1093/bib/bbae186

Targeted pan-cancer treatment with messenger RNA (mRNA) vaccines is a hot topic in drug research. A key challenge in mRNA design is the construction of delivery systems called lipid nanoparticles (LNPs), which serve as carriers to deliver mRNA therapies or vaccines to target cells. Preparing and screening LNP components involves long cycles and high costs.

In a study published in Briefings in Bioinformatics, a research team led by Prof. Liu Lizhuang from the Shanghai Advanced Research Institute (SARI) of the Chinese Academy of Sciences proposed a model named TransLNP. Based on self-attention mechanisms, it maps the three-dimensional (3D) microstructure of mRNA-LNPs to their biochemical properties, enabling high-precision automated screening of LNPs.

The designed TransLNP used a cross-molecule automatic learning approach to extract knowledge from existing molecular data, enabling small-sample training for LNPs and facilitating model transfer across different molecule types.

To construct the mapping relationship between the 3D microstructure and biochemical properties of mRNA-LNPs, the model fully leveraged coarse-grained atomic sequence information and fine-grained atomic spatial correspondences. It extracted molecular-level features through the interaction of atomic information (atom types, coordinates, relative distance matrices, edge type matrices) based on a self-attention mechanism.
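As a rough illustration of how atomic features and pairwise distances can interact through self-attention, the following minimal sketch builds a relative distance matrix from 3D coordinates and uses it to bias attention scores between atoms. It is not the TransLNP architecture: the toy feature vectors, the choice to subtract distance from the score, and the function names are all assumptions made for this example.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between atom coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def distance_biased_attention(features, coords):
    """Single-head self-attention over atoms (illustrative assumption).

    Scores are scaled dot products of atom features, minus the pairwise
    distance, so that nearby atoms attend to each other more strongly.
    """
    d = len(features[0])
    dist = distance_matrix(coords)
    out = []
    for i, q in enumerate(features):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) - dist[i][j]
            for j, k in enumerate(features)
        ]
        w = softmax(scores)
        # Each atom's new representation is a weighted mix of all atoms.
        out.append([sum(w[j] * features[j][c] for j in range(len(features)))
                    for c in range(d)])
    return out

# Toy molecule: three atoms with hypothetical 2-d features and 3D coordinates.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]

dist = distance_matrix(coords)
atom_repr = distance_biased_attention(features, coords)
# Mean-pool atom representations into one molecule-level vector.
molecule_repr = [sum(col) / len(col) for col in zip(*atom_repr)]
```

Mean pooling is used here only to obtain a single molecule-level vector; the published model may aggregate atomic information differently.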

To address the imbalance caused by limited LNP data, the BalMol module was designed. This module balanced the data by smoothing label distributions and molecular feature distributions.
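One common way to smooth a label distribution for imbalanced regression is to convolve the label histogram with a Gaussian kernel and weight each sample by the inverse of the smoothed density. The sketch below shows that general idea only. Whether BalMol uses this exact scheme is not stated here, and the bin count, kernel width, and function name are assumptions for illustration.

```python
import math

def smoothed_label_weights(labels, n_bins=10, sigma=1.0):
    """Per-sample weights from a Gaussian-smoothed label histogram.

    Samples whose labels fall in sparsely populated regions receive
    larger weights. Weights are normalized to have mean 1.
    """
    lo, hi = min(labels), max(labels)
    width = (hi - lo) / n_bins or 1.0  # guard against identical labels
    bin_of = [min(int((y - lo) / width), n_bins - 1) for y in labels]

    counts = [0] * n_bins
    for b in bin_of:
        counts[b] += 1

    # Smooth the histogram with a normalized Gaussian kernel over bin indices.
    smoothed = []
    for i in range(n_bins):
        kernel = [math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
                  for j in range(n_bins)]
        smoothed.append(sum(k * c for k, c in zip(kernel, counts)) / sum(kernel))

    weights = [1.0 / smoothed[b] for b in bin_of]
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]

# Hypothetical transfection-efficiency labels: many low values, few high ones.
labels = [0.1, 0.2, 0.15, 0.12, 0.18, 0.11, 5.0, 9.0]
weights = smoothed_label_weights(labels)
```

In this toy data the two rare high-efficiency samples end up with larger weights than the six clustered low-efficiency ones, which is the balancing effect such smoothing aims for.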

TransLNP achieved a mean squared error (MSE) of less than five for predicting LNP transfection efficiency. Compared with various mainstream graph-based and machine learning algorithms, TransLNP showed superior performance in terms of MSE (the smaller the value, the better), R² (the larger the value, the better), and Pearson correlation coefficient, achieving top-tier metrics in the field.
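For readers unfamiliar with these three metrics, the following minimal sketch computes them from their standard definitions. The sample values are made up for illustration and are not results from the study.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average squared difference (lower is better)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (higher is better)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up measured vs. predicted values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print(mse(y_true, y_pred), r2(y_true, y_pred), pearson(y_true, y_pred))
```

Note that MSE depends on the scale of the labels, so a threshold such as "less than five" is only meaningful relative to the range of transfection efficiency values in the dataset.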

This work supports the rapid and accurate prediction of mRNA-LNP transfection efficiency and the prediction of new lipid nanoparticle structures, and sheds light on the application of mRNA drugs in vaccine development and drug delivery.

More information: Kun Wu et al, Data-balanced transformer for accelerated ionizable lipid nanoparticles screening in mRNA delivery, Briefings in Bioinformatics (2024). DOI: 10.1093/bib/bbae186