Journal of Cheminformatics

Paper

Systematic validation of graph neural network explanations against adverse outcome pathway reactive centers for skin sensitization

Byoungjun Jeon

2d ago

Whether graph neural network (GNN) attributions capture the reactive chemistry encoded in adverse outcome pathway (AOP) annotations for skin sensitization has not been tested under a label-permutation control. We trained AttentiveFP on a 436-molecule LLNA-labeled subset (81 sensitizers) of a curated skin-sensitization dataset and extracted atom-level attributions from seven methods (integrated gr…

Contact Dermatitis and Allergies

Paper

DICL: a manually curated database of ion channels and ligands as a useful platform for drug discovery targeting ion channels

Wei Zhao·+6 more

2d ago

Computational Drug Discovery Methods

Paper

Contrastive representation learning and capsule networks enable accurate identification of ferroptosis-related proteins

Yiyang Zhao·+10 more

3/28/2026

Ferroptosis is a distinct iron-dependent form of regulated cell death that plays critical roles in cancer progression, neurodegenerative disorders, and immune regulation. Computational identification of ferroptosis-related proteins (FRPs) remains challenging due to the complex regulatory network of ferroptosis, the functional heterogeneity of FRPs, and the limited availability of experimentally v…

Deep learning · Feature learning · Ferroptosis and cancer prognosis · Health Sciences · Identification (biology)

Paper

Correction: Integrating artificial intelligence and manual curation to enhance bioassay annotations in ChEMBL

Ines Smit·+8 more

3/27/2026

Biochemistry, Genetics and Molecular Biology · Life Sciences · Molecular Biology · Molecular Biology Techniques and Applications

Paper

Perspective on applicability of data-driven machine learning computational new approach methodologies for hazard identification in chemicals risk assessment

Geven Piir·+8 more

3/26/2026

Machine Learning (ML) and Artificial Intelligence (AI) approaches have potential to make better-informed decisions in chemical hazard identification while reducing animal testing. Their application in the context of New Approach Methodologies (NAMs) for Hazard Identification in Chemicals Risk Assessment (CRA) is challenging due to the limited knowledge, lack of experience, and uncertainty related…

Effects and risks of endocrine disrupting chemicals · Environmental Science · Health, Toxicology and Mutagenesis · Physical Sciences

Paper

Predicting toxicity and bioactivity of the chemical exposome: a case study for the blood exposome database

Ankita Dutta·Dinesh Kumar Barupal

3/19/2026

Humans are exposed to thousands of chemicals throughout their life. Many of these chemicals are detected in blood and have been catalogued in the Blood Exposome Database. Comprehensive hazard assessment of a chemical requires time-consuming and costly lab experiments using animal or cell-lines, which cannot be easily scaled up to the chemical exposome, highlighting the urgent need for computation…

Environmental Science · Health, Environment, Cognitive Aging · Health, Toxicology and Mutagenesis · Physical Sciences

Paper

A light-weight Graph Neural Network for the prediction of 31P Nuclear Magnetic Resonance signals

Dimitri Domnjuk·...·Jana de Wiljes

3/17/2026

Graph Neural Networks (GNNs) are powerful tools for predicting chemical shifts in Nuclear Magnetic Resonance (NMR) spectroscopy. In this paper, we improve the state-of-the-art mean absolute error (MAE) on the Ilm-NMR-P31 dataset for the prediction of ³¹P shifts from 11.4 ppm to 8.88 ppm by proposing a lightweight GNN which is based on the Metalayer-Framework. Furthermore, we analyze th…

Computational Drug Discovery Methods · Computational Theory and Mathematics · Computer Science · Physical Sciences
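For scale, the two MAE figures quoted in the abstract above correspond to a relative error reduction of roughly 22% on Ilm-NMR-P31:

```latex
\frac{11.4\ \text{ppm} - 8.88\ \text{ppm}}{11.4\ \text{ppm}} = \frac{2.52\ \text{ppm}}{11.4\ \text{ppm}} \approx 0.221
```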

Paper

Evolve with your research: stepwise system evolution from document-driven to fact-centric research data management in materials science

Victor Dudarev·Alfred Ludwig

3/17/2026

The digitalisation of research requires data management systems capable of supporting a broad spectrum of usage scenarios, ranging from document-oriented repositories to fully factographic environments. This paper introduces a methodological approach for the stepwise development of such systems, illustrated by the MatInf Research Data Management System (RDMS). The proposed framework combines a gr…

Computer Science · Information Systems · Physical Sciences · Research Data Management Practices

Paper

Graph latent diffusion-based molecular representation learning for enhanced generalization in molecular property prediction

Daiki Koge·...·Takashi Abe

3/16/2026

This study aims to evaluate the effect of latent diffusion models on molecular representation learning from the perspective of generalization performance in molecular property prediction. To this end, we formulate a deep generative model for molecular representation learning based on a latent diffusion-based prior distribution, and introduce an evaluation methodology of generalization for learned…

Machine Learning in Materials Science · Materials Chemistry · Materials Science · Physical Sciences

Paper

Molecular embedding-based algorithm selection in protein-ligand docking

Jiabao Brad Wang·+4 more

3/14/2026

Selecting an effective docking algorithm is highly context-dependent, and no single method performs reliably across structural, chemical, and protocol regimes. MolAS is a lightweight algorithm-selection model that predicts per-algorithm performance from pretrained protein and ligand embeddings using attentional pooling and a shallow residual decoder. With hundreds to a few thousand labelled compl…

Biochemistry, Genetics and Molecular Biology · Life Sciences · Molecular Biology · Protein Degradation and Inhibitors

Paper

RGReco: a unified framework for automated R-group recognition in chemical publications

Yuanjie Xiang·+6 more

3/10/2026

The abundant R-group information available in chemical publications plays a crucial role in data-driven artificial intelligence (AI) research in the field of medicinal chemistry. In real-world publications, R-groups are expressed in various textual and graphical forms, thereby rendering their manual integration labor-intensive and inefficient. Although automated tools exist for R-group recognitio…

Biochemistry, Genetics and Molecular Biology · Biomedical Text Mining and Ontologies · Life Sciences · Molecular Biology

Paper

CKAN-ATHP: a predictor for antihypertensive peptides based on sample augmentation and loss improvement strategies using the convolutional Kolmogorov–Arnold network

Sen Yang·...·Jiaqi Ni

3/8/2026

Antihypertensive peptides (AHTPs) are short-chain peptides derived from food or bioproteins through enzymatic hydrolysis, fermentation, or chemical synthesis, and they have demonstrated promising blood pressure-lowering effects. These peptides primarily function by inhibiting angiotensin-converting enzyme (ACE), regulating the renin-angiotensin system (RAS), and enhancing nitric oxide (NO) produc…

Biochemistry, Genetics and Molecular Biology · Life Sciences · Molecular Biology · Protein Hydrolysis and Bioactive Peptides

Paper

Data curation in cheminformatics: importance and implementation

Tsuyoshi Esaki·Kazuyoshi Ikeda

3/2/2026

Data curation is a fundamental yet often underappreciated aspect of cheminformatics and computational drug discovery. Large public and proprietary databases now provide vast amounts of chemical structure, physicochemical, absorption, distribution, metabolism, excretion, and toxicity (ADMET), and bioactivity data. However, these resources contain structural inconsistencies, annotation errors, and …

Computational Drug Discovery Methods · Computational Theory and Mathematics · Computer Science · Physical Sciences

Paper

Collision-free morgan fingerprints: a principled approach to enhance machine learning performance and interpretability in chemistry

Jie Li·...·Xintong Qu

3/2/2026

The success of machine learning in chemistry is fundamentally underpinned by the information fidelity of molecular representations. Despite their widespread adoption for efficiency and interpretability, Morgan fingerprints harbor a long-overlooked and fundamental flaw: bit collisions. This phenomenon erroneously maps distinct chemical substructures to identical positions, systematically corruptin…

Machine Learning in Materials Science · Materials Chemistry · Materials Science · Physical Sciences
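The bit-collision flaw described in the abstract above can be illustrated with a minimal sketch. It assumes the common folding scheme in which each integer atom-environment identifier is reduced to a bit index by taking it modulo the fingerprint length. The identifiers are made-up integers rather than real Morgan hashes, and the helper names `fold` and `colliding_identifiers` are hypothetical. The sketch only demonstrates the problem; it is not the collision-free approach proposed in the paper.

```python
from collections import defaultdict


def fold(identifiers, n_bits):
    """Assumed folding scheme: bit index = identifier % n_bits.

    Returns a dict mapping each occupied bit index to the sorted list of
    distinct identifiers that landed on it.
    """
    bits = defaultdict(list)
    for ident in sorted(set(identifiers)):
        bits[ident % n_bits].append(ident)
    return dict(bits)


def colliding_identifiers(identifiers, n_bits):
    """Count distinct identifiers that share their bit with at least one other."""
    return sum(len(ids) for ids in fold(identifiers, n_bits).values() if len(ids) > 1)


# Hypothetical identifiers for four different atom environments.
# The first three differ but are congruent modulo 2048.
ids = [5, 2053, 4101, 10]

print(fold(ids, 2048))                    # {5: [5, 2053, 4101], 10: [10]}
print(colliding_identifiers(ids, 2048))   # 3
print(colliding_identifiers(ids, 2**32))  # 0
```

With a 2048-bit vector, three distinct substructures become indistinguishable because they set the same bit; keeping the full identifier space (here 2**32) removes the collision at the cost of a much sparser representation.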

Paper

Graph-based transformer to predict the octanol–water partition coefficient

Vyacheslav Alekseevich Grigorev·+4 more

2/27/2026

Lipophilicity is a fundamental physicochemical property that significantly influences various aspects of drug behavior, such as solubility, permeability, metabolism, distribution, protein binding, and excretion. Consequently, accurate prediction of this property is critical for the successful discovery and development of new drug candidates. The classical metric for assessing lipophilicity is log…

Electrical and Electronic Engineering · Engineering · Physical Sciences · Power Transformer Diagnostics and Insulation

Paper

Privileged structure-based molecular fingerprints for organic electronic materials: towards intuitive machine learning interpretation

Tae Wook Yang·...·Seung Hyun Jo

2/27/2026

Molecular descriptors are central to the performance and interpretability of QSPR models, yet most existing fingerprints for organic electronics lack chemical relevance or interpretability. Here, we present the Organic Electronic Fingerprint (OEFP), a structure-based representation tailored for OLED and OPV materials. OEFP was constructed from a manually curated OLED dataset and publicly availabl…

Chemistry · Organic Chemistry · Physical Sciences · Synthesis and Properties of Aromatic Compounds

Paper

PROTAC-Splitter: a machine learning framework for automated identification of PROTAC substructures

Stefano Ribes·+6 more

2/20/2026

Proteolysis-targeting chimeras (PROTACs) are heterobifunctional molecules composed of an E3 ligase ligand, a linker, and a warhead targeting a protein of interest. Despite their modular structure, accurately identifying and annotating these components in PROTACs is challenging and typically relies on manual curation and predefined substructure matching. To address this, we developed PROTAC-Splitt…

Biochemistry, Genetics and Molecular Biology · Life Sciences · Molecular Biology · Protein Degradation and Inhibitors

Paper

How quantum-chemical geometry optimization level affects classical 3D descriptors and QSAR performance: a comparative study

Jianmin Li·...·Rongling Gu

2/19/2026

Accurate representation of three-dimensional (3D) molecular structures is essential for quantitative structure-activity relationship (QSAR) modeling; however, it remains unclear whether increasing the level of theory used for quantum-chemical geometry optimization yields a practically meaningful benefit for classical conformation-dependent 3D descriptors (Dragon 3D) and the resulting QSAR perform…

Computational Drug Discovery Methods · Computational Theory and Mathematics · Computer Science · Physical Sciences

Paper

Addressing model overcomplexity in drug-drug interaction prediction with molecular fingerprints

Manel Gil-Sorribes·Alexis Molina

2/19/2026

Accurately predicting interactions between drugs is critical for pharmaceutical research and clinical safety. The literature keeps moving toward increasingly complex architectures, yet gains on standard benchmarks are often small. We use a deliberately simple setup that keeps the classifier fixed and swaps only the molecular representation. We compare ECFP4 Morgan fingerprints (MFPs), pretrained …

Computational Drug Discovery Methods · Computational Theory and Mathematics · Computer Science · Physical Sciences

Paper

FAME3R: an efficient, practical and reliable open-source tool for predicting phase 1 and phase 2 sites of metabolism

Roxane Axel Jacob·+5 more

2/14/2026

Predicting likely sites of metabolism (SOMs), i.e., the atoms in a molecule where metabolic reactions are initiated, is an important component of the computational development pipeline for pharmaceuticals, agrochemicals, and cosmetics. Among SOM prediction tools, FAME3, introduced in 2019, is one of only a few non-commercial models capable of predicting both Phase 1 and Phase 2 SOMs for a wide ra…

Computational Drug Discovery Methods · Computational Theory and Mathematics · Computer Science · Physical Sciences
