Journal of Cheminformatics

Whether graph neural network (GNN) attributions capture the reactive chemistry encoded in adverse outcome pathway (AOP) annotations for skin sensitization has not been tested under a label-permutation control. We trained AttentiveFP on a 436-molecule LLNA-labeled subset (81 sensitizers) of a curated skin-sensitization dataset and extracted atom-level attributions from seven methods (integrated gr…

Contact Dermatitis and Allergies

Ferroptosis is a distinct iron-dependent form of regulated cell death that plays critical roles in cancer progression, neurodegenerative disorders, and immune regulation. Computational identification of ferroptosis-related proteins (FRPs) remains challenging due to the complex regulatory network of ferroptosis, the functional heterogeneity of FRPs, and the limited availability of experimentally v…

Deep learning · Feature learning · Ferroptosis and cancer prognosis · Health Sciences · Identification (biology)

Machine Learning (ML) and Artificial Intelligence (AI) approaches have the potential to support better-informed decisions in chemical hazard identification while reducing animal testing. Their application in the context of New Approach Methodologies (NAMs) for Hazard Identification in Chemicals Risk Assessment (CRA) is challenging due to the limited knowledge, lack of experience, and uncertainty related…

Effects and risks of endocrine disrupting chemicals · Environmental Science · Health, Toxicology and Mutagenesis · Physical Sciences

Humans are exposed to thousands of chemicals throughout their lives. Many of these chemicals are detected in blood and have been catalogued in the Blood Exposome Database. Comprehensive hazard assessment of a chemical requires time-consuming and costly lab experiments using animals or cell lines, which cannot be easily scaled up to the chemical exposome, highlighting the urgent need for computation…

Environmental Science · Health, Environment, Cognitive Aging · Health, Toxicology and Mutagenesis · Physical Sciences

Graph Neural Networks (GNNs) are powerful tools for predicting chemical shifts in Nuclear Magnetic Resonance (NMR) spectroscopy. In this paper, we improve the state-of-the-art mean absolute error (MAE) on the Ilm-NMR-P31 dataset for the prediction of ³¹P shifts from 11.4 ppm to 8.88 ppm by proposing a lightweight GNN which is based on the Metalayer-Framework. Furthermore, we analyze th…

Computational Drug Discovery Methods · Computational Theory and Mathematics · Computer Science · Physical Sciences

The digitalisation of research requires data management systems capable of supporting a broad spectrum of usage scenarios, ranging from document-oriented repositories to fully factographic environments. This paper introduces a methodological approach for the stepwise development of such systems, illustrated by the MatInf Research Data Management System (RDMS). The proposed framework combines a gr…

Computer Science · Information Systems · Physical Sciences · Research Data Management Practices

This study aims to evaluate the effect of latent diffusion models on molecular representation learning from the perspective of generalization performance in molecular property prediction. To this end, we formulate a deep generative model for molecular representation learning based on a latent diffusion-based prior distribution, and introduce an evaluation methodology of generalization for learned…

Machine Learning in Materials Science · Materials Chemistry · Materials Science · Physical Sciences

Selecting an effective docking algorithm is highly context-dependent, and no single method performs reliably across structural, chemical, and protocol regimes. MolAS is a lightweight algorithm-selection model that predicts per-algorithm performance from pretrained protein and ligand embeddings using attentional pooling and a shallow residual decoder. With hundreds to a few thousand labelled compl…

Biochemistry, Genetics and Molecular Biology · Life Sciences · Molecular Biology · Protein Degradation and Inhibitors

The abundant R-group information available in chemical publications plays a crucial role in data-driven artificial intelligence (AI) research in the field of medicinal chemistry. In real-world publications, R-groups are expressed in various textual and graphical forms, thereby rendering their manual integration labor-intensive and inefficient. Although automated tools exist for R-group recognitio…

Biochemistry, Genetics and Molecular Biology · Biomedical Text Mining and Ontologies · Life Sciences · Molecular Biology

Antihypertensive peptides (AHTPs) are short-chain peptides derived from food or bioproteins through enzymatic hydrolysis, fermentation, or chemical synthesis, and they have demonstrated promising blood pressure-lowering effects. These peptides primarily function by inhibiting angiotensin-converting enzyme (ACE), regulating the renin-angiotensin system (RAS), and enhancing nitric oxide (NO) produc…

Biochemistry, Genetics and Molecular Biology · Life Sciences · Molecular Biology · Protein Hydrolysis and Bioactive Peptides
Paper
Tsuyoshi Esaki·Kazuyoshi Ikeda
3/2/2026

Data curation is a fundamental yet often underappreciated aspect of cheminformatics and computational drug discovery. Large public and proprietary databases now provide vast amounts of chemical structure, physicochemical, absorption, distribution, metabolism, excretion, and toxicity (ADMET), and bioactivity data. However, these resources contain structural inconsistencies, annotation errors, and …

Computational Drug Discovery Methods · Computational Theory and Mathematics · Computer Science · Physical Sciences

The success of machine learning in chemistry is fundamentally underpinned by the information fidelity of molecular representations. Despite their widespread adoption for efficiency and interpretability, Morgan fingerprints harbor a long-overlooked and fundamental flaw: bit collisions. This phenomenon erroneously maps distinct chemical substructures to identical positions, systematically corruptin…

Machine Learning in Materials Science · Materials Chemistry · Materials Science · Physical Sciences
Paper
Vyacheslav Alekseevich Grigorev·+4 more
2/27/2026

Lipophilicity is a fundamental physicochemical property that significantly influences various aspects of drug behavior, such as solubility, permeability, metabolism, distribution, protein binding, and excretion. Consequently, accurate prediction of this property is critical for the successful discovery and development of new drug candidates. The classical metric for assessing lipophilicity is log…

Electrical and Electronic Engineering · Engineering · Physical Sciences · Power Transformer Diagnostics and Insulation

Molecular descriptors are central to the performance and interpretability of QSPR models, yet most existing fingerprints for organic electronics lack chemical relevance or interpretability. Here, we present the Organic Electronic Fingerprint (OEFP), a structure-based representation tailored for OLED and OPV materials. OEFP was constructed from a manually curated OLED dataset and publicly availabl…

Chemistry · Organic Chemistry · Physical Sciences · Synthesis and Properties of Aromatic Compounds

Proteolysis-targeting chimeras (PROTACs) are heterobifunctional molecules composed of an E3 ligase ligand, a linker, and a warhead targeting a protein of interest. Despite their modular structure, accurately identifying and annotating these components in PROTACs is challenging and typically relies on manual curation and predefined substructure matching. To address this, we developed PROTAC-Splitt…

Biochemistry, Genetics and Molecular Biology · Life Sciences · Molecular Biology · Protein Degradation and Inhibitors

Accurate representation of three-dimensional (3D) molecular structures is essential for quantitative structure-activity relationship (QSAR) modeling; however, it remains unclear whether increasing the level of theory used for quantum-chemical geometry optimization yields a practically meaningful benefit for classical conformation-dependent 3D descriptors (Dragon 3D) and the resulting QSAR perform…

Computational Drug Discovery Methods · Computational Theory and Mathematics · Computer Science · Physical Sciences

Accurately predicting interactions between drugs is critical for pharmaceutical research and clinical safety. The literature keeps moving toward increasingly complex architectures, yet gains on standard benchmarks are often small. We use a deliberately simple setup that keeps the classifier fixed and swaps only the molecular representation. We compare ECFP4 Morgan fingerprints (MFPs), pretrained …

Computational Drug Discovery Methods · Computational Theory and Mathematics · Computer Science · Physical Sciences

Predicting likely sites of metabolism (SOMs), i.e., the atoms in a molecule where metabolic reactions are initiated, is an important component of the computational development pipeline for pharmaceuticals, agrochemicals, and cosmetics. Among SOM prediction tools, FAME3, introduced in 2019, is one of only a few non-commercial models capable of predicting both Phase 1 and Phase 2 SOMs for a wide ra…

Computational Drug Discovery Methods · Computational Theory and Mathematics · Computer Science · Physical Sciences