Journal of Cheminformatics
Ferroptosis is a distinct iron-dependent form of regulated cell death that plays critical roles in cancer progression, neurodegenerative disorders, and immune regulation. Computational identification of ferroptosis-related proteins (FRPs) remains challenging due to the complex regulatory network of ferroptosis, the functional heterogeneity of FRPs, and the limited availability of experimentally v…
Machine Learning (ML) and Artificial Intelligence (AI) approaches have the potential to enable better-informed decisions in chemical hazard identification while reducing animal testing. Their application in the context of New Approach Methodologies (NAMs) for Hazard Identification in Chemicals Risk Assessment (CRA) is challenging due to limited knowledge, lack of experience, and uncertainty related…
Humans are exposed to thousands of chemicals throughout their lives. Many of these chemicals are detected in blood and have been catalogued in the Blood Exposome Database. Comprehensive hazard assessment of a chemical requires time-consuming and costly lab experiments using animals or cell lines, which cannot be easily scaled up to the chemical exposome, highlighting the urgent need for computation…
Graph Neural Networks (GNNs) are powerful tools for predicting chemical shifts in Nuclear Magnetic Resonance (NMR) spectroscopy. In this paper, we improve the state-of-the-art mean absolute error (MAE) on the Ilm-NMR-P31 dataset for the prediction of ³¹P shifts from 11.4 ppm to 8.88 ppm by proposing a lightweight GNN which is based on the Metalayer-Framework. Furthermore, we analyze th…
The digitalisation of research requires data management systems capable of supporting a broad spectrum of usage scenarios, ranging from document-oriented repositories to fully factographic environments. This paper introduces a methodological approach for the stepwise development of such systems, illustrated by the MatInf Research Data Management System (RDMS). The proposed framework combines a gr…
This study aims to evaluate the effect of latent diffusion models on molecular representation learning from the perspective of generalization performance in molecular property prediction. To this end, we formulate a deep generative model for molecular representation learning based on a latent diffusion-based prior distribution, and introduce an evaluation methodology of generalization for learned…
Selecting an effective docking algorithm is highly context-dependent, and no single method performs reliably across structural, chemical, and protocol regimes. MolAS is a lightweight algorithm-selection model that predicts per-algorithm performance from pretrained protein and ligand embeddings using attentional pooling and a shallow residual decoder. With hundreds to a few thousand labelled compl…
The abundant R-group information available in chemical publications plays a crucial role in data-driven artificial intelligence (AI) research in the field of medicinal chemistry. In real-world publications, R-groups are expressed in various textual and graphical forms, thereby rendering their manual integration labor-intensive and inefficient. Although automated tools exist for R-group recognitio…
Antihypertensive peptides (AHTPs) are short-chain peptides derived from food or bioproteins through enzymatic hydrolysis, fermentation, or chemical synthesis, and they have demonstrated promising blood pressure-lowering effects. These peptides primarily function by inhibiting angiotensin-converting enzyme (ACE), regulating the renin-angiotensin system (RAS), and enhancing nitric oxide (NO) produc…
Data curation is a fundamental yet often underappreciated aspect of cheminformatics and computational drug discovery. Large public and proprietary databases now provide vast amounts of chemical structure, physicochemical, absorption, distribution, metabolism, excretion, and toxicity (ADMET), and bioactivity data. However, these resources contain structural inconsistencies, annotation errors, and …
The success of machine learning in chemistry is fundamentally underpinned by the information fidelity of molecular representations. Despite their widespread adoption for efficiency and interpretability, Morgan fingerprints harbor a long-overlooked and fundamental flaw: bit collisions. This phenomenon erroneously maps distinct chemical substructures to identical positions, systematically corruptin…
Lipophilicity is a fundamental physicochemical property that significantly influences various aspects of drug behavior, such as solubility, permeability, metabolism, distribution, protein binding, and excretion. Consequently, accurate prediction of this property is critical for the successful discovery and development of new drug candidates. The classical metric for assessing lipophilicity is log…
Molecular descriptors are central to the performance and interpretability of QSPR models, yet most existing fingerprints for organic electronics lack chemical relevance or interpretability. Here, we present the Organic Electronic Fingerprint (OEFP), a structure-based representation tailored for OLED and OPV materials. OEFP was constructed from a manually curated OLED dataset and publicly availabl…
Proteolysis-targeting chimeras (PROTACs) are heterobifunctional molecules composed of an E3 ligase ligand, a linker, and a warhead targeting a protein of interest. Despite their modular structure, accurately identifying and annotating these components in PROTACs is challenging and typically relies on manual curation and predefined substructure matching. To address this, we developed PROTAC-Splitt…
Accurate representation of three-dimensional (3D) molecular structures is essential for quantitative structure-activity relationship (QSAR) modeling; however, it remains unclear whether increasing the level of theory used for quantum-chemical geometry optimization yields a practically meaningful benefit for classical conformation-dependent 3D descriptors (Dragon 3D) and the resulting QSAR perform…
Accurately predicting interactions between drugs is critical for pharmaceutical research and clinical safety. The literature keeps moving toward increasingly complex architectures, yet gains on standard benchmarks are often small. We use a deliberately simple setup that keeps the classifier fixed and swaps only the molecular representation. We compare ECFP4 Morgan fingerprints (MFPs), pretrained …
Predicting likely sites of metabolism (SOMs), i.e., the atoms in a molecule where metabolic reactions are initiated, is an important component of the computational development pipeline for pharmaceuticals, agrochemicals, and cosmetics. Among SOM prediction tools, FAME3, introduced in 2019, is one of only a few non-commercial models capable of predicting both Phase 1 and Phase 2 SOMs for a wide ra…