diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 3734bac9..5eb4c902 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -22,7 +22,7 @@ jobs:
- ubuntu-latest
- windows-latest
python-version:
- - 3.9
+ - '3.10'
steps:
- uses: actions/checkout@v3
diff --git a/.gitignore b/.gitignore
index b3f603c5..44fc928d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -5,6 +5,9 @@ __pycache__/
examples/data/
+lightning_logs/
+checkpoints/
+
# C extensions
*.so
diff --git a/README.md b/README.md
index 91bff717..698f7662 100644
--- a/README.md
+++ b/README.md
@@ -37,6 +37,10 @@ More detailed and comprehensive documentation in [DeepMol readthedocs](https://d
- [Regression](https://colab.research.google.com/drive/1vE-Q01orImdD4qFTo20MAT4E4kP2hsYF?usp=sharing)
- [Multi-task/multi-label](https://colab.research.google.com/drive/18z2vN6zLNSVJ3qgskKZTYxA_t9UNS1b8?usp=sharing)
+### DeepMol models
+
+All deployed DeepMol models are available [here](https://github.com/BioSystemsUM/deepmol_case_studies). An example of obtaining predictions with the deployed models is available [here](https://colab.research.google.com/drive/1_I-f7jQPx2AR76h431x4AdV5Peybs5LO?usp=sharing). The models' API can be installed via pip, as described [here](https://pypi.org/project/deepmol-models/).
+
### Table of contents:
- [Installation](#installation)
diff --git a/examples/notebooks/featurization.ipynb b/examples/notebooks/featurization.ipynb
index 2e44e769..1d87bd65 100644
--- a/examples/notebooks/featurization.ipynb
+++ b/examples/notebooks/featurization.ipynb
@@ -2344,7 +2344,7 @@
],
"metadata": {
"kernelspec": {
- "display_name": "Python 3 (ipykernel)",
+ "display_name": "deepmol",
"language": "python",
"name": "python3"
},
@@ -2358,7 +2358,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.10.14"
+ "version": "3.11.13"
}
},
"nbformat": 4,
diff --git a/examples/notebooks/molecular_standardizers copy.ipynb b/examples/notebooks/molecular_standardizers copy.ipynb
new file mode 100644
index 00000000..cd70c5a8
--- /dev/null
+++ b/examples/notebooks/molecular_standardizers copy.ipynb
@@ -0,0 +1,1245 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false
+ },
+ "source": [
+ "# Compound Standardization with DeepMol\n",
+ "\n",
+ "Standardization is the process of converting a chemical structure to a standardized format using a set of rules. The standardized format enables the chemical structure to be easily compared with other chemical structures and used in various computational applications.\n",
+ "\n",
+ "The loaded molecules can be standardized in three ways. The BasicStandardizer only performs sanitization (Kekulize, check valencies, set aromaticity, conjugation and hybridization). The CustomStandardizer is more complex and lets you choose whether to perform specific tasks such as sanitization, removing isotope information, neutralizing charges, removing stereochemistry and removing smaller fragments. Alternatively, the ChEMBL Standardizer can be used."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false
+ },
+ "source": [
+ "Standardizing molecules is important in machine learning pipelines because it helps to **ensure that the data is consistent and comparable across different samples**. In the context of molecular data, standardization typically involves removing salts and fragments and/or neutralizing charges."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2023-05-25T11:00:04.391061234Z",
+ "start_time": "2023-05-25T11:00:04.381852176Z"
+ },
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "from deepmol.datasets import SmilesDataset\n",
+ "\n",
+ "# list of non-standardized smiles\n",
+ "smiles_non_standardized = [\"C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)O)N\", \"C(C(=O)[O-])N.[Na+]\"]\n",
+ "\n",
+ "# Let's create a small dataset with our non-standardized smiles\n",
+ "df = SmilesDataset(smiles=smiles_non_standardized)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false
+ },
+ "source": [
+ "### Let's see what our molecules look like using RDKit"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2023-05-25T11:00:07.783981150Z",
+ "start_time": "2023-05-25T11:00:07.749990412Z"
+ },
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from rdkit.Chem.Draw import IPythonConsole\n",
+ "from rdkit.Chem import Draw\n",
+ "from IPython.display import SVG, display\n",
+ "\n",
+ "IPythonConsole.drawOptions.addAtomIndices = True\n",
+ "\n",
+ "# Non-standardized molecules\n",
+ "mols_non_standardized = df.mols\n",
+ "# Draw the molecules to a grid image\n",
+ "img = Draw.MolsToGridImage(mols_non_standardized, molsPerRow=3, subImgSize=(400, 400), useSVG=True)\n",
+ "# Get the SVG image from the grid image\n",
+ "svg = img.data\n",
+ "# Display the SVG image using the IPython.display.SVG object\n",
+ "display(SVG(svg))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false
+ },
+ "source": [
+ "## Standardization using the BasicStandardizer\n",
+ "\n",
+ "The BasicStandardizer only does sanitization (Kekulize, check valencies, set aromaticity, conjugation and hybridization).\n",
+ "To perform the standardization we need to call the `standardize` method with the dataset as input."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "2025-06-23 14:56:11,821 — INFO — Standardizer BasicStandardizer initialized with -1 jobs.\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "[14:56:11] Initializing Normalizer\n",
+ "BasicStandardizer: 100%|██████████| 2/2 [00:00<00:00, 34.77it/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "from deepmol.standardizer import BasicStandardizer\n",
+ "from copy import deepcopy\n",
+ "\n",
+ "# Let's create a copy of our dataset\n",
+ "d1 = deepcopy(df)\n",
+ "\n",
+ "# Let's standardize our dataset using the BasicStandardizer\n",
+ "basic_standardizer = BasicStandardizer()\n",
+ "basic_standardizer.standardize(d1, inplace=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false
+ },
+ "source": [
+ "### Let's see what our molecules look like after standardization\n",
+ "\n",
+ "With this standardizer, only minor changes are visible, since only sanitization is performed.\n",
+ "The visible changes are mainly due to the conversion of all chiral centers to a consistent configuration, the removal of explicit hydrogens, and the standardization of atom order.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2023-05-25T11:00:16.545636363Z",
+ "start_time": "2023-05-25T11:00:16.511514529Z"
+ },
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Standardized molecules\n",
+ "mols_standardized = d1.mols\n",
+ "# Draw the molecules to a grid image\n",
+ "img = Draw.MolsToGridImage(mols_standardized, molsPerRow=2, subImgSize=(400, 400), useSVG=True)\n",
+ "# Get the SVG image from the grid image\n",
+ "svg = img.data\n",
+ "# Display the SVG image using the IPython.display.SVG object\n",
+ "display(SVG(svg))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false
+ },
+ "source": [
+ "## Standardization using the CustomStandardizer\n",
+ "\n",
+ "With the CustomStandardizer you can choose which tasks to perform. The available tasks and their default values are:\n",
+ "- Remove isotope information (default: False)\n",
+ "- Neutralize charges (default: False)\n",
+ "- Remove stereochemistry (default: True)\n",
+ "- Remove smaller fragments (default: False)\n",
+ "- Add explicit hydrogens (default: False)\n",
+ "- Kekulize (default: False)\n",
+ "- Neutralize charges again (default: True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "2025-06-23 15:20:32,684 — INFO — Standardizer CustomStandardizer initialized with -1 jobs.\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "CustomStandardizer: 100%|██████████| 2/2 [00:00<00:00, 37.19it/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "from deepmol.standardizer import CustomStandardizer\n",
+ "\n",
+ "# Let's create a copy of our dataset\n",
+ "d2 = deepcopy(df)\n",
+ "\n",
+ "# Define the standardization steps\n",
+ "standardization_steps = {'REMOVE_ISOTOPE': True,\n",
+ " 'NEUTRALISE_CHARGE': True,\n",
+ " 'REMOVE_STEREO': True,\n",
+ " 'KEEP_BIGGEST': True,\n",
+ " 'ADD_HYDROGEN': False,\n",
+ " 'KEKULIZE': False,\n",
+ " 'NEUTRALISE_CHARGE_LATE': True}\n",
+ "\n",
+ "# Let's standardize our dataset using the CustomStandardizer\n",
+ "custom_standardizer = CustomStandardizer(standardization_steps)\n",
+ "custom_standardizer.standardize(d2, inplace=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "NC(Cc1c[nH]c2ccccc12)C(=O)O\n",
+ "NCC(=O)O\n"
+ ]
+ }
+ ],
+ "source": [
+ "from rdkit.Chem import MolToSmiles\n",
+ "\n",
+ "for mol in d2.mols:\n",
+ " print(MolToSmiles(mol))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false
+ },
+ "source": [
+ "### Let's see what our molecules look like after standardization\n",
+ "\n",
+ "As we can see, the standardized molecules contain no isotopic or stereochemical information (e.g. chiral centers), their charges are neutralized, and the smaller fragments (here, the Na+ counterion) have been removed."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2023-05-25T11:00:50.489246988Z",
+ "start_time": "2023-05-25T11:00:50.472318939Z"
+ },
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ "