NPannotator is a cheminformatics tool that assigns substrates to enzymatic domains in Type I Polyketide Synthase (T1PKS) biosynthetic gene clusters. Given the SMILES of a known product and a set of unordered domains, it determines the module ordering and substrate assignments by iteratively modifying starter and extender units and filtering the resulting candidates by chemical similarity to the product.
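To illustrate what "filtering via chemical similarity" can look like, here is a minimal sketch, not NPannotator's actual implementation. It assumes each structure has already been reduced to a set of fingerprint on-bit indices (for example by RDKit) and that Tanimoto similarity is the metric; the README states neither. The names `tanimoto`, `filter_candidates`, and `keep_top`, and all the bit sets, are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def filter_candidates(
    target: Set[int],
    candidates: Dict[str, Set[int]],
    keep_top: int = 2,
) -> List[Tuple[str, float]]:
    """Score every candidate against the target and keep the best few.

    Hypothetical helper: the real pipeline may rank or threshold differently.
    """
    scored = [(name, tanimoto(target, bits)) for name, bits in candidates.items()]
    # Highest similarity first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:keep_top]


# Made-up fingerprints, purely for illustration.
target_bits = {1, 2, 3, 4}
candidate_bits = {
    "candidate_a": {1, 2, 3, 4},
    "candidate_b": {1, 2, 5},
    "candidate_c": {7, 8},
}

best = filter_candidates(target_bits, candidate_bits, keep_top=2)
print(best)
```

In an iterative search, a step like this would run after each round of starter/extender modifications, so that only the candidates closest to the known product are carried forward.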
pip install -e .

Key dependencies (rdkit, pandas, numpy) are installed automatically. The retrotide package (which provides the bcs module for domain/module/cluster objects) is also installed from GitHub.
import bcs
import pandas as pd
from NPannotator import Annotator
# Define target domains (unordered)
domains = [[bcs.AT],
           [bcs.AT, bcs.KR],
           [bcs.AT, bcs.KR, bcs.DH],
           [bcs.AT, bcs.KR, bcs.DH, bcs.ER]]
target_SMILES = "..." # SMILES string of the known product
# Initialize and run the annotation pipeline
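# NOTE: scaffoldsDB must already be defined; this snippet does not show how it is built
annotator = Annotator(target_SMILES=target_SMILES,
                      target_domains=domains,
                      scaffoldsDB=scaffoldsDB)
results_df = annotator.RunPipeline()

Run the tests with:

pytest tests/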