The Drug Design Pipeline: An Overview
Designing a drug candidate is not a single step – it is a pipeline. You start with a target hypothesis (a protein you want to inhibit or activate), generate candidate molecules computationally, validate their drug-like properties, screen for toxicity and metabolic liabilities, verify that they can actually be synthesized, and finally order synthesis of the top candidates. Each step eliminates compounds, and only the survivors advance.
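The funnel logic itself fits in a short sketch in plain Python. Each stage is a named predicate, and only molecules that pass one stage reach the next. The stage names, cutoffs, and toy property dictionaries below are illustrative placeholders, not SciRouter objects. The "lipinski" stage here checks only two of the four criteria.

```python
from typing import Callable, Dict, List, Tuple

# A molecule is represented as a plain dict of precomputed properties (illustrative only).
Molecule = Dict[str, float]
Stage = Tuple[str, Callable[[Molecule], bool]]


def run_funnel(molecules: List[Molecule], stages: List[Stage]) -> Tuple[List[Molecule], Dict[str, int]]:
    """Apply each stage in order and record how many molecules survive it."""
    survivors = list(molecules)
    counts = {"input": len(survivors)}
    for name, passes in stages:
        survivors = [m for m in survivors if passes(m)]
        counts[name] = len(survivors)
    return survivors, counts


stages: List[Stage] = [
    ("lipinski", lambda m: m["mw"] <= 500 and m["logp"] <= 5.0),  # abbreviated check
    ("veber", lambda m: m["tpsa"] <= 140),
    ("synthesis", lambda m: m["sa_score"] < 4.5),
]

toy = [
    {"mw": 420.0, "logp": 3.1, "tpsa": 85.0, "sa_score": 2.8},
    {"mw": 560.0, "logp": 4.0, "tpsa": 90.0, "sa_score": 3.0},   # fails MW
    {"mw": 380.0, "logp": 2.2, "tpsa": 150.0, "sa_score": 2.5},  # fails TPSA
    {"mw": 350.0, "logp": 1.9, "tpsa": 70.0, "sa_score": 5.2},   # fails SA
]

final, counts = run_funnel(toy, stages)
print(counts)  # {'input': 4, 'lipinski': 3, 'veber': 2, 'synthesis': 1}
```

The per-stage counts are the same numbers the export step later records under "pipeline", and they show where the funnel loses most of its compounds.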
Traditionally, this pipeline involves a patchwork of tools: REINVENT or another generator for molecule design, RDKit for property calculations, custom ADMET models or commercial software like StarDrop for ADMET prediction, retrosynthesis tools like ASKCOS or AiZynthFinder for synthesis planning, and spreadsheets or databases to track results. Integrating these tools requires significant engineering effort and domain expertise.
This guide walks through the complete pipeline using SciRouter's unified API. Every step – from molecule generation to synthesis feasibility – runs through a single SDK with consistent data formats.
Prerequisites
You need Python 3.8+ and a SciRouter API key. Sign up at scirouter.ai/register for 500 free credits per month.
pip install scirouter pandas
export SCIROUTER_API_KEY="sk-sci-your-api-key-here"
Step 1: Generate Candidate Molecules
The pipeline starts with molecule generation. We use REINVENT4 to produce novel molecules optimized against a set of drug-like property targets. For this example, we design kinase inhibitor candidates with a quinazoline scaffold bias:
from scirouter import SciRouter
import pandas as pd
client = SciRouter()
# Generate 100 molecules optimized for kinase inhibitor properties
gen_result = client.generate.molecules(
    num_molecules=100,
    scoring={
        "qed": {"weight": 0.25, "target": 0.7},
        "sa_score": {"weight": 0.25, "target": 3.0},
        "molecular_weight": {"weight": 0.2, "min": 300, "max": 550},
        "logp": {"weight": 0.15, "min": 1.0, "max": 4.5},
        "hbd": {"weight": 0.15, "max": 3},
    },
    similarity={
        "reference_smiles": "c1ccc(-c2ncc3ccccc3n2)cc1",  # 2-phenylquinazoline core
        "min_tanimoto": 0.25,
    },
)
print(f"Generated {len(gen_result.molecules)} molecules")
# Quick overview
for mol in gen_result.molecules[:5]:
    print(f" {mol.smiles} (QED={mol.qed:.2f}, SA={mol.sa_score:.1f})")
The similarity parameter biases generation toward molecules that share substructural features with the reference SMILES. A min_tanimoto of 0.25 is permissive enough for scaffold hopping while maintaining some structural relevance to your target pharmacophore.
Step 2: Calculate Molecular Properties
The first filter is drug-likeness. Use SciRouter's molecular properties endpoint (client.chemistry.properties) to compute a property profile for each generated molecule. These properties indicate whether a molecule has the physicochemical characteristics typical of an oral drug:
# Calculate detailed properties for each molecule
molecules_data = []
for mol in gen_result.molecules:
    props = client.chemistry.properties(smiles=mol.smiles)
    molecules_data.append({
        "smiles": mol.smiles,
        "mw": props.molecular_weight,
        "logp": props.logp,
        "hba": props.hba,
        "hbd": props.hbd,
        "tpsa": props.tpsa,
        "rotatable_bonds": props.rotatable_bonds,
        "qed": props.qed,
        "rings": props.num_rings,
    })
df = pd.DataFrame(molecules_data)
print(f"Calculated properties for {len(df)} molecules")
print(f"\nProperty ranges:")
print(f" MW: {df['mw'].min():.0f} - {df['mw'].max():.0f}")
print(f" LogP: {df['logp'].min():.1f} - {df['logp'].max():.1f}")
print(f" HBA: {df['hba'].min()} - {df['hba'].max()}")
print(f" HBD: {df['hbd'].min()} - {df['hbd'].max()}")
print(f" TPSA: {df['tpsa'].min():.0f} - {df['tpsa'].max():.0f}")
Apply Lipinski and Veber Filters
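Apply Lipinski's Rule of Five (MW ≤ 500, LogP ≤ 5, no more than 10 hydrogen-bond acceptors and 5 donors) and Veber's rules (TPSA ≤ 140 Å², no more than 10 rotatable bonds) to eliminate molecules with poor oral bioavailability potential. Lipinski's original rule flags a compound only when it breaks more than one criterion. The code below uses the stricter reading and rejects any violation: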
# Lipinski's Rule of Five
lipinski = df[
(df["mw"] <= 500) &
(df["logp"] <= 5.0) &
(df["hba"] <= 10) &
(df["hbd"] <= 5)
].copy()
print(f"Pass Lipinski: {len(lipinski)}/{len(df)}")
# Veber's rules (oral bioavailability)
veber = lipinski[
(lipinski["tpsa"] <= 140) &
(lipinski["rotatable_bonds"] <= 10)
].copy()
print(f"Pass Veber: {len(veber)}/{len(lipinski)}")
# Additional quality filters
quality = veber[
(veber["qed"] >= 0.4) &
(veber["rings"] >= 1) &
(veber["rings"] <= 5)
].copy()
print(f"Pass quality: {len(quality)}/{len(veber)}")
print(f"\n{len(quality)} molecules advance to ADMET screening")
Step 3: ADMET Screening
ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) screening predicts biological behavior that cannot be inferred from simple molecular descriptors. This step eliminates compounds with metabolic instability, toxicity risks, or poor absorption before you invest in synthesis:
# Screen each candidate for ADMET liabilities
admet_results = []
for _, row in quality.iterrows():
    admet = client.chemistry.admet(smiles=row["smiles"])
    admet_results.append({
        "smiles": row["smiles"],
        "mw": row["mw"],
        "logp": row["logp"],
        "qed": row["qed"],
        "caco2_permeable": admet.caco2_permeable,
        "cyp_inhibitor": admet.cyp_inhibitor,
        "herg_safe": admet.herg_safe,
        "ames_safe": admet.ames_safe,
        "hepatotox_safe": admet.hepatotox_safe,
        "bbb_penetrant": admet.bbb_penetrant,
        "ppb": admet.plasma_protein_binding,
    })
admet_df = pd.DataFrame(admet_results)
# Apply ADMET filters
safe = admet_df[
(admet_df["herg_safe"] == True) & # no cardiac toxicity risk
(admet_df["ames_safe"] == True) & # no mutagenicity
(admet_df["hepatotox_safe"] == True) & # no liver toxicity
(admet_df["caco2_permeable"] == True) & # good intestinal absorption
(admet_df["cyp_inhibitor"] == False) # no major CYP inhibition
].copy()
print(f"ADMET screening results:")
print(f" Input: {len(admet_df)}")
print(f" hERG safe: {admet_df['herg_safe'].sum()}")
print(f" Ames safe: {admet_df['ames_safe'].sum()}")
print(f" Hepatotox safe: {admet_df['hepatotox_safe'].sum()}")
print(f" Caco-2 perm: {admet_df['caco2_permeable'].sum()}")
print(f" No CYP inhib: {(~admet_df['cyp_inhibitor']).sum()}")
print(f" Pass all: {len(safe)}")
print(f"\n{len(safe)} molecules advance to synthesis check")
Step 4: Synthetic Accessibility Check
A beautiful molecule on screen is worthless if it cannot be made in a flask. Use SciRouter's synthesis check (client.generate.synthesis_check) to estimate whether each surviving candidate can realistically be synthesized:
# Check synthetic accessibility for ADMET-safe candidates
synth_results = []
for _, row in safe.iterrows():
    synth = client.generate.synthesis_check(smiles=row["smiles"])
    synth_results.append({
        "smiles": row["smiles"],
        "mw": row["mw"],
        "logp": row["logp"],
        "qed": row["qed"],
        "sa_score": synth.sa_score,
        "feasibility": synth.feasibility,
        "num_steps": synth.estimated_steps,
    })
synth_df = pd.DataFrame(synth_results)
# Filter by synthesis feasibility
synthesizable = synth_df[
(synth_df["sa_score"] < 4.5) &
(synth_df["feasibility"].isin(["easy", "moderate"]))
].copy()
print(f"Synthesis check results:")
print(f" Input: {len(synth_df)}")
print(f" Easy: {(synth_df['feasibility'] == 'easy').sum()}")
print(f" Moderate: {(synth_df['feasibility'] == 'moderate').sum()}")
print(f" Difficult: {(synth_df['feasibility'] == 'difficult').sum()}")
print(f" Very difficult: {(synth_df['feasibility'] == 'very difficult').sum()}")
print(f" Pass (SA < 4.5 and easy/moderate): {len(synthesizable)}")
print(f"\n{len(synthesizable)} molecules are final candidates")
Step 5: Rank and Select Final Candidates
With all filters applied, rank the surviving candidates by a composite score that balances drug-likeness, ADMET profile, and synthetic feasibility:
# Composite ranking score
synthesizable = synthesizable.copy()
synthesizable["rank_score"] = (
synthesizable["qed"] * 0.35 + # drug-likeness
(1 - synthesizable["sa_score"] / 10) * 0.35 + # synthesis ease (inverted)
(1 - abs(synthesizable["logp"] - 2.5) / 5) * 0.15 + # optimal LogP distance
(1 - abs(synthesizable["mw"] - 400) / 200) * 0.15 # optimal MW distance
)
# Sort by rank score
final = synthesizable.sort_values("rank_score", ascending=False).head(10)
print("=== TOP 10 DRUG CANDIDATES ===\n")
for i, (_, row) in enumerate(final.iterrows()):
    print(f"Rank {i+1}:")
    print(f" SMILES: {row['smiles']}")
    print(f" MW: {row['mw']:.0f}")
    print(f" LogP: {row['logp']:.1f}")
    print(f" QED: {row['qed']:.2f}")
    print(f" SA score: {row['sa_score']:.1f}")
    print(f" Feasibility: {row['feasibility']}")
    print(f" Est. steps: {row['num_steps']}")
    print(f" Rank score: {row['rank_score']:.3f}")
    print()
Step 6: Export for Synthesis
Export the final candidates in formats ready for submission to a contract research organization (CRO) or your in-house medicinal chemistry team:
import json
# Export as CSV for spreadsheet review
final.to_csv("drug_candidates.csv", index=False)
print("Saved drug_candidates.csv")
# Export as JSON with full details
export_data = {
"campaign": "kinase_inhibitor_v1",
"pipeline": {
"generated": len(gen_result.molecules),
"passed_lipinski": len(lipinski),
"passed_veber": len(veber),
"passed_quality": len(quality),
"passed_admet": len(safe),
"passed_synthesis": len(synthesizable),
"final_candidates": len(final),
},
"candidates": final.to_dict(orient="records"),
}
with open("drug_candidates.json", "w") as f:
    json.dump(export_data, f, indent=2)
print("Saved drug_candidates.json")
# Export SMILES list for CRO submission
with open("synthesis_order.smi", "w") as f:
    for i, (_, row) in enumerate(final.iterrows()):
        f.write(f"{row['smiles']}\tCandidate_{i+1}\n")
print("Saved synthesis_order.smi")
print(f"\nPipeline summary:")
print(f" {len(gen_result.molecules)} generated -> {len(final)} candidates ({len(final)/len(gen_result.molecules)*100:.1f}% survival rate)")
The Complete Pipeline Script
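Here is a condensed end-to-end script that chains all six steps into a single runnable pipeline. Its filter set is slightly simpler than the step-by-step version. It has no hbd scoring term. It drops the CYP inhibition, ring-count, and feasibility-category filters. Its ranking weights also differ. As a result, its counts will not match the step-by-step version exactly: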
"""
SMILES-to-Synthesis Pipeline
Generates, validates, and ranks drug candidates using SciRouter.
"""
import json
import sys
from scirouter import SciRouter
from scirouter.exceptions import SciRouterError
client = SciRouter()
# --- Configuration ---
TARGET_SMILES = "c1ccc(-c2ncc3ccccc3n2)cc1"  # reference scaffold (2-phenylquinazoline)
NUM_MOLECULES = 100
SA_CUTOFF = 4.5
QED_CUTOFF = 0.4
print("=== SMILES-to-Synthesis Pipeline ===\n")
# Step 1: Generate
print("Step 1: Generating molecules...")
try:
    gen = client.generate.molecules(
        num_molecules=NUM_MOLECULES,
        scoring={
            "qed": {"weight": 0.3, "target": 0.7},
            "sa_score": {"weight": 0.3, "target": 3.0},
            "molecular_weight": {"weight": 0.2, "min": 300, "max": 550},
            "logp": {"weight": 0.2, "min": 1.0, "max": 4.5},
        },
        similarity={"reference_smiles": TARGET_SMILES, "min_tanimoto": 0.25},
    )
except SciRouterError as e:
    print(f"Generation failed: {e}")
    sys.exit(1)
print(f" Generated: {len(gen.molecules)}")
# Step 2: Properties + Lipinski/Veber
print("Step 2: Property filtering...")
candidates = []
for mol in gen.molecules:
    p = client.chemistry.properties(smiles=mol.smiles)
    if (p.molecular_weight <= 500 and p.logp <= 5.0 and
            p.hba <= 10 and p.hbd <= 5 and
            p.tpsa <= 140 and p.rotatable_bonds <= 10 and
            p.qed >= QED_CUTOFF):
        candidates.append({"smiles": mol.smiles, "mw": p.molecular_weight,
                           "logp": p.logp, "qed": p.qed, "tpsa": p.tpsa})
print(f" Pass drug-likeness: {len(candidates)}")
# Step 3: ADMET
print("Step 3: ADMET screening...")
admet_safe = []
for c in candidates:
    a = client.chemistry.admet(smiles=c["smiles"])
    if a.herg_safe and a.ames_safe and a.hepatotox_safe and a.caco2_permeable:
        admet_safe.append(c)
print(f" Pass ADMET: {len(admet_safe)}")
# Step 4: Synthesis check
print("Step 4: Synthesis feasibility...")
synthesizable = []
for c in admet_safe:
    s = client.generate.synthesis_check(smiles=c["smiles"])
    if s.sa_score < SA_CUTOFF:
        c["sa_score"] = s.sa_score
        c["feasibility"] = s.feasibility
        synthesizable.append(c)
print(f" Pass synthesis: {len(synthesizable)}")
# Step 5: Rank
print("Step 5: Ranking...\n")
for c in synthesizable:
    c["score"] = (
        c["qed"] * 0.4
        + (1 - c["sa_score"] / 10) * 0.4
        + (1 - abs(c["logp"] - 2.5) / 5) * 0.2
    )
synthesizable.sort(key=lambda x: x["score"], reverse=True)
# Step 6: Output
print("=== TOP CANDIDATES ===\n")
for i, c in enumerate(synthesizable[:10]):
    print(f"{i+1}. {c['smiles']}")
    print(f" QED={c['qed']:.2f} SA={c['sa_score']:.1f} LogP={c['logp']:.1f} MW={c['mw']:.0f}")
with open("pipeline_results.json", "w") as f:
    json.dump({"candidates": synthesizable[:10]}, f, indent=2)
print(f"\nPipeline: {len(gen.molecules)} -> {len(synthesizable)} candidates")
print("Results saved to pipeline_results.json")
Ordering Synthesis: From SMILES to Flask
Once you have your ranked candidates, the next step is synthesis. There are several paths depending on your resources and timeline:
Make-on-Demand Services
Companies like Enamine, Mcule, and WuXi AppTec offer make-on-demand synthesis. You submit SMILES strings and receive purified compounds in 2 to 6 weeks. Typical costs range from $200 to $2,000 per compound depending on complexity. The SA score and estimated step count from the synthesis check give a rough, relative indication of cost: higher SA scores and longer routes generally mean more expensive compounds.
Catalog Screening
Before ordering custom synthesis, check whether close analogs of your candidates are available in commercial compound libraries. Enamine REAL, Mcule, and MolPort contain billions of immediately purchasable or quickly synthesizable molecules. A Tanimoto similarity search against these catalogs may find compounds close enough to your designs that custom synthesis is unnecessary.
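Tanimoto similarity between two fingerprints is the number of shared on-bits divided by the number of bits set in either: |A ∩ B| / |A ∪ B|. The sketch below runs a brute-force nearest-neighbour search over a small catalog. It assumes fingerprints have already been computed as sets of on-bit indices, for example with RDKit Morgan fingerprints. The catalog IDs, bit sets, and 0.7 cutoff are made-up placeholders. A catalog with billions of entries needs an indexed similarity-search service rather than a linear scan.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_in_catalog(query: Set[int], catalog: Dict[str, Set[int]],
                       threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Return catalog entries at or above the threshold, most similar first."""
    hits = [(cid, tanimoto(query, fp)) for cid, fp in catalog.items()]
    hits = [(cid, sim) for cid, sim in hits if sim >= threshold]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Placeholder fingerprints; in practice these come from a cheminformatics toolkit.
candidate = {1, 4, 9, 16, 25, 36, 49, 64}
catalog = {
    "CAT-0001": {1, 4, 9, 16, 25, 36, 49, 81},  # 7 shared bits out of 9 total
    "CAT-0002": {1, 4, 9, 100, 121, 144},       # 3 shared bits out of 11 total
    "CAT-0003": {1, 4, 9, 16, 25, 36, 49, 64},  # identical fingerprint
}

for cid, sim in similar_in_catalog(candidate, catalog, threshold=0.7):
    print(f"{cid}: {sim:.2f}")
```

Only CAT-0003 (1.00) and CAT-0001 (about 0.78) clear the cutoff. Whether a hit is close enough to replace a custom synthesis is still a medicinal chemistry judgment.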
In-House Medicinal Chemistry
If you have medicinal chemists on your team, provide them with the SMILES, any retrosynthetic route suggestions (for example from ASKCOS or AiZynthFinder), and the property profile. They can evaluate route feasibility, suggest modifications that simplify synthesis while maintaining the desired properties, and execute the synthesis.
Pipeline Optimization Tips
- Adjust filter stringency to your hit rate: If fewer than 5% of generated molecules survive all filters, loosen the generation constraints or ADMET thresholds. If more than 30% survive, tighten the filters to be more selective.
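- Generate more than you need: Aim to generate at least 5 to 10 times more molecules than your final target count, and more when your filters are strict. Each filter stage typically eliminates 30 to 60% of candidates, and those losses compound. At 50% per stage, the five filter stages in this guide (Lipinski, Veber, quality, ADMET, synthesis) would leave about 3% of the input.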
- Iterate on scoring weights: If your candidates cluster too tightly in property space, widen the property ranges or add a diversity term to the generation scoring (the examples above do not include one). If they are too diverse, tighten the similarity constraint by raising min_tanimoto.
- Use the Molecular Design Lab: SciRouter's Molecular Design Lab automates this entire pipeline in a single API call with built-in filtering and ranking.
- Save intermediate results: Write CSV or JSON files at each stage so you can restart from any point without re-running earlier steps.
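- Batch API calls: The step-by-step loops above make one request per molecule. When screening hundreds of molecules, send property and ADMET requests concurrently, or in batches if your SDK version offers a batch endpoint, to maximize throughput. The SciRouter API handles concurrent requests efficiently.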
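One way to combine the last two tips is to run per-molecule calls concurrently and cache every result to disk, keyed by SMILES. A rerun then only calls the API for molecules it has not seen yet. The sketch below uses only the standard library. Here, score_fn stands in for a call such as client.chemistry.admet. It is assumed to be thread-safe and to return JSON-serializable data, so verify both for your SDK version. The cache file name, worker count, and fake_score helper are hypothetical placeholders.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def score_with_cache(smiles_list: List[str], score_fn: Callable[[str], dict],
                     cache_path: str, max_workers: int = 8) -> Dict[str, dict]:
    """Score molecules concurrently, reusing results already stored in cache_path."""
    cache: Dict[str, dict] = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)

    # Only score molecules not yet cached (dict.fromkeys also drops duplicates, keeping order).
    todo = [s for s in dict.fromkeys(smiles_list) if s not in cache]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so they line up with `todo`.
        for smi, result in zip(todo, pool.map(score_fn, todo)):
            cache[smi] = result

    # Persist everything so the next run can resume from here.
    with open(cache_path, "w") as f:
        json.dump(cache, f, indent=2)
    return {s: cache[s] for s in smiles_list}


calls: List[str] = []


def fake_score(smiles: str) -> dict:
    """Hypothetical stand-in for a real API call; records which molecules were scored."""
    calls.append(smiles)
    return {"n_atoms_approx": sum(ch.isalpha() for ch in smiles)}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "admet_cache.json")
    first = score_with_cache(["CCO", "c1ccccc1"], fake_score, path)
    second = score_with_cache(["CCO", "c1ccccc1", "CCN"], fake_score, path)

print(len(calls))  # 3: the second run only scored "CCN"
```

Keep max_workers modest so you stay within any rate limits on your account.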
Common Failure Modes and How to Avoid Them
Mode Collapse in Generation
If REINVENT4 produces many similar molecules, the scoring function may be too narrow. Increase the generation temperature, add a diversity component to the scoring function, or widen the acceptable property ranges.
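A quick check for mode collapse is the mean pairwise Tanimoto similarity of the generated set. Internal diversity is often reported as one minus this value. The sketch assumes fingerprints are already available as sets of on-bit indices, and the toy fingerprints are placeholders. What counts as "too similar" depends on the fingerprint type, so calibrate any cutoff on your own data.

```python
from itertools import combinations
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_pairwise_similarity(fps: List[Set[int]]) -> float:
    """Average Tanimoto similarity over all unordered pairs; higher means less diverse."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


collapsed = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4}]
diverse = [{1, 2, 3, 4}, {5, 6, 7, 8}, {1, 9, 10, 11}]

print(f"collapsed: {mean_pairwise_similarity(collapsed):.2f}")
print(f"diverse:   {mean_pairwise_similarity(diverse):.2f}")
```

A value that keeps climbing toward 1.0 from one generation run to the next is the signal to widen property ranges or add a diversity term.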
Over-Filtering
Applying too many strict filters can eliminate all candidates. Start with essential filters (Lipinski, hERG, Ames) and add additional ones only if you have too many candidates. Remember that every filter is a prediction with its own error rate – false negatives eliminate good compounds.
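The cost of stacking filters compounds. If each filter independently rejects a truly good compound with probability p, a good compound survives k filters with probability (1 − p)^k. The independence assumption and the 10% error rate below are illustrative, not measured values for any particular model.

```python
from typing import Iterable


def good_compound_survival(false_negative_rates: Iterable[float]) -> float:
    """Probability that a truly good compound passes every filter, assuming independent errors."""
    survival = 1.0
    for p in false_negative_rates:
        survival *= 1.0 - p
    return survival


# Up to five filters, each wrongly rejecting 10% of good compounds (illustrative numbers).
for k in range(1, 6):
    print(f"{k} filter(s): {good_compound_survival([0.10] * k):.0%} of good compounds survive")
```

With five such filters, only about 59% of genuinely good compounds make it through. Roughly 41% are lost before anyone looks at them.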
Ignoring Synthesis Cost
A molecule with SA score 2.0 that costs $200 to synthesize is worth more than a molecule with slightly better binding but SA score 5.0 that costs $5,000. Weight synthesis feasibility appropriately in your ranking function.
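To see how much the synthesis weight matters, the sketch below re-implements the Step 5 composite score as a function with configurable weights. It compares two hypothetical molecules that differ only in QED and SA score. Binding affinity is not part of the Step 5 score, so QED stands in here for the "better" property. The molecules and the alternative weight set are made up for illustration.

```python
from typing import Dict

DEFAULT_WEIGHTS = {"qed": 0.35, "sa": 0.35, "logp": 0.15, "mw": 0.15}


def rank_score(qed: float, sa_score: float, logp: float, mw: float,
               weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Composite score from Step 5; higher is better."""
    return (
        qed * weights["qed"]
        + (1 - sa_score / 10) * weights["sa"]          # SA runs from 1 (easy) to 10 (hard)
        + (1 - abs(logp - 2.5) / 5) * weights["logp"]  # penalize distance from LogP 2.5
        + (1 - abs(mw - 400) / 200) * weights["mw"]    # penalize distance from MW 400
    )


easy = dict(qed=0.60, sa_score=2.0, logp=2.5, mw=400)  # easy to make
hard = dict(qed=0.80, sa_score=5.0, logp=2.5, mw=400)  # more drug-like, harder to make
low_sa_weight = {"qed": 0.60, "sa": 0.10, "logp": 0.15, "mw": 0.15}

for label, w in [("default weights", DEFAULT_WEIGHTS), ("low SA weight", low_sa_weight)]:
    print(f"{label}: easy={rank_score(**easy, weights=w):.3f} hard={rank_score(**hard, weights=w):.3f}")
```

With the default weights, the easier-to-make molecule ranks first (0.790 vs 0.755). Shrinking the SA weight to 0.10 flips the order (0.740 vs 0.830).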
Next Steps
You now have a complete workflow for taking AI-generated molecules from SMILES strings to synthesis-ready candidates. It uses REINVENT4 for generation, molecular property calculations and ADMET prediction for validation, and the synthesis check for feasibility assessment.
For a deeper comparison of molecule generators, see our REINVENT4 vs MolMIM vs DrugEx comparison. To learn more about ADMET prediction, read our ADMET prediction guide.
For the fully automated version of this pipeline with a visual interface, try the Molecular Design Lab on SciRouter. Sign up at scirouter.ai/register for 500 free credits and start designing drug candidates today.