From SMILES to Synthesis: Complete Workflow for AI-Designed Drug Candidates

End-to-end drug design workflow: generate molecules, validate properties, screen ADMET, check synthesizability, and order synthesis. Full Python SDK tutorial.

Ryan Bethencourt
April 8, 2026
10 min read

The Drug Design Pipeline: An Overview

Designing a drug candidate is not a single step – it is a pipeline. You start with a target hypothesis (a protein you want to inhibit or activate), generate candidate molecules computationally, validate their drug-like properties, screen for toxicity and metabolic liabilities, verify that they can actually be synthesized, and finally order synthesis of the top candidates. Each step eliminates compounds, and only the survivors advance.

Traditionally, this pipeline involves a patchwork of tools: REINVENT or another generator for molecule design, RDKit for property calculations, custom ADMET models or commercial software like StarDrop for ADMET prediction, retrosynthesis tools like ASKCOS or AiZynthFinder for synthesis planning, and spreadsheets or databases to track results. Integrating these tools requires significant engineering effort and domain expertise.

This guide walks through the complete pipeline using SciRouter's unified API. Every step – from molecule generation to synthesis feasibility – runs through a single SDK with consistent data formats.

Prerequisites

You need Python 3.8+ and a SciRouter API key. Sign up at scirouter.ai/register for 500 free credits per month.

Install dependencies
pip install scirouter pandas
Set your API key
export SCIROUTER_API_KEY="sk-sci-your-api-key-here"
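
Before constructing the client, it can help to fail fast when the key is missing. The sketch below assumes, as the export line above suggests, that the SDK picks up the key from the SCIROUTER_API_KEY environment variable. The helper name require_api_key is illustrative and not part of the SDK.

```python
import os


def require_api_key(env=None, var="SCIROUTER_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running the pipeline")
    return key
```

Calling require_api_key() once at the top of a script turns a missing key into an immediate, readable error instead of a failure on the first API call.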

Step 1: Generate Candidate Molecules

The pipeline starts with molecule generation. We use REINVENT4 to produce novel molecules optimized against a set of drug-like property targets. For this example, we design kinase inhibitor candidates with a quinazoline scaffold bias:

Generate candidate molecules
from scirouter import SciRouter
import pandas as pd

client = SciRouter()

# Generate 100 molecules optimized for kinase inhibitor properties
gen_result = client.generate.molecules(
    num_molecules=100,
    scoring={
        "qed": {"weight": 0.25, "target": 0.7},
        "sa_score": {"weight": 0.25, "target": 3.0},
        "molecular_weight": {"weight": 0.2, "min": 300, "max": 550},
        "logp": {"weight": 0.15, "min": 1.0, "max": 4.5},
        "hbd": {"weight": 0.15, "max": 3},
    },
    similarity={
        "reference_smiles": "c1ccc(-c2ncc3ccccc3n2)cc1",  # 2-phenylquinazoline (quinazoline core)
        "min_tanimoto": 0.25,
    },
)

print(f"Generated {len(gen_result.molecules)} molecules")

# Quick overview
for mol in gen_result.molecules[:5]:
    print(f"  {mol.smiles} (QED={mol.qed:.2f}, SA={mol.sa_score:.1f})")
Note
The similarity parameter biases generation toward molecules that share substructural features with the reference SMILES. A min_tanimoto of 0.25 is permissive enough for scaffold hopping while maintaining some structural relevance to your target pharmacophore.
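
To make the min_tanimoto threshold concrete, here is a minimal sketch of the Tanimoto coefficient itself: the number of fingerprint bits two molecules share, divided by the number of bits set in either. The bit sets below are made-up toy values. In practice the fingerprints would come from a cheminformatics toolkit, and this guide does not state which fingerprint type the generation endpoint uses.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Toy fingerprints (illustrative bit indices, not real chemistry)
reference = {1, 4, 7, 9, 12, 15, 20, 22}
candidate = {1, 4, 7, 9, 30, 31, 32, 33}

score = tanimoto(reference, candidate)  # 4 shared bits out of 12 distinct bits
passes = score >= 0.25                  # the threshold used in this step
```

With 4 shared bits out of 12 distinct bits, the score is about 0.33, so this toy candidate would clear a 0.25 threshold even though half of its bits differ from the reference.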

Step 2: Calculate Molecular Properties

The first filter is drug-likeness. Use the molecular properties endpoint to compute a comprehensive property profile for each generated molecule. These properties indicate whether a molecule has the physicochemical characteristics typical of an oral drug:

Calculate properties for all candidates
# Calculate detailed properties for each molecule
molecules_data = []

for mol in gen_result.molecules:
    props = client.chemistry.properties(smiles=mol.smiles)
    molecules_data.append({
        "smiles": mol.smiles,
        "mw": props.molecular_weight,
        "logp": props.logp,
        "hba": props.hba,
        "hbd": props.hbd,
        "tpsa": props.tpsa,
        "rotatable_bonds": props.rotatable_bonds,
        "qed": props.qed,
        "rings": props.num_rings,
    })

df = pd.DataFrame(molecules_data)
print(f"Calculated properties for {len(df)} molecules")
print(f"\nProperty ranges:")
print(f"  MW:    {df['mw'].min():.0f} - {df['mw'].max():.0f}")
print(f"  LogP:  {df['logp'].min():.1f} - {df['logp'].max():.1f}")
print(f"  HBA:   {df['hba'].min()} - {df['hba'].max()}")
print(f"  HBD:   {df['hbd'].min()} - {df['hbd'].max()}")
print(f"  TPSA:  {df['tpsa'].min():.0f} - {df['tpsa'].max():.0f}")

Apply Lipinski and Veber Filters

Apply the Rule of Five (Lipinski) and Veber's rules to eliminate molecules with poor oral bioavailability potential:

Apply drug-likeness filters
# Lipinski's Rule of Five
lipinski = df[
    (df["mw"] <= 500) &
    (df["logp"] <= 5.0) &
    (df["hba"] <= 10) &
    (df["hbd"] <= 5)
].copy()
print(f"Pass Lipinski: {len(lipinski)}/{len(df)}")

# Veber's rules (oral bioavailability)
veber = lipinski[
    (lipinski["tpsa"] <= 140) &
    (lipinski["rotatable_bonds"] <= 10)
].copy()
print(f"Pass Veber:    {len(veber)}/{len(lipinski)}")

# Additional quality filters
quality = veber[
    (veber["qed"] >= 0.4) &
    (veber["rings"] >= 1) &
    (veber["rings"] <= 5)
].copy()
print(f"Pass quality:  {len(quality)}/{len(veber)}")
print(f"\n{len(quality)} molecules advance to ADMET screening")
Tip
Lipinski and Veber filters are necessary but not sufficient. Many molecules that pass these rules still fail in biological assays. The ADMET screen in the next step catches liabilities that physicochemical properties alone cannot predict.
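
The Lipinski filter above rejects a molecule that breaks even one criterion. The Rule of Five is often applied more leniently, tolerating a single violation. The sketch below counts violations so the tolerance becomes a parameter. The function names are illustrative, and the dictionary keys are assumed to match the DataFrame columns used in this step.

```python
def lipinski_violations(mol):
    """Count how many of the four Rule of Five criteria a molecule breaks."""
    checks = [
        mol["mw"] > 500,    # molecular weight
        mol["logp"] > 5.0,  # lipophilicity
        mol["hba"] > 10,    # hydrogen-bond acceptors
        mol["hbd"] > 5,     # hydrogen-bond donors
    ]
    return sum(checks)


def passes_ro5(mol, max_violations=1):
    """True if the molecule breaks at most `max_violations` criteria."""
    return lipinski_violations(mol) <= max_violations


example = {"mw": 520.0, "logp": 3.1, "hba": 6, "hbd": 2}
lenient = passes_ro5(example)                   # one violation tolerated
strict = passes_ro5(example, max_violations=0)  # matches the strict filter above
```

Applied to the DataFrame from this step, df[df.apply(passes_ro5, axis=1)] would keep molecules with at most one violation. Whether that leniency is appropriate depends on how many candidates you can afford to carry into ADMET screening.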

Step 3: ADMET Screening

ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) screening predicts biological behavior that cannot be inferred from simple molecular descriptors. This step eliminates compounds with metabolic instability, toxicity risks, or poor absorption before you invest in synthesis:

Screen ADMET properties
# Screen each candidate for ADMET liabilities
admet_results = []

for _, row in quality.iterrows():
    admet = client.chemistry.admet(smiles=row["smiles"])
    admet_results.append({
        "smiles": row["smiles"],
        "mw": row["mw"],
        "logp": row["logp"],
        "qed": row["qed"],
        "caco2_permeable": admet.caco2_permeable,
        "cyp_inhibitor": admet.cyp_inhibitor,
        "herg_safe": admet.herg_safe,
        "ames_safe": admet.ames_safe,
        "hepatotox_safe": admet.hepatotox_safe,
        "bbb_penetrant": admet.bbb_penetrant,
        "ppb": admet.plasma_protein_binding,
    })

admet_df = pd.DataFrame(admet_results)

# Apply ADMET filters
safe = admet_df[
    (admet_df["herg_safe"] == True) &         # no cardiac toxicity risk
    (admet_df["ames_safe"] == True) &          # no mutagenicity
    (admet_df["hepatotox_safe"] == True) &     # no liver toxicity
    (admet_df["caco2_permeable"] == True) &    # good intestinal absorption
    (admet_df["cyp_inhibitor"] == False)       # no major CYP inhibition
].copy()

print(f"ADMET screening results:")
print(f"  Input:          {len(admet_df)}")
print(f"  hERG safe:      {admet_df['herg_safe'].sum()}")
print(f"  Ames safe:      {admet_df['ames_safe'].sum()}")
print(f"  Hepatotox safe: {admet_df['hepatotox_safe'].sum()}")
print(f"  Caco-2 perm:    {admet_df['caco2_permeable'].sum()}")
print(f"  No CYP inhib:   {(~admet_df['cyp_inhibitor']).sum()}")
print(f"  Pass all:       {len(safe)}")
print(f"\n{len(safe)} molecules advance to synthesis check")
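
When few molecules pass, the per-column counts above do not show which single criterion is responsible. The sketch below counts, for each criterion, how many molecules fail it and how many fail only that one. It works on plain dictionaries, for example admet_df.to_dict(orient="records"). The function name and report layout are illustrative.

```python
from collections import Counter

# Column name -> value a molecule must have to pass (mirrors the filter above)
REQUIRED = {
    "herg_safe": True,
    "ames_safe": True,
    "hepatotox_safe": True,
    "caco2_permeable": True,
    "cyp_inhibitor": False,
}


def failure_report(records, required=REQUIRED):
    """Per criterion: molecules failing it, and molecules failing it alone."""
    failed_counts = Counter()
    sole_counts = Counter()
    for rec in records:
        failed = [k for k, want in required.items() if bool(rec[k]) != want]
        failed_counts.update(failed)
        if len(failed) == 1:
            sole_counts[failed[0]] += 1
    return {
        k: {"failed": failed_counts[k], "sole_failure": sole_counts[k]}
        for k in required
    }
```

A criterion with a high sole_failure count is the one whose threshold, or whose underlying prediction, deserves a second look before you discard those molecules.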

Step 4: Synthetic Accessibility Check

A beautiful molecule on screen is worthless if it cannot be made in a flask. Use the synthesis check endpoint to evaluate whether each surviving candidate can realistically be synthesized:

Check synthesis feasibility
# Check synthetic accessibility for ADMET-safe candidates
synth_results = []

for _, row in safe.iterrows():
    synth = client.generate.synthesis_check(smiles=row["smiles"])
    synth_results.append({
        "smiles": row["smiles"],
        "mw": row["mw"],
        "logp": row["logp"],
        "qed": row["qed"],
        "sa_score": synth.sa_score,
        "feasibility": synth.feasibility,
        "num_steps": synth.estimated_steps,
    })

synth_df = pd.DataFrame(synth_results)

# Filter by synthesis feasibility
synthesizable = synth_df[
    (synth_df["sa_score"] < 4.5) &
    (synth_df["feasibility"].isin(["easy", "moderate"]))
].copy()

print(f"Synthesis check results:")
print(f"  Input:            {len(synth_df)}")
print(f"  Easy:             {(synth_df['feasibility'] == 'easy').sum()}")
print(f"  Moderate:         {(synth_df['feasibility'] == 'moderate').sum()}")
print(f"  Difficult:        {(synth_df['feasibility'] == 'difficult').sum()}")
print(f"  Very difficult:   {(synth_df['feasibility'] == 'very difficult').sum()}")
print(f"  Pass (SA < 4.5):  {len(synthesizable)}")
print(f"\n{len(synthesizable)} molecules are final candidates")
Note
The SA score is a heuristic on a scale from 1 (easy to make) to 10 (very difficult), not a guarantee. Molecules with low SA scores (under 3.0) can typically be synthesized in 3 to 5 steps from commercial building blocks. Scores between 3.0 and 4.5 may require 5 to 8 steps. Always consult a medicinal chemist before committing to synthesis of your top candidates.
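
The rule of thumb in this note can be written as a small lookup, which is handy for sanity-checking the estimated step count returned by the synthesis check. This is only a sketch of the ranges quoted above. The function name is illustrative, and scores above 4.5 return None because the note gives no estimate for them.

```python
def estimated_step_range(sa_score):
    """Map an SA score to the rough (min_steps, max_steps) quoted in the note."""
    if sa_score < 3.0:
        return (3, 5)
    if sa_score <= 4.5:
        return (5, 8)
    return None  # no estimate is given above 4.5


easy = estimated_step_range(2.4)
moderate = estimated_step_range(3.8)
unknown = estimated_step_range(5.2)
```

If the num_steps value for a candidate falls well outside the range for its SA score, that disagreement is a good reason to ask a chemist to look at the molecule before ranking it.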

Step 5: Rank and Select Final Candidates

With all filters applied, rank the surviving candidates by a composite score that balances drug-likeness, ADMET profile, and synthetic feasibility:

Rank final candidates
# Composite ranking score
synthesizable = synthesizable.copy()
synthesizable["rank_score"] = (
    synthesizable["qed"] * 0.35 +                           # drug-likeness
    (1 - synthesizable["sa_score"] / 10) * 0.35 +           # synthesis ease (inverted)
    (1 - abs(synthesizable["logp"] - 2.5) / 5) * 0.15 +    # optimal LogP distance
    (1 - abs(synthesizable["mw"] - 400) / 200) * 0.15       # optimal MW distance
)

# Sort by rank score
final = synthesizable.sort_values("rank_score", ascending=False).head(10)

print("=== TOP 10 DRUG CANDIDATES ===\n")
for i, (_, row) in enumerate(final.iterrows()):
    print(f"Rank {i+1}:")
    print(f"  SMILES:       {row['smiles']}")
    print(f"  MW:           {row['mw']:.0f}")
    print(f"  LogP:         {row['logp']:.1f}")
    print(f"  QED:          {row['qed']:.2f}")
    print(f"  SA score:     {row['sa_score']:.1f}")
    print(f"  Feasibility:  {row['feasibility']}")
    print(f"  Est. steps:   {row['num_steps']}")
    print(f"  Rank score:   {row['rank_score']:.3f}")
    print()
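
To see how the four terms combine, the same formula can be evaluated by hand for a single molecule. The sketch below repeats the weights from the ranking code as a plain function. The two example molecules are invented for illustration.

```python
def rank_score(qed, sa_score, logp, mw):
    """Composite score with the same weights as the ranking code above."""
    return (
        qed * 0.35                          # drug-likeness
        + (1 - sa_score / 10) * 0.35        # synthesis ease (inverted)
        + (1 - abs(logp - 2.5) / 5) * 0.15  # distance from LogP 2.5
        + (1 - abs(mw - 400) / 200) * 0.15  # distance from MW 400
    )


# Both molecules share QED 0.70, SA 3.0 and LogP 2.5; only MW differs
on_target = rank_score(qed=0.70, sa_score=3.0, logp=2.5, mw=400)
heavier = rank_score(qed=0.70, sa_score=3.0, logp=2.5, mw=500)
```

The on-target molecule scores 0.245 + 0.245 + 0.15 + 0.15 = 0.79. Moving the molecular weight to 500 halves the MW term and drops the total to 0.715. The MW term reaches zero at 200 and 600 Da and turns negative beyond that, which is harmless here because the Lipinski filter caps MW at 500, but it is worth clamping if you reuse the formula on unfiltered sets.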

Step 6: Export for Synthesis

Export the final candidates in formats ready for submission to a contract research organization (CRO) or your in-house medicinal chemistry team:

Export results
import json

# Export as CSV for spreadsheet review
final.to_csv("drug_candidates.csv", index=False)
print("Saved drug_candidates.csv")

# Export as JSON with full details
export_data = {
    "campaign": "kinase_inhibitor_v1",
    "pipeline": {
        "generated": len(gen_result.molecules),
        "passed_lipinski": len(lipinski),
        "passed_veber": len(veber),
        "passed_quality": len(quality),
        "passed_admet": len(safe),
        "passed_synthesis": len(synthesizable),
        "final_candidates": len(final),
    },
    "candidates": final.to_dict(orient="records"),
}

with open("drug_candidates.json", "w") as f:
    json.dump(export_data, f, indent=2)
print("Saved drug_candidates.json")

# Export SMILES list for CRO submission
with open("synthesis_order.smi", "w") as f:
    for i, (_, row) in enumerate(final.iterrows()):
        f.write(f"{row['smiles']}\tCandidate_{i+1}\n")
print("Saved synthesis_order.smi")

print(f"\nPipeline summary:")
print(f"  {len(gen_result.molecules)} generated -> {len(final)} candidates ({len(final)/len(gen_result.molecules)*100:.1f}% survival rate)")

The Complete Pipeline Script

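Here is a condensed end-to-end script that chains all six steps into a single runnable pipeline. It uses slightly simplified scoring weights and filters compared with the step-by-step code above: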

Complete SMILES-to-synthesis pipeline
"""
SMILES-to-Synthesis Pipeline
Generates, validates, and ranks drug candidates using SciRouter.
"""

import json
import sys
from scirouter import SciRouter
from scirouter.exceptions import SciRouterError

client = SciRouter()

# --- Configuration ---
TARGET_SMILES = "c1ccc(-c2ncc3ccccc3n2)cc1"  # reference scaffold (2-phenylquinazoline)
NUM_MOLECULES = 100
SA_CUTOFF = 4.5
QED_CUTOFF = 0.4

print("=== SMILES-to-Synthesis Pipeline ===\n")

# Step 1: Generate
print("Step 1: Generating molecules...")
try:
    gen = client.generate.molecules(
        num_molecules=NUM_MOLECULES,
        scoring={
            "qed": {"weight": 0.3, "target": 0.7},
            "sa_score": {"weight": 0.3, "target": 3.0},
            "molecular_weight": {"weight": 0.2, "min": 300, "max": 550},
            "logp": {"weight": 0.2, "min": 1.0, "max": 4.5},
        },
        similarity={"reference_smiles": TARGET_SMILES, "min_tanimoto": 0.25},
    )
except SciRouterError as e:
    print(f"Generation failed: {e}")
    sys.exit(1)
print(f"  Generated: {len(gen.molecules)}")

# Step 2: Properties + Lipinski/Veber
print("Step 2: Property filtering...")
candidates = []
for mol in gen.molecules:
    p = client.chemistry.properties(smiles=mol.smiles)
    if (p.molecular_weight <= 500 and p.logp <= 5.0 and
        p.hba <= 10 and p.hbd <= 5 and
        p.tpsa <= 140 and p.rotatable_bonds <= 10 and
        p.qed >= QED_CUTOFF):
        candidates.append({"smiles": mol.smiles, "mw": p.molecular_weight,
                          "logp": p.logp, "qed": p.qed, "tpsa": p.tpsa})
print(f"  Pass drug-likeness: {len(candidates)}")

# Step 3: ADMET
print("Step 3: ADMET screening...")
admet_safe = []
for c in candidates:
    a = client.chemistry.admet(smiles=c["smiles"])
    # Same criteria as the Step 3 filter, including CYP inhibition
    if (a.herg_safe and a.ames_safe and a.hepatotox_safe
            and a.caco2_permeable and not a.cyp_inhibitor):
        admet_safe.append(c)
print(f"  Pass ADMET: {len(admet_safe)}")

# Step 4: Synthesis check
print("Step 4: Synthesis feasibility...")
synthesizable = []
for c in admet_safe:
    s = client.generate.synthesis_check(smiles=c["smiles"])
    if s.sa_score < SA_CUTOFF:
        c["sa_score"] = s.sa_score
        c["feasibility"] = s.feasibility
        synthesizable.append(c)
print(f"  Pass synthesis: {len(synthesizable)}")

# Step 5: Rank
print("Step 5: Ranking...\n")
for c in synthesizable:
    c["score"] = c["qed"] * 0.4 + (1 - c["sa_score"] / 10) * 0.4 + (1 - abs(c["logp"] - 2.5) / 5) * 0.2
synthesizable.sort(key=lambda x: x["score"], reverse=True)

# Step 6: Output
print("=== TOP CANDIDATES ===\n")
for i, c in enumerate(synthesizable[:10]):
    print(f"{i+1}. {c['smiles']}")
    print(f"   QED={c['qed']:.2f} SA={c['sa_score']:.1f} LogP={c['logp']:.1f} MW={c['mw']:.0f}")

with open("pipeline_results.json", "w") as f:
    json.dump({"candidates": synthesizable[:10]}, f, indent=2)

print(f"\nPipeline: {len(gen.molecules)} -> {len(synthesizable)} candidates")
print("Results saved to pipeline_results.json")

Ordering Synthesis: From SMILES to Flask

Once you have your ranked candidates, the next step is synthesis. There are several paths depending on your resources and timeline:

Make-on-Demand Services

Companies like Enamine, Mcule, and WuXi AppTec offer make-on-demand synthesis. You submit SMILES strings and receive purified compounds in 2 to 6 weeks. Typical costs range from $200 to $2,000 per compound depending on complexity. The SA score and estimated step count from the synthesis check give you a rough indication of where in that range a compound is likely to fall.

Catalog Screening

Before ordering custom synthesis, check whether close analogs of your candidates are available in commercial compound libraries. Enamine REAL, Mcule, and MolPort contain billions of immediately purchasable or quickly synthesizable molecules. A Tanimoto similarity search against these catalogs may find compounds close enough to your designs that custom synthesis is unnecessary.
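
A catalog similarity search boils down to scoring every catalog entry against your design and keeping the closest ones. The sketch below shows that logic on toy data. The catalog IDs, fingerprints, and the 0.7 threshold are all invented for illustration, and real vendor catalogs are searched through the vendors' own tools rather than an in-memory dictionary.

```python
def tanimoto(fp_a, fp_b):
    """Shared on-bits divided by total distinct on-bits."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


def nearest_catalog_hits(query_fp, catalog, min_similarity=0.7, top_k=3):
    """Return up to top_k (catalog_id, similarity) pairs above the threshold."""
    scored = [(tanimoto(query_fp, fp), cid) for cid, fp in catalog.items()]
    hits = [item for item in scored if item[0] >= min_similarity]
    hits.sort(key=lambda item: (-item[0], item[1]))  # best first, ties by ID
    return [(cid, sim) for sim, cid in hits[:top_k]]


# Hypothetical catalog: ID -> set of fingerprint on-bits
toy_catalog = {
    "CAT-001": {1, 2, 3, 4, 5},
    "CAT-002": {1, 2, 3, 4, 9},
    "CAT-003": {7, 8, 9},
}
design = {1, 2, 3, 4, 5}
matches = nearest_catalog_hits(design, toy_catalog, min_similarity=0.6)
```

Here CAT-001 is an exact match and CAT-002 shares 4 of 6 distinct bits (about 0.67), so both are returned, while CAT-003 is dropped. How close is close enough depends on which parts of the molecule carry the activity you designed for.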

In-House Medicinal Chemistry

If you have medicinal chemists on your team, provide them with the SMILES, retrosynthetic route suggestions, and the property profile. They can evaluate the route feasibility, suggest modifications that simplify synthesis while maintaining the desired properties, and execute the synthesis.

Pipeline Optimization Tips

  • Adjust filter stringency to your hit rate: If fewer than 5% of generated molecules survive all filters, loosen the generation constraints or ADMET thresholds. If more than 30% survive, tighten the filters to be more selective.
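  • Generate more than you need: Aim to generate at least 5 to 10 times more molecules than your final target count. Each filter stage typically eliminates 30 to 60% of candidates, and the losses multiply: five stages that each remove 30% leave only about 17% of the input (0.7^5 ≈ 0.17).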
  • Iterate on scoring weights: If your candidates cluster too tightly in property space, increase the diversity weight in the generation step. If they are too diverse, tighten the similarity constraint.
  • Use the Molecular Design Lab: SciRouter's Molecular Design Lab automates this entire pipeline in a single API call with built-in filtering and ranking.
  • Save intermediate results: Write CSV or JSON files at each stage so you can restart from any point without re-running earlier steps.
  • Batch API calls: When screening hundreds of molecules, batch your property and ADMET calls to maximize throughput. The SciRouter API handles concurrent requests efficiently.
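
The last two tips can be combined in one helper: keep a JSON checkpoint of results already fetched, and fetch only the missing molecules concurrently. This is a minimal sketch. The function name is illustrative, and fetch stands for any callable that takes a SMILES string and returns JSON-serializable data, such as a small wrapper around one of the SDK calls. This guide does not document whether the client is thread-safe or what rate limits apply, so treat max_workers as something to tune.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def screen_with_checkpoint(smiles_list, fetch, path, max_workers=8):
    """Run fetch(smiles) for every SMILES not already stored in `path`."""
    results = {}
    if os.path.exists(path):
        with open(path) as f:
            results = json.load(f)  # reuse results from an earlier run

    # De-duplicate while preserving order, then skip anything cached
    todo = [s for s in dict.fromkeys(smiles_list) if s not in results]

    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for smi, value in zip(todo, pool.map(fetch, todo)):
                results[smi] = value
    finally:
        # Write whatever was collected, even if a call raised part-way
        with open(path, "w") as f:
            json.dump(results, f, indent=2)
    return results
```

Re-running the script after an interruption then only pays for the molecules that were not yet stored in the checkpoint.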

Common Failure Modes and How to Avoid Them

Mode Collapse in Generation

If REINVENT4 produces many similar molecules, the scoring function may be too narrow. Increase the generation temperature, add a diversity component to the scoring function, or widen the acceptable property ranges.
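
A quick first check for mode collapse is to count how many of the generated SMILES strings are distinct. The sketch below does only that. It compares strings exactly, so two different SMILES spellings of the same molecule count as distinct unless you canonicalize them first, and it does not catch near-duplicates, which need a similarity measure. The function name is illustrative.

```python
from collections import Counter


def uniqueness_report(smiles_list, top_n=3):
    """Summarize exact-duplicate SMILES in a generated batch."""
    counts = Counter(smiles_list)
    total = len(smiles_list)
    return {
        "total": total,
        "unique": len(counts),
        "unique_fraction": len(counts) / total if total else 0.0,
        "most_common": counts.most_common(top_n),
    }


batch = ["CCO", "CCO", "CCN", "CCO"]
report = uniqueness_report(batch)
```

A unique fraction that drops from one generation run to the next is an early sign that the scoring function is pulling the generator toward a narrow region of chemical space.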

Over-Filtering

Applying too many strict filters can eliminate all candidates. Start with essential filters (Lipinski, hERG, Ames) and add others only if you still have too many candidates. Remember that every filter is a prediction with its own error rate – each wrongly flagged liability eliminates a good compound, and these losses compound across filters.
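
The cost of stacking filters can be estimated with simple arithmetic. If each filter correctly passes a truly good compound with some probability, and the filters err independently (a simplifying assumption), the chance of surviving all of them is the product of those probabilities. The 0.9 used below is an illustrative figure, not a measured accuracy for any model in this pipeline.

```python
def good_compound_survival(pass_rates):
    """Probability that a truly good compound passes every filter,
    assuming the filters make their errors independently."""
    probability = 1.0
    for rate in pass_rates:
        probability *= rate
    return probability


three_filters = good_compound_survival([0.9] * 3)  # 0.9 ** 3
six_filters = good_compound_survival([0.9] * 6)    # 0.9 ** 6
```

With three such filters about 73% of good compounds survive. With six, only about 53% do, so nearly half of the compounds worth making are discarded before synthesis.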

Ignoring Synthesis Cost

A molecule with SA score 2.0 that costs $200 to synthesize is worth more than a molecule with slightly better binding but SA score 5.0 that costs $5,000. Weight synthesis feasibility appropriately in your ranking function.

Next Steps

You now have a complete workflow for taking AI-generated molecules from SMILES strings to synthesis-ready candidates. Use REINVENT4 for generation, the molecular properties and ADMET endpoints for validation, and the synthesis check for feasibility assessment.

For a deeper comparison of molecule generators, see our REINVENT4 vs MolMIM vs DrugEx comparison. To learn more about ADMET prediction, read our ADMET prediction guide.

For the fully automated version of this pipeline with a visual interface, try the Molecular Design Lab on SciRouter. Sign up at scirouter.ai/register for 500 free credits and start designing drug candidates today.

Frequently Asked Questions

What is SMILES notation and why is it used in drug design?

SMILES (Simplified Molecular Input Line Entry System) is a text-based notation for representing chemical structures. For example, aspirin is CC(=O)Oc1ccccc1C(=O)O. SMILES is used in computational drug design because it is compact, human-readable, machine-parseable, and compatible with all major cheminformatics tools. AI molecule generators output SMILES strings, which can then be converted to 2D/3D structures for analysis.

How long does the full SMILES-to-synthesis workflow take?

Using SciRouter's API, the computational portion of the workflow takes 5 to 15 minutes for a typical campaign of 100 generated molecules. Molecule generation takes 1 to 3 minutes, property calculation is sub-second per molecule, ADMET screening takes a few seconds per molecule, and synthesis checking is sub-second. The longest step is usually REINVENT4 generation if the scoring function includes docking.

What ADMET properties should I screen for in early drug design?

At minimum, screen for: oral absorption (Caco-2 permeability, PAMPA), metabolic stability (CYP inhibition, microsomal stability), toxicity alerts (hERG inhibition, Ames mutagenicity, hepatotoxicity), and basic pharmacokinetics (plasma protein binding, half-life estimate). These filters eliminate compounds with obvious liabilities before investing in synthesis.

What SA score cutoff should I use for filtering?

SA scores run from 1 (easy to make) to 10 (very difficult). For most drug discovery programs, an SA score below 4.0 indicates easy to moderate synthesis difficulty. Scores between 4.0 and 5.5 are achievable by experienced medicinal chemists. Scores above 5.5 become progressively harder to justify, and those above 6.0 should be deprioritized unless the molecule has exceptional pharmacological properties that justify the synthetic investment. Most approved oral drugs have SA scores between 1.5 and 4.5. The pipeline in this guide uses a cutoff of 4.5.

Can I use this workflow for peptide or macrocyclic drug design?

The SMILES-based workflow is best suited to small molecule drug design (molecular weight under 600 Da). Peptides and macrocycles require specialized tools because their conformational flexibility and non-standard building blocks are not well-captured by standard SMILES generators. SciRouter offers separate tools for peptide and protein design through the ProteinMPNN and protein engineering endpoints.

How do I go from a computationally validated molecule to actual synthesis?

After computational validation, the next step is to engage a contract research organization (CRO) or in-house medicinal chemistry team. Provide them with the SMILES or structural drawing, the retrosynthetic route (if available from the synthesis check), and the target properties. Companies like Enamine, WuXi AppTec, and Mcule offer make-on-demand services where you submit SMILES and receive synthesized compounds in 2 to 6 weeks.
