
Matched Molecular Pair Analysis: Systematic SAR for Drug Design

Use matched molecular pair analysis for systematic SAR. Identify activity cliffs, predict property changes from single-atom modifications, and optimize leads.

Ryan Bethencourt
April 8, 2026

What Is Matched Molecular Pair Analysis?

Every medicinal chemist has faced the same question: "If I change this methyl group to a chlorine, what happens to potency? To solubility? To metabolic stability?" The problem is that in most drug design campaigns, multiple changes are made simultaneously, making it impossible to attribute any single property change to any single structural modification. The result is muddled SAR where intuition substitutes for evidence.

Matched molecular pair (MMP) analysis solves this problem by enforcing a simple constraint: compare two molecules that differ by exactly one structural transformation. If molecule A and molecule B share the same scaffold and differ only in one substituent – say, a hydrogen replaced by a fluorine at the 4-position of a phenyl ring – then any difference in their measured properties is attributable to that specific H-to-F transformation. This is the molecular equivalent of a controlled experiment.

The concept was formalized by Kenny and Sadowski in 2005, building on decades of informal SAR analysis in pharmaceutical companies. What made MMP analysis transformative was the recognition that large databases of measured compounds (ChEMBL, corporate screening collections) contain thousands of matched pairs hidden within them. By algorithmically extracting these pairs and tabulating the property changes associated with each transformation, you build a quantitative lookup table of medicinal chemistry knowledge.

Consider the scope. ChEMBL 34 contains over 2.4 million compounds with measured bioactivities. From this database, algorithms can extract tens of millions of matched molecular pairs spanning thousands of distinct transformations. For common transformations – methyl to ethyl, phenyl to pyridyl, amide to reverse amide – you might have 10,000 or more examples, providing robust statistics on the expected property changes. For rarer transformations, even 20 to 50 examples can provide useful directional guidance.

Identifying Molecular Pairs: The Algorithm

The core of MMP analysis is an algorithm that takes a set of molecules and identifies all pairs that differ by a single transformation. The most widely used approach is the Hussain-Rea algorithm (2010), which works by fragmenting each molecule at single non-ring bonds and cataloging the resulting fragments. Two molecules form a matched pair if they share the same "context" (the larger fragment) and differ only in the "variable" (the smaller fragment).

Step 1: Fragment Each Molecule

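For each molecule, cut every single acyclic bond to generate all possible two-fragment decompositions. Each decomposition produces a context fragment (typically the larger piece, representing the core scaffold) and a variable fragment (the smaller piece, representing the substituent). A molecule with N single acyclic bonds produces up to N distinct decompositions, since symmetry-equivalent cuts give identical fragments. The published algorithm also makes double and triple cuts to capture linker and core replacements; this walkthrough covers single cuts only.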
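The single-cut step can be sketched without a cheminformatics toolkit by treating a molecule as a small graph. The toy input format (a dict of atoms plus a list of bonds) and the helper names `single_cut_fragments` and `_component` are assumptions made for illustration. A real pipeline would rely on a toolkit to parse SMILES, perceive rings, and write canonical fragment SMILES.

```python
from collections import defaultdict

def _component(start, adjacency, removed_bond):
    """Return the set of atoms reachable from `start` when one bond is removed."""
    seen = {start}
    stack = [start]
    while stack:
        atom = stack.pop()
        for neighbor in adjacency[atom]:
            if {atom, neighbor} == removed_bond or neighbor in seen:
                continue
            seen.add(neighbor)
            stack.append(neighbor)
    return seen

def single_cut_fragments(atoms, bonds):
    """Yield (context_atoms, variable_atoms) for every acyclic single bond.

    atoms: dict of atom index -> element symbol (heavy atoms only)
    bonds: list of (i, j, order) tuples
    """
    adjacency = defaultdict(set)
    for i, j, _ in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)

    for i, j, order in bonds:
        if order != 1:
            continue
        side_i = _component(i, adjacency, {i, j})
        if j in side_i:
            continue  # still connected after the cut, so the bond is in a ring
        side_j = set(atoms) - side_i
        # Larger piece is the context, smaller piece is the variable.
        context, variable = sorted((side_i, side_j), key=len, reverse=True)
        yield frozenset(context), frozenset(variable)

# Toluene, heavy atoms only: ring atoms 0-5, methyl carbon 6 (Kekule bond orders).
toluene_atoms = {0: "C", 1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C"}
toluene_bonds = [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1),
                 (4, 5, 2), (5, 0, 1), (0, 6, 1)]

cuts = list(single_cut_fragments(toluene_atoms, toluene_bonds))
for context, variable in cuts:
    print("context:", sorted(context), "variable:", sorted(variable))
```

Only the ring-to-methyl bond is cut here, because removing any ring bond leaves the graph connected. Hydrogens are implicit in this toy format, so an H-to-F pair would need hydrogens added explicitly before fragmentation.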

Step 2: Index by Context

Build a hash table where the key is the SMILES representation of the context fragment and the value is a list of (variable fragment, original molecule) pairs. Two molecules that share the same context fragment are matched pair candidates – they have the same scaffold and differ only in the variable region.
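A minimal sketch of the context index follows, assuming the fragmentation step has already produced canonical fragment SMILES. The input records, the `[*:1]` attachment-point notation, and the function name `build_context_index` are illustrative assumptions, not part of any specific library.

```python
from collections import defaultdict

# Hypothetical, pre-fragmented input: (molecule SMILES, context, variable).
# "[*:1]" marks the attachment point left by the cut.
fragmented = [
    ("Cc1ccccc1",  "[*:1]c1ccccc1", "[*:1]C"),
    ("Clc1ccccc1", "[*:1]c1ccccc1", "[*:1]Cl"),
    ("Fc1ccccc1",  "[*:1]c1ccccc1", "[*:1]F"),
    ("Cc1ccncc1",  "[*:1]c1ccncc1", "[*:1]C"),
]

def build_context_index(records):
    """Map each context SMILES to the (variable, molecule) entries that share it."""
    index = defaultdict(list)
    for molecule, context, variable in records:
        index[context].append((variable, molecule))
    return index

context_index = build_context_index(fragmented)

# Only contexts with two or more entries can yield matched pairs.
candidates = {ctx: entries for ctx, entries in context_index.items()
              if len(entries) >= 2}
for ctx, entries in candidates.items():
    print(ctx, "->", [mol for _, mol in entries])
```

String keys only collide when the same fragment is always written the same way, which is why the context SMILES must be canonicalized before indexing.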

Step 3: Extract Transformations

For each context that has two or more associated molecules, extract all pairwise combinations. Each pair defines a transformation: variable_A to variable_B. If molecule X has context C with variable V1 and molecule Y has context C with variable V2, then the transformation is V1 to V2 applied to context C.

Size constraints are typically applied to ensure chemical relevance. The variable fragment is usually limited to 13 or fewer heavy atoms (to exclude trivial "transformations" that replace most of the molecule). The context must contain at least one ring (to ensure the core is a meaningful scaffold rather than a simple chain).
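Pair extraction with the variable-size cutoff can be sketched as follows. The index contents, the precomputed heavy-atom counts, and the function name `extract_pairs` are assumptions for illustration. The ring-in-context filter is left out because it needs ring perception from a toolkit.

```python
from itertools import combinations

MAX_VARIABLE_HEAVY_ATOMS = 13  # cutoff quoted in the text above

# Hypothetical index: context -> list of (variable, heavy-atom count, molecule).
context_index = {
    "[*:1]c1ccccc1": [
        ("[*:1]C", 1, "Cc1ccccc1"),
        ("[*:1]Cl", 1, "Clc1ccccc1"),
        ("[*:1]F", 1, "Fc1ccccc1"),
    ],
    "[*:1]c1ccncc1": [("[*:1]C", 1, "Cc1ccncc1")],
}

def extract_pairs(index, max_variable_atoms=MAX_VARIABLE_HEAVY_ATOMS):
    """Return one record per unordered pair of molecules sharing a context."""
    pairs = []
    for context, entries in index.items():
        small = [e for e in entries if e[1] <= max_variable_atoms]
        for (var_a, _, mol_a), (var_b, _, mol_b) in combinations(small, 2):
            if var_a == var_b:
                continue  # same substituent, nothing to compare
            pairs.append({
                "context": context,
                "transformation": f"{var_a}>>{var_b}",
                "from": mol_a,
                "to": mol_b,
            })
    return pairs

pairs = extract_pairs(context_index)
for pair in pairs:
    print(pair["transformation"], "on", pair["context"])
```

A context shared by k molecules yields up to k(k-1)/2 pairs, so the three toluene-like entries above give three pairs while the single pyridine entry gives none.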

Note
The fragmentation step of the Hussain-Rea algorithm scales linearly, O(N), with the number of molecules. Pair extraction scales with the number of pairs emitted, O(M): a context shared by k molecules yields k(k-1)/2 pairs, so large congeneric series dominate the cost. For a dataset of 100,000 molecules, the algorithm typically runs in under 60 seconds and produces 1 to 10 million matched pairs, depending on structural diversity.

Systematic SAR from Single-Atom Changes

The most powerful insights from MMP analysis come from the simplest transformations: single-atom substitutions and small functional group replacements. These are the building blocks of medicinal chemistry optimization, and MMP analysis provides quantitative expectations for each one.

Hydrogen to Fluorine

The H-to-F transformation is among the most studied in medicinal chemistry. Across thousands of matched pairs, replacing a hydrogen with a fluorine usually changes LogP only slightly: on an aromatic ring it typically raises LogP by about 0.1 to 0.3 units, while on an aliphatic carbon it can lower LogP because the polar C-F bond offsets the added hydrophobic surface. It often improves metabolic stability at CYP450-vulnerable positions by blocking oxidative metabolism, and it has variable effects on potency depending on the binding site environment. Fluorine at an oxidation-prone sp3 carbon tends to be metabolically stabilizing; fluorine on an aromatic ring can also modulate the pKa of adjacent functional groups.

Methyl to Ethyl

Adding a single carbon (CH3 to CH2CH3) increases LogP by approximately 0.5 units, increases molecular weight by 14 Da, and typically increases metabolic liability (more sites for CYP450 oxidation). In binding sites, the extra methylene can fill a small hydrophobic pocket and improve potency by 2 to 5-fold, or it can create a steric clash that reduces potency by 10-fold. MMP analysis across thousands of kinase inhibitors shows that the effect is strongly position-dependent.

Phenyl to Pyridyl

Replacing a benzene ring with a pyridine ring (CH to N substitution) reduces LogP by 0.6 to 1.0 units, introduces a hydrogen bond acceptor, and often improves aqueous solubility by 3 to 10-fold. The position of the nitrogen matters: 2-pyridyl, 3-pyridyl, and 4-pyridyl substitutions have different effects on basicity (pKa) and protein interactions. This is one of the most commonly applied bioisosteric replacements in lead optimization campaigns.

Amide to Reverse Amide

Swapping the orientation of an amide bond (CONHR to NHCOR) changes the hydrogen bond donor/acceptor pattern, often affecting membrane permeability and metabolic stability. MMP analysis shows that the reverse amide typically has lower aqueous solubility (reduced dipole moment) but improved metabolic stability (less susceptible to amidase cleavage). The effect on potency depends on whether the amide participates in direct hydrogen bonds with the target protein.
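Expected effects like the ones quoted in this section come from aggregating the property delta over every pair that contains a transformation. Here is a minimal sketch using only the standard library. The LogP values are placeholders invented for illustration, not measured data, and the function name `summarize` is hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Illustrative placeholder values only, NOT measured data.
# Each record: (transformation, LogP before, LogP after).
pair_logp = [
    ("CH3>>CH2CH3", 2.10, 2.62),
    ("CH3>>CH2CH3", 3.40, 3.85),
    ("CH3>>CH2CH3", 1.75, 2.30),
    ("CH3>>CH2CH3", 2.90, 3.38),
    ("Ph>>pyridyl", 3.20, 2.45),
    ("Ph>>pyridyl", 2.80, 1.95),
]

def summarize(pairs):
    """Return count, mean delta, and standard error per transformation."""
    deltas = defaultdict(list)
    for transformation, before, after in pairs:
        deltas[transformation].append(after - before)

    summary = {}
    for transformation, values in deltas.items():
        n = len(values)
        # A single pair has no spread, so its standard error is undefined.
        spread = stdev(values) if n > 1 else float("nan")
        summary[transformation] = {
            "n": n,
            "mean_delta": mean(values),
            "std_error": spread / sqrt(n),
        }
    return summary

stats = summarize(pair_logp)
for name, s in stats.items():
    print(f"{name:<12} n={s['n']}  mean dLogP={s['mean_delta']:+.2f}  "
          f"SE={s['std_error']:.2f}")
```

Reporting the pair count and standard error next to the mean makes it easier to tell a well-supported transformation from one backed by a handful of examples.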

Using SciRouter to Compare Analogs

SciRouter's Lead Optimization Lab provides the computational infrastructure to run MMP analysis programmatically. The workflow involves generating or collecting a set of analogs that differ by defined transformations, profiling each analog across multiple property dimensions, and building a transformation effect table.

Step 1: Define Your Lead and Transformations

Start with your lead compound and define a set of single-point transformations to explore. Each transformation produces one matched pair with the lead as the reference.

Define lead compound and MMP transformations
import os, requests

API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

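# Lead compound: a JAK2 inhibitor scaffold (quinoxaline-2-carbonyl amide of 4-methylpiperidine)
LEAD_SMILES = "CC1CCN(C(=O)c2cnc3ccccc3n2)CC1"

# Matched pairs: single transformations from the lead.
# Keys name the change; ring positions use quinoxaline numbering (carbonyl at C2).
transformations = {
    "lead":        LEAD_SMILES,
    "6-F":         "CC1CCN(C(=O)c2cnc3cc(F)ccc3n2)CC1",          # H to F on the benzo ring
    "7-Cl":        "CC1CCN(C(=O)c2cnc3ccc(Cl)cc3n2)CC1",         # H to Cl on the benzo ring
    "pip-2-Me":    "CC1CCN(C(=O)c2cnc3ccccc3n2)C(C)C1",          # H to CH3 on the piperidine carbon next to N
    "7-aza":       "CC1CCN(C(=O)c2cnc3ccncc3n2)CC1",             # CH to N in the benzo ring
    "6-CF3":       "CC1CCN(C(=O)c2cnc3cc(C(F)(F)F)ccc3n2)CC1",   # H to CF3 on the benzo ring
    "6-OH":        "CC1CCN(C(=O)c2cnc3cc(O)ccc3n2)CC1",          # H to OH on the benzo ring
    "cyclopropyl": "C1CC1C2CCN(C(=O)c3cnc4ccccc4n3)CC2",         # CH3 to cyclopropyl on the piperidine
}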

print(f"Lead: {LEAD_SMILES}")
print(f"Transformations to evaluate: {len(transformations) - 1}")

Step 2: Profile Every Analog

For each analog in the matched pair set, compute molecular properties and ADMET predictions. This creates a property matrix where each row is an analog and each column is a computed or predicted property.

Profile all analogs in the MMP set
# Profile all analogs
def post_json(path, payload):
    """POST to a SciRouter endpoint and fail loudly on HTTP errors."""
    resp = requests.post(f"{BASE}{path}", headers=HEADERS, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()

profiles = {}

for name, smiles in transformations.items():
    props = post_json("/chemistry/properties", {"smiles": smiles})
    admet = post_json("/chemistry/admet", {"smiles": smiles})
    synth = post_json("/chemistry/synthesis-check", {"smiles": smiles})

    profiles[name] = {
        "smiles": smiles,
        "mw": props["molecular_weight"],
        "logp": props["logp"],
        "tpsa": props["tpsa"],
        "hbd": props["h_bond_donors"],
        "hba": props["h_bond_acceptors"],
        "rotatable_bonds": props["rotatable_bonds"],
        "herg": admet["herg_inhibition"],
        "hepatotox": admet["hepatotoxicity"],
        "oral_f": admet["oral_bioavailability"],
        "solubility": admet["solubility_class"],
        "sa_score": synth["sa_score"],
    }

# Print property table
header = f"{'Name':<14} {'MW':>6} {'LogP':>5} {'TPSA':>5} {'hERG':>6} {'OralF':>6} {'SA':>4}"
print(header)
print("-" * len(header))
for name, p in profiles.items():
    print(f"{name:<14} {p['mw']:>6.1f} {p['logp']:>5.2f} "
          f"{p['tpsa']:>5.1f} {p['herg']:>6} {p['oral_f']:>6} "
          f"{p['sa_score']:>4.1f}")

Step 3: Build the Transformation Effect Table

The transformation effect table is the core deliverable of MMP analysis. For each transformation, compute the delta (change) in every property relative to the lead compound. Positive deltas indicate increases; negative deltas indicate decreases. Color-code or flag transformations that move properties in favorable directions.

Compute transformation effects relative to lead
# Compute deltas vs lead
lead = profiles["lead"]

print("\n=== Transformation Effect Table ===")
print(f"{'Transformation':<14} {'dMW':>5} {'dLogP':>6} {'dTPSA':>6} "
      f"{'hERG':>6} {'OralF':>6} {'dSA':>5}")
print("-" * 56)

for name, p in profiles.items():
    if name == "lead":
        continue
    d_mw = p["mw"] - lead["mw"]
    d_logp = p["logp"] - lead["logp"]
    d_tpsa = p["tpsa"] - lead["tpsa"]
    d_sa = p["sa_score"] - lead["sa_score"]

    print(f"{name:<14} {d_mw:>+5.0f} {d_logp:>+6.2f} {d_tpsa:>+6.1f} "
          f"{p['herg']:>6} {p['oral_f']:>6} {d_sa:>+5.1f}")

# Identify best transformation per property
print("\n=== Best Transformations ===")
analogs = {k: v for k, v in profiles.items() if k != "lead"}

best_logp = min(analogs.items(), key=lambda x: x[1]["logp"])
best_sa = min(analogs.items(), key=lambda x: x[1]["sa_score"])
print(f"Lowest LogP:  {best_logp[0]} ({best_logp[1]['logp']:.2f})")
print(f"Lowest SA:    {best_sa[0]} ({best_sa[1]['sa_score']:.1f})")

Scaling MMP Analysis with Generative Models

Manual enumeration of matched pairs works well for 5 to 10 transformations, but the real power of MMP analysis emerges at scale. Instead of hand-picking transformations, you can use SciRouter's molecule generation endpoint to create hundreds of analogs with controlled similarity to the lead, then algorithmically extract all matched pairs from the resulting set.

Generate analogs for large-scale MMP extraction
import time

# Generate 100 close analogs (high similarity = single-point changes)
job = requests.post(f"{BASE}/chemistry/generate", headers=HEADERS, json={
    "model": "reinvent4",
    "num_molecules": 100,
    "objectives": {
        "similarity": {
            "weight": 1.0,
            "reference_smiles": LEAD_SMILES,
            "min_similarity": 0.7,   # High similarity = small changes
            "max_similarity": 0.95,
        },
        "drug_likeness": {"weight": 0.5, "method": "qed"},
        "synthetic_accessibility": {"weight": 0.3, "max_sa_score": 5.0},
    },
}).json()

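# Poll for completion, giving up after about 10 minutes
job_url = f"{BASE}/chemistry/generate/{job['job_id']}"
for _ in range(60):
    result = requests.get(job_url, headers=HEADERS, timeout=60).json()
    if result["status"] in ("completed", "failed"):
        break
    time.sleep(10)
else:
    raise TimeoutError("Generation job did not finish in time")

# A failed job has no molecules to read
if result["status"] == "failed":
    raise RuntimeError(f"Generation job failed: {result}")

# Separate name so the earlier `analogs` dict of profiles is not overwritten
generated = result["molecules"]
print(f"Generated {len(generated)} close analogs for MMP extraction")

# Verify similarity range
for mol in generated[:5]:
    sim = requests.post(f"{BASE}/chemistry/similarity", headers=HEADERS,
        json={"smiles_a": LEAD_SMILES, "smiles_b": mol["smiles"]}).json()
    print(f"  Tanimoto: {sim['tanimoto']:.3f} - {mol['smiles'][:60]}")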

The key setting is the similarity window of 0.7 to 0.95. It biases the generator toward analogs that differ from the lead by only one or two small changes, which is the type of modification that produces informative matched pairs. Fingerprint similarity does not guarantee a single-point change, so the pair extraction step still has to confirm that each pair shares a context. Lower similarity thresholds (0.3 to 0.6) produce analogs with multiple simultaneous changes, which are useful for scaffold hopping but not for isolating individual transformation effects.
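For reference, the Tanimoto coefficient behind such a window is the number of shared fingerprint bits divided by the number of bits set in either molecule. The sketch below applies that standard definition to plain Python sets. The bit sets are hypothetical stand-ins for real fingerprints, and the article does not state which fingerprint the similarity endpoint uses, so treat the numbers as illustrative only.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not bits_a and not bits_b:
        return 1.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)

def in_window(similarity, low=0.7, high=0.95):
    """True when a similarity falls inside the 0.7 to 0.95 window."""
    return low <= similarity <= high

# Hypothetical bit sets standing in for real fingerprints.
lead_bits = set(range(20))
close_analog_bits = set(range(18)) | {40, 41}              # 18 shared, 22 in the union
distant_analog_bits = set(range(8)) | set(range(50, 62))   # 8 shared, 32 in the union

for label, bits in [("close", close_analog_bits), ("distant", distant_analog_bits)]:
    sim = tanimoto(lead_bits, bits)
    print(f"{label}: Tanimoto={sim:.3f} in_window={in_window(sim)}")
```

The close analog scores 18/22 (about 0.82) and passes the window, while the distant analog scores 8/32 (0.25) and would be filtered out.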

Once you have 100 close analogs with full property profiles, you can extract matched pairs algorithmically by grouping analogs that share the same maximum common substructure (MCS) and differ in exactly one substituent. The transformation effect table then becomes more reliable, because it can contain multiple examples of each common transformation across slightly different contexts within the same chemical series. Because these profiles are model predictions, the table summarizes predicted effects rather than measured ones.

Applications in Lead Optimization

MMP analysis transforms lead optimization from an intuition-driven art into a data-driven engineering discipline. Here are the primary applications where MMP analysis delivers the most value.

Metabolic Soft Spot Remediation

When a lead compound is metabolized too quickly by CYP450 enzymes, the medicinal chemist needs to identify which position on the molecule is vulnerable and what replacement will block metabolism without degrading other properties. MMP analysis provides a systematic answer. If the 4-position of a phenyl ring is the metabolic soft spot, the transformation table might show that H-to-F at that position blocks metabolism (delta CYP stability: +40%) while maintaining potency (delta IC50: less than 2-fold) and changing lipophilicity only slightly (delta LogP: about +0.2). The H-to-Cl transformation at the same position also blocks metabolism but increases LogP by 0.7, worsening the overall profile.

Solubility Rescue

Poor aqueous solubility is a leading cause of compound attrition. MMP analysis reveals which transformations most effectively improve solubility in the context of your specific scaffold. Common solubility-improving transformations include phenyl to pyridyl (adds a polar nitrogen that lowers lipophilicity and can disrupt crystal packing), methyl to hydroxymethyl (adds a polar group), and gem-dimethyl to cyclopropyl (slightly lowers lipophilicity and removes two oxidizable methyl groups, which can improve both solubility and metabolic stability).

hERG Liability Reduction

Inhibition of the hERG potassium channel is a critical safety liability that has killed multiple drug programs. hERG inhibition correlates with lipophilicity and the presence of basic nitrogen atoms. MMP analysis across hERG screening data shows that reducing LogP by 1 unit through polar substitutions typically reduces hERG risk from "medium" to "low." Specific transformations like tert-butyl to isopropyl (delta LogP: -0.5, delta hERG IC50: +3-fold) provide targeted fixes when the overall lipophilicity cannot be reduced further.

Selectivity Window Expansion

When a kinase inhibitor shows off-target activity against related kinases, MMP analysis helps identify transformations that differentially affect the target versus the off-target. If the target has a deeper hydrophobic pocket at the gatekeeper position than the off-target, then adding a larger substituent at the appropriate position (methyl to ethyl, or ethyl to isopropyl) can selectively improve target potency while reducing off-target activity. MMP data from selectivity screening panels provides the evidence for which transformations achieve this differential effect.

Building a Reusable MMP Knowledge Base

The long-term value of MMP analysis extends beyond any single lead optimization campaign. Each campaign generates transformation effect data that can be stored in a searchable knowledge base. Over time, this knowledge base becomes a quantitative encyclopedia of medicinal chemistry transformations, applicable to future projects on different targets and different scaffolds.

Store MMP results for future reference
import json

# Structure MMP results for storage
mmp_database = []

for name, p in profiles.items():
    if name == "lead":
        continue
    mmp_database.append({
        "lead_smiles": LEAD_SMILES,
        "analog_smiles": p["smiles"],
        "transformation": name,
        "scaffold_class": "quinoxaline_amide",  # the lead SMILES encodes a quinoxaline (N at 1 and 4)
        "target": "JAK2",
        "deltas": {
            "molecular_weight": round(p["mw"] - lead["mw"], 2),
            "logp": round(p["logp"] - lead["logp"], 2),
            "tpsa": round(p["tpsa"] - lead["tpsa"], 1),
            "sa_score": round(p["sa_score"] - lead["sa_score"], 1),
        },
        "properties": {
            "herg": p["herg"],
            "oral_bioavailability": p["oral_f"],
            "solubility": p["solubility"],
            "hepatotoxicity": p["hepatotox"],
        },
    })

# Save to JSON for reuse
with open("mmp_results_jak2.json", "w") as f:
    json.dump(mmp_database, f, indent=2)

print(f"Stored {len(mmp_database)} transformation records")
print(f"Transformations cataloged: "
      f"{[r['transformation'] for r in mmp_database]}")

Over multiple campaigns, this database accumulates hundreds of transformation records across different scaffolds and targets. When starting a new lead optimization campaign, you can query the database: "What effect does phenyl-to-pyridyl have on hERG liability across all scaffolds?" If 8 out of 10 previous examples show a reduction in hERG risk, you have strong evidence to prioritize that transformation in your new campaign.
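Such a query can be sketched over an in-memory list of records. The records below are placeholders for illustration, not real results. They also assume a `lead_herg` field holding the lead's hERG class and the class labels "low", "medium", and "high". The stored schema above does not record the lead's class, so a field like this would have to be added before a "reduction" can be counted.

```python
# Hypothetical records loosely following the stored schema, plus an assumed
# "lead_herg" field. Values are placeholders, NOT real results.
records = [
    {"transformation": "phenyl>>pyridyl", "scaffold_class": "scaffold_a",
     "lead_herg": "medium", "properties": {"herg": "low"}},
    {"transformation": "phenyl>>pyridyl", "scaffold_class": "scaffold_b",
     "lead_herg": "high", "properties": {"herg": "medium"}},
    {"transformation": "phenyl>>pyridyl", "scaffold_class": "scaffold_c",
     "lead_herg": "low", "properties": {"herg": "low"}},
    {"transformation": "CH3>>CF3", "scaffold_class": "scaffold_a",
     "lead_herg": "low", "properties": {"herg": "medium"}},
]

RISK_ORDER = {"low": 0, "medium": 1, "high": 2}

def herg_outcomes(records, transformation):
    """Count how often a transformation lowered, kept, or raised the hERG class."""
    counts = {"reduced": 0, "unchanged": 0, "increased": 0}
    for record in records:
        if record["transformation"] != transformation:
            continue
        before = RISK_ORDER[record["lead_herg"]]
        after = RISK_ORDER[record["properties"]["herg"]]
        if after < before:
            counts["reduced"] += 1
        elif after > before:
            counts["increased"] += 1
        else:
            counts["unchanged"] += 1
    return counts

outcome = herg_outcomes(records, "phenyl>>pyridyl")
total = sum(outcome.values())
print(f"phenyl>>pyridyl: hERG reduced in {outcome['reduced']} of {total} records")
```

For the query to work across campaigns, transformation names need a consistent vocabulary (for example a canonical `from>>to` fragment notation) rather than free-text labels chosen per project.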

Tip
Combine MMP analysis with SciRouter's ADMET prediction endpoint to profile every analog across 12 or more ADMET properties simultaneously. This creates a rich multi-dimensional transformation effect table that captures safety-relevant properties alongside the standard physicochemical descriptors.

Limitations and Best Practices

MMP analysis is powerful but not without caveats. Understanding its limitations ensures you apply the method appropriately and interpret results correctly.

  • Context dependence: The effect of a transformation can vary depending on the scaffold context. Phenyl-to-pyridyl might improve solubility on one scaffold but reduce potency on another due to differences in binding mode. Always check whether transformation effects are consistent across multiple contexts before generalizing.
  • Additivity assumption: Combining two individually beneficial transformations does not guarantee additive improvement. The H-to-F and methyl-to-OH transformations might each improve solubility independently but interact unfavorably when applied simultaneously. Test combinations explicitly.
  • Property cliff risk: Some transformations produce activity cliffs – small structural changes with disproportionately large property effects. A single-atom change that disrupts a critical hydrogen bond can reduce potency by 100-fold. MMP analysis reveals these cliffs, but they cannot always be predicted in advance.
  • 3D effects are invisible: MMP analysis operates on 2D molecular graphs. It cannot capture 3D effects like conformational changes, intramolecular hydrogen bonds, or steric clashes that arise from specific substitution patterns. Complement MMP analysis with docking studies for 3D-dependent SAR questions.
  • Data quality: The transformation effects are only as reliable as the underlying property measurements. Noisy assay data can produce misleading transformation effects. Use ADMET predictions from validated models like SciRouter's ADMET-AI to supplement or cross-validate experimental measurements.
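The additivity caveat can be checked directly when all four compounds of a double-transformation cycle are available: the parent, each single change, and the combined change. The sketch below uses placeholder LogP values rather than measured data, and the function name `additivity_gap` is hypothetical.

```python
def additivity_gap(parent, single_a, single_b, double):
    """Observed double-change delta minus the sum of the two single-change deltas."""
    expected = (single_a - parent) + (single_b - parent)
    observed = double - parent
    return observed - expected

# Placeholder LogP values for a four-compound cycle, NOT measured data:
# parent, parent + change A, parent + change B, parent + both changes.
gap = additivity_gap(parent=3.0, single_a=2.5, single_b=2.7, double=2.6)
print(f"Non-additivity in LogP: {gap:+.2f}")
```

A gap near zero supports treating the two transformations as independent. A gap larger than the assay or model error suggests the changes interact and the combination should be tested explicitly.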

From MMP Analysis to Design Decisions

The end goal of MMP analysis is not a transformation table – it is a design decision. Given a lead compound with identified liabilities (too lipophilic, hERG risk, poor metabolic stability), MMP analysis provides an evidence-based ranking of which specific structural modifications are most likely to fix each liability with minimal impact on other properties.

The decision framework is straightforward. For each liability, identify the transformations that address it from the MMP table. Rank them by the magnitude of their effect. Cross-reference against other properties to ensure the transformation does not introduce new liabilities. Prioritize transformations that improve the target liability while being neutral or beneficial on other dimensions. Synthesize the top 3 to 5 candidates and test experimentally.
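One way to express this framework in code is a filter followed by a sort. The sketch below handles a lipophilicity liability. The table rows are placeholders for illustration, and the thresholds, the hERG class labels, and the function name `shortlist` are assumptions rather than part of any SciRouter API.

```python
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}

# Placeholder rows of a transformation effect table, for illustration only.
effect_table = [
    {"name": "A", "d_logp": -0.8, "herg": "low",    "d_sa": 0.2},
    {"name": "B", "d_logp": -0.3, "herg": "medium", "d_sa": 0.1},
    {"name": "C", "d_logp": -1.1, "herg": "high",   "d_sa": 0.4},
    {"name": "D", "d_logp": 0.4,  "herg": "low",    "d_sa": -0.1},
    {"name": "E", "d_logp": -0.9, "herg": "low",    "d_sa": 1.5},
]

def shortlist(table, lead_herg="medium", max_sa_increase=1.0, top_n=3):
    """Rank transformations that lower LogP without adding a new liability."""
    keep = []
    for row in table:
        fixes_liability = row["d_logp"] < 0
        no_new_herg_risk = RISK_ORDER[row["herg"]] <= RISK_ORDER[lead_herg]
        still_makeable = row["d_sa"] <= max_sa_increase
        if fixes_liability and no_new_herg_risk and still_makeable:
            keep.append(row)
    # Largest LogP reduction first.
    keep.sort(key=lambda row: row["d_logp"])
    return keep[:top_n]

ranked = shortlist(effect_table)
for row in ranked:
    print(f"{row['name']}: dLogP={row['d_logp']:+.1f}, hERG={row['herg']}")
```

Here C is dropped despite the largest LogP reduction because it raises hERG risk above the lead's class, D does not address the liability, and E exceeds the synthetic accessibility budget, leaving A and B as the candidates to synthesize.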

With SciRouter's molecular properties, ADMET-AI, and molecule generator endpoints, you can run a complete MMP analysis workflow in under 10 minutes: generate analogs with controlled similarity, profile them across all relevant properties, build the transformation effect table, and identify the optimal modifications for your specific lead optimization challenge. What traditionally required weeks of synthesis and testing becomes a computational exercise that informs and accelerates the experimental campaign.

Frequently Asked Questions

What is a matched molecular pair?

A matched molecular pair (MMP) is two molecules that differ by a single, well-defined structural transformation at one site while sharing the same core scaffold. For example, replacing a methyl group with a chlorine atom on an otherwise identical molecule creates a matched pair. The key requirement is that only one change is made, so any difference in measured properties can be attributed directly to that specific transformation. This makes MMPs the gold standard for understanding structure-activity relationships in medicinal chemistry.

How is MMP analysis different from traditional SAR?

Traditional SAR analysis examines how multiple simultaneous structural changes affect biological activity, making it difficult to isolate the effect of any single modification. MMP analysis enforces the constraint that exactly one structural change is made per comparison. This isolates the contribution of each transformation to every measured property. The result is a quantitative table of transformation effects that can be applied predictively to new scaffolds, whereas traditional SAR often produces qualitative rules that are scaffold-dependent.

What is a molecular transformation in the context of MMP analysis?

A molecular transformation is the specific chemical change between the two molecules in a matched pair. It is expressed as a SMIRKS pattern or as a simple replacement notation such as CH3 to Cl, or phenyl to pyridyl. Transformations can range from single-atom substitutions (H to F) to functional group replacements (OH to NH2) to ring modifications (benzene to pyridine). Each transformation has a measurable effect on molecular properties, and the goal of MMP analysis is to catalog these effects systematically across many pairs.

How many matched pairs do I need for statistically reliable conclusions?

For a given transformation to yield statistically reliable property predictions, you generally need at least 10 to 20 matched pairs containing that transformation across different scaffolds. With fewer pairs, the observed effect may be scaffold-specific rather than generalizable. Large public databases like ChEMBL contain millions of matched pairs, enabling robust statistics for common transformations. For novel or unusual transformations, computational generation of analogs via the SciRouter API can supplement sparse experimental data.

Can MMP analysis predict the effect of a transformation on a new molecule?

Yes, and this is one of the most powerful applications. If you have cataloged that the methyl-to-fluorine transformation reduces LogP by an average of 0.5 units and improves metabolic stability by 30% across 50 matched pairs on different scaffolds, you can predict with reasonable confidence that making the same transformation on your current lead will have a similar effect. The prediction is most reliable when the transformation is well-represented in the database and the new molecule shares structural features with the training pairs.

How does SciRouter support MMP analysis computationally?

SciRouter supports MMP analysis through its molecular properties, ADMET prediction, and molecule generation endpoints. You can generate a series of analogs that differ by single transformations using the molecule generator, then profile each analog with the properties and ADMET endpoints to build a comprehensive MMP table. The similarity endpoint helps verify that pairs share the same core scaffold. The entire workflow can be automated via the Python SDK to analyze hundreds of transformations in minutes.
