
CDR Design with AntiFold: Engineer Antibody Binding Loops

Design antibody CDR loops with AntiFold. Structure-aware sequence optimization for binding affinity: a hands-on tutorial with the SciRouter API.

Ryan Bethencourt
April 8, 2026
9 min read

What Are Complementarity Determining Regions?

Every antibody has a Y-shaped structure with two functional regions: the constant region, which mediates immune effector functions, and the variable region, which binds a specific antigen. Within the variable region, six short loops (three on the heavy chain and three on the light chain) form the actual antigen-binding site. These loops are called Complementarity Determining Regions, or CDRs.

CDR-H1, CDR-H2, and CDR-H3 sit on the heavy chain variable domain (VH). CDR-L1, CDR-L2, and CDR-L3 sit on the light chain variable domain (VL). Of these, CDR-H3 is typically the longest, the most diverse, and the most critical for determining what the antibody binds to. It is also the hardest to model computationally because its conformations are less constrained by canonical structural rules than those of the other five CDRs.
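To make the CDR3 loops concrete, here is a rough sketch that locates CDR-H3 and CDR-L3 in the trastuzumab variable domains used later in this tutorial, using conserved framework motifs: the cysteine before each CDR3 and the W-G-x-G (heavy) or F-G-x-G (light) motif after it. This only approximates Kabat-style boundaries; other numbering schemes draw the boundaries differently, and a real pipeline would use a numbering tool such as ANARCI. The names `rough_cdr3`, `H3_MOTIF`, and `L3_MOTIF` are hypothetical.

```python
import re
from typing import Optional

# Heuristic motifs approximating Kabat CDR3 boundaries (not a numbering tool):
#   CDR-H3: after Cys92 plus two framework residues, up to the W-G-x-G motif
#   CDR-L3: after Cys88, up to the F-G-x-G motif
# [^C] keeps a match from starting at the earlier framework cysteine.
H3_MOTIF = re.compile(r"C..([^C]+?)WG.G")
L3_MOTIF = re.compile(r"C([^C]+?)FG.G")


def rough_cdr3(sequence: str, motif: re.Pattern) -> Optional[str]:
    """Return the loop captured between the flanking motifs, or None if absent."""
    match = motif.search(sequence)
    return match.group(1) if match else None


# Trastuzumab variable domains (same sequences as in the hands-on section)
heavy_chain = (
    "EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTYIHWVRQAPGKGLEWVARIYPTNGYTRYADSVKG"
    "RFTISADTSKNTAYLQMNSLRAEDTAVYYCSRWGGDGFYAMDYWGQGTLVTVSS"
)
light_chain = (
    "DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSR"
    "SGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIK"
)

print("CDR-H3:", rough_cdr3(heavy_chain, H3_MOTIF))  # WGGDGFYAMDY
print("CDR-L3:", rough_cdr3(light_chain, L3_MOTIF))  # QQHYTTPPT
```

Motif matching breaks on unusual frameworks, which is why the design pipeline relies on proper antibody numbering instead.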

When you engineer an antibody – whether for better binding affinity, altered specificity, or improved developability – you are almost always modifying CDR sequences. The framework regions between CDRs provide structural scaffolding and are typically left unchanged. This is what makes CDR design the central task of antibody engineering.

Why Structure-Aware CDR Design Matters

Traditional antibody optimization relies on random mutagenesis and screening: you create thousands of CDR variants in the lab, express them, and test which ones bind. This works, but it is slow and expensive. Computational approaches can narrow the search space by predicting which CDR sequences are likely to fold into the desired structure and maintain binding.

The key insight behind structure-aware design is that CDR sequences and CDR structures are tightly coupled. A CDR loop must fold into a specific 3D conformation to present the right residues at the right positions for antigen contact. If you change the sequence in a way that disrupts the loop conformation, binding will be lost regardless of what residues you introduce.

This is where AntiFold excels. Unlike sequence-only models that treat CDR design as a text generation problem, AntiFold is an inverse folding model: it takes a 3D antibody structure as input and predicts amino acid sequences that are likely to fold into that structure. This structural grounding means AntiFold's designs respect the geometric constraints of the binding site, producing sequences that are more likely to fold correctly and maintain function.

How AntiFold Works

AntiFold is built on the inverse folding paradigm – given backbone coordinates of a protein structure, predict the amino acid sequence that would fold into those coordinates. The model architecture uses a graph neural network that encodes the 3D structure of the antibody as a graph, where nodes represent residues and edges encode spatial relationships.
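The residue-graph idea can be illustrated with a generic k-nearest-neighbour graph built from C-alpha coordinates, a common starting point for structure-based models. This is a simplified sketch, not AntiFold's actual featurization (which, per the design-process description, is built from backbone atom coordinates rather than C-alpha positions alone); the function name and the choice of `k` are assumptions.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def knn_residue_graph(ca_coords: List[Coord], k: int) -> List[Tuple[int, int, float]]:
    """Connect each residue (node) to its k nearest residues by C-alpha distance.

    Returns directed edges (i, j, distance_in_angstroms).
    """
    edges = []
    for i, ci in enumerate(ca_coords):
        # Distance from residue i to every other residue, closest first
        neighbours = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(ca_coords) if j != i
        )
        for dist, j in neighbours[:k]:
            edges.append((i, j, dist))
    return edges


# Toy example: five residues on a straight line, 3.8 Å apart
toy = [(3.8 * n, 0.0, 0.0) for n in range(5)]
for i, j, d in knn_residue_graph(toy, k=2)[:4]:
    print(f"residue {i} -> residue {j}: {d:.1f} Å")
```

Limiting each node to its nearest neighbours keeps the graph sparse, so the number of edges grows linearly with protein length rather than quadratically.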

Training on Antibody-Specific Data

AntiFold is trained specifically on antibody structures, drawing on the Structural Antibody Database (SAbDab), which contains thousands of experimentally determined antibody structures. This antibody-specific training gives AntiFold several advantages over general-purpose inverse folding models like ProteinMPNN:

  • Canonical CDR classes: AntiFold learns the discrete structural classes that CDR-H1, CDR-H2, CDR-L1, CDR-L2, and CDR-L3 adopt. It generates sequences compatible with the canonical form of each loop.
  • CDR-H3 diversity: CDR-H3 does not follow canonical rules, so AntiFold learns the broader distribution of H3 conformations from thousands of examples.
  • VH/VL interface: The model encodes how heavy and light chain variable domains pack together, which helps designs stay compatible with proper VH/VL pairing.
  • Framework compatibility: Designed CDR sequences are conditioned on the surrounding framework residues, maintaining structural compatibility at the CDR-framework boundaries.

The Design Process

When you call AntiFold through SciRouter, the following happens:

  • The input PDB structure is parsed and the antibody chains are identified
  • CDR regions are located using antibody numbering (IMGT or Chothia scheme)
  • The 3D graph representation is built from backbone atom coordinates
  • The model autoregressively generates amino acid probabilities at each CDR position, conditioned on the structure and any fixed framework residues
  • Multiple sequences are sampled at the specified temperature
  • Each design is scored by its log-likelihood under the model
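The first step, parsing the PDB text and separating chains, can be sketched with the fixed-column layout of PDB `ATOM` records. This minimal, hypothetical helper only groups C-alpha coordinates by chain ID; deciding which chain is heavy or light, and locating the CDRs, needs antibody numbering (for example with ANARCI, listed among AntiFold's local dependencies).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def ca_coords_by_chain(pdb_text: str) -> Dict[str, List[Tuple[float, float, float]]]:
    """Collect C-alpha coordinates per chain from fixed-column PDB ATOM records."""
    chains: Dict[str, List[Tuple[float, float, float]]] = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):      # only read the first model
            break
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":    # atom name, columns 13-16
            continue
        if line[16] not in (" ", "A"):     # skip alternate locations other than A
            continue
        chain_id = line[21]                # chain identifier, column 22
        x = float(line[30:38])             # columns 31-38
        y = float(line[38:46])             # columns 39-46
        z = float(line[46:54])             # columns 47-54
        chains[chain_id].append((x, y, z))
    return dict(chains)
```

Assuming `structure.pdb` holds PDB-format text, as the examples in this tutorial suggest, `ca_coords_by_chain(structure.pdb)` would return one coordinate list per chain ID in the file.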

Getting Started: Prerequisites

You need Python 3.8+ and a SciRouter API key. Sign up at scirouter.ai/register for 500 free credits per month – enough for dozens of design runs.

Install the SDK
pip install scirouter
Set your API key
export SCIROUTER_API_KEY="sk-sci-your-api-key-here"

Hands-On: CDR Design with SciRouter

Step 1: Fold the Starting Antibody

AntiFold requires a 3D structure as input. If you already have a crystal structure PDB file, you can use it directly. If you only have sequences, fold them first with ImmuneBuilder. Here we start with trastuzumab (Herceptin), a well-characterized anti-HER2 antibody:

Fold the starting antibody
from scirouter import SciRouter

client = SciRouter()

# Trastuzumab variable region sequences
heavy_chain = (
    "EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTYIHWVRQAPGKGLEWVARIYPTNGYTRYADSVKG"
    "RFTISADTSKNTAYLQMNSLRAEDTAVYYCSRWGGDGFYAMDYWGQGTLVTVSS"
)
light_chain = (
    "DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSR"
    "SGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIK"
)

# Predict the 3D structure
structure = client.antibodies.fold(
    heavy_chain=heavy_chain,
    light_chain=light_chain,
)

print(f"Structure predicted. Mean pLDDT: {structure.mean_plddt:.1f}")
print(f"PDB size: {len(structure.pdb)} bytes")
Note
If you have an experimental crystal structure (for example, PDB ID 1N8Z for trastuzumab), you can skip this step and pass the PDB text directly to AntiFold. Experimental structures generally give better design results than predicted structures because they have more accurate backbone coordinates.

Step 2: Design CDR-H3 Variants

Start by designing the most impactful region – CDR-H3. This loop contributes the most to antigen binding specificity and is where mutations have the highest chance of changing binding behavior:

Design CDR-H3 sequences
# Design new CDR-H3 sequences
designs = client.antibodies.design(
    pdb=structure.pdb,
    num_sequences=10,
    regions=["CDR-H3"],
    temperature=0.2,
)

print(f"Generated {len(designs.sequences)} CDR-H3 variants:\n")
for i, seq in enumerate(designs.sequences):
    print(f"Variant {i+1}:")
    print(f"  CDR-H3:    {seq.cdr_h3}")
    print(f"  Recovery:  {seq.sequence_recovery:.1%}")
    print(f"  Log-LL:    {seq.log_likelihood:.2f}")
    print()
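
To see exactly which residues a variant changes, you can diff its CDR-H3 against the wild-type loop. This is a minimal sketch with a hypothetical helper; the design string is a made-up example rather than real AntiFold output, and positions are 1-based within the loop, not antibody numbering.

```python
from typing import List


def list_mutations(wild_type: str, design: str, offset: int = 1) -> List[str]:
    """Return substitutions as 'G3S'-style strings (positions relative to the loop)."""
    if len(wild_type) != len(design):
        # Fixed-backbone inverse folding keeps the loop length unchanged
        raise ValueError("wild-type and design loops must be the same length")
    return [
        f"{wt}{i + offset}{mut}"
        for i, (wt, mut) in enumerate(zip(wild_type, design))
        if wt != mut
    ]


wild_type_h3 = "WGGDGFYAMDY"          # trastuzumab CDR-H3, Kabat-style boundaries
hypothetical_design = "WGSDGFYAVDY"   # illustrative only
print(list_mutations(wild_type_h3, hypothetical_design))  # ['G3S', 'M9V']
```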

Step 3: Expand to Multiple CDR Regions

Once you have validated single-region designs, you can design multiple CDRs simultaneously for broader optimization of the binding interface:

Multi-CDR design
# Design all three heavy chain CDRs
multi_designs = client.antibodies.design(
    pdb=structure.pdb,
    num_sequences=20,
    regions=["CDR-H1", "CDR-H2", "CDR-H3"],
    temperature=0.15,  # lower temperature for multi-region stability
)

print(f"Generated {len(multi_designs.sequences)} multi-CDR variants:\n")
for i, seq in enumerate(multi_designs.sequences[:5]):
    print(f"Variant {i+1}:")
    print(f"  H1: {seq.cdr_h1}")
    print(f"  H2: {seq.cdr_h2}")
    print(f"  H3: {seq.cdr_h3}")
    print(f"  Log-LL: {seq.log_likelihood:.2f}")
    print()
Tip
When designing multiple CDR regions, use lower temperatures (0.1 to 0.2) and generate more candidates (20 or more). The combinatorial space is much larger, so you need more samples to find high-quality designs. Sort by log-likelihood to find the top candidates.
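Sampling at low temperature can return the same sequence more than once, so it helps to deduplicate before ranking. Here is a minimal sketch using a hypothetical `Candidate` record in place of the SDK's design objects; with real results you might key on a tuple of the designed CDR strings (for example `(seq.cdr_h1, seq.cdr_h2, seq.cdr_h3)`) and read `seq.log_likelihood`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Candidate:
    """Hypothetical stand-in for a design record: sequence plus model score."""
    sequence: str
    log_likelihood: float


def top_unique(candidates: Iterable[Candidate], k: int) -> List[Candidate]:
    """Drop duplicate sequences (keeping the best score), then return the top k."""
    best: Dict[str, Candidate] = {}
    for cand in candidates:
        current = best.get(cand.sequence)
        if current is None or cand.log_likelihood > current.log_likelihood:
            best[cand.sequence] = cand
    # Higher (less negative) log-likelihood ranks first
    return sorted(best.values(), key=lambda c: c.log_likelihood, reverse=True)[:k]


pool = [Candidate("AAA", -5.0), Candidate("AAB", -3.0),
        Candidate("AAA", -4.0), Candidate("ABB", -6.0)]
for cand in top_unique(pool, k=2):
    print(cand.sequence, cand.log_likelihood)
```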

Interpreting AntiFold Results

AntiFold returns several metrics for each designed sequence. Understanding these metrics is critical for selecting candidates worth testing experimentally.

Log-Likelihood

The log-likelihood score is the log-probability the model assigns to the designed sequence given the input backbone, so it serves as a proxy for how compatible the sequence is with the target structure. Higher (less negative) values indicate better structural compatibility. Use log-likelihoods to rank designs generated from the same scaffold; the absolute values depend on the scaffold and are not directly comparable across different antibodies.
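Under the usual definition, a sequence's log-likelihood is the sum of the log-probabilities the model assigns to each chosen residue; for an autoregressive model, each probability is conditioned on the structure and the preceding residues. This sketch shows that generic calculation on made-up distributions. The source does not say whether AntiFold's reported score is summed or averaged per residue, so the per-residue value is printed too.

```python
import math
from typing import Dict, List


def sequence_log_likelihood(sequence: str, position_probs: List[Dict[str, float]]) -> float:
    """Sum of natural-log probabilities assigned to the residue chosen at each position."""
    if len(sequence) != len(position_probs):
        raise ValueError("need exactly one distribution per position")
    return sum(math.log(probs[aa]) for aa, probs in zip(sequence, position_probs))


# Made-up distributions for a two-residue stretch
probs = [{"A": 0.5, "G": 0.5}, {"A": 0.25, "G": 0.75}]
for candidate in ("AG", "GA"):
    ll = sequence_log_likelihood(candidate, probs)
    print(f"{candidate}: log-LL={ll:.3f}, per residue={ll / len(candidate):.3f}")
```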

Sequence Recovery

Sequence recovery is the fraction of designed positions that match the original (wild-type) sequence. A recovery of 0.8 means 80% of the designed CDR positions keep their wild-type residue. High recovery (above 0.7) indicates conservative designs that are more likely to preserve the original binding mode. Low recovery (below 0.4) indicates more radical redesigns, which may change how the loop contacts the antigen and carry a higher risk of losing binding.
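The definition translates directly into code: count matching positions and divide by the number of designed positions. A minimal sketch with a hypothetical helper; the designed loop is an illustrative string, not real AntiFold output.

```python
def sequence_recovery(wild_type: str, design: str) -> float:
    """Fraction of designed positions that keep the wild-type residue."""
    if not wild_type or len(wild_type) != len(design):
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(wt == aa for wt, aa in zip(wild_type, design))
    return matches / len(wild_type)


wild_type_h3 = "WGGDGFYAMDY"   # trastuzumab CDR-H3, Kabat-style boundaries
hypothetical = "WGSDGFYAVDY"   # illustrative design with two substitutions
print(f"recovery: {sequence_recovery(wild_type_h3, hypothetical):.1%}")  # 81.8%
```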

Per-Position Probabilities

AntiFold also provides per-position amino acid probability distributions. Positions with high entropy (many amino acids with similar probabilities) are tolerant of mutation, while positions with low entropy (one dominant amino acid) are structurally constrained and are usually best left unchanged:

Analyze per-position probabilities
# Examine position-level details for the top design
top_design = max(designs.sequences, key=lambda s: s.log_likelihood)

print(f"Top design CDR-H3: {top_design.cdr_h3}")
print(f"Log-likelihood: {top_design.log_likelihood:.2f}")
print(f"Sequence recovery: {top_design.sequence_recovery:.1%}")

# Identify mutable positions (high entropy)
if hasattr(top_design, "position_entropies"):
    for pos, entropy in enumerate(top_design.position_entropies):
        marker = "<-- mutable" if entropy > 1.5 else ""
        print(f"  Position {pos}: entropy={entropy:.2f} {marker}")
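
Positional entropy is the Shannon entropy of each position's probability distribution. Because its scale depends on the logarithm base (at most ln 20 ≈ 3.00 nats or log2 20 ≈ 4.32 bits for 20 amino acids), a cutoff such as the 1.5 used in the snippet above only makes sense once you know which base was used; the source does not state it. A minimal sketch with a hypothetical helper:

```python
import math
from typing import Dict


def shannon_entropy(probs: Dict[str, float], base: float = math.e) -> float:
    """Entropy of one position's amino acid distribution (nats by default, bits with base=2)."""
    # Sum p * log(1/p), skipping zero-probability residues
    return sum(p * math.log(1.0 / p, base) for p in probs.values() if p > 0)


AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}   # fully tolerant position
fixed = {"G": 1.0}                              # fully constrained position

print(f"max entropy: {shannon_entropy(uniform):.2f} nats, {shannon_entropy(uniform, 2):.2f} bits")
print(f"constrained position: {shannon_entropy(fixed):.2f}")
```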

Iterating on Designs: The Fold-Design-Validate Loop

The most powerful workflow with AntiFold is an iterative loop: design CDRs, fold the new sequences to validate they form good structures, then optionally feed the best structures back into AntiFold for further refinement. This mimics computational directed evolution.

Iterative design-validate loop
import json

PLDDT_THRESHOLD = 75  # minimum mean pLDDT for a re-folded design to pass
results = []


def validate(design, round_num, variant_num, label):
    """Re-fold a designed heavy chain with the original light chain and log the result."""
    validation = client.antibodies.fold(
        heavy_chain=design.full_heavy_chain,
        light_chain=light_chain,
    )
    passed = bool(validation.mean_plddt >= PLDDT_THRESHOLD)
    results.append({
        "round": round_num,
        "variant": variant_num,
        "cdr_h3": design.cdr_h3,
        "log_likelihood": float(design.log_likelihood),
        "plddt": float(validation.mean_plddt),
        "passed": passed,
    })
    status = "PASS" if passed else "FAIL"
    print(f"  {label} {variant_num}: pLDDT={validation.mean_plddt:.1f} [{status}]")
    return validation, passed


# Round 1: Design CDR-H3 variants
print("=== Round 1: Initial design ===")
designs = client.antibodies.design(
    pdb=structure.pdb,
    num_sequences=10,
    regions=["CDR-H3"],
    temperature=0.2,
)

# Validate the top 5 designs (ranked by log-likelihood) by re-folding them
ranked = sorted(designs.sequences, key=lambda s: s.log_likelihood, reverse=True)
best_structure = None  # re-folded structure of the highest-ranked passing design
for i, design in enumerate(ranked[:5]):
    validation, passed = validate(design, 1, i + 1, "Variant")
    if passed and best_structure is None:
        best_structure = validation

# Round 2: Refine the best passing design (not just the top-ranked one)
if best_structure is not None:
    print("\n=== Round 2: Refine best passing variant ===")
    refined = client.antibodies.design(
        pdb=best_structure.pdb,
        num_sequences=10,
        regions=["CDR-H3"],
        temperature=0.15,  # tighter sampling for refinement
    )
    refined_ranked = sorted(refined.sequences, key=lambda s: s.log_likelihood, reverse=True)
    for j, ref_design in enumerate(refined_ranked[:3]):
        validate(ref_design, 2, j + 1, "Refined")
else:
    print("\nNo round-1 design passed validation; skipping refinement.")

# Save results
with open("cdr_design_results.json", "w") as f:
    json.dump(results, f, indent=2)
print(f"\nTotal candidates: {len(results)}, Passed: {sum(1 for r in results if r['passed'])}")
Note
In practice, each round of refinement tends to converge toward higher-confidence designs, and two to three rounds are often enough. More rounds risk over-optimization: sequences that score well computationally but lose the binding diversity that matters experimentally.

Controlling Design Diversity with Temperature

The temperature parameter is your primary control over how different the designed CDRs are from the original sequence. Choosing the right temperature depends on your goals:

  • Temperature 0.1: Very conservative. Designs typically differ from the wild type by 1 to 2 mutations. Best for fine-tuning an already-good binder.
  • Temperature 0.2: Moderate. Designs typically differ by 2 to 5 mutations. A good default for affinity maturation.
  • Temperature 0.3: Exploratory. Designs may have 30 to 50% new residues. Useful for generating diverse libraries.
  • Temperature 0.5: Aggressive. Expect significant sequence divergence. Use this when you want to explore entirely new binding modes.
Compare temperature effects
# Generate designs at different temperatures
for temp in [0.1, 0.2, 0.3, 0.5]:
    designs = client.antibodies.design(
        pdb=structure.pdb,
        num_sequences=5,
        regions=["CDR-H3"],
        temperature=temp,
    )
    avg_recovery = sum(s.sequence_recovery for s in designs.sequences) / len(designs.sequences)
    avg_ll = sum(s.log_likelihood for s in designs.sequences) / len(designs.sequences)
    print(f"Temperature {temp}: avg recovery={avg_recovery:.1%}, avg log-LL={avg_ll:.2f}")
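
In most autoregressive samplers, temperature divides the model's logits before the softmax: values below 1 sharpen the distribution toward the top residue, and larger values flatten it. The sketch below shows that standard convention on made-up logits for a single position; it assumes AntiFold's sampler follows the same convention, which the source does not spell out, and the helper names are hypothetical.

```python
import math
import random
from typing import Dict


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax of logits / temperature: lower temperature sharpens, higher flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {aa: z / temperature for aa, z in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = {aa: math.exp(z - top) for aa, z in scaled.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


def sample_residue(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one amino acid from a probability distribution."""
    residues = list(probs)
    return rng.choices(residues, weights=[probs[aa] for aa in residues], k=1)[0]


# Made-up logits for one CDR position
logits = {"Y": 2.0, "F": 1.0, "A": 0.0}
for t in (0.1, 0.2, 0.5, 1.0):
    probs = apply_temperature(logits, t)
    print(f"T={t}: P(Y)={probs['Y']:.3f}, sample={sample_residue(probs, random.Random(0))}")
```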

Targeting Specific CDR Positions

Sometimes you know which positions in a CDR are critical for binding (from alanine scanning or structural analysis) and want to keep them fixed while redesigning the rest. AntiFold supports this through position masking:

Fixed-position design
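# Design CDR-H3 but keep two positions fixed. "H100" and "H100a" are
# Kabat/Chothia-style labels (insertion codes such as 100a); in IMGT numbering,
# CDR-H3 spans positions 105-117. Check which scheme the API expects.
# Illustrative positions: substitute the contact residues from your own analysis.
designs = client.antibodies.design(
    pdb=structure.pdb,
    num_sequences=10,
    regions=["CDR-H3"],
    fixed_positions=["H100", "H100a"],  # Kabat/Chothia-style labels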
    temperature=0.25,
)

for i, seq in enumerate(designs.sequences[:5]):
    print(f"Variant {i+1}: CDR-H3={seq.cdr_h3} (LL={seq.log_likelihood:.2f})")
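
Because the mapping from labels such as `H100`/`H100a` to positions in the returned loop string depends on the numbering scheme, it is worth confirming that the fixed residues really came back unchanged. A minimal check, assuming you have already translated the fixed labels into 0-based indices within the CDR-H3 string; the helper name, indices, and designs here are hypothetical.

```python
from typing import Iterable, List


def designs_violating_fixed(wild_type: str, designs: Iterable[str],
                            fixed_indices: Iterable[int]) -> List[str]:
    """Return the designs that changed a residue at any fixed 0-based loop index."""
    fixed = list(fixed_indices)
    return [d for d in designs if any(d[i] != wild_type[i] for i in fixed)]


wild_type_h3 = "WGGDGFYAMDY"
# Hypothetical designs; indices 0 and 1 stand in for the two fixed positions
candidates = ["WGSDGFYAVDY", "AGGDGFYAMDY"]
print(designs_violating_fixed(wild_type_h3, candidates, fixed_indices=[0, 1]))
```

An empty list means every design honoured the constraint.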

Combining AntiFold with Other SciRouter Tools

CDR design is most powerful when combined with other tools in a multi-step pipeline. Here are three common workflows:

Design + Structure Validation

Use AntiFold for CDR design, then ImmuneBuilder to validate that the designed sequences fold correctly. This is the workflow shown in the examples above.

Design + Docking

After designing CDR variants and validating their structures, predict their complexes with the target antigen using Boltz-2 and inspect the predicted binding interface. This adds a binding-quality filter to your structural designs.

End-to-End Antibody Discovery

SciRouter's Antibody Design Studio chains all of these steps into a single pipeline: fold the scaffold, design CDR variants, validate structures, and rank candidates by predicted binding quality.

Best Practices for CDR Design

  • Start with a good scaffold: Use an experimental crystal structure when available. Predicted structures work but introduce additional uncertainty in the backbone coordinates.
  • Design CDR-H3 first: It contributes the most to specificity. Once you have good H3 variants, optionally expand to other CDRs.
  • Always validate by re-folding: A high log-likelihood from AntiFold does not guarantee the sequence will fold well. Re-fold every candidate and check pLDDT scores.
  • Generate more candidates than you need: Expect 30 to 50% of designs to fail structural validation. Generate 20 or more candidates to get 5 to 10 good ones.
  • Check for liabilities: After selecting structurally valid designs, screen for sequence liabilities like N-glycosylation motifs (N-X-S/T, where X is any residue except proline), deamidation hotspots (NG, NS), and unpaired cysteines.
  • Use multiple temperatures: Generate a diverse pool by sampling at different temperatures, then merge and rank by structural quality.
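A first-pass liability screen for the motifs in this list can be done with regular expressions. This is a minimal sketch with a hypothetical helper; it is meant for designed CDR sequences, where any cysteine is worth reviewing because the canonical disulfide cysteines sit in the framework. Lookahead patterns are used so overlapping motifs are all reported.

```python
import re
from typing import Dict, List

LIABILITY_PATTERNS = {
    "n_glycosylation": r"(?=N[^P][ST])",  # N-X-S/T sequon, X not proline
    "deamidation": r"(?=N[GS])",          # NG and NS hotspots
}


def find_liabilities(sequence: str) -> Dict[str, List[int]]:
    """Return 0-based start positions of each liability motif, plus any cysteines."""
    hits = {
        name: [m.start() for m in re.finditer(pattern, sequence)]
        for name, pattern in LIABILITY_PATTERNS.items()
    }
    hits["cysteine"] = [i for i, aa in enumerate(sequence) if aa == "C"]
    return hits


print(find_liabilities("WGGDGFYAMDY"))  # trastuzumab CDR-H3: no hits
print(find_liabilities("ANGSNPT"))      # made-up loop with NG and N-G-S motifs
```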

What Running AntiFold Locally Requires

For context, here is what you would need to run AntiFold on your own machine:

  • PyTorch, ideally with CUDA support on an NVIDIA GPU for practical run times
  • PyTorch Geometric for graph neural network operations
  • ESM library (fair-esm), which provides the ESM-IF1 inverse folding model that AntiFold builds on
  • ANARCI for antibody numbering (requires HMMER installation)
  • Custom trained model weights (~500 MB)
  • Careful version pinning across all dependencies
  • Setup time: 1 to 3 hours for an experienced engineer

SciRouter eliminates all of this. AntiFold runs on pre-deployed GPU instances and is accessible through a few lines of Python.

Next Steps

You now have the tools to design antibody CDRs computationally. Use AntiFold for structure-aware CDR design and ImmuneBuilder for structure validation. For a fully automated pipeline from antigen to ranked antibody candidates, try the Antibody Design Studio.

To evaluate binding to a specific antigen, predict the antibody-antigen complex structure with Boltz-2. For nanobody-specific design, see our guide on nanobody engineering with AI.

Sign up at scirouter.ai/register for 500 free credits and start designing antibody CDRs today.

Frequently Asked Questions

What are CDRs and why are they important for antibody function?

CDRs (Complementarity Determining Regions) are the six hypervariable loops on an antibody that directly contact the antigen. Three are on the heavy chain (CDR-H1, CDR-H2, CDR-H3) and three on the light chain (CDR-L1, CDR-L2, CDR-L3). CDR-H3 is typically the most variable and contributes the most to binding specificity. Designing better CDR sequences is the primary mechanism for engineering antibodies with improved affinity, specificity, or cross-reactivity.

How does AntiFold differ from general-purpose inverse folding models like ProteinMPNN?

AntiFold is specifically trained on antibody structures from the Structural Antibody Database (SAbDab). It captures the unique constraints of CDR loop conformations, canonical classes, and the VH/VL interface. ProteinMPNN is trained on general proteins and does not explicitly model antibody-specific features such as CDR canonical forms or the framework-CDR boundary. This antibody-specific structural knowledge is why AntiFold is generally the better fit for CDR design.

What temperature should I use for CDR design?

Temperature controls sequence diversity. Lower temperatures (0.1 to 0.2) produce conservative designs close to the original sequence, which is useful for affinity maturation of an already-good binder. Higher temperatures (0.3 to 0.5) produce more diverse sequences, which is useful for exploring new binding modes or generating diverse libraries for screening. Start with 0.2 for optimization and 0.4 for exploration.

Can I design CDRs for all six loops simultaneously?

Yes. AntiFold supports designing any combination of CDR regions in a single call. You can target CDR-H3 alone, all three heavy chain CDRs, or all six CDRs. Designing more regions simultaneously increases the search space, so you should use lower temperatures and generate more candidates to find good solutions. Always validate multi-CDR designs by re-folding with ImmuneBuilder.

How do I know if a designed CDR sequence will actually fold correctly?

The best computational validation is to fold the designed sequence with ImmuneBuilder and check the pLDDT confidence score, especially in the CDR region. Designs with mean pLDDT above 80 and CDR pLDDT above 70 are strong candidates. You can also check AntiFold's log-likelihood, where higher values indicate better compatibility with the scaffold; sequence recovery only measures how close the design stays to the wild-type sequence. Ultimately, experimental validation through expression and binding assays is the gold standard.
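The two thresholds in this answer can be combined into a simple filter. This sketch assumes you have per-residue pLDDT scores and the indices of the CDR residues; both are hypothetical inputs here, since the examples in this tutorial only read `mean_plddt`, and the helper name is illustrative.

```python
from statistics import mean
from typing import Sequence


def passes_plddt(per_residue: Sequence[float], cdr_indices: Sequence[int],
                 min_mean: float = 80.0, min_cdr: float = 70.0) -> bool:
    """True if overall mean pLDDT exceeds min_mean and CDR mean pLDDT exceeds min_cdr."""
    overall = mean(per_residue)
    cdr = mean([per_residue[i] for i in cdr_indices])
    return overall > min_mean and cdr > min_cdr


# Hypothetical scores: a confident framework with a poorly predicted loop
scores = [95.0] * 8 + [60.0, 60.0]
print(passes_plddt(scores, cdr_indices=[8, 9]))  # False: overall 88.0, CDR 60.0
```

Checking the CDR mean separately matters because a confident framework can mask a poorly predicted loop in the overall average.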

How long does AntiFold CDR design take through SciRouter?

AntiFold CDR design typically completes in 5 to 15 seconds per batch of sequences, depending on the number of candidates requested and the number of CDR regions being designed. The first call may take 20 to 40 seconds if the GPU worker needs a cold start. Subsequent calls within the same session are faster due to model caching.
