
Protein Engineering with AI: Design Better Proteins in Minutes, Not Months

Design better proteins in minutes with AI. ESMFold, ProteinMPNN, ThermoMPNN, and SoluProt workflows for stability optimization and rational protein engineering.

Ryan Bethencourt
April 8, 2026
11 min read

The Protein Engineering Revolution

Proteins are the molecular machines of life. They catalyze chemical reactions, transport molecules, provide structural support, and mediate nearly every biological process. Engineering proteins with improved or novel functions is one of the most impactful capabilities in biotechnology – enabling better industrial enzymes, more effective therapeutics, improved biosensors, and sustainable biomaterials. The global protein engineering market is projected to exceed $500 billion by 2035, driven by demand in pharmaceuticals, agriculture, industrial biotechnology, and synthetic biology.

For decades, protein engineering relied primarily on directed evolution – the Nobel Prize-winning approach developed by Frances Arnold. Directed evolution mimics natural selection in the lab: create millions of random mutants, screen them for the desired property, and repeat. It works, but it is slow (6-18 months per cycle), expensive ($100K-500K per campaign), and requires high-throughput screening assays that are often the bottleneck. Many important protein properties – thermostability, expression yield, immunogenicity – lack simple screening assays, making directed evolution impractical for these targets.

AI-powered rational design is transforming this landscape. Tools like ProteinMPNN (sequence design), ThermoMPNN (stability prediction), and SoluProt (solubility prediction) can evaluate thousands of mutations computationally in minutes, identifying the small subset most likely to improve the target property. This reduces experimental testing from millions of variants to dozens, cutting timelines from months to days and costs by roughly 90%. The 2024 Nobel Prize in Chemistry recognized this shift, honoring David Baker for computational protein design and Demis Hassabis and John Jumper for protein structure prediction.

This guide walks through the complete AI protein engineering workflow: fold your protein with ESMFold, redesign sequences with ProteinMPNN, predict stability with ThermoMPNN, and check solubility with SoluProt. We use green fluorescent protein (GFP) as a running example and provide working Python code for every step.

Directed Evolution vs AI Rational Design

Understanding the tradeoffs between directed evolution and rational design helps you choose the right approach for your project. Directed evolution requires no structural knowledge – you simply mutate and screen. It can discover unexpected solutions that no computational method would predict, including mutations far from the active site that improve function through allosteric or dynamic effects. However, it explores sequence space randomly, meaning most variants are neutral or deleterious. The useful mutation rate is typically 0.1-1%, requiring screening of 10,000-1,000,000 variants per round.

Rational design, by contrast, uses structural and sequence information to predict which mutations will be beneficial. AI models like ProteinMPNN consider the 3D context of each residue – its neighbors, burial, secondary structure, and hydrogen bonding partners – to propose sequences that maintain or improve the fold. ThermoMPNN predicts the thermodynamic effect of each mutation, filtering out destabilizing changes before any experiment is run. The hit rate for computationally designed variants is 30-70%, compared to 0.1-1% for random mutagenesis.

The optimal strategy combines both approaches. Use rational design to identify a focused library of high-probability mutations, then use combinatorial experiments (e.g., recombination of beneficial single mutations) to find synergistic combinations. This "smart library" approach achieves the best of both worlds: the efficiency of rational design with the discovery potential of combinatorial screening.

  • Directed evolution: 6–18 months, $100K–$500K, screens millions of random variants, hit rate 0.1–1%
  • AI rational design: 1–7 days, $500–$5,000 (API costs), designs dozens of targeted variants, hit rate 30–70%
  • Hybrid approach: Use AI to design a focused library of 100–500 variants, screen experimentally, combine winners
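
To make the hit-rate comparison above concrete, here is a minimal sketch of how library size scales with hit rate. It assumes each variant is an independent trial with the same probability of being a hit, which is a simplification; the function name `variants_needed` is illustrative and not part of any library.

```python
import math


def variants_needed(hit_rate: float, confidence: float = 0.95) -> int:
    """Smallest library size n with P(at least one hit) >= confidence.

    Assumes independent variants with identical hit probability, so
    P(no hit in n variants) = (1 - hit_rate) ** n.
    """
    if not 0.0 < hit_rate < 1.0:
        raise ValueError("hit_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_rate))


for label, rate in [
    ("random mutagenesis, 0.1% hit rate", 0.001),
    ("random mutagenesis, 1% hit rate", 0.01),
    ("AI rational design, 30% hit rate", 0.30),
]:
    print(f"{label}: {variants_needed(rate)} variants for a 95% chance of one hit")
```

Finding a single hit is a lower bar than finding the best variant, so real campaigns screen more than these minimums, but the orders-of-magnitude gap between the two approaches carries over.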

The 3-Step AI Protein Engineering Workflow

SciRouter provides a complete protein engineering pipeline through three integrated steps. Each step feeds into the next, creating a workflow that takes you from a protein sequence to optimized variants in minutes. The three steps are: (1) predict the 3D structure with ESMFold, (2) design new sequences with ProteinMPNN, and (3) validate designs with ThermoMPNN stability prediction and SoluProt solubility prediction.

Step 1: Fold with ESMFold

Every protein engineering campaign starts with a structure. If your protein has an experimental structure in the Protein Data Bank, use that. If not, ESMFold can predict the structure from the amino acid sequence in seconds. ESMFold approaches AlphaFold2 accuracy for many single-chain proteins and runs up to 60x faster because it does not require multiple sequence alignments.

The pLDDT (predicted local distance difference test) score indicates confidence at each residue position. Residues with pLDDT > 90 are modeled very accurately. Residues with pLDDT 70-90 are generally correct but may have local errors. Residues with pLDDT 50-70 are low confidence and should be treated with caution. Residues with pLDDT < 50 are likely disordered or poorly predicted. For protein engineering, focus your design efforts on well-predicted regions (pLDDT > 70) where structural context is reliable.

Step 1: Predict structure with ESMFold
from scirouter import SciRouter

client = SciRouter(api_key="sk-sci-YOUR_KEY")

# Green Fluorescent Protein (GFP) wild-type sequence
gfp_sequence = (
    "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL"
    "VTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLV"
    "NRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLAD"
    "HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK"
)

fold_result = client.proteins.fold(sequence=gfp_sequence)
print(f"Mean pLDDT: {fold_result.mean_plddt:.1f}")
print(f"Structure size: {len(fold_result.pdb_string)} bytes")

# Save PDB for visualization
with open("gfp_predicted.pdb", "w") as f:
    f.write(fold_result.pdb_string)
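
To act on the per-residue pLDDT guidance, you need the scores by position. The sketch below assumes the predicted PDB stores pLDDT in the B-factor column on a 0-100 scale (a common convention for predicted structures, though some ESMFold outputs use 0-1, so check your file). The helper names are illustrative, and the demo structure is synthetic.

```python
def per_residue_plddt(pdb_string: str) -> dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor of each CA atom.

    Assumption: pLDDT is stored in the B-factor field (PDB columns 61-66).
    """
    plddt = {}
    for line in pdb_string.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            plddt[int(line[22:26])] = float(line[60:66])
    return plddt


def confident_positions(plddt: dict[int, float], cutoff: float = 70.0) -> list[int]:
    """Residue numbers whose pLDDT exceeds the cutoff."""
    return sorted(pos for pos, score in plddt.items() if score > cutoff)


def atom_line(serial, name, resname, resnum, x, y, z, bfactor):
    """Format one fixed-column PDB ATOM record (chain A, occupancy 1.00)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} A{resnum:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{bfactor:6.2f}")


# Synthetic four-residue structure with made-up confidence values
demo_pdb = "\n".join(
    atom_line(i, "CA", "GLY", i, 0.0, 0.0, 3.8 * i, score)
    for i, score in enumerate([92.5, 81.0, 65.0, 40.1], start=1)
)

scores = per_residue_plddt(demo_pdb)
print("Designable positions:", confident_positions(scores))
```

With the article's code, the same call would be `confident_positions(per_residue_plddt(fold_result.pdb_string))`; positions outside that list are candidates for keeping as wild type.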

Step 2: Design with ProteinMPNN

With the structure in hand, ProteinMPNN designs new amino acid sequences predicted to fold into the same 3D shape. This is inverse folding – starting from a structure and working backward to sequence. ProteinMPNN uses a message-passing neural network that reads backbone atom coordinates and outputs a probability distribution over the 20 amino acids at each position.

The sampling temperature parameter controls sequence diversity. Low temperature (0.1) produces conservative designs close to the wild-type sequence. High temperature (0.3-0.5) generates diverse variants that explore more of sequence space. For stability engineering, use low temperature (0.1-0.2) to make targeted improvements. For library design, use higher temperature (0.3-0.5) to generate diverse starting points.
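
The effect of sampling temperature is easier to see on a toy example. The sketch below applies a temperature-scaled softmax to hypothetical logits for four amino acids at one position; the logit values are invented, and the real model's outputs and internals may differ in detail.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one buried position (not real model output)
residues = ["L", "I", "V", "A"]
logits = [2.0, 1.5, 1.0, 0.0]

for temperature in (0.1, 0.3, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    summary = ", ".join(f"{aa}={p:.3f}" for aa, p in zip(residues, probs))
    print(f"T={temperature}: {summary}")
```

At low temperature nearly all probability mass sits on the top-scoring residue, which is why low-temperature designs are conservative; raising the temperature flattens the distribution and lets alternatives through.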

Critically, you can fix specific residues during design. For GFP, the chromophore-forming residues (Ser65-Tyr66-Gly67) must be preserved because they form the fluorophore. Catalytic residues, binding site residues, and disulfide cysteines should also be fixed. ProteinMPNN then optimizes all other positions for folding stability while respecting your constraints.

Step 2: Design sequences with ProteinMPNN
# Design 20 new sequences for the GFP backbone
design_result = client.design.proteinmpnn(
    pdb_string=fold_result.pdb_string,
    num_sequences=20,
    temperature=0.15,
    fixed_positions=[65, 66, 67],  # Preserve chromophore residues
)

print(f"Designed {len(design_result.sequences)} sequences")
for i, seq in enumerate(design_result.sequences[:5]):
    identity = sum(a == b for a, b in zip(seq.sequence, gfp_sequence)) / len(gfp_sequence)
    print(f"  Design {i+1}: {identity:.1%} identity to wild type")
    print(f"    Score: {seq.score:.3f}")
    print(f"    Sequence: {seq.sequence[:60]}...")
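
Percent identity hides which positions changed. A small helper can list the substitutions in standard notation (wild-type residue, 1-based position, new residue) and confirm that fixed positions were respected. This is a sketch with illustrative function names and a toy ten-residue example; it assumes the design has the same length as the wild type, which holds for fixed-backbone design.

```python
def list_mutations(wild_type: str, design: str) -> list[str]:
    """Return substitutions such as 'K3R' using 1-based positions."""
    if len(wild_type) != len(design):
        raise ValueError("sequences must have equal length")
    return [
        f"{wt}{pos}{new}"
        for pos, (wt, new) in enumerate(zip(wild_type, design), start=1)
        if wt != new
    ]


def violated_fixed_positions(wild_type: str, design: str,
                             fixed_positions: list[int]) -> list[int]:
    """Return the 1-based fixed positions where the design differs from wild type."""
    return [p for p in fixed_positions if design[p - 1] != wild_type[p - 1]]


# Toy example: first ten residues of the GFP sequence, with two made-up changes
wt_fragment = "MSKGEELFTG"
design_fragment = "MSRGEELFSG"

print(list_mutations(wt_fragment, design_fragment))
print(violated_fixed_positions(wt_fragment, design_fragment, fixed_positions=[1, 9]))
```

For the GFP run, `violated_fixed_positions(gfp_sequence, seq.sequence, [65, 66, 67])` should return an empty list for every design; anything else suggests the position numbering passed to the API was off by one.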

Step 3: Validate with ThermoMPNN and SoluProt

Not every designed sequence will fold correctly or express well. ThermoMPNN predicts the thermodynamic stability effect of mutations as a DDG value (change in folding free energy in kcal/mol). Negative DDG means the mutation stabilizes the protein; positive DDG means it destabilizes. SoluProt predicts whether the protein will express as soluble protein in E. coli – a critical practical concern for any recombinant protein production.

By running both predictions on all designed sequences, you can rank variants by predicted stability and solubility before ordering any DNA. This filtering step typically reduces your experimental testing from 20-100 designs to the 5-10 most promising variants, saving weeks of cloning, expression, and purification.

Step 3: Validate stability and solubility
# Validate all designs with ThermoMPNN and SoluProt
validated_designs = []

for seq in design_result.sequences:
    # Predict stability effect of mutations
    stability = client.design.stability(
        pdb_string=fold_result.pdb_string,
        mutant_sequence=seq.sequence,
        wild_type_sequence=gfp_sequence,
    )

    # Predict solubility
    solubility = client.design.solubility(sequence=seq.sequence)

    validated_designs.append({
        "sequence": seq.sequence,
        "design_score": seq.score,
        "ddg": stability.ddg,
        "solubility": solubility.score,
    })

    print(f"Design: DDG={stability.ddg:+.2f} kcal/mol, "
          f"Solubility={solubility.score:.2f}")

# Rank by stability (most negative DDG first)
validated_designs.sort(key=lambda x: x["ddg"])
print("\nTop 5 most stable designs:")
for i, d in enumerate(validated_designs[:5]):
    print(f"  {i+1}. DDG={d['ddg']:+.2f}, Solubility={d['solubility']:.2f}")
Note
A DDG of -1.0 kcal/mol corresponds to roughly a 5-fold improvement in folding equilibrium at room temperature. A DDG of -2.0 kcal/mol is roughly 30-fold. Even modest stability improvements (DDG of -0.5 to -1.0) can significantly improve shelf life, expression yield, and thermal tolerance for industrial enzymes and therapeutic proteins.
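
The conversion from DDG to a fold change in the folding equilibrium follows from the Boltzmann relation, fold change = exp(-DDG / RT). A minimal sketch, assuming a two-state folding model and 25 degrees Celsius (298.15 K); the function name is illustrative:

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def stability_fold_change(ddg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Ratio K_mutant / K_wild_type of folding equilibrium constants.

    Negative DDG (stabilizing) gives a ratio above 1.
    """
    return math.exp(-ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


for ddg in (-0.5, -1.0, -2.0, 1.0):
    print(f"DDG={ddg:+.1f} kcal/mol -> {stability_fold_change(ddg):.1f}x")
```

Because the relationship is exponential, doubling the DDG squares the fold change rather than doubling it.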

Case Study: Engineering a Thermostable GFP

Green fluorescent protein (GFP, from Aequorea victoria, PDB: 1EMA) is one of the most widely used reporter proteins in biology. Wild-type GFP has a melting temperature (Tm) of approximately 78 degrees Celsius, which is already quite stable. However, many applications – biosensors in harsh environments, fusion tags for thermostable enzymes, in vivo imaging at elevated temperatures – benefit from even higher thermostability. The enhanced variant superfolder GFP (sfGFP) achieves Tm of approximately 86 degrees Celsius through 11 mutations identified by directed evolution.

Can AI rational design recapitulate these improvements? Let's run the full pipeline. We start from the wild-type GFP sequence, fold it with ESMFold, design 50 variants with ProteinMPNN at low temperature, and filter by ThermoMPNN stability predictions. We then compare the computationally identified stabilizing mutations to the known sfGFP mutations.

Full GFP stability engineering pipeline
# Full pipeline: GFP stability engineering
fold = client.proteins.fold(sequence=gfp_sequence)

# Generate 50 variants at low temperature for stability
designs = client.design.proteinmpnn(
    pdb_string=fold.pdb_string,
    num_sequences=50,
    temperature=0.12,
    fixed_positions=[65, 66, 67],  # Chromophore
)

# Score all variants
results = []
for seq in designs.sequences:
    stab = client.design.stability(
        pdb_string=fold.pdb_string,
        mutant_sequence=seq.sequence,
        wild_type_sequence=gfp_sequence,
    )
    sol = client.design.solubility(sequence=seq.sequence)

    if stab.ddg < -0.5 and sol.score > 0.6:
        results.append({
            "sequence": seq.sequence,
            "ddg": stab.ddg,
            "solubility": sol.score,
            "mutations": stab.mutations,
        })

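results.sort(key=lambda x: x["ddg"])
print(f"Found {len(results)} stabilized, soluble variants from 50 designs")
if results:  # Guard against the case where no design passes both filters
    top = results[0]
    print(f"\nTop variant: DDG={top['ddg']:+.2f} kcal/mol")
    print(f"Mutations: {', '.join(top['mutations'][:10])}")
else:
    print("No variants passed; consider relaxing the DDG or solubility cutoffs")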

In practice, this pipeline can identify mutations that overlap with the known sfGFP substitutions, such as S30R, F99S, and N105T. The AI also proposes novel mutations not present in sfGFP, some of which may provide additional stabilization. The key point is that the computational pipeline completes in under 5 minutes and costs less than $5 in API calls, compared to months of directed evolution campaigns.
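
One way to make the comparison with sfGFP systematic is to count how often each mutation recurs across the passing variants and split the counts into known and novel. The sketch below assumes each result carries a list of mutation strings in "S30R" style (the exact format returned by the API is an assumption), uses only the three sfGFP substitutions named in this article as the reference set, and runs on made-up results.

```python
from collections import Counter

# Reference set limited to the sfGFP substitutions named in this article;
# the full sfGFP mutation list is longer.
KNOWN_SFGFP = {"S30R", "F99S", "N105T"}


def mutation_frequencies(variants: list[dict]) -> Counter:
    """Count in how many variants each mutation appears."""
    counts = Counter()
    for variant in variants:
        counts.update(set(variant["mutations"]))
    return counts


def split_known_and_novel(counts: Counter, known: set[str]) -> tuple[dict, dict]:
    """Partition mutation counts into those in the reference set and the rest."""
    known_hits = {m: n for m, n in counts.items() if m in known}
    novel = {m: n for m, n in counts.items() if m not in known}
    return known_hits, novel


# Hypothetical pipeline output, for illustration only
demo_results = [
    {"mutations": ["S30R", "K41R"]},
    {"mutations": ["S30R", "N105T"]},
    {"mutations": ["T43S"]},
]

known_hits, novel = split_known_and_novel(mutation_frequencies(demo_results), KNOWN_SFGFP)
print("Overlap with sfGFP:", known_hits)
print("Novel proposals:", novel)
```

Mutations that recur across many independent designs are generally better candidates for experimental follow-up than those that appear once.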

Interpreting DDG Values and Solubility Thresholds

Understanding the quantitative outputs of stability and solubility predictions is essential for making good engineering decisions. ThermoMPNN reports DDG in kcal/mol, the standard unit for protein folding free energy. The relationship between folding free energy and the equilibrium between folded and unfolded states is exponential: DG = -RT ln(K), where K is the folding equilibrium constant ([folded]/[unfolded]), R is the gas constant, and T is the absolute temperature. For a mutation, DDG = DG(mutant) - DG(wild type) = -RT ln(K_mutant / K_wild-type).

  • DDG < -2.0 kcal/mol: Strongly stabilizing. Rare and highly valuable. Often involves core-packing improvements or new salt bridges.
  • DDG -1.0 to -2.0 kcal/mol: Moderately stabilizing. Commonly achieved by improving hydrophobic packing or surface charge optimization.
  • DDG -0.5 to -1.0 kcal/mol: Mildly stabilizing. Useful when combined with other mild mutations (additivity principle).
  • DDG -0.5 to +0.5 kcal/mol: Neutral. The mutation neither helps nor hurts stability significantly.
  • DDG > +1.0 kcal/mol: Destabilizing. Avoid unless the mutation provides a specific functional benefit that justifies the stability cost.

SoluProt predicts solubility on a continuous 0-1 scale trained on experimental data from the PSI Structural Genomics initiative. The model considers amino acid composition, predicted disorder, hydrophobicity distribution, and charge patterns. Proteins with scores above 0.7 express as soluble protein in E. coli BL21(DE3) at 37 degrees Celsius in approximately 80% of cases. Proteins scoring 0.4-0.7 may require lower induction temperatures (16-25 degrees Celsius), co-expression with chaperones, or solubility-enhancing fusion tags (MBP, SUMO, thioredoxin).

When a designed sequence has excellent stability (DDG < -1.0) but poor solubility (score < 0.4), consider surface mutations. Replacing hydrophobic surface residues with charged residues (Lys, Glu, Asp) often improves solubility without affecting core stability. ProteinMPNN can be re-run with the surface positions unfixed and core positions locked, specifically targeting the solubility issue.
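
The thresholds in this section can be collected into a small triage helper. This is a sketch of one possible encoding, not a SciRouter feature: the function names are illustrative, boundary handling at exact cutoffs is a choice made here, and the +0.5 to +1.0 kcal/mol band is not named in the list above, so its label is an assumption.

```python
def classify_ddg(ddg: float) -> str:
    """Label a DDG value (kcal/mol) using the bands described in this section."""
    if ddg < -2.0:
        return "strongly stabilizing"
    if ddg < -1.0:
        return "moderately stabilizing"
    if ddg < -0.5:
        return "mildly stabilizing"
    if ddg <= 0.5:
        return "neutral"
    if ddg <= 1.0:
        return "mildly destabilizing"  # assumption: band not named in the article
    return "destabilizing"


def classify_solubility(score: float) -> str:
    """Label a 0-1 solubility score using the 0.4 and 0.7 cutoffs."""
    if score > 0.7:
        return "likely soluble"
    if score >= 0.4:
        return "borderline"
    return "likely insoluble"


def suggest_next_step(ddg: float, score: float) -> str:
    """Turn the two predictions into a suggested action."""
    if ddg < -1.0 and score < 0.4:
        return "redesign surface positions with core positions fixed"
    if classify_solubility(score) == "likely insoluble":
        return "deprioritize or consider another expression system"
    if ddg > 0.5:
        return "avoid unless the mutation has a functional benefit"
    if classify_solubility(score) == "borderline":
        return "test with lower induction temperature, chaperones, or a fusion tag"
    return "advance to experimental testing"


for ddg, score in [(-1.3, 0.30), (-0.8, 0.55), (-2.4, 0.82), (1.4, 0.75)]:
    print(f"DDG={ddg:+.1f}, solubility={score:.2f}: "
          f"{classify_ddg(ddg)}, {classify_solubility(score)} -> {suggest_next_step(ddg, score)}")
```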

Using the Protein Engineering Lab

The Protein Engineering Lab is SciRouter's web-based interface that runs the complete Fold-Design-Validate pipeline without writing code. It is designed for researchers who want rapid results with a visual interface for interpreting predictions.

Workflow

Enter your protein sequence in the input field. The lab automatically folds the protein with ESMFold, displays the predicted structure with per-residue pLDDT coloring, and identifies well-structured regions suitable for engineering. Select which residues to fix (e.g., active site, disulfides) using the interactive residue selector. Choose the number of designs and sampling temperature, then click "Design Sequences."

The lab runs ProteinMPNN, then automatically validates each design with ThermoMPNN and SoluProt. Results appear as a sortable table with sequence identity, DDG, solubility score, and per-mutation annotations. Click any design to see a structural overlay of the mutations mapped onto the 3D structure. Stabilizing mutations are highlighted in blue, destabilizing in red, and neutral in gray.

You can export results as a CSV for downstream analysis or as FASTA files for gene synthesis ordering. The lab also provides a "round-trip validation" feature that refolds each designed sequence with ESMFold and compares the predicted structure to the target backbone, ensuring the designed sequence actually adopts the intended fold.
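
How the lab scores the round-trip comparison is not specified here, but the idea can be sketched with a distance-matrix RMSD (dRMSD) over C-alpha atoms. dRMSD compares all pairwise CA-CA distances in the two structures, so it needs no superposition and runs on the standard library alone. It assumes both structures have the same number of residues in the same order, and it cannot distinguish a structure from its mirror image.

```python
import math


def ca_coordinates(pdb_string: str) -> list[tuple[float, float, float]]:
    """Extract C-alpha coordinates from fixed-column PDB ATOM records."""
    coords = []
    for line in pdb_string.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def distance_rmsd(coords_a, coords_b) -> float:
    """Root-mean-square difference between the two pairwise distance matrices."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two coordinate lists of equal length >= 2")
    squared_error, pairs = 0.0, 0
    for i in range(len(coords_a)):
        for j in range(i + 1, len(coords_a)):
            diff = math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])
            squared_error += diff * diff
            pairs += 1
    return math.sqrt(squared_error / pairs)


# Toy coordinates: a straight three-residue segment, a shifted copy, and a bent version
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in straight]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]

print(f"straight vs shifted: {distance_rmsd(straight, shifted):.3f} A")
print(f"straight vs bent:    {distance_rmsd(straight, bent):.3f} A")
```

In a scripted pipeline you would call `distance_rmsd(ca_coordinates(target_pdb), ca_coordinates(refolded_pdb))`, where both PDB strings come from your own fold calls; a small value indicates the designed sequence is predicted to adopt the target backbone.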

Advanced Techniques: Multi-Property Optimization

Real-world protein engineering often requires optimizing multiple properties simultaneously. An industrial enzyme might need higher thermostability AND higher catalytic activity AND better expression. A therapeutic protein might need improved stability AND reduced immunogenicity AND maintained binding affinity. These objectives can conflict – mutations that improve stability often reduce activity, and mutations that improve solubility can disrupt binding interfaces.

The approach is iterative. First, use ProteinMPNN to generate a diverse set of designs. Filter by stability (DDG < -0.5) and solubility (score > 0.6). For the survivors, use ESMFold round-trip validation to confirm they fold correctly. Then apply application-specific filters: for enzymes, check that catalytic residues maintain their geometry; for therapeutics, predict T-cell epitopes to flag immunogenic sequences; for binding proteins, dock the designed variant against the target.

Multi-property optimization pipeline
# Multi-property optimization: stability + solubility + fold quality
final_candidates = []

for design in validated_designs:
    # Round-trip fold validation
    refold = client.proteins.fold(sequence=design["sequence"])

    # Refold confidence (mean pLDDT) is a rough proxy for fold quality, not a backbone RMSD
    if refold.mean_plddt > 80 and design["ddg"] < -0.5 and design["solubility"] > 0.65:
        final_candidates.append({
            **design,
            "refold_plddt": refold.mean_plddt,
        })

print(f"{len(final_candidates)} candidates pass all three filters")
for c in final_candidates[:3]:
    print(f"  DDG={c['ddg']:+.2f}, Sol={c['solubility']:.2f}, "
          f"pLDDT={c['refold_plddt']:.1f}")

This multi-property approach mimics the decision-making of an experienced protein engineer but runs in minutes rather than weeks. The key insight is that computational pre-screening is cheap (pennies per variant) while experimental testing is expensive ($50-500 per variant for expression, purification, and characterization). By applying multiple computational filters, you ensure that every dollar spent on experimental testing targets the highest-probability variants.
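
Fixed cutoffs on each property work well as a first pass, but when objectives conflict it can help to look at the trade-off directly. One common alternative, sketched here as an option rather than as the pipeline's method, is to keep the Pareto-optimal designs: those for which no other design is at least as good on every property and strictly better on one. The candidate values below are hypothetical, and the dictionary keys mirror those used in the code above.

```python
def dominates(a: dict, b: dict) -> bool:
    """True if design a is at least as good as b everywhere and better somewhere.

    Lower ddg is better; higher solubility and refold_plddt are better.
    """
    no_worse = (a["ddg"] <= b["ddg"]
                and a["solubility"] >= b["solubility"]
                and a["refold_plddt"] >= b["refold_plddt"])
    strictly_better = (a["ddg"] < b["ddg"]
                       or a["solubility"] > b["solubility"]
                       or a["refold_plddt"] > b["refold_plddt"])
    return no_worse and strictly_better


def pareto_front(designs: list[dict]) -> list[dict]:
    """Return the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(other, d) for other in designs)]


# Hypothetical candidates, for illustration only
candidates = [
    {"name": "A", "ddg": -1.2, "solubility": 0.70, "refold_plddt": 88.0},
    {"name": "B", "ddg": -0.8, "solubility": 0.85, "refold_plddt": 90.0},
    {"name": "C", "ddg": -0.7, "solubility": 0.68, "refold_plddt": 85.0},
    {"name": "D", "ddg": -1.5, "solubility": 0.66, "refold_plddt": 82.0},
]

front = pareto_front(candidates)
print("Pareto-optimal designs:", [d["name"] for d in front])
```

Design C drops out because A beats it on all three properties, while A, B, and D each win on at least one axis and so represent genuinely different trade-offs worth testing.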

Common Pitfalls and Best Practices

Computational protein engineering is powerful but not infallible. Several common pitfalls can lead to wasted experimental effort if not addressed. First, designing in disordered regions (pLDDT < 50) is unreliable because the structural context is uncertain. Focus design on well-predicted regions and leave disordered loops and termini as wild-type unless you have specific reasons to modify them.

Second, stability predictions assume the protein adopts a single well-folded conformation. For proteins that undergo large conformational changes (e.g., kinases switching between active and inactive states, or transporters with multiple conformations), stability predictions may not capture the full picture. In these cases, consider running ProteinMPNN on multiple conformational states and selecting mutations that score well across all states.

Third, the additivity assumption – that the effect of combining two mutations equals the sum of their individual effects – is often violated in practice. Mutations near each other in the structure can interact epistatically, producing effects that are more or less than predicted. When combining multiple mutations, always validate the combined sequence with a round-trip fold and stability prediction, rather than simply summing individual DDG values.
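
The size of the deviation from additivity can be quantified as the difference between the predicted DDG of the combined variant and the sum of the single-mutation DDGs. A minimal sketch with hypothetical numbers; the function name and the 0.3 kcal/mol flagging tolerance are illustrative choices, not established standards.

```python
def epistasis(ddg_combined: float, single_ddgs: list[float]) -> float:
    """Combined DDG minus the additive expectation (kcal/mol).

    Zero means perfectly additive; a positive value means the combination
    is less stabilizing than the sum of its parts.
    """
    return ddg_combined - sum(single_ddgs)


def is_non_additive(ddg_combined: float, single_ddgs: list[float],
                    tolerance: float = 0.3) -> bool:
    """Flag combinations whose deviation exceeds the chosen tolerance."""
    return abs(epistasis(ddg_combined, single_ddgs)) > tolerance


# Hypothetical predictions for two neighboring mutations
singles = [-0.8, -0.6]   # additive expectation: -1.4 kcal/mol
combined = -0.9          # prediction for the double mutant

print(f"Additive expectation: {sum(singles):+.2f} kcal/mol")
print(f"Epistasis: {epistasis(combined, singles):+.2f} kcal/mol")
print(f"Flag for re-validation: {is_non_additive(combined, singles)}")
```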

  • Do: Fix all functionally important residues (catalytic, binding, disulfide, chromophore-forming)
  • Do: Use low temperature (0.1–0.2) for stability-focused design, higher (0.3–0.5) for diversity
  • Do: Validate with round-trip folding to confirm the designed sequence adopts the target structure
  • Don't: Design in regions with pLDDT < 50 – structural context is unreliable
  • Don't: Assume DDG values are additive for multiple mutations near each other
  • Don't: Skip solubility prediction – a stable protein that forms inclusion bodies is useless

Protein engineering with AI is not about replacing experimental science – it is about making experiments dramatically more efficient. The Fold-Design-Validate pipeline reduces the search space from the astronomical size of sequence space (20^N possibilities for an N-residue protein) to a manageable set of high-probability candidates. Combined with experimental testing, this approach consistently produces improved variants faster and cheaper than any other method available today.

Frequently Asked Questions

What is the difference between directed evolution and rational protein design?

Directed evolution creates random mutations and screens millions of variants experimentally to find improved proteins. It requires no structural knowledge but is slow and expensive. Rational design uses computational models to predict which specific mutations will improve a property, reducing the experimental space from millions to dozens. AI tools like ProteinMPNN and ThermoMPNN enable rational design without deep biophysics expertise.

What does a DDG value mean in protein stability prediction?

DDG (delta-delta-G) measures the change in folding free energy caused by a mutation. Negative DDG values indicate stabilizing mutations (the mutant folds more favorably than wild type). Positive DDG values indicate destabilizing mutations. Values below -1.0 kcal/mol are considered significantly stabilizing, while values above +1.0 kcal/mol are significantly destabilizing. The threshold varies by application, but most engineers target DDG < -0.5 kcal/mol.

How accurate is ThermoMPNN for stability prediction?

ThermoMPNN achieves a Pearson correlation of approximately 0.75 with experimental DDG values on its held-out test set; correlations on independent benchmarks such as S669 are lower, so treat predictions as a ranking aid rather than exact values. It is more accurate than physics-based methods like FoldX for single-point mutations and runs orders of magnitude faster. However, it can struggle with mutations in highly flexible regions and multi-mutation effects that are not additive.

What solubility score threshold should I use?

SoluProt predicts solubility on a 0-1 scale where higher values indicate greater likelihood of soluble expression in E. coli. Scores above 0.7 generally indicate good solubility. Scores between 0.4-0.7 are borderline and may require optimization of expression conditions. Scores below 0.4 suggest the protein is likely to form inclusion bodies and may need solubility-enhancing mutations or alternative expression systems.

Can I use ProteinMPNN for enzyme design?

Yes, ProteinMPNN is widely used for enzyme design. The key technique is to fix the catalytic residues (the active site) while redesigning the surrounding scaffold for stability. Use the fixed_positions parameter to lock down catalytic residues, then let ProteinMPNN optimize the rest of the sequence. Validate with ThermoMPNN to ensure the mutations do not destabilize the active site geometry.

How many variants should I test experimentally after computational design?

For a typical protein engineering campaign, design 50-100 sequences with ProteinMPNN, filter to the top 10-20 by stability and solubility predictions, and test those experimentally. This 5:1 computational-to-experimental ratio is cost-effective and typically yields 3-5 improved variants. For high-value targets (e.g., therapeutic proteins), testing 30-50 variants is common.
