
Peptide Design with ESMFold and AI: A Practical Guide

Why short peptides break AlphaFold's MSA assumption — and how ESMFold, hydrophobicity tuning, and mutation ranking let you design novel sequences in a single browser tab.

SciRouter Team
April 10, 2026
14 min read

Peptide design sits in a strange gap in computational biology. The big structure-prediction breakthroughs of the last few years (AlphaFold, RoseTTAFold, and their descendants) were optimized for full-length proteins and domains. Short peptides are a different kind of target: they often have few or no known homologs, are poorly conserved across species, and are frequently disordered in isolation. The tools that dominate the long-protein world are not always the right match.

This guide explains why short peptides are a different design problem, how ESMFold addresses the short-sequence gap, and how to put together a practical fold-property-mutate-iterate workflow that runs in a browser.

The MSA problem

Most pre-ESMFold structure predictors depend heavily on multiple sequence alignments. Given a target sequence, the first step is usually to search public protein databases for homologs, stack them into an alignment, and feed the evolutionary co-variation signal into the model. Residues that change together across evolutionary time often contact each other in 3D — that is the signal the model exploits to predict structure.

This is brilliant when you have deep alignments. It falls apart when you do not. And peptides — particularly engineered peptides, bioactive fragments, or short signaling motifs — frequently do not. Some of the reasons:

  • Peptides are short. A 15-residue sequence is too short to return meaningful BLAST hits in many cases.
  • Peptides are not conserved like domains. Functional peptides often tolerate substitution at many positions, diluting the evolutionary co-variation signal.
  • Designed peptides have no homologs by definition. If you built the peptide yesterday, there is no evolutionary record of it anywhere.

All of this is why classical MSA-dependent predictors often return low-confidence structures on short, standalone peptides.
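
To make the co-variation idea concrete, here is a minimal sketch that scores how strongly two alignment columns change together using mutual information. The alignment is a made-up toy, and the function name is illustrative; real predictors use far richer statistics than this.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment, invented for illustration: columns 1 and 4 (0-based)
# co-vary (K pairs with E, R pairs with D); the other columns are constant.
toy_msa = [
    "AKLGEV",
    "ARLGDV",
    "AKLGEV",
    "ARLGDV",
]

covarying = column_mutual_information(toy_msa, 1, 4)
constant = column_mutual_information(toy_msa, 0, 2)
# An "alignment" of one sequence, which is all a novel peptide gives you.
single = column_mutual_information(toy_msa[:1], 1, 4)
print(covarying, constant, single)
```

With only one sequence every column is constant, so the score is zero for every pair of positions. That is the situation a freshly designed peptide puts an MSA-dependent model in.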

How ESMFold is different

ESMFold is built on top of a large protein language model (the ESM-2 family). Instead of relying on multiple sequence alignments at inference time, it uses learned representations from the language model to predict structure directly from a single sequence. No alignment step, no homolog search, just sequence in, structure out.

That design has two practical consequences for peptide work:

  • It works on sequences without MSAs. Novel, engineered, and short peptides are fair game.
  • It is fast. Without the MSA pipeline, end-to-end prediction is measured in seconds to low tens of seconds on a single GPU for short inputs. That speed unlocks iterative design.

ESMFold is not perfect. For long, heavily multimeric systems, MSA-based methods still have the edge. For short, designed, or peptide-scale targets, ESMFold is frequently the better first tool.

Reading pLDDT for short peptides

Every ESMFold prediction comes with a pLDDT score — a per-residue confidence metric from 0 to 100, averaged to give a mean over the full sequence. It is the first thing you should look at on any new structure.

  • Mean pLDDT above 85. High confidence. The predicted backbone is probably close to the true structure. For peptides, this typically means the sequence has a well-defined secondary structure element.
  • Mean 70-85. Reasonable confidence. Treat the backbone shape as indicative rather than definitive. Side-chain details may be approximate.
  • Mean 50-70. Partial or conditional folding. The peptide may adopt a structure only in specific contexts (membrane, binding partner) or may populate multiple conformations.
  • Mean below 50. Likely disordered or unstable. ESMFold is essentially telling you it cannot commit to a single conformation. That is information — many bioactive peptides are intrinsically disordered until they engage a partner.
Note
Low pLDDT is not a failure. For peptides like BPC-157 (proline-rich and likely disordered), low pLDDT is the correct answer. For peptides like LL-37, which is helical in a lipid context, high pLDDT signals that your predicted helix is grounded. Context matters.
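
If you are screening many sequences, it can help to encode the bands above as a small helper. This is a sketch only: the function name is illustrative, and the handling of exact boundary values (85 and 70 fall into the lower band, 50 into the "partial" band) is an assumption, since the bands above do not specify it.

```python
def interpret_mean_plddt(mean_plddt):
    """Map a mean pLDDT (0 to 100) to the confidence bands used in this guide."""
    if not 0 <= mean_plddt <= 100:
        raise ValueError("mean pLDDT is expected on a 0 to 100 scale")
    if mean_plddt > 85:
        return "high confidence"
    if mean_plddt >= 70:
        return "reasonable confidence"
    if mean_plddt >= 50:
        return "partial or conditional folding"
    return "likely disordered or unstable"


for value in (91.2, 74.0, 58.3, 41.7):
    print(f"{value:5.1f} -> {interpret_mean_plddt(value)}")
```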

A practical design workflow

Here is a simple, repeatable workflow that Peptide Lab is built around:

1. Start with a seed sequence

Pick a known peptide (BPC-157, LL-37, GHK-Cu, Semax) or paste in a candidate of your own. The seed gives you a baseline structure and a baseline property profile to compare against.

2. Predict structure with ESMFold

Run the fold call. Inspect the mean pLDDT and the per-residue confidence. Is the backbone credible? Where are the low-confidence regions? That is your first real piece of structural information.
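
To locate the low-confidence regions, one option is to read per-residue confidence straight out of the returned PDB text. The sketch below assumes the file follows the common convention of storing pLDDT in the B-factor column of each ATOM record, on a 0 to 100 scale. Some tools write it on a 0 to 1 scale instead, so check before applying the threshold. The function names are illustrative and not part of any SciRouter client.

```python
def per_residue_plddt(pdb_text):
    """Return [(residue_number, residue_name, plddt), ...] from CA atoms.

    Uses the fixed-column PDB layout: atom name in columns 13-16,
    residue name in 18-20, residue number in 23-26, B-factor in 61-66.
    """
    residues = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            number = int(line[22:26])
            name = line[17:20].strip()
            plddt = float(line[60:66])  # assumption: B-factor holds pLDDT
            residues.append((number, name, plddt))
    return residues


def low_confidence_segments(residues, threshold=50.0):
    """Group runs of adjacent residues below the threshold into (start, end)."""
    segments = []
    start = prev = None
    for number, _name, plddt in residues:
        if plddt < threshold:
            if start is None:
                start = number
            prev = number
        elif start is not None:
            segments.append((start, prev))
            start = None
    if start is not None:
        segments.append((start, prev))
    return segments
```

Given the PDB text of a prediction, `low_confidence_segments(per_residue_plddt(pdb_text))` returns the residue ranges where the model is least willing to commit.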

3. Evaluate physicochemical properties

Independent of 3D structure, peptides live or die by their bulk properties: net charge, hydrophobicity, helix propensity, aggregation risk. Compute these on the sequence directly — they are cheap and informative. Peptide Lab shows a 5-axis property radar covering solubility, stability, permeability, hemolysis risk, and structural confidence.
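
Two of these properties are simple enough to compute by hand. The sketch below is not how Peptide Lab scores its radar axes; it is a rough illustration that uses the Kyte-Doolittle hydropathy scale for mean hydrophobicity (GRAVY) and a crude residue count for net charge near neutral pH, ignoring histidine, the termini, and pKa shifts.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def validate(sequence):
    """Reject empty input and anything outside the 20 standard residues."""
    unknown = set(sequence) - set(KYTE_DOOLITTLE)
    if not sequence or unknown:
        raise ValueError(f"unsupported sequence, unknown residues: {sorted(unknown)}")


def gravy(sequence):
    """Mean Kyte-Doolittle hydropathy; more positive means more hydrophobic."""
    validate(sequence)
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def net_charge(sequence):
    """Crude net charge near pH 7: +1 per K/R, -1 per D/E (an approximation)."""
    validate(sequence)
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative


bpc157 = "GEPPPGKPADDAGLV"
print(f"GRAVY: {gravy(bpc157):.2f}, net charge: {net_charge(bpc157):+d}")
```

For anything beyond a first pass, a pKa-based charge model and a dedicated aggregation predictor are better choices than these counts.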

4. Propose mutations

Pick a target: increase solubility, reduce hydrophobic surface, shift net charge, preserve helix register. A simple mutation scanner ranks single-residue substitutions by predicted improvement in the target metric.
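
A simple scanner of this kind can be sketched in a few lines. The version below is an illustration, not Peptide Lab's scanner: it enumerates every single-residue substitution and ranks the variants by the change in a scoring function, using mean Kyte-Doolittle hydropathy as a crude stand-in for a solubility target. Swap in whatever metric you actually care about.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Mean hydropathy; lower values are used here as a rough solubility proxy."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def rank_substitutions(sequence, score=gravy, maximize=False, top=5):
    """Rank all single-residue substitutions by change in `score`.

    Returns up to `top` tuples of (delta, label, mutant), best first.
    Labels use 1-based positions, e.g. "V15R".
    """
    baseline = score(sequence)
    candidates = []
    for pos, original in enumerate(sequence):
        for new in KYTE_DOOLITTLE:
            if new == original:
                continue
            mutant = sequence[:pos] + new + sequence[pos + 1:]
            delta = score(mutant) - baseline
            candidates.append((delta, f"{original}{pos + 1}{new}", mutant))
    candidates.sort(key=lambda c: c[0], reverse=maximize)
    return candidates[:top]


for delta, label, mutant in rank_substitutions("GEPPPGKPADDAGLV"):
    print(f"{label:>5}  delta GRAVY {delta:+.3f}  {mutant}")
```

A ranking like this knows nothing about structure or activity, which is why each accepted substitution goes back through the fold step.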

5. Re-predict and iterate

Apply the best candidate mutation, re-run ESMFold, recompute properties, and compare. Did the structure survive the substitution? Did the property shift? If yes, accept and keep going. If no, back out and try the next candidate. Rinse and repeat.

That loop is exactly what Peptide Lab implements in-browser, with every step cached so repeat visits to the same sequence are instant.
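
If you want to script the accept-or-back-out logic from step 5 yourself, a greedy version might look like the sketch below. Everything here is an assumption for illustration: `propose`, `property_score`, and `fold_confidence` are hypothetical callbacks you would supply (for example, a mutation scanner, a hydrophobicity score, and a function that returns the mean pLDDT from a fold call), lower property scores are treated as better, and the tolerated confidence drop is an arbitrary default.

```python
def design_loop(seed, propose, property_score, fold_confidence,
                rounds=3, max_confidence_drop=5.0):
    """Greedy design loop: accept the first candidate that improves the
    property without losing too much structural confidence.

    Returns the history of accepted sequences as
    [(sequence, property_score, confidence), ...], seed first.
    """
    current = seed
    current_conf = fold_confidence(current)
    history = [(current, property_score(current), current_conf)]
    for _ in range(rounds):
        accepted = False
        for candidate in propose(current):
            conf = fold_confidence(candidate)  # in practice: a fold call
            improved = property_score(candidate) < property_score(current)
            survived = conf >= current_conf - max_confidence_drop
            if improved and survived:
                current, current_conf = candidate, conf
                history.append((current, property_score(current), conf))
                accepted = True
                break
            # Otherwise back out and try the next candidate.
        if not accepted:
            break  # no acceptable candidate left; stop early
    return history
```

Because each candidate costs one fold call, caching confidence values by sequence keeps repeated rounds cheap.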

Calling ESMFold through the SciRouter API

If you want to scale this workflow beyond the browser, you can call ESMFold directly through SciRouter's protein endpoint. Here is a minimal Python example that folds a short peptide:

fold_peptide.py
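import httpx

API_KEY = "sk-sci-your-key-here"  # placeholder: replace with your own key
BASE_URL = "https://scirouter-gateway-production.up.railway.app"

# BPC-157 as an example
sequence = "GEPPPGKPADDAGLV"

# Submit the fold job; the generous timeout leaves room for a slow first response.
response = httpx.post(
    f"{BASE_URL}/v1/proteins/fold",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"sequence": sequence},
    timeout=120.0,
)
response.raise_for_status()  # fail loudly on HTTP errors instead of a KeyError below

result = response.json()
job_id = result["job_id"]
print(f"Submitted fold job: {job_id}")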

Because the gateway runs fold jobs asynchronously, which matters most for long sequences, you typically submit the job and then poll for the result:

poll_result.py
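import time

# Continues from fold_peptide.py: reuses httpx, BASE_URL, API_KEY, and job_id.
deadline = time.monotonic() + 120  # give up after two minutes

while True:
    poll = httpx.get(
        f"{BASE_URL}/v1/proteins/fold/{job_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30.0,
    )
    poll.raise_for_status()  # surface HTTP errors instead of a KeyError below
    data = poll.json()
    if data["status"] == "completed":
        structure_pdb = data["result"]["pdb"]
        mean_plddt = data["result"]["mean_plddt"]
        print(f"Mean pLDDT: {mean_plddt:.1f}")
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"Fold job {job_id} did not complete in time")
    time.sleep(2)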

For short peptides like BPC-157 (15 residues) or GHK-Cu (3 residues), the inference time is typically under 10 seconds once the worker is warm. For longer peptides (LL-37 at 37 residues, tirzepatide at 39 residues) it is still usually well under a minute. That is fast enough to put a fold call inside a design loop.

Where structure prediction is not enough

A pragmatic caveat: ESMFold predicts static backbone structures. It does not tell you:

  • How a peptide binds a receptor. That is a docking or co-folding problem — tools like Chai-1 or DiffDock are better suited.
  • How the peptide behaves in a membrane environment. Amphipathic AMPs like LL-37 adopt different conformations in solution versus at a lipid interface.
  • Stability to proteases. Structure does not directly predict half-life; you need sequence-level models for protease susceptibility or empirical measurement.
  • Activity. A well-folded peptide can be inert. A disordered peptide can be potent. Structure confidence is not the same as bioactivity.

A good design workflow layers complementary methods rather than relying on any single model.

Start in Peptide Lab

The easiest way to try this workflow is to use SciRouter's peptide playground. There is nothing to install.

Pick a seed, edit a sequence, fold it, inspect the properties, mutate, re-fold, save. That loop — sequence in, structure out, property out, next sequence in — is the core of modern peptide design, and you can run it without leaving your browser tab.

Open Peptide Lab →

Frequently Asked Questions

Why is folding short peptides harder than folding full-length proteins?

Most modern folding models (AlphaFold, RoseTTAFold) were trained to exploit multiple sequence alignments — stacks of related sequences from homologous proteins. Short peptides rarely have rich MSAs because they are not conserved across species the way large protein domains are. Without a deep MSA, MSA-hungry models lose their main signal. ESMFold sidesteps this problem by using a protein language model that does not require an MSA at all.

What is ESMFold good at for peptides?

ESMFold performs competitively on short, standalone sequences precisely because it is MSA-free. For peptides that form a well-defined secondary structure element (a helix, a beta hairpin, a stable turn) ESMFold typically returns a confident backbone. For intrinsically disordered peptides it returns a low-confidence prediction, which is itself informative — it tells you the peptide probably does not have a single stable conformation.

How should I read the pLDDT score on a short peptide?

pLDDT is a per-residue confidence metric from 0 to 100. For short peptides, judge the prediction by the mean pLDDT across the whole sequence rather than by any single residue's value, and use the per-residue values only to locate low-confidence regions. Mean above 70 is typically credible for the backbone. Mean between 50 and 70 suggests partial or conditional folding. Mean below 50 generally indicates the peptide is predicted to be disordered.

Can I design new peptides with ESMFold alone?

Not exactly. ESMFold predicts structure from sequence; it does not generate new sequences. The useful design pattern is to pair structure prediction with property prediction and a mutation explorer. You generate or choose a candidate sequence, predict its structure, score its properties, propose substitutions, and iterate. That is the workflow Peptide Lab supports.

Do I need a GPU to run this?

Not for short peptides through SciRouter. The gateway runs ESMFold on managed GPU infrastructure, and peptide workloads are extremely fast. You call the API, you get the structure. For large-scale batch design you may want dedicated GPU capacity; for prototyping the hosted endpoint is plenty.

How do I start in Peptide Lab?

Go to /peptide-lab, pick a seed peptide (BPC-157, LL-37, GHK-Cu, or any of the 30 starters), and open its workspace. Edit the sequence, run 'Predict 3D' to call ESMFold, check the property radar, and use the mutation suggester to rank single-point changes. Save variants and compare them against the wild type.
