
How to Design Your Own mRNA Vaccine: A Complete Guide to Computational Vaccine Engineering

Complete guide to computational mRNA vaccine design: codon optimization, antigen engineering, MHC binding prediction, and hands-on SciRouter API walkthrough with Python SDK examples.

Ryan Bethencourt
April 8, 2026
12 min read

The mRNA Revolution: From Pandemic Response to Precision Medicine

In December 2020, the first COVID-19 mRNA vaccines were authorized for emergency use. They represented the culmination of three decades of research into synthetic mRNA as a therapeutic platform – and the beginning of a revolution in how we think about vaccines. Within a few years, billions of mRNA vaccine doses had been manufactured and administered worldwide. The technology proved that mRNA could encode virtually any protein, be manufactured in weeks rather than months, and produce robust immune responses in humans.

By 2026, the mRNA vaccine platform has expanded far beyond COVID-19. Moderna has mRNA vaccines approved or in clinical trials for influenza (mRNA-1010), RSV (mRNA-1345, now approved), Epstein-Barr virus (mRNA-1189), CMV (mRNA-1647), and several cancers. BioNTech is pursuing personalized cancer vaccines (autogene cevumeran), malaria, tuberculosis, and shingles. CureVac, Arcturus Therapeutics, and a dozen smaller companies are developing mRNA vaccines and therapeutics for everything from rare genetic disorders to autoimmune diseases.

The design principles underlying all of these programs are the same. Every mRNA vaccine begins as a computational design problem: which antigen to encode, how to optimize the mRNA sequence for stability and translation, and how to select the right regulatory elements. This guide walks through each of these steps in detail, with hands-on examples using the SciRouter API, so you can understand – and practice – the computational engineering behind modern mRNA vaccines.

Note
This guide focuses on computational vaccine design – the bioinformatics and sequence engineering that precede manufacturing. It does not cover lipid nanoparticle formulation, GMP manufacturing, or clinical development. Those topics deserve their own guides.

Anatomy of an mRNA Vaccine: The Five Essential Components

An mRNA vaccine is not simply a strand of messenger RNA injected into the body. It is a carefully engineered molecule with five distinct functional regions, each of which must be optimized for the vaccine to work. Understanding these components is the foundation of computational vaccine design.

1. The 5' Cap

The 5' cap is a modified guanosine nucleotide (m7GpppN) added to the beginning of the mRNA. It serves three critical functions: it protects the mRNA from degradation by 5'-to-3' exonucleases, it recruits the eIF4E translation initiation factor to begin protein synthesis, and it distinguishes the mRNA from foreign (uncapped) RNA that would trigger innate immune sensors. Modern mRNA vaccines use Cap1 structures (m7GpppAm or m7GpppGm with 2'-O-methylation on the first nucleotide), which mimic endogenous mRNA caps and minimize innate immune activation. The cap is added enzymatically during manufacturing, but the design phase must specify the cap type and the first transcribed nucleotide.

2. The 5' UTR (Untranslated Region)

The 5' UTR is the sequence between the cap and the start codon (AUG). It controls translation efficiency by affecting ribosome recruitment and scanning. An optimal 5' UTR for vaccine mRNA includes a strong Kozak consensus sequence (GCCACCAUGG) surrounding the start codon, minimal secondary structure (which would impede ribosome scanning), and no upstream AUG codons (uAUGs) that could initiate premature, out-of-frame translation. Many vaccine developers use the 5' UTR from human alpha-globin (HBA1) or beta-globin (HBB) mRNA, which have been optimized by evolution for efficient translation in human cells.
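The two sequence-level checks described above (no upstream AUGs, and a strong Kozak context around the start codon) can be sketched locally in a few lines. The function names are hypothetical, the example UTR is a toy sequence, and the Kozak test is deliberately simplified: it only looks at the two positions usually considered most influential (a purine at -3 and a G at +4).

```python
def find_upstream_augs(five_prime_utr: str) -> list[int]:
    """Return the 0-based position of every AUG found inside the 5' UTR."""
    utr = five_prime_utr.upper().replace("T", "U")
    return [i for i in range(len(utr) - 2) if utr[i:i + 3] == "AUG"]


def has_strong_kozak(mrna: str, start: int) -> bool:
    """Check the Kozak context of the AUG that begins at index `start`.

    Simplified rule (an assumption of this sketch): purine (A/G) at -3
    and G at +4, counting the A of AUG as +1.
    """
    seq = mrna.upper().replace("T", "U")
    if start < 3 or start + 3 >= len(seq) or seq[start:start + 3] != "AUG":
        return False
    return seq[start - 3] in "AG" and seq[start + 3] == "G"


# Toy 5' UTR ending in GCCACC, followed by the first two codons of a CDS
utr = "GGGAGACCCAAGCUGGCUAGCGCCACC"
mrna = utr + "AUGGCC"

print("Upstream AUGs:", find_upstream_augs(utr))
print("Strong Kozak context:", has_strong_kozak(mrna, len(utr)))
```

A real screen would also estimate secondary structure near the start codon, which needs an RNA folding tool and is outside this sketch.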

3. The Coding Sequence (CDS)

The CDS encodes the antigen protein. This is where codon optimization happens – the largest and most computationally intensive part of mRNA vaccine design. The CDS must encode the correct protein while simultaneously optimizing for translation efficiency, mRNA stability, and avoidance of immune-stimulatory sequences. A typical cancer vaccine CDS encodes multiple epitopes connected by cleavable linker sequences. A typical infectious disease vaccine CDS encodes a full-length viral protein, often with stabilizing mutations (like the K986P/V987P proline substitutions in the SARS-CoV-2 spike protein that lock it in the prefusion conformation).

4. The 3' UTR

The 3' UTR follows the stop codon and affects mRNA stability and localization. It contains regulatory elements that control mRNA half-life in the cytoplasm. A widely used 3' UTR in mRNA vaccines is derived from human beta-globin (HBB), which confers high stability and robust protein expression. Some designs use tandem 3' UTRs – two copies of an optimized UTR sequence – for enhanced stability. BioNTech's BNT162b2 (Comirnaty) uses a 3' UTR that combines a segment from the amino-terminal enhancer of split (AES) mRNA with a segment from the mitochondrial 12S rRNA, a combination that produced superior expression in their screening assays.

5. The Poly-A Tail

The poly-A tail is a stretch of adenosine residues at the 3' end of the mRNA. It protects against 3'-to-5' exonuclease degradation and promotes translation by interacting with poly-A binding proteins (PABPs). Longer tails generally produce more protein, but with diminishing returns above 100–120 nucleotides. BNT162b2 uses a segmented poly-A tail of approximately 100 residues with a 10-nucleotide linker interruption, encoded in the DNA template to ensure consistent length during manufacturing. Moderna's approach uses enzymatic polyadenylation, which produces heterogeneous tail lengths. For computational design, specifying a poly-A length of 100–120 nucleotides is standard.
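To make the five-part anatomy concrete, here is a minimal sketch that joins the transcribed regions (5' UTR, CDS, 3' UTR, poly-A tail) into one RNA string after a few sanity checks. The cap is left out because it is added enzymatically. All sequences are toy placeholders, the function names are hypothetical, and both the linker sequence and the 30 + 70 split of the segmented tail are assumptions made for illustration.

```python
RNA_ALPHABET = set("ACGU")
STOP_CODONS = {"UAA", "UAG", "UGA"}


def segmented_poly_a(first: int = 30, second: int = 70,
                     linker: str = "GCUAGCGGAU") -> str:
    """Poly-A tail interrupted by a short linker (arbitrary placeholder linker)."""
    return "A" * first + linker + "A" * second


def assemble_mrna(five_utr: str, cds: str, three_utr: str, tail: str) -> str:
    """Concatenate the transcribed regions after basic sanity checks."""
    parts = {"5' UTR": five_utr, "CDS": cds, "3' UTR": three_utr, "tail": tail}
    for name, seq in parts.items():
        if not seq or set(seq) - RNA_ALPHABET:
            raise ValueError(f"{name} must be a non-empty RNA string (A/C/G/U)")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    if not cds.startswith("AUG") or cds[-3:] not in STOP_CODONS:
        raise ValueError("CDS must start with AUG and end with a stop codon")
    return five_utr + cds + three_utr + tail


tail = segmented_poly_a()
mrna = assemble_mrna("GGGAGACCCGCCACC", "AUGGCCUUCUAA", "GCUGCCUUCUGC", tail)
print(f"Construct length: {len(mrna)} nt (tail: {len(tail)} nt)")
```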

Codon Optimization: The Heart of mRNA Engineering

Codon optimization is the process of replacing codons in the mRNA sequence with synonymous codons (those encoding the same amino acid) that are better suited for expression in human cells. Because the genetic code is degenerate – most amino acids are encoded by two to six different codons – there are an astronomically large number of possible mRNA sequences that encode the same protein. For a 300-amino-acid protein, the number of possible codon combinations exceeds 10^150. Choosing the right combination is a multi-objective optimization problem.
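The size of this search space is easy to estimate: multiply the number of synonymous codons available at each position. The sketch below builds the standard genetic code table and returns the result as a base-10 exponent so the number stays printable. The function name is hypothetical, and the exact exponent depends on the amino acid composition of the protein.

```python
import math
from collections import Counter

# Standard genetic code, codons ordered by base U, C, A, G at each position
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of codons per amino acid, e.g. L -> 6, M -> 1
DEGENERACY = Counter(CODON_TABLE.values())


def log10_synonymous_sequences(protein: str) -> float:
    """log10 of the number of distinct coding sequences for `protein`."""
    total = 0.0
    for aa in protein.upper():
        if aa == "*" or aa not in DEGENERACY:
            raise ValueError(f"Unsupported residue: {aa!r}")
        total += math.log10(DEGENERACY[aa])
    return total


peptide = "MFVFLVLLPLVSSQ"  # first residues of the spike signal peptide
exponent = log10_synonymous_sequences(peptide)
print(f"{len(peptide)} residues -> about 10^{exponent:.1f} possible coding sequences")
```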

Codon Adaptation Index (CAI)

The codon adaptation index measures how closely the codon usage in an mRNA matches the preferred codon usage of the host organism. Human cells do not use all synonymous codons equally – some are translated faster because their corresponding tRNAs are more abundant. For example, the codon GCC (alanine) has a relative adaptiveness of 1.0 in human cells, while GCG (also alanine) has a relative adaptiveness of only 0.36. Using GCC instead of GCG at every alanine position increases translation speed.

A CAI of 1.0 means every codon is the most preferred synonym. In practice, CAI values above 0.80 are considered well-optimized for human expression. BNT162b2 has an estimated CAI of 0.96. However, maximizing CAI is not always optimal – ribosome traffic jams can occur when many consecutive codons demand the same tRNA, actually slowing translation. Modern optimization algorithms balance CAI with other objectives.
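CAI is conventionally computed as the geometric mean of the relative adaptiveness values of the codons in a sequence. The sketch below shows that calculation. The weight table is a toy one containing only the two alanine values quoted above; a real calculation needs a complete human codon usage table, and codons missing from the table are simply skipped here, which is a simplification of this sketch.

```python
import math


def codon_adaptation_index(cds: str, weights: dict[str, float]) -> float:
    """Geometric mean of the relative adaptiveness of each scored codon."""
    seq = cds.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("No codon in the sequence has a weight")
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


# Toy table: only the two alanine codons mentioned in the text
ALA_WEIGHTS = {"GCC": 1.0, "GCG": 0.36}

print(f"All GCC: {codon_adaptation_index('GCCGCCGCC', ALA_WEIGHTS):.2f}")
print(f"Mixed:   {codon_adaptation_index('GCCGCG', ALA_WEIGHTS):.2f}")
```

Because the mean is geometric, a single very rare codon pulls the score down more than an arithmetic average would.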

GC Content

GC content – the fraction of guanosine and cytosine nucleotides in the mRNA – affects both stability and immune activation. GC-rich mRNA forms more stable secondary structures, increasing half-life. GC-poor (AU-rich) mRNA is recognized by innate immune sensors and degraded more rapidly. A commonly cited target range for therapeutic mRNA is 45–55% GC content, although approved products can sit slightly above it. The wild-type SARS-CoV-2 spike coding sequence has approximately 36% GC content. After codon optimization, BNT162b2's spike CDS has approximately 57% GC content – a massive shift achieved entirely through synonymous codon substitution.

The relationship between GC content and mRNA stability is not linear. Below 40% GC, mRNA half-life drops sharply. Between 45% and 55%, stability is generally high. Above 60%, the mRNA can form structures so stable that ribosome scanning is impeded, reducing translation efficiency. The optimization sweet spot balances stability and translatability.
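Overall GC content can hide local extremes, so it is worth checking a sliding window as well as the whole-sequence value. A minimal sketch follows; the function names, window size, and step are arbitrary choices for illustration, not recommended settings.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C nucleotides in a DNA or RNA string."""
    s = seq.upper()
    if not s:
        raise ValueError("Sequence is empty")
    return (s.count("G") + s.count("C")) / len(s)


def windowed_gc(seq: str, window: int = 60, step: int = 30) -> list[tuple[int, float]]:
    """GC fraction of each window, returned as (start_index, gc_fraction) pairs."""
    last_start = max(len(seq) - window, 0)
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, last_start + 1, step)
    ]


# Toy sequence: a GC-rich half followed by an AU-rich half
toy = "GC" * 30 + "AU" * 30
print(f"Overall GC: {gc_fraction(toy):.0%}")
for start, gc in windowed_gc(toy):
    flag = "" if 0.45 <= gc <= 0.55 else "  <- outside 45-55%"
    print(f"  window at {start:3d}: {gc:.0%}{flag}")
```

In the toy sequence the overall value looks ideal even though the first and last windows fall far outside the target range, which is exactly the case the windowed check is meant to catch.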

Uridine Depletion

Uridine-rich sequences are potent activators of innate immune sensors, particularly TLR7 and TLR8 in endosomes and RIG-I in the cytoplasm. Even when modified nucleosides like N1-methylpseudouridine (m1Ψ) are used to replace uridine (as in both Moderna's and BioNTech's COVID vaccines), reducing the total uridine content through codon optimization provides an additional layer of immune evasion. This is achieved by favoring codons with G or C in the third (wobble) position over those with U. For example, for phenylalanine, UUC is preferred over UUU because it has one fewer uridine while encoding the same amino acid.
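As an illustration of uridine depletion taken to its extreme, the sketch below back-translates a protein using, for every residue, the synonymous codon with the fewest uridines. It is a deliberately naive single-objective example with hypothetical function names: it ignores CAI, GC content, and structure, and the tie-breaking rule (prefer G or C at the wobble position, then alphabetical order) is an arbitrary choice.

```python
# Standard genetic code, codons ordered by base U, C, A, G at each position
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def fewest_u_codon(amino_acid: str) -> str:
    """Synonymous codon with the fewest uridines for one amino acid."""
    candidates = [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]
    if not candidates:
        raise ValueError(f"Unknown amino acid: {amino_acid!r}")
    # Sort key: uridine count, then G/C wobble preferred, then alphabetical
    return min(candidates, key=lambda c: (c.count("U"), c[2] not in "GC", c))


def u_depleted_cds(protein: str) -> str:
    """Back-translate a protein using the lowest-uridine codon at every position."""
    return "".join(fewest_u_codon(aa) for aa in protein.upper())


cds = u_depleted_cds("MFLS")
print(cds, f"(uridine fraction: {cds.count('U') / len(cds):.0%})")
```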

Putting It Together: Multi-Objective Codon Optimization

Real-world codon optimization must balance all of these objectives simultaneously: maximize CAI (translation efficiency), target 45–55% GC content (stability), minimize uridine (immune evasion), avoid mRNA secondary structures near the start codon (ribosome loading), eliminate cryptic splice sites, avoid consecutive rare codons, and avoid internal polyadenylation signals (AAUAAA) that could cause premature termination. No single codon choice optimizes all objectives. The design is a compromise.
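Some of these constraints are simple pattern checks that can be run on any candidate sequence before or after optimization. The sketch below scans for internal polyadenylation signals; the function name is hypothetical, and it accepts either DNA (T) or RNA (U) spelling so that `AATAAA` and `AAUAAA` are treated as the same motif.

```python
def find_motif(seq: str, motif: str = "AAUAAA") -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    s = seq.upper().replace("T", "U")
    m = motif.upper().replace("T", "U")
    hits = []
    position = s.find(m)
    while position != -1:
        hits.append(position)
        position = s.find(m, position + 1)  # step by one to catch overlaps
    return hits


candidate = "AUGGCCAATAAAGGCUUCAAUAAAUGA"  # toy CDS with two internal signals
hits = find_motif(candidate)
print(f"Internal poly-A signals: {len(hits)} at positions {hits}")
```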

Codon-optimize an antigen sequence with SciRouter
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Example: Codon-optimize the SARS-CoV-2 receptor binding domain (RBD)
# Amino acid sequence of the RBD (residues 319-541 of spike)
rbd_protein = (
    "RVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFK"
    "CYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNS"
    "NNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQP"
    "TNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNF"
)

response = requests.post(
    f"{BASE}/vaccine/codon-optimize",
    headers=HEADERS,
    json={
        "protein_sequence": rbd_protein,
        "organism": "human",
        "optimization_targets": {
            "cai_weight": 0.35,
            "gc_content_weight": 0.30,
            "uridine_depletion_weight": 0.20,
            "structure_avoidance_weight": 0.15
        },
        "target_gc_range": [0.45, 0.55],
        "avoid_patterns": ["AATAAA", "AAUAAA"],  # poly-A signals
        "avoid_splice_sites": True
    }
)
response.raise_for_status()  # stop here on an HTTP error instead of parsing an error body

result = response.json()
print(f"Optimized CDS length: {result['cds_length']} nt")
print(f"CAI score: {result['cai']:.3f}")
print(f"GC content: {result['gc_content']:.1%}")
print(f"Uridine fraction: {result['uridine_fraction']:.1%}")
print(f"Wild-type uridine fraction: {result['wildtype_uridine_fraction']:.1%}")
print(f"Uridine reduction: {result['uridine_reduction']:.1%}")
print(f"Min free energy (5' region): {result['mfe_five_prime']} kcal/mol")

# Preview the first 90 nucleotides
print(f"CDS start: {result['optimized_cds'][:90]}...")
Tip
The weight parameters in the optimization request control the trade-offs between objectives. If you are designing an mRNA that will use N1-methylpseudouridine, you can reduce the uridine depletion weight (since modified U already reduces immune activation) and increase the CAI weight for maximum protein expression.
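One way to picture what the weights do is a weighted sum of normalized objective scores. The sketch below is only a mental model: how the SciRouter service actually combines its objectives is not described here, and the function names, the linear GC penalty, and the candidate values are all made up for illustration.

```python
def gc_score(gc: float, low: float = 0.45, high: float = 0.55,
             falloff: float = 0.20) -> float:
    """1.0 inside the target range, decreasing linearly to 0.0 outside it."""
    if low <= gc <= high:
        return 1.0
    distance = (low - gc) if gc < low else (gc - high)
    return max(0.0, 1.0 - distance / falloff)


def design_score(cai: float, gc: float, uridine_fraction: float,
                 weights: dict[str, float]) -> float:
    """Weighted average of three objectives, each scaled to the range 0..1."""
    parts = {"cai": cai, "gc": gc_score(gc), "uridine": 1.0 - uridine_fraction}
    total_weight = sum(weights[name] for name in parts)
    return sum(weights[name] * parts[name] for name in parts) / total_weight


# Two invented candidates: A has higher CAI, B has less uridine
candidate_a = {"cai": 0.95, "gc": 0.50, "uridine_fraction": 0.25}
candidate_b = {"cai": 0.85, "gc": 0.50, "uridine_fraction": 0.15}

for label, w in [("CAI-heavy", {"cai": 0.6, "gc": 0.3, "uridine": 0.1}),
                 ("uridine-heavy", {"cai": 0.1, "gc": 0.3, "uridine": 0.6})]:
    a = design_score(**candidate_a, weights=w)
    b = design_score(**candidate_b, weights=w)
    print(f"{label}: A={a:.3f} B={b:.3f} -> prefer {'A' if a > b else 'B'}")
```

Shifting weight from uridine depletion to CAI flips which candidate wins, which is the trade-off the tip above describes.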

Antigen Design: Full-Length vs. Epitope-Based

Before optimizing codons, you must decide what antigen to encode. This is arguably the most consequential design decision in the entire vaccine development process. The two fundamental approaches – full-length antigen and epitope-based – serve different use cases and have distinct trade-offs.

Full-Length Antigen Design

In this approach, the mRNA encodes the complete protein of interest. The BNT162b2 and mRNA-1273 COVID vaccines both encode the full-length SARS-CoV-2 spike protein (with stabilizing proline mutations). Full-length design has several advantages: it presents the protein in its native conformation (important for generating neutralizing antibodies that recognize 3D structural epitopes), it provides many possible T cell epitopes (enabling broad coverage across diverse HLA types without patient-specific customization), and it does not require prior knowledge of which epitopes are immunodominant.

The disadvantages are a larger mRNA molecule (the spike protein CDS alone is approximately 3,800 nucleotides), potential inclusion of immunodominant but non-protective epitopes that distract the immune response, and the risk of encoding domains with undesirable biological activity (the spike protein's furin cleavage site, for example, required careful consideration). Full-length design is standard for infectious disease vaccines where the target protein is known and the goal is broad population immunity.

Epitope-Based (Polyepitope) Design

In this approach, the mRNA encodes a string of short peptide epitopes (typically 8–25 amino acids each), connected by linker sequences. This is the standard approach for personalized cancer vaccines. The advantages include smaller mRNA size (encoding 10–20 epitopes requires only 500–1,000 nucleotides of CDS), precise targeting of tumor-specific or dark genome-derived antigens, and the ability to combine epitopes from multiple proteins in a single construct.

The key challenge is epitope selection – you must know which peptides will bind the patient's MHC molecules and activate T cells. This requires HLA typing and computational MHC binding prediction, which we cover in the next section.
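The size claim for polyepitope constructs is easy to check by hand. The sketch below joins a list of epitopes with a linker and reports the resulting CDS length. The function names are hypothetical, the AAY linker is one common choice rather than a requirement, and the construct is assumed to start with a methionine and end with one stop codon and nothing else (no signal peptide or trafficking domain).

```python
def build_polyepitope(epitopes: list[str], linker: str = "AAY",
                      add_start_met: bool = True) -> str:
    """Join epitopes with a linker, optionally prefixing a start methionine."""
    if not epitopes:
        raise ValueError("At least one epitope is required")
    protein = linker.join(epitopes)
    return "M" + protein if add_start_met else protein


def cds_length_nt(protein: str) -> int:
    """Nucleotides needed to encode the protein plus one stop codon."""
    return 3 * (len(protein) + 1)


epitopes = [
    "YLQPRTFLL", "KQSSKALQR", "RMFPNAPYL", "FLWGPRALV",
    "GILGFVFTL", "SLLMWITQC", "KTWGQYWQV", "ALWGPDPAAA",
]
construct = build_polyepitope(epitopes)
print(f"{len(epitopes)} epitopes -> {len(construct)} aa -> {cds_length_nt(construct)} nt of CDS")
```

Longer MHC-II epitopes, a signal peptide, or longer linkers increase the total accordingly.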

Compare full-length vs. epitope-based designs
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

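# Approach 1: Full-length antigen (e.g., a viral surface protein).
# To keep the example short, only the first 180 residues of the SARS-CoV-2
# spike are shown; a real full-length design would pass the complete sequence.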
viral_protein = (
    "MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFS"
    "NVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIV"
    "NNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLE"
)

full_length_resp = requests.post(
    f"{BASE}/vaccine/mrna-design",
    headers=HEADERS,
    json={
        "protein_sequence": viral_protein,
        "design_type": "full_length",
        "codon_optimization": "human",
        "five_prime_utr": "alpha_globin",
        "three_prime_utr": "beta_globin",
        "poly_a_length": 110,
        "modified_nucleosides": True,
        "optimize_gc_content": True,
    }
)
full_length_resp.raise_for_status()

fl_design = full_length_resp.json()
print("Full-length design:")
print(f"  mRNA length: {fl_design['mrna_length']} nt")
print(f"  CAI: {fl_design['cai_score']:.3f}")
print(f"  GC content: {fl_design['gc_content']:.1%}")

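# Approach 2: Epitope-based. The peptides below are placeholder epitopes for
# illustration; a real cancer vaccine would use patient-specific neoantigens.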
epitopes = [
    "YLQPRTFLL", "KQSSKALQR", "RMFPNAPYL",
    "FLWGPRALV", "GILGFVFTL", "SLLMWITQC",
    "KTWGQYWQV", "ALWGPDPAAA"
]

epitope_resp = requests.post(
    f"{BASE}/vaccine/mrna-design",
    headers=HEADERS,
    json={
        "epitopes": epitopes,
        "design_type": "polyepitope",
        "linker_type": "AAY",
        "codon_optimization": "human",
        "five_prime_utr": "kozak_optimized",
        "three_prime_utr": "beta_globin",
        "poly_a_length": 120,
        "modified_nucleosides": True,
        "optimize_gc_content": True,
    }
)
epitope_resp.raise_for_status()

ep_design = epitope_resp.json()
print("\nEpitope-based design:")
print(f"  mRNA length: {ep_design['mrna_length']} nt")
print(f"  CAI: {ep_design['cai_score']:.3f}")
print(f"  GC content: {ep_design['gc_content']:.1%}")
print(f"  Epitopes encoded: {ep_design['epitope_count']}")

MHC Binding Prediction: Matching Epitopes to the Patient

For an epitope-based vaccine to work, the selected peptides must be presented on the patient's MHC (Major Histocompatibility Complex) molecules. MHC molecules are the immune system's display cases – they hold peptide fragments on the cell surface where T cells can inspect them. If a peptide does not bind to the patient's MHC, it will never be seen by T cells, no matter how perfectly the mRNA is designed.

Humans have two classes of MHC molecules. MHC class I (HLA-A, HLA-B, HLA-C) presents peptides to CD8+ cytotoxic T cells – the killers. MHC-I typically binds peptides of 8–11 amino acids. MHC class II (HLA-DR, HLA-DQ, HLA-DP) presents peptides to CD4+ helper T cells, which coordinate the broader immune response. MHC-II binds longer peptides, typically 13–25 amino acids, in an open-ended groove.
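These length preferences translate directly into how candidate peptides are generated: a protein is cut into every overlapping window of the allowed lengths, and each window is then scored against the patient's alleles. A minimal sketch, with a hypothetical function name and a toy input sequence:

```python
def peptide_windows(protein: str, lengths=range(8, 12)) -> list[str]:
    """All unique overlapping peptides of the given lengths, in order of appearance."""
    seq = protein.upper()
    windows = (
        seq[start:start + k]
        for k in lengths
        for start in range(len(seq) - k + 1)
    )
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(windows))


toy_protein = "ACDEFGHIKLMN"  # 12 residues
class_i_candidates = peptide_windows(toy_protein)                   # 8- to 11-mers
class_ii_candidates = peptide_windows(toy_protein, range(13, 26))   # 13- to 25-mers

print(f"MHC-I candidates:  {len(class_i_candidates)}")
print(f"MHC-II candidates: {len(class_ii_candidates)}")
```

The number of windows grows roughly linearly with protein length for each peptide length, so even a full-length viral protein yields only a few thousand class I candidates.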

The human MHC system (called HLA, Human Leukocyte Antigen) is the most polymorphic gene family in the human genome. There are over 35,000 known HLA alleles. Each person inherits one set of HLA alleles (a haplotype) from each parent, and the resulting pair of haplotypes (the HLA genotype) determines which peptides their immune system can see. This is why organ transplant matching is so difficult, and why personalized vaccine design requires knowing the patient's HLA type.

AI models trained on hundreds of thousands of experimental peptide-MHC binding measurements can predict binding affinity for a given peptide-allele pair, with the best accuracy for well-characterized alleles. The standard metric is IC50, the peptide concentration that displaces 50% of a reference peptide in a competitive binding assay; lower values mean tighter binding. Peptides with predicted IC50 below 500 nM are classified as binders; those below 50 nM are strong binders.

MHC class I and class II binding prediction
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Patient HLA alleles (from clinical HLA typing)
hla_class_i = ["HLA-A*02:01", "HLA-A*11:01",
               "HLA-B*07:02", "HLA-B*35:01",
               "HLA-C*04:01", "HLA-C*07:02"]

hla_class_ii = ["HLA-DRB1*03:01", "HLA-DRB1*15:01",
                "HLA-DQB1*02:01", "HLA-DQB1*06:02"]

# Candidate peptides from neoantigen discovery
candidates = [
    "YLQPRTFLL",    # 9-mer, potential MHC-I binder
    "KQSSKALQR",    # 9-mer
    "RMFPNAPYL",    # 9-mer
    "FLWGPRALV",    # 9-mer
    "GILGFVFTL",    # 9-mer (known influenza epitope for HLA-A*02:01)
    "KLPDDFTGCVIAWNSNNL",  # 18-mer, potential MHC-II binder
    "RVQPTESIVRFPNITN",    # 16-mer, potential MHC-II binder
]

# MHC Class I prediction (CD8+ T cell epitopes)
class_i_resp = requests.post(
    f"{BASE}/immunology/mhc-binding",
    headers=HEADERS,
    json={
        "peptides": [p for p in candidates if len(p) <= 11],
        "alleles": hla_class_i,
        "mhc_class": "I",
        "prediction_type": "binding_affinity"
    }
)
class_i_resp.raise_for_status()

print("MHC Class I Binding Results (CD8+ T cell epitopes):")
for pred in class_i_resp.json()["predictions"]:
    if pred["ic50_nm"] < 500:
        strength = "STRONG" if pred["ic50_nm"] < 50 else "moderate"
        print(f"  {pred['peptide']} + {pred['allele']}: "
              f"IC50 = {pred['ic50_nm']:.0f} nM ({strength} binder)")

# MHC Class II prediction (CD4+ helper T cell epitopes)
class_ii_resp = requests.post(
    f"{BASE}/immunology/mhc-binding",
    headers=HEADERS,
    json={
        "peptides": [p for p in candidates if len(p) > 11],
        "alleles": hla_class_ii,
        "mhc_class": "II",
        "prediction_type": "binding_affinity"
    }
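)
class_ii_resp.raise_for_status()

print("\nMHC Class II Binding Results (CD4+ helper T cell epitopes):")
for pred in class_ii_resp.json()["predictions"]:
    # Class II uses a looser cutoff here (1000 nM) than the 500 nM used for class I
    if pred["ic50_nm"] < 1000: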
        print(f"  {pred['peptide'][:15]}... + {pred['allele']}: "
              f"IC50 = {pred['ic50_nm']:.0f} nM")

A well-designed cancer vaccine includes both MHC-I restricted epitopes (to activate killer T cells that destroy tumor cells) and MHC-II restricted epitopes (to activate helper T cells that sustain the immune response). A common approach is to select 10–20 MHC-I epitopes and 3–5 MHC-II epitopes for inclusion in the polyepitope construct.
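The selection step can be sketched as a small post-processing function over the prediction lists. It assumes each prediction is a dict with `peptide`, `allele`, and `ic50_nm` keys, as in the example above; the function names are hypothetical, the cutoffs (500 nM for class I, 1000 nM for class II) mirror that example, and the prediction values below are invented toy data.

```python
def best_hit_per_peptide(predictions: list[dict], max_ic50_nm: float) -> list[dict]:
    """Keep each peptide's strongest allele hit below the cutoff, best first."""
    best = {}
    for pred in predictions:
        if pred["ic50_nm"] >= max_ic50_nm:
            continue
        current = best.get(pred["peptide"])
        if current is None or pred["ic50_nm"] < current["ic50_nm"]:
            best[pred["peptide"]] = pred
    return sorted(best.values(), key=lambda p: p["ic50_nm"])


def select_epitopes(class_i_preds: list[dict], class_ii_preds: list[dict],
                    n_class_i: int = 20, n_class_ii: int = 5):
    """Return (class I peptides, class II peptides) ranked by predicted IC50."""
    class_i = best_hit_per_peptide(class_i_preds, 500)[:n_class_i]
    class_ii = best_hit_per_peptide(class_ii_preds, 1000)[:n_class_ii]
    return [p["peptide"] for p in class_i], [p["peptide"] for p in class_ii]


# Invented predictions, for illustration only
toy_class_i = [
    {"peptide": "PEPTIDEAA", "allele": "HLA-A*02:01", "ic50_nm": 40.0},
    {"peptide": "PEPTIDEAA", "allele": "HLA-B*07:02", "ic50_nm": 900.0},
    {"peptide": "PEPTIDECC", "allele": "HLA-A*02:01", "ic50_nm": 300.0},
    {"peptide": "PEPTIDEDD", "allele": "HLA-A*02:01", "ic50_nm": 5000.0},
]
mhc1, mhc2 = select_epitopes(toy_class_i, [])
print("Selected MHC-I epitopes:", mhc1)
print("Selected MHC-II epitopes:", mhc2)
```

Ranking on predicted affinity alone is the simplest possible rule; a real pipeline would typically also weigh expression, clonality, and similarity to self peptides.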

Hands-On Walkthrough: End-to-End mRNA Vaccine Design with the SciRouter Python SDK

Let us walk through a complete vaccine design workflow, from raw inputs to an optimized mRNA sequence ready for manufacturing. We will design a personalized cancer vaccine for a hypothetical melanoma patient with known tumor mutations and HLA type.

Step 1: Install the SDK and Set Up Authentication

Install the SciRouter Python SDK
pip install scirouter
Initialize the client
from scirouter import SciRouter

client = SciRouter(api_key="sk-sci-your-api-key")

# Verify the connection by listing the available tools before reporting success
tools = client.tools.list()
print("Connected to SciRouter API")
print(f"Available tools: {len(tools)} tools")

Step 2: Define the Patient Profile

Patient tumor and HLA data
# Patient: 58-year-old with Stage III melanoma
# HLA type determined from clinical genotyping
patient_hla = {
    "class_i": [
        "HLA-A*02:01", "HLA-A*24:02",
        "HLA-B*07:02", "HLA-B*44:02",
        "HLA-C*05:01", "HLA-C*07:02"
    ],
    "class_ii": [
        "HLA-DRB1*01:01", "HLA-DRB1*15:01",
        "HLA-DQB1*05:01", "HLA-DQB1*06:02"
    ]
}

# Somatic mutations identified from whole-exome sequencing
# VAF = variant allele frequency (higher = more clonal)
tumor_mutations = [
    {"gene": "BRAF",  "mutation": "V600E", "vaf": 0.48},
    {"gene": "TP53",  "mutation": "R248W", "vaf": 0.41},
    {"gene": "CDK4",  "mutation": "R24C",  "vaf": 0.55},
    {"gene": "NRAS",  "mutation": "Q61R",  "vaf": 0.38},
    {"gene": "PTEN",  "mutation": "R130Q", "vaf": 0.32},
    {"gene": "MAP2K1","mutation": "P124S", "vaf": 0.27},
    {"gene": "ARID2", "mutation": "Q1334*","vaf": 0.44},  # nonsense (stop-gain) mutation
    {"gene": "PPP6C", "mutation": "R264C", "vaf": 0.21},
]

print(f"Patient HLA-A alleles: {patient_hla['class_i'][:2]}")
print(f"Tumor mutations: {len(tumor_mutations)}")
print(f"Mean VAF: {sum(m['vaf'] for m in tumor_mutations)/len(tumor_mutations):.2f}")
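
Before sending mutations to a neoantigen pipeline, it can help to see what the notation means locally. The sketch below parses protein changes such as `V600E`, flags stop-gain changes such as `Q1334*` (which truncate the protein rather than create a new peptide sequence), and lists the mutant peptides that span a substitution. The function names are hypothetical and the wild-type sequence in the usage example is a toy string, not a real protein.

```python
import re

_PROTEIN_CHANGE = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def parse_protein_change(change: str) -> tuple[str, int, str]:
    """Split 'V600E' into (reference residue, 1-based position, new residue)."""
    match = _PROTEIN_CHANGE.match(change)
    if match is None:
        raise ValueError(f"Unrecognized protein change: {change!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def is_missense(change: str) -> bool:
    """True for single-residue substitutions, False for stop-gain or silent changes."""
    ref, _, alt = parse_protein_change(change)
    return alt != "*" and alt != ref


def mutant_peptides(wild_type: str, change: str, k: int = 9) -> list[str]:
    """All k-mers of the mutated protein that contain the substituted residue."""
    ref, pos, alt = parse_protein_change(change)
    index = pos - 1
    if index < 0 or index >= len(wild_type) or wild_type[index] != ref:
        raise ValueError(f"{change} does not match the wild-type sequence")
    if alt == "*":
        return []  # truncation: no novel peptide sequence
    mutated = wild_type[:index] + alt + wild_type[index + 1:]
    first = max(0, index - k + 1)
    last = min(index, len(mutated) - k)
    return [mutated[start:start + k] for start in range(first, last + 1)]


for change in ["V600E", "R24C", "Q1334*"]:
    print(change, parse_protein_change(change), "missense:", is_missense(change))

print(mutant_peptides("ACDEFGHIKL", "F5W", k=3))  # toy wild-type sequence
```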

Step 3: Run the Neoantigen Discovery Pipeline

Identify neoantigen candidates
import requests

BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": "Bearer sk-sci-your-api-key"}

# Run the full neoantigen pipeline
neo_resp = requests.post(
    f"{BASE}/immunology/neoantigen-pipeline",
    headers=HEADERS,
    json={
        "mutations": tumor_mutations,
        "hla_alleles": patient_hla["class_i"],
        "max_candidates": 20,
        "min_binding_affinity_nm": 500,
        "include_processing_score": True,
        "include_self_similarity": True
    }
)
neo_resp.raise_for_status()

pipeline = neo_resp.json()
print(f"Mutations analyzed: {pipeline['mutations_analyzed']}")
print(f"Candidate peptides screened: {pipeline['peptides_screened']}")
print(f"Strong MHC binders: {pipeline['strong_binders']}")
print(f"Final selected neoantigens: {len(pipeline['selected_neoantigens'])}")

print("\nTop neoantigen candidates:")
for i, neo in enumerate(pipeline["selected_neoantigens"][:10], 1):
    print(f"  {i}. {neo['peptide']} | Gene: {neo['gene']} | "
          f"IC50: {neo['best_ic50_nm']:.0f} nM | "
          f"VAF: {neo['vaf']:.2f} | "
          f"Score: {neo['composite_score']:.2f}")

Step 4: Design the mRNA Construct

Build the optimized mRNA vaccine
# Extract top epitopes from the neoantigen pipeline
selected_epitopes = [
    neo["peptide"]
    for neo in pipeline["selected_neoantigens"][:15]
]

# Design the complete mRNA construct
mrna_resp = requests.post(
    f"{BASE}/vaccine/mrna-design",
    headers=HEADERS,
    json={
        "epitopes": selected_epitopes,
        "design_type": "polyepitope",
        "linker_type": "AAY",
        "codon_optimization": "human",
        "five_prime_utr": "kozak_optimized",
        "three_prime_utr": "beta_globin",
        "poly_a_length": 120,
        "modified_nucleosides": True,
        "optimize_gc_content": True,
        "target_gc_range": [0.45, 0.55],
        "avoid_patterns": ["AATAAA"],
        "signal_peptide": "tpa",  # Tissue plasminogen activator signal
    }
)
mrna_resp.raise_for_status()

design = mrna_resp.json()
print("mRNA Vaccine Design Results:")
print(f"  Total mRNA length: {design['mrna_length']} nt")
print(f"  5' UTR: {design['five_prime_utr_length']} nt")
print(f"  CDS: {design['cds_length']} nt")
print(f"  3' UTR: {design['three_prime_utr_length']} nt")
print(f"  Poly-A tail: {design['poly_a_length']} nt")
print(f"  GC content: {design['gc_content']:.1%}")
print(f"  CAI score: {design['cai_score']:.3f}")
print(f"  Uridine fraction: {design['uridine_fraction']:.1%}")
print(f"  Epitopes encoded: {design['epitope_count']}")
print(f"  Predicted half-life: {design['predicted_half_life_hours']:.1f} hours")

Step 5: Validate the Design

Quality control checks on the mRNA design
# Verify all epitopes are correctly encoded
print("Epitope verification:")
for ep in design["encoded_epitopes"]:
    status = "PASS" if ep["verified"] else "FAIL"
    print(f"  {ep['peptide']} - {status} "
          f"(position {ep['cds_start']}-{ep['cds_end']})")

# Check for problematic sequences
print(f"\nQuality checks:")
print(f"  Internal poly-A signals: {design['internal_polya_signals']}")
print(f"  Cryptic splice sites: {design['cryptic_splice_sites']}")
print(f"  Consecutive rare codons (>3): {design['consecutive_rare_codons']}")
print(f"  Upstream AUGs in 5' UTR: {design['upstream_augs']}")

# Export the final sequence
print(f"\nFinal mRNA sequence ({design['mrna_length']} nt):")
print("  5' cap: m7GpppAm (Cap1)")
print(f"  Sequence: {design['mrna_sequence'][:60]}...")
print(f"  ...{design['mrna_sequence'][-30:]}")
print("  Modified nucleoside: N1-methylpseudouridine (m1Ψ)")
Note
The signal peptide (TPA, tissue plasminogen activator) directs the translated polyepitope into the endoplasmic reticulum, the compartment where peptides are loaded onto MHC class I molecules. Without a signal peptide, the polyepitope protein would remain in the cytoplasm and rely solely on proteasomal degradation, followed by transport of the resulting peptides into the ER, for MHC loading, which can be less efficient.
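The epitope verification in Step 5 can also be reproduced locally, independent of any API response, by translating the coding sequence and searching for each epitope in the resulting protein. A minimal sketch with hypothetical function names and a toy CDS:

```python
# Standard genetic code, codons ordered by base U, C, A, G at each position
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a CDS (DNA or RNA spelling); stop codons appear as '*'."""
    seq = cds.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def verify_epitopes(cds: str, epitopes: list[str]) -> dict[str, int]:
    """Map each epitope to its 0-based position in the protein, or -1 if absent."""
    protein = translate(cds)
    return {epitope: protein.find(epitope) for epitope in epitopes}


toy_cds = "AUGUUCCUCGCCGCCUACAGCUAA"  # encodes M-F-L-A-A-Y-S-stop
print("Protein:", translate(toy_cds))
for epitope, position in verify_epitopes(toy_cds, ["FL", "AAY", "W"]).items():
    print(f"  {epitope}: {'PASS' if position >= 0 else 'FAIL'} (position {position})")
```

A check like this catches frame shifts and truncated sequences that summary statistics such as CAI and GC content would not reveal.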

Lipid Nanoparticle Delivery: A Brief Overview

The mRNA molecule designed above is useless on its own – naked mRNA is rapidly degraded by extracellular RNases and cannot cross cell membranes. Lipid nanoparticles (LNPs) solve both problems. They encapsulate the mRNA in a sphere of lipids approximately 80–100 nm in diameter, protecting it from degradation and enabling cellular uptake through endocytosis.

A typical LNP formulation contains four components: an ionizable lipid (which is positively charged at low pH for mRNA encapsulation but neutral at physiological pH to avoid toxicity), a helper phospholipid (DSPC, which stabilizes the bilayer), cholesterol (which enhances structural integrity), and a PEG-lipid (which prevents aggregation and extends circulation time). The Moderna formulation uses SM-102 as the ionizable lipid. The BioNTech formulation uses ALC-0315.

LNP design is primarily an experimental and manufacturing challenge rather than a computational one, which is why it falls outside the scope of this guide. However, the mRNA sequence design does influence LNP performance – longer mRNA molecules are harder to encapsulate efficiently, and the mRNA secondary structure affects encapsulation. This is another reason to keep the mRNA as short as possible while encoding all required epitopes.

Regulatory Context: What Computational Vaccine Design Enables

Regulatory agencies (FDA, EMA) evaluate vaccines based on manufacturing quality, preclinical safety, and clinical efficacy – not on the computational methods used to design them. However, computational design tools can shorten the path to a regulatory submission by reducing the design-test-iterate cycle from months to days.

For personalized cancer vaccines, each patient's vaccine is an individualized product, and development programs are typically run under a single IND (Investigational New Drug) application that covers the design and manufacturing process rather than one fixed sequence. The computational pipeline described in this guide – neoantigen prediction, MHC binding analysis, codon optimization – is part of that process, so sponsors should expect to document the algorithms used, their validation data, and the rationale for epitope selection.

For infectious disease vaccines intended for population-wide use, the design phase is just the starting point. The computationally designed mRNA must go through preclinical testing (animal models), Phase 1 (safety), Phase 2 (immunogenicity), and Phase 3 (efficacy) trials. The pandemic-era Emergency Use Authorization pathway compressed this from the typical 10–15 years to about 11 months, but even that accelerated timeline assumed the computational design was already complete.

The takeaway: computational vaccine design tools like those provided by SciRouter do not replace clinical development, but they dramatically reduce the time and cost of the design phase – which is the critical first step in the entire pipeline.

Bringing It All Together: The Complete Vaccine Design Pipeline

Here is the full end-to-end pipeline in a single script, from patient data to a manufacturing-ready mRNA sequence. This represents the computational core of modern mRNA vaccine development.

Complete mRNA vaccine design pipeline
import requests

API_KEY = "sk-sci-your-api-key"
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# ---- Patient Input ----
patient = {
    "id": "PT-2026-0042",
    "cancer_type": "melanoma",
    "hla_class_i": ["HLA-A*02:01", "HLA-A*24:02",
                    "HLA-B*07:02", "HLA-B*44:02"],
    "hla_class_ii": ["HLA-DRB1*01:01", "HLA-DRB1*15:01"],
    "mutations": [
        {"gene": "BRAF",   "mutation": "V600E", "vaf": 0.48},
        {"gene": "TP53",   "mutation": "R248W", "vaf": 0.41},
        {"gene": "CDK4",   "mutation": "R24C",  "vaf": 0.55},
        {"gene": "NRAS",   "mutation": "Q61R",  "vaf": 0.38},
        {"gene": "PTEN",   "mutation": "R130Q", "vaf": 0.32},
    ]
}

# ---- Step 1: Neoantigen Discovery ----
print("Step 1: Running neoantigen pipeline...")
neo_resp = requests.post(f"{BASE}/immunology/neoantigen-pipeline",
    headers=HEADERS,
    json={
        "mutations": patient["mutations"],
        "hla_alleles": patient["hla_class_i"],
        "max_candidates": 20,
        "min_binding_affinity_nm": 500,
        "include_processing_score": True,
    }
)
neo_resp.raise_for_status()
neoantigens = neo_resp.json()
epitopes = [n["peptide"] for n in neoantigens["selected_neoantigens"]]
print(f"  Found {len(epitopes)} high-confidence epitopes")

# ---- Step 2: MHC-II Epitope Selection (for CD4+ help) ----
print("Step 2: Selecting MHC-II epitopes for CD4+ T cell help...")
mhc2_resp = requests.post(f"{BASE}/immunology/mhc-binding",
    headers=HEADERS,
    json={
        "peptides": [n["long_peptide"] for n in neoantigens["selected_neoantigens"][:10]],
        "alleles": patient["hla_class_ii"],
        "mhc_class": "II",
        "prediction_type": "binding_affinity"
    }
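)
mhc2_resp.raise_for_status()
# A peptide can bind several alleles, so de-duplicate while keeping order
mhc2_binders = list(dict.fromkeys(
    p["peptide"] for p in mhc2_resp.json()["predictions"]
    if p["ic50_nm"] < 1000
))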
print(f"  Found {len(mhc2_binders)} MHC-II binders")

# ---- Step 3: mRNA Construct Design ----
print("Step 3: Designing mRNA construct...")
all_epitopes = epitopes[:12] + mhc2_binders[:3]  # Mix of CD8+ and CD4+ epitopes

mrna_resp = requests.post(f"{BASE}/vaccine/mrna-design",
    headers=HEADERS,
    json={
        "epitopes": all_epitopes,
        "design_type": "polyepitope",
        "linker_type": "AAY",
        "codon_optimization": "human",
        "five_prime_utr": "kozak_optimized",
        "three_prime_utr": "beta_globin",
        "poly_a_length": 120,
        "modified_nucleosides": True,
        "optimize_gc_content": True,
        "target_gc_range": [0.45, 0.55],
        "signal_peptide": "tpa",
    }
)
mrna_resp.raise_for_status()

design = mrna_resp.json()
print(f"  mRNA length: {design['mrna_length']} nt")
print(f"  GC content: {design['gc_content']:.1%}")
print(f"  CAI: {design['cai_score']:.3f}")
print(f"  Epitopes: {design['epitope_count']}")

# ---- Summary ----
print(f"\n{'='*50}")
print(f"VACCINE DESIGN COMPLETE")
print(f"Patient: {patient['id']}")
print(f"Cancer: {patient['cancer_type']}")
print(f"Neoantigens screened: {neoantigens['peptides_screened']}")
print(f"Epitopes in construct: {design['epitope_count']}")
print(f"mRNA length: {design['mrna_length']} nt")
print(f"CAI: {design['cai_score']:.3f} | GC: {design['gc_content']:.1%}")
print("Sequence ready for DNA template preparation")
print(f"{'='*50}")

What Comes Next

The mRNA sequence generated by this pipeline is the starting point for manufacturing. The next steps in real-world vaccine development are DNA template preparation (the mRNA is transcribed from a linearized DNA plasmid), in vitro transcription (IVT) reaction optimization, LNP encapsulation and characterization, quality control (identity, purity, potency, and endotoxin testing), and fill-finish for clinical administration.

For researchers exploring computational vaccine design, SciRouter provides the tools to experiment with the design phase. Use Neoantigen Pipeline to identify targets from tumor mutations. Use MHC Binding Prediction to evaluate epitope-HLA interactions across diverse patient populations. Use Vaccine Design to generate optimized mRNA constructs with full control over UTRs, codon optimization weights, and construct architecture.
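To make "evaluating epitope-HLA interactions" concrete, here is a minimal sketch that summarizes which alleles each peptide is predicted to bind. It works on a plain list of prediction records and makes no API calls. The `peptide` and `ic50_nm` keys mirror the pipeline above; the `allele` key and the example records are assumptions made for illustration, so check the actual response schema before relying on them.

```python
from collections import defaultdict

def alleles_covered(predictions, max_ic50_nm=500.0):
    """Map each peptide to the sorted alleles it is predicted to bind below the cutoff."""
    covered = defaultdict(set)
    for p in predictions:
        # "allele" is an assumed field name, not confirmed by the article
        if p["ic50_nm"] < max_ic50_nm:
            covered[p["peptide"]].add(p["allele"])
    return {pep: sorted(alleles) for pep, alleles in covered.items()}

# Made-up records for illustration only; these are not real predictions.
example_predictions = [
    {"peptide": "PEPTIDEA", "allele": "HLA-A*02:01", "ic50_nm": 42.0},
    {"peptide": "PEPTIDEA", "allele": "HLA-B*07:02", "ic50_nm": 310.0},
    {"peptide": "PEPTIDEA", "allele": "HLA-A*24:02", "ic50_nm": 2500.0},
    {"peptide": "PEPTIDEB", "allele": "HLA-A*24:02", "ic50_nm": 120.0},
    {"peptide": "PEPTIDEB", "allele": "HLA-A*02:01", "ic50_nm": 900.0},
]

coverage = alleles_covered(example_predictions)
```

A peptide that binds several common alleles is a better candidate for a population-level vaccine than one restricted to a single allele; for a personalized vaccine only the patient's own alleles matter.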

For a deeper look at where the next generation of vaccine targets will come from, see our companion article on scanning the dark genome for hidden cancer genes. For the broader context of AI in cancer immunotherapy, see our post on how AI is changing cancer vaccine design.

The computational tools for vaccine design are now accessible to any researcher with a Python environment and an API key. The mRNA vaccine revolution is not just about Moderna and BioNTech – it is about democratizing the technology so that the next pandemic vaccine, the next cancer therapy, and the next rare disease treatment can be designed by anyone with the right data and the right tools.

Frequently Asked Questions

What programming skills do I need to design an mRNA vaccine computationally?

Basic Python proficiency is sufficient to use the SciRouter API for mRNA vaccine design. You need to understand HTTP requests (the requests library), JSON data structures, and basic string manipulation. No machine learning expertise is required — the AI models run on SciRouter's infrastructure. A background in molecular biology or immunology is helpful for interpreting results, but the API documentation explains the key concepts.

What is codon adaptation index (CAI) and what value should I target?

Codon adaptation index (CAI) measures how closely the codon usage in your mRNA matches the preferred codon usage of the host organism (here, human cells). It is the geometric mean of each codon's relative adaptiveness: the codon's usage frequency divided by that of the most-used synonymous codon in a reference set, conventionally a set of highly expressed genes. CAI ranges from 0 to 1, where 1.0 means every codon is the most frequently used synonym. For mRNA vaccines, you should target a CAI of 0.80 or higher. Values above 0.90 are excellent. However, maximizing CAI alone can create problems: extremely high CAI may deplete specific tRNA pools and slow translation. Balance CAI with GC content (40–60% is acceptable, 45–55% preferred) and mRNA secondary structure stability.
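The calculation can be sketched in a few lines of Python. CAI is the geometric mean of each codon's relative adaptiveness, w = (codon frequency) / (frequency of the most-used synonymous codon). The usage table below is a small made-up example covering three amino acids, not a real human codon usage table, and codons missing from the table are simply skipped; a real calculation would use a full reference table.

```python
import math

# Hypothetical usage fractions for three codon families (illustrative values,
# NOT a real human codon usage table). Each family sums to 1.0.
TOY_USAGE = {
    "Leu": {"CUG": 0.40, "CUC": 0.20, "UUG": 0.13, "CUU": 0.13, "CUA": 0.07, "UUA": 0.07},
    "Lys": {"AAG": 0.57, "AAA": 0.43},
    "Glu": {"GAG": 0.58, "GAA": 0.42},
}

def relative_adaptiveness(usage):
    """Map each codon to w = frequency / max frequency among its synonyms."""
    weights = {}
    for family in usage.values():
        best = max(family.values())
        for codon, freq in family.items():
            weights[codon] = freq / best
    return weights

def cai(rna, weights):
    """Geometric mean of w over the codons that have a weight; others are skipped."""
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

WEIGHTS = relative_adaptiveness(TOY_USAGE)

# Two synonymous encodings of Leu-Lys-Glu
optimal = cai("CUGAAGGAG", WEIGHTS)     # every codon is the top synonym
suboptimal = cai("UUAAAAGAA", WEIGHTS)  # every codon is a less-used synonym
```

Because the mean is geometric, a single very rare codon pulls the score down more than it would in an arithmetic average, which is why rare codons are worth replacing even in an otherwise well-optimized sequence.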

Why does GC content matter for mRNA vaccines?

GC content affects mRNA stability, translation efficiency, and innate immune activation. mRNA with GC content below 40% tends to be unstable and rapidly degraded by cellular RNases. mRNA with GC content above 60% can form excessively stable secondary structures that impede ribosome scanning. The optimal range for therapeutic mRNA is 45–55%, with 50% often cited as ideal. GC-rich codons also reduce the uridine content of the mRNA, which decreases recognition by innate immune sensors like TLR7 and TLR8 — an important consideration even when using modified nucleosides.
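As a rough illustration of the checks described above, the sketch below computes overall GC fraction, uridine fraction, and a sliding-window scan for local regions outside a target GC range. The function names, the 30 nt default window, and the two example sequences are choices made here for illustration; they are not taken from the SciRouter API.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in an RNA or DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)

def uridine_fraction(seq):
    """Fraction of U bases (T is counted too, so DNA-alphabet input works)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("U") + seq.count("T")) / len(seq)

def windows_outside_range(seq, window=30, low=0.40, high=0.60):
    """Return (start, gc) for each sliding window whose GC falls outside [low, high]."""
    flagged = []
    for start in range(0, len(seq) - window + 1):
        gc = gc_fraction(seq[start:start + window])
        if gc < low or gc > high:
            flagged.append((start, gc))
    return flagged

# Two synonymous encodings of Leu-Lys-Glu-Leu (codon choices are illustrative)
au_rich = "UUAAAAGAAUUA"
gc_rich = "CUGAAGGAGCUG"

gc_gain = gc_fraction(gc_rich) - gc_fraction(au_rich)
u_drop = uridine_fraction(au_rich) - uridine_fraction(gc_rich)
```

Both sequences encode the same peptide, yet the GC-rich version has a higher GC fraction and fewer uridines. The windowed scan matters because a construct can sit inside the target range on average while still containing local stretches that are strongly AU-rich or GC-rich.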

What is the difference between full-length antigen and epitope-based vaccine design?

A full-length antigen vaccine encodes the complete protein (e.g., the SARS-CoV-2 spike protein). This approach presents many possible epitopes and works across diverse HLA types, but produces a larger mRNA molecule and may include immunodominant epitopes that distract from the most protective responses. An epitope-based vaccine encodes only short peptide sequences (8–25 amino acids) known to bind MHC and activate T cells. This allows precise targeting but requires HLA typing and may miss important conformational epitopes. Cancer vaccines typically use the epitope approach because each patient's neoantigens are unique. Infectious disease vaccines often use full-length antigens for broader population coverage.
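The epitope-based approach can be pictured as a string of short peptides joined by linkers, which is what the `"linker_type": "AAY"` setting in the pipeline above refers to. The helper below is a hypothetical sketch of that assembly step at the protein level. How SciRouter orders epitopes or handles junctions is not described in this article, so treat this only as an illustration of the idea. The two example peptides are widely used model epitopes serving as placeholders, not patient neoantigens.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def assemble_polyepitope(epitopes, linker="AAY"):
    """Join unique epitopes, in first-seen order, with a linker between each pair."""
    unique = []
    for pep in epitopes:
        pep = pep.strip().upper()
        if not pep or set(pep) - STANDARD_AMINO_ACIDS:
            raise ValueError(f"not a standard amino-acid sequence: {pep!r}")
        if pep not in unique:  # drop exact duplicates, keep original order
            unique.append(pep)
    return linker.join(unique)

# Placeholder epitopes; the repeated entry is dropped during assembly
construct = assemble_polyepitope(["SIINFEKL", "GILGFVFTL", "SIINFEKL"])
```

Dropping duplicates is relevant to the pipeline above because a class I epitope and a class II long peptide selected separately can end up in the same list.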

Can I use this guide to make a real vaccine?

This guide covers the computational design phase of mRNA vaccine development. The tools and methods described here produce optimized mRNA sequences suitable for research and preclinical development. Manufacturing a real vaccine requires GMP-grade in vitro transcription, lipid nanoparticle encapsulation, quality control testing, and regulatory approval — none of which are covered here. The computational design is a critical first step, but it is only one part of the vaccine development pipeline. Always work within the regulatory framework of your country and institution.

How does SciRouter's vaccine design compare to tools used by Moderna or BioNTech?

Moderna and BioNTech use proprietary internal platforms for codon optimization and mRNA design that have been refined over years of clinical development. SciRouter's vaccine design tools implement the same fundamental algorithms — codon adaptation index optimization, GC content balancing, UTR selection, and MHC binding prediction — using published methods and open-source models. The core science is identical. The difference is that pharma companies have proprietary training data from their own clinical trials and manufacturing runs. For academic research, preclinical work, and educational purposes, SciRouter provides equivalent computational capability through an accessible API.
