What Are Neoantigens?
Every cancer starts with mutations. As tumor cells divide and accumulate somatic DNA changes – point mutations, insertions, deletions, gene fusions – some of those mutations fall within protein-coding regions and alter the amino acid sequence of the resulting protein. When these mutated proteins are processed by the proteasome, the resulting peptide fragments are loaded onto MHC (Major Histocompatibility Complex) molecules and displayed on the cell surface. These mutant peptides are called neoantigens.
Neoantigens are, in a real sense, the Achilles heel of cancer. They are absent from every normal cell in the patient's body, which means the immune system has not developed tolerance to them. When a T cell encounters a neoantigen-MHC complex, it recognizes it as foreign and can mount a cytotoxic response against the tumor cell. This is a central biological basis of cancer immunotherapy: the immune system can distinguish tumor cells from normal cells because of the neoantigens they display.
The clinical evidence is compelling. Patients whose tumors have high neoantigen loads (typically correlated with high tumor mutation burden, or TMB) tend to respond better to immune checkpoint inhibitors like pembrolizumab (Keytruda) and nivolumab (Opdivo). In the landmark CheckMate-227 trial, a TMB of at least 10 mutations per megabase identified non-small cell lung cancer patients who benefited from nivolumab plus ipilimumab over chemotherapy, with a hazard ratio of 0.58 for progression-free survival. The more neoantigens a tumor displays, the more targets the immune system has to attack.
But checkpoint inhibitors are blunt instruments – they release the brakes on the immune system broadly, causing autoimmune side effects in 15 to 40% of patients. The next frontier is precision immunotherapy: identifying the specific neoantigens in a patient's tumor and designing vaccines or T cell therapies that target those neoantigens directly. This requires computational prediction of which mutant peptides will bind MHC molecules and be recognized by T cells.
Tumor Mutation Burden as a Biomarker
Tumor mutation burden (TMB) is the total count of somatic mutations per megabase of coding DNA. It varies enormously across cancer types. Melanoma and non-small cell lung cancer (NSCLC) have median TMBs of 10 to 15 mutations per megabase, driven by UV radiation and tobacco carcinogen exposure respectively. Microsatellite-instable (MSI-high) colorectal cancers have TMBs exceeding 40 mutations per megabase due to defective DNA mismatch repair. At the other end of the spectrum, pediatric cancers, prostate cancer, and pancreatic cancer typically have TMBs below 3 mutations per megabase.
TMB matters for neoantigen prediction because more mutations mean more potential neoantigens. A tumor with 300 non-synonymous mutations (roughly 10 mutations per megabase in a typical exome) might generate 10 to 30 high-confidence predicted neoantigens after filtering through MHC binding prediction and immunogenicity scoring. A tumor with 30 non-synonymous mutations might yield only 1 to 3 candidates – still potentially enough for a vaccine, but with less margin for error if some predictions do not validate experimentally.
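The arithmetic behind these figures can be sketched in a few lines. The 30 Mb coding footprint and the helper name are assumptions for illustration; sequencing panels and exome kits each report their own covered size.

```python
def tumor_mutation_burden(n_mutations: int, covered_megabases: float) -> float:
    """Return mutations per megabase of sequenced coding DNA."""
    if covered_megabases <= 0:
        raise ValueError("covered_megabases must be positive")
    return n_mutations / covered_megabases


EXOME_MB = 30.0  # assumed size of a typical exome target, in megabases

high_tmb = tumor_mutation_burden(300, EXOME_MB)  # the 300-mutation tumor
low_tmb = tumor_mutation_burden(30, EXOME_MB)    # the 30-mutation tumor

# Fraction of non-synonymous mutations that survive filtering,
# using the 10-to-30-of-300 range quoted in the text.
yield_range = (10 / 300, 30 / 300)

print(f"High-TMB tumor: {high_tmb:.1f} mut/Mb")
print(f"Low-TMB tumor:  {low_tmb:.1f} mut/Mb")
print(f"Candidate yield: {yield_range[0]:.1%} to {yield_range[1]:.1%}")
```

Read this way, only a few percent of non-synonymous mutations end up as high-confidence candidates, which is why low-TMB tumors leave so little margin.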
In 2020 the FDA approved pembrolizumab for TMB-high solid tumors regardless of tissue of origin, based on the KEYNOTE-158 trial and a cutoff of 10 or more mutations per megabase. This was a landmark decision that validated the link between mutational load, neoantigen abundance, and immunotherapy response. However, TMB alone is an imperfect biomarker: not all mutations produce immunogenic neoantigens, and some low-TMB tumors respond to immunotherapy through mechanisms unrelated to neoantigens (such as viral antigens in HPV-positive cancers).
Computational neoantigen prediction goes beyond simple TMB counting by asking which specific mutations actually produce peptides that bind MHC molecules, are presented on the cell surface, and can be recognized by the patient's T cell repertoire. This is the difference between counting bullets and counting the ones that hit the target.
The Neoantigen Prediction Pipeline
A complete neoantigen prediction pipeline transforms raw sequencing data into a ranked list of candidate neoantigens for vaccine or therapy design. The pipeline has five major steps, each with specific computational tools and biological considerations.
Step 1: Somatic Mutation Calling
The starting point is whole-exome or whole-genome sequencing of both the tumor and a matched normal sample (typically blood). Variant callers like Mutect2, Strelka2, or VarScan compare the two samples to identify somatic mutations – changes present in the tumor but absent from the germline. Non-synonymous mutations (those that change the amino acid sequence) are retained for neoantigen prediction. Typical output: 50 to 500 non-synonymous mutations depending on TMB.
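As a rough sketch of the retention step, the snippet below filters an already-annotated variant list. The record fields (`tumor_vaf`, `normal_vaf`, `consequence`) and the VAF cutoffs are hypothetical; real callers such as Mutect2 emit VCF records that need parsing and annotation first. The consequence labels follow the Sequence Ontology terms commonly used by annotators.

```python
# Consequence terms treated as protein-altering; the exact set is a design choice.
NON_SYNONYMOUS = {
    "missense_variant",
    "frameshift_variant",
    "inframe_insertion",
    "inframe_deletion",
    "stop_lost",
}


def select_candidate_mutations(variants, min_tumor_vaf=0.05, max_normal_vaf=0.01):
    """Keep variants that look somatic and change the protein sequence."""
    kept = []
    for v in variants:
        looks_somatic = (
            v["tumor_vaf"] >= min_tumor_vaf and v["normal_vaf"] <= max_normal_vaf
        )
        if looks_somatic and v["consequence"] in NON_SYNONYMOUS:
            kept.append(v)
    return kept


# Hypothetical annotated variants (values are illustrative only)
variants = [
    {"gene": "NRAS", "consequence": "missense_variant", "tumor_vaf": 0.32, "normal_vaf": 0.00},
    {"gene": "TTN", "consequence": "synonymous_variant", "tumor_vaf": 0.25, "normal_vaf": 0.00},
    {"gene": "BRCA2", "consequence": "missense_variant", "tumor_vaf": 0.48, "normal_vaf": 0.51},
]

candidates = select_candidate_mutations(variants)
print([v["gene"] for v in candidates])
```

The synonymous variant is dropped because it does not change the protein, and the third variant is dropped because it is present at germline frequency in the normal sample.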
Step 2: HLA Typing
MHC molecules are encoded by the HLA (Human Leukocyte Antigen) gene complex. Every person has a unique combination of HLA alleles (up to 6 HLA-I alleles: 2 each of HLA-A, HLA-B, and HLA-C; and multiple HLA-II alleles). Different HLA alleles bind different peptide motifs. HLA typing determines which alleles the patient carries, which determines which mutant peptides can be presented on their tumor cells. Tools like OptiType, HISAT-genotype, or HLA-HD perform HLA typing from the same sequencing data used for mutation calling.
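HLA typing tools report alleles at different resolutions and with or without the `HLA-` prefix, so a small normalization step is usually needed before binding prediction. The sketch below is one way to do that for class I alleles; the helper names and the choice to truncate to two-field resolution are assumptions, not the behavior of any particular typing tool.

```python
import re

# Matches e.g. "A*02:01", "HLA-A*02:01", or "HLA-B*44:02:01" (extra fields ignored)
HLA_I_PATTERN = re.compile(r"^(?:HLA-)?([ABC])\*(\d{2,3}):(\d{2,3})")


def normalize_hla_i(allele: str) -> str:
    """Return a class I allele at two-field resolution, e.g. 'HLA-A*02:01'."""
    match = HLA_I_PATTERN.match(allele.strip())
    if match is None:
        raise ValueError(f"Not a recognizable HLA-I allele: {allele!r}")
    gene, group, protein = match.groups()
    return f"HLA-{gene}*{group}:{protein}"


def distinct_hla_i(alleles):
    """Deduplicate alleles; a homozygous locus contributes one allele, not two."""
    return sorted({normalize_hla_i(a) for a in alleles})


# Hypothetical typing output: homozygous at HLA-A, heterozygous at HLA-B and HLA-C
typed = ["A*02:01", "HLA-A*02:01", "B*07:02", "HLA-B*44:02:01", "C*05:01", "C*07:02"]
patient_alleles = distinct_hla_i(typed)
print(patient_alleles)
```

A patient who is homozygous at one locus presents peptides on five distinct class I molecules rather than six, which slightly narrows the set of peptides the tumor can display.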
Step 3: Peptide Generation
For each non-synonymous mutation, generate all possible peptide windows containing the mutated residue. For MHC-I prediction, generate 8-mer through 11-mer peptides (the size range that MHC-I can accommodate). For MHC-II, generate 13-mer through 25-mer peptides. A single point mutation typically produces 30 to 40 candidate peptides across all lengths and positions.
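The window enumeration for MHC-I can be sketched as follows. The function name and the 0-based `mutation_index` convention are assumptions; the example uses the first 22 residues of NRAS carrying G12D so that the mutation has at least 10 flanking residues on each side.

```python
def mutant_peptides(mutant_sequence, mutation_index, lengths=(8, 9, 10, 11)):
    """Enumerate every window of the given lengths that covers the mutated residue.

    mutation_index is the 0-based position of the mutated residue in mutant_sequence.
    """
    peptides = []
    for k in lengths:
        first_start = max(0, mutation_index - k + 1)
        last_start = min(mutation_index, len(mutant_sequence) - k)
        for start in range(first_start, last_start + 1):
            peptides.append(mutant_sequence[start:start + k])
    return peptides


# NRAS residues 1-22 with G12D (D at 0-based index 11)
nras_g12d = "MTEYKLVVVGADGVGKSALTIQ"
peptides = mutant_peptides(nras_g12d, mutation_index=11)

print(len(peptides))  # 8 + 9 + 10 + 11 = 38 windows
print(peptides[:3])
```

With full flanking sequence, a point mutation yields k windows for each length k, which is where the 30 to 40 figure comes from. Mutations near a protein terminus, or contexts shorter than 10 residues on either side, produce fewer.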
Step 4: MHC Binding Prediction
This is the core computational step. For each candidate peptide and each of the patient's HLA alleles, predict the binding affinity of the peptide-MHC complex. Tools like NetMHCpan 4.1, MHCflurry 2.0, and MixMHCpred use models trained on experimental binding-affinity measurements and mass-spectrometry eluted-ligand data. Depending on the tool, the output is a predicted IC50 value (the peptide concentration needed for 50% inhibition of a reference peptide's binding, so lower means tighter binding), a presentation score, or a percentile rank. A common threshold on predicted affinity is IC50 below 500 nM for a weak binder and below 50 nM for a strong binder.
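Applying the affinity thresholds is simple once IC50 values are in hand. The sketch below classifies a peptide and also shows the log-scaled transform, 1 - log(IC50)/log(50000), that the NetMHC family uses to map affinities onto a 0-to-1 scale. The function names are illustrative, and the IC50 inputs are made-up numbers rather than predictor output.

```python
import math


def classify_binder(ic50_nm, weak_threshold=500.0, strong_threshold=50.0):
    """Label a peptide-MHC pair by predicted IC50 in nM (lower is tighter)."""
    if ic50_nm < strong_threshold:
        return "strong"
    if ic50_nm < weak_threshold:
        return "weak"
    return "non-binder"


def ic50_to_score(ic50_nm, max_ic50=50000.0):
    """Map IC50 onto a 0-1 scale: 1 - log(IC50) / log(max_ic50)."""
    return 1.0 - math.log(ic50_nm) / math.log(max_ic50)


for ic50 in (35.0, 120.0, 2500.0):  # illustrative values
    print(f"{ic50:>7.1f} nM -> {classify_binder(ic50):<10} score={ic50_to_score(ic50):.3f}")
```

On the transformed scale the 500 nM cutoff corresponds to a score of roughly 0.426, which is why that number appears as a threshold in older NetMHC-based pipelines.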
Step 5: Immunogenicity Scoring
MHC binding is necessary but not sufficient for immunogenicity. A peptide must also be processed by the proteasome, transported by TAP (Transporter associated with Antigen Processing), and recognized by T cells. Immunogenicity scoring models like PRIME, BigMHC, or DeepImmuno attempt to predict the full pipeline from peptide to T cell recognition. Additional features that improve prediction include the difference in MHC binding between the mutant and wild-type peptide (larger differences suggest stronger immune recognition), the mutation position within the peptide, and hydrophobicity at TCR-facing residues.
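The additional features mentioned above can be computed directly from the peptide and the two binding predictions. This is a minimal sketch, not how PRIME, BigMHC, or DeepImmuno work internally. It assumes anchors at position 2 and the C-terminus, treats position 4 through the residue before the C-terminus as TCR-facing, and uses the Kyte-Doolittle hydropathy scale; the IC50 inputs are illustrative.

```python
# Kyte-Doolittle hydropathy values
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peptide_features(peptide, mutation_pos, mutant_ic50, wildtype_ic50):
    """Compute simple immunogenicity features.

    mutation_pos is the 1-based position of the mutated residue in the peptide.
    """
    anchor_positions = {2, len(peptide)}  # assumed MHC-I anchor positions
    central = peptide[3:-1]               # P4 .. P(n-1), assumed TCR-facing
    return {
        "binding_ratio": wildtype_ic50 / mutant_ic50,
        "mutation_at_anchor": mutation_pos in anchor_positions,
        "tcr_hydrophobicity": sum(KYTE_DOOLITTLE[aa] for aa in central) / len(central),
    }


# NRAS G12D 10-mer with the mutated D at peptide position 8; IC50s are made up
features = peptide_features("KLVVVGADGV", mutation_pos=8,
                            mutant_ic50=120.0, wildtype_ic50=900.0)
print(features)
```

Whether the mutation sits at an anchor matters for interpretation: an anchor mutation mainly changes MHC binding, while a mutation at a central position changes what the T cell receptor sees.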
MHC-I vs. MHC-II Neoantigens
The immune system uses two distinct antigen presentation pathways, and understanding the difference is critical for neoantigen vaccine design.
MHC class I molecules are expressed on virtually all nucleated cells. They present short peptides (8 to 11 amino acids) derived from intracellular proteins to CD8+ cytotoxic T lymphocytes (CTLs). When a CTL recognizes a neoantigen on MHC-I, it directly kills the tumor cell through perforin and granzyme secretion. MHC-I neoantigens are the primary effectors of anti-tumor immunity and are the focus of most neoantigen vaccine efforts. HLA-A*02:01 is the most common HLA-I allele in Caucasian populations (frequency approximately 25 to 30%) and binds peptides with leucine or methionine at position 2 and valine or leucine at the C-terminus.
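The HLA-A*02:01 anchor preference described above can be expressed as a quick motif check. This is only a coarse prefilter under the stated motif (L or M at position 2, V or L at the C-terminus), not a substitute for a trained binding predictor; the function name is illustrative.

```python
A0201_POSITION_2 = set("LM")
A0201_C_TERMINUS = set("VL")


def matches_a0201_anchors(peptide: str) -> bool:
    """Return True if an 8- to 11-mer carries the canonical HLA-A*02:01 anchors."""
    if not 8 <= len(peptide) <= 11:
        return False
    return peptide[1] in A0201_POSITION_2 and peptide[-1] in A0201_C_TERMINUS


for pep in ("NLVPMVATV", "KLVVVGADGV", "MTEYKLVVV"):
    print(pep, matches_a0201_anchors(pep))
```

NLVPMVATV is a well-known HLA-A*02:01-restricted CMV epitope and passes the check; MTEYKLVVV fails because threonine occupies position 2. Many true binders deviate from the canonical anchors, so a failed check should lower priority rather than exclude a peptide outright.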
MHC class II molecules are expressed primarily on professional antigen-presenting cells (dendritic cells, macrophages, B cells). They present longer peptides (13 to 25 amino acids) to CD4+ helper T cells. While CD4+ T cells do not directly kill tumor cells in most cases, they provide essential support for CD8+ responses: they activate dendritic cells, promote CD8+ T cell priming, maintain CD8+ T cell memory, and recruit and activate macrophages. Clinical data from neoantigen vaccine trials increasingly shows that including MHC-II neoantigens improves the durability and breadth of the anti-tumor response.
The computational challenge differs between the two classes. MHC-I binding prediction is relatively mature, with models trained on extensive experimental binding data (over 500,000 binding measurements across common HLA-I alleles). MHC-II prediction is harder because the binding groove is open-ended (accommodating variable peptide lengths), the training data is smaller, and the set of HLA-II alleles is more diverse. Current MHC-II predictors (NetMHCIIpan 4.3, MixMHC2pred) achieve AUC values of 0.80 to 0.90, compared to 0.90 to 0.95 for MHC-I predictors.
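For readers less familiar with the AUC metric quoted here: it is the probability that a randomly chosen true binder receives a higher score than a randomly chosen non-binder. A minimal pairwise implementation, with made-up scores and labels, looks like this:

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("Need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative predictor scores; label 1 = experimentally confirmed binder
scores = [0.92, 0.85, 0.40, 0.65, 0.30, 0.10]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

In this toy set one binder (0.40) is outscored by one non-binder (0.65), so 8 of the 9 binder/non-binder pairs are ordered correctly. An AUC of 0.85 therefore still leaves a meaningful share of misordered pairs, which is why MHC-II candidates generally call for more experimental validation.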
Best practice for neoantigen vaccine design is to include both MHC-I and MHC-II epitopes. A vaccine with 10 to 20 predicted neoantigens typically includes 7 to 12 MHC-I epitopes (for direct CTL responses) and 3 to 8 MHC-II epitopes (for CD4+ helper support). This dual approach is used by BioNTech in their autogene cevumeran (BNT122) individualized neoantigen vaccine. In a Phase 1 trial in resected pancreatic cancer (published in Nature, 2023), patients who mounted vaccine-induced T cell responses had longer recurrence-free survival than those who did not.
Hands-On: Neoantigen Prediction with the SciRouter API
SciRouter provides API endpoints for the core computational steps of neoantigen prediction: peptide generation from mutant sequences, MHC binding prediction, and immunogenicity scoring. The following example walks through predicting neoantigens from a set of somatic mutations.
import os, requests
API_KEY = os.environ["SCIROUTER_API_KEY"]
BASE = "https://api.scirouter.ai/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# Example: somatic mutations from a melanoma exome
# Each mutation: gene, wild-type peptide context, mutant peptide context, mutation label
mutations = [
    {
        "gene": "BRAF",
        # BRAF residues 597-614; the reference residue at position 600 is V
        "wild_type_sequence": "LATVKSRWSGSHQFEQLS",
        "mutant_sequence": "LATVKSRWSGSHQFEELS",
        "mutation": "Q612E",
    },
    {
        "gene": "NRAS",
        "wild_type_sequence": "MTEYKLVVVGAGGVGKSALT",
        "mutant_sequence": "MTEYKLVVVGADGVGKSALT",
        "mutation": "G12D",
    },
    {
        "gene": "TP53",
        # p53 residues 239-257; the reference residue at position 248 is R
        "wild_type_sequence": "NSSCMGGMNRRPILTIITL",
        "mutant_sequence": "NSSCMGGMNQRPILTIITL",
        "mutation": "R248Q",
    },
]
# Patient HLA alleles (determined from sequencing)
hla_alleles = ["HLA-A*02:01", "HLA-A*24:02", "HLA-B*07:02",
               "HLA-B*44:02", "HLA-C*05:01", "HLA-C*07:02"]
# Predict MHC-I binding for all mutations
response = requests.post(f"{BASE}/immunology/neoantigen-predict", headers=HEADERS, json={
    "mutations": mutations,
    "hla_alleles": hla_alleles,
    "peptide_lengths": [8, 9, 10, 11],
    "binding_threshold": 500,      # IC50 in nM
    "include_wild_type": True,     # compare mutant vs WT binding
})
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
results = response.json()
print(f"Total candidate peptides evaluated: {results['total_peptides']}")
print(f"Predicted binders (IC50 < 500 nM): {results['num_binders']}")
print(f"Strong binders (IC50 < 50 nM): {results['num_strong_binders']}")
print("\n=== Top Predicted Neoantigens ===")
for neo in results["neoantigens"][:10]:
    print(f"\nGene: {neo['gene']} ({neo['mutation']})")
    print(f" Peptide: {neo['peptide']}")
    print(f" HLA: {neo['hla_allele']}")
    print(f" Mutant IC50: {neo['mutant_ic50']:.1f} nM")
    print(f" Wild-type IC50: {neo['wildtype_ic50']:.1f} nM")
    print(f" DAI: {neo['differential_agretopicity']:.2f}")
    print(f" Immunogenicity score: {neo['immunogenicity_score']:.3f}")
The output includes the Differential Agretopicity Index (DAI), which measures the ratio of MHC binding affinity between the mutant and wild-type peptide (for example, wild-type IC50 divided by mutant IC50). A high DAI indicates that the mutation creates a peptide that binds MHC much more strongly than the wild-type version. This is a useful signal for immunogenicity: if the wild-type peptide is poorly presented, T cells that recognize the mutant peptide are less likely to have been removed by tolerance.
For a complete vaccine design workflow, filter the predicted neoantigens by binding strength (IC50 below 100 nM), DAI (above 5-fold), and immunogenicity score (above a model-specific threshold). Then select 10 to 20 diverse peptides covering multiple HLA alleles and multiple mutations for inclusion in a personalized vaccine.
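The filtering step can be sketched against records shaped like the `neoantigens` entries printed earlier. The DAI is computed here as wild-type IC50 divided by mutant IC50, which is an assumption about the direction of the ratio; the 0.5 immunogenicity cutoff is a placeholder for a model-specific threshold, and the records are invented for illustration.

```python
def dai(mutant_ic50, wildtype_ic50):
    """Fold improvement in binding of the mutant over the wild-type peptide."""
    return wildtype_ic50 / mutant_ic50


def filter_neoantigens(neoantigens, max_ic50=100.0, min_dai=5.0, min_score=0.5):
    """Apply binding, DAI, and immunogenicity filters; return best-scoring first."""
    kept = [
        n for n in neoantigens
        if n["mutant_ic50"] < max_ic50
        and dai(n["mutant_ic50"], n["wildtype_ic50"]) > min_dai
        and n["immunogenicity_score"] > min_score
    ]
    return sorted(kept, key=lambda n: n["immunogenicity_score"], reverse=True)


# Hypothetical prediction records (all values are illustrative)
example = [
    {"gene": "GENE_A", "hla_allele": "HLA-A*02:01",
     "mutant_ic50": 42.0, "wildtype_ic50": 610.0, "immunogenicity_score": 0.71},
    {"gene": "GENE_B", "hla_allele": "HLA-B*07:02",
     "mutant_ic50": 80.0, "wildtype_ic50": 95.0, "immunogenicity_score": 0.66},
    {"gene": "GENE_C", "hla_allele": "HLA-A*24:02",
     "mutant_ic50": 310.0, "wildtype_ic50": 4000.0, "immunogenicity_score": 0.80},
]

selected = filter_neoantigens(example)
print([n["gene"] for n in selected])
```

GENE_B is dropped because mutant and wild-type peptides bind almost equally well, and GENE_C because its mutant IC50 exceeds 100 nM despite a large DAI. Strict cutoffs like these trade recall for confidence, so for low-TMB tumors it may be reasonable to relax them.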
Clinical Context: Neoantigen Vaccines in the Clinic
As of early 2026, over 100 clinical trials testing neoantigen-based immunotherapies are registered on ClinicalTrials.gov. The approaches span mRNA vaccines, synthetic long peptide vaccines, dendritic cell vaccines loaded with neoantigen peptides, and adoptive T cell therapies targeting neoantigen-reactive T cells.
Among the most advanced programs is BioNTech's autogene cevumeran (BNT122), a personalized mRNA vaccine encoding up to 20 patient-specific neoantigens. In a Phase 1 trial for resected pancreatic ductal adenocarcinoma (published in Nature, May 2023), patients who mounted T cell responses to the vaccine had significantly delayed recurrence (median recurrence-free survival not reached vs. 13.4 months in non-responders). BioNTech and Genentech are now conducting a Phase 2 trial (IMCODE-003) in resected pancreatic ductal adenocarcinoma, combining the personalized vaccine with atezolizumab (an anti-PD-L1 checkpoint inhibitor) and chemotherapy.
Moderna's mRNA-4157 (V940) personalized cancer vaccine showed a 44% reduction in recurrence or death risk when combined with pembrolizumab in resected high-risk melanoma (KEYNOTE-942, Phase 2b). Based on these results, Moderna and Merck are advancing to a Phase 3 trial – the first randomized Phase 3 trial of a personalized neoantigen vaccine.
These trials demonstrate that neoantigen prediction pipelines like the one described in this article are no longer purely academic. They are driving clinical decisions about which peptides to include in patient-specific vaccines. The computational accuracy of neoantigen prediction directly affects vaccine efficacy – better predictions mean more immunogenic peptides in the vaccine, stronger T cell responses, and better clinical outcomes.
Combining Neoantigen Prediction with DarkScan
Standard neoantigen prediction focuses on the approximately 2% of the genome that encodes known proteins. But tumors also express antigens from the dark genome: cancer-testis antigens, endogenous retroviral proteins, and non-coding RNA-derived peptides that are normally silenced in adult tissues. SciRouter's DarkScan Studio extends neoantigen prediction to include these dark genome targets, dramatically expanding the pool of candidate vaccine antigens.
This is particularly valuable for low-TMB tumors where conventional neoantigens are scarce. A pancreatic cancer with only 30 non-synonymous mutations might yield 2 to 3 conventional neoantigens after MHC binding and immunogenicity filtering. But the same tumor may also express MAGE-A4, NY-ESO-1, and HERV-K-derived peptides that are strong MHC binders and highly immunogenic. By combining conventional neoantigen prediction with dark genome scanning, you can design a vaccine with 15 to 20 antigens even for low-TMB tumors.
# Predict conventional neoantigens from somatic mutations
neo_response = requests.post(f"{BASE}/immunology/neoantigen-predict",
                             headers=HEADERS, json={
    "mutations": mutations,
    "hla_alleles": hla_alleles,
    "peptide_lengths": [8, 9, 10, 11],
    "binding_threshold": 500,
})
neo_response.raise_for_status()  # stop on HTTP errors
neo_results = neo_response.json()
# Scan for dark genome antigens (cancer-testis, HERV, non-coding)
dark_response = requests.post(f"{BASE}/immunology/darkscan", headers=HEADERS, json={
    "cancer_type": "melanoma",
    "hla_alleles": hla_alleles,
    "include_cta": True,         # cancer-testis antigens
    "include_herv": True,        # endogenous retroviral antigens
    "include_noncoding": True,   # non-coding RNA-derived peptides
    "binding_threshold": 500,
})
dark_response.raise_for_status()  # stop on HTTP errors
dark_results = dark_response.json()
# Combine and rank all candidates
all_antigens = []
for neo in neo_results["neoantigens"]:
    all_antigens.append({
        "source": "somatic_mutation",
        "gene": neo["gene"],
        "peptide": neo["peptide"],
        "hla": neo["hla_allele"],
        "ic50": neo["mutant_ic50"],
        "score": neo["immunogenicity_score"],
    })
for dark in dark_results["antigens"]:
    all_antigens.append({
        "source": dark["antigen_class"],  # CTA, HERV, or noncoding
        "gene": dark["gene"],
        "peptide": dark["peptide"],
        "hla": dark["hla_allele"],
        "ic50": dark["ic50"],
        "score": dark["immunogenicity_score"],
    })
# Sort by immunogenicity score
all_antigens.sort(key=lambda x: x["score"], reverse=True)
print(f"Conventional neoantigens: {len(neo_results['neoantigens'])}")
print(f"Dark genome antigens: {len(dark_results['antigens'])}")
print(f"Total vaccine candidates: {len(all_antigens)}")
print("\n=== Top 20 Vaccine Candidates ===")
for i, ag in enumerate(all_antigens[:20]):
    print(f"{i+1}. [{ag['source']}] {ag['gene']} - {ag['peptide']}")
    print(f" HLA: {ag['hla']}, IC50: {ag['ic50']:.0f} nM, "
          f"Score: {ag['score']:.3f}")
The combined pipeline produces a ranked list of vaccine candidates from both conventional and dark genome sources. For vaccine design, select the top 15 to 20 candidates ensuring coverage across multiple HLA alleles and a mix of CD8+ (MHC-I) and CD4+ (MHC-II) epitopes. The requests above only evaluate MHC-I peptide lengths (8 to 11), so MHC-II candidates need a separate prediction run. This multi-source approach maximizes the breadth of the anti-tumor immune response and reduces the risk that any single antigen is lost due to immune editing.
From Predicted Neoantigens to Vaccine Design
Selecting the final set of neoantigens for a vaccine requires balancing several factors beyond raw immunogenicity score. First, include antigens presented by different HLA alleles to maximize the probability that at least some will be presented. Second, include both MHC-I and MHC-II epitopes for coordinated CD8+ and CD4+ T cell responses. Third, prioritize antigens from driver mutations (like BRAF V600E or KRAS G12D) that are essential for tumor survival, because the tumor cannot easily escape immune pressure by losing these mutations.
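One simple way to enforce the first criterion is a greedy pass over the ranked list with caps per HLA allele and per gene. The sketch below assumes candidate records with `gene`, `hla`, and `score` keys, as in the combined list built earlier; the cap values and the function name are arbitrary choices for illustration. It does not handle driver-mutation priority, which could be added by boosting the score of known drivers before ranking.

```python
from collections import Counter


def select_diverse(candidates, n_select=20, max_per_allele=4, max_per_gene=2):
    """Greedily pick top-scoring candidates while capping each allele and gene."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    per_allele, per_gene = Counter(), Counter()
    chosen = []
    for cand in ranked:
        if len(chosen) == n_select:
            break
        if per_allele[cand["hla"]] >= max_per_allele:
            continue  # this allele is already well represented
        if per_gene[cand["gene"]] >= max_per_gene:
            continue  # avoid relying too heavily on one mutation
        chosen.append(cand)
        per_allele[cand["hla"]] += 1
        per_gene[cand["gene"]] += 1
    return chosen


# Hypothetical candidates (scores are illustrative)
candidates = [
    {"gene": "G1", "hla": "HLA-A*02:01", "score": 0.9},
    {"gene": "G1", "hla": "HLA-A*02:01", "score": 0.8},
    {"gene": "G1", "hla": "HLA-B*07:02", "score": 0.7},
    {"gene": "G2", "hla": "HLA-A*02:01", "score": 0.6},
    {"gene": "G3", "hla": "HLA-B*07:02", "score": 0.5},
]

panel = select_diverse(candidates, n_select=3, max_per_allele=2, max_per_gene=1)
print([(c["gene"], c["hla"], c["score"]) for c in panel])
```

A greedy pass is not guaranteed to find the best possible panel, but it is easy to audit, which matters when the selection has to be reviewed before manufacturing.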
The vaccine modality affects antigen selection. mRNA vaccines (BioNTech/Moderna approach) encode the full mutant protein sequence or a concatenated string of neoantigen-containing peptide sequences. The patient's own cells then translate the mRNA, process the protein, and present the neoantigens on MHC molecules. This approach naturally generates both MHC-I and MHC-II epitopes from a single construct. Synthetic long peptide (SLP) vaccines deliver the peptides directly, requiring separate peptide synthesis for each neoantigen.
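Both modalities start from a longer sequence window around each mutation rather than from the minimal 8- to 11-mer epitope. The sketch below extracts a window of up to 27 residues centered on the mutated position and joins several windows into one amino acid string. The 27-residue length, the glycine-serine linker, and the function names are assumptions chosen for illustration, not a description of any company's construct.

```python
def centered_window(mutant_sequence, mutation_index, flank=13):
    """Return up to 2*flank + 1 residues centered on the mutation (0-based index).

    The window is truncated when the mutation lies near a protein terminus.
    """
    start = max(0, mutation_index - flank)
    end = min(len(mutant_sequence), mutation_index + flank + 1)
    return mutant_sequence[start:end]


def concatenate_epitopes(windows, linker="GGSGGGGSGG"):
    """Join peptide windows with a flexible linker (placeholder sequence)."""
    return linker.join(windows)


# NRAS residues 1-40 carrying G12D (D at 0-based index 11)
nras_g12d = "MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSY"
window = centered_window(nras_g12d, mutation_index=11)
print(window, len(window))

construct = concatenate_epitopes([window, window[:9]])
print(len(construct))
```

Because G12 sits close to the N-terminus, the window is truncated to 25 residues instead of 27. Keeping the natural flanking sequence lets the cell's own processing machinery generate both the short MHC-I and the longer MHC-II epitopes from the same stretch.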
SciRouter's vaccine design API can take a ranked list of neoantigen peptides and generate the optimal mRNA construct, including codon optimization, UTR design, and sequence ordering to maximize expression of all included neoantigens. This bridges the gap between computational neoantigen prediction and physical vaccine manufacturing.
Next Steps
Neoantigen prediction is one component of a broader immunotherapy design toolkit. Use MHC Binding Prediction for peptide-MHC affinity calculations, Vaccine Design for mRNA construct optimization, and Neoantigen Pipeline for the complete end-to-end workflow from mutations to vaccine candidates.
For a deeper look at immunotherapy targets beyond conventional neoantigens, read our guide on scanning the dark genome for cancer targets. For mRNA vaccine design principles, see the mRNA vaccine design guide.
Sign up for a free SciRouter API key and start predicting neoantigens today. With 500 free credits per month and no bioinformatics infrastructure to manage, SciRouter is the fastest path from tumor sequencing data to personalized immunotherapy candidates.