Why Build a Personal Genomics App
Millions of people have raw DNA data files from 23andMe or AncestryDNA sitting on their hard drives. These files contain hundreds of thousands of genotype calls, but without interpretation they are just rows of rsIDs and letter pairs. Building a personal genomics app lets you turn that raw data into actionable trait reports, drug interaction alerts, and ancestry insights.
This tutorial walks through the full pipeline: parsing a 23andMe raw data file with Python, querying the SciRouter SNP annotation API, matching user genotypes against known associations, and assembling a trait report. By the end you will have a working script that produces a structured JSON report from any 23andMe file.
Step 1: Parse the 23andMe Raw Data File
A 23andMe raw data file is a tab-separated text file. Lines starting with # are comments. Each data line has four columns: rsID, chromosome, position, and genotype. Here is a Python function that parses it into a dictionary keyed by rsID:
def parse_23andme(filepath: str) -> dict[str, dict]:
    """Parse a 23andMe raw data file into a dict keyed by rsID."""
    genotypes = {}
    with open(filepath) as f:
        for line in f:
            if line.startswith("#") or line.strip() == "":
                continue
            parts = line.strip().split("\t")
            if len(parts) < 4:
                continue
            rsid, chrom, pos, geno = parts[0], parts[1], parts[2], parts[3]
            if rsid.startswith("rs"):
                genotypes[rsid] = {
                    "chromosome": chrom,
                    "position": int(pos),
                    "genotype": geno,
                }
    return genotypes

# Usage
user_snps = parse_23andme("genome_data.txt")
print(f"Parsed {len(user_snps)} SNPs")

A typical 23andMe v5 file contains around 640,000 SNPs. The parser runs in under a second on modern hardware. All processing happens locally, so the raw file never leaves your machine.
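23andMe writes the placeholder genotype `--` for positions it could not read, and the parser above keeps those rows. The sketch below shows one way to separate called from uncalled SNPs and compute the call rate. The helper name `split_calls` is hypothetical, and the rsIDs and positions in the sample are made up for illustration.

```python
def split_calls(genotypes: dict[str, dict]) -> tuple[dict[str, dict], float]:
    """Return (called SNPs, call rate), treating any genotype containing '-' as a no-call."""
    called = {
        rsid: rec
        for rsid, rec in genotypes.items()
        if "-" not in rec["genotype"]
    }
    # Guard against an empty input so the division is always defined
    call_rate = len(called) / len(genotypes) if genotypes else 0.0
    return called, call_rate


# Hand-made sample in the shape returned by parse_23andme (values are illustrative)
sample = {
    "rs0000001": {"chromosome": "1", "position": 1000, "genotype": "CT"},
    "rs0000002": {"chromosome": "1", "position": 2000, "genotype": "--"},
}
called, rate = split_calls(sample)
print(f"{len(called)} of {len(sample)} SNPs called ({rate:.0%})")
# prints: 1 of 2 SNPs called (50%)
```

Passing only the called SNPs to the later steps keeps no-calls from showing up as matches with no interpretation.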
Step 2: Fetch the SNP Annotation Catalog
SciRouter provides a free, unauthenticated endpoint that returns the full catalog of curated SNP annotations. Each entry includes the rsID, trait or condition, risk allele, genotype interpretations, category, and data source references.
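Because later steps index into each entry by key, it can help to check entries before using them. The sketch below assumes the key names that appear in the sample entry shown later in this step (`rsid`, `trait`, `category`, `genotype_effects`). The helper `is_usable_annotation` is hypothetical, and the real API may return additional or differently named fields.

```python
# Keys that the matching code in Step 3 reads with ann[...] rather than ann.get(...)
REQUIRED_KEYS = ("rsid", "trait", "category")


def is_usable_annotation(entry: dict) -> bool:
    """True if required fields are non-empty strings and genotype_effects, if present, is a dict."""
    for key in REQUIRED_KEYS:
        value = entry.get(key)
        if not isinstance(value, str) or not value:
            return False
    return isinstance(entry.get("genotype_effects", {}), dict)


# Hand-made entries: the second one is missing its category
entries = [
    {
        "rsid": "rs4988235",
        "trait": "Lactose Tolerance",
        "category": "Nutrition",
        "genotype_effects": {"TT": "Lactase persistent (tolerant)"},
    },
    {"rsid": "rs4988235", "trait": "Lactose Tolerance"},
]
usable = [e for e in entries if is_usable_annotation(e)]
print(f"{len(usable)} of {len(entries)} entries usable")
# prints: 1 of 2 entries usable
```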
import requests

response = requests.get(
    "https://api.scirouter.ai/v1/personal-genomics/annotations",
    timeout=30,  # avoid hanging indefinitely if the server does not respond
)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
annotations = response.json()["annotations"]
print(f"Fetched {len(annotations)} annotated SNPs")

# Each annotation looks like:
# {
#   "rsid": "rs4988235",
#   "gene": "MCM6/LCT",
#   "trait": "Lactose Tolerance",
#   "category": "Nutrition",
#   "risk_allele": "T",
#   "genotype_effects": {
#     "TT": "Lactase persistent (tolerant)",
#     "CT": "Likely tolerant (carrier)",
#     "CC": "Lactose intolerant (ancestral)"
#   },
#   "source": "GWAS Catalog"
# }

Step 3: Match User Genotypes Against Annotations
With the parsed genotypes and the annotation catalog in hand, matching is straightforward. For each annotated SNP, check whether the user has a genotype call, then look up the interpretation:
def match_genotypes(user_snps: dict, annotations: list) -> list:
    """Match user genotypes against annotated SNPs."""
    results = []
    for ann in annotations:
        rsid = ann["rsid"]
        if rsid not in user_snps:
            continue
        user_geno = user_snps[rsid]["genotype"]
        # Normalize allele order on both sides so "TC" and "CT" compare equal
        sorted_geno = "".join(sorted(user_geno))
        effects = {
            "".join(sorted(key)): text
            for key, text in ann.get("genotype_effects", {}).items()
        }
        interpretation = effects.get(sorted_geno, "No interpretation available")
        results.append({
            "rsid": rsid,
            "gene": ann.get("gene", ""),
            "trait": ann["trait"],
            "category": ann["category"],
            "genotype": user_geno,
            "interpretation": interpretation,
            "risk_allele": ann.get("risk_allele", ""),
        })
    return results

matches = match_genotypes(user_snps, annotations)
print(f"Matched {len(matches)} SNPs with annotations")

Step 4: Build the Trait Report
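Before grouping, it can be useful to record how many copies of the risk allele each match carries, since every match from Step 3 keeps both the `genotype` and the `risk_allele`. The sketch below is one way to do that. `add_risk_allele_count` and the `risk_allele_count` key are hypothetical additions, the second rsID is made up, and the count is only meaningful when the catalog reports the risk allele on the same strand as the 23andMe call.

```python
def add_risk_allele_count(matches: list[dict]) -> list[dict]:
    """Return copies of the matches with the number of risk-allele copies (None if unknown)."""
    enriched = []
    for m in matches:
        risk = m.get("risk_allele", "")
        geno = m.get("genotype", "")
        # Only count single-letter alleles in genotypes without no-call placeholders
        if len(risk) == 1 and geno and "-" not in geno:
            count = geno.count(risk)
        else:
            count = None
        enriched.append({**m, "risk_allele_count": count})
    return enriched


# Hand-made matches in the shape produced by match_genotypes (trimmed to the keys used here)
example = [
    {"rsid": "rs4988235", "genotype": "CT", "risk_allele": "T"},
    {"rsid": "rs0000000", "genotype": "--", "risk_allele": "A"},
]
for row in add_risk_allele_count(example):
    print(row["rsid"], row["risk_allele_count"])
# prints:
# rs4988235 1
# rs0000000 None
```

Note that `build_report` copies a fixed set of keys from each match, so the count would need to be added there to appear in the final report.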
Group the matched results by category to produce a structured report. This gives you one section per category value in the catalog, such as traits, pharmacogenomics, health markers, and ancestry:
from collections import defaultdict
import json

def build_report(matches: list) -> dict:
    """Group matches into a categorized report."""
    report = defaultdict(list)
    for m in matches:
        report[m["category"]].append({
            "rsid": m["rsid"],
            "gene": m["gene"],
            "trait": m["trait"],
            "genotype": m["genotype"],
            "interpretation": m["interpretation"],
        })
    return dict(report)

report = build_report(matches)
print(json.dumps(report, indent=2))

# Output structure:
# {
#   "Traits": [...],
#   "Pharmacogenomics": [...],
#   "Health": [...],
#   "Ancestry": [...],
#   "Neanderthal": [...]
# }

The report typically contains 200-300 matched SNPs across all categories, depending on the user's chip version and call rate. You can render this as a web dashboard, export to PDF, or feed it into downstream analysis tools.
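As a starting point for those downstream uses, the report can be written to disk and summarized per category. This is a minimal sketch using only the standard library. The function name `save_report` and the output filename are illustrative, and the demo report is hand-made in the shape that `build_report` produces.

```python
import json
import tempfile
from pathlib import Path


def save_report(report: dict[str, list], path: Path) -> dict[str, int]:
    """Write the report as JSON and return the number of matched SNPs per category."""
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return {category: len(items) for category, items in report.items()}


# Hand-made report with a single category and a single entry
demo_report = {
    "Nutrition": [
        {
            "rsid": "rs4988235",
            "gene": "MCM6/LCT",
            "trait": "Lactose Tolerance",
            "genotype": "CT",
            "interpretation": "Likely tolerant (carrier)",
        },
    ],
}
# The temp directory is used here only so the demo does not clutter the working directory
out_path = Path(tempfile.gettempdir()) / "trait_report.json"
counts = save_report(demo_report, out_path)
print(counts)
# prints: {'Nutrition': 1}
```

A JSON file on disk is a convenient hand-off point: a dashboard or PDF generator can read it without re-running the parsing and matching steps.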
Next Steps
- Try the full pipeline in the SciRouter Personal Genomics Lab, which handles all of this in the browser with a drag-and-drop interface
- Use the free SNP lookup tool to query individual variants
- Read the SNP annotation API reference for filtering and advanced query parameters
- Explore the upload tutorial for the no-code approach