How to Read Your 23andMe Raw Data File (Explained)

Your 23andMe raw data is a text file with hundreds of thousands of lines. Learn what the rsid, chromosome, position, and genotype columns mean and how to parse the data.

Ryan Bethencourt
April 9, 2026
8 min read

Your Raw Data File Is Just a Text File

When you download your raw data from 23andMe, you get a zipped text file containing hundreds of thousands of lines. It looks intimidating at first glance, but the format is straightforward once you understand the four columns. This guide explains exactly what each part means and how to work with the data.

The File Format

The raw data file is a tab-separated values (TSV) file. Lines starting with # are comments — they contain metadata like the chip version and download date. The data lines have four columns separated by tabs:

Example 23andMe raw data
# This data file generated by 23andMe at: Thu Jan 15 08:42:33 2026
# Below is a text version of your data.
# rsid	chromosome	position	genotype
rs3094315	1	752566	AG
rs12562034	1	768448	GG
rs3934834	1	1005806	CT
rs9442372	1	1018704	AG
rs3737728	1	1021415	GG
rs11260588	1	1021695	--

Each non-comment line represents one SNP (single nucleotide polymorphism), which is a position in your genome where people commonly differ. Depending on the chip version, a file has roughly 570,000 to 960,000 of these entries.
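Because each data line is just four tab-separated fields, reading the file into structured records takes only a few lines of Python. The sketch below is a minimal illustration, not an official parser. `Record` and `parse_23andme` are hypothetical names, and it assumes every valid data line has exactly four columns.

```python
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    """One data line of a 23andMe raw data file (hypothetical helper type)."""
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_23andme(lines: Iterable[str]) -> Iterator[Record]:
    """Yield a Record for each data line, skipping comments and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # metadata comments and the column-header comment
        parts = line.strip().split("\t")
        if len(parts) != 4:
            continue  # skip malformed lines instead of crashing
        rsid, chrom, pos, genotype = parts
        yield Record(rsid, chrom, int(pos), genotype)


# A few lines copied from the example above. With a real download you would
# pass the file handle from inside a `with open(...) as f:` block instead.
sample = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs3094315\t1\t752566\tAG\n"
    "rs11260588\t1\t1021695\t--\n"
)
for record in parse_23andme(sample.splitlines()):
    print(record.rsid, record.chromosome, record.position, record.genotype)
```

Yielding records one at a time keeps memory use low, which matters when a file has several hundred thousand lines.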

What the Columns Mean

rsid — The SNP Identifier

The first column is the reference SNP identifier (rsid), a unique label assigned by the dbSNP database. An rsid starts with “rs” followed by a number (like rs3094315). It is the universal name for that variant position: researchers, clinicians, and databases all use the same rsid to refer to the same genomic position. You can paste any rsid into the SNP Lookup tool to see what is known about it.

Note
Some entries use internal identifiers starting with “i” (like i3000001) instead of rs numbers. These are 23andMe custom probes that may not be in public databases. They are valid data points but harder to look up in external tools.
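When you process the whole file, it helps to separate public rsids from internal "i" identifiers before you query external databases. Here is a minimal sketch. `id_kind` is a hypothetical helper, and it assumes that the two prefixes described above are the only ones you care about. Any other value falls into an "other" bucket.

```python
import re


def id_kind(identifier: str) -> str:
    """Classify a first-column identifier by its prefix (hypothetical helper)."""
    if re.fullmatch(r"rs\d+", identifier):
        return "dbSNP"  # public rsid, searchable in external databases
    if re.fullmatch(r"i\d+", identifier):
        return "23andMe internal"  # custom probe, may be missing from public databases
    return "other"


for identifier in ["rs3094315", "i3000001"]:
    print(identifier, id_kind(identifier))
```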

chromosome — Where in the Genome

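The second column tells you which chromosome the SNP is on. Values range from 1 to 22 for the autosomes, plus X, Y, and MT (mitochondrial DNA). Chromosomes 1–22 are present in two copies (one from each parent), which is why most genotypes have two letters. The X chromosome has two copies in genetic females and one in genetic males. Y chromosome SNPs are inherited from the father and are present only in genetic males, while mitochondrial SNPs are inherited from the mother. Where only one copy exists (Y, MT, and X in genetic males), the genotype is typically reported as a single letter, and Y positions in genetic females appear as no-calls.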
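The copy-number rules above can be written as a small lookup. This is useful when you sanity-check genotype lengths. The sketch makes some assumptions: `expected_allele_count` is a hypothetical helper, genetic sex is passed as "female" or "male", and the X/Y pseudoautosomal regions (where genetic males carry two copies) are deliberately ignored.

```python
def expected_allele_count(chromosome: str, genetic_sex: str) -> int:
    """Alleles expected at a position, assuming genetic_sex is 'female' or 'male'.

    Simplification: ignores the pseudoautosomal regions, where genetic males
    carry two copies of sequence shared between X and Y.
    """
    if chromosome == "MT":
        return 1  # mitochondrial DNA is treated as haploid
    if chromosome == "Y":
        return 1 if genetic_sex == "male" else 0
    if chromosome == "X":
        return 2 if genetic_sex == "female" else 1
    return 2  # autosomes 1-22


for chrom in ["7", "X", "Y", "MT"]:
    print(chrom, expected_allele_count(chrom, "male"), expected_allele_count(chrom, "female"))
```

An expected count of 0 (Y in genetic females) is consistent with those positions showing up as no-calls.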

position — The Exact Coordinate

The third column is the base-pair position on the chromosome, using coordinates from the GRCh37 (hg19) human reference genome assembly. This number tells bioinformatics tools exactly where in the chromosome the variant falls. Position 752566 on chromosome 1 means the 752,566th nucleotide on chromosome 1 according to the GRCh37 reference. If you use tools that expect GRCh38 coordinates, you will need to convert positions using a liftover tool.
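Liftover tools such as UCSC liftOver commonly read BED files. BED uses 0-based, half-open intervals, while the position column here is 1-based. So the base at position p becomes the interval from p − 1 to p. The sketch below assumes UCSC-style "chr" names and skips mitochondrial positions, because mitochondrial naming and sequence differ between reference resources. `to_bed` is a hypothetical helper.

```python
def to_bed(chromosome: str, position: int) -> str:
    """Convert a 1-based GRCh37 position into a one-base BED interval.

    BED is 0-based and half-open, so the base at 1-based position p is the
    interval [p - 1, p). The "chr" prefix follows UCSC hg19 naming.
    """
    if chromosome == "MT":
        # Mitochondrial naming and sequence differ between resources; not handled here.
        raise ValueError("mitochondrial positions are not handled in this sketch")
    return f"chr{chromosome}\t{position - 1}\t{position}"


print(to_bed("1", 752566))
```

You would write one such line per SNP to a .bed file and pass it to your liftover tool together with a GRCh37-to-GRCh38 chain file.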

genotype — Your Two Alleles

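The fourth column is your genotype: the two nucleotide letters you carry at that position. Since you have two copies of each autosomal chromosome, you get two letters. The possible nucleotides are A (adenine), C (cytosine), G (guanine), and T (thymine). Some files also include insertion/deletion probes, whose genotypes use D (deletion) and I (insertion) instead of nucleotide letters.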

  • Homozygous — both alleles are the same (AA, CC, GG, TT). You inherited the same variant from both parents.
  • Heterozygous — the alleles differ (AG, CT, etc.). You inherited one variant from each parent.
  • No-call (--) — the chip could not determine your genotype at this position. This is normal and affects 1–3% of SNPs.
Note
The allele order in heterozygous genotypes (AG vs GA) is arbitrary and does not indicate which allele came from which parent. 23andMe reports alleles on the plus strand of the reference genome.
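The three categories above, plus the single-letter calls seen on haploid positions, can be told apart programmatically. Here is a minimal sketch. `classify_genotype` and `same_genotype` are hypothetical helpers, and both assume the genotype is either "--" or one or two allele letters.

```python
def classify_genotype(genotype: str) -> str:
    """Label a genotype string from the fourth column (hypothetical helper)."""
    if genotype == "--":
        return "no-call"
    if len(genotype) == 1:
        return "single allele"  # e.g. Y, MT, or X in genetic males
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


def same_genotype(g1: str, g2: str) -> bool:
    """Treat AG and GA as equal, since allele order carries no parental information."""
    return sorted(g1) == sorted(g2)


for g in ["AG", "GG", "--"]:
    print(g, classify_genotype(g))
```

`same_genotype` sorts the letters before comparing them, which reflects the point in the note above: the order of alleles in a heterozygous call is arbitrary.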

Understanding Effect Alleles

When a study reports that “the A allele at rs1234 is associated with increased risk,” the A is called the effect allele. If your genotype at rs1234 is AG, you carry one copy of the effect allele; if it is AA, you carry two copies; if it is GG, you carry none. The number of effect alleles you carry (0, 1, or 2) is called your dosage. For traits that follow an additive model, the effect scales with dosage, so two copies contribute twice the per-allele effect on the model's scale (for example, log-odds).
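Dosage is simply a count of the effect allele in your genotype string. Here is a minimal sketch using the hypothetical rs1234 example from the text. `effect_allele_dosage` is a hypothetical helper. It assumes that the study's effect allele is reported on the plus strand, which is how 23andMe reports genotypes. If a study uses the opposite strand, you would need to complement the allele first.

```python
from typing import Optional


def effect_allele_dosage(genotype: str, effect_allele: str) -> Optional[int]:
    """Count copies of the effect allele; None for a no-call.

    Assumes effect_allele is reported on the plus strand, like 23andMe genotypes.
    """
    if genotype == "--":
        return None
    return genotype.count(effect_allele)


# rs1234 and its effect allele "A" are the hypothetical example from the text
for genotype in ["AG", "AA", "GG", "--"]:
    print(genotype, effect_allele_dosage(genotype, "A"))
```

Under an additive model, you would multiply the dosage by the study's per-allele effect size. Returning None for no-calls, rather than 0, avoids treating a missing read as "no copies."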

Chip Versions and SNP Counts

23andMe has used several different Illumina genotyping arrays over the years. The chip version determines which SNPs are in your file:

  • Version 3 (2010–2013) — ~960,000 SNPs. Broadest coverage of any 23andMe chip.
  • Version 4 (2013–2017) — ~570,000 SNPs. Smaller but added 23andMe custom content.
  • Version 5 (2017–2020) — ~640,000 SNPs. More clinically relevant variants and pharmacogenomics coverage.
  • GSA chip (2020–present) — ~730,000 SNPs. Illumina Global Screening Array with custom additions.

Not all SNPs overlap between chip versions. A variant present on version 3 may be absent from version 5, which is why third-party tools sometimes report that certain SNPs are not available in your data.

How to Look Up a Specific SNP

The fastest way to understand what a specific SNP does is to search your raw data file for the rsid and then look it up. Unzip the download, open the text file in a text editor, use Ctrl+F (or Cmd+F on Mac) to search for the rsid, and note your genotype. Then paste the rsid into SciRouter's free SNP Lookup tool to get a plain-English explanation of the variant, associated traits, and what your genotype means.
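If you have more than a handful of rsids to check, a short script can replace manual searching. It also shows which variants your chip does not include. This is a minimal sketch. `lookup_genotypes` is a hypothetical helper, and the sample lines stand in for a real file.

```python
from typing import Dict, Iterable


def lookup_genotypes(lines: Iterable[str], rsids: Iterable[str]) -> Dict[str, str]:
    """Return {rsid: genotype} for every requested rsid present in the data."""
    wanted = set(rsids)
    found: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip metadata comments and blank lines
        parts = line.strip().split("\t")
        if len(parts) == 4 and parts[0] in wanted:
            found[parts[0]] = parts[3]
    return found


# With a real file: with open("genome_data.txt") as f: lookup_genotypes(f, [...])
sample = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs3094315\t1\t752566\tAG",
    "rs12562034\t1\t768448\tGG",
]
requested = ["rs3094315", "rs3934834"]
hits = lookup_genotypes(sample, requested)
missing = set(requested) - set(hits)
print("found:", hits)
print("not in this data:", missing)
```

Any rsid in `missing` is simply absent from the file. This is often a chip-version difference rather than an error.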

Counting and Parsing SNPs with Python

For a quick summary of your data, here is a Python snippet that parses the file and reports basic statistics:

Parse and summarize 23andMe raw data
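from collections import Counter

# Placeholder path: point this at your unzipped 23andMe download
path = "genome_data.txt"

chroms = Counter()  # number of SNPs seen on each chromosome
no_calls = 0        # entries whose genotype is "--"
total = 0

with open(path) as f:
    for line in f:
        # Skip metadata comments (including the column header) and blank lines
        if line.startswith("#") or not line.strip():
            continue
        parts = line.strip().split("\t")
        if len(parts) == 4:
            rsid, chrom, pos, genotype = parts
            total += 1
            chroms[chrom] += 1
            if genotype == "--":
                no_calls += 1

print(f"Total SNPs: {total:,}")
if total:  # avoid dividing by zero on an empty or unreadable file
    print(f"No-calls:   {no_calls:,} ({100 * no_calls / total:.1f}%)")
print("\nSNPs per chromosome:")
# Numbered chromosomes in numeric order first, then MT, X, Y alphabetically
for c in sorted(chroms, key=lambda x: (not x.isdigit(), int(x) if x.isdigit() else 0, x)):
    print(f"  chr{c}: {chroms[c]:,}")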

What You Can Learn from Your Data

Once you understand the file format, you can explore several areas:

  • Pharmacogenomics — how you metabolize medications. Try the Pharmacogenomics Checker.
  • Trait genetics — variants linked to taste, caffeine metabolism, earwax type, and more
  • Health-related variants — APOE, MTHFR, BRCA, and other well-studied positions
  • Regulatory variant effects — use AlphaGenome to predict how your variants affect gene regulation
  • Comprehensive analysis: upload your file for a full personal genomics dashboard

Ready to explore your genome? Start with the free SNP Lookup or sign up for a free SciRouter account to access the full genomics API.

Frequently Asked Questions

What does '--' mean in my genotype data?

A '--' genotype is a no-call, meaning the genotyping chip could not confidently determine your alleles at that position. This happens when the fluorescence signal is ambiguous or the DNA quality at that probe was low. No-calls are normal and typically affect 1-3% of SNPs. They usually reflect a failed read rather than a deletion or mutation, and you can ignore these entries in most analyses. If you are a genetic female, expect Y-chromosome positions to be no-calls, since there is no Y chromosome to read.

Why do I have fewer SNPs than expected?

The number of SNPs varies by chip version. Version 3 had approximately 960,000 SNPs, while versions 4 and 5 had fewer (570,000 and 640,000 respectively). The current GSA chip covers about 730,000. Additionally, no-calls reduce the usable count. If your file has significantly fewer than expected for your chip version, the download may have been corrupted — try re-downloading from your 23andMe account.

What is GRCh37?

GRCh37 (also called hg19) is a version of the human reference genome assembly published by the Genome Reference Consortium in 2009. It defines the coordinate system that 23andMe uses for SNP positions. When your file says a SNP is at position 752566 on chromosome 1, that means position 752,566 on the GRCh37 assembly. The newer GRCh38 (hg38) assembly uses different coordinates, so you may need to 'lift over' positions when using tools that expect GRCh38.

Can I convert my data to a different format?

Yes. The 23andMe text format can be converted to VCF (Variant Call Format), PLINK format, or other standard genomics formats using tools like bcftools, PLINK, or online converters. VCF is the standard format used by most bioinformatics pipelines and clinical genomics tools. The raw data file contains the rsid, chromosome, position, and genotype, but VCF also requires the reference allele at each position. The file does not include it, so converters typically look it up in a GRCh37 reference FASTA.
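The core of a VCF conversion is mapping your letters to allele indices relative to the reference base. The minimal sketch below shows that step for a single record. It is not a replacement for bcftools: `to_vcf_line` is a hypothetical helper, the reference base is passed in by hand (here "A" is an assumed value for illustration), the VCF header lines are omitted, and D/I indel calls are not handled.

```python
def to_vcf_line(rsid: str, chrom: str, pos: int, genotype: str, ref: str) -> str:
    """Build one VCF data line (GT only) from a 23andMe record and its reference base.

    Handles A/C/G/T calls (one or two letters) and "--"; D/I indel calls are not handled.
    """
    if genotype == "--":
        alts, gt = [], "./."
    else:
        alts = sorted({allele for allele in genotype if allele != ref})
        # VCF allele indices: 0 is the reference, 1.. are the ALT alleles in order
        index = {ref: 0, **{alt: i + 1 for i, alt in enumerate(alts)}}
        # Unphased genotype with sorted indices, so AG and GA give the same GT
        gt = "/".join(str(i) for i in sorted(index[allele] for allele in genotype))
    alt_field = ",".join(alts) if alts else "."
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
    return "\t".join([chrom, str(pos), rsid, ref, alt_field, ".", ".", ".", "GT", gt])


# ref="A" is an assumed reference base for illustration; look it up in a GRCh37 FASTA
print(to_vcf_line("rs3094315", "1", 752566, "AG", "A"))
```

Single-letter calls come out as haploid GT values such as "1", which VCF allows.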

What's the difference between 23andMe chip versions?

23andMe has used several Illumina genotyping chips over the years. Version 3 (2010-2013) tested ~960K SNPs with broad coverage. Version 4 (2013-2017) dropped to ~570K but added custom content. Version 5 (2017-2020) had ~640K SNPs with more clinically relevant variants. The current GSA-based chip covers ~730K SNPs. Chip versions affect which SNPs you have data for, which is why some third-party tools may report that certain variants are not available in your data.
