What Is a SNP?
A single nucleotide polymorphism (SNP, pronounced “snip”) is a position in the human genome where a single DNA base letter varies between individuals. The human genome is about 3.2 billion base pairs long, and any two people differ at roughly 4–5 million of those positions. SNPs are the most common form of genetic variation and the basis of most consumer genetic tests.
Each SNP is identified by an rsID — a reference number from the dbSNP database maintained by NCBI. For example, rs12913832 is a SNP in the HERC2/OCA2 region that strongly influences eye color. When you download your 23andMe or AncestryDNA raw data, it is essentially a long list of rsIDs paired with your genotype at each position.
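As a rough illustration of what "a long list of rsIDs paired with your genotype" looks like in practice, the sketch below parses a small raw-data snippet into a dictionary. It assumes a 23andMe-style layout: tab-separated columns for rsID, chromosome, position, and genotype, with comment lines starting with `#`. Other providers use different columns, so treat the function name `parse_raw_genotypes` and the sample rows as hypothetical.

```python
import csv
import io


def parse_raw_genotypes(text):
    """Return {rsid: genotype} from 23andMe-style raw data text (assumed layout)."""
    # Keep only data lines: drop blank lines and '#' comment lines.
    data_lines = [
        line for line in io.StringIO(text)
        if line.strip() and not line.startswith("#")
    ]
    genotypes = {}
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) < 4:
            continue  # skip rows that do not have all four expected columns
        rsid, _chromosome, _position, genotype = row[:4]
        genotypes[rsid] = genotype
    return genotypes


# Illustrative sample only: the positions and genotypes are placeholders,
# and rs0000001 is a made-up identifier. "--" stands for a no-call.
sample = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs12913832\t15\t28365618\tAG\n"
    "rs0000001\t1\t12345\t--\n"
)

calls = parse_raw_genotypes(sample)
print(calls["rs12913832"])
```

A dictionary keyed by rsID makes single-SNP lookups cheap, which matches how most people use these files: one rsID at a time.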
How Genotyping Works
Consumer genetic tests like 23andMe and AncestryDNA use microarray chips to read your DNA. Unlike whole genome sequencing (which reads every base), microarrays measure a pre-selected set of SNP positions — typically 600,000 to 700,000 sites chosen for their relevance to health, traits, ancestry, and research.
The process works by hybridization: your fragmented DNA is washed over a chip containing millions of short DNA probes. Each probe is designed to bind specifically to one allele at a SNP position. Fluorescent signals indicate which allele (or alleles) are present. The raw data file you download contains the results of this process for each measured position.
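To make the step from fluorescent signal to genotype more concrete, here is a deliberately simplified sketch. It assumes each SNP yields two intensities, one per allele, and calls the genotype from the share of signal coming from the second allele. The thresholds (`min_total`, `het_band`) and the function name are invented for illustration. Production pipelines typically cluster signals across many samples rather than using fixed cutoffs, so this is not any vendor's actual algorithm.

```python
def call_genotype(intensity_a, intensity_b, allele_a="A", allele_b="G",
                  min_total=0.2, het_band=(0.3, 0.7)):
    """Toy genotype call from two allele-specific signal intensities.

    All thresholds are hypothetical values chosen for this example.
    """
    total = intensity_a + intensity_b
    if total < min_total:
        return "--"  # signal too weak to call either allele

    # Fraction of the total signal attributable to allele B.
    frac_b = intensity_b / total
    if frac_b < het_band[0]:
        return allele_a * 2          # mostly A signal: homozygous A
    if frac_b > het_band[1]:
        return allele_b * 2          # mostly B signal: homozygous B
    return allele_a + allele_b       # balanced signal: heterozygous


for a, b in [(1.0, 0.05), (0.5, 0.5), (0.02, 0.9), (0.05, 0.05)]:
    print(a, b, call_genotype(a, b))
```

The no-call branch mirrors what appears in raw data files when a position could not be read reliably.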
Reading Genotype Data
Your genotype at each SNP is reported as two letters, one for each of your two copies of that chromosome (humans are diploid). There are three possible genotypes at a biallelic SNP:
- Homozygous reference — Both copies match the reference allele (e.g., AA). You carry zero copies of the variant.
- Heterozygous — One copy of each allele (e.g., AG). You carry one copy of the variant.
- Homozygous variant — Both copies are the alternative allele (e.g., GG). You carry two copies of the variant.
The order of letters (AG vs GA) does not matter — they represent the same heterozygous genotype. You cannot tell which allele came from which parent using microarray data alone; that requires phasing analysis or parental genotyping.
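The three genotype classes above can be expressed as a small function. This is a minimal sketch: `classify_genotype` is a hypothetical helper, and it assumes a biallelic SNP whose reference and alternative alleles you already know. Because it counts letters rather than comparing strings, AG and GA give the same answer.

```python
def classify_genotype(genotype, ref, alt):
    """Return (label, variant_copies) for a biallelic SNP, or None if not callable."""
    alleles = genotype.upper()
    # Reject no-calls ("--"), single-letter calls, and unexpected alleles.
    if len(alleles) != 2 or any(base not in (ref, alt) for base in alleles):
        return None

    variant_copies = alleles.count(alt)
    labels = ("homozygous reference", "heterozygous", "homozygous variant")
    return labels[variant_copies], variant_copies


for genotype in ("AA", "AG", "GA", "GG", "--"):
    print(genotype, classify_genotype(genotype, ref="A", alt="G"))
```

Returning `None` for anything unexpected keeps no-calls from being mistaken for a real genotype.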
Reference vs Alternative Alleles
Every SNP has a reference allele (the base in the GRCh37 or GRCh38 human reference genome) and one or more alternative alleles. It is important to understand that the reference allele is not necessarily the “normal” or most common variant. The reference genome was assembled from a small number of individuals and may not represent the global majority at every position.
For example, at some SNP positions, the alternative allele may be more common globally than the reference allele. When reading research papers or annotations, pay attention to which allele is reported as the effect allele — the one associated with a trait or outcome — rather than assuming the reference allele is the baseline.
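The distinction between reference allele and effect allele is easy to get wrong in code. The sketch below counts copies of the effect allele directly, using a made-up annotation in which the effect allele happens to be the reference allele. The record fields and the function name are assumptions for illustration, not a real annotation format.

```python
def effect_allele_copies(genotype, effect_allele):
    """Count how many copies of the effect allele a two-letter genotype carries."""
    return genotype.upper().count(effect_allele.upper())


# Hypothetical annotation: the trait-associated (effect) allele is the
# reference allele A, not the alternative allele G.
annotation = {"ref": "A", "alt": "G", "effect_allele": "A"}

for genotype in ("AA", "AG", "GG"):
    copies = effect_allele_copies(genotype, annotation["effect_allele"])
    print(genotype, "carries", copies, "effect allele copies")
```

In this made-up case a homozygous reference genotype carries two copies of the effect allele, which is the opposite of what you would conclude by treating the reference allele as the baseline.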
Population Frequencies
Allele frequencies vary significantly across global populations. A variant that is common in one population may be rare in another. Population frequency data (from projects like gnomAD and the 1000 Genomes Project) is typically reported for major continental groups:
- EUR — European ancestry populations
- AFR — African ancestry populations
- EAS — East Asian ancestry populations
- SAS — South Asian ancestry populations
- AMR — Admixed American populations
Understanding population frequency is critical for interpreting whether your genotype is common or rare in your ancestral background, and for contextualizing risk associations that may have been studied primarily in one population.
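One way to judge whether a genotype is common or rare in a given group is to convert an allele frequency into expected genotype frequencies. The sketch below does this under Hardy-Weinberg proportions, which is a simplifying assumption not stated in the text above. The allele frequencies are placeholder numbers, not values taken from gnomAD or the 1000 Genomes Project.

```python
def genotype_frequencies(alt_freq):
    """Expected genotype frequencies under Hardy-Weinberg proportions (assumed)."""
    q = alt_freq          # frequency of the alternative allele
    p = 1.0 - alt_freq    # frequency of the reference allele
    return {
        "homozygous reference": p * p,
        "heterozygous": 2 * p * q,
        "homozygous variant": q * q,
    }


# Placeholder alternative-allele frequencies, invented for this example.
hypothetical_alt_freq = {"EUR": 0.40, "AFR": 0.05, "EAS": 0.10}

for population, freq in hypothetical_alt_freq.items():
    expected = genotype_frequencies(freq)
    summary = ", ".join(f"{name}: {value:.1%}" for name, value in expected.items())
    print(f"{population}: {summary}")
```

With these made-up inputs, the same homozygous variant genotype would be fairly ordinary in one group and very rare in another, which is the point of checking frequencies against your own ancestral background.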
Confidence Levels in Genomic Research
Not all SNP-trait associations are equally well-supported. When evaluating SNP annotations, consider:
- Sample size — Large genome-wide association studies (GWAS) with tens of thousands of participants provide stronger evidence than small candidate gene studies.
- Replication — Findings replicated across multiple independent studies and populations are more reliable.
- Effect size — Most common SNPs have small individual effects. Claims of a single SNP “determining” a complex trait should be treated with skepticism.
- Study design — GWAS provide statistical associations, not proof of causation. Functional studies that demonstrate a biological mechanism provide stronger evidence.
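To see why a small effect size matters, it can help to translate a reported odds ratio into absolute risk. The sketch below does that arithmetic under two assumptions that are not in the text above: each copy of the effect allele multiplies the odds by the same factor, and the baseline risk applies to people with zero copies. The baseline risk and odds ratio are hypothetical numbers.

```python
def risk_with_variant(baseline_risk, odds_ratio, copies):
    """Absolute risk after applying a per-allele odds ratio `copies` times."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    odds = baseline_odds * odds_ratio ** copies
    return odds / (1.0 + odds)   # convert odds back to a probability


# Hypothetical inputs: 10% baseline risk, per-allele odds ratio of 1.1.
for copies in (0, 1, 2):
    risk = risk_with_variant(baseline_risk=0.10, odds_ratio=1.1, copies=copies)
    print(f"{copies} copies: {risk:.1%}")
```

With inputs like these, the change in absolute risk per copy is about one percentage point, which is why a single common SNP rarely settles anything about a complex trait on its own.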
Start Exploring Your SNPs
Ready to look up individual SNPs from your raw data? The free SNP Lookup tool lets you search any rsID and see trait associations, population frequencies, effect alleles, and research confidence levels. For a broader analysis, upload your full raw data file through the Genomics Dashboard.
Developers can access the full SNP annotation catalog via the SciRouter Genomics API — sign up for a free API key to get started.