How Autosomal Matching Works

Your Raw Data File

The raw data file you downloaded from your testing lab is a spreadsheet with four or five columns. Files with five columns list the two alleles in separate columns; files with four columns combine them into a single column. The meaning is the same. Each file has between 500,000 and 1,000,000 rows.

  1. RSID: Reference SNP cluster ID. This number identifies the SNP uniquely. All labs use the same set of RSID numbers.
  2. Chromosome: The chromosome number, 1-23. Some formats use "X" instead of the number 23.
  3. Position: This is the position of the SNP within the chromosome. Most labs today use the position numbers from the Genome Reference Consortium's Build 37 (GRCh37).
  4. Allele: This is the value read by the lab at the given position (labeled "Result" in the example below). The valid values are A, C, G, and T. Some labs use other values to indicate that the equipment failed to read a particular location. Our matching system only looks at the four valid values and ignores anything else. Each SNP has two alleles; their order is not significant.

Example:

RSID        Chromosome  Position  Result
rs4477212   1           82154     AA
rs4970383   1           838555    CC
rs4475691   1           846808    CT
rs7537756   1           854250    AA
rs13302982  1           861808    GG
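
The column layout described above can be read with a few lines of code. The following is a minimal Python sketch, not the matching system's actual parser. The function name parse_raw_line is hypothetical, and the sketch assumes whitespace- or comma-separated fields, skips lines beginning with "#", and drops any row whose reading contains a value other than A, C, G, or T. Real lab files may differ in delimiters and header conventions.

```python
VALID_ALLELES = set("ACGT")


def parse_raw_line(line):
    """Parse one row into (rsid, chromosome, position, genotype).

    Returns None for blank lines, comments, header rows, and rows
    without two valid allele readings (an assumption of this sketch).
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None

    # Assumed delimiters: commas or any whitespace; quotes are discarded.
    fields = line.replace(",", " ").replace('"', " ").split()

    if len(fields) == 5:
        # Five-column layout: the two alleles are in separate columns.
        rsid, chromosome, position, allele_1, allele_2 = fields
        alleles = allele_1 + allele_2
    elif len(fields) == 4:
        # Four-column layout: both alleles share one column.
        rsid, chromosome, position, alleles = fields
    else:
        return None

    if not position.isdigit():
        return None  # header row such as "RSID Chromosome Position Result"

    if chromosome == "X":
        chromosome = "23"  # some formats use "X" instead of 23

    # Keep only rows where both readings are one of A, C, G, T.
    if len(alleles) != 2 or any(a not in VALID_ALLELES for a in alleles):
        return None

    # Allele order is not significant, so store the pair sorted.
    genotype = "".join(sorted(alleles))
    return rsid, chromosome, int(position), genotype


sample = """RSID Chromosome Position Result
rs4477212 1 82154 AA
rs4970383 1 838555 CC
rs4475691 1 846808 CT"""

rows = [row for row in map(parse_raw_line, sample.splitlines()) if row]
```

Sorting the two alleles means that "CT" and "TC" are stored identically, which reflects the point that allele order carries no meaning.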

Definitions

Kit 1                                                        Kit 2
RSID        Chromosome  Position  Result              Count  RSID        Chromosome  Position  Result
rs4477212   1           82154     AA      Match       1      rs4477212   1           82154     AA
rs4970383   1           838555    CC      Match       2      rs4970383   1           838555    CC
rs4475691   1           846808    CT      Half Match  2      rs4475691   1           846808    CC
rs7537756   1           854250    AA      Match       3      rs7537756   1           854250    AA
rs13302982  1           861808    GG      Mismatch    0      rs13302982  1           861808    TT
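
The three labels in the table can be reproduced with a simple rule. The rule is inferred from the rows above rather than stated in the text: two readings are a Match when both alleles agree (in either order), a Half Match when they share at least one allele, and a Mismatch when they share none. The Python sketch below applies that assumed rule. The function name classify_snp is hypothetical, and the sketch does not attempt to reproduce the Count column.

```python
def classify_snp(genotype_1, genotype_2):
    """Label a pair of two-letter genotypes, e.g. "CT" and "CC".

    Assumed rule (inferred from the table, not a stated definition):
      - "Match":      both alleles agree, ignoring order
      - "Half Match": at least one allele is shared
      - "Mismatch":   no allele is shared
    """
    alleles_1 = sorted(genotype_1)
    alleles_2 = sorted(genotype_2)

    if alleles_1 == alleles_2:
        return "Match"
    if set(alleles_1) & set(alleles_2):
        return "Half Match"
    return "Mismatch"


# The five rows of the table, as (Kit 1 result, Kit 2 result) pairs.
table_rows = [("AA", "AA"), ("CC", "CC"), ("CT", "CC"), ("AA", "AA"), ("GG", "TT")]
labels = [classify_snp(kit_1, kit_2) for kit_1, kit_2 in table_rows]
```

For the third row of the table, classify_snp("CT", "CC") returns "Half Match" because the two kits share the allele C but not T.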