How can we use DNA to determine ancestry and disease risk?

Reading Time – 9 Minutes, Difficulty Level 3/5

What may previously have been deemed science fiction is becoming an accepted and well-understood reality. The ability to discover who our distant ancestors were, or which diseases and traits we may be predisposed to, is a service millions of people have paid for in 2023. By ordering a kit from companies such as Ancestry.com or 23andMe, you can now have your DNA analysed from a small saliva sample.

These companies are increasingly popular with people who want to learn more about their ancestral story or their risk of developing diseases. Their services have also been used for controversial purposes, such as criminal investigations. We will return to these applications later, but first: how do these services work?

How these services work

After you sign up and purchase a kit from one of the common genealogy companies, it arrives in the mail. The kit includes a saliva collection tube, which is precisely what it sounds like: a tube for collecting your saliva sample. Saliva is an excellent source of DNA, the molecule needed to infer ancestry or disease risk. Once you return your sample, the company isolates the DNA and reads it. Most consumer companies do this with a genotyping chip that reads hundreds of thousands of preselected positions, rather than sequencing the entire genome.

Once the DNA has been read, experts at these companies, aided by computer programs, analyse specific portions of your genome. Specifically, they zoom in on regions of the human genome known to vary a great deal between populations. These variations are called Single Nucleotide Polymorphisms (SNPs): common changes in a single nucleotide (a subunit of DNA) at a specific location in the genome. SNPs are examined both individually and in groups.
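
To make the idea of a SNP concrete, here is a minimal Python sketch that scans a few short, made-up DNA sequences and reports the positions where they differ. It assumes the sequences are already aligned (position 0 in one corresponds to position 0 in the others); the function name `find_snps` and the sequences are hypothetical, and real pipelines compare against a reference genome across millions of positions.

```python
def find_snps(sequences: list[str]) -> dict[int, set[str]]:
    """Return {position: set of observed nucleotides} for every position
    where the aligned sequences do not all carry the same nucleotide."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for pos in range(length):
        alleles = {s[pos] for s in sequences}
        if len(alleles) > 1:  # more than one nucleotide seen here: a variable site
            snps[pos] = alleles
    return snps


# Hypothetical, already-aligned sequences from three people.
samples = [
    "ACGTAGCTA",
    "ACGTTGCTA",
    "ACCTAGCTA",
]
for pos, alleles in find_snps(samples).items():
    print(pos, sorted(alleles))
```

Only positions 2 and 4 vary here; every other position is identical in all three people, which is why the analysis can focus on the variable sites.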

A haplotype is a group of SNPs that are typically inherited together. An example of how an individual’s haplotype is determined is shown below. In this example, each SNP sits at a different location in the genome, but the locations are close enough together that they are rarely separated by recombination during meiosis (the cell division that produces sperm and egg cells, during which chromosomes exchange segments and create new combinations of DNA sequences). After analysing someone’s DNA sample, the genealogists have a large catalogue of the individual’s haplotypes. This catalogue is used to determine ancestry or disease risk.

Person    | SNP 1 | SNP 2 | SNP 3 | Haplotype
Person 1  | A     | G     | C     | AGC
Person 2  | T     | T     | A     | TTA
Person 3  | T     | G     | G     | TGG
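
The haplotype column in the table is simply each person’s alleles at SNP 1, SNP 2 and SNP 3 read in order. A minimal Python sketch of that step, using the alleles from the table (the dictionary layout and the `haplotype` helper are hypothetical):

```python
# Alleles at SNP 1, SNP 2 and SNP 3 for each person, taken from the table.
snp_alleles = {
    "Person 1": ["A", "G", "C"],
    "Person 2": ["T", "T", "A"],
    "Person 3": ["T", "G", "G"],
}


def haplotype(alleles: list[str]) -> str:
    """Join the alleles at neighbouring SNPs into a single haplotype string."""
    return "".join(alleles)


haplotypes = {person: haplotype(a) for person, a in snp_alleles.items()}
print(haplotypes)  # {'Person 1': 'AGC', 'Person 2': 'TTA', 'Person 3': 'TGG'}
```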

How is ancestry or disease risk determined from a DNA sequence?

Using the list of haplotypes and SNPs, genealogists compare an individual’s haplotypes with a database of people of known ancestry, traits, or diseases. For example, if you share many haplotypes with individuals from France, you likely have substantial French ancestry. For people of mixed ancestry, results are reported as percentages.

For instance, if about 55% of your haplotypes match a population from France and 45% match a Peruvian population, your ancestry would be reported as 55% French and 45% Peruvian. Disease risk is analysed in a similar way. If you carry haplotype “C” and haplotype “Y”, which are found much more often in people with disease X than in people without it, you may be considered at increased risk for disease X. In reality, the comparisons are much more complex, and many groups of haplotypes are considered when estimating ancestry or disease risk.
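
The 55% French / 45% Peruvian example can be sketched as counting which reference population each of a person’s haplotypes matches. This is a deliberately simplified Python illustration, assuming a hypothetical one-to-one `reference_panel` lookup and a made-up person; real services use much larger panels and statistical models rather than a simple lookup.

```python
from collections import Counter

# Hypothetical reference panel mapping each haplotype to the population it is
# most characteristic of. Real panels are far larger and use frequencies.
reference_panel = {
    "AGC": "French",
    "TTA": "French",
    "GGA": "French",
    "TGG": "Peruvian",
    "CAT": "Peruvian",
}


def ancestry_percentages(haplotypes: list[str], panel: dict[str, str]) -> dict[str, float]:
    """Share of a person's matched haplotypes assigned to each population, in %."""
    matched = [panel[h] for h in haplotypes if h in panel]  # ignore unknown haplotypes
    counts = Counter(matched)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pop: 100 * n / total for pop, n in counts.items()}


# A made-up person with 20 haplotypes: 11 match French, 9 match Peruvian.
person = ["AGC"] * 6 + ["TTA"] * 3 + ["GGA"] * 2 + ["TGG"] * 5 + ["CAT"] * 4
print(ancestry_percentages(person, reference_panel))  # {'French': 55.0, 'Peruvian': 45.0}
```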

Why do people with common ancestry have similar haplotypes?

Before planes, trains, and automobiles were invented, populations typically stayed in one location for generations. This meant that variations were passed down within the same population and could become more common over time. For example, suppose one person carries a variation and has six children. Each child has a chance of inheriting it (about 50% per child if the parent carries it on one of their two chromosome copies), it is even possible that all six inherit it, and every child who does can pass it on to their own children, allowing the variation to spread through the population.
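
The six-children example can be worked through with the binomial distribution. This Python sketch assumes one parent carries the variation on one of their two chromosome copies and the other parent does not carry it, so each child independently has a 1/2 chance of inheriting it; the function name is hypothetical.

```python
from math import comb

# Assumption: each child independently inherits the variant with probability 1/2.


def prob_exactly_k_inherit(k: int, children: int = 6, p: float = 0.5) -> float:
    """Binomial probability that exactly k of the children inherit the variant."""
    return comb(children, k) * p**k * (1 - p) ** (children - k)


all_six = prob_exactly_k_inherit(6)
at_least_one = 1 - prob_exactly_k_inherit(0)
print(f"All six inherit it: {all_six:.1%}")             # 1.6%
print(f"At least one inherits it: {at_least_one:.1%}")  # 98.4%
```

So all six children carrying the variation is possible but unlikely (1 in 64), while it is very likely that at least one child carries it forward.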

Of course, this is a very simple explanation of the genetic preservation that occurs in more isolated populations. After many generations of populations living in the same general area, genetic clues are left behind, even after populations move to different cities, countries, or continents. These genetic clues are SNPs, and the genealogists are the detectives that put the clues together and uncover the origins of your ancestors. 

Things to note

Firstly, it is important to note that only specific areas of the genome, where common SNPs are found, are analysed. Despite small differences between populations, we are all human and share the vast majority of our DNA. Comparing every section of the genome would therefore add little, because most positions are identical from person to person. The purpose of looking at SNPs is to find the differences that exist between populations and use them to classify individuals by shared genetic ancestry.

One main limitation of DNA variant analysis for tracing ancestry is that it becomes less reliable the more mixed a person’s ancestry is. No single variation can determine ancestry; instead, genealogists look for combinations of many variations that typically occur together in one population.

When you are of very mixed ancestry, discerning these patterns becomes more difficult, so you may be assigned to the wrong populations. It is also important to remember that these ancestry mapping technologies depend on comparing your genome with those of individuals of known ancestry. If the database doesn’t include people of your ancestry, you’ll likely be assigned to a nearby population. In fact, it is common for the initial ancestry percentages to change as the database grows and is refined.
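
Why a missing reference population leads to assignment to a “nearby” one can be illustrated with a simple likelihood comparison. In this Python sketch, all population names and variant frequencies are invented; it treats SNPs as independent and each person as carrying one allele per SNP (real genomes carry two), so it is a teaching simplification rather than any company’s actual method.

```python
import math

# Hypothetical reference panel: frequency of a variant allele at three SNPs.
variant_freqs = {
    "Population 1": [0.90, 0.80, 0.70],
    "Population 2": [0.75, 0.65, 0.60],  # genetically close to Population 1
    "Population 3": [0.10, 0.20, 0.30],
}


def log_likelihood(carries_variant: list[bool], freqs: list[float]) -> float:
    """Log-probability of a person's alleles given one population's variant
    frequencies, treating SNPs as independent (a simplification)."""
    return sum(
        math.log(f if carries else 1 - f)
        for carries, f in zip(carries_variant, freqs)
    )


def assign_population(carries_variant: list[bool], panel: dict[str, list[float]]) -> str:
    """Return the reference population under which the alleles are most likely."""
    return max(panel, key=lambda pop: log_likelihood(carries_variant, panel[pop]))


person = [True, True, True]  # carries the variant at all three SNPs
full_result = assign_population(person, variant_freqs)

# Drop Population 1, as if the database contained no one from it.
reduced_panel = {p: f for p, f in variant_freqs.items() if p != "Population 1"}
reduced_result = assign_population(person, reduced_panel)

print(full_result)     # Population 1
print(reduced_result)  # Population 2
```

With Population 1 missing, the same person is assigned to the closest remaining population, which mirrors how results can shift as a database grows.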

Lastly, and perhaps most importantly, we need to discuss what the reported disease risks mean. When these companies report that you have a genetic risk of developing a disease, they are referring to an association between a single haplotype (or a group of haplotypes or genes) and people who have that disease. An association does not mean that you will get the disease; it means you may have an increased risk of developing it. Speak to a healthcare provider about any increased risk of disease that concerns you.
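
A quick calculation shows why an increased risk is not a diagnosis. The numbers below are purely hypothetical: non-carriers have a 1% lifetime risk, and carriers of a risk haplotype have a relative risk of 1.5 (relative risk being the carrier risk divided by the non-carrier risk).

```python
# Hypothetical illustration numbers, not real figures for any disease.


def carrier_risk(noncarrier_risk: float, relative_risk: float) -> float:
    """Relative risk = carrier risk / non-carrier risk, so carrier risk is the
    non-carrier risk multiplied by the relative risk."""
    return noncarrier_risk * relative_risk


baseline = 0.01
risk = carrier_risk(baseline, 1.5)
print(f"Non-carrier risk: {baseline:.1%}")               # 1.0%
print(f"Carrier risk: {risk:.1%}")                       # 1.5%
print(f"Carriers who never develop it: {1 - risk:.1%}")  # 98.5%
```

In this made-up case a carrier’s risk is 50% higher than a non-carrier’s, yet the large majority of carriers would still never develop the disease.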

How did they catch the Golden State Killer?

DNA analysis can be used not only to evaluate disease risk or genetic ancestry (although these are both very cool and important applications) but also in criminal cases. In 2018, Joseph James DeAngelo, also known as the Golden State Killer, was arrested for a multitude of horrific crimes he committed between 1974 and 1986. It may strike some as odd that he was charged over 30 years after his crimes ended. The reason is that, until recently, law enforcement had no way of linking the DNA found at his crime scenes to a named individual.

It wasn’t until investigators turned to the new abundance of genetic data that he was found. Investigators uploaded a DNA profile from the original crime-scene samples to GEDmatch, an online database that pools raw DNA data from many different testing companies. Using this database, they identified 10 to 20 distant relatives of DeAngelo who shared stretches of DNA with the crime-scene samples.
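
Relatives stand out because they share long, unbroken stretches of DNA, while unrelated people match only at scattered positions. The Python sketch below is a crude stand-in for that idea: it counts the longest run of consecutive matching SNP alleles between two people. The allele strings, candidate names, and the `MIN_RUN` threshold are all hypothetical; real genetic genealogy tools measure shared segments in centimorgans across far more SNPs and tolerate genotyping errors.

```python
def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest stretch of consecutive SNP positions at which two
    people carry the same allele (a crude stand-in for a shared DNA segment)."""
    best = current = 0
    for x, y in zip(a, b):
        current = current + 1 if x == y else 0  # extend or reset the run
        best = max(best, current)
    return best


# Hypothetical alleles at 14 consecutive SNPs.
crime_scene = "ACGTACGTACGTAC"
candidates = {
    "candidate_a": "ACGTACGTACTTAC",
    "candidate_b": "TCGAACTTGCGAAC",
}
MIN_RUN = 8  # hypothetical threshold for "long enough to suggest a relative"

for name, alleles in candidates.items():
    run = longest_shared_run(crime_scene, alleles)
    print(name, run, "possible relative" if run >= MIN_RUN else "no long match")
```

Both candidates match the sample at many individual positions, but only candidate_a shares a long unbroken stretch, which is the kind of signal that points to a relative.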

Investigators worked with genealogists to build a very large family tree from these relatives, which led them to Joseph James DeAngelo as a new suspect in the cold cases from the 1970s and 1980s. DNA was then collected from items DeAngelo had discarded, and it matched the DNA found at the many crime scenes. With this concrete evidence, police arrested and charged DeAngelo, who was later sentenced to life in prison without the possibility of parole.

The story of the Golden State Killer is only one example of how DNA sequencing technology and databases can be used in criminal cases. This case in particular, however, did alert the public that genomic data can be used to identify you, your loved ones, or distant relatives. This sparked an ethical debate regarding the privacy of individuals who submit their samples to companies such as Ancestry.com or 23andMe. Your DNA sequence could be used to identify your siblings, cousins, parents, and virtually any one of your blood relatives.

Additionally, as more people submit samples to these databases, it becomes easier to identify relatives who never submitted any data themselves. The ethical debate surrounding this technology and the data it produces is an important one. Many people argue that, like medical records, DNA data should be kept private unless an individual has given informed consent for it to be shared. Others argue that law enforcement is justified in using these databases whenever it deems necessary, in order to keep identifying people like the Golden State Killer.
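
Why a growing database exposes people who never took a test can be seen with a short probability calculation. This Python sketch assumes a hypothetical person with 100 distant cousins, each independently present in the database with the same probability; both the numbers and the independence assumption are illustrative simplifications.

```python
def prob_at_least_one_relative(n_relatives: int, fraction_in_db: float) -> float:
    """Probability that at least one of n relatives is in the database, assuming
    each relative is included independently with the same probability."""
    return 1 - (1 - fraction_in_db) ** n_relatives


# Hypothetical: someone with 100 distant cousins, at two database sizes.
for fraction in (0.01, 0.05):
    p = prob_at_least_one_relative(100, fraction)
    print(f"{fraction:.0%} of people in database -> {p:.1%} chance of a match")
```

Under these assumptions, even if only 1% of people are in the database, the chance that at least one of the 100 cousins is there is about 63%; at 5% it rises above 99%.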

DNA sequencing is not a particularly new technology, but its applications are increasingly accepted and discussed in everyday life. Whether you are learning about your ancestry or evaluating your risk of developing certain diseases, genomic sequencing and the genealogical analysis that follows are powerful tools for uncovering the secrets of our genome.

Because our DNA is unique to each of us (identical twins aside) and can be used to identify who we are and who our families are, there is considerable controversy over whether, and how, genomic data should be stored. All in all, the use of DNA sequencing and variant analysis is unlikely to disappear anytime soon. The amazing amount of knowledge we now have access to has transformed our understanding of ancestry and genealogy, and has empowered individuals to connect with their roots and take charge of their health.

Copyright © 2023. The Average Scientist | All Rights Reserved | Privacy | Telephone: 01205 212 291
