Journal Club

Highlighting recently published papers selected by Academy members

Study uncovers new privacy worries for direct-to-consumer DNA testing

Genetic testing services that let users upload their own DNA data raise new privacy concerns. Image credit: Shutterstock/Sarah Weldon.

Genetic testing services let consumers analyze their DNA to learn about their heritage, their relatives, and their risk for certain heritable diseases. But access to such personal data hasn’t been without controversy, stoking privacy fears. A recent study in eLife uncovered new reasons to be wary. Researchers found several vulnerabilities in the DNA databases of the companies GEDmatch, MyHeritage, FamilyTreeDNA, and Living DNA that could leave users’ personal genetic data at risk of being hacked.

The vulnerable companies are smaller competitors of industry giants, such as 23andMe and Ancestry.com. But the larger companies don’t let users upload their own genetic data. Many of the smaller companies do—and that’s when the privacy concerns arise.

Those companies grow their own databases by letting new users upload their own genotype data purchased from another service. According to study coauthor and population geneticist Michael Edge at the University of California, Davis, the idea is appealing to people who already use, say, 23andMe, and want to plug their genetic results into the GEDmatch database, in hopes of finding more extended family. When many of these smaller direct-to-consumer companies find a family match, they provide users with detailed information about their relatives, including names, email addresses, and in some cases, the specific regions of DNA that they share.

The problem is that someone could easily exploit these upload-based services, Edge says, to extract, for example, potentially sensitive family relationships or health-related data. Just how vulnerable user data are “depends how the database is set up,” he says.

One of the biggest risks is a high number of uploads. Imagine a hacker uploading hundreds of real genomes, scraped from a public database such as the 1000 Genomes Project. Each dataset alone might only match a short stretch of a targeted user’s DNA. But if upload one matches region A of a user’s genome, and upload two matches region B of that same user’s genome, then a hacker could cobble together the user’s whole genome over multiple uploads. To prevent this sort of attack, the authors suggest limiting the number of uploads, or reporting family matches without revealing where the matched genomes are similar.
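To make the arithmetic of this piecing-together concrete, here is a minimal toy sketch in Python. It is not the method used in the eLife study; the genome length, the segment coordinates, and the function names are all invented for illustration. It only shows how matched regions reported across several uploads can add up to full coverage, and how a cap on uploads shrinks that coverage.

```python
from typing import List, Tuple

# A matched region on a toy genome, as half-open [start, end) positions.
Segment = Tuple[int, int]


def merge_segments(segments: List[Segment]) -> List[Segment]:
    """Merge overlapping or touching segments into disjoint intervals."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Extends (or sits inside) the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(segments: List[Segment], genome_length: int) -> float:
    """Fraction of the toy genome covered by the union of the segments."""
    covered = sum(end - start for start, end in merge_segments(segments))
    return covered / genome_length


def coverage_with_upload_cap(
    segments_per_upload: List[Segment], cap: int, genome_length: int
) -> float:
    """Coverage reachable if only the first `cap` uploads are allowed."""
    return covered_fraction(segments_per_upload[:cap], genome_length)


# Hypothetical data: one reported matching region per upload, all for
# the same targeted user, on a toy genome of 1,000 positions.
GENOME_LENGTH = 1000
reported = [(0, 150), (100, 300), (280, 520), (500, 760), (700, 1000)]

if __name__ == "__main__":
    print("no cap:", covered_fraction(reported, GENOME_LENGTH))
    print("cap of 2:", coverage_with_upload_cap(reported, 2, GENOME_LENGTH))
```

In this toy setup, five uploads cover the whole genome, while a cap of two uploads leaves most of it unexposed. The second suggested defense, reporting a match without its location, would correspond to never handing out the segment coordinates at all.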

Other vulnerabilities include targeting users to check whether they have specific genes or mutations, such as those known to correlate with a high risk of Alzheimer’s or bipolar disorder. In this case, rather than uploading real data, a hacker would design a false dataset to find matches at the genes or mutation sites of interest for any user, explains biostatistician Sharon Browning at the University of Washington in Seattle. Another recent study, published in the proceedings of the upcoming Network and Distributed System Security Symposium 2020, raises similar concerns about GEDmatch, warning that someone could use fake genetic data to discover another user’s genetic information, including genetic markers known to be correlated with diseases.

One possible fix, researchers suggest, is a unique cryptographic signature on uploaded DNA, provided by a genotyping service that, say, GEDmatch deemed trustworthy.
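As a rough illustration of how such a check could work, the Python sketch below uses an HMAC tag from the standard library as a stand-in for a signature. This is a deliberate simplification: an HMAC relies on a secret key shared between the genotyping service and the database, whereas a real deployment would more likely use a public-key signature so that the database never holds a signing key. The lab names, keys, and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Optional


def sign_genotype_file(data: bytes, key: bytes) -> str:
    """Tag a genotype file with HMAC-SHA256 (stand-in for a signature)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_upload(
    data: bytes, tag: str, trusted_keys: Dict[str, bytes]
) -> Optional[str]:
    """Return the name of the trusted lab whose key validates the tag,
    or None if the file was altered or did not come from a trusted lab."""
    for lab, key in trusted_keys.items():
        expected = sign_genotype_file(data, key)
        # compare_digest avoids leaking information through timing.
        if hmac.compare_digest(expected, tag):
            return lab
    return None


# Hypothetical keys for two genotyping services the database trusts.
TRUSTED_KEYS = {
    "lab_a": b"example-secret-key-for-lab-a",
    "lab_b": b"example-secret-key-for-lab-b",
}

if __name__ == "__main__":
    genuine = b"rs0001\tAA\nrs0002\tAG\n"
    tag = sign_genotype_file(genuine, TRUSTED_KEYS["lab_a"])

    # An unmodified file from a trusted lab is accepted.
    print(verify_upload(genuine, tag, TRUSTED_KEYS))

    # A hand-edited (fake) dataset no longer matches its tag.
    tampered = genuine.replace(b"AG", b"GG")
    print(verify_upload(tampered, tag, TRUSTED_KEYS))
```

Under these assumptions, a dataset fabricated or edited by an attacker would fail verification, because producing a valid tag requires a key held only by a trusted genotyping service.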

Browning praised the eLife study’s thorough unpacking of these vulnerabilities, and its suggestions to address attacks. The take-home message for consumers, she says, is that “if they care about their DNA’s privacy, then they shouldn’t upload [their DNA] to these databases.”

Edge contacted GEDmatch, MyHeritage, FamilyTreeDNA, and Living DNA with his findings. All four companies responded, he says, some assuring him that they had already taken steps to resolve the vulnerabilities, and others inquiring about potential ways to patch their holes. So far, Edge has only confirmed that GEDmatch has made changes, including blocking some types of data uploads and adding a reCAPTCHA to hamper hackers from using bots to upload datasets.

Computational biologist Itsik Pe’er, at Columbia University in New York City, praises the ideas and solutions in Edge’s paper. But Pe’er, a reviewer on the eLife study and a former 23andMe consultant, questions whether hackers would really bother piecing together genomes using upload services, when they could simply hack company employees’ cell phones to access sensitive information. Pe’er also questions why genetic information should get special privacy protection, when other predictors of disease risk such as age, family history, and exercise level are readily available.

Moving forward, Edge hopes the industry will shift toward uniform privacy guidelines, for instance, requiring every company to use cryptographic signatures for user data. But to do so, companies would need to work together to develop signatures that identify which company did the genetic testing. Right now, says Edge, there’s little incentive for the largest companies to help their competitors by fostering such industry-wide standards.
