by Kanak Gupta
Graphic design by Emily Huang
Genome-wide association studies (GWAS) have been at the frontier of biomedical research’s shift towards precision medicine. Since the first GWAS in the early 2000s, tens of thousands of genetic variants associated with disease risk and other traits have been discovered.1 Moreover, with data now available from large-scale genetic biobanks, study sample sizes frequently surpass one million, helping us detect even subtle genetic effects.2 However, this picture of scientific progress is marred by the fact that 95% of participants in all GWAS to date are of European descent.3,4 Are we investigating the mysteries of the human genome, or only the European one? Could precision medicine built on such limited diversity actually be dangerous?
The GWAS Catalog, run jointly by the National Human Genome Research Institute and the European Bioinformatics Institute, is a catalogue of published GWAS and the genetic associations they report for different diseases and traits. These studies draw genomic data from private companies such as 23andMe and AncestryDNA, as well as from national genomic databases and initiatives, including biobanks in the UK, the US, Canada, Iceland, and China.2 A 2020 report estimated that 38 million individuals worldwide had had their genomes sequenced in some capacity, and projected that this figure could reach 52 million by 2025.5
However, 72% of all participants in global biobanks come from just three countries: the United States, the United Kingdom, and Iceland. In 2023, 90% of participants in discovery-stage GWAS were of European descent (83% in replication-stage GWAS). Genomic data initiatives in China, Japan, South Korea, and India contributed the majority of non-European participants, whereas African, Latin American, Hispanic, and Indigenous populations accounted for only 1.5% of participants in discovery-stage GWAS and 9% in replication studies.3 Even when geneticists do have access to more diverse samples, they often prefer European cohorts, arguing that studying a single ancestry group avoids having to control for ancestry and so simplifies the analysis. Furthermore, because new research builds on existing work, repeatedly reusing samples from older, biased studies for new analyses only deepens the Eurocentric sampling bias.6
Clearly, the genomic research landscape has a blind spot when it comes to ancestral diversity. Although all humans share a common ancestry, GWAS findings often transfer poorly across populations, because genetic, geographic, and cultural differences between groups go unaddressed, and these differences affect disease risk scores when genetic research is applied in the clinic.2 A 2019 study found that polygenic risk scores, which combine the effects of many variants into a single estimate of disease risk, were roughly 2 times more accurate in European populations than in East Asian populations and about 4.5 times more accurate than in African populations.7
This oversight can have dangerous consequences. People of colour are more likely than people of European descent to receive ambiguous, or even erroneous, genetic test results, such as being told they carry variants that increase their risk of certain diseases.6 For example, glycated haemoglobin (HbA1c) is widely used to diagnose diabetes, and the gene variants known to influence HbA1c levels were identified largely from European populations. However, a study in rural Uganda found that 22% of participants carried a variant that affected HbA1c but was not associated with diabetes; instead, it protects against severe malaria. An HbA1c-based diagnosis calibrated on European data could therefore be misleading for these carriers. The same study also identified several novel variants of interest that had not been detected in existing data.3,8 In fact, numerous smaller studies in non-European populations have revealed useful genetic associations that large-sample GWAS missed because the relevant alleles occur less often in Europeans.6,9
While the diversity gap in our genomic data is tremendous, it is not insurmountable. Many researchers have called for more diverse biobanks, and tools such as the GWAS Diversity Monitor can raise awareness and track progress. More recent projects, such as the US Trans-Omics for Precision Medicine (TOPMed) whole-genome sequencing program, have made efforts to recruit participants from diverse backgrounds. Large-scale genomic biobanks in Asia have also substantially increased the amount of non-European genomic data, although 93% of these data are used only locally.2,6
Furthermore, many institutions are funding initiatives to improve geographic and ethnic representation. In South Asia, projects such as IndiGen and the Pakistan Alliance on genetic RisK factors for Health (PARKH) have fostered sustainable research collaborations between South Asian and Western institutions.9,10 In Africa, initiatives such as the Human Heredity and Health in Africa (H3Africa) Consortium, the Uganda Genome Resource, and the Nigerian 100K Genome Project aim to increase equity and help build genomic research infrastructure.2,6,9 These projects have already yielded key insights; however, African scientists warn that this growth in data should benefit not only the Western institutions funding the initiatives but also the local populations from whom the data are collected.11 Globally, efforts are under way to build ethical relationships with Indigenous peoples and encourage their participation in genomic research.12
Increasing the diversity of our genetic biobanks is crucial to reducing inequity in genomic research. However, cultural and practical shifts in academia are also needed to improve both the recruitment of non-European participants and the use of their genomic data. Our genomic knowledge has grown exponentially since the first human genome was sequenced, but the longer we ignore the gap in ancestral diversity, the wider the chasm of ignorance becomes. Without prompt action, the very foundations of precision medicine will be laid on unequal ground.
References
1. Uffelmann E, Huang QQ, Munung NS, et al. Genome-wide association studies. Nat Rev Methods Primers. 2021 Aug 26;1(1):1–21.
2. Abdellaoui A, Yengo L, Verweij KJH, et al. 15 years of GWAS discovery: Realizing the promise. Am J Hum Genet. 2023 Feb 2;110(2):179–94.
3. Mills MC, Rahal C. The GWAS Diversity Monitor tracks diversity by disease in real time. Nat Genet. 2020 Mar;52(3):242–3.
4. Mills MC, Rahal C. A scientometric review of genome-wide association studies. Commun Biol. 2019 Jan 7;2(1):1–11.
5. Understanding the Global Landscape of Genomic Initiatives – IQVIA [Internet]. [cited 2024 Nov 4]. Available from: https://www.iqvia.com/insights/the-iqvia-institute/reports-and-publications/reports/understanding-the-global-landscape-of-genomic-initiatives
6. Popejoy AB, Fullerton SM. Genomics is failing on diversity. Nature. 2016 Oct;538(7624):161–4.
7. Martin AR, Kanai M, Kamatani Y, et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019 Apr;51(4):584–91.
8. Gurdasani D, Carstensen T, Fatumo S, et al. Uganda Genome Resource Enables Insights into Population History and Genomic Discovery in Africa. Cell. 2019 Oct 31;179(4):984–1002.e36.
9. Fatumo S, Chikowore T, Choudhury A, et al. A roadmap to increase diversity in genomic studies. Nat Med. 2022 Feb;28(2):243–50.
10. Divakar MK, Jain A, Bhoyar RC, et al. Whole-genome sequencing of 1029 Indian individuals reveals unique and rare structural variants. J Hum Genet. 2023 Jun;68(6):409–17.
11. Nordling L. African scientists call for more control of their continent’s genomic data. Nature [Internet]. 2018 Apr 18 [cited 2024 Nov 4]; Available from: https://www.nature.com/articles/d41586-018-04685-1
12. Claw KG, Anderson MZ, Begay RL, et al. A framework for enhancing ethical genomic research with Indigenous communities. Nat Commun. 2018 Jul 27;9(1):2957.