genome-wide association study
genome-wide association study (GWAS), systematic approach to rapidly scanning the human genome for genetic variations, with the aim of identifying associations between variants and particular diseases. Genome-wide association studies often concentrate on variations known as single nucleotide polymorphisms (SNPs) and are designed in ways that allow researchers to investigate hundreds of thousands, or even millions, of SNPs in large numbers of individuals at once. Variations identified in the genome can be used to assess not only disease risk but also other factors, such as response to certain therapeutic agents.
Developments
Genome-wide association studies were first used in the early 2000s, coincident with the completion of the Human Genome Project and the International HapMap Project and with the development of computer databases capable of storing the full human genome sequence and known variations. Subsequent growth in the application of genome-wide association studies, and the corresponding increase in computational demands, fueled the development of improved data management systems. Of particular significance was a shift to higher-performance computing technologies, particularly cloud-distributed computing systems, which can store very large volumes of data and which facilitate data sharing, retrieval, and processing.
Methods
In general, to identify statistically meaningful associations, genome-wide association studies require large sample sizes, often consisting of tens of thousands of individuals, with data on genotype (genetic constitution). Different study designs may be used, depending on the question of interest. Case-control studies, for example, search for associations by comparing data from patients with the disease and data from healthy controls. Quantitative studies, on the other hand, search for associations involving traits that vary continuously in natural populations (e.g., blood pressure and weight). Genome-wide association studies may also be either population-based or family-based; family-based studies enable researchers to apply linkage analysis, a powerful means for identifying associations between inherited genetic factors and disease.
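The case-control comparison described above can be illustrated for a single SNP. The following is a minimal sketch, not a description of any particular study's method: it assumes a simple allelic test (a Pearson chi-square test with one degree of freedom on a 2x2 table of allele counts), and the function name and the counts in the usage lines are hypothetical. It also assumes that every cell of the table is nonzero.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases and controls; columns are alternate and reference alleles.
    Hypothetical helper for illustration; assumes all four counts are nonzero.
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_totals = [case_alt + case_ref, control_alt + control_ref]
    col_totals = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were unrelated to disease status
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return chi2, p_value, odds_ratio


# Made-up allele counts for one SNP: 60/40 in cases, 40/60 in controls
chi2, p_value, odds_ratio = allelic_chi_square(60, 40, 40, 60)
print(chi2, p_value, odds_ratio)
```

Because a study repeats a test of this kind across a very large number of SNPs, the p-value threshold has to be far stricter than for a single test (a threshold of 5e-8 is a common convention), which is one reason such large sample sizes are needed.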
Data for genome-wide association studies may be drawn from tailored study cohorts or from preexisting or publicly available resources, such as biobanks. Genotype data already available in such resources or obtained in cohort studies is generally derived from microarray analysis, an effective means for identifying common variants, or from next-generation sequencing, such as whole-genome sequencing, which can capture both common and rare variants. Before associations are tested, data on the individuals included in a genome-wide association study typically undergoes extensive quality control to remove flaws, such as errors in genotyping.
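The quality-control step can be sketched as a per-SNP filter. This is an illustration under stated assumptions rather than a standard pipeline: genotypes are assumed to be coded as the number of alternate alleles (0, 1, or 2), with None marking a failed genotype call, and the function name and both thresholds are hypothetical defaults that a real study would choose for itself.

```python
def snp_passes_qc(genotypes, min_call_rate=0.98, min_maf=0.01):
    """Return True if a SNP passes two simple quality-control filters.

    genotypes: one entry per individual, coded 0, 1, or 2 (count of
    alternate alleles), or None where genotyping failed.
    The thresholds are illustrative assumptions, not fixed standards.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False  # no usable genotype calls at all

    # Fraction of individuals with a successful genotype call
    call_rate = len(called) / len(genotypes)

    # Each individual carries two alleles, hence the factor of 2
    alt_freq = sum(called) / (2 * len(called))
    minor_allele_freq = min(alt_freq, 1.0 - alt_freq)

    return call_rate >= min_call_rate and minor_allele_freq >= min_maf


# Made-up genotype vectors for three SNPs
print(snp_passes_qc([0, 1, 2, 1, 0, 1, 1, 0, 2, 1]))  # fully called, common variant
print(snp_passes_qc([0, 0, 0, 0]))                    # no variation in the sample
print(snp_passes_qc([0, 1, None, None]))              # half of the calls missing
```

Filters of this kind are applied before association testing so that genotyping failures and uninformative SNPs do not produce spurious signals.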
Applications and limitations
Genome-wide association studies have various applications. In general, results from such studies are used for predicting disease risk, for understanding the genetic underpinnings of diseases and other traits, and for understanding the biological role of genetic variations. Genome-wide association studies are of particular interest in fields such as personalized medicine and therapeutics, where information on genotype can be used to assess an individual’s disease risk and to inform treatment decisions.
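One way in which genotype information is commonly turned into an individual risk estimate is an additive score, in which the number of risk alleles a person carries at each associated variant is multiplied by that variant's estimated effect size and the products are summed. The sketch below assumes this additive model; the function name, the effect sizes, and the genotypes are hypothetical and are not drawn from any study.

```python
def additive_risk_score(dosages, effect_sizes):
    """Weighted sum of allele counts under a simple additive model.

    dosages: allele counts (0, 1, or 2) for one individual, one per variant.
    effect_sizes: per-variant weights, e.g. log odds ratios from a prior
    association study (hypothetical values in the usage lines).
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * w for d, w in zip(dosages, effect_sizes))


# Made-up example: three variants with small positive or negative effects
score = additive_risk_score([0, 1, 2], [0.10, 0.20, -0.05])
print(score)
```

A score of this kind is only meaningful relative to the distribution of scores in a comparable reference population; it is not an absolute probability of disease.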
A general conclusion from genome-wide association studies is that many individual variations contribute to complex traits. For example, more than 125 genetic variants are associated with schizophrenia, with the variants differing in type and contribution to disease risk. An example of the successful application of genome-wide association studies in the realm of therapeutics is the investigation of the hepatitis C virus (HCV), in which analysis of HCV from infected individuals revealed SNPs associated with response to the antiviral drug sofosbuvir. The discovery suggested that results of HCV genotyping could be used to guide treatment decisions for HCV-infected patients.
Although genome-wide association studies have successfully identified connections between SNPs and disease and have cast light on biological pathways, the majority of variations that have been identified offer little novel insight into disease. Moreover, these studies, by linking hundreds or even thousands of variants to common illnesses, could eventually identify and associate one or more variants in every active region of DNA in a given tissue with a disease. In the absence of a more complete understanding of biochemical pathways, the identification of so many variants—most being of limited relevance to disease—would introduce insurmountable complexity to the interpretation of findings from genome-wide association studies.