So, the genome-wide association study (GWAS) data for your disease of interest has been published, and it has thrown up some very interesting associations. However, at this stage, bear in mind that each of these is only an association. Your project is to provide the link between the GWAS single nucleotide polymorphisms (SNPs) and pathological changes. Where do you start? What experiments do you need to do?
Here are some pointers to help you functionally characterize a novel disease-associated region.
What Other SNPs Are in Linkage Disequilibrium?
In addition to the lead association SNP, there will more than likely be additional SNPs within the linkage disequilibrium (LD) block (an average of 30–70 SNPs per block). Using the HaploReg database, first find out how many SNPs you are working with. Then, look at their impact on either the primary protein sequence (for exonic variants) or the chromatin landscape (for non-coding variants: histone modifications, DNase hypersensitivity, transcription factor binding, etc.). This gives you clues as to whether a SNP affects protein sequence or gene expression.
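To make the idea of LD concrete, here is a minimal sketch of how the r² statistic between two biallelic SNPs can be computed from phased haplotypes. The function name `ld_r2` and the 0/1 allele coding are assumptions for illustration only; in practice you would usually read r² values straight from a database such as HaploReg rather than compute them yourself.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased chromosome, where a and b
    are 0 or 1 (hypothetical coding: 1 = alternate allele) at SNP A and SNP B.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # both on one haplotype
    d = p_ab - p_a * p_b                               # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Two SNPs that are always inherited together versus two that are independent.
perfect = ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)])
independent = ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)])
```

An r² close to 1 means the two SNPs are almost always inherited together, so the association signal alone cannot tell you which of them is functional.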
Changes in Gene Expression or in Protein Structure?
SNPs within exons can be either synonymous (no amino acid substitution) or non-synonymous (the encoded amino acid changes). Non-synonymous SNPs are either missense mutations, which substitute one amino acid for another, or nonsense mutations, which create a premature stop codon. These can have profound impacts on protein structure and function, leading to pathogenic changes. Consider this before you investigate non-coding SNPs.
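The distinction between these classes can be sketched in a few lines of code. The example below builds the standard genetic code and classifies a single-codon change; the function name `classify_codon_change` and the "stop-lost" label are illustrative choices, and the sketch assumes you already know the reference and alternate codons in the correct reading frame.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a coding SNP from its reference and alternate codon (DNA, in frame)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"      # premature stop codon
    if ref_aa == "*":
        return "stop-lost"     # the original stop codon is removed
    return "missense"          # one amino acid replaced by another


silent = classify_codon_change("GAA", "GAG")     # Glu -> Glu
missense = classify_codon_change("GAG", "GTG")   # Glu -> Val
nonsense = classify_codon_change("TGG", "TGA")   # Trp -> stop
```

Annotation tools do this for you across all transcripts of a gene, but working through one codon by hand is a useful sanity check on their output.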
SNPs within non-coding regions can influence mRNA stability if they reside within a microRNA binding site (generally within the 3′ UTR). Alternatively, they could affect gene expression if they reside within a genomic region with regulatory properties (e.g., promoter, enhancer, or insulator regions). Altered gene expression can have large or subtle pathogenic effects.
Work Out the Affected Gene
If your LD block contains no coding SNPs, then the SNPs likely cause changes in gene expression. However, the dynamic spatiotemporal properties of DNA and chromatin can make things a little more complicated. Keep in mind that the affected gene may not be the one containing, or adjacent to, the associated SNP. In fact, SNPs can influence expression of genes that are positioned megabases away. In the absence of chromosome conformation capture data, which can reveal the regions that physically interact with your locus, you will need to do some detective work at this stage. Scour the literature for any known roles of nearby genes in your disease of interest. Make use of publicly available gene expression data (e.g., on GeneCards, the Human Protein Atlas, or the GTEx database). You can also quantify expression of surrounding genes in your samples to eliminate any that are not expressed.
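As a rough illustration of this shortlisting step, the sketch below keeps genes that lie within a window around the SNP and are expressed above a threshold in your samples. Everything here is hypothetical: the gene names, coordinates, and TPM values are made up, all genes are assumed to be on the same chromosome as the SNP, and the 1 Mb window and 1 TPM cut-off are arbitrary starting points rather than recommended values.

```python
def shortlist_genes(snp_pos, genes, window_bp=1_000_000, min_tpm=1.0):
    """Return (gene name, distance in bp) pairs for expressed genes near a SNP.

    genes: list of dicts with hypothetical keys "name", "start", "end", "tpm".
    Distance is 0 when the SNP lies inside the gene body.
    """
    shortlist = []
    for gene in genes:
        if gene["start"] <= snp_pos <= gene["end"]:
            distance = 0
        else:
            distance = min(abs(snp_pos - gene["start"]), abs(snp_pos - gene["end"]))
        if distance <= window_bp and gene["tpm"] >= min_tpm:
            shortlist.append((gene["name"], distance))
    # Nearest first, although the nearest gene is not necessarily the affected one.
    return sorted(shortlist, key=lambda item: item[1])


# Made-up example data for a SNP at position 1,000,000.
genes = [
    {"name": "GENE_A", "start": 900_000, "end": 950_000, "tpm": 12.0},
    {"name": "GENE_B", "start": 990_000, "end": 1_010_000, "tpm": 0.2},    # not expressed
    {"name": "GENE_C", "start": 1_400_000, "end": 1_450_000, "tpm": 35.0},
    {"name": "GENE_D", "start": 3_000_000, "end": 3_050_000, "tpm": 50.0}, # outside window
]
candidates = shortlist_genes(1_000_000, genes)
```

A list like this only narrows the field; literature evidence and eQTL data are still needed to decide which candidate to pursue.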
Is There an Expression Quantitative Trait Locus (eQTL) Operating?
Once you have identified the affected gene, you’ll want to test for an eQTL: a locus at which DNA sequence variation influences gene expression. You can do this in two different ways. The first is to quantify allelic expression imbalance (AEI), which is a direct measure of the relative expression of each allele in heterozygous individuals. However, you can only use this method if there is a transcribed SNP that lets you tell the two alleles apart. Alternatively, you can quantify gene expression in a number of samples and then stratify expression by genotype at the SNP.
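Both approaches reduce to simple arithmetic, sketched below. The function names, the 0/1/2 allele-dosage coding, and the example numbers are assumptions for illustration; normalizing the cDNA allelic ratio to the genomic DNA ratio is one common way to express AEI, and a real analysis would add a statistical test and covariates rather than rely on a raw slope.

```python
from collections import defaultdict


def aei_ratio(cdna_alt, cdna_ref, gdna_alt, gdna_ref):
    """Allelic ratio in cDNA, normalized to the same ratio measured in genomic DNA.

    Values far from 1 in a heterozygote suggest the two alleles are expressed unequally.
    """
    return (cdna_alt / cdna_ref) / (gdna_alt / gdna_ref)


def mean_expression_by_genotype(dosages, expression):
    """Mean expression per genotype group (dosage = copies of the risk allele: 0, 1, 2)."""
    groups = defaultdict(list)
    for dosage, value in zip(dosages, expression):
        groups[dosage].append(value)
    return {dosage: sum(vals) / len(vals) for dosage, vals in sorted(groups.items())}


def additive_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (change per risk allele)."""
    n = len(dosages)
    mean_d = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((d - mean_d) ** 2 for d in dosages)
    if sxx == 0:
        raise ValueError("all samples share one genotype; slope is undefined")
    sxy = sum((d - mean_d) * (e - mean_e) for d, e in zip(dosages, expression))
    return sxy / sxx


# Made-up data: six samples, two per genotype.
dosages = [0, 0, 1, 1, 2, 2]
expression = [10.0, 12.0, 8.0, 8.0, 5.0, 7.0]
means = mean_expression_by_genotype(dosages, expression)
slope = additive_slope(dosages, expression)
imbalance = aei_ratio(cdna_alt=60, cdna_ref=40, gdna_alt=50, gdna_ref=50)
```

In this toy data set, expression falls with each copy of the risk allele, which is the pattern you would expect if the disease-associated allele reduces expression of the gene.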
Characterize the Impact of the eQTL on the Cell
Depending on the direction of the eQTL (whether the disease-associated allele corresponds with increased or decreased gene expression), manipulate gene expression in a model system and characterize any resulting change in phenotype. You can look at different readouts, such as changes in expression of key transcription factors, apoptotic markers, or differentiation markers.
Find the Functional SNP Identified in the GWAS
Having characterized the eQTL and its effect on the cell, you can add a great deal of impact to your work if you identify the functional SNP. Not only does this provide additional evidence to support your work, but it also highlights novel disease-associated pathways that may be of therapeutic interest. You may need to narrow down your search by eliminating SNPs. Use mRNA stability, luciferase reporter, and electrophoretic mobility shift assays to do this experimentally. Back this up with any available bioinformatics data and/or fine mapping of the region.
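For the luciferase reporter assays, comparing alleles usually comes down to a normalized ratio. The sketch below assumes a dual-reporter design in which each well gives a firefly reading (the test construct) and a Renilla reading (a co-transfected control); that design, the function names, and the numbers are assumptions for illustration and not a prescribed protocol.

```python
def mean_normalized_activity(wells):
    """Mean firefly/Renilla ratio across replicate wells.

    wells: list of (firefly, renilla) luminescence readings, one tuple per well.
    """
    ratios = [firefly / renilla for firefly, renilla in wells]
    return sum(ratios) / len(ratios)


def allelic_fold_change(ref_wells, alt_wells):
    """Normalized activity of the alternate-allele construct relative to the reference."""
    return mean_normalized_activity(alt_wells) / mean_normalized_activity(ref_wells)


# Made-up readings for constructs carrying the reference or alternate allele.
ref_wells = [(1000.0, 100.0), (1200.0, 100.0)]
alt_wells = [(500.0, 100.0), (600.0, 100.0)]
fold_change = allelic_fold_change(ref_wells, alt_wells)
```

A fold change well below or above 1 for one SNP, but not for its neighbors in the LD block, is the kind of result that helps you eliminate candidates.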
Investigate the Impact of the Transcription Factor on Gene Expression
If you identify the functional SNP and the microRNA or transcription factor mediating differential expression, you can manipulate expression of this factor in your model system as an additional step to support your results. If this alters the eQTL or gene expression, then you have provided further evidence for your findings.
Then, congratulations, you have successfully characterized this novel, disease-associated locus!
Following up on a GWAS can be challenging, but it’s an emerging field of functional genetics and genomics, with real prospects to identify new disease-associated pathways and therapeutic targets. To date, there have been tens of thousands of GWAS SNPs identified, but only a tiny proportion have been followed up on. This is a real bottleneck. I hope that the above template can help you in your own GWAS follow-up study.