Current and Future Work
The Gilad Lab is now focused on moving beyond simple explorations of gene expression levels to studies of variation in regulatory mechanisms, response phenotypes, and, ultimately, complex traits (including disease). The following are ongoing projects in the lab:
Using iPSC to study primate evolution
(in collaboration with Jonathan Pritchard)
Comparative genomics studies in primates are extremely restricted because we only have access to a few types of cell lines from non-human apes and to a limited collection of frozen tissues. In order to gain better insight into the regulatory processes that underlie variation in complex phenotypes, we must have access to faithful model systems for a wide range of tissues and cell types. To facilitate this, we have generated a panel of 7 fully characterized chimpanzee (Pan troglodytes) induced pluripotent stem cell (iPSC) lines derived from fibroblasts of healthy donors. All lines appear to be free of integration from exogenous reprogramming vectors, can be maintained using standard iPSC culture techniques, and have proliferative and differentiation potential similar to human and mouse lines. To begin demonstrating the utility of comparative iPSC panels, we collected RNA sequencing data and methylation profiles from the chimpanzee iPSCs and their corresponding fibroblast precursors, as well as from 7 human iPSCs and their precursors, which were of multiple cell types and population origins. Overall, we observed much less regulatory variation within species in the iPSCs than in the somatic precursors, indicating that the reprogramming process has erased many of the differences observed between somatic cells of different origins. We identified 4,918 differentially expressed genes and 3,598 differentially methylated regions between iPSCs of the two species, many of which are novel inter-species differences that were not observed between the somatic cells of the two species. Our panel will help realize the potential of iPSCs in primate studies and, in combination with genomic technologies, transform studies of comparative evolution.
The evolution of protein regulation
(in collaboration with Jonathan Pritchard)
The goals of this study are to test the hypothesis that protein expression levels evolve under greater evolutionary constraint than transcript expression levels, and to take first steps toward understanding the underlying mechanisms and thereby also distinguish between buffering and compensation. Variation in gene regulation, at both the transcriptional and translational levels, is thought to be involved in human phenotypic diversity, including disease susceptibility, and is hypothesized to have played an important role in human evolution. Measurements of steady-state mRNA levels have revealed substantial differences across primate transcriptomes and have led to the identification of putatively adaptive changes in transcript expression levels. Generally, mRNA levels have been considered to be good proxies for protein levels, which are usually more directly involved in biological processes. However, the independent translational regulatory machinery is sufficiently complex that a linear relationship between transcript and protein abundance cannot be assumed, and there are other post-transcriptional and post-translational mechanisms that influence protein expression levels, for example microRNA-mediated translational repression. Therefore, the characterization of variation in protein levels among human and non-human primate tissues is expected to provide additional insight into our evolutionary history, beyond what may be obtained by mRNA-based analyses of gene expression patterns. Our preliminary results indicate that protein levels may evolve under greater evolutionary constraint than transcript expression levels. Our observations (as well as those of others, in different model species) raise the hypothesis that protein expression may be buffered or compensated against changes at the transcript level.
To test the hypothesis that buffering (rather than compensation) of protein expression levels is a general property of gene regulation and to functionally measure the extent of hard-wired buffering, we propose to measure RNA and protein expression levels from multiple human and chimpanzee primary tissues and their corresponding cell lines (Aim 1), to use ribosomal profiling to comparatively characterize protein translation rates in the same collection of tissues and cell lines (Aim 2), and to use genome-editing techniques to introduce random mutations in the regulatory elements of the selected genes (Aim 3).
Genome-wide association studies (GWAS) have identified many variants associated with cardiovascular-related diseases, some of which are novel. However, similar to other common diseases, these identified risk-associated variants fail to explain a significant portion of the genetic heritability of cardiovascular disease (CVD). Moreover, many associated variants are non-coding with no obvious function, though putatively, these are involved in gene regulation. By combining results of GWAS with expression quantitative trait locus (eQTL) mapping, one can identify functional variants that influence gene expression and are also associated with disease risk. Using such a combination of approaches, one can identify true weak associations that are otherwise difficult to distinguish from statistical noise using a GWAS approach alone, as well as develop an immediate intuition regarding both the function of the associated variants and the identity of the implicated genes. However, for this method to be most effective, eQTL studies should be performed in cells that are relevant to the phenotype of interest, which are often not easily accessible in population samples. Indeed, nearly all eQTL mapping studies in humans to date (including the studies we performed in the first term of this grant) used gene expression measurements from blood cell types, fibroblasts, or lymphoblastoid cell lines (LCLs). In that sense, induced pluripotent stem cells (iPSCs) can change human genetics in a profound way. The ability to differentiate iPSCs can allow us to perform functional studies in the most relevant cell types. We thus propose to map eQTLs and investigate the genetic basis of cardiovascular disease in cardiomyocytes, which will be differentiated from iPSCs of 120 Hutterite individuals. The Hutterites are a founder population of European descent that practices a communal farming lifestyle.
The Hutterites of South Dakota, the subjects of our studies, live on communal farms (called “colonies”) of 15-25 families, where all meals are prepared and eaten in a communal kitchen, smoking is prohibited (and rare), and early life environments are extremely uniform. Our specific aims are to reprogram iPSCs from the LCLs of 120 Hutterites and obtain differentiated cardiomyocytes from each individual (aim 1), map eQTLs in differentiated cardiomyocytes (aim 2), and integrate eQTL mapping with GWAS results to identify variants associated with CVD-related phenotypes (aim 3). At the conclusion of this work, we expect to gain important insight into the genetic basis of gene regulation in the heart in general, as well as into gene regulatory variation that is associated with CVD risk.
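As a rough illustration of the idea behind aims 2 and 3, the sketch below tests a single variant-gene pair for association by regressing expression on genotype dosage, and then intersects the resulting eQTL variants with a list of GWAS hits. It is a minimal toy example and not the lab's actual pipeline: the function names, the variant identifiers, the t-statistic threshold, and all data values are hypothetical, and a real analysis would also handle covariates, relatedness among individuals, and multiple testing.

```python
from math import copysign, sqrt


def eqtl_association(genotypes, expression):
    """Regress expression on genotype dosage (0, 1 or 2 copies of an allele)
    by ordinary least squares and return (slope, t_statistic)."""
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need matched genotype/expression values for >= 3 individuals")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    # Residual sum of squares around the fitted line.
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(genotypes, expression))
    std_err = sqrt(rss / (n - 2) / sxx)
    if std_err == 0:
        # Perfect fit: the statistic is unbounded unless the slope is exactly zero.
        t_stat = copysign(float("inf"), slope) if slope != 0 else 0.0
    else:
        t_stat = slope / std_err
    return slope, t_stat


def eqtls_overlapping_gwas(eqtl_t_stats, gwas_hits, t_threshold):
    """Return variants whose |t| passes the (hypothetical) threshold
    and that also appear among the GWAS-associated variants."""
    return sorted(
        variant
        for variant, t_stat in eqtl_t_stats.items()
        if abs(t_stat) >= t_threshold and variant in gwas_hits
    )


# Toy data (hypothetical): six individuals, one gene, two variants.
toy_expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
toy_genotypes = {
    "variant_A": [0, 0, 1, 1, 2, 2],  # dosage tracks expression
    "variant_B": [0, 2, 1, 1, 2, 0],  # dosage unrelated to expression
}
toy_t_stats = {
    variant: eqtl_association(dosage, toy_expression)[1]
    for variant, dosage in toy_genotypes.items()
}
toy_gwas_hits = {"variant_A", "variant_C"}
print(eqtls_overlapping_gwas(toy_t_stats, toy_gwas_hits, t_threshold=4.0))
```

The overlap step reflects the reasoning in the text: a variant that both alters expression in the relevant cell type and shows a disease association is a stronger functional candidate than one supported by either line of evidence alone.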
Tuberculosis is a major public health problem. One-third of the world’s population is estimated to be infected with Mycobacterium tuberculosis (MTB), the etiological agent causing tuberculosis (TB), and active disease kills nearly 2 million individuals worldwide every year. Successive treatments for TB have quickly become ineffective as the agent rapidly becomes resistant. However, strikingly, only 10% of infected individuals develop the disease. In other words, while Mycobacterium tuberculosis quickly develops resistance to new drugs, roughly 90% of infected individuals are naturally resistant to developing the disease (when not co-infected by agents that compromise the immune system, such as HIV). Several lines of evidence indicate that genetic factors contribute to inter-individual differences in susceptibility to TB, including the observation that monozygotic twins have considerably higher concordance rates for tuberculosis morbidity than do dizygotic twins. In addition, multiple rare single-gene mutations with high penetrance have also been linked with susceptibility to mycobacteria. However, although genetic studies of TB have identified important pathways involved in protective immunity, very little is known about the underlying genetic determinants or mechanisms contributing to differences in susceptibility at the population level. To address this gap, we use a combination of empirical and statistical approaches, including expression profiling and eQTL mapping in infected primary cell cultures, to identify genes and regulatory pathways that contribute to inter-individual and inter-population variability in susceptibility to Mycobacterium tuberculosis infection.
One of the central challenges in modern genomics is to learn the rules by which the genome encodes regulatory information. How does a single genome sequence encode the information for exquisitely precise and yet highly distinctive programs of gene regulation for different cell types, at different time points, under different conditions? Molecular biology provides insight into many of the major principles of gene regulation, yet we are still very far from understanding how information is encoded in the genome. Recent studies of variation in gene regulatory phenotypes within and between species have provided important insights towards deciphering the regulatory logic and the complex interplay between different mechanisms. Though we now have considerable insight into the primary sources of variation in gene expression levels, the current challenge is to develop an understanding of the regulatory code that will allow us to interpret, and even predict, which variants in the genome affect gene regulation and by what mechanisms. In the first term of this award, we focused on characterizing and genetically mapping inter-individual variation in different regulatory mechanisms in lymphoblastoid cell lines (LCLs). We gained considerable insight (we report on our progress below), but our observations were limited to only one cell type. To gain a more complete understanding of the language and logic that underlie dynamic gene regulatory programs, we must have access to collections of multiple cell types or tissues from the same individuals. Postmortem collections of frozen human tissues can provide one such resource, but these are not renewable, they present a challenge with respect to cellular heterogeneity, and one cannot perform follow-up functional validation and perturbation assays using frozen tissue samples.
We therefore propose to characterize and genetically map variation in steady-state, spatial, and temporal gene regulation in a set of 70 induced pluripotent stem cell (iPSC) lines, as well as in the corresponding cells differentiated towards three cell fates and sampled at staged time points. The iPSC lines are being reprogrammed from 70 unrelated HapMap Yoruba LCLs, which we have used as a model system for studies of gene regulation for the past five years. In addition to studying gene expression levels in the iPSCs and their differentiated cells, we will also collect data on epigenetic markers, chromatin accessibility, and transcription factor binding footprints from all cell types at all time points. This combination of data will allow us to map loci that affect variation in temporal, spatial, and steady-state gene expression levels, as well as characterize, at unprecedented resolution, the causal relationship across a network of genetic variation, variation in different regulatory mechanisms, and ultimately, changes in gene expression levels.