Home

This database provides a catalog of unique, non-overlapping, orthologous exon regions in the genomes of human, chimpanzee, and rhesus macaque. It can be used in the analysis of multi-species RNAseq expression data, allowing comparisons of exon-level expression across primates, as well as comparative examination of alternative splicing and transcript isoforms.

You can use the search function to find orthologous exons for a gene of interest, or download the full dataset.

Enter a gene name in the box below and press Enter to search (for example: TP53, FOXP3, LCT). Suggested gene names are displayed as you type.


Methodology

To identify orthologous exons in human, chimpanzee, and rhesus macaque, we used a three-step strategy (see figure below):

  • For each annotated human exon, identify putative orthologous exons in chimpanzee and rhesus macaque;
  • Exclude exons located in regions with repetitive sequence content in any of the species, to avoid ambiguity in RNAseq read mapping;
  • Merge exons from the same gene whose genomic locations overlap, to create a final set of orthologous meta-exons.
Illustration of methodology

Specifically, as a starting set, we used all known human Ensembl exons (520,023 exons, not necessarily unique or non-overlapping, in 36,397 genes) (Hubbard et al., 2009). We then used Blat (Kent, 2002) to identify likely orthologous exons in the chimpanzee (panTro2) and rhesus macaque (rheMac2) genomes. We included only exons with high sequence similarity between species, and did not allow long gaps in the aligned exon sequences. This step identified 222,287 orthologous exons (in 28,299 genes) in chimpanzee and 193,632 orthologous exons (in 24,598 genes) in rhesus macaque.
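
The cross-species filter in this step can be sketched in Python. This is a minimal sketch, not the pipeline actually used: it assumes Blat was run with `-noHead` so its PSL output has no header lines, and the cutoffs `min_identity` and `max_gap` (like the helper names) are hypothetical, since the exact thresholds are not stated here. Identity is taken as the fraction of exon bases aligned as matches, and a gap is the unaligned stretch between two consecutive alignment blocks on either sequence.

```python
from typing import Dict, Iterable, List


def _int_list(field: str) -> List[int]:
    """Parse PSL comma-separated lists such as '50,50,' into [50, 50]."""
    return [int(x) for x in field.rstrip(",").split(",") if x]


def parse_psl(line: str) -> Dict:
    """Parse the columns we need from one headerless PSL line (21 tab-separated fields)."""
    f = line.rstrip("\n").split("\t")
    return {
        "matches": int(f[0]),
        "repMatches": int(f[2]),
        "qName": f[9],
        "qSize": int(f[10]),
        "tName": f[13],
        "blockSizes": _int_list(f[18]),
        "qStarts": _int_list(f[19]),
        "tStarts": _int_list(f[20]),
    }


def identity(hit: Dict) -> float:
    """Fraction of the exon (query) bases that are aligned as matches."""
    return (hit["matches"] + hit["repMatches"]) / hit["qSize"]


def largest_gap(hit: Dict) -> int:
    """Longest unaligned stretch between consecutive blocks, on query or target."""
    blocks = list(zip(hit["blockSizes"], hit["qStarts"], hit["tStarts"]))
    gaps = [0]
    for (size, q, t), (_, q_next, t_next) in zip(blocks, blocks[1:]):
        gaps.append(max(q_next - (q + size), t_next - (t + size)))
    return max(gaps)


def orthologous_exons(psl_lines: Iterable[str],
                      min_identity: float = 0.9,  # hypothetical cutoff
                      max_gap: int = 10) -> Dict[str, Dict]:  # hypothetical cutoff
    """Keep, for each exon, its best-matching hit that passes both filters."""
    best: Dict[str, Dict] = {}
    for line in psl_lines:
        hit = parse_psl(line)
        if identity(hit) < min_identity or largest_gap(hit) > max_gap:
            continue
        prev = best.get(hit["qName"])
        if prev is None or hit["matches"] > prev["matches"]:
            best[hit["qName"]] = hit
    return best
```

Keeping only the highest-scoring passing hit per exon is a simplifying choice made for this sketch; exons with no passing hit simply drop out of the returned dictionary.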

Next, we excluded exons that might lie within repetitive regions in any of the three genomes, as such regions can be particularly susceptible to mapping biases, which could lead to the detection of spurious differences in expression levels across species. To do so, we used Blat to map the exons of each species against that species’ genome, and excluded from further analysis any exon whose sequence is highly similar to at least one additional region in the genome. Of the remaining orthologous exons, 163,487 were shared across all three species.
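
A minimal sketch of the repeat filter and the three-way intersection, assuming the self-alignments have already been reduced to `(exon_id, identity)` pairs (one pair per Blat hit) and that each chimpanzee and rhesus exon carries the ID of its human ortholog. The `min_identity` cutoff and the function names are hypothetical. Because an exon always aligns to its own locus, a uniquely placed exon has exactly one high-similarity hit.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def uniquely_placed(self_hits: Iterable[Tuple[str, float]],
                    min_identity: float = 0.9) -> Set[str]:  # hypothetical cutoff
    """Exon IDs with exactly one high-similarity hit in their own genome.

    The exon's own locus always counts as one hit, so a second hit above the
    cutoff means the sequence also occurs elsewhere (e.g. in a repeat).
    """
    counts = Counter(exon_id for exon_id, ident in self_hits if ident >= min_identity)
    return {exon_id for exon_id, n in counts.items() if n == 1}


def shared_unique_exons(hits_by_species: Dict[str, List[Tuple[str, float]]],
                        min_identity: float = 0.9) -> Set[str]:
    """Exons that are uniquely placed in every species' genome."""
    per_species = [uniquely_placed(h, min_identity) for h in hits_by_species.values()]
    return set.intersection(*per_species) if per_species else set()
```

For example, an exon with a second near-identical copy in the human genome is dropped even if it is unique in chimpanzee and rhesus macaque.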

Finally, we merged overlapping exons so that each read maps uniquely to a single orthologous exon in each species. To do so, we identified all cases of overlapping exons (Ensembl annotations include a small number of overlapping exons), excluded exons that overlapped in one or two species but not in all three, and combined each remaining set of overlapping exons into a single meta-exon. This final step defined 150,107 meta-exons (in 20,689 Ensembl genes) with orthologs in human, chimpanzee, and rhesus macaque.
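
The merging step can be sketched as an interval sweep per gene. This is one possible reading of the merging rule, not the authors' code: in each species, exons are grouped into clusters of directly or transitively overlapping intervals; a cluster is kept only if it contains exactly the same exon IDs in all three species; and each kept cluster becomes one meta-exon spanning its members. Coordinates are assumed to be half-open, all exons of a gene are assumed to lie on one chromosome and strand, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Exon = Tuple[str, int, int]  # (exon_id, start, end), half-open coordinates


def overlap_groups(exons: List[Exon]) -> List[FrozenSet[str]]:
    """Cluster exons whose intervals overlap, directly or through a chain."""
    groups: List[FrozenSet[str]] = []
    current: List[str] = []
    current_end = 0
    for exon_id, start, end in sorted(exons, key=lambda e: (e[1], e[2])):
        if current and start < current_end:
            # Overlaps the running cluster: extend it.
            current.append(exon_id)
            current_end = max(current_end, end)
        else:
            # Gap before this exon: close the running cluster, start a new one.
            if current:
                groups.append(frozenset(current))
            current, current_end = [exon_id], end
    if current:
        groups.append(frozenset(current))
    return groups


def meta_exons(gene_exons: Dict[str, List[Exon]]
               ) -> List[Tuple[Tuple[str, ...], Dict[str, Tuple[int, int]]]]:
    """Merge one gene's overlapping exons into meta-exons shared by all species.

    gene_exons maps species name -> that species' exons for a single gene.
    Clusters whose membership differs between species are dropped entirely.
    """
    consistent = set.intersection(*(set(overlap_groups(ex)) for ex in gene_exons.values()))
    coords = {sp: {eid: (s, e) for eid, s, e in ex} for sp, ex in gene_exons.items()}
    result = []
    for group in sorted(consistent, key=sorted):
        spans = {
            sp: (min(c[eid][0] for eid in group), max(c[eid][1] for eid in group))
            for sp, c in coords.items()
        }
        result.append((tuple(sorted(group)), spans))
    return result
```

With this rule, two exons that overlap in human and chimpanzee but are separate in rhesus macaque are both excluded, while an exon that overlaps nothing in any species is kept as a meta-exon on its own.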

About

The database was constructed by Ran Blekhman and John Marioni, at the Department of Human Genetics, The University of Chicago, and is described in the manuscript:
A database of orthologous exons in primates for comparative analysis of RNAseq data, R. Blekhman and J. C. Marioni, In review.

Download

The full dataset can be downloaded as a tab-delimited flat file.
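
A minimal sketch for pulling the rows of one gene out of the downloaded file with Python's `csv` module. The column layout of the flat file is not described here, so the presence of a header row and the `gene` column name are assumptions; adjust `gene_column` to match the actual header.

```python
import csv
from typing import Dict, Iterable, List


def exons_for_gene(lines: Iterable[str], gene: str,
                   gene_column: str = "gene") -> List[Dict[str, str]]:
    """Return the rows of the tab-delimited file whose gene column matches `gene`.

    `gene_column` is an assumed header name; the comparison ignores case.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if gene_column not in (reader.fieldnames or []):
        raise KeyError(f"column {gene_column!r} not in header {reader.fieldnames}")
    target = gene.upper()
    return [row for row in reader if row[gene_column].upper() == target]
```

To use it on the downloaded file, open it with `open(path, newline="")` (the mode the `csv` documentation recommends) and pass the file object as `lines`, e.g. `exons_for_gene(fh, "TP53")`.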

Contact

For any questions or comments, please email Ran Blekhman at blekhman~%~uchicago.edu