This database provides a catalog of unique, non-overlapping, orthologous exon regions in the genomes of human, chimpanzee, and rhesus macaque. The database can be used in analysis of multi-species RNA-seq expression data, allowing for comparisons of exon-level expression across primates, as well as comparative examination of alternative splicing and transcript isoforms.
Enter a gene name in the box below and press enter to search (for example: TP53, FOXP3, LCT). Suggested gene names will be displayed as you type. The search currently displays results from version 1 of the database (hg18-panTro2-rheMac2).
To identify orthologous exons in human, chimpanzee, and rhesus macaque, I used a three-step strategy (see Figure below):
Specifically, as a starting set, I used all known human Ensembl exons. I then used BLAT (Kent, 2002) to identify likely orthologous exons in the chimpanzee and rhesus macaque genomes. I included only exons with high sequence similarity between species, and did not allow long gaps in the aligned exon sequences. Next, I excluded exons that might be positioned within repetitive regions in any of the three genomes, as such regions may be particularly susceptible to mapping biases that lead to detection of spurious differences in expression levels across species. To do so, I mapped the exons of each species against that species’ genome using BLAT, and excluded from further analysis exons whose sequence is highly similar to at least one additional region in the genome (see supplementary methods). I then excluded any exon lacking a good match in either chimpanzee or rhesus macaque, resulting in a set of high-quality orthologous exon trios.
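The filtering logic described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build the database: the `Hit` record, the `MIN_IDENTITY` and `MAX_GAP` thresholds, and the `self_hit_counts` table are all hypothetical stand-ins (the actual criteria are in the supplementary methods), and the alignment hits are assumed to have been parsed into this form already.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    exon_id: str      # human exon the hit belongs to
    species: str      # genome that was searched: "chimp" or "rhesus"
    identity: float   # fraction of matching bases, between 0 and 1
    longest_gap: int  # longest gap in the alignment, in bases

# Hypothetical thresholds, chosen only for illustration.
MIN_IDENTITY = 0.9
MAX_GAP = 10

def is_good(hit):
    """High similarity and no long gap in the aligned exon sequence."""
    return hit.identity >= MIN_IDENTITY and hit.longest_gap <= MAX_GAP

def select_trios(ortholog_hits, self_hit_counts):
    """Return exon ids that have a good match in both chimp and rhesus
    and no additional highly similar region in any of the three genomes.

    self_hit_counts maps (exon_id, species) to the number of highly
    similar regions found when the exon is mapped to its own genome;
    a count of 1 means the exon matches only itself.
    """
    good = {}
    for hit in ortholog_hits:
        if is_good(hit):
            good.setdefault(hit.exon_id, set()).add(hit.species)

    kept = []
    for exon_id, found in good.items():
        if not {"chimp", "rhesus"} <= found:
            continue  # no good match in at least one species
        if any(self_hit_counts.get((exon_id, sp), 1) > 1
               for sp in ("human", "chimp", "rhesus")):
            continue  # possibly repetitive in at least one genome
        kept.append(exon_id)
    return sorted(kept)

hits = [
    Hit("E1", "chimp", 0.99, 0), Hit("E1", "rhesus", 0.95, 2),
    Hit("E2", "chimp", 0.98, 0), Hit("E2", "rhesus", 0.70, 0),
    Hit("E3", "chimp", 0.99, 0), Hit("E3", "rhesus", 0.96, 1),
]
self_counts = {("E3", "rhesus"): 2}  # E3 has a second similar region in rhesus
trios = select_trios(hits, self_counts)
```

In this toy input, E2 is dropped for lacking a good rhesus match and E3 for having an additional similar region in the rhesus genome, leaving only E1.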
Finally, I merged regions of overlapping exons, to allow for a unique mapping of reads to a single, orthologous exon in each species. To do so, I identified all cases of overlapping exons (Ensembl annotations include a large number of overlapping exons), excluded exons that were overlapping in one or two, but not in all three species, and combined the remaining set of overlapping exons as appropriate.
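One way to picture the merging step is the following Python sketch. It is a simplified illustration under stated assumptions, not the procedure actually used: coordinates are assumed to be half-open `(start, end)` pairs on a single chromosome and strand, the function names `overlap_groups` and `meta_exons` are hypothetical, and a group of overlapping exons is kept only if exactly the same group overlaps in all three species.

```python
def overlap_groups(exons):
    """Group exons that overlap, directly or through a chain of overlaps.

    exons maps exon_id -> (start, end) for one species.
    Returns a set of frozensets of exon ids.
    """
    groups = []
    current, current_end = [], None
    for exon_id, (start, end) in sorted(exons.items(), key=lambda kv: kv[1]):
        if current and start < current_end:
            current.append(exon_id)
            current_end = max(current_end, end)
        else:
            if current:
                groups.append(frozenset(current))
            current, current_end = [exon_id], end
    if current:
        groups.append(frozenset(current))
    return set(groups)

def meta_exons(coords_by_species):
    """Merge groups of exons that overlap identically in every species.

    coords_by_species maps species -> {exon_id: (start, end)}.
    Exons whose overlap pattern differs between species are dropped.
    """
    per_species = [overlap_groups(exons) for exons in coords_by_species.values()]
    consistent = set.intersection(*per_species)
    result = []
    for group in consistent:
        merged = {
            species: (min(exons[e][0] for e in group),
                      max(exons[e][1] for e in group))
            for species, exons in coords_by_species.items()
        }
        result.append((tuple(sorted(group)), merged))
    return sorted(result, key=lambda item: item[0])

coords = {
    "human":  {"A": (100, 200), "B": (150, 250), "C": (300, 400),
               "D": (500, 600), "E": (550, 650)},
    "chimp":  {"A": (10, 110), "B": (60, 160), "C": (210, 310),
               "D": (400, 500), "E": (520, 620)},
    "rhesus": {"A": (20, 120), "B": (70, 170), "C": (220, 320),
               "D": (410, 510), "E": (500, 600)},
}
merged = meta_exons(coords)
```

Here A and B overlap in all three species and are combined into one meta-exon, C is kept on its own, and D and E are excluded because they overlap in human and rhesus but not in chimpanzee.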
The full analysis outlined above was repeated twice, generating two versions of the database: (1) hg18-panTro2-rheMac2 and (2) hg19-panTro3-rheMac2. The final dataset defined 150,107 meta-exons (in 20,689 Ensembl genes) in version 1, and 187,889 meta-exons (in 30,030 Ensembl genes) in version 2.
The database was constructed by Ran Blekhman,
currently at the Department of Molecular Biology and Genetics, Cornell University,
and previously at the Department of Human Genetics, The University of Chicago.
The database is described in the manuscript:
A database of orthologous exons in primates for comparative analysis of RNA-seq data, R. Blekhman, In review.
Download the full dataset as a tab-delimited flat file:
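As a rough guide to working with the download, the sketch below reads a tab-delimited file with Python's standard `csv` module and groups the rows by gene. The column layout of the actual file is not described here, so the header row and the column names (`gene`, `meta_exon`, `human_start`, `human_end`) are assumptions made purely for illustration; adjust them to match the file you download.

```python
import csv
import io
from collections import defaultdict

def meta_exons_by_gene(handle, gene_column="gene"):
    """Group the rows of a tab-delimited file by a gene column.

    Assumes the first line is a header; gene_column is a hypothetical
    column name and may differ in the real file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    by_gene = defaultdict(list)
    for row in reader:
        by_gene[row[gene_column]].append(row)
    return dict(by_gene)

# Made-up rows standing in for the downloaded file.
example = io.StringIO(
    "gene\tmeta_exon\thuman_start\thuman_end\n"
    "GENE_A\t1\t100\t250\n"
    "GENE_A\t2\t300\t400\n"
    "GENE_B\t1\t900\t1000\n"
)
grouped = meta_exons_by_gene(example)
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the in-memory example.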
For any questions or comments, please email Ran Blekhman at rb565~%~cornell.edu