This database provides a catalog of unique, non-overlapping, orthologous exon regions in the genomes of human, chimpanzee, and rhesus macaque. The database can be used in analysis of multi-species RNAseq expression data, allowing for comparisons of exon-level expression across primates, as well as comparative examination of alternative splicing and transcript isoforms.
To identify orthologous exons in human, chimpanzee, and rhesus macaque, we used a three-step strategy (see Figure below).
As a starting set, we used all known human Ensembl exons (520,023 non-unique and overlapping exons in 36,397 genes) (Hubbard et al., 2009). We then used Blat (Kent, 2002) to identify likely orthologous exons in the chimpanzee (panTro2) and rhesus macaque (rheMac2) genomes. We included only exons with high sequence similarity between species, and did not allow long gaps in the aligned exon sequences. This step identified 222,287 orthologous exons (in 28,299 genes) in chimpanzee, and 193,632 orthologous exons (in 24,598 genes) in rhesus macaque.
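The filtering idea in this first step can be sketched in a few lines of Python. This is only an illustration, not the pipeline used to build the database: the `Hit` record, the `min_identity` and `max_gap` thresholds, and the rule of keeping the hit with the most matching bases are all assumptions made for the example, since the actual cutoffs are not stated here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment of a human exon to another genome (hypothetical record)."""
    exon_id: str       # identifier of the human exon
    exon_length: int   # length of the human exon in bases
    matches: int       # number of matching bases in the alignment
    longest_gap: int   # longest gap in the aligned exon sequence


def passes_filter(hit, min_identity=0.9, max_gap=10):
    """Keep hits with high similarity and no long gaps (thresholds are assumed)."""
    identity = hit.matches / hit.exon_length
    return identity >= min_identity and hit.longest_gap <= max_gap


def best_orthologs(hits, min_identity=0.9, max_gap=10):
    """Return, per exon, the passing hit with the most matching bases."""
    best = {}
    for hit in hits:
        if not passes_filter(hit, min_identity, max_gap):
            continue
        current = best.get(hit.exon_id)
        if current is None or hit.matches > current.matches:
            best[hit.exon_id] = hit
    return best
```

Exons with no passing hit simply drop out of the result, which mirrors how the exon count shrinks between the starting set and the per-species ortholog sets.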
Next, we excluded exons that might be positioned within repetitive regions in any of the three genomes, as such regions might be particularly susceptible to mapping biases, which could lead to the detection of spurious differences in expression levels across species. To do so, we mapped the exons of each species against that species’ genome using Blat, and excluded from further analysis exons whose sequence was highly similar to at least one additional region in the genome. Of the remaining orthologous exons, 163,487 were shared across the three species.
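A minimal sketch of this uniqueness filter follows. It assumes each exon's self-alignment results are available as `(exon_id, identity)` pairs, that orthologous exons carry the same identifier in all three species, and that "highly similar" means identity at or above a hypothetical `min_identity` cutoff; none of these details are specified above.

```python
from collections import Counter


def unique_exons(self_hits, min_identity=0.9):
    """Return exon IDs with exactly one high-similarity hit in their own genome.

    self_hits: iterable of (exon_id, identity) pairs from aligning each exon
    against the genome it came from. The exon's own locus counts as one hit,
    so any exon with two or more passing hits is treated as repetitive.
    """
    counts = Counter(
        exon_id for exon_id, identity in self_hits if identity >= min_identity
    )
    return {exon_id for exon_id, n in counts.items() if n == 1}


def shared_across_species(hits_by_species, min_identity=0.9):
    """Keep only exons that are unique in every species' genome."""
    unique_sets = [
        unique_exons(hits, min_identity) for hits in hits_by_species.values()
    ]
    return set.intersection(*unique_sets)
```

Taking the intersection across species is what removes an exon from all three genomes when it is repetitive in only one of them.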
Finally, we merged regions of overlapping exons, to allow for a unique mapping of reads to a single, orthologous exon in each species. To do so, we identified all cases of overlapping exons (Ensembl annotations include a small number of overlapping exons), excluded exons that were overlapping in one or two, but not in all three species, and combined the remaining set of overlapping exons as appropriate. This final step resulted in the definition of 150,107 meta-exons (in 20,689 Ensembl genes) with orthologs in human, chimpanzee, and rhesus macaque.
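The merging step can be illustrated with the sketch below. It reflects one possible reading of the rule described above, not the exact procedure used: exons are grouped by overlap within each species, a group is kept only if the same set of exons forms a group in all three species, and each kept group is collapsed to a single interval. The data layout (`exon_id -> (chrom, start, end)` with half-open coordinates and shared exon IDs across species) is an assumption, and strand is ignored for brevity.

```python
def overlap_groups(exons):
    """Group exons that overlap on the same chromosome.

    exons: dict mapping exon_id -> (chrom, start, end), half-open coordinates.
    Returns a set of frozensets of exon IDs; isolated exons form singletons.
    """
    groups = []
    current, current_chrom, current_end = [], None, None
    # Sort by (chrom, start, end) so overlapping exons are adjacent.
    for exon_id, (chrom, start, end) in sorted(exons.items(), key=lambda kv: kv[1]):
        if current and chrom == current_chrom and start < current_end:
            current.append(exon_id)
            current_end = max(current_end, end)
        else:
            if current:
                groups.append(frozenset(current))
            current, current_chrom, current_end = [exon_id], chrom, end
    if current:
        groups.append(frozenset(current))
    return set(groups)


def consistent_groups(exons_by_species):
    """Keep only groups that are identical in every species."""
    group_sets = [overlap_groups(exons) for exons in exons_by_species.values()]
    return set.intersection(*group_sets)


def merge_group(group, exons):
    """Collapse a group of overlapping exons into one meta-exon interval."""
    chrom = exons[next(iter(group))][0]
    start = min(exons[exon_id][1] for exon_id in group)
    end = max(exons[exon_id][2] for exon_id in group)
    return chrom, start, end
```

Under this reading, two exons that overlap in rhesus macaque but not in human or chimpanzee are both dropped, while exons that overlap in all three species are merged into a single meta-exon per species.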
The database was constructed by Ran Blekhman and John Marioni at the Department of Human Genetics, The University of Chicago, and is described in the manuscript:
A database of orthologous exons in primates for comparative analysis of RNAseq data, R. Blekhman and J. C. Marioni, In review.
Download the full dataset as a tab-delimited flat file here.
For any questions or comments, please email Ran Blekhman at blekhman~%~uchicago.edu