The Comparative Genomics section in ElDorado allows the analysis of the transcripts known for a group of orthologous genes (vertebrates or plants). Incomplete or misleading annotation in one genome is identified by comparing it with the information available from the other genomes. The program determines which of the available promoter regions correspond to each other and extends the annotation with additional promoter regions for which no transcripts have been annotated so far (CompGen promoters). All of the promoter regions can be selected by the user for further analysis.
For the requested input gene, all locus IDs of the group of known orthologous genes and their alternative transcripts are retrieved from the database. The transcripts are sorted by organism and aligned by their most 5' conserved exon. Corresponding promoter regions from different organisms are grouped into promoter sets. The analysis is based on comparing the exon/intron structures of two transcripts and on the sequence similarity of the corresponding sequence regions.
For the assignment of orthologous genes all transcripts annotated in ElDorado are aligned against each other exhaustively. The pairwise sequence similarity is used to build homology groups for vertebrate and plant genomes.
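The text does not say how pairwise similarities are turned into homology groups. One common way is single-linkage clustering with a similarity threshold, sketched below under that assumption. The function name `homology_groups`, the 0.8 threshold and the transcript labels are hypothetical and not taken from ElDorado.

```python
from itertools import combinations


def homology_groups(transcripts, scores, threshold=0.8):
    """Single-linkage grouping (an assumed technique): transcripts end up in the
    same group if they are connected by a chain of pairs whose similarity
    reaches the threshold.

    `scores` maps frozenset({a, b}) to a similarity in [0, 1]; missing pairs
    count as 0. The 0.8 threshold is a hypothetical value.
    """
    parent = {t: t for t in transcripts}

    def find(x):
        # Walk up to the group representative, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Exhaustive all-against-all comparison of the transcripts
    for a, b in combinations(transcripts, 2):
        if scores.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for t in transcripts:
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


# Hypothetical similarity scores for human (hs), mouse (mm) and rat (rn) transcripts
scores = {
    frozenset(("hs_T1", "mm_T1")): 0.92,
    frozenset(("mm_T1", "rn_T1")): 0.95,
    frozenset(("hs_T2", "mm_T2")): 0.40,
}
print(homology_groups(["hs_T1", "hs_T2", "mm_T1", "mm_T2", "rn_T1"], scores))
# [['hs_T1', 'mm_T1', 'rn_T1'], ['hs_T2'], ['mm_T2']]
```

Note that hs_T1 and rn_T1 land in the same group without a direct score, because both are linked through mm_T1. That transitivity is the defining property of single linkage.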
The output will show a table with all transcripts, a task menu for further analyses of the promoters and a graphical overview of the aligned transcripts.
The result page contains several sections:
A link to a graphical overview of the aligned transcripts, described below
The first column ("Loci") lists the organism, chromosomal location, gene name, Genomatix locus ID, and GeneID for each locus. The links in this column either start a complete ElDorado analysis or open the More Gene Info for the respective locus. The contig accession number and strand refer to the genomic sequence of this locus and to the orientation of the locus as it is annotated in this genomic context. The - icon next to the organism name hides the information for this locus; clicking the + icon displays it again.
The promoter regions belonging to the loci are shown in
the next column ("Promoter Selection"). For each of the
promoter regions the start and end position and the length of the promoter
are provided. If the promoter region belongs to a promoter set conserved
between the orthologous loci, the respective information is annotated.
Promoters derived from the analysis of the available genome annotation
from orthologous loci (CompGen promoters)
are also listed. The check boxes in this column let you select a set of
promoter regions for sequence extraction or for further analysis.
Moreover, the top tissues for each promoter are listed, based on the CAGE tags located within 20 bp around the TSS
of the corresponding transcripts (currently only available for Homo sapiens and Mus musculus).
For comparability, the tpm (tags per million) value is displayed.
The tpm value is calculated by multiplying the number of CAGE tags of a tissue in the promoter by one million and dividing the result by the total number of CAGE tags of that tissue in the corresponding organism.
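The tpm calculation described above can be sketched as follows. The ranking helper `top_tissues`, its default of three tissues and all tag counts are hypothetical illustrations, not ElDorado output.

```python
def tpm(tags_in_promoter, total_tags_in_tissue):
    """Tags per million: CAGE tags of one tissue in the promoter window,
    multiplied by one million and divided by all CAGE tags of that tissue
    in the organism."""
    if total_tags_in_tissue <= 0:
        raise ValueError("total tag count must be positive")
    return tags_in_promoter * 1_000_000 / total_tags_in_tissue


def top_tissues(promoter_counts, tissue_totals, n=3):
    """Rank tissues by tpm for one promoter, highest first (n is hypothetical)."""
    values = {tissue: tpm(count, tissue_totals[tissue])
              for tissue, count in promoter_counts.items()}
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical tag counts for one human promoter
counts = {"liver": 150, "brain": 40, "kidney": 30}
totals = {"liver": 3_000_000, "brain": 2_000_000, "kidney": 500_000}
print(top_tissues(counts, totals))
# [('kidney', 60.0), ('liver', 50.0), ('brain', 20.0)]
```

Liver has the most raw tags, yet kidney ranks first once each count is normalized by the sequencing depth of its tissue. This is why tpm rather than raw counts makes tissues comparable.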
The transcripts assigned to a promoter are listed in the
last column ("Transcript"). For each transcript, the number
of exons and the position of the TSS relative to the promoter region are
given. The colored field in front of the transcripts indicates the quality
of the transcript (gold, silver, bronze,
for details see the element definition).
Additional information such as the coding/non-coding status or the number of CAGE tags around the TSS
(for human and mouse transcripts) may be displayed. Click the link next to the
accession number to get detailed information. If redundant transcripts from other sources are available, they are also listed in the "Transcript" column.
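The page does not define how the TSS position relative to the promoter is counted. The sketch below assumes inclusive genomic coordinates and a 1-based position counted in the direction of transcription; the function name and all coordinates are hypothetical.

```python
def tss_position_in_promoter(prom_start, prom_end, tss, strand):
    """Return the 1-based position of a TSS inside a promoter region, counted
    in the direction of transcription, or None if the TSS lies outside.

    Coordinates are genomic and inclusive, with prom_start <= prom_end; this
    convention is an assumption, not taken from ElDorado.
    """
    if not prom_start <= tss <= prom_end:
        return None
    if strand == "+":
        return tss - prom_start + 1
    if strand == "-":
        # On the minus strand transcription runs from prom_end towards prom_start
        return prom_end - tss + 1
    raise ValueError("strand must be '+' or '-'")


length = 1600 - 1000 + 1  # inclusive length of a 1000..1600 promoter: 601 bp
print(length, tss_position_in_promoter(1000, 1600, 1500, "+"))  # 601 501
print(tss_position_in_promoter(1000, 1600, 1100, "-"))          # 501
```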
When one or more transcription factor families (up to 5) are selected in the popup menu and the "Apply" button is pressed, the page is reloaded and additional information is shown for each promoter. If binding sites for ALL of the selected families are present, this is indicated in red.
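The red highlighting rule amounts to a subset test: every selected family must occur among the families found in the promoter. A minimal sketch, assuming family hits are available as plain labels; the function name and the labels FAM_A, FAM_B and FAM_C are hypothetical placeholders.

```python
MAX_FAMILIES = 5  # the popup menu accepts up to 5 transcription factor families


def all_selected_present(selected_families, families_found):
    """True if the promoter has binding sites for every selected family,
    which is the case the result page marks in red."""
    selected = set(selected_families)
    if not 1 <= len(selected) <= MAX_FAMILIES:
        raise ValueError("select between 1 and 5 transcription factor families")
    return selected <= set(families_found)  # subset test


# Family labels are hypothetical placeholders
print(all_selected_present(["FAM_A", "FAM_B"], ["FAM_B", "FAM_A", "FAM_C"]))  # True
print(all_selected_present(["FAM_A", "FAM_B"], ["FAM_A"]))                    # False
```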
Additionally, promoters belonging to certain promoter sets and promoters with coding transcripts can be selected. All selected filters are combined with a logical AND, so only promoters that meet every selected criterion are selected.
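Combining filters with AND can be sketched as keeping a promoter only if every filter predicate accepts it. The `Promoter` record and its fields are hypothetical simplifications of the table rows, not an ElDorado data format.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Promoter:
    name: str
    promoter_set: Optional[int]   # None if not part of a conserved promoter set
    has_coding_transcript: bool


def apply_filters(promoters: List[Promoter],
                  filters: List[Callable[[Promoter], bool]]) -> List[Promoter]:
    """Keep promoters that pass every filter (logical AND).
    With no filters, all() is True and every promoter is kept."""
    return [p for p in promoters if all(f(p) for f in filters)]


promoters = [
    Promoter("P1", 1, True),
    Promoter("P2", 1, False),
    Promoter("P3", None, True),
]
filters = [
    lambda p: p.promoter_set == 1,      # member of promoter set 1
    lambda p: p.has_coding_transcript,  # has at least one coding transcript
]
print([p.name for p in apply_filters(promoters, filters)])  # ['P1']
```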
After selecting the desired length, the promoters selected in the result table above can either be extracted or used to start an analysis with GEMS Launcher.
The overview is a graphical representation of the transcripts listed in the table above and their genomic context. To allow transcripts from different loci to be aligned, they are all displayed on the plus strand. Alternative transcripts from a single locus are simply arranged by their genomic location. Transcripts from orthologous loci are aligned against each other by their most 5' conserved exon. If no corresponding exons are found, the transcripts are aligned by their assumed TSS.
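The display rules above can be sketched in two steps: map minus-strand coordinates onto the plus strand, then shift each transcript so its most 5' conserved exon (or, failing that, its assumed TSS) lines up with a reference transcript. The dictionary layout, the shared exon labels used to mark corresponding exons and all coordinates are hypothetical.

```python
def to_plus_strand(start, end, strand, contig_length):
    """Map a 1-based inclusive interval to plus-strand coordinates for display."""
    if strand == "+":
        return start, end
    # Position x on the minus strand becomes contig_length - x + 1
    return contig_length - end + 1, contig_length - start + 1


def alignment_shift(transcript, reference):
    """Offset (bp) to add to a transcript's plus-strand coordinates so that it
    lines up with the reference transcript in the overview.

    Each transcript is a dict with "exons" (exon label -> start, plus strand)
    and "tss"; exon labels shared between transcripts mark corresponding exons.
    """
    shared = transcript["exons"].keys() & reference["exons"].keys()
    if shared:
        # Once everything is on the plus strand, the most 5' conserved exon
        # is the shared exon with the smallest start
        anchor = min(shared, key=lambda e: transcript["exons"][e])
        return reference["exons"][anchor] - transcript["exons"][anchor]
    # No corresponding exon found: align the assumed TSSs instead
    return reference["tss"] - transcript["tss"]


human = {"exons": {"e1": 1000, "e2": 3000}, "tss": 1000}
mouse = {"exons": {"e2": 5200, "e3": 7000}, "tss": 4800}
rat = {"exons": {"x1": 100}, "tss": 250}
print(to_plus_strand(100, 200, "-", 1000))  # (801, 901)
print(alignment_shift(mouse, human))        # -2200 (aligned on exon e2)
print(alignment_shift(rat, human))          # 750 (no shared exon, TSS fallback)
```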
In this example (ACTN4 gene), two transcripts are depicted for the human genome. They are transcribed from two independent promoters. Two transcripts from the mouse genome are also shown; the second corresponds to the first human transcript (promoter set 1 = first orange box). For the rat genome, the first three transcripts depicted also correspond to the longer human transcript (promoter set 1). For the fourth rat sequence, the second promoter set (orange box) in the first mouse sequence was used to predict a new promoter (second orange box with yellow promoter region).
The graphic consists of four parts:
The main sequence panel contains all transcripts as listed in the table view of the output. Each transcript is depicted as a separate line showing all of its annotated elements, so the transcripts appear one below the other with their differently annotated exons, introns or UTRs.
The colored field to the right of the transcript labels indicates the quality of the transcript (gold, silver, bronze).
The view of the sequence can be changed using the zoom and scroll element in the lower right part of the graphic. The navigation panel contains a scaled-down version of the sequence and a red box marking the part of the sequence that is currently visible in the main sequence panel above. By default, the whole sequence is displayed.
To zoom
in or out, click the red box in the navigation panel and resize
it by dragging its handles (the small white boxes at the top
left and bottom right of the red box). The sequence in the main panel
adjusts to the selected window.
To scroll along the sequence, drag the red box within the navigation panel to the desired position. Alternatively, click anywhere inside the scrollbar to jump directly to that position.
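The relation between the red box and the main panel can be pictured as a linear mapping from navigation-panel pixels to sequence positions. This is a sketch assuming the navigation panel always shows the whole sequence on a linear scale; the function name, panel width and sequence length are hypothetical.

```python
def visible_window(box_left_px, box_width_px, panel_width_px, seq_length):
    """Sequence interval (1-based, inclusive) shown in the main panel for a
    red box at the given position in the navigation panel.

    Assumes the navigation panel shows the whole sequence on a linear scale.
    """
    if panel_width_px <= 0 or seq_length <= 0:
        raise ValueError("panel width and sequence length must be positive")
    bp_per_px = seq_length / panel_width_px
    start = int(box_left_px * bp_per_px) + 1
    end = min(seq_length, int((box_left_px + box_width_px) * bp_per_px))
    return start, end


# 10,000 bp sequence drawn in a 500 px navigation panel (20 bp per pixel)
print(visible_window(0, 500, 500, 10_000))   # (1, 10000): default, whole sequence
print(visible_window(100, 50, 500, 10_000))  # (2001, 3000): box resized (zoomed in)
print(visible_window(300, 50, 500, 10_000))  # (6001, 7000): box dragged to the right
```

Resizing the box changes the width of the visible window (zooming), while dragging it changes only its position (scrolling).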
Individual transcripts can be removed from the main sequence panel by clicking on the checkboxes.
The icons of the graphic provide the following functions:
- exporting the complete result page to HTML (including all graphics in JPG format)
- exporting the graphics or a selected region to one of several formats (JPG, PNG, TIFF, PNM), based on the current zoom and element selection settings
- selecting a region, which can then be exported
- recalculating the layout, i.e. displaying the graphic as it was set up by default
© 2022 Precigen Bioinformatics Germany GmbH - All rights reserved