
Source of elements annotated in ElDorado

Elements Source
Primary transcripts
Exons
Introns
Transcripts and their exon/intron structure are determined by the mapping of cDNA sequences from different sources (e.g. RefSeq, GenBank, Ensembl). Transcripts annotated in ElDorado are divided into 3 quality levels:
  1. The lowest quality, bronze, is assigned to transcripts derived from the mapping of cDNAs for which no experimental evidence about 5' completeness is available.
  2. The quality silver is assigned to a transcript if its promoter region overlaps with a PromoterInspector prediction.
  3. The quality gold is assigned to transcripts derived from the mapping of cDNAs for which experimental evidence about 5' completeness is available (e.g. by oligo-capping). A transcript initially assigned bronze or silver is raised to gold if at least 3 CAGE tags lie within 3 bp upstream or downstream of the transcript start.
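The three quality levels can be read as a small decision rule. The following Python sketch is only an illustration of that reading, not ElDorado's implementation; the `Transcript` class, its field names, and the function `assign_quality` are hypothetical, and treating "up to 3 bp upstream/downstream" as an absolute distance of at most 3 is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transcript:
    # All field names are hypothetical and chosen for illustration only.
    tss: int                                   # genomic position of the transcript start
    has_5prime_evidence: bool = False          # e.g. cDNA obtained by oligo-capping
    overlaps_promoterinspector: bool = False   # promoter overlaps a PromoterInspector prediction
    cage_tag_starts: List[int] = field(default_factory=list)  # 5' positions of CAGE tags


def assign_quality(t: Transcript, min_tags: int = 3, window_bp: int = 3) -> str:
    """Return 'gold', 'silver' or 'bronze' following the three rules above."""
    if t.has_5prime_evidence:
        return "gold"
    # Count CAGE tags starting within window_bp of the transcript start (assumed inclusive).
    supporting = sum(1 for pos in t.cage_tag_starts if abs(pos - t.tss) <= window_bp)
    if supporting >= min_tags:
        return "gold"          # upgrade from bronze/silver
    if t.overlaps_promoterinspector:
        return "silver"
    return "bronze"
```

For example, a transcript at position 100 with CAGE tags starting at 98, 100 and 103 would be rated gold under these assumptions, while the same transcript with tags at 98, 100 and 104 would not be upgraded.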
Loci Descriptive information about genetic loci was derived from NCBI's EntrezGene.
UTRs calculated by determining the longest open reading frame (ORF) for the transcript
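As a rough illustration of deriving UTRs from the longest ORF, the Python sketch below scans the three forward reading frames of a transcript sequence and splits it into 5' UTR, coding region and 3' UTR. It assumes an ORF starts at ATG and ends at the first in-frame standard stop codon; these choices, and the function names `longest_orf` and `split_utrs`, are assumptions made for the example and are not taken from ElDorado.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG..stop ORF, 0-based, end exclusive."""
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3                    # include the stop codon
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None                   # look for the next ORF in this frame
    return best


def split_utrs(seq: str) -> Optional[Tuple[str, str, str]]:
    """Split a transcript into (5' UTR, ORF, 3' UTR); None if no ORF is found."""
    orf = longest_orf(seq)
    if orf is None:
        return None
    start, end = orf
    return seq[:start], seq[start:end], seq[end:]
```

Ties between ORFs of equal length are resolved here in favour of the first one found, which is another arbitrary choice of this sketch.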
PromoterInspector-Predictions calculated by PromoterInspector
Promoter regions Promoters available in ElDorado are determined in a 3-step process:
  1. For each transcript (independent of quality), the promoter is set to 1000 bp upstream and 39 bp downstream of the TSS (prior to ElDorado release 04-2020: 1000/100 bp up/downstream of the TSS; prior to ElDorado release 12-2016: 500/100 bp up/downstream of the TSS). Beginning with ElDorado 04-2020, the promoter is not allowed to overlap an upstream CDS of a neighbouring locus on the same strand; it is trimmed accordingly (but to no shorter than 100 bp upstream of the TSS). Since ElDorado 04-2021, the promoter is also not allowed to overlap a CDS or intron start downstream of the TSS; it is trimmed accordingly.
  2. The promoters of two or more transcripts are merged into larger promoter regions if they satisfy all of the following conditions:

    • belong to the same locus
    • their promoters overlap and their first exons overlap
    This length makes up the Genomatix optimized promoter length.

  3. The annotation available from orthologous loci is evaluated. Promoter regions are extended if the first exons of corresponding transcripts differ in length. Based on a comparison of the exon/intron structure of two transcripts and on the sequence similarity of the corresponding sequence regions, additional promoter regions are annotated (CompGen promoters). The genome annotation so far contains no transcripts for these promoter regions.
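Steps 1 and 2 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Genomatix code: coordinates are treated as inclusive genomic positions, the names `Tx`, `promoter_window`, `can_merge` and `merged_promoter` are hypothetical, and the CDS/intron trimming rules of step 1 and the orthology-based extension of step 3 are not modelled.

```python
from typing import NamedTuple, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive, start <= end


class Tx(NamedTuple):
    locus: str
    strand: str            # "+" or "-"
    tss: int
    first_exon: Interval


def promoter_window(tss: int, strand: str,
                    upstream: int = 1000, downstream: int = 39) -> Interval:
    """Strand-aware promoter window around the TSS (current 1000/39 bp default)."""
    if strand == "+":
        return (tss - upstream, tss + downstream)
    # On the minus strand, "upstream" means higher genomic coordinates.
    return (tss - downstream, tss + upstream)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def can_merge(a: Tx, b: Tx) -> bool:
    """Check the two merge conditions listed in step 2."""
    return (a.locus == b.locus
            and overlaps(promoter_window(a.tss, a.strand),
                         promoter_window(b.tss, b.strand))
            and overlaps(a.first_exon, b.first_exon))


def merged_promoter(a: Tx, b: Tx) -> Interval:
    """Union of the two promoter windows; only meaningful if can_merge(a, b)."""
    pa = promoter_window(a.tss, a.strand)
    pb = promoter_window(b.tss, b.strand)
    return (min(pa[0], pb[0]), max(pa[1], pb[1]))
```

With two plus-strand transcripts of the same locus starting at 5000 and 5100 and overlapping first exons, the windows (4000, 5039) and (4100, 5139) would merge into (4000, 5139). The older window sizes can be reproduced by passing, for instance, `upstream=500, downstream=100`.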
TSR (transcriptional start regions)
TSRs are defined as regions of genomic sequence for which experimental evidence for transcription initiation is available.
  • Since ElDorado 04-2021: TSRs are directly based on CAGE peaks from the FANTOM5 project
  • Before ElDorado 04-2021: Information about transcription initiation is derived from individual full-length cDNAs and from CAGE tags. Both data sources make use of the oligo-capping method.
    The 5' ends of full-length transcripts and CAGE tags are taken as experimentally verified transcription start sites (TSS). TSSs separated by less than 40bp are grouped in a TSR.
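The grouping rule used before ElDorado 04-2021 can be illustrated with a short Python sketch. It assumes that the TSS positions belong to one chromosome and strand, and that grouping is done by chaining neighbouring TSSs (each TSS joins the current TSR if it lies less than 40 bp from the previous one); the chaining interpretation and the function name `group_tss_into_tsrs` are assumptions of this example.

```python
from typing import Iterable, List, Tuple


def group_tss_into_tsrs(tss_positions: Iterable[int],
                        max_gap: int = 40) -> List[Tuple[int, int]]:
    """Group TSS positions into TSRs, returned as (first_tss, last_tss) pairs."""
    positions = sorted(set(tss_positions))
    if not positions:
        return []
    tsrs: List[Tuple[int, int]] = []
    start = prev = positions[0]
    for pos in positions[1:]:
        if pos - prev < max_gap:
            prev = pos                     # close enough: extend the current TSR
        else:
            tsrs.append((start, prev))     # gap of max_gap or more: close the TSR
            start = prev = pos
    tsrs.append((start, prev))
    return tsrs
```

Under this reading, TSSs at 100, 120 and 159 form one TSR, whereas a TSS at 200 (41 bp from 159) starts a new one.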
MicroRNAs MicroRNAs are based on the sequences available in miRBase at the Sanger Institute.
CAGE tags CAGE tags are available for human and mouse, downloaded from the FANTOM 3 project (up to ElDorado 12-2013) or the FANTOM 5 project (since ElDorado 06-2015).
Probes Each single probe from gene expression arrays from Affymetrix and Illumina is mapped against the corresponding genome. All perfect matches are annotated.
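What "all perfect matches" means can be shown with a naive Python sketch that reports every exact occurrence of a probe on either strand of a sequence. Genome-scale mapping would normally rely on an indexed aligner; the plain string search, the decision to search both strands, and the function names below are assumptions of this example and not a description of the ElDorado pipeline.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def perfect_matches(genome: str, probe: str) -> List[Tuple[int, str]]:
    """Return sorted (0-based position, strand) pairs for all exact probe matches."""
    genome = genome.upper()
    probe = probe.upper()
    hits = []
    for strand, query in (("+", probe), ("-", reverse_complement(probe))):
        pos = genome.find(query)
        while pos != -1:
            hits.append((pos, strand))
            pos = genome.find(query, pos + 1)   # step by 1 to keep overlapping hits
    return sorted(set(hits))
```

Positions on the minus strand are reported as the leftmost genomic coordinate of the match, which is a convention chosen for this sketch.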
SNPs derived from dbSNP (NCBI)
SMARs calculated by SMARTest
Repeats The following genomic repeats are calculated by ModelInspector: ALUs, L1 elements, THEs, and B1 elements.
Promoter Modules calculated for promoter regions by ModelInspector (Promoter Module Library)
Literature Genomatix Pathway System (GePS) data is based on the analysis of abstracts from NCBI's PubMed.