Elements |
Source |
Primary transcripts
Exons
Introns |
Transcripts and their exon/intron structure are determined by the mapping
of cDNA sequences from different sources (e.g. RefSeq, GenBank, Ensembl). Transcripts
annotated in ElDorado are divided into 3 quality levels:
- Bronze, the lowest quality, is assigned to transcripts derived
from the mapping of cDNAs for which no experimental evidence about
5' completeness is available.
- Silver is assigned to a transcript if its promoter
region overlaps a PromoterInspector prediction.
- Gold is assigned to transcripts derived from the
mapping of cDNAs for which experimental evidence about 5' completeness
is available (e.g. from oligo-capping). A transcript initially
assigned bronze or silver is upgraded to gold if it is supported by
at least 3 CAGE tags located within 3 bp upstream or downstream of the transcript
start.
|
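The grading rules for transcripts can be sketched as a small classifier. This is a minimal illustration, not ElDorado's implementation: the `Transcript` record and its field names are hypothetical, and we assume that CAGE tags on the transcript's strand have already been selected and that a tag supports the transcript when its 5' end lies within 3 bp of the TSS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transcript:
    # Hypothetical record; field names are illustrative, not ElDorado's schema.
    tss: int                       # genomic coordinate of the transcript start
    five_prime_complete: bool      # experimental 5' evidence (e.g. oligo-capping)
    overlaps_pi_prediction: bool   # promoter overlaps a PromoterInspector prediction
    cage_tag_starts: List[int] = field(default_factory=list)  # 5' ends of same-strand CAGE tags


def quality(t: Transcript, min_tags: int = 3, max_dist: int = 3) -> str:
    """Assign bronze/silver/gold following the rules described in the table."""
    if t.five_prime_complete:
        return "gold"
    base = "silver" if t.overlaps_pi_prediction else "bronze"
    # Upgrade rule: enough CAGE tags starting within max_dist bp of the TSS.
    supporting = sum(1 for pos in t.cage_tag_starts if abs(pos - t.tss) <= max_dist)
    return "gold" if supporting >= min_tags else base
```

For example, a silver-level transcript with TSS 1000 and CAGE tags starting at 998, 1001, 1003 and 1050 has three supporting tags and is upgraded to gold.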
Loci |
Descriptive information about genetic loci was derived from NCBI's
EntrezGene.
|
UTRs |
calculated by determining the longest open reading frame (ORF) of each transcript; the transcript sequence 5' and 3' of this ORF makes up the 5' and 3' UTRs |
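A minimal sketch of the UTR derivation via the longest ORF, assuming the input is the spliced transcript sequence in sense orientation and that an ORF runs from an ATG to the first in-frame stop codon (TAA, TAG, TGA). The function names are hypothetical, and ElDorado's handling of details such as minimum ORF length, non-ATG starts or ORFs lacking a stop codon is not described in the table, so none of these is modelled.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG..stop ORF, 0-based half-open,
    stop codon included; None if no complete ORF exists."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        # Walk over complete codons only in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after a stop gives the longest ORF in this stretch
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


def split_utrs(seq: str) -> Optional[Tuple[str, str, str]]:
    """Split a transcript into (5' UTR, ORF, 3' UTR) around its longest ORF."""
    orf = longest_orf(seq)
    if orf is None:
        return None
    s, e = orf
    return seq[:s], seq[s:e], seq[e:]
```

For instance, `split_utrs("GGCATGAAATTTTAGCC")` returns `("GGC", "ATGAAATTTTAG", "CC")`.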
PromoterInspector-Predictions |
calculated by PromoterInspector |
Promoter regions |
Promoters available in ElDorado are evaluated in a 3-step process:
1. For each transcript (independent of quality), the promoter
is set to 1000/39bp up/downstream of the TSS
(prior to ElDorado release 04-2020: 1000/100bp up/downstream of the TSS; prior to ElDorado release 12-2016: 500/100bp up/downstream of the TSS).
This length makes up the Genomatix optimized promoter length.
Beginning with ElDorado 04-2020, we do not allow the promoter to overlap the CDS of an upstream neighbouring locus on the same strand;
the promoter is trimmed accordingly, but not to less than 100bp upstream of the TSS. Since ElDorado 04-2021, we also do not allow the promoter to overlap
a CDS or intron start downstream of the TSS; again, the promoter is trimmed accordingly.
2. The promoters of two or more transcripts are merged into a larger promoter
region if they satisfy all of the following conditions:
   - the transcripts belong to the same locus
   - their promoters overlap, and their first exons overlap
3. The annotation available from orthologous loci is evaluated. Promoter
regions are extended if the first exons of corresponding transcripts
differ in length. Based on a comparison of the exon/intron structure
of two transcripts and on the sequence similarity of the corresponding
sequence regions, additional promoter regions are annotated
(CompGen promoters).
For these promoter regions, the genome annotation does not yet contain any transcripts.
|
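The first two promoter steps (the per-transcript window with trimming, and the merging of overlapping promoters) can be sketched as follows. This is a minimal illustration under several assumptions not stated in the table: plus-strand transcripts only (the minus strand would be mirrored), 0-based half-open coordinates, "1000/39bp" read as 1000 bases upstream of the TSS plus 39 bases starting at the TSS, and trimming inputs precomputed on a hypothetical `Tx` record. Merging is treated as transitive (union-find), and the CompGen step based on orthologous loci is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the forward strand

UPSTREAM, DOWNSTREAM, MIN_UPSTREAM = 1000, 39, 100


@dataclass
class Tx:
    # Hypothetical plus-strand transcript record; fields are illustrative.
    name: str
    locus: str
    tss: int
    first_exon: Interval
    upstream_cds_end: Optional[int] = None   # end of nearest upstream same-strand CDS of another locus
    downstream_block: Optional[int] = None   # first CDS or intron start downstream of the TSS


def promoter(t: Tx) -> Interval:
    start, end = t.tss - UPSTREAM, t.tss + DOWNSTREAM
    if t.upstream_cds_end is not None:
        # Trim away the neighbouring CDS, but keep at least MIN_UPSTREAM bp upstream.
        start = min(max(start, t.upstream_cds_end), t.tss - MIN_UPSTREAM)
    if t.downstream_block is not None:
        end = min(end, t.downstream_block)
    return start, end


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def merge_promoters(txs: List[Tx]) -> List[Tuple[List[str], Interval]]:
    proms = {t.name: promoter(t) for t in txs}
    parent = {t.name: t.name for t in txs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(txs):
        for b in txs[i + 1:]:
            if (a.locus == b.locus and overlaps(proms[a.name], proms[b.name])
                    and overlaps(a.first_exon, b.first_exon)):
                parent[find(a.name)] = find(b.name)

    groups: Dict[str, List[str]] = {}
    for t in txs:
        groups.setdefault(find(t.name), []).append(t.name)
    # Members of a group overlap transitively, so their union is one contiguous span.
    return [(sorted(names), (min(proms[n][0] for n in names), max(proms[n][1] for n in names)))
            for names in groups.values()]
```

With two same-locus transcripts starting at 5000 and 5050 and overlapping first exons, the sketch yields a single merged region (4000, 5089); a transcript from another locus keeps its own promoter even if it overlaps.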
TSRs
(transcriptional start regions) |
TSRs are defined as regions of genomic sequence for which experimental
evidence for transcription initiation is available.
- Since ElDorado 04-2021: TSRs are directly based on CAGE peaks from the FANTOM5 project.
- Before ElDorado 04-2021: information about transcription
initiation was derived from individual full-length cDNAs and from CAGE tags.
Both data sources used the oligo-capping method.
The 5' ends of full-length transcripts and CAGE tags were taken as experimentally verified
transcription start sites (TSSs). TSSs separated by less than 40bp were grouped
into a TSR.
|
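The pre-04-2021 grouping rule for TSRs can be sketched as single-linkage clustering of sorted TSS coordinates. We assume all positions lie on the same chromosome and strand, and read "separated by less than 40bp" as a coordinate difference below 40 between neighbouring TSSs; the function name is hypothetical.

```python
from typing import List, Tuple


def group_tss(positions: List[int], max_gap: int = 40) -> List[Tuple[int, int]]:
    """Group TSS coordinates into TSRs: a TSS closer than max_gap bp to the
    previous TSS joins its region. Returns inclusive (first, last) pairs."""
    regions: List[Tuple[int, int]] = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] < max_gap:
            regions[-1] = (regions[-1][0], pos)  # extend the current TSR
        else:
            regions.append((pos, pos))           # start a new TSR
    return regions
```

For example, TSSs at 100, 120, 155, 300, 339 and 500 form the TSRs (100, 155), (300, 339) and (500, 500). Because linkage is between neighbours, a TSR can span more than 40bp overall.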
MicroRNAs |
MicroRNAs are based on the sequences available in miRBase at the
Sanger Institute. |
CAGE tags |
CAGE tags are available for human and mouse. They were downloaded from the
FANTOM 3 project
(up to ElDorado 12-2013) or from the
FANTOM 5 project
(since ElDorado 06-2015). |
Probes |
Each individual probe from Affymetrix and Illumina gene expression arrays
is mapped against the corresponding genome. All perfect matches are annotated. |
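What "all perfect matches" means for probe mapping can be illustrated with a naive scan: every exact occurrence of the probe sequence, or of its reverse complement, is reported. This is only a sketch; a genome-wide mapping would use an index rather than a linear string search, and the function names are hypothetical.

```python
from typing import List, Tuple

# Complement table for DNA; characters not listed (e.g. N) are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def perfect_matches(probe: str, genome: str) -> List[Tuple[int, str]]:
    """Return every exact hit of `probe` as (0-based start, strand), including overlapping hits."""
    genome = genome.upper()
    probe = probe.upper()
    hits: List[Tuple[int, str]] = []
    for strand, query in (("+", probe), ("-", revcomp(probe))):
        i = genome.find(query)
        while i != -1:
            hits.append((i, strand))
            i = genome.find(query, i + 1)  # advance by 1 so overlapping hits are found
    return sorted(hits)
```

For instance, `perfect_matches("ACG", "TTACGGGCGTAA")` finds the probe on the plus strand at position 2 and its reverse complement `CGT` at position 7. A palindromic probe is reported once per strand at the same position.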
SNPs |
derived from dbSNP (NCBI) |
SMARs |
calculated by SMARTest |
Repeats |
The following genomic repeats are calculated by
ModelInspector: ALUs, L1 elements, THEs, and B1 elements. |
Promoter Modules |
calculated for promoter regions by ModelInspector
(Promoter Module Library) |
Literature |
Genomatix Pathway System (GePS) data is based on the analysis of abstracts from NCBI's
PubMed.
|