Gene2Promoter - large scale



Gene2Promoter provides access to promoter sequences of all genes annotated in the available genomes. Promoter regions are thoroughly annotated and validated according to the highest scientific standards, including Genomatix proprietary technology (e.g. PromoterInspector, oligo-capping, comparative genomics).

Generally, there are two ways of accessing Genomatix promoters via Gene2Promoter:

  1. extract and/or interactively analyze up to 1000 promoters (see Gene2Promoter for details)
  2. extract large sets of promoters and/or filter promoters for transcription factor binding sites (details below)

Gene2Promoter large scale output includes a file with the promoter sequences in FASTA or GenBank format, an Excel-readable file containing gene <-> promoter correlation and optionally Excel-readable files containing all transcription factor binding sites and promoter modules located in the promoters. After computation, the result files can be downloaded from our server.

"Promoter" sequences available from other sources are usually based on the 5' upstream regions of annotated genes. This is often misleading, since eukaryotic genes usually have 5' untranslated regions (5' UTRs). Since the 5' UTR may also be split over several exons, the real regulatory region of a gene is frequently located far away from the coding sequence (up to several kb). Gene2Promoter contains the precise annotation of the promoter sequences.

More than 50% of all genes have alternative transcripts. In addition to alternative splicing, these genes are frequently regulated by different promoters. Only Gene2Promoter includes such alternative promoters.


Input form

In the input form you can choose from a list of organisms. You can retrieve all promoters or orthologous promoters for complete organisms or for a customized list of Gene Ids, Locus Ids or Promoter Ids. Additionally, you can apply a TF binding site filter, i.e. you can select up to four TF binding site matrix families all of which must have at least one match in the promoter sequences.

All promoters from organism
You may choose to extract all promoters from an organism. The organism is selected via the combo box in the upper right of the HTML page.

If you select this option, all promoter regions for this organism will be extracted (optionally filtered by occurrences of matrix family matches, as described below).
If you are looking for orthologous promoters, you have to activate the search for orthologous promoters by selecting organisms from the checkbox list below.

List or file upload
You can use the file upload field and/or the text field to upload a customized list containing Gene Ids, Locus Ids, Promoter Ids, Transcript Ids or cDNA accession numbers from GenBank, RefSeq or Ensembl. The input file/list may contain Ids of different types. To distinguish between them, the following rules are applied:
  • a string consisting exclusively of digits is a Gene Id from EntrezGene
  • a string consisting of a leading 'GXL_' (case-insensitive) followed by digits is a Genomatix Locus Id
  • a string consisting of a leading 'GXP_' (case-insensitive) followed by digits is a Genomatix Promoter Id
  • a string consisting of a leading 'GXT_' (case-insensitive) followed by digits is a Genomatix Transcript Id
  • a string consisting of a leading 'ENS' or 'FB' (case-insensitive) followed by other characters and digits is an Ensembl gene or transcript Id
  • a string not matching any of the above patterns and consisting of alphanumeric characters, '-' (minus), '.' (dot) or '_' (underscore), is an Accession number
  • any other string is rejected
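The classification rules above can be sketched as an ordered list of regular expressions in which the first full match wins. This is a minimal illustration, not Genomatix's implementation: the helper name `classify_id` is hypothetical, and the exact shape assumed for Ensembl/FlyBase Ids ("ENS" or "FB", optional letters, then digits) is an assumption, since the rule only says "other characters and digits".

```python
import re
from typing import Optional

# Rules in the order given in the help text; the first full match wins.
# ASSUMPTION: Ensembl/FlyBase Ids are modelled as "ENS"/"FB", optional letters, then digits.
_RULES = [
    ("EntrezGene Gene Id", re.compile(r"[0-9]+")),
    ("Genomatix Locus Id", re.compile(r"GXL_[0-9]+", re.IGNORECASE)),
    ("Genomatix Promoter Id", re.compile(r"GXP_[0-9]+", re.IGNORECASE)),
    ("Genomatix Transcript Id", re.compile(r"GXT_[0-9]+", re.IGNORECASE)),
    ("Ensembl gene or transcript Id", re.compile(r"(?:ENS|FB)[A-Z]*[0-9]+", re.IGNORECASE)),
    # Fallback: only alphanumerics, '-', '.' or '_'.
    ("Accession number", re.compile(r"[A-Za-z0-9._-]+")),
]


def classify_id(token: str) -> Optional[str]:
    """Return the Id type of `token`, or None if the token is rejected (hypothetical helper)."""
    for id_type, pattern in _RULES:
        if pattern.fullmatch(token):
            return id_type
    return None  # any other string is rejected
```

Note that under these rules a gene symbol such as BRCA1 is classified as an accession number, because it matches none of the earlier patterns.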

The Ids in the input list may be separated by white space (including tabs and newlines), ',' or ';'.
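Splitting a pasted or uploaded list into Id tokens according to these separators can be sketched as follows; `split_ids` is a hypothetical helper name, and treating runs of consecutive separators as a single separator is an assumption.

```python
import re
from typing import List


def split_ids(text: str) -> List[str]:
    """Split pasted/uploaded text into Id tokens (hypothetical helper).

    Separators per the help text: white space (incl. tabs and newlines), ',' or ';'.
    Consecutive separators are treated as one, so empty tokens are dropped.
    """
    return [token for token in re.split(r"[\s,;]+", text) if token]


print(split_ids("7157, GXP_1;\tNM_000546\n\nENSG00000141510 "))
# ['7157', 'GXP_1', 'NM_000546', 'ENSG00000141510']
```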

If you are interested in promoters of a single organism only, you can use the popup menu to specify an organism for filtering your list of input Ids.
These parameters are hidden by default; clicking on the corresponding icon in the input form reveals them.

Of course, the search for orthologous promoters and the TF binding site filter are also applicable to input Ids.
General note: Applying several filters to an input list can lead to an empty result. Consider uploading a list of vertebrate and plant genes and filtering

  1. for orthologous promoters in any vertebrate organism (using the restrictive search for orthologous promoters)
  2. for promoters containing matches of an arbitrary plant matrix
Then the first filter will remove all plant genes from your list and the second filter will remove all vertebrate genes.

Search for orthologous promoters
This option is activated as soon as at least one organism from the list is selected, i.e. at least one of the checkboxes is checked. There are two ways to treat orthologous promoters:

Add orthologous promoters from each of the selected organisms

With this option, the promoters resulting from the input organism, or from the uploaded list/file, are examined. If there are orthologous promoters in any of the selected organisms, they are added to the result. If the TF binding site filter is activated, it is also applied to the orthologous promoters.
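The "add" mode can be sketched as a union over homology groups. This assumes a hypothetical data model in which each promoter carries an organism name and an optional homology-group id; the names `Promoter` and `add_orthologs` are illustrative only, and the TF binding site filter (which would be applied to the added promoters as well) is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Promoter:
    id: str
    organism: str
    group: Optional[str]  # hypothetical homology-group id; None if ungrouped


def add_orthologs(selected: List[Promoter], catalog: Iterable[Promoter],
                  target_organisms: Set[str]) -> List[Promoter]:
    """'Add' mode: keep the input promoters and append their orthologs
    from the selected organisms (duplicates are skipped)."""
    groups = {p.group for p in selected if p.group is not None}
    seen = {p.id for p in selected}
    result = list(selected)
    for p in catalog:
        if p.group in groups and p.organism in target_organisms and p.id not in seen:
            result.append(p)
            seen.add(p.id)
    return result
```

Promoters that belong to no homology group stay in the result unchanged; they simply contribute no orthologs.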

Restrict output to promoters which are orthologously conserved between ALL selected organisms

In this mode, the orthologous promoters are used as a filter. First, any orthologous promoters from the selected organisms are added to the promoter list; then every homology group that does not contain at least one promoter from each of the selected organisms is deleted. This means that any promoter in the result

  • is either from one of the selected organisms
  • OR is orthologously related to a promoter from each selected organism
  • AND contains TF matrix matches of the TF filter, if any
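The "restrict" mode can be sketched in three steps: add orthologs, collect the organisms present in each homology group, and keep only groups that cover every selected organism. The data model and the names `Promoter` and `restrict_to_conserved` are hypothetical; dropping promoters that belong to no homology group is an assumption, and the TF filter step is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Promoter:
    id: str
    organism: str
    group: Optional[str]  # hypothetical homology-group id; None if ungrouped


def restrict_to_conserved(selected: List[Promoter], catalog: Iterable[Promoter],
                          target_organisms: Set[str]) -> List[Promoter]:
    """'Restrict' mode: keep only homology groups covering ALL target organisms."""
    # Step 1: add orthologs from the target organisms, as in the 'add' mode.
    groups = {p.group for p in selected if p.group is not None}
    seen = {p.id for p in selected}
    pool = list(selected)
    for p in catalog:
        if p.group in groups and p.organism in target_organisms and p.id not in seen:
            pool.append(p)
            seen.add(p.id)
    # Step 2: collect the organisms represented in each homology group.
    organisms_in_group: Dict[str, Set[str]] = defaultdict(set)
    for p in pool:
        if p.group is not None:
            organisms_in_group[p.group].add(p.organism)
    # Step 3: delete groups lacking a promoter from any target organism
    # (ungrouped promoters are dropped as well; an assumption).
    return [p for p in pool
            if p.group is not None and target_organisms <= organisms_in_group[p.group]]
```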

In the result promoter file, a promoter's membership in a homology group is shown only if the search for orthologous promoters is activated. So if you are interested in a single organism but want to see the homology groups/promoter sets to which its promoters belong, you must also select that organism from the checkboxes of the "search for orthologous promoters" option (see also the examples section).

Note: The list of organisms to search for orthologs is automatically adjusted whenever the input organism changes.

TF binding site filter
You can use the drop-down menus to specify up to four matrix families from the Genomatix matrix library.
These matrix families are used for filtering the promoter sequences. All promoters are scanned for matches of these families. If the sequence does not contain at least one match for ALL of the filter matrix families, it is rejected. In other words, ALL resulting promoter regions contain at least one match for EACH of the filter matrix families.
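The ALL-families rule, together with the per-promoter count of filter matches that the gene information file reports, can be sketched as below. The input shape (promoter id mapped to the family name of each individual match) and the names `tf_filter` and `MAX_FILTER_FAMILIES` are hypothetical, and the family names in the usage line are placeholders rather than real Genomatix families.

```python
from collections import Counter
from typing import Dict, List, Sequence

MAX_FILTER_FAMILIES = 4  # "up to four" matrix families per the help text


def tf_filter(matches: Dict[str, List[str]], filter_families: Sequence[str]) -> Dict[str, int]:
    """Keep promoters with at least one match for EVERY filter family.

    `matches` maps a promoter id to the matrix family of each individual match.
    Returns promoter id -> total number of filter-family matches.
    """
    if not 1 <= len(filter_families) <= MAX_FILTER_FAMILIES:
        raise ValueError("select between one and four matrix families")
    wanted = set(filter_families)
    kept: Dict[str, int] = {}
    for promoter_id, families in matches.items():
        counts = Counter(f for f in families if f in wanted)
        if all(counts[f] > 0 for f in wanted):  # a single missing family rejects the promoter
            kept[promoter_id] = sum(counts.values())
    return kept


print(tf_filter({"GXP_1": ["V$FAM_A", "V$FAM_B", "V$FAM_A"], "GXP_2": ["V$FAM_A"]},
                ["V$FAM_A", "V$FAM_B"]))
# {'GXP_1': 3}
```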

The output file containing gene information will list the number of filter matrix family matches for each promoter.

Keep in mind that there are different matrix families for the different organism groups. When you select a matrix family to filter the promoters, the matrix library the family belongs to MUST match the group to which the selected organism(s) belong(s); e.g. if you selected an insect organism, all filter matrix families must be from the insect matrix library. The exception to this rule is C. elegans, for which you may select matrix families from either the vertebrate or the nematode matrix library. The library can be recognized by the prefix of the matrix family name:

  • Vertebrate matrix families start with V$
  • Insect matrix families start with I$
  • Plant matrix families start with P$
  • Fungus matrix families start with F$
  • Nematode matrix families start with N$
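The library-matching rule, including the C. elegans exception, can be checked from the family-name prefix alone. This is a hedged sketch: the organism-group keys, the `is_c_elegans` flag and the function names are hypothetical, and the family names in the usage line are placeholders.

```python
from typing import List, Set

# Matrix library prefixes as listed in the help text.
LIBRARY_PREFIX = {
    "vertebrate": "V$",
    "insect": "I$",
    "plant": "P$",
    "fungus": "F$",
    "nematode": "N$",
}


def allowed_prefixes(organism_group: str, is_c_elegans: bool = False) -> Set[str]:
    """Matrix-family prefixes allowed as filters for an organism group."""
    if is_c_elegans:
        # Exception: C. elegans accepts vertebrate or nematode families.
        return {LIBRARY_PREFIX["vertebrate"], LIBRARY_PREFIX["nematode"]}
    return {LIBRARY_PREFIX[organism_group]}


def invalid_filter_families(families: List[str], organism_group: str,
                            is_c_elegans: bool = False) -> List[str]:
    """Return the selected families whose library does not match the organism group."""
    allowed = allowed_prefixes(organism_group, is_c_elegans)
    return [f for f in families if f[:2] not in allowed]


print(invalid_filter_families(["I$FAMILY_A", "V$FAMILY_B"], "insect"))
# ['V$FAMILY_B']
```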

Note: The list of matrix families is automatically adjusted whenever the input organism changes.

Important notice: Applying a matrix family filter will considerably slow down the computation of the statistical data.
Tissue filter
If CAGE tag information is available for the selected species, the tissue filter option appears with a list of tissues (see the list of available tissues/species).
Multiple tissues can be selected, and the promoters will be filtered by these tissues. Tissue information is gathered from the CAGE tags of the transcripts belonging to a promoter, i.e. if a CAGE tag with the corresponding tissue information was found in one of the promoter's transcripts, the promoter will appear in the output list.
Note that the result is not a complete listing (for many transcripts, tissue information is missing or incomplete), but the filter can be used to collect those promoters with positive tissue information.
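The tissue filter can be sketched as a lookup from a promoter's transcripts to the tissues of their CAGE tags. The text does not say how several selected tissues combine; this sketch assumes a promoter passes if any of its transcripts carries ANY selected tissue. The input dictionaries and the name `tissue_filter` are hypothetical.

```python
from typing import Dict, List, Set


def tissue_filter(promoter_transcripts: Dict[str, List[str]],
                  transcript_tissues: Dict[str, Set[str]],
                  selected_tissues: Set[str]) -> List[str]:
    """Return promoters with CAGE-tag evidence for a selected tissue.

    ASSUMPTION: one matching tissue (any of the selected ones) is enough.
    Transcripts without tissue information contribute nothing, so such
    promoters are missing from the result (the listing is not complete).
    """
    kept = []
    for promoter_id, transcripts in promoter_transcripts.items():
        tissues: Set[str] = set()
        for transcript_id in transcripts:
            tissues |= transcript_tissues.get(transcript_id, set())
        if tissues & selected_tissues:
            kept.append(promoter_id)
    return kept
```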

Statistics page

After selecting organism(s) and/or uploading a list of Ids and specifying TF matrix families on the input form, a statistical overview for your result is computed and displayed. Also, you can specify further options for your result files.

Parameter listing

First on this page is a listing of the organism(s) and the TF binding site filter families which you selected in the input form.


If you used the upload option, the parameter listing will include a distribution of your input Ids over the organisms. If any Ids could not be assigned to an organism (this can depend on the ElDorado database version in use), you will be notified about this.


Statistical overview

The table shows the number of Genomatix loci and promoters which satisfy your search conditions.


With the upload option, the table shows in which organisms your input Ids were found. If you selected organisms for filtering, all selected organisms appear in the table, including those for which no promoters were found.


Download form

Before you start the extraction of the promoter regions, check whether you want to apply any of the download options.

Download options
Sequence format
By clicking the radio button you can choose the sequence format for the extracted promoter regions. Available formats are FASTA and GenBank.
Additional output
Optionally you can include an analysis of the resulting promoter sequences.

Selecting the "Transcription factor binding sites" checkbox will create an additional output file containing Genomatix MatInspector matrix matches. The matrices used for the search are selected automatically based on the organism(s) you selected or, if you used the list upload feature, the organism(s) to which your input Ids belong. In particular, vertebrate TF site matrices are used for human, chimp, rhesus macaque, mouse, opossum, rat, dog, horse, cow, platypus, chicken and zebrafish promoters; plant TF site matrices for Arabidopsis and rice promoters; insect TF site matrices for Drosophila, Anopheles and honeybee promoters; and vertebrate and nematode TF site matrices for C. elegans promoters.
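The organism-to-library assignment described above amounts to a simple lookup table. In this sketch the organism strings are illustrative keys taken from the text, not Genomatix identifiers, and `libraries_for` is a hypothetical helper; unknown organisms raise `KeyError`.

```python
from typing import Dict, Iterable, Set

# Organism -> TF site matrix libraries, transcribed from the list above.
_LIBRARIES: Dict[str, Set[str]] = {}
for _org in ("human", "chimp", "rhesus macaque", "mouse", "opossum", "rat",
             "dog", "horse", "cow", "platypus", "chicken", "zebrafish"):
    _LIBRARIES[_org] = {"vertebrate"}
for _org in ("Arabidopsis", "rice"):
    _LIBRARIES[_org] = {"plant"}
for _org in ("Drosophila", "Anopheles", "honeybee"):
    _LIBRARIES[_org] = {"insect"}
_LIBRARIES["C. elegans"] = {"vertebrate", "nematode"}  # both libraries are used


def libraries_for(organisms: Iterable[str]) -> Dict[str, Set[str]]:
    """Matrix libraries used for each organism present in the result."""
    return {org: set(_LIBRARIES[org]) for org in organisms}
```

For a mixed upload list, each organism's promoters are scanned only with that organism's libraries, e.g. `libraries_for(["mouse", "C. elegans"])` yields `{"vertebrate"}` for mouse and `{"vertebrate", "nematode"}` for C. elegans.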

In an analogous manner, if you select "Promoter modules", the promoter regions will be analyzed with Genomatix ModelInspector. The promoter modules involved are also chosen depending on the organism.
Important note: Promoter modules are not available for all organisms. Currently, there is no promoter module library for insects, so promoter modules are not available for:

  • Drosophila melanogaster
  • Anopheles gambiae
  • Apis mellifera
If you selected an insect organism, the checkbox to select the promoter module add-on is not displayed. If you used the list upload, the selection depends on the organisms to which the input Ids belong. E.g. if the result contains both vertebrate and insect promoters, the promoter modules option is available, but the model matches will be computed for the vertebrate promoters only.
For C. elegans, the modules from the Vertebrate Promoter Module library are used.
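The availability rule for the promoter module add-on can be sketched as two small checks: the option is offered if the result contains at least one non-insect organism, and module matches are then computed only for non-insect promoters. The function names and the id-to-organism input shape are hypothetical.

```python
from typing import Dict, List, Set

# Insect organisms without a promoter module library, per the help text.
INSECTS = {"Drosophila melanogaster", "Anopheles gambiae", "Apis mellifera"}


def module_option_available(result_organisms: Set[str]) -> bool:
    """The add-on is offered if at least one result organism is not an insect."""
    return any(org not in INSECTS for org in result_organisms)


def promoters_for_module_search(promoters: Dict[str, str]) -> List[str]:
    """Given promoter id -> organism, return the ids that are scanned for modules."""
    return [pid for pid, org in promoters.items() if org not in INSECTS]
```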

If you select the "BED file with promoter positions" option, an additional output file containing the promoter positions in BED format is created.

This option is only available if all promoters in the output belong to the same organism.
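Writing promoter positions as BED lines, with the single-organism restriction enforced, might look like the sketch below. The `PromoterRegion` record, the `to_bed` helper and the coordinate convention of the input (assumed 1-based, inclusive) are all assumptions; BED itself uses 0-based, half-open intervals, so only the start is shifted. The coordinates in the usage line are made up.

```python
from typing import List, NamedTuple


class PromoterRegion(NamedTuple):
    promoter_id: str
    organism: str
    chrom: str
    start: int   # ASSUMPTION: 1-based, inclusive
    end: int     # ASSUMPTION: 1-based, inclusive
    strand: str  # "+" or "-"


def to_bed(regions: List[PromoterRegion]) -> str:
    """Render regions as 6-column BED text (chrom, start, end, name, score, strand)."""
    if len({r.organism for r in regions}) != 1:
        raise ValueError("BED output requires promoters from exactly one organism")
    lines = []
    for r in regions:
        # Convert to BED's 0-based, half-open interval: shift the start only.
        lines.append(f"{r.chrom}\t{r.start - 1}\t{r.end}\t{r.promoter_id}\t0\t{r.strand}")
    return "\n".join(lines) + "\n"


print(to_bed([PromoterRegion("GXP_1", "human", "chr17", 1001, 1600, "-")]), end="")
# chr17	1000	1600	GXP_1	0	-
```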

Email address
Extracting the promoter regions is a long-running job. You will be notified via email when your result files are available (this should take at most one day), so you must provide an email address to which the notification will be sent. The email will contain a link to the HTML result page.

Program Output

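Gene2Promoter large scale creates several output files:

  • a file with the promoter sequences in FASTA or GenBank format
  • an Excel-readable file containing the gene <-> promoter correlation
  • optionally, Excel-readable files containing the transcription factor binding sites and promoter modules located in the promoters
  • optionally, a BED file with the promoter positions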


The following examples demonstrate the effect of the different input/filter options of Gene2Promoter large scale.

Single organism without filters

Single organism with additional orthologous promoters from the same organism

Single organism with restriction on orthologous promoters from the same organism

Input list with additional orthologous promoters from two organisms

Input list with additional orthologous promoters from two organisms and TF site filter