
Prediction of promoter regions in mammalian genomic sequences


[Introduction] [Specificity] [Sequence Selection] [Output Example] [References]

Introduction

PromoterInspector is a program that predicts eukaryotic pol II promoter regions with high specificity in mammalian genomic sequences.

The program PromoterInspector focuses on the genomic context of promoters rather than their exact location.

[Figure: promoter context]

Prediction is based on context-specific features that were extracted from training sequences (all mammalian) by a heuristic-free approach. The novel idea of the PromoterInspector approach is the way features are defined: they are equivalence classes of IUPAC groups, which allow a fuzzy description of the promoter context. Predictions are derived from the analysis of feature frequencies. Details of the algorithm are published in Scherf et al., 2000 (JMB).
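To make the idea of a fuzzy, IUPAC-based description more concrete, here is a minimal sketch of how a pattern written in IUPAC ambiguity codes can match several concrete oligomers, and how its frequency in a sequence could be counted. This is not the PromoterInspector algorithm (see Scherf et al., 2000 for that); the function names `matches` and `feature_frequency` and the example pattern are assumptions made for illustration only.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern, oligo):
    """Return True if the oligomer is covered by the IUPAC pattern (hypothetical helper)."""
    return len(pattern) == len(oligo) and all(
        base in IUPAC[code] for code, base in zip(pattern.upper(), oligo.upper())
    )


def feature_frequency(pattern, sequence):
    """Fraction of overlapping windows in the sequence that match the pattern."""
    k = len(pattern)
    windows = len(sequence) - k + 1
    if windows <= 0:
        return 0.0
    hits = sum(matches(pattern, sequence[i:i + k]) for i in range(windows))
    return hits / windows


if __name__ == "__main__":
    # "S" stands for C or G, so "SSCG" covers CCCG, CGCG, GCCG and GGCG.
    print(matches("SSCG", "GCCG"))
    print(feature_frequency("SS", "GCAT"))
```

One pattern therefore stands for a whole group of related oligomers, which is what makes the description tolerant of small sequence differences.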


Specificity and Sensitivity

Specificity: 85%*
Specificity was evaluated on more than 44 million base pairs of EMBL Release 65, containing more than 1200 fully annotated genes.
Sensitivity: 48%
Sensitivity was evaluated on chromosomes 21 and 22.

* PromoterInspector has also been tested on a large set of exons and introns. Only 2% of these sequences are predicted as promoter regions. This result suggests that some of the additional predictions may well be unknown promoters or enhancers.

In summary, expect PromoterInspector to find reliable matches, but not all promoters.
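The two figures above can be read as simple ratios. The sketch below assumes the definitions commonly used in promoter prediction: specificity is the share of predictions that are correct, and sensitivity is the share of known promoters that are found. These definitions and the counts in the example are assumptions for illustration, not the actual evaluation data.

```python
def specificity(true_predictions, false_predictions):
    """Share of all predictions that are correct (assumed definition)."""
    total = true_predictions + false_predictions
    return true_predictions / total if total else 0.0


def sensitivity(promoters_found, promoters_missed):
    """Share of all known promoters that were predicted (assumed definition)."""
    total = promoters_found + promoters_missed
    return promoters_found / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts chosen only to reproduce the percentages quoted above.
    print(f"specificity: {specificity(85, 15):.0%}")
    print(f"sensitivity: {sensitivity(48, 52):.0%}")
```

Under these assumed definitions, high specificity combined with moderate sensitivity means that most reported regions are reliable, while roughly half of the real promoters go unreported.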


PromoterInspector Input Data

Sequence Selection
Sequence data: There are several ways to supply a set of input sequences:

  • Choose from your previously uploaded sequences.
  • Enter your sequences directly into the form.
  • If your browser supports this option, a sequence file can be uploaded.
  • Enter accession number(s).

The input sequences must be in one of the following formats:

Result Parameters
Email address: You can choose between two methods for receiving the results:
  • Show result directly in browser window
    With this option, the URL of the result is shown directly in your browser window.

    Warning: Please use this option only for analyses that can be completed in a short time.
    If the analysis takes longer than the timeout of the web server, the connection will be terminated and you will receive an error message (e.g. "The document contained no data."). In this case the results will not be available. Please restart the analysis using the option below, "Send the URL of the result to".

  • Send the URL of the result via email
    With this option, an email containing the URL of the results is sent to the address you provide as soon as the analysis is finished.

The results will be available on our server for a limited time. To find out how long your results will be kept, please see the result email. After that period they will be deleted unless they are protected in the project management.


Output Example

PromoterInspector creates an output file that contains a list of promoter regions for every sequence. Start and end of the regions correspond to sense strand numbering. Please note that predictions are not strand specific.

The length of the regions may vary between 200 bp and more than 1000 bp. Please note that the start and end of a region are not the start and end of the promoter. The region may contain the promoter or overlap with it.

In our studies, a promoter region was counted as a true positive if a transcription start site (TSS) was located within the predicted promoter region or up to 200 bp downstream of it.

Example:

Sequence HSU69634, 1515 bp. TSS is located at position 1450.

PromoterInspector output:

Inspecting sequence HSU69634 [U69634] (1 - 1515):

[Human neurotensin receptor gene, promoter region.]

Start   End    Length in bp   Select Match
1165    1512   348
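
The true-positive criterion described above can be checked against this example with a few lines of code. This is a minimal sketch: the function name `is_true_positive` and the `strand` parameter are assumptions for illustration. Because predictions are not strand specific, "downstream" is taken here to mean higher coordinates for a gene on the sense strand and lower coordinates for a gene on the antisense strand.

```python
def is_true_positive(region_start, region_end, tss, tolerance=200, strand="+"):
    """Check whether a TSS lies within the region or up to `tolerance` bp downstream.

    Coordinates follow sense-strand numbering. `strand` is the strand of the
    annotated gene (an assumption of this sketch, not part of the output).
    """
    if strand == "+":
        # Downstream means towards higher positions.
        return region_start <= tss <= region_end + tolerance
    # For a gene on the antisense strand, downstream means towards lower positions.
    return region_start - tolerance <= tss <= region_end


if __name__ == "__main__":
    # HSU69634: predicted region 1165-1512, TSS at position 1450.
    print(is_true_positive(1165, 1512, 1450))
```

For HSU69634 the TSS at position 1450 lies inside the predicted region 1165 to 1512, so the prediction counts as a true positive.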


The promoter predictions can be selected for extraction of the corresponding sequences. You can extract either the complete sequence or only the predicted promoter regions.

Extraction Options
Sequence Extraction: You can extract
  • the complete sequence or
  • only the match positions (default)
You can also extract additional sequence upstream and downstream of the match positions. To do so, enter the number of additional base pairs into the box. The sequences can be extracted in either FASTA format (default) or GenBank format.
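
For readers who want to reproduce this kind of extraction offline, here is a minimal sketch of cutting out a match with optional flanking base pairs and writing it in FASTA format. It is not the server's implementation; the function names, the clamping to the sequence ends, and the header layout are assumptions for illustration.

```python
def extract_region(sequence, start, end, upstream=0, downstream=0):
    """Return (first, last, subsequence) for a 1-based, inclusive match.

    The match is extended by `upstream` and `downstream` base pairs and
    clamped to the sequence boundaries (an assumption of this sketch).
    """
    first = max(1, start - upstream)
    last = min(len(sequence), end + downstream)
    return first, last, sequence[first - 1:last]


def to_fasta(name, first, last, subsequence, width=60):
    """Format a subsequence as a FASTA record with a hypothetical header layout."""
    lines = [subsequence[i:i + width] for i in range(0, len(subsequence), width)]
    return "\n".join([f">{name}_{first}-{last}"] + lines)


if __name__ == "__main__":
    demo = "ACGT" * 25  # 100 bp placeholder sequence, not real data
    first, last, sub = extract_region(demo, 11, 20, upstream=5, downstream=5)
    print(to_fasta("demo", first, last, sub))
```

Positions are treated as 1-based and inclusive, matching the start and end columns of the output table.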


References for PromoterInspector

If you are interested in more details, the method is described in