21-Dec-2011

Information Content in Genomic Sequences

Nucleotide distribution in genomes is known not to be random: genomes show overrepresentation of specific motifs, long- and short-range correlations, periodicities, and large-scale compositional heterogeneities. Little is known about the possible functional effects of nucleotide distribution on the conformational landscape of DNA. Nevertheless, the physicochemical properties of nucleotide stacking are known to be fundamental in determining double-helix dynamics and are critical factors in the transmission of the message encoded in DNA.

A vast literature indicates a relationship between transcription processes and the base composition of the associated regulatory sequences. It is still an open question whether GC content plays a role in determining promoter performance, since it is related to DNA conformational properties such as flexibility, thermal stability, and the formation of the open bubbles that permit transcription. We are therefore interested both in deviations from randomness in nucleotide distribution patterns and in the information content, or 'complexity', encoded in the symbolic representation of sequences upstream of the Transcription Start Site (TSS).
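As a minimal illustration of a local compositional profile, the sketch below computes GC content in sliding windows along a sequence. The helper name `gc_profile` and the window and step sizes are hypothetical choices for illustration, not parameters taken from this project.

```python
def gc_profile(seq: str, window: int = 100, step: int = 10) -> list[float]:
    """Return the GC fraction of each `window`-bp window, advancing by `step` bp.

    Hypothetical helper for illustration; ambiguous bases (e.g. N) count as non-GC.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")  # G and C occurrences in this window
        profile.append(gc / window)
    return profile


if __name__ == "__main__":
    toy = "ATGCGC" * 10 + "ATATAT" * 10  # toy sequence, not real promoter data
    print(gc_profile(toy, window=12, step=12))
```

Non-overlapping windows (step equal to window) give independent estimates; a smaller step gives a smoother profile at the cost of correlated neighbouring values.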

Analysis of regulatory regions

We are investigating DNA sequence heterogeneities at a local level, in functional regions such as promoters, in order to ascertain possible evolutionary constraints on their structure. We hypothesize that nucleotide distribution patterns may correlate with differential selection pressures arising from structural and/or functional constraints.


Base Composition Analysis in Vertebrates. Promoter sequences from -1000 bp to -1 bp relative to the TSS in (A) Danio rerio, (B) Xenopus tropicalis, (C) Monodelphis domestica, (D) Gallus gallus, (E) Canis familiaris, (F) Mus musculus, (G) Bos taurus, (H) Pan troglodytes.
X-axis: nucleotide position relative to the TSS. Y-axis: nucleotide density.
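The figure plots, for each position relative to the TSS, how often each nucleotide occurs across a set of promoters. Assuming the promoters have been extracted as equal-length strings aligned on the TSS (index 0 is position -1000, the last index is position -1), a minimal sketch of that per-position density is shown below; the function name `positional_density` is hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def positional_density(promoters: list[str], upstream: int = 1000) -> dict[int, dict[str, float]]:
    """Fraction of promoters carrying A, C, G or T at each position upstream of the TSS.

    Assumes each sequence is exactly `upstream` bp long and TSS-aligned, so index 0
    is position -upstream and the last index is position -1.
    Non-ACGT symbols (e.g. N) are not counted, so a column may sum to less than 1.
    """
    if not promoters:
        raise ValueError("need at least one promoter sequence")
    if any(len(s) != upstream for s in promoters):
        raise ValueError(f"every promoter must be {upstream} bp long")
    n = len(promoters)
    density = {}
    for i in range(upstream):
        column = Counter(s[i].upper() for s in promoters)  # bases seen at this offset
        position = i - upstream  # index 0 -> -upstream, index upstream-1 -> -1
        density[position] = {b: column[b] / n for b in BASES}
    return density
```

For example, `positional_density(seqs)[-30]["T"]` would give the fraction of promoters with a T at position -30.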

We use specific entropy indicators and alignment tools to determine structural differences and similarities among promoters belonging to different classes, both functional and taxonomic. Consensus sequences and repetitive motifs are analyzed with several methods to characterize these sequences in finer detail.
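The page does not specify which entropy indicators are used. One common family, assumed here purely for illustration, is the Shannon block entropy H_n = -sum_w p(w) log2 p(w) over the overlapping n-letter words w of a sequence, together with the conditional entropy h_n = H_(n+1) - H_n, which measures the average uncertainty of the next base given the preceding n bases. A minimal sketch with empirical word frequencies:

```python
import math
from collections import Counter


def block_entropy(seq: str, n: int) -> float:
    """Shannon entropy (bits) of the empirical distribution of overlapping n-words."""
    if n <= 0 or n > len(seq):
        raise ValueError("n must be between 1 and len(seq)")
    words = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(words.values())
    return -sum((c / total) * math.log2(c / total) for c in words.values())


def conditional_entropy(seq: str, n: int) -> float:
    """h_n = H_(n+1) - H_n: uncertainty (bits) of the next base given the previous n bases."""
    return block_entropy(seq, n + 1) - block_entropy(seq, n)
```

For a four-letter alphabet H_1 is at most 2 bits; a strictly periodic sequence such as "ACGT" repeated has h_1 close to 0 because each base determines the next. For large n the empirical estimates are biased by the limited number of observed words.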


Positional Shannon Entropy (h_n) for 8-bp substrings in different organisms: comparison between real promoters and surrogate sequences (orange). X-axis: nucleotide position relative to the TSS. Y-axis: Positional Shannon Entropy.
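The exact estimator behind the positional entropy plot and the way the surrogates were generated are not stated on this page. One possible reading, used here only as an assumption, is: for each aligned start position, collect the 8-bp word beginning there in every promoter and compute the Shannon entropy of that word distribution; surrogates are obtained by randomly permuting each promoter's bases, which keeps its composition but destroys positional structure and correlations. The helper names `positional_entropy` and `shuffled_surrogates` are hypothetical.

```python
import math
import random
from collections import Counter


def word_entropy(words) -> float:
    """Shannon entropy (bits) of an iterable of words."""
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def positional_entropy(promoters: list[str], n: int = 8) -> list[float]:
    """Entropy of the n-words starting at each aligned offset across all promoters.

    Offset 0 is the first base of each sequence (e.g. position -1000 for 1000-bp promoters).
    """
    if not promoters:
        raise ValueError("need at least one promoter sequence")
    length = len(promoters[0])
    if any(len(s) != length for s in promoters):
        raise ValueError("promoters must be aligned and of equal length")
    return [word_entropy(s[i:i + n] for s in promoters) for i in range(length - n + 1)]


def shuffled_surrogates(promoters: list[str], seed: int = 0) -> list[str]:
    """Hypothetical surrogate set: each promoter's bases randomly permuted (composition kept)."""
    rng = random.Random(seed)
    surrogates = []
    for s in promoters:
        bases = list(s)
        rng.shuffle(bases)  # in-place permutation of this promoter's bases
        surrogates.append("".join(bases))
    return surrogates
```

Because there are 4^8 = 65,536 possible 8-mers, the entropy at each position cannot exceed log2 of the number of promoters and is biased downward; computing the same quantity on a surrogate set of equal size gives a baseline subject to the same bias. A dinucleotide-preserving shuffle would be a stricter alternative null model.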
