Presenter: Joshua D. Starmer
Advisor(s): Donald Bitzer, Mladen Vouk
Author(s): Joshua Starmer, Anne Stomp, Mladen Vouk, Donald Bitzer
Graduate Program: Bioinformatics, Forestry, Computer Science, Computer Science
Title: Predicting Shine-Dalgarno sequence locations in 18 prokaryote transcriptomes exposes systematic annotation errors
Abstract: We implemented the Individual Nearest Neighbor Hydrogen Bond
(INN-HB) model for oligo-oligo hybridization and created a new metric, Relative Spacing (RS),
to identify both the location and the binding strength of Shine-Dalgarno (SD) sequences by simulating the binding between mRNAs and single-stranded
16S rRNA 3' tails. In 18 prokaryote genomes we observed 2420 genes where the strongest
binding in the translation initiation region (TIR) includes the start codon, deviating from the expected location of 5 to 10 bases upstream.
We designated these as RS+1 genes. These genes showed an unusual start codon bias: the majority used GUG rather than AUG.
We further characterized the 624 RS+1 genes whose SD sequence was associated with a free energy change
below -8.4 kcal/mol (strong RS+1 genes) and determined that, for 384 of them, the most likely explanation for the
unexpected location of the SD sequence is mis-annotation of the start codon. In this way, the new RS metric provided an improved method for gene sequence
annotation. Many of the remaining strong RS+1 genes appear to be bona fide, with SD sequences in an unexpected location that includes the
start codon.
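
The classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the free energy of each candidate binding site has already been computed (for example by an INN-HB style model), and the names BindingSite, classify_gene, and the site coordinates are hypothetical. Only the -8.4 kcal/mol cutoff and the "strongest binding includes the start codon" criterion come from the text.

```python
from dataclasses import dataclass

# Cutoff quoted in the abstract for "strong" RS+1 genes (kcal/mol).
STRONG_THRESHOLD = -8.4


@dataclass
class BindingSite:
    """Hypothetical record for one candidate mRNA / 16S rRNA tail pairing."""
    start: int       # 0-based index of the first paired mRNA base
    length: int      # number of paired mRNA bases
    delta_g: float   # assumed precomputed free energy change, kcal/mol


def overlaps_start_codon(site: BindingSite, start_codon_index: int) -> bool:
    """True if the paired region shares at least one base with the start codon."""
    site_end = site.start + site.length      # exclusive end of the site
    codon_end = start_codon_index + 3        # exclusive end of the codon
    return site.start < codon_end and start_codon_index < site_end


def classify_gene(sites: list[BindingSite], start_codon_index: int) -> str:
    """Label a gene by where its strongest (most negative) binding site falls."""
    best = min(sites, key=lambda s: s.delta_g)
    if not overlaps_start_codon(best, start_codon_index):
        return "not RS+1"
    if best.delta_g < STRONG_THRESHOLD:
        return "strong RS+1"
    return "RS+1"


# Illustrative (made-up) sites in a translation initiation region
# whose annotated start codon begins at index 33.
example_sites = [
    BindingSite(start=20, length=8, delta_g=-5.0),
    BindingSite(start=28, length=8, delta_g=-9.1),
]
label = classify_gene(example_sites, start_codon_index=33)
```

A gene labelled "strong RS+1" by such a filter would be a candidate for checking whether its annotated start codon is correct; the sketch does not attempt that follow-up step.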