Presenter: Joshua D. Starmer
Advisor(s): Donald Bitzer, Mladen Vouk
Author(s): Joshua Starmer, Anne Stomp, Mladen Vouk, Donald Bitzer
Graduate Program: Bioinformatics, Forestry, Computer Science, Computer Science

Title: Predicting Shine-Dalgarno sequence locations in 18 prokaryote transcriptomes exposes systematic annotation errors

Abstract: We implemented the Individual Nearest Neighbor Hydrogen Bond (INN-HB) model for oligo-oligo hybridization and created a new metric, Relative Spacing (RS), to identify both the location and the magnitude of Shine-Dalgarno (SD) sequences by simulating the binding between mRNAs and single-stranded 16S rRNA 3' tails. In 18 prokaryote genomes we observed 2420 genes in which the strongest binding in the translation initiation region (TIR) includes the start codon, rather than lying at the expected location 5 to 10 bases upstream. We designated these as RS+1 genes. Analysis revealed an unusual start codon bias: the majority of the RS+1 genes used GUG, not AUG. We further characterized the 624 RS+1 genes whose SD sequence was associated with a free energy change below -8.4 kcal/mol (strong RS+1 genes) and determined that, for 384 of them, the most likely explanation for the unexpected SD location is mis-annotation of the start codon. The RS metric thus provides an improved method for gene sequence annotation. Many of the remaining strong RS+1 genes appear to be bona fide, with their SD sequences in an unexpected location that includes the start codon.
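
The filtering steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that the strongest binding site of each gene has already been computed elsewhere (for example by an INN-HB hybridization model) and that an RS+1 gene is one whose RS value equals +1. The `Gene` record, its field names, and the helper functions are hypothetical. Only the -8.4 kcal/mol threshold comes from the abstract.

```python
from collections import Counter
from dataclasses import dataclass

# Threshold for "strong" RS+1 genes, taken from the abstract (kcal/mol).
STRONG_DG_KCAL_PER_MOL = -8.4


@dataclass
class Gene:
    """Hypothetical record; field names are assumptions for illustration."""
    name: str
    rs: int            # RS value of the strongest binding site in the TIR
    delta_g: float     # free energy change of that binding, in kcal/mol
    start_codon: str   # annotated start codon, e.g. "AUG" or "GUG"


def is_rs_plus_1(gene: Gene) -> bool:
    # Assumption: "RS+1" means the strongest binding has an RS value of +1.
    return gene.rs == 1


def is_strong_rs_plus_1(gene: Gene) -> bool:
    # Strong RS+1 genes bind with a free energy change below the threshold.
    return is_rs_plus_1(gene) and gene.delta_g < STRONG_DG_KCAL_PER_MOL


def start_codon_counts(genes: list[Gene]) -> Counter:
    # Tally annotated start codons among RS+1 genes to expose any bias.
    return Counter(g.start_codon for g in genes if is_rs_plus_1(g))
```

Genes flagged by `is_strong_rs_plus_1` would be the candidates to check for start codon mis-annotation, while `start_codon_counts` gives the tally needed to compare GUG against AUG usage.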