Identification of functional cryptic recombination signals in the mouse genome by statistical modeling

Lindsay Grey Cowell
Dept. of Biostatistics and Bioinformatics, Duke University
Laboratory of Computational Immunology, Center for Bioinformatics and Computational Biology

Ninety-seven percent of the human genome is non-protein-coding, and 20–30% of the non-coding sequence in eukaryotic genomes is expected to play a role in gene regulation. This suggests that the regulation of gene expression plays a significant role in organismal complexity. At the crux of gene regulation is the cognate binding of DNA by regulatory proteins. Although binding is sequence-specific, alignments of the known binding sites for a protein show that binding-site DNA sequences are often highly variable. Consequently, one of the most challenging problems in molecular and computational biology is identifying binding sites and predicting their level of function. I will briefly discuss the difficulties of modeling an alignment of known binding sites and describe a novel method of model selection developed to model the binding sites (recombination signals, RS) of the V(D)J recombinase. V(D)J recombination is a DNA recombination mechanism that generates the genes encoding antigen receptors (e.g., antibodies). A healthy immune response depends on a highly diverse antigen-receptor repertoire, and genetic mechanisms have arisen that specifically diversify the genes encoding antigen receptors, primarily V(D)J recombination and somatic hypermutation. Our statistical model of the V(D)J recombinase binding sites has identified novel sites of recombination and predicts their role in a previously unidentified mechanism of antigen-receptor gene diversification.
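For readers unfamiliar with what "modeling an alignment of known binding sites" means in practice, the simplest common baseline is a log-odds position weight matrix (PWM) that treats every position as independent. The sketch below is that generic baseline, not the model-selection method described in this talk. The toy sequences, the function names `build_pwm` and `score`, the pseudocount of 0.5 and the uniform background of 0.25 are all hypothetical choices made for illustration; the sequences are not real recombination-signal data.

```python
import math

# Hypothetical toy alignment (illustrative only, NOT real recombination signal data).
ALIGNMENT = [
    "CACAGTG",
    "CACAGTG",
    "CACTGTG",
    "CACAGCG",
    "CATAGTG",
]
BASES = "ACGT"


def build_pwm(seqs, pseudocount=0.5, background=0.25):
    """Build a log-odds position weight matrix, assuming positions are independent."""
    length = len(seqs[0])
    pwm = []
    for i in range(length):
        column = [s[i] for s in seqs]
        # Pseudocounts keep unseen bases from getting a log(0) score.
        total = len(column) + pseudocount * len(BASES)
        scores = {}
        for b in BASES:
            freq = (column.count(b) + pseudocount) / total
            scores[b] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds scores; higher means more site-like."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the model length")
    return sum(col[base] for col, base in zip(pwm, seq))


pwm = build_pwm(ALIGNMENT)
print(round(score(pwm, "CACAGTG"), 2), round(score(pwm, "GTCAGTG"), 2))
```

Because a PWM ignores any dependencies between positions, it can misjudge variable sites whose positions co-vary, which is one reason richer models and a principled way of choosing among them are of interest.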