
Ian Holmes

Evolutionary genomics; post-transcriptional regulation; bioinformatics algorithms
Bioengineering

Research Description

Our lab is interested in the structure and evolution of genomes. By studying similarities and differences between genomic and post-genomic data from related species, we seek to understand the mechanisms by which DNA evolves, and in particular the signature interplay between mutation and selection that distinguishes various regions of the genome (including e.g. protein-coding genes; RNA genes; promoters, zipcodes and other transcriptional/translational regulatory elements; transposons, pseudogenes & other "junk" DNA).

Using this insight, we can build better tools for annotating and interrogating genomic data. In doing this, we borrow techniques from computer science (e.g. Bayesian machine learning, natural language processing, graphical models) as well as statistical physics, probability theory and molecular biology. We work closely with experimental biologists, and endeavour to test our predictions in the wet-lab.

Current projects include:

Post-transcriptional regulation of RNA. The central dogma of molecular biology states that "DNA makes RNA makes protein". Regulatory mechanisms can act at both stages: that is, both before and after the DNA is transcribed into RNA. We are investigating the latter, post-transcriptional regulation, which includes attenuation, suppression, localisation, degradation, alternative splicing, incorporation of nonstandard amino acids and other interesting biology. Typically this regulation is guided by cis-acting signals that recruit regulatory proteins, ncRNAs or RNA-protein complexes. These signals often use RNA secondary structure. We are focused on identifying and characterising such signals by comparative genomics (using tools imported from computational linguistics, such as stochastic grammars) and in collaboration with wet-lab biologists. Model systems of particular interest include localisation elements ("zipcodes") in the fruitfly Drosophila melanogaster and packaging/regulatory elements in viruses.

Probabilistic evolution and multiple sequence alignment. Multiple alignment algorithms typically assume some kind of phylogenetic relationship between the sequences being aligned. Underlying this relationship is an assumed evolutionary process whereby a sequence experiences random (albeit biased) mutation events of various kinds, most typically insertions/deletions and substitutions. We are making these implicit assumptions rigorous by developing probabilistic models of sequence evolution, explicitly parameterised in terms of the underlying substitution and indel rates. Using probabilistic algorithms, such as Expectation Maximisation and Markov Chain Monte Carlo, we are developing a fully probabilistic theory of multiple alignment based on evolutionary models. We are applying these techniques to build improved software tools for multiple alignment, profiling and phylogeny, and using these tools to measure mutation rates in genes and genomes accurately.
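
As a rough illustration of what "explicitly parameterised in terms of substitution rates" can mean, the sketch below uses the classic Jukes-Cantor (JC69) substitution model, chosen here only because it has a closed form. It is not the lab's own model, it ignores indels entirely, and it assumes the two sequences are already aligned without gaps. All function names are hypothetical.

```python
import math


def jc69_prob(x, y, t, rate=1.0):
    """Probability that base x has become base y after time t under JC69.

    `rate` is the expected number of substitutions per site per unit time
    (an assumption of this sketch; only the product rate * t matters).
    """
    decay = math.exp(-4.0 * rate * t / 3.0)
    if x == y:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def pair_log_likelihood(seq_a, seq_b, t, rate=1.0):
    """Log-likelihood of an ungapped pairwise alignment at divergence time t.

    Each column contributes pi(a) * P(b | a, t), with uniform base
    frequencies pi = 1/4.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        math.log(0.25 * jc69_prob(a, b, t, rate))
        for a, b in zip(seq_a, seq_b)
    )


def jc69_distance(seq_a, seq_b):
    """Maximum-likelihood estimate of rate * t (substitutions per site)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = mismatches / len(seq_a)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 estimate")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    a, b = "ACGTACGTAC", "ACGTACGTAA"
    d = jc69_distance(a, b)
    print("estimated substitutions per site:", d)
    print("log-likelihood at that distance:", pair_log_likelihood(a, b, d))
```

A full evolutionary alignment model would also have to assign probabilities to insertions and deletions and sum over possible alignments, which this sketch deliberately leaves out.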

Comparative genomics and transcriptional regulation. The advent of post-genomic technologies for gene expression and proteomics, such as DNA microarrays and improved mass spectrometry, offers new ways to address fundamental questions of mammalian development and differentiation at the molecular genetics level. We are involved in collaborations with genome sequencing centres to explore the DNA promoter signals that regulate the expression of genes in various phyletic groups (mammals, fruitflies, nematodes) with a particular interest in the evolutionary context of such regulatory networks.

Representative Publications

Pairwise RNA structure comparison using stochastic context-free grammars. Proc. Pacific Symposium on Biocomputing (2002)

Evolutionary HMMs: A Bayesian approach to multiple alignment. Bioinformatics 17(9) (2001), 803-820.

An Expectation Maximization algorithm for training hidden substitution models. Journal of Molecular Biology 317(5) (2002), 757-768.

