Evolutionary genomics; post-transcriptional regulation; bioinformatics
algorithms
Bioengineering
Our lab is interested in the structure and evolution of genomes.
By studying similarities and differences between genomic and post-genomic
data from related species, we seek to understand the mechanisms by
which DNA evolves, and in particular the signature interplay between
mutation and selection that distinguishes various regions of the genome
(e.g. protein-coding genes; RNA genes; promoters, zipcodes
and other transcriptional/translational regulatory elements; transposons,
pseudogenes & other "junk" DNA).
Using this insight, we can build better tools for annotating and
interrogating genomic data. In doing this, we borrow techniques from
computer science (e.g. Bayesian machine learning, natural language
processing, graphical models) as well as statistical physics, probability
theory and molecular biology. We work closely with experimental biologists,
and endeavour to test our predictions in the wet-lab.
Current projects include:
Post-transcriptional regulation of RNA. The central dogma of molecular
biology states that "DNA makes RNA makes protein". Regulatory mechanisms
can act at both stages: that is, both before and after the DNA
is transcribed into RNA. We are
investigating the latter, post-transcriptional regulation, which includes
attenuation, suppression, localisation, degradation, alternative splicing,
incorporation of nonstandard amino acids and other interesting biology.
Typically this regulation is guided by cis-acting signals that recruit
regulatory proteins, ncRNAs or RNA-protein complexes. These signals
often involve RNA secondary structure. We focus on identifying and
characterising such signals by comparative genomics (using tools
imported from computational linguistics, such as stochastic grammars)
and in collaboration with wet-lab biologists. Model systems
of particular interest include localisation elements ("zipcodes")
in the fruitfly Drosophila melanogaster and packaging/regulatory elements
in viruses.
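To illustrate how a stochastic grammar scores RNA secondary structure, here is a minimal sketch of the inside algorithm for a toy three-rule stochastic context-free grammar. The grammar, its rule probabilities and the emission tables are illustrative assumptions, not the lab's models; a realistic grammar would also enforce a minimum hairpin-loop length and model base-pair stacking. Because this toy grammar is unambiguous, summing over parses is the same as summing over secondary structures.

```python
from itertools import product

BASES = "ACGU"

# Toy grammar with hypothetical parameters:
#   S -> a S        unpaired base a
#   S -> a S b S    base pair (a, b) enclosing an inner substructure
#   S -> (empty)    end of the structure
# Expected number of S children per step is 0.6*1 + 0.15*2 = 0.9 < 1,
# so derivations terminate with probability 1.
P_SINGLE, P_PAIR, P_END = 0.6, 0.15, 0.25

# Uniform emission for unpaired bases
SINGLE_EMIT = {b: 0.25 for b in BASES}


def _pair_emissions():
    """Emission probabilities for base pairs; the 16 entries sum to 1."""
    table = {}
    for a, b in product(BASES, repeat=2):
        if a + b in ("AU", "UA", "GC", "CG"):
            table[(a, b)] = 0.2   # Watson-Crick pairs
        elif a + b in ("GU", "UG"):
            table[(a, b)] = 0.05  # G-U wobble pairs
        else:
            table[(a, b)] = 0.01  # non-canonical pairs
    return table


PAIR_EMIT = _pair_emissions()


def inside(seq):
    """P(seq | grammar), summed over every structure the grammar allows."""
    n = len(seq)
    # s[i][j]: probability that S derives the half-open span seq[i:j]
    s = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        s[i][i] = P_END
    # Fill shorter spans first; every rule refers only to shorter spans
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # S -> a S: seq[i] is unpaired
            total = P_SINGLE * SINGLE_EMIT[seq[i]] * s[i + 1][j]
            # S -> a S b S: seq[i] pairs with seq[k]
            for k in range(i + 1, j):
                total += (P_PAIR * PAIR_EMIT[(seq[i], seq[k])]
                          * s[i + 1][k] * s[k + 1][j])
            s[i][j] = total
    return s[0][n]
```

For example, `inside("GC")` is larger than `inside("GA")` because the grammar gives a canonical G-C pair more probability than a G-A mismatch.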
Probabilistic evolution and multiple sequence alignment. Multiple
alignment algorithms typically assume some kind of phylogenetic relationship
between the sequences being aligned. Underlying this relationship
is an assumed evolutionary process whereby a sequence experiences
random (albeit biased) mutation events of various kinds, most typically
insertions/deletions and substitutions. We are making these implicit
assumptions rigorous by developing probabilistic evolutionary models
for sequence evolution, explicitly parameterised in terms of the underlying
substitution and indel rates. Using probabilistic algorithms, such
as Expectation Maximisation and Markov Chain Monte Carlo, we are developing
a fully probabilistic theory of multiple alignment based on evolutionary
models. We are applying these techniques to build improved software
tools for multiple alignment, profiling and phylogeny, and using these
tools to measure mutation rates in genes and genomes accurately.
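As a minimal sketch of a model "explicitly parameterised in terms of substitution rates", the simplest case is the Jukes-Cantor (JC69) model, in which every base substitutes to every other at the same rate. It gives closed-form transition probabilities and a closed-form maximum-likelihood distance for an ungapped pairwise alignment. JC69 is a stand-in chosen for brevity: it has no indel process, and the richer models described above (with indel rates, fitted by EM or MCMC) are not shown. The function names are hypothetical.

```python
import math


def jc69_probs(distance):
    """JC69 transition probabilities for a branch of the given length,
    measured in expected substitutions per site.

    Returns (p_same, p_diff): the probability of ending at the same base,
    and of ending at one particular different base."""
    decay = math.exp(-4.0 * distance / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


def log_likelihood(x, y, distance):
    """Log-likelihood of an ungapped alignment of x and y under JC69."""
    p_same, p_diff = jc69_probs(distance)
    ll = 0.0
    for a, b in zip(x, y):
        # Stationary frequency 1/4 for the first base, then substitution
        ll += math.log(0.25 * (p_same if a == b else p_diff))
    return ll


def jc69_distance(x, y):
    """Closed-form maximum-likelihood estimate of substitutions per site."""
    mismatches = sum(a != b for a, b in zip(x, y))
    p = mismatches / len(x)
    if p >= 0.75:
        return math.inf  # saturated: no finite estimate
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The closed-form estimate sits at the peak of `log_likelihood`, which can be checked numerically by evaluating the likelihood on either side of it. Models without a closed form, such as those with hidden states or indels, need iterative estimation of the kind EM provides.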
Comparative genomics and transcriptional regulation. The advent of
post-genomic technologies for gene expression and proteomics, such
as DNA microarrays and improved mass spectrometry, offers new ways
to address fundamental questions of mammalian development and differentiation
at the molecular genetics level. We are involved in collaborations
with genome sequencing centres to explore the DNA promoter signals
that regulate the expression of genes in various phyletic groups (mammals,
fruitflies, nematodes) with a particular interest in the evolutionary
context of such regulatory networks.
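One standard way to represent a promoter signal such as a transcription-factor binding site is a position weight matrix (PWM). The sketch below builds a log-odds PWM from a handful of aligned sites and scans a sequence for high-scoring windows. It is a generic technique, not necessarily the one used in these collaborations; the pseudocount, the uniform background and the example sites are all illustrative assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds (base 2) position weight matrix from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(sites) + 4 * pseudocount
        scores = {}
        for b in BASES:
            # Pseudocounts keep unseen bases from scoring minus infinity
            freq = (column.count(b) + pseudocount) / total
            scores[b] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def scan(seq, pwm, threshold):
    """Return (start, score) for every window scoring at least `threshold`."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy example: hypothetical aligned TATA-like sites
SITES = ["TATAAA", "TATAAT", "TATATA", "TATAAA"]
PWM = build_pwm(SITES)
HITS = scan("GGCGTATAAAGGC", PWM, threshold=0.0)
```

In a comparative setting, the same scan could be run over orthologous promoters from several species, and hits that are conserved across them are candidate functional sites.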
Pairwise RNA structure comparison using stochastic context-free grammars. Proc. Pacific Symposium on Biocomputing (2002).
Evolutionary HMMs: A Bayesian approach to multiple alignment. Bioinformatics 17(9) (2001), 803-820.
An Expectation Maximization algorithm for training hidden substitution models. Journal of Molecular Biology 317(5) (2002), 757-768.