| Alignment | ||
|---|---|---|
| Marc Suchard | "A Bayesian perspective on alignment" | November 18, 2009 |
| Ward Wheeler | "Dynamic homology and phylogenetic systematics" | December 7, 2009 |
| Gene-tree species-tree | ||
| Joseph Heled | "The end of lineage sorting: inferring species trees using *BEAST" | January 25, 2010 |
| Noah Rosenberg | "Consistency properties of species tree inference algorithms under the multispecies coalescent." | February 24, 2010 |
| Jens Lagergren | "Probabilistic Analysis of gene families with respect to gene duplication, loss, and transfer." | March 29, 2010 |
| Infectious disease | ||
| Trevor Bedford | "Adaptation and migration in the human influenza virus." The influenza A virus infects approximately 500 million individuals each year. Owing to its RNA makeup, influenza mutates extremely rapidly, allowing the virus population to escape the pull of the human immune system. A single individual may be infected year after year by antigenically novel strains. As a result of this rate of mutation, the timescale of influenza evolution is a human timescale. We get the chance to observe the process of evolution in action. However, the rapid pace of evolution also causes an intrinsic link between evolutionary and ecological dynamics in the virus population. The availability of temporally spaced sequence data allows estimation of details of these dynamics unavailable in other systems. Through analysis of this data, I address open questions regarding patterns of adaptation and the effects of seasonality in the human influenza virus. | April 23, 2010 |
| Philippe Lemey | "Phylogenetic diffusion models and their applications in viral epidemiology" Emerging infectious diseases continue to appear all over the world, and importantly, they have also risen significantly over time. Having the potential to quickly adapt to new hosts and environments, RNA viruses are prime candidates to emerge as global threats to human health. Their rapid rate of evolution, however, also turns viral genomes into valuable resources to reconstruct the spatial and temporal processes that are shaping epidemic or endemic dynamics. In this seminar, I will highlight recent developments in phylogenetic diffusion models that tie together sequence evolution and geographic history in a coherent statistical framework. Both discrete and continuous phylogeographic models have recently been implemented in a Bayesian statistical approach. I will position this approach among other popular phylogeographic methods, and then focus on applications in viral molecular epidemiology to demonstrate their use. Finally, I will hint at future extensions that may provide entirely new opportunities for phylogeographic hypothesis testing. | September 10, 2010 |
| Marco Salemi | "Phylogenetic challenges in the retroviridae branch of the tree of life." The representation of all virus families within a single phylogenetic tree may be a misleading description of their evolutionary history. First, it is unlikely that all viruses originated from a unique common ancestor. Second, viruses (retroviruses in particular) can integrate into the host genome and be transmitted vertically as well as horizontally. Third, different viral genera can evolve according to dramatically different molecular clocks. Three paradigmatic examples from the retroviridae family will be considered here: the simian foamy viruses (SFVs); the primate T-lymphotropic viruses (PTLVs), which include HTLV and STLV; and the primate lentiviruses (PLVs), which include SIV, HIV-1 and HIV-2. SFV is an example of an ancient virus that has been co-evolving with its primate hosts over the last 30 million years. PTLVs emerged around 300 thousand years ago and are characterized by frequent interspecies transmissions and multiple introductions into human populations since prehistoric times. PLVs have a much more recent origin and only within the last 200 years have been able to spread successfully within the human population. The complex relationship between population dynamics and evolutionary time-scale of these retroviruses, as well as the challenge of their integration within the tree of life, will be discussed. | September 20, 2010 |
| Sergei Kosakovsky Pond | "Accurate estimation of evolutionary attributes of coding sequences and evolutionary fingerprinting." Codon substitution models have facilitated the interpretation of evolutionary forces operating on genomes. Most of these models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have different rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation or the adoption of a particular residue exchangeability scale. We present an alternative procedure which assigns substitution rates between amino acid pairs to a few rate classes, dependent on the information content of the alignment. This procedure permits us to infer generalizable models for specific genes, organisms and taxonomic clades. | October 28, 2010 |
| Macroevolution | ||
| Joe Felsenstein | "What poultry breeders and guinea pigs have to tell us about statistical nonmolecular phylogenetics." We are far from having an understanding of the determination of morphological characters at the genome level, so most evolutionary biologists working on them still need to use phenotypic approaches. I will discuss the prospects for using the tools of quantitative genetics, which has faced the same dilemma for the past century. I will use as examples three projects of my own. One, which is joint work with Fred Bookstein, adapts the tools of morphometrics, of which he is a chief developer, to modeling change of morphological forms on phylogenies. The second is a similar project that asks how to best place fossil forms into a phylogeny of present-day species when there is molecular data enabling us to get a good estimate of the phylogeny for those species. The third models discrete 0/1 characters using the Threshold Model developed by Sewall Wright for his work on guinea pigs. All of these lead to asking whether we can connect Brownian Motion models with quantitative genetics models. In all such cases we will have limits on what we can infer, and need to be aware of the need to carry that uncertainty through any subsequent inference using these results. | January 24, 2011 |
| Luke Harmon | "New Frontiers for the Comparative Analysis of Diversification." We're building the tree of life, but what can we do with it? It seems clear that there is a wealth of information about evolution in the structure of this tree. There are some methods that can use phylogenetic trees to test macroevolutionary models, but the range of models that we can test is still severely limited. In some cases, such as the estimation of extinction rates from phylogenetic trees, current methods have proven controversial. We are now beginning to develop and implement methods that use tree-of-life scale data to answer key questions in evolution. I will review three new approaches developed in my lab for analyzing comparative datasets: MECCA, fossil-Medusa, and reversible-jump MCMC. I argue that these methods represent the next generation of comparative methods that will open the door to analyzing a much broader range of models with large datasets. | February 25, 2011 |
| Brian O'Meara | "Making comparative methods as easy as ABC." For decades, biologists have addressed evolutionary and ecological questions using measurements of species traits, phylogenies, and an assortment of comparative methods. Unfortunately, while there is a large assortment of these methods, they are still fairly limited and development of new methods is slow. It took seven years between the introduction of using a simple Brownian motion model for looking at trait evolution (Felsenstein, 1985) and the use of this same model for looking at rates of trait evolution (Garland, 1992), and an additional 14 years to reach more powerful tests using a small modification of the basic model (O'Meara et al., 2006). Still other promising methods are described and even tested but remain unavailable to empiricists because they are not put into software. As a result, the questions empiricists can ask about the world are limited by the research productivity of the few dozen scientists who develop and implement new methods in phylogenetics. We describe a new approach based on Approximate Bayesian Computation and implemented in R that will allow researchers to easily develop their own models for trait evolution without requiring them to have specialized mathematical or computational knowledge. | March 30, 2011 |
| Evolutionary genomics | ||
| Mike Lin | "Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes." The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes -- especially at synonymous sites. We developed a method to systematically locate short regions within known ORFs that show conspicuously low estimated rates of synonymous substitution, based on phylogenetic codon rate models and likelihood ratio tests. We applied this method to genome alignments of 29 placental mammals, resulting in more than 10,000 “synonymous constraint elements” (SCEs) with resolution down to nine-codon windows. These are found within more than a quarter of all human protein-coding genes and contain ~2% of their synonymous sites. We collected numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. We also ruled out certain alternative explanations such as codon usage bias and neutral rate variation. Our initial results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape. Furthermore, anticipating the future availability of additional mammalian and vertebrate genomes, we are currently developing Bayesian codon modeling methods to measure synonymous rates at even higher resolutions, perhaps eventually allowing the detection of individual regulator binding sites embedded in protein-coding ORFs. | April 26, 2011 |
| Adam Siepel | "Bayesian inference of ancient human demography from individual genome sequences." Besides their value for biomedicine, individual genome sequences represent a rich source of information about human evolution. I will describe an effort to estimate key evolutionary parameters from the genome sequences of six individuals from diverse human populations. We have used a Bayesian approach based on coalescent theory to extract information about ancestral population sizes, divergence times, and migration rates from inferred genealogies at many neutrally evolving loci from across the genome. We introduce new methods for accounting for gene flow between populations and integrating over possible phasings of diploid genotypes. I will also describe a custom pipeline for genotype inference to mitigate possible biases from heterogeneous sequencing technologies, coverage levels, and read lengths. Our analysis indicates that the San of Southern Africa diverged from other human populations 108--157 thousand years ago (kya), that Eurasian populations diverged 38--64 kya, and that the effective population size of the ancestors of all modern humans was ~9,000. | May 24, 2011 |
| Jason Stajich | "Fungal phylogenomics: Getting lost in the moldy forest." Fungi occupy diverse ecological niches in roles from nutrient cycling in rainforest floors to aggressive plant and animal pathogens. Molecular phylogenetics has helped resolve many of the branches on the fungal tree of life, enabling studies of evolution across this diverse kingdom. The genome sequences from hundreds of fungi now permit the study of change in genes and gene content in this phylogenetic context and make it possible to connect molecular evolution with adaptation to ecological niches or changes in lifestyles. I will describe our work in studies contrasting pathogenic and non-pathogenic fungi and efforts to unravel the evolution of multicellularity in fungi comparing unicellular basal fungi with multicellular mushrooms and molds. The development of tools for data mining and use of fungal genomics is also driving the pace of molecular biology and genetics of fungi. I will highlight new approaches to make this easier and the ways data integration can inform and transform studies of functional biology of fungi. | June 29, 2011 |
| Beyond IID | ||
| Oscar Westesson (UC Berkeley) | "Accurate reconstruction of insertion-deletion histories by statistical phylogenetics" The "multiple sequence alignment" is a computational artifact. In nature there is no such thing; rather, an alignment represents a partial summary either of indel history, or of structural similarity. Here we show, via evolutionary simulation tests, that all currently-available multiple alignment tools introduce systematic biases into downstream evolutionary analysis - particularly when used to reconstruct histories of insertions and deletions. I will present our unification of Felsenstein's "pruning" algorithm and "progressive alignment" to build a fast, linearly-scaling approximate-maximum-likelihood phylogenetic alignment/reconstruction algorithm. Inference of evolutionary history in this framework displays a clear improvement in accuracy over non-statistical phylogenetic reconstructions and a massive improvement in performance over slow-running MCMC statistical reconstructions. | September 20, 2011 |
| Alexandre Bouchard-Côté (U British Columbia) | "The Poisson Indel Process" The key component of a probabilistic joint approach to tree and alignment inference is a Continuous Time Markov Chain (CTMC) over strings. Ideally, this CTMC should support tractable inference algorithms and should be easily extensible to support a wide range of evolutionary models. The classical string-valued CTMC, the TKF91 model (Thorne et al., 1991), is limited in both of these axes. Previous work has focussed on increasing the complexity of the TKF91 model, making the inference problem computationally more difficult (Miklos et al., 2004). In this work, we present a new stochastic process, the Poisson Indel Process (PIP), which allows simple and practical inference algorithms. Efficient computations are based on an exchangeable representation and on Poisson processes. This representation gives a natural way of extending the capacity of the model while keeping inference computationally practical. We used this process to design a joint Bayesian estimator over alignments and trees. We evaluated both consensus trees and alignments against standard baselines on synthetic and real data. These experiments demonstrate that competitive trees and alignments can be inferred using a Bayesian model equipped with a PIP prior. | October 18, 2011 |
| Software | ||
| Liam Revell and Klaus Schliep (UMass Boston and University of Paris) | "Introduction to phytools and phangorn: phylogenetics tools for R" phytools is a new multifunctional phylogenetics package for the R statistical computing environment. The focus of the package is on methods for phylogenetic comparative biology; however, it also includes tools for simulation, phylogeny input/output, manipulation, and even inference. The phytools library is designed for maximum interoperability with other important R phylogenetics packages such as ape, geiger, and phangorn. phangorn is a package for phylogenetic reconstruction and analysis in the R language. Previously it was only possible to estimate phylogenetic trees with distance methods in R. phangorn now offers the possibility of reconstructing phylogenies with distance-based methods, maximum parsimony or maximum likelihood (ML) and performing Hadamard conjugation. Extending the general ML framework, this package provides the possibility of estimating mixture and partition models. Furthermore, phangorn offers several functions for comparing trees, phylogenetic models or splits, simulating character data and performing congruence analyses. | December 15, 2011 |
| Sergei Kosakovsky Pond (UCSD) | "Introduction to HyPhy: Hypothesis testing using Phylogenies" HyPhy is an open-source software package for the analysis of genetic sequences using techniques in phylogenetics, molecular evolution, and machine learning. It features a complete graphical user interface (GUI) and a rich scripting language for limitless customization of analyses. Additionally, HyPhy features support for parallel computing environments (via message passing interface) and it can be compiled as a shared library and called from other programming environments such as Python or R. | January 25, 2012 |
To watch a recording, simply click on the name of the speaker (and be patient while it starts...).