next on phyloseminar.org
Using invariants for coalescent-based phylogenetic inference
The advent of rapid and inexpensive sequencing technologies has necessitated the development of computationally efficient methods for analyzing sequence data for many genes simultaneously in a phylogenetic framework. The coalescent process is the most commonly used model for linking the underlying genealogies of individual genes with the global species-level phylogeny, but inference under the coalescent model is computationally daunting in the typical inference frameworks (e.g., the likelihood and Bayesian frameworks) due to the dimensionality of the space of both gene trees and species trees. By viewing the data arising under the phylogenetic coalescent model as a collection of site patterns, the algebraic structure associated with the probability distribution on the site patterns can be used to develop computationally efficient methods for inference via phylogenetic invariants.
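To make the notion of a site pattern concrete, here is a minimal sketch that tabulates the columns of a small four-taxon alignment. The function name `site_pattern_counts` and the toy sequences are illustrative assumptions, not part of the speaker's software.

```python
from collections import Counter

def site_pattern_counts(seqs):
    """Count each alignment column (site pattern) across aligned sequences."""
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*seqs) walks the alignment column by column; each column is
    # joined into a string such as "CCTT" (one character per taxon).
    return Counter("".join(col) for col in zip(*seqs))

# Hypothetical toy alignment: four taxa, six sites.
alignment = ["ACGTAC", "ACGTAA", "ATGTAC", "ATGCAC"]
counts = site_pattern_counts(alignment)
for pattern, n in sorted(counts.items()):
    print(pattern, n)
```

Dividing each count by the number of sites gives the empirical site pattern frequencies, which are the quantities that invariant-based methods evaluate.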
In this talk, I will discuss three problems that can be addressed using invariants. First, I will describe how identifiability results for four-taxon species trees based on site pattern probabilities can be used to build a quartet-based inference algorithm for trees of arbitrary size. Second, I will discuss methods for rooting phylogenetic species trees inferred under the coalescent model. Finally, I will describe the use of invariants to detect species that arose via hybridization. I will demonstrate the methods on several phylogenomic-scale datasets. The methods are derived in a fully model-based framework: the coalescent process models the relationship between gene trees and the species tree, and standard nucleotide substitution models (GTR+I+G and all submodels) describe sequence-level evolution. They are therefore promising approaches for computationally efficient, model-based inference for the large-scale sequence data available today.
Developing a statistically powerful measure for quartet tree inference using phylogenetic and Markov invariants
Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both approaches give rise to polynomial functions of sequence site patterns that, in expectation, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well-understood transformation properties (in the case of Markov invariants).
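As a small illustration of a polynomial that vanishes on one tree but not on others, the sketch below uses a standard rank condition on "flattened" site pattern probabilities. It assumes binary characters, a two-state symmetric substitution model, and a fixed quartet tree with split 12|34; the function names and edge parameters are hypothetical, and this is not claimed to be the method presented in the talk.

```python
import numpy as np

def cfn(p):
    """Two-state symmetric transition matrix with change probability p."""
    return np.array([[1 - p, p], [p, 1 - p]])

def quartet_pattern_probs(p1, p2, p3, p4, p_int):
    """Site pattern probabilities P[a, b, c, d] on the quartet tree 12|34.

    Taxa 1 and 2 attach to internal node u, taxa 3 and 4 to internal node v,
    and u and v are joined by the internal edge.
    """
    root = np.array([0.5, 0.5])  # uniform state distribution at node u
    return np.einsum("u,ua,ub,uv,vc,vd->abcd",
                     root, cfn(p1), cfn(p2), cfn(p_int), cfn(p3), cfn(p4))

def flattening(P, split):
    """4x4 matrix: rows indexed by one taxon pair, columns by the other."""
    axes = {"12|34": (0, 1, 2, 3), "13|24": (0, 2, 1, 3), "14|23": (0, 3, 1, 2)}
    return P.transpose(axes[split]).reshape(4, 4)

# Arbitrary illustrative change probabilities on the five edges.
P = quartet_pattern_probs(0.05, 0.10, 0.15, 0.20, 0.10)

splits = ("12|34", "13|24", "14|23")
ranks = {s: int(np.linalg.matrix_rank(flattening(P, s), tol=1e-10)) for s in splits}
dets = {s: float(np.linalg.det(flattening(P, s))) for s in splits}
for s in splits:
    print(s, "rank:", ranks[s], "det:", dets[s])
```

With these parameters the flattening for the generating split 12|34 has rank 2, so its determinant (a degree-four polynomial in the site pattern probabilities) is zero up to floating-point error, while the flattenings for the other two splits have full rank. With empirical frequencies in place of exact probabilities, the determinant would only be approximately zero, which is why the statistical behaviour of such measures matters.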
While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, and to what extent they can be used as practical tools for inference of phylogenetic trees. By focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two different polynomial-based approaches within a common framework.
We present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our new proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference.