next on phyloseminar.org
Advances in computational Bayesian methods and their use in large-scale single-cell tree reconstruction
I will describe a Bayesian method for reconstructing single-cell phylogenetic trees from copy-number events, such as those that arise in cancers with high genomic instability. The method is motivated by low-depth, genome-wide data, which can now be obtained for increasingly large numbers of cells thanks to technologies such as Direct Library Preparation (DLP) or 10x Single Cell Genomics.
Computing the posterior distribution in this model at scale is challenging. I will describe how recent advances in Bayesian computational statistics can be used to parallelize posterior inference across an arbitrary number of cores, touching on topics such as non-reversible methods and change-of-measure approaches.
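One well-known non-reversible method is non-reversible parallel tempering: several chains target tempered versions of the posterior, and swaps between neighbouring chains follow a deterministic even/odd schedule instead of being proposed at random. The sketch below assumes that technique on a toy one-dimensional bimodal target. The model, the annealing parameters, the step size, and all function names are illustrative assumptions, not Blang's API or the speaker's exact algorithm, and the change-of-measure ideas are not shown.

```python
import math
import random


def log_prior(x):
    # Reference distribution (beta = 0): Normal(0, 5^2), up to a constant.
    return -0.5 * (x / 5.0) ** 2


def log_likelihood(x):
    # Toy bimodal likelihood: two narrow Gaussian bumps at -3 and +3,
    # combined with a numerically stable log-sum-exp.
    a = -0.5 * ((x - 3.0) / 0.3) ** 2
    b = -0.5 * ((x + 3.0) / 0.3) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_uniform(rng):
    # log(U) with U in (0, 1], which avoids log(0).
    return math.log(1.0 - rng.random())


def local_move(x, beta, rng, step=0.5):
    # One random-walk Metropolis step targeting prior(x) * likelihood(x) ** beta.
    y = x + rng.gauss(0.0, step)
    log_ratio = (log_prior(y) + beta * log_likelihood(y)) - (
        log_prior(x) + beta * log_likelihood(x)
    )
    return y if log_uniform(rng) < log_ratio else x


def nonreversible_pt(betas, n_iters, seed=0):
    """Return the trajectory of the beta = 1 (posterior) chain; betas must increase."""
    rng = random.Random(seed)
    states = [0.0] * len(betas)
    cold_samples = []
    for t in range(n_iters):
        # Local exploration: the chains are independent here, so this is
        # the step that could be run on separate cores.
        states = [local_move(x, b, rng) for x, b in zip(states, betas)]
        # Deterministic even/odd swap schedule (the non-reversible part):
        # even iterations try pairs (0,1), (2,3), ...; odd ones (1,2), (3,4), ...
        for i in range(t % 2, len(betas) - 1, 2):
            log_accept = (betas[i] - betas[i + 1]) * (
                log_likelihood(states[i + 1]) - log_likelihood(states[i])
            )
            if log_uniform(rng) < log_accept:
                states[i], states[i + 1] = states[i + 1], states[i]
        cold_samples.append(states[-1])
    return cold_samples


betas = [i / 7 for i in range(8)]  # 8 chains from the reference (0) to the posterior (1)
samples = nonreversible_pt(betas, 20000)
print(sum(s > 0 for s in samples) / len(samples))  # share of time spent in the +3 mode
```

The local moves do not depend on one another, which is what allows them to be spread across cores; only the cheap swap step couples the chains.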
The posterior inference methods described are available through an open-source Bayesian modelling language called Blang, which can be used for a range of phylogenetic problems, including more traditional phylogenetic models, as well as for other Bayesian analysis problems. The motivating copy-number-based phylogenetic model is implemented in Blang and available in a library for Bayesian cancer phylogenetics and population genetics that we are actively developing. This library has been used to infer phylogenetic trees on more than 4,000 cells using more than 60 cores.
Reconstructing probabilistic trees of cellular differentiation from single-cell RNA-seq data
Recent advances in single-cell methods have shown concretely how individual cell profiles can carry the imprint of ephemeral or dynamic processes. However, synthesizing this information to reconstruct dynamic biological phenomena poses a computational and statistical challenge: the data are noisy, heterogeneous, and sparse, and the underlying processes may unfold asynchronously.
We develop a full generative model, together with an inference procedure, for reconstructing a dynamic process (cellular differentiation) from many static snapshots (single-cell RNA-seq profiles) with calibrated uncertainties. Specifically, we define cell state as the latent parameterization of a distribution over gene-expression space, and model these latent vectors as arising from bifurcating, self-reinforcing paths along a probabilistic tree. This required designing a new class of Bayesian tree models for data that arise from a latent branching spectrum.
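To make the idea of latent cell states on a branching structure concrete, here is a minimal generative sketch. It assumes a fixed, hand-specified tree in a 2-D latent space and independent Poisson counts per gene; the branch names, dimensions, noise level, weights, and likelihood are all hypothetical choices. It does not capture the self-reinforcing paths or the tree inference of the model described in the talk.

```python
import math
import random

# Hypothetical toy tree in a 2-D latent space: a trunk that bifurcates
# into two fates. Each branch is a straight segment (start, end).
BRANCHES = {
    "trunk": ((0.0, 0.0), (1.0, 0.0)),
    "fate_A": ((1.0, 0.0), (2.0, 1.0)),
    "fate_B": ((1.0, 0.0), (2.0, -1.0)),
}


def poisson(lam, rng):
    # Knuth's multiplication method; adequate for the modest rates used here.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_cell(weights, rng, noise=0.1):
    """Draw one cell: a latent state on the tree, then gene-expression counts."""
    name = rng.choice(list(BRANCHES))
    (x0, y0), (x1, y1) = BRANCHES[name]
    t = rng.random()  # position along the chosen branch
    z = (
        x0 + t * (x1 - x0) + rng.gauss(0.0, noise),
        y0 + t * (y1 - y0) + rng.gauss(0.0, noise),
    )
    # The latent vector z parameterizes a distribution over expression space:
    # here, independent Poisson counts with log-rate w . z for each gene w.
    counts = [poisson(math.exp(w[0] * z[0] + w[1] * z[1]), rng) for w in weights]
    return name, z, counts


rng = random.Random(0)
weights = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(5)]  # 5 genes
cells = [sample_cell(weights, rng) for _ in range(200)]
```

Inference runs this process in reverse: given only the count vectors, it recovers the latent positions and the branching structure, with uncertainty.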
In this talk, I explore how our model fills a gap in the existing literature on probabilistic trees, and what an explicit generative model buys us when reconstructing trajectories to understand cell-fate decisions during differentiation.
Cellular ‘phylogenetics’: decoding the developmental history and relationships among individual cells
Multicellular organisms develop by way of a lineage tree: a series of cell divisions that give rise to cell types, tissues, and organs. This pattern mirrors the evolutionary relationships between species, yet our knowledge of the cell lineage and its determinants remains extremely fragmentary for nearly all species. This includes all vertebrates, as well as arthropods such as Drosophila, in which cell lineage varies between individuals. Embryos and organs are often visually inaccessible, and progenitor cells disperse by long-distance migration. We recently pioneered a new paradigm for recording cell lineage and other aspects of developmental history that has the potential to enhance our understanding of vertebrate biology. In brief, we engineer cells to stochastically introduce mutations at specific locations in the genome during development. The resulting patterns of mutations, which can be efficiently read out by massively parallel sequencing, can be used to reconstruct lineage using methods adapted from phylogenetics. We demonstrate our technique by tracing the lineage of tens of thousands of cells within individual zebrafish and Drosophila, relating the lineages of numerous emerging tissue and organ systems.
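A minimal simulation of the recording idea, assuming synchronous binary divisions, irreversible edits, and that every edit is unique (no two lineages acquire the same mutation independently). Under those assumptions, the set of cells sharing an edit is exactly the set of descendants of the cell in which the edit arose, so shared edits define clades that are either nested or disjoint. The function names and parameters are hypothetical, and this compatibility check is a simplification for illustration, not necessarily the reconstruction method used in the work described; real data call for methods that tolerate repeated or lost edits.

```python
import itertools
import random


def simulate_lineage(n_generations, n_sites, edit_prob, seed=0):
    """Simulate synchronous binary divisions; return the final cells' barcodes.

    Each barcode maps site index -> edit id; unedited sites are absent.
    Assumption: every edit gets a globally unique id, and an edited site
    is never edited again.
    """
    rng = random.Random(seed)
    next_id = itertools.count()
    cells = [{}]  # the zygote starts with an unedited barcode
    for _generation in range(n_generations):
        daughters = []
        for barcode in cells:
            for _daughter in range(2):
                child = dict(barcode)  # daughters inherit all parental edits
                for site in range(n_sites):
                    if site not in child and rng.random() < edit_prob:
                        child[site] = next(next_id)
                daughters.append(child)
        cells = daughters
    return cells


def clades_from_edits(barcodes):
    """Group cells by shared edit; each group is a candidate clade."""
    groups = {}
    for cell, barcode in enumerate(barcodes):
        for edit in barcode.values():
            groups.setdefault(edit, set()).add(cell)
    return [frozenset(g) for g in groups.values()]


def is_tree_compatible(clades):
    # Any two clades of a rooted tree are either nested or disjoint.
    for a, b in itertools.combinations(clades, 2):
        if a & b and not (a <= b or b <= a):
            return False
    return True


barcodes = simulate_lineage(n_generations=6, n_sites=10, edit_prob=0.1)
clades = clades_from_edits(barcodes)
print(len(barcodes), len(clades), is_tree_compatible(clades))
```

Nesting the compatible clades by inclusion yields a rooted tree over the sampled cells; once edits can recur or be lost, that guarantee disappears, which is where parsimony- or likelihood-based phylogenetic methods come in.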