Bayesian phylogenetics

Erick Matsen


  • What does likelihood mean?
  • What is its interpretation in terms of simulation?

Modern technology gives us the ability to observe in great detail


But very detailed observation is not the same as understanding

To understand we need to simplify and abstract.

What abstractions do we have at our disposal?



\(x\) is useful and we love it dearly!


\(x\) allows us to describe knowledge in an implicit way:


\[ f(x)=y \]


then we can work towards solving for \(x\).
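For a continuous \(f\), one minimal way to work towards \(x\) numerically is bisection: bracket the solution, then repeatedly halve the bracket. The sketch below assumes \(f(x) - y\) changes sign on a starting interval \([lo, hi]\). The function name and the example \(f(x) = x^3 + x\) are illustrative.

```python
def solve_bisection(f, y, lo, hi, tol=1e-12):
    """Find x in [lo, hi] with f(x) = y, assuming f is continuous and f(x) - y changes sign on [lo, hi]."""
    g_lo = f(lo) - y
    if g_lo * (f(hi) - y) > 0:
        raise ValueError("f(x) - y must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = f(mid) - y
        if g_lo * g_mid <= 0:
            hi = mid                  # the solution lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid     # the solution lies in [mid, hi]
    return (lo + hi) / 2

x = solve_bisection(lambda x: x**3 + x, 10.0, 0.0, 5.0)
print(x)  # close to 2.0, since 2**3 + 2 = 10
```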



Alternatively, one might be interested in the average of a function \(g(x)\) between two values \(a\) and \(b\).

Define \(\int_a^b g(x) \, dx\) as the area under \(g\) between \(a\) and \(b\).

Then \(\frac{1}{b-a} \int_a^b g(x) \, dx\) is the average.
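A minimal sketch of this average using the midpoint rule, which is one simple quadrature choice among many; the function name `average` and the default of 1000 subintervals are illustrative.

```python
def average(g, a, b, n=1000):
    """Approximate (1 / (b - a)) * integral_a^b g(x) dx with the midpoint rule on n subintervals."""
    h = (b - a) / n
    area = sum(g(a + (i + 0.5) * h) for i in range(n)) * h  # approximates the area under g
    return area / (b - a)

print(average(lambda x: x**2, 0.0, 3.0))  # exact answer: (1/3) * 9 = 3
```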

Variables allow us to solve

  • Problem 1: given \(y\), solve for \(x\).
  • Problem 2: predict if a 10% bigger charge will hit the castle.
    Say the answer to this is \(\text{hit}_{10}(x)\), such that \(\text{hit}_{10}(x)\) is 1 if that \(x\) will make the cannonball hit the castle, and 0 otherwise.
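A deterministic sketch of both problems, under loudly hypothetical assumptions that do not come from the talk: the shot follows the drag-free range formula \(d = v^2 \sin(2x)/g\) for angle \(x\) and launch speed \(v\), a 10% bigger charge is modeled as a 10% higher launch speed, and the castle spans 1100 to 1200 m.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def distance(theta, speed):
    """Drag-free range of a shot fired at angle theta (radians) with the given launch speed."""
    return speed**2 * math.sin(2 * theta) / G

def solve_angle(y, speed):
    """Problem 1: the low-trajectory angle theta with distance(theta, speed) == y."""
    return 0.5 * math.asin(G * y / speed**2)

def hit_10(theta, speed, castle=(1100.0, 1200.0)):
    """Problem 2: 1 if a 10% faster shot (hypothetical model of a 10% bigger charge) lands in the castle."""
    d = distance(theta, 1.1 * speed)
    return 1 if castle[0] <= d <= castle[1] else 0

speed = 110.0                      # hypothetical launch speed, m/s
theta = solve_angle(950.0, speed)  # angle that put the previous shot at 950 m
print(math.degrees(theta), hit_10(theta, speed))
```

Because range scales with the square of the speed, the 10% faster shot lands 21% farther, at about 1150 m, inside the hypothetical castle.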

Variables allow us to solve these problems, but only in a deterministic framework.


Life is a probabilistic process.



How do we abstract probabilistic quantities?

Random variables \(X\) generalize variables

A random variable doesn’t have a fixed value: we have to “ask” it for a value.

Random variables are capricious,
but they are well defined behind their stochastic exterior.

Random variable sampling is determined by distributions


Sometimes discrete:

\[ \begin{align} P(\text{heads}) & = 0.51 \\ P(\text{tails}) & = 0.49 \end{align} \]
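“Asking” a random variable for a value is what a pseudo-random sampler does. A minimal sketch for the coin above using Python’s `random.choices`; the helper name `ask`, the seed, and the sample size are arbitrary choices.

```python
import random

def ask(rng):
    """'Ask' the coin-flip random variable for a value: heads with probability 0.51, tails with 0.49."""
    return rng.choices(["heads", "tails"], weights=[0.51, 0.49])[0]

rng = random.Random(1)  # seeded so the run is reproducible
flips = [ask(rng) for _ in range(100_000)]
# Individual answers vary, but the frequency of heads settles near 0.51.
print(flips[:5], flips.count("heads") / len(flips))
```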


Sometimes continuous:

Working with random variables \(X\):


We can solve for \(X\) in “equations” like \(f(X) \sim Y\), obtaining expressions such as \(\mathbb P(X \mid Y)\); this is called inference.



We can also average with respect to \(X\):

\[ \int g(X) \, d\mathbb P(X \mid Y) \]

where now we are averaging with respect to a probability distribution.
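When this integral has no closed form, a standard approach is Monte Carlo: draw \(X_1, \dots, X_n\) from the distribution and average \(g\) over the draws. In the sketch below, `sample` is a hypothetical stand-in for whatever produces posterior draws; a standard normal is used only so that the answer, \(\mathbb E[X^2] = 1\), is known.

```python
import random

def mc_average(g, sample, n=100_000):
    """Estimate the integral of g(X) dP(X) by averaging g over n independent draws X ~ P."""
    return sum(g(sample()) for _ in range(n)) / n

rng = random.Random(0)
# Example: X ~ Normal(0, 1) and g(X) = X^2, whose exact expectation is 1.
print(mc_average(lambda x: x * x, lambda: rng.gauss(0.0, 1.0)))
```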

Probabilistic approach to prediction

  • \(Y\): horizontal distance traveled by a cannonball (random variable)
  • \(X\): cannon angle (inferred random variable)
  • Problem 1: given observed distribution \(Y\), infer distribution of \(X\).
  • Problem 2: find the probability that a 10% bigger charge will hit the castle.


  1. Solve \(f(X) = Y\)   to get   \(\mathbb P(X \mid Y)\).
  2. Integrate \(\int \text{hit}_{10}(X) \, d\mathbb P(X \mid Y)\).
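The two steps can be sketched with a grid approximation to \(\mathbb P(X \mid Y)\), reusing a hypothetical drag-free cannon model (range \(v^2 \sin(2X)/g\), with a 10% bigger charge modeled as a 10% higher launch speed). The uniform prior on the angle, the Gaussian measurement noise, the castle position, and the observed distances are all made-up assumptions; a grid is just the simplest way to turn the integral into a weighted sum.

```python
import math

G = 9.81                    # gravitational acceleration, m/s^2
SPEED = 110.0               # hypothetical launch speed, m/s
NOISE_SD = 20.0             # hypothetical measurement noise on distance, m
CASTLE = (1100.0, 1200.0)   # hypothetical castle extent, m

def distance(theta, speed=SPEED):
    return speed**2 * math.sin(2 * theta) / G

def hit_10(theta):
    # Hypothetical: a 10% bigger charge gives a 10% higher launch speed.
    return 1.0 if CASTLE[0] <= distance(theta, 1.1 * SPEED) <= CASTLE[1] else 0.0

def posterior_on_grid(observed, n=2000):
    """Step 1: P(X | Y) on a grid of angles, with a uniform prior on (0, pi/2) and Gaussian noise."""
    thetas = [(i + 0.5) * (math.pi / 2) / n for i in range(n)]
    log_w = [sum(-0.5 * ((y - distance(t)) / NOISE_SD) ** 2 for y in observed) for t in thetas]
    m = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return thetas, [wi / total for wi in w]

observed = [940.0, 955.0, 962.0, 948.0]  # hypothetical measured distances, m
thetas, post = posterior_on_grid(observed)
# Step 2: integrate hit_10 against the posterior, which on a grid is a weighted sum.
p_hit = sum(hit_10(t) * p for t, p in zip(thetas, post))
print(p_hit)
```

Both the low and the high firing angle explain the observed distances, and the posterior keeps both; the hit probability averages over them without our needing to know which one the gunner used.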

Biological experiments are measurements with uncertainty

Model-based statistical inference


We can solve for \(X\) in “equations” like \(f(X) \sim Y\),
inferring an unknown distribution for \(X\)
(what can we learn about the angle of the cannon?).



We can push uncertainty through an analysis using integrals like \[ \int_a^b g(X) \, d\mathbb P(X \mid Y). \] (We don’t really care what the angle of the cannon is; we just want to know the probability that the shot will hit the castle!)

Bayes is magic




\[ P(X \mid D) \propto P(D \mid X) P(X) \]

What is our height divided by the average elevation?
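One way to read this question: since \(P(D) = \sum_X P(D \mid X) P(X)\) is the prior-weighted average likelihood, Bayes’ rule can be written \(P(X \mid D) = P(X) \cdot P(D \mid X) / P(D)\), i.e. our likelihood “height” divided by the average “elevation”. A minimal sketch on a finite set of hypotheses; the coin example is hypothetical.

```python
def posterior(prior, likelihood):
    """Bayes' rule on a finite set of hypotheses: P(X | D) = P(D | X) P(X) / P(D)."""
    unnormalized = {x: likelihood[x] * prior[x] for x in prior}
    evidence = sum(unnormalized.values())  # P(D): the prior-weighted average likelihood
    return {x: u / evidence for x, u in unnormalized.items()}

# Hypothetical example: is a coin fair (P(heads) = 0.5) or biased (P(heads) = 0.8)
# after seeing 3 heads in 3 flips?
prior = {"fair": 0.5, "biased": 0.5}
likelihood = {"fair": 0.5**3, "biased": 0.8**3}
print(posterior(prior, likelihood))  # fair about 0.196, biased about 0.804
```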

Now, what is model-based statistical inference on discrete mathematical objects?


Motivation: we would like to decide whether an individual has been superinfected, i.e., infected with a second viral variant in a separate event.


Integrate out phylogenetic uncertainty


To decide superinfection, we would like to calculate \[ \int_S g(X) \, d\mathbb P(X \mid Y) \] where \(X\) is now a phylogenetic-tree-valued random variable.
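When \(X\) takes finitely many tree values, the integral becomes a sum of \(g\) over trees, weighted by their posterior probabilities. A sketch with three made-up rooted four-taxon trees, assuming all other topologies have negligible posterior probability; the criterion `not_a_clade` is a hypothetical stand-in for a superinfection test, not the method from the talk.

```python
def posterior_expectation(tree_posterior, g):
    """Integral of g(X) dP(X | Y) when X ranges over finitely many trees: a weighted sum."""
    return sum(p * g(tree) for tree, p in tree_posterior.items())

# Hypothetical rooted 4-taxon trees, each encoded by its set of non-trivial clades.
# A1 and A2 are sequences from the individual of interest; B1 and C1 come from other people.
t1 = frozenset({frozenset({"A1", "A2"}), frozenset({"B1", "C1"})})  # ((A1,A2),(B1,C1))
t2 = frozenset({frozenset({"A1", "B1"}), frozenset({"A2", "C1"})})  # ((A1,B1),(A2,C1))
t3 = frozenset({frozenset({"A1", "C1"}), frozenset({"A2", "B1"})})  # ((A1,C1),(A2,B1))
tree_posterior = {t1: 0.7, t2: 0.2, t3: 0.1}  # made-up posterior probabilities

def not_a_clade(tree):
    # Hypothetical criterion g: 1 if the individual's sequences do not form a clade.
    return 1.0 if frozenset({"A1", "A2"}) not in tree else 0.0

print(posterior_expectation(tree_posterior, not_a_clade))  # 0.2 + 0.1, about 0.3
```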

Time to count your blessings.

  • Real numbers are equipped with a total order. (\(3 < 4\))
  • Real numbers are equipped with a simply computed distance that is compatible with the total order. (\(|7-3| = 4\))
  • Real numbers form a continuum. (\(2.9 < 2.95 < 3\))

We can thus define the integrals \(\int_a^b g(x) \, dx\) and \(\int_a^b g(X) \, d\mathbb P(X \mid Y)\) for real-valued \(x\) and \(X\).

Integrating over phylogenetic trees?

Phylogenetic trees have discrete topologies; there is no canonical distance between them, nor a natural total order.


But we still want to do inference and integration in this setting!

Rooted subtree-prune-regraft (rSPR) definition



These trees are then distance 1 apart.

Tree graph connected by rSPR moves
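A minimal sketch of the rSPR neighborhood and the resulting tree graph, assuming rooted binary trees encoded as nested tuples of leaf names (an encoding chosen for illustration; real phylogenetics software uses richer tree structures). An rSPR move cuts off a subtree, suppresses the degree-two node left behind, and reattaches the subtree on any edge of the remainder, including a new edge above the root. All function names are hypothetical.

```python
from collections import deque

# Rooted binary trees as nested 2-tuples of leaf names, e.g. (("a", "b"), "c").

def canon(t):
    """Canonical form, so trees that differ only in child order compare equal."""
    if isinstance(t, str):
        return t
    a, b = canon(t[0]), canon(t[1])
    return (a, b) if repr(a) <= repr(b) else (b, a)

def paths(t, prefix=()):
    """Yield the path (sequence of child indices) to every node, root first."""
    yield prefix
    if not isinstance(t, str):
        for i, child in enumerate(t):
            yield from paths(child, prefix + (i,))

def get(t, path):
    for i in path:
        t = t[i]
    return t

def replace(t, path, new):
    """Return a copy of t with the node at path replaced by new."""
    if not path:
        return new
    children = list(t)
    children[path[0]] = replace(t[path[0]], path[1:], new)
    return tuple(children)

def rspr_neighbors(t):
    """All distinct trees exactly one rSPR move away from t."""
    start, result = canon(t), set()
    for p in paths(t):
        if not p:
            continue                        # the whole tree cannot be pruned
        sub = get(t, p)                     # prune the subtree at p ...
        parent = p[:-1]
        sibling = get(t, parent + (1 - p[-1],))
        rest = replace(t, parent, sibling)  # ... and suppress its now-unary parent
        for q in paths(rest):               # regraft above any node, including the root
            new = canon(replace(rest, q, (get(rest, q), sub)))
            if new != start:
                result.add(new)
    return result

def tree_graph(t):
    """Breadth-first search of the rSPR graph, mapping each reachable tree to its neighbors."""
    graph, queue = {}, deque([canon(t)])
    while queue:
        u = queue.popleft()
        if u in graph:
            continue
        graph[u] = rspr_neighbors(u)
        queue.extend(v for v in graph[u] if v not in graph)
    return graph

print(len(rspr_neighbors((("a", "b"), "c"))))     # 2: the other two rooted 3-leaf trees
print(len(tree_graph(((("a", "b"), "c"), "d"))))  # 15 rooted binary trees on 4 leaves
```

Each tree becomes a node, and two trees are joined by an edge when they are distance 1 apart; the search reaches all 15 rooted four-leaf trees because this graph is connected.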

Metropolis-Hastings algorithm

  • If you jump to a better tree, accept that move
  • If you jump to a worse tree, accept that move with probability equal to the ratio of its posterior probability to that of the current tree (for a symmetric proposal)
  • It’s all arranged so that you sample trees in proportion to their posterior probability
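The rule above is the Metropolis acceptance step: from current state \(x\), accept a proposed \(y\) with probability \(\min(1, \pi(y)/\pi(x))\), which is valid when proposals are symmetric. The sketch runs it on a toy ring of five states standing in for trees, with a hypothetical target chosen so that every state has exactly two neighbors. On real tree space, neighborhood sizes vary between trees, so a uniform-neighbor proposal would need the Hastings correction \(|N(x)|/|N(y)|\) in the ratio.

```python
import random

def metropolis_ring(weights, n_steps, seed=0):
    """Metropolis sampling on a ring of states; weights are unnormalized target probabilities."""
    rng = random.Random(seed)
    k = len(weights)
    state = 0
    counts = [0] * k
    for _ in range(n_steps):
        # Symmetric proposal: step to the left or right neighbor with equal probability.
        proposal = (state + rng.choice((-1, 1))) % k
        ratio = weights[proposal] / weights[state]
        # Better states are always accepted; worse ones with probability equal to the ratio.
        if ratio >= 1 or rng.random() < ratio:
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]

weights = [1, 2, 3, 4, 5]  # hypothetical unnormalized posterior
freqs = metropolis_ring(weights, 200_000)
print([round(f, 3) for f in freqs])  # should approach [w / 15 for w in weights]
```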

Try out MCMC robot!

Markov chain Monte Carlo

Subset to high probability nodes

The top 4096 trees for a data set

The posterior probability of a tree is the probability that the tree is correct, given the data, the model, and the priors

  • Bayesians sample from this posterior
  • If you can deal with a prior, it’s the statistically right thing to do
  • Sometimes we aren’t actually interested in the tree, so we can integrate it out
  • But! Even a short alignment with 100 taxa can take hours
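With MCMC samples in hand, a tree’s posterior probability is estimated by how often it appears in the sample, and integrating the tree out means averaging some other quantity over all samples regardless of their topology. The samples and the rate parameter below are hypothetical.

```python
from collections import Counter

# Hypothetical MCMC output: (topology, substitution rate) pairs, one per retained sample.
samples = [("((A,B),C)", 1.1), ("((A,B),C)", 0.9), ("((A,C),B)", 1.3), ("((A,B),C)", 1.0)]

# Posterior probability of each topology, estimated by its sample frequency.
topology_counts = Counter(topology for topology, _ in samples)
tree_posterior = {t: c / len(samples) for t, c in topology_counts.items()}

# Integrating the tree out: average the rate over all samples, whatever their topology.
mean_rate = sum(rate for _, rate in samples) / len(samples)
print(tree_posterior, mean_rate)  # posteriors 0.75 and 0.25; mean rate about 1.075
```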

Why is random-walk Markov chain Monte Carlo so slow?

The efficiency of MCMC depends on the fraction of good neighbors.
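One way to see this: if each step proposes a neighbor uniformly at random and accepts it with probability \(\min(1, \pi(y)/\pi(x))\), the chance of moving at all is the average of that quantity over the neighbors. When most neighbors are far worse, that average is roughly the fraction of good neighbors. A sketch under these assumptions (equal neighborhood sizes, so no Hastings correction), with made-up posterior values.

```python
def acceptance_probability(current, neighbors):
    """Chance that a uniformly proposed neighbor is accepted: the mean of min(1, pi(y) / pi(x))."""
    return sum(min(1.0, p / current) for p in neighbors) / len(neighbors)

# Hypothetical unnormalized posteriors: the current tree and its 10 neighbors,
# of which only one is at least as good as the current tree.
current = 1.0
neighbors = [1.2] + [1e-6] * 9
print(acceptance_probability(current, neighbors))  # about 0.1, the fraction of good neighbors
```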

Number of good neighbors for 41 sequences (!)




… we are very unlikely to hit another good tree by trying a random neighbor. No wonder random-walk MCMC is so slow.

Whidden & Matsen IV (2015). Quantifying MCMC exploration of phylogenetic tree space. Systematic Biology.

Variational inference is an alternative strategy

MCMC strategy: sample under this curve

Variational inference: fit a distribution \(q_\phi(\mathbf{z})\) to it
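A common reading of “fit” is to choose \(\phi\) to maximize the evidence lower bound \(\mathrm{ELBO}(\phi) = \mathbb E_{q_\phi}[\log \tilde p(\mathbf z)] + H(q_\phi)\), which is equivalent to minimizing \(\mathrm{KL}(q_\phi \,\|\, p)\) and only requires the unnormalized posterior \(\tilde p\). A minimal one-dimensional sketch, assuming a Gaussian family \(q_\phi = N(\mu, \sigma^2)\) with \(\phi = (\mu, \sigma)\); the expectation uses a deterministic quadrature and the optimizer is a plain grid search, both stand-ins for the stochastic-gradient methods used in practice. The function names are hypothetical.

```python
import math

def normal_expectation(f, n=201, width=6.0):
    """Approximate E[f(eps)] for eps ~ N(0, 1) with a midpoint rule on [-width, width]."""
    h = 2 * width / n
    total = 0.0
    for i in range(n):
        eps = -width + (i + 0.5) * h
        total += f(eps) * math.exp(-0.5 * eps * eps) / math.sqrt(2 * math.pi) * h
    return total

def elbo(log_p_tilde, mu, sigma):
    """ELBO for q = N(mu, sigma^2): E_q[log p~(z)] plus the entropy of q."""
    # Reparameterize z = mu + sigma * eps so the expectation is over a fixed N(0, 1).
    expected_log_p = normal_expectation(lambda eps: log_p_tilde(mu + sigma * eps))
    entropy = 0.5 * math.log(2 * math.pi * math.e * sigma**2)
    return expected_log_p + entropy

def fit_gaussian_vi(log_p_tilde, mus, sigmas):
    """Grid search for the (mu, sigma) with the largest ELBO."""
    _, mu, sigma = max((elbo(log_p_tilde, m, s), m, s) for m in mus for s in sigmas)
    return mu, sigma

mus = [0.1 * i for i in range(-20, 61)]   # candidate means from -2 to 6
sigmas = [0.1 * i for i in range(1, 31)]  # candidate standard deviations from 0.1 to 3

def log_p_gauss(z):
    # Sanity-check target: N(2, 0.5^2) up to an additive constant, so the best fit is the target itself.
    return -((z - 2.0) ** 2) / (2 * 0.5**2) + 7.0

print(fit_gaussian_vi(log_p_gauss, mus, sigmas))  # close to (2.0, 0.5)
```

A non-Gaussian target, such as the log of a sum of two Gaussian bumps, can be passed the same way; because this objective is \(\mathrm{KL}(q_\phi \,\|\, p)\), a single Gaussian tends to settle on one peak rather than spread over both.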