# Review

• What does likelihood mean?
• What is its interpretation in terms of simulation?
Modern technology gives us the ability to observe in great detail.

But very detailed observation is not the same as understanding.

To understand, we need to simplify and abstract.

# $x$ is useful and we love it dearly!

$x$ allows us to describe knowledge in an implicit way: if we know that

$f(x)=y$

then we can work towards solving for $x$.

Alternatively, one might be interested in taking the average of a function $g(x)$ between two values $a$ and $b$.
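Both ideas can be sketched numerically. This is my own illustration, not from the slides: the particular $f$, $g$, and interval below are made-up examples.

```python
# My own illustration (f, g, and the interval are made-up examples):
# solving f(x) = y numerically, and averaging g(x) between a and b.

def solve(f, y, lo, hi, tol=1e-10):
    """Bisection: solve f(x) = y on [lo, hi], assuming f is increasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def average(g, a, b, n=100_000):
    """Average of g over [a, b]: (1/(b-a)) times the integral, via a midpoint sum."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h / (b - a)

x = solve(lambda t: t**2, y=2.0, lo=0.0, hi=2.0)   # x close to sqrt(2)
avg = average(lambda t: t**2, a=0.0, b=1.0)        # close to 1/3
```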

# Variables allow us to solve

• Problem 1: given $y$, solve for $x$.
• Problem 2: predict if a 10% bigger charge will hit the castle.
Say the answer to this is $\text{hit}_{10}(x)$, such that $\text{hit}_{10}(x)$ is 1 if that $x$ will make the cannonball hit the castle, and 0 otherwise.
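A deterministic sketch of both problems, as my own illustration: the muzzle speed, castle interval, and the no-drag range formula $v^2 \sin(2\theta)/g$ are all assumptions, not numbers from the slides.

```python
import math

# Deterministic sketch of both problems (my own illustration: the muzzle
# speed, castle interval, and no-drag range formula are all assumptions).
G, V = 9.8, 50.0
CASTLE = (240.0, 280.0)                 # castle spans this interval, m

def distance(theta, v=V):
    """Range of a projectile fired at angle theta: v^2 sin(2 theta) / g."""
    return v**2 * math.sin(2 * theta) / G

def solve_angle(y, lo=0.0, hi=math.pi / 4, tol=1e-10):
    """Problem 1: given y, solve distance(theta) = y (monotone on [0, pi/4])."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if distance(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def hit10(theta):
    """Problem 2: 1 if a 10% bigger charge (~10% more speed) hits, else 0."""
    return 1 if CASTLE[0] <= distance(theta, 1.1 * V) <= CASTLE[1] else 0

theta = solve_angle(220.0)   # the angle that lands the ball 220 m away
```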

# Variables allow us to solve

… in a deterministic framework.

Life is a probabilistic process.

How do we abstract probabilistic quantities?

# Random variables $X$ abstract variables

A random variable doesn’t have a fixed value: we have to “ask” it for a value by sampling.

Random variables are capricious,
but they are well defined behind their stochastic exterior.

# Random variable sampling is determined by distributions

Sometimes discrete:

\begin{align} P(\text{heads}) & = 0.51 \\ P(\text{tails}) & = 0.49 \end{align}

Sometimes continuous:
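The slides’ continuous example is not reproduced here; a standard stand-in (my example, not from the slides) is the standard normal density, where probability attaches to intervals rather than individual values:

$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad \mathbb P(a \le X \le b) = \int_a^b p(x)\,dx.$$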

# Working with random variables $X$:

We can solve for $X$ in “equations” like $f(X) \sim Y$, obtaining expressions such as $\mathbb P(X \mid Y)$; this is called inference.

We can also average with respect to $X$:

$\int g(X) \, d\mathbb P(X \mid Y)$

where now we are averaging out with respect to a probability.
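Such an average can be estimated by simulation. A minimal sketch, using a toy posterior of my own choosing (the Normal distribution below is an assumption, not part of the slides):

```python
import random

# Sketch (my own toy example): if we can draw samples x_i ~ P(X | Y), then
# the integral of g(X) dP(X | Y) is approximated by the average (1/N) sum g(x_i).
random.seed(0)

def monte_carlo_average(g, sampler, n=200_000):
    return sum(g(sampler()) for _ in range(n)) / n

# Toy posterior: pretend P(X | Y) is Normal(1, 0.5^2); then E[X^2] = 1.25.
estimate = monte_carlo_average(lambda x: x**2, lambda: random.gauss(1.0, 0.5))
```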

# Probabilistic approach to prediction

• $Y$: horizontal distance traveled by a cannonball (random variable)
• $X$: cannon angle (inferred random variable)
• Problem 1: given observed distribution $Y$, infer distribution of $X$.
• Problem 2: find probability that a 10% bigger charge will hit castle.

1. Solve $f(X) = Y$   to get   $\mathbb P(X \mid Y)$.
2. Integrate $\int \text{hit}_{10}(X) \, d\mathbb P(X \mid Y)$.
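The two steps above can be sketched end to end. This is my own illustration: the muzzle speed, castle interval, observations, noise level, and no-drag range formula are all assumptions, and the posterior is approximated on a grid rather than computed exactly.

```python
import math

# Sketch of the two steps above (my own illustration: muzzle speed, castle
# interval, observations, and noise level are all assumed; drag is ignored).
G, V = 9.8, 50.0
CASTLE = (240.0, 280.0)                    # castle spans this interval, m

def distance(theta, v=V):
    """Projectile range: v^2 sin(2 theta) / g."""
    return v**2 * math.sin(2 * theta) / G

def hit10(theta):
    """1 if a 10% bigger charge (~10% more muzzle speed) hits the castle."""
    return 1.0 if CASTLE[0] <= distance(theta, 1.1 * V) <= CASTLE[1] else 0.0

observed = [221.0, 218.5, 220.9]           # noisy distance observations Y
sigma = 2.0                                # assumed observation noise, m

# Step 1: grid approximation to P(theta | Y), flat prior, Gaussian likelihood.
grid = [i * (math.pi / 4) / 1000 for i in range(1, 1001)]
weights = [math.exp(-sum((y - distance(t))**2 for y in observed) / (2 * sigma**2))
           for t in grid]
total = sum(weights)
posterior = [w / total for w in weights]

# Step 2: integrate hit10 against the posterior to get the hit probability.
p_hit = sum(hit10(t) * p for t, p in zip(grid, posterior))
```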

# Model-based statistical inference ✓

We can solve for $X$ in “equations” like $f(X) \sim Y$,
inferring an unknown distribution for $X$
(what can we learn about the angle of the cannon).

We can push uncertainty through an analysis using integrals like $\int_a^b g(X) \, d\mathbb P(X \mid Y)$ (we don’t care what the angle of the cannon really is; we just want to know with what probability the shot is going to hit the castle!).

# Bayes is magic

$P(X \mid D) \propto P(D \mid X) P(X)$
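A tiny numeric instance of this proportionality, with made-up numbers of my own: two hypotheses for a coin's heads probability, a uniform prior, and 60 heads in 100 flips as data.

```python
from math import comb

# Tiny numeric illustration (numbers made up): P(X | D) is proportional to
# P(D | X) P(X). Two hypotheses X for a coin, fair (p = 0.50) or slightly
# heads-biased (p = 0.51), with a uniform prior; data D = 60 heads in 100 flips.
prior = {"fair": 0.5, "biased": 0.5}
p_heads = {"fair": 0.50, "biased": 0.51}
heads, n = 60, 100

def likelihood(x):
    p = p_heads[x]
    return comb(n, heads) * p**heads * (1 - p)**(n - heads)

unnorm = {x: likelihood(x) * prior[x] for x in prior}   # P(D | X) P(X)
z = sum(unnorm.values())                                # normalizing constant
posterior = {x: u / z for x, u in unnorm.items()}       # P(X | D)
```

Normalizing by $z$ is what turns the proportionality into an equality.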

# Integrate out phylogenetic uncertainty

To decide whether superinfection has occurred, we would like to calculate $\int_S g(X) \, d\mathbb P(X \mid Y)$, where $X$ is now a phylogenetic-tree-valued random variable.

# Time to count your blessings.

• Real numbers are equipped with a total order. ($3 < 4$)

• Real numbers are equipped with a simply-computed distance that is compatible with the total order. ($\,|7-3| = 4\,$)

• Real numbers form a continuum. ($2.9 < 2.95 < 3$)

# We can thus define the integral

for a real-valued variable: $\int_a^b g(x) \, dx$, and likewise $\int_a^b g(X) \, d\mathbb P(X \mid Y)$ for a real-valued random variable $X$.

# Integrating over phylogenetic trees?

Phylogenetic trees have discrete topologies: there is no canonical distance between them, nor a natural total order.

But we still want to do inference and integration in this setting!

# Subtree-prune-regraft (rSPR) definition

Two trees related by a single rSPR move are then distance 1 apart.

# Metropolis-Hastings algorithm

• If you jump to a worse tree, accept that move with a non-zero probability
• It’s all arranged so that you sample trees in proportion to their posterior probability

Try out MCMC robot!
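The mechanics of the accept/reject rule can be shown on a toy state space. This is my own sketch, not the phylogenetics implementation: integer states stand in for trees, "neighbor" moves stand in for rSPR moves, and the target is a made-up unnormalized posterior.

```python
import random
from collections import Counter

# Minimal Metropolis-Hastings sketch (my own toy, not the phylogenetics code):
# integer states 0..4 stand in for trees, "neighbor" moves (state +/- 1) stand
# in for rSPR moves, and `target` is a made-up unnormalized posterior.
random.seed(1)
target = [1.0, 2.0, 4.0, 2.0, 1.0]

def neighbors(s):
    return [t for t in (s - 1, s + 1) if 0 <= t < len(target)]

def mh(steps=200_000):
    state, counts = 0, Counter()
    for _ in range(steps):
        nbrs = neighbors(state)
        proposal = random.choice(nbrs)
        # Posterior ratio times the Hastings correction for unequal neighbor counts.
        accept = (target[proposal] / target[state]) * (len(nbrs) / len(neighbors(proposal)))
        if random.random() < accept:   # a worse state is accepted with prob < 1
            state = proposal
        counts[state] += 1
    return counts

counts = mh()
# Visit frequencies approach target / sum(target) = [0.1, 0.2, 0.4, 0.2, 0.1].
```

This is exactly the "arrangement" in the second bullet: the chain visits states in proportion to their (unnormalized) posterior probability.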

# The posterior probability of a tree is the probability that the tree is correct (given the data, model, and priors)

• Bayesians sample from this posterior
• If you can deal with a prior, it’s the statistically right thing to do
• Sometimes we aren’t actually interested in the tree, so we can integrate it out
• But! Short alignment, 100 taxa = hours

# Why is random-walk Markov chain Monte Carlo so slow?

The efficiency of MCMC depends on the fraction of good neighbors.

# Number of good neighbors for 41 sequences (!)

… we are very unlikely to hit another good tree by randomly trying a neighbor. No wonder random walk MCMC is so slow.

Whidden & Matsen IV. (2015). Quantifying MCMC exploration of phylogenetic tree space. Systematic Biology.

# Variational inference is an alternate strategy

MCMC strategy: sample under this curve