What is its interpretation in terms of simulation?
Modern technology gives us the ability to observe in great detail.
But very detailed observation is not the same as understanding.
To understand, we need to simplify and abstract.
What abstractions do we have at our disposal?
\(x\) is useful and we love it dearly!
\(x\) allows us to describe knowledge in an implicit way: if we know that
\[
f(x)=y
\]
then we can work towards solving for \(x\).
Alternatively, one might be interested in taking the average of a function \(g(x)\) between two values \(a\) and \(b\).
Define \(\int_a^b g(x) \, dx\) as the area under \(g\) between \(a\) and \(b\).
Then \(\frac{1}{b-a} \int_a^b g(x) \, dx\) is the average of \(g\) on \([a, b]\).
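As a minimal sketch of the area and average defined above, the integral can be approximated numerically. The midpoint rule, the helper names `integrate` and `average`, and the example function \(g(x) = x^2\) are all illustrative assumptions, not part of the original slides.

```python
def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g from a to b."""
    width = (b - a) / n
    return sum(g(a + (i + 0.5) * width) for i in range(n)) * width


def average(g, a, b):
    """Average value of g on [a, b]: the area divided by the interval length."""
    return integrate(g, a, b) / (b - a)


def g(x):
    # Hypothetical example function, chosen because its integral is easy to check.
    return x * x


area = integrate(g, 0.0, 3.0)  # exact area is 9
mean = average(g, 0.0, 3.0)    # exact average is 3
print(area, mean)
```

The printed values should be close to the exact area 9 and the exact average 3, up to the discretization error of the midpoint rule.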
Variables allow us to solve
Problem 1: given \(y\), solve for \(x\).
Problem 2: predict if a 10% bigger charge will hit the castle. Say the answer to this is \(\text{hit}_{10}(x)\), such that \(\text{hit}_{10}(x)\) is 1 if that \(x\) will make the cannonball hit the castle, and 0 otherwise.
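The two problems can be sketched in code under some loud assumptions that are not in the slides: \(x\) is the launch angle in radians, \(f(x)\) is the range of a drag-free shot, and a "10% bigger charge" is modeled as a 10% higher muzzle speed. All constants below are hypothetical.

```python
import math

G = 9.81           # gravitational acceleration, m/s^2
SPEED = 100.0      # assumed muzzle speed, m/s (hypothetical)
CASTLE = 800.0     # assumed distance to the castle, m (hypothetical)
HALF_WIDTH = 10.0  # assumed half-width of the castle, m (hypothetical)


def f(x, speed=SPEED):
    """Range of a drag-free shot fired at angle x (radians)."""
    return speed ** 2 * math.sin(2 * x) / G


def solve_for_x(y, lo=0.0, hi=math.pi / 4, tol=1e-10):
    """Problem 1: bisection for f(x) = y on [0, pi/4], where f is increasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hit_10(x):
    """Problem 2: 1 if a shot at angle x with 10% more speed lands on the castle, else 0."""
    return 1 if abs(f(x, speed=1.1 * SPEED) - CASTLE) <= HALF_WIDTH else 0


x = solve_for_x(CASTLE)
print(x, f(x), hit_10(x))
```

Under these assumptions, the angle that hits the castle with the original charge overshoots with the bigger one, so `hit_10(x)` is 0 for that particular \(x\).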
Variables allow us to solve
… in a deterministic framework.
Life is a probabilistic process.
How do we abstract probabilistic quantities?
Random variables \(X\) abstract probabilistic quantities, just as variables \(x\) abstract deterministic ones.
A random variable doesn’t have a fixed value: we have to “ask” it for a value.
Random variables are capricious, but they are well defined behind their stochastic exterior.
Sampling from a random variable is determined by its distribution.
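A minimal sketch of "asking" a random variable for a value. The `RandomVariable` class and the normal distribution for the cannon angle are illustrative assumptions; only Python's standard `random` module is used.

```python
import random


class RandomVariable:
    """A quantity with no fixed value; each call to sample() 'asks' it for one."""

    def __init__(self, sampler):
        self._sampler = sampler

    def sample(self):
        return self._sampler()


rng = random.Random(0)  # seeded so the sketch is reproducible

# Hypothetical example: a cannon angle in degrees, normally distributed around 45.
X = RandomVariable(lambda: rng.gauss(45.0, 2.0))

draws = [X.sample() for _ in range(10_000)]
mean = sum(draws) / len(draws)
print(draws[:3], mean)
```

Individual draws differ from one another, but their long-run behavior (here, the sample mean near 45) is fixed by the distribution.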
Biological experiments are measurements with uncertainty
Model-based statistical inference ✓
We can solve for \(X\) in “equations” like \(f(X) \sim Y\), inferring an unknown distribution for \(X\) (what can we learn about the angle of the cannon?).
We can push uncertainty through an analysis using integrals like \[
\int_a^b g(X) \, d\mathbb P(X \mid Y).
\] (we don’t care what the angle of the cannon is really, we just want to know with what probability the shot is going to hit the castle!)
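One way to approximate such an integral is a Monte Carlo average: draw values of \(X\) from \(\mathbb P(X \mid Y)\) and average \(g\) over them. The sketch below assumes, purely for illustration, that the posterior for the angle is a normal distribution and that \(g\) is a hit indicator for a drag-free shot; every constant is hypothetical.

```python
import math
import random

G = 9.81           # gravitational acceleration, m/s^2
SPEED = 100.0      # assumed muzzle speed, m/s (hypothetical)
CASTLE = 800.0     # assumed distance to the castle, m (hypothetical)
HALF_WIDTH = 10.0  # assumed half-width of the castle, m (hypothetical)


def g(x):
    """Indicator: 1 if a drag-free shot at angle x (radians) lands on the castle."""
    landing = SPEED ** 2 * math.sin(2 * x) / G
    return 1 if abs(landing - CASTLE) <= HALF_WIDTH else 0


rng = random.Random(1)  # seeded so the sketch is reproducible


def sample_posterior():
    # Stand-in for a draw from P(X | Y); a real analysis would infer this from data.
    return rng.gauss(0.45, 0.01)


n = 20_000
p_hit = sum(g(sample_posterior()) for _ in range(n)) / n
print(p_hit)
```

The angle itself never appears in the answer: it is averaged away, and only the probability of a hit remains.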
What is our height divided by the average elevation?
Now, what is model-based statistical inference on discrete mathematical objects?
Motivation: we would like to decide whether an individual has been superinfected, i.e. infected with a second viral variant in a separate event.
Integrate out phylogenetic uncertainty
To decide superinfection, we would like to calculate \[
\int_S g(X) \, d\mathbb P(X \mid Y)
\] where \(X\) is now a phylogenetic-tree-valued random variable.
Time to count your blessings.
Real numbers are equipped with a total order. (\(3 < 4\))
Real numbers are equipped with a simply-computed distance that is compatible with the total order. (\(\,|7-3| = 4\,\))
Real numbers form a continuum. (\(2.9 < 2.95 < 3\))
We can thus define the integrals \(\int_a^b g(x) \, dx\) and \(\int_a^b g(X) \, d\mathbb P(X \mid Y)\) for real-valued \(x\) and \(X\).
Integrating over phylogenetic trees?
Phylogenetic trees have discrete topologies; there is neither a canonical distance between them nor a natural total order.
But we still want to do inference and integration in this setting!
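One observation that helps: a sample average needs neither an order nor a distance. If we can somehow obtain trees drawn from \(\mathbb P(X \mid Y)\), then averaging \(g\) over them approximates the integral. The sketch below is only an illustration under that assumption: trees are nested tuples, the sampled trees are made up, and the indicator \(g\) (do two samples from one individual group together?) is a hypothetical stand-in, not the actual superinfection criterion.

```python
def leaves(tree):
    """Set of leaf names below a node; a leaf is a string, an internal node a tuple."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def has_clade(tree, group):
    """True if some node of the tree has exactly `group` as its leaf set."""
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(has_clade(child, group) for child in tree)


def g(tree):
    """Hypothetical indicator: 1 if samples a1 and a2 group together, else 0."""
    return 1 if has_clade(tree, frozenset({"a1", "a2"})) else 0


# Made-up stand-ins for trees drawn from P(X | Y); not real data.
posterior_sample = [
    (("a1", "a2"), ("b", "c")),
    (("a1", "a2"), ("b", "c")),
    ((("a1", "a2"), "b"), "c"),
    (("a1", "b"), ("a2", "c")),
]

estimate = sum(g(t) for t in posterior_sample) / len(posterior_sample)
print(estimate)
```

Nothing in this average compares two trees or measures how far apart they are; it only evaluates \(g\) on each sampled tree.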