Welcome to the phylogenetics module of Genome Sciences 541.
After this module, I hope you will be able to:
- recognize situations when evolutionary thinking is important
- understand basic features of evolutionary trees
- describe the various types of tree inference, and when each is useful
- understand likelihood-based tree inference
- understand the choices one makes when performing likelihood-based tree inference
- understand potential pitfalls of tree inference methods
Prerequisites
Software prerequisites
- seaview
- Anaconda. If you have an existing installation, excellent; otherwise I recommend Miniconda.
- I suggest that you use Python 3.7. With conda this looks like:
conda create --name 541 python=3.7; conda activate 541
- Install ETE and friends with
conda install -c etetoolkit ete3
Day 1: Phylogenetics motivation and intro
Day 2: Phylogenetics methods
Homework 1
- Infer a phylogenetic tree from measles data using seaview. Write a little Python script to find the longest branch (a.k.a. edge) in the tree, and draw a version of that tree such that the longest branch is colored red. (ETE hints: Node style, tree traversal, and inline tree rendering if you are using a Jupyter notebook.)
- Make a scatter plot of this measles tree with one point for each branch of the tree: make the x axis the length of the branch and the y axis the number of descendants of that branch (in ETE, len(n) gives the number of leaves descending from a node n).
- Imagine that instead of 4 DNA bases, we have just two bases, named 0 and 1. Follow through the development of the transition probabilities starting on Lewis’ slide 59 to obtain a likelihood function for the corresponding model in terms of branch length nu (written ν in the slides) given two sequences that have 20 identical sites and 4 differing ones. Plot the logarithm of the likelihood function for a range of branch length values containing the maximum likelihood branch length. (Note that this part of the assignment does not involve a proper tree, just two sequences evolving from one to the other.)
- Submit both the script and the output, or a Jupyter notebook that has been run from scratch (“Restart & Run All”) and exported to PDF.
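As a rough sketch of the shape the two-state likelihood exercise can take, the snippet below evaluates a log likelihood on a grid of branch lengths. It rests on assumptions that should be checked against the slides: that ν is the expected number of substitutions per site, that the two-state analogue of the slide's transition probabilities is P(same) = 1/2 + 1/2 exp(-2ν) and P(different) = 1/2 - 1/2 exp(-2ν), and that the starting state is uniform on 0/1. The names log_likelihood, grid, and values are illustrative only.

```python
import math

N_SAME, N_DIFF = 20, 4  # identical and differing sites from the exercise


def log_likelihood(nu, n_same=N_SAME, n_diff=N_DIFF):
    """Log likelihood of branch length nu under an assumed symmetric 0/1 model.

    Assumption: nu is the expected number of substitutions per site, and the
    starting state is drawn uniformly, contributing a factor 1/2 per site.
    """
    p_same = 0.5 + 0.5 * math.exp(-2.0 * nu)
    p_diff = 0.5 - 0.5 * math.exp(-2.0 * nu)
    return n_same * math.log(0.5 * p_same) + n_diff * math.log(0.5 * p_diff)


# Evaluate on a grid of branch lengths; nu must be positive so p_diff > 0.
grid = [0.01 * k for k in range(1, 101)]
values = [log_likelihood(nu) for nu in grid]
best_nu = grid[values.index(max(values))]

# Under the same assumptions the maximum can be found in closed form by
# setting p_diff equal to the observed fraction of differing sites.
p_hat = N_DIFF / (N_SAME + N_DIFF)
closed_form = -0.5 * math.log(1.0 - 2.0 * p_hat)  # about 0.203

print(best_nu, closed_form)
```

The lists grid and values can be handed directly to a plotting library, for example matplotlib's plt.plot(grid, values), to produce the requested curve. If the slides parameterize the branch length differently, only the two probability lines need to change.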
Day 3: Sequence alignment, recombination and trees as data structures
Day 4: Further topics
Homework 2
Do the following in a script, either submitting both the script and the output, or a Jupyter notebook that has been run from scratch (“Restart & Run All”) and exported to PDF. PDF is best, but if you encounter problems with the PDF export you may submit (in order of preference) HTML or the .ipynb file.
- Simulate sequence evolution down the measles tree using the 0/1 model from homework 1, starting with a uniform draw for the root state and returning one “column” of sequence data at a time (i.e. a single 0/1 value for each tip). (Python hints: I used numpy’s binomial with n=1 and stored values on the tree using add_feature.) Display the tree with an example set of simulated tip states from running your simulator.
- Implement the Fitch algorithm to calculate parsimony scores on your simulated data. I found it useful while debugging my version to annotate the inferences on my tree with
n.add_face(TextFace(str(fitch_cost)), column=0, position = "branch-top")
with another annotation for fitch_state.
- Simulate 1000 times on the measles tree and run the Fitch algorithm on each of these. Make a plot such that each simulated data set is a single point, with the x axis representing the number of simulated mutations, and the y axis representing the parsimony score. What do you notice?
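The overall flow of the steps above can be sketched on a toy tree using only the standard library. Everything here is a stand-in: the Node class is a hypothetical minimal substitute for an ETE node, the four-tip tree and its branch lengths are made up, the flip probability 1/2 - 1/2 exp(-2ν) assumes the same parameterization of the 0/1 model as in homework 1, and "number of simulated mutations" is taken to mean the number of branches on which the state changed. The Fitch pass assumes a strictly bifurcating tree.

```python
import math
import random


class Node:
    """Hypothetical minimal tree node (an ETE node plays this role in practice)."""

    def __init__(self, name, dist=0.0, children=()):
        self.name = name
        self.dist = dist              # length of the branch above this node
        self.children = list(children)
        self.state = None             # simulated 0/1 state


def simulate(node, rng, parent_state=None):
    """Simulate one column down the tree; return the number of branches that flipped."""
    if parent_state is None:
        node.state = rng.randint(0, 1)  # uniform draw at the root
        changes = 0
    else:
        # Assumed 0/1 model: probability that the two ends of a branch differ.
        p_flip = 0.5 - 0.5 * math.exp(-2.0 * node.dist)
        flipped = rng.random() < p_flip
        node.state = 1 - parent_state if flipped else parent_state
        changes = int(flipped)
    for child in node.children:
        changes += simulate(child, rng, node.state)
    return changes


def fitch(node):
    """Bottom-up Fitch pass: return (state set, parsimony cost) for this subtree."""
    if not node.children:
        return {node.state}, 0        # only tip states are treated as observed
    (left, left_cost), (right, right_cost) = (fitch(c) for c in node.children)
    cost = left_cost + right_cost
    common = left & right
    if common:
        return common, cost           # children agree: no extra change needed
    return left | right, cost + 1     # children disagree: count one change


# Made-up four-tip tree, equivalent to ((A:0.1,B:0.2):0.3,(C:0.4,D:0.1):0.2);
tree = Node("root", children=[
    Node("AB", 0.3, [Node("A", 0.1), Node("B", 0.2)]),
    Node("CD", 0.2, [Node("C", 0.4), Node("D", 0.1)]),
])

rng = random.Random(541)              # fixed seed so reruns are reproducible
results = []
for _ in range(1000):
    n_changes = simulate(tree, rng)
    _, score = fitch(tree)
    results.append((n_changes, score))
```

Each entry of results is an (x, y) pair for the requested scatter plot. With ETE, the same recursion can be written over node.children and node.dist, storing states with add_feature as in the hint.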
Books
Introductory lecture series
Software
- FastTree – approximate ML
- RAxML, PhyML, and IQ-TREE – somewhat less approximate ML
- BEAST – Bayesian, inferring event times and rooted trees
- MrBayes and RevBayes – Bayesian, inferring unrooted trees
- DataMonkey, your one-stop online shop for selection and recombination analysis