*Welcome to the phylogenetics module of Genome Sciences 541.*

After this course, I hope you will be able to

- recognize situations when evolutionary thinking is important
- understand basic features of evolutionary trees
- be familiar with the various types of tree inference, and when they are useful
- understand likelihood-based tree inference
- understand the choices one makes when performing likelihood-based tree inference
- understand potential pitfalls of tree inference methods

- seaview
- Anaconda. If you have an existing installation, excellent. Otherwise, I recommend Miniconda.
- I suggest that you use Python 3.7. With conda, this looks like
`conda create --name 541 python=3.7; conda activate 541`

- Install ETE and friends with
`conda install -c etetoolkit ete3`

- Lecture: Phylogenetics motivation
- Perform sequence alignment on sample data using seaview
- Lecture: Phylogenetics methods intro
- Try using various algorithms to build trees with seaview; then try clicking “Full, Swap, Re-root, and Subtree”

- Lecture: introduction to likelihood-based phylogenetics (video)
- Lecture: Phylogenetic confidence measures
- Investigate a mysterious data set using aLRT and bootstrapping

- Infer a phylogenetic tree from measles data using seaview. Write a little Python script to find the longest branch (a.k.a. edge) in the tree, and draw a version of that tree such that the longest branch is colored red. (ETE hints: Node style, tree traversal, and inline tree rendering if you are using a Jupyter notebook.)
- Make a scatter plot of this measles tree with one point for each branch of the tree: make the x axis the length of the branch and the y axis the number of tips descending from that branch (`len(n)` gives the number of leaves below node `n` in ETE).
- Imagine that instead of 4 DNA bases, we have just two bases, named 0 and 1. Follow through the development of the transition probabilities starting on Lewis’ slide 59 to obtain a likelihood function for the corresponding model in terms of branch length nu (written ν in the slides), given two sequences that have 20 identical sites and 4 differing ones. Plot the logarithm of the likelihood function for a range of branch length values containing the maximum likelihood branch length. (Note that this part of the assignment does not involve a proper tree, just two sequences evolving from one to the other.)
- Submit both the script and the output, or a Jupyter notebook that has been run from scratch (“Restart & Run All”) and exported to PDF.
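The core logic of the first two homework items is small. The sketch below relies only on features that ETE's `TreeNode` provides: `traverse()`, `is_root()`, `dist` (the length of the branch above a node), and `len(n)` (the number of leaves below it). So that the sketch runs without ETE installed, it defines a hypothetical minimal `ToyNode` class and a made-up toy tree instead of loading the measles tree.

```python
class ToyNode:
    """Hypothetical minimal stand-in for ete3's TreeNode (only what this sketch uses)."""

    def __init__(self, name="", dist=0.0, children=None):
        self.name = name
        self.dist = dist  # length of the branch leading to this node, as in ETE
        self.children = children or []
        self.up = None
        for child in self.children:
            child.up = self

    def is_root(self):
        return self.up is None

    def is_leaf(self):
        return not self.children

    def traverse(self):
        """Yield this node and all of its descendants (preorder)."""
        yield self
        for child in self.children:
            yield from child.traverse()

    def __len__(self):
        """Number of leaves below this node, mirroring len(node) in ETE."""
        return sum(1 for n in self.traverse() if n.is_leaf())


def longest_branch(tree):
    """Return the node whose parent branch is longest (the root has no parent branch)."""
    return max((n for n in tree.traverse() if not n.is_root()), key=lambda n: n.dist)


def branch_points(tree):
    """One (branch length, number of descendant leaves) pair per branch, for a scatter plot."""
    return [(n.dist, len(n)) for n in tree.traverse() if not n.is_root()]


# Hypothetical toy tree ((A:0.1,B:0.2):0.05,C:0.7); this is not the measles data.
toy = ToyNode(children=[
    ToyNode(dist=0.05, children=[ToyNode("A", 0.1), ToyNode("B", 0.2)]),
    ToyNode("C", 0.7),
])
```

On a real ETE tree `t`, the following should color the longest branch red: `ns = NodeStyle(); ns["hz_line_color"] = "red"; longest_branch(t).set_style(ns)`. This works because ETE draws the branch above a node as that node's horizontal line. Calling `t.render("%%inline")` then displays the tree in Jupyter. For the scatter plot, `plt.scatter(*zip(*branch_points(t)))` (with `matplotlib.pyplot` imported as `plt`) puts branch length on the x axis and leaf count on the y axis.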
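For the 0/1 item, here is a minimal sketch. It assumes a symmetric two-state model in which each state switches to the other at rate β. Following the same steps as the Jukes-Cantor development, the probability of ending in the same state after time t is then 1/2 + 1/2·e^(-2βt). The sketch also assumes ν = βt is the expected number of substitutions per site (by analogy with the slides) and that the first sequence is drawn from the uniform stationary distribution. Under these assumptions, the likelihood for 20 identical and 4 differing sites is L(ν) = (1/2)^24 · P_same(ν)^20 · P_diff(ν)^4. Setting P_diff equal to the observed fraction 4/24 gives a closed-form maximum, ν̂ = -(1/2) ln(2/3) ≈ 0.203, which the plotting grid below brackets.

```python
import math


def log_likelihood(nu, n_same=20, n_diff=4):
    """Log-likelihood of branch length nu for two aligned 0/1 sequences.

    Assumes a symmetric two-state model where nu is the expected number of
    substitutions per site and the starting sequence is drawn from the uniform
    stationary distribution (probability 1/2 for each state).
    """
    p_same = 0.5 + 0.5 * math.exp(-2.0 * nu)  # P(same state at both ends of the branch)
    p_diff = 0.5 - 0.5 * math.exp(-2.0 * nu)  # P(different states at the two ends)
    n_sites = n_same + n_diff
    return (n_sites * math.log(0.5)
            + n_same * math.log(p_same)
            + n_diff * math.log(p_diff))


def mle_nu(n_same=20, n_diff=4):
    """Closed-form maximizer: solve p_diff(nu) = observed fraction of differing sites.

    Only valid when fewer than half of the sites differ.
    """
    frac_diff = n_diff / (n_same + n_diff)
    return -0.5 * math.log(1.0 - 2.0 * frac_diff)


def plot_log_likelihood(nu_max=1.0, n_points=200):
    """Plot log L(nu) on a grid over (0, nu_max] and mark the MLE (requires matplotlib)."""
    import matplotlib.pyplot as plt

    # Start just above 0, because log(p_diff) diverges at nu = 0.
    grid = [nu_max * (i + 1) / n_points for i in range(n_points)]
    plt.plot(grid, [log_likelihood(nu) for nu in grid])
    plt.axvline(mle_nu(), linestyle="--")
    plt.xlabel("branch length nu")
    plt.ylabel("log-likelihood")
    plt.show()
```

Calling `plot_log_likelihood()` draws the curve. A curve that peaks at the dashed line is a useful check that a hand-derived likelihood function agrees with the closed form.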

- Lecture: Tree exploration
- Lecture: Overview of sequence substitution models (video)
- Lecture: Sequence alignment
- Perform sequence alignment of some HIV gag sequences
- Lecture: Trees and recombination
- Test for recombination using GARD

- Lecture: Bayesian methods
- Interleaved with: Lewis slides on Bayesian inference (video)
- Play with MCMC Robot

Do the following in a script, either submitting both the script and the output, or a Jupyter notebook that has been run from scratch (“Restart & Run All”) and exported to PDF. PDF is best, but if you encounter problems with the PDF export, you may submit (in order of preference) HTML or the `.ipynb` file.

- Simulate sequence evolution down the measles tree using the 0/1 model from homework 1, starting with a uniform draw for the root state and returning one “column” of sequence data at a time (i.e., a single 0/1 value for each tip). (Python hints: I used NumPy’s `binomial` with `n=1` and stored values on the tree using `add_feature`.) Display the tree with an example set of simulated tip states from running your simulator.
- Implement the Fitch algorithm to calculate parsimony scores on your simulated data. While debugging my version, I found it useful to annotate the inferences on my tree with `n.add_face(TextFace(str(fitch_cost)), column=0, position="branch-top")`, plus another annotation for `fitch_state`.
- Simulate 1000 times on the measles tree and run the Fitch algorithm on each of these. Make a plot such that each simulated data set is a single point, with the x axis representing the number of simulated mutations and the y axis representing the parsimony score. What do you notice?
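The simulation item can be sketched as follows. The sketch assumes the symmetric 0/1 model, in which the states at the two ends of a branch of length ν differ with probability 1/2 - 1/2·e^(-2ν). Following the hint, each branch gets one Bernoulli draw. Here the stdlib `random` module stands in for NumPy's `binomial` with `n=1`. The returned count is the number of branches whose state flipped, so at most one change per branch is recorded. `ToyNode` is a hypothetical stand-in for an ETE node. `simulate_column` only touches `name`, `dist`, and `children`, which ETE nodes also have, and it stores each node's state as a plain `state` attribute rather than via `add_feature`.

```python
import math
import random


class ToyNode:
    """Hypothetical minimal stand-in for an ete3 TreeNode: name, branch length, children."""

    def __init__(self, name="", dist=0.0, children=None):
        self.name = name
        self.dist = dist  # branch length above this node (expected substitutions per site)
        self.children = children or []


def p_change(nu):
    """Probability that the 0/1 state differs across a branch of length nu."""
    return 0.5 - 0.5 * math.exp(-2.0 * nu)


def simulate_column(root, rng=random):
    """Simulate one site down the tree.

    Returns ({tip name: 0/1 state}, number of branches whose state changed).
    """
    root.state = rng.randint(0, 1)  # uniform draw for the root state
    tips, n_changes = {}, 0
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:
            tips[node.name] = node.state
        for child in node.children:
            # Bernoulli draw, equivalent to a binomial with n=1.
            changed = int(rng.random() < p_change(child.dist))
            child.state = node.state ^ changed  # flip the parent's state if a change occurred
            n_changes += changed
            stack.append(child)
    return tips, n_changes
```

For the 1000-replicate exercise, call the simulator in a loop on the same tree and keep the returned change count for the x axis. Passing `random.Random(seed)` as `rng` makes a run reproducible.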
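Fitch's algorithm fits in one postorder pass. A leaf's set is its observed state. At an internal node, take the intersection of the children's sets if it is non-empty; otherwise, take their union and add one to the cost. In this sketch, tip states are assumed to arrive as a dict mapping tip name to 0 or 1 (a hypothetical format). Nodes are a hypothetical `ToyNode` stand-in with ETE-style `name` and `children` attributes. Per-node results are stored as `fitch_state` and `fitch_cost` to match the annotation hint above. Children are folded in pairwise. This is exact for bifurcating nodes and for a trifurcating root, which an unrooted tree typically has when loaded in ETE, because parsimony scores do not depend on where the tree is rooted.

```python
class ToyNode:
    """Hypothetical stand-in for an ete3 TreeNode: just a name and a list of children."""

    def __init__(self, name="", children=None):
        self.name = name
        self.children = children or []


def fitch(node, tip_states):
    """Fitch small-parsimony pass; returns the minimum number of changes below node.

    Stores the Fitch state set and subtree cost on each node as
    node.fitch_state and node.fitch_cost.
    """
    if not node.children:
        node.fitch_state = {tip_states[node.name]}
        node.fitch_cost = 0
        return 0
    state, cost = None, 0
    for child in node.children:
        cost += fitch(child, tip_states)  # postorder: children are scored first
        if state is None:
            state = set(child.fitch_state)
        elif state & child.fitch_state:
            state = state & child.fitch_state  # overlap: no extra change needed
        else:
            state = state | child.fitch_state  # disjoint: union, one extra change
            cost += 1
    node.fitch_state, node.fitch_cost = state, cost
    return cost
```

With ETE, `for n in t.traverse(): n.add_face(TextFace(str(n.fitch_cost)), column=0, position="branch-top")` (with `TextFace` imported from `ete3`) would then show the costs on the tree. Running `fitch` on each simulated column gives the y values for the requested plot.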

- *Inferring Phylogenies* by Felsenstein
- *The Phylogenetic Handbook*, edited by Lemey, Salemi, and Vandamme, chapters by the stars

- Introduction to likelihood-based phylogenetics video (with slides)
- Introduction to phylogenetic models video (with slides)
- Introduction to Bayesian statistics and how it is used in phylogenetics (with slides)
- More Bayesian phylogenetics: proposals, prior distributions, hierarchical models, and Bayes factors (with slides)