### Erick Matsen: Teaching

Welcome to the phylogenetics module of Genome Sciences 541.

After this course, I hope you will be able to:

• recognize situations when evolutionary thinking is important
• understand basic features of evolutionary trees
• be familiar with the various types of tree inference, and when they are useful
• understand likelihood-based tree inference
• understand the choices one makes when performing likelihood-based tree inference
• understand potential pitfalls of tree inference methods

## Prerequisites

### Software prerequisites

• seaview
• Anaconda. If you have an existing installation, excellent. Otherwise I recommend Miniconda.
• I suggest that you use Python 3.7. With conda this looks like `conda create --name 541 python=3.7; conda activate 541`.
• Install ETE and friends with `conda install -c etetoolkit ete3`.

## Day 2: Phylogenetics methods

### Homework 1

• Infer a phylogenetic tree from measles data using seaview. Write a little Python script to find the longest branch (a.k.a. edge) in the tree, and draw a version of that tree such that the longest branch is colored red. (ETE hints: Node style, tree traversal, and inline tree rendering if you are using a Jupyter notebook.)
• Make a scatter plot of this measles tree with one point for each branch of the tree: make the x axis the length of the branch and the y axis the number of leaves descending from that branch (`len(n)` gives the number of leaves below a node `n` in ETE).
• Imagine that instead of 4 DNA bases, we have just two bases, named 0 and 1. Follow through the development of the transition probabilities starting on Lewis’ slide 59 to obtain a likelihood function for the corresponding model in terms of branch length nu (written ν in the slides) given two sequences that have 20 identical sites and 4 differing ones. Plot the logarithm of the likelihood function for a range of branch length values containing the maximum likelihood branch length. (Note that this part of the assignment does not involve a proper tree, just two sequences evolving from one to the other.)
• Submit both the script and the output, or a Jupyter notebook that has been run from scratch (“Restart & Run All”) and exported to PDF.
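
The two-sequence likelihood in the third item can be sketched as follows. This is a minimal sketch, not a reference solution. It assumes a symmetric 0/1 model with equilibrium frequencies of 1/2, and it assumes that ν is measured in expected substitutions per site, so that the probability of ending in the other state is (1 − e^(−2ν))/2. Check this parameterization against the slides before relying on it. The names `log_likelihood`, `grid`, `values`, and `best_nu` are illustrative only.

```python
import math

N_SAME, N_DIFF = 20, 4  # site counts from the homework statement


def log_likelihood(nu):
    """Log-likelihood of branch length nu under the ASSUMED symmetric 0/1 model."""
    decay = math.exp(-2.0 * nu)
    p_same = 0.5 * (1.0 + decay)  # P(end state == start state)
    p_diff = 0.5 * (1.0 - decay)  # P(end state != start state)
    # Each site contributes (equilibrium frequency 1/2) * (transition probability).
    return (N_SAME * math.log(0.5 * p_same)
            + N_DIFF * math.log(0.5 * p_diff))


# Evaluate on a grid; nu = 0 is excluded because log(0) is undefined there.
grid = [0.005 * k for k in range(1, 201)]  # 0.005, 0.010, ..., 1.0
values = [log_likelihood(nu) for nu in grid]
best_nu = grid[values.index(max(values))]

# Closed-form check under the same assumption: the maximum occurs where
# p_diff equals the observed fraction of differing sites.
closed_form = -0.5 * math.log(1.0 - 2.0 * N_DIFF / (N_SAME + N_DIFF))
print(best_nu, closed_form)
```

Passing `grid` and `values` to a plotting library such as matplotlib gives the requested curve. The grid maximum should sit within one grid step of the closed-form value.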

## Day 4: Further topics

### Homework 2

Do the following in a script, either submitting both the script and the output, or a Jupyter notebook that has been run from scratch (“Restart & Run All”) and exported to PDF. PDF is best, but if you encounter problems with the PDF export you may submit (in order of preference) HTML or the .ipynb file.

• Simulate sequence evolution down the measles tree using the 0/1 model from homework 1 starting with a uniform draw for the root state, returning one “column” of sequence data at a time (i.e. a single 0/1 value for each tip). (Python hints: I used numpy’s `binomial` with `n=1` and stored values on the tree using `add_feature`.) Display the tree with an example set of simulated tip states from running your simulator.
• Implement the Fitch algorithm to calculate parsimony scores on your simulated data. I found it useful while debugging my version to annotate the inferences on my tree with `n.add_face(TextFace(str(fitch_cost)), column=0, position = "branch-top")` with another annotation for `fitch_state`.
• Simulate 1000 times on the measles tree and run the Fitch algorithm on each of these simulated data sets. Make a plot such that each simulated data set is a single point, with the x axis representing the number of simulated mutations, and the y axis representing the parsimony score. What do you notice?
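
The bottom-up pass of the Fitch algorithm can be sketched on a toy example without ETE. This is a minimal sketch under stated assumptions: the tree is strictly binary and is written as nested tuples with tip names as strings, which is a hypothetical representation chosen for brevity rather than the ETE data structure. A tree inferred with seaview may contain multifurcations, which this version does not handle. The names `fitch`, `toy_tree`, and `tip_states` are illustrative only.

```python
def fitch(node, tip_states):
    """Return (candidate_state_set, parsimony_cost) for the subtree at `node`.

    A leaf is a string naming a tip; an internal node is a 2-tuple of subtrees.
    """
    if isinstance(node, str):
        # A tip: its state set is the observed state, and no changes are needed.
        return {tip_states[node]}, 0

    left_set, left_cost = fitch(node[0], tip_states)
    right_set, right_cost = fitch(node[1], tip_states)

    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change here.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Toy rooted binary tree with five tips and one column of 0/1 data.
toy_tree = (("A", "B"), ("C", ("D", "E")))
tip_states = {"A": 0, "B": 1, "C": 1, "D": 1, "E": 0}

root_set, score = fitch(toy_tree, tip_states)
print(root_set, score)  # {1} 2
```

To adapt this to ETE, the same recursion can be run over a postorder traversal, storing the set and the cost on each node, for example with `add_feature`.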

## Software

• FastTree – approximate ML
• RAxML, PhyML, and IQ-TREE – somewhat less approximate ML
• BEAST – Bayesian, inferring event times and rooted trees
• MrBayes and RevBayes – Bayesian, inferring unrooted trees
• DataMonkey, your one-stop online shop for selection and recombination analysis