trim trims placefiles down so that they contain only an informative subset of the mass.
usage: trim [options] placefile[s]
--point-mass | Treat every pquery as a point mass concentrated on the highest-weight placement. |
--pp | Use posterior probability for the weight. |
-o | Specify the filename to write to. |
--out-dir | Specify the directory to write files to. |
--prefix | Specify a string to be prepended to filenames. |
--min-path-mass | The minimum mass which must be on the path to a leaf to keep it. default: 0.001 |
--discarded | A file to write discarded pqueries to. |
--rewrite-discarded-mass | Move placements which were on discarded leaves to the nearest non-discarded node. |
This subcommand is a quick root-dependent way to trim the reference tree to what is relevant for a collection of placements.
The first step is to select the collection of leaves that will be present in the trimmed tree. All of the placements are converted into mass by unitizing the mass of each placerun individually (so that each placerun has total mass one) and then taking the overall average of those collections of masses. A leaf is then selected for inclusion if there is at least --min-path-mass of mass on the path from the root to that leaf.
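The selection step can be sketched as follows. This is only an illustration under stated assumptions, not the actual implementation: the tree is a plain child-to-parent dictionary, each placerun is a dictionary from an edge (named by the node below it) to mass, and the function names unitize, average_mass, path_mass and select_leaves are hypothetical.

```python
from collections import defaultdict

def unitize(masses):
    """Scale one placerun's masses so they sum to one (assumes nonzero total)."""
    total = sum(masses.values())
    return {edge: m / total for edge, m in masses.items()}

def average_mass(placeruns):
    """Unitize each placerun individually, then average over placeruns."""
    avg = defaultdict(float)
    for run in placeruns:
        for edge, m in unitize(run).items():
            avg[edge] += m / len(placeruns)
    return dict(avg)

def path_mass(leaf, parent, mass):
    """Sum the mass on every edge between the root and the given leaf."""
    total = 0.0
    node = leaf
    while node in parent:  # the root has no entry in `parent`
        total += mass.get(node, 0.0)
        node = parent[node]
    return total

def select_leaves(parent, leaves, placeruns, min_path_mass=0.001):
    """Keep the leaves whose root-to-leaf path carries enough averaged mass."""
    mass = average_mass(placeruns)
    return [leaf for leaf in leaves
            if path_mass(leaf, parent, mass) >= min_path_mass]

# Toy tree ((a,b)x,(c,d)y)root; edges are named by their lower node.
parent = {"a": "x", "b": "x", "c": "y", "d": "y", "x": "root", "y": "root"}
placeruns = [
    {"a": 3.0, "x": 1.0},  # unitized: a = 0.75, x = 0.25
    {"d": 2.0},            # unitized: d = 1.0
]
kept = select_leaves(parent, ["a", "b", "c", "d"], placeruns, min_path_mass=0.2)
print(kept)
```

In this toy example the averaged path masses are 0.5 for a, 0.125 for b, 0 for c and 0.5 for d, so a threshold of 0.2 keeps only a and d. Because the path always starts at the root, the result depends on where the tree is rooted.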
Once those leaves are selected, we take the induced subtree on those leaves. That is, if we were to take the induced subtree of
      ^
     / \
    /\  \
   a  b /\
       c  d
with the leaf set {a,d}, we would get the tree
    ^
   / \
  /   \
 a     \
        d
with the branch lengths obtained by adding together the branch lengths of consecutive edges that meet at nodes which are no longer bifurcating.
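As a rough sketch of the induced-subtree step, the code below keeps the nodes lying on a path from the root to a selected leaf, removes every non-root node left with a single child, and adds that node's branch length to the edge below it. The dictionary representation, the function name induced_subtree and the branch lengths are all assumptions made for illustration; keeping the root in place even if it ends up with one child is likewise an assumption.

```python
def induced_subtree(parent, length, keep):
    """Return (parent, length) maps for the subtree induced on the leaves in `keep`.

    parent: child -> parent (the root has no entry)
    length: child -> length of the edge above that child
    """
    # Nodes lying on some path from a kept leaf up to the root.
    on_path = set()
    for leaf in keep:
        node = leaf
        while node is not None and node not in on_path:
            on_path.add(node)
            node = parent.get(node)

    # Number of surviving children below each surviving node.
    n_children = {}
    for node in on_path:
        p = parent.get(node)
        if p is not None:
            n_children[p] = n_children.get(p, 0) + 1

    new_parent, new_length = {}, {}
    for node in on_path:
        if node not in parent:
            continue  # the root has no edge above it
        if n_children.get(node, 0) == 1:
            continue  # non-bifurcating node: merged into the edge below it
        total = length[node]
        p = parent[node]
        # Climb past non-bifurcating ancestors, summing their branch lengths.
        while p in parent and n_children.get(p, 0) == 1:
            total += length[p]
            p = parent[p]
        new_parent[node] = p
        new_length[node] = total
    return new_parent, new_length

# Toy tree ((a,b)x,(c,d)y)root with made-up branch lengths.
parent = {"a": "x", "b": "x", "c": "y", "d": "y", "x": "root", "y": "root"}
length = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 2.0, "x": 0.5, "y": 0.25}
trimmed_parent, trimmed_length = induced_subtree(parent, length, {"a", "d"})
print(trimmed_parent, trimmed_length)
```

Here a and d both end up attached directly to the root, with lengths 1.0 + 0.5 = 1.5 and 2.0 + 0.25 = 2.25, matching the two-leaf tree drawn above.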
This trimmed tree is then put in the placefile. Note that even though the tree is different from the one in the reference package, we have arranged things so that it's possible to use reference package C with a tree that has been trimmed from C's tree.