reclass

reclass reclassifies nonconvex sequences in a reference package.

usage: reclass [options] -c my.refpkg

Options

-c Reference package path. Required.
-o Specify the filename to write to.
--out-dir Specify the directory to write files to.
--prefix Specify a string to be prepended to filenames.
--csv Output the results as csv instead of a padded matrix.
-j The number of processes to run pplacer with. default: 2
-p Calculate posterior probabilities when doing placements.
--placefile Save the placefile generated by running pplacer to the specified location.
-t If specified, the path to write the suggestion tree to.

Details

Suggest better classifications of sequences in a reference package.

rppr reclass first looks at all ranks of a reference package, starting at the most specific rank, until it finds the first rank with discordance. Discordant leaves are cut off using the convexify algorithm. The sequences corresponding to the cut leaves are then placed with pplacer against the remaining sequences in the reference package.

The result is a table of the suggested reclassifications of these sequences:

Column name Description
seq_name The name of the sequence.
old_name The name of the tax_id this sequence used to have.
old_taxid The actual tax_id this sequence used to have.
new_name The name of the suggested tax_id for this sequence.
new_taxid The actual suggested tax_id.
makes_convex true if reclassifying this sequence with the new tax_id will result in a convex tree, otherwise false.
uninformative true if this sequence sits inside a clade determined to be uninformative, otherwise false.
old_median_dist The median distance from this leaf to the convex subset of leaves classified with the previous tax_id.
old_avg_cv The coefficient of variation of the distances from this leaf to the convex subset of leaves classified with the previous tax_id, multiplied by 100%. [1]
new_median_dist The median distance from this leaf to the convex subset of leaves classified with the suggested tax_id. [2]
new_avg_cv The coefficient of variation of the distances from this leaf to the convex subset of leaves classified with the suggested tax_id, multiplied by 100%. [1]
n_with_old The number of leaves in the original tree which were classified with the previoux tax_id.
n_nonconvex The number of leaves in the original tree classified with the previous tax_id which were also non-convex.
[1](1, 2) This may be - if there are only zero or one other leaves with that tax_id.
[2]This may be - if there are no other leaves with that tax_id.

Uninformative clades are determined to be clades containing all of the representatives of exactly two tax_ids which are not found anywhere else in the tree.

The suggestion tree emitted with the -t flag is similar to the discordance tree emitted by rppr convexify; it colors all of the leaves which were cut by the convexify step red, colors all of the edges which are considered uninformative (with the overlap between these two sets colored orange), and relabels the cut sequences to include the name of the suggested new classification. The new label is in the format seq_name -> new_name, using the column names described above.