splitify

splitify writes out differences of masses for the splits of the tree.

usage: splitify [options] placefile(s)

Options

-o Specify the filename to write to.
--out-dir Specify the directory to write files to.
--prefix Specify a string to be prepended to filenames.
--point-mass Treat every pquery as a point mass concentrated on the highest-weight placement.
--pp Use posterior probability for the weight.
--kappa Specify the exponent for scaling between weighted and unweighted splitification. default: 1
--rep-edges Cluster neighboring edges whose splitified columns are less than the argument apart in Euclidean distance.
--epsilon The epsilon to use to determine if a split matrix’s column is constant for filtering. default: no filtering

Details

The first step in performing edge PCA is to make a matrix with rows indexed by the samples and columns indexed by the edges of the tree. The (s,e) entry of this matrix is the difference between the distribution of mass on either side of edge e for the sample s. Specifically, it is the amount of mass on the distal (non-root) side of edge e minus the amount of mass on the proximal side. The matrix is indexed such that the first numerical column corresponds to the edge labeled 0 in the reference tree. The splitify subcommand simply writes out this matrix.
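
The construction above can be sketched for a toy tree as follows. This is an illustration only, not the code splitify runs. The function name split_matrix, the parent-list encoding of the tree, and the choice to count mass sitting on edge e itself as distal are all assumptions made for the example.

```python
def split_matrix(parent, sample_masses):
    """Build a samples-by-edges matrix of distal minus proximal mass.

    parent[e] is the edge immediately proximal to edge e, or None if e
    attaches directly to the root (hypothetical encoding).
    sample_masses[s][e] is the mass that sample s places on edge e.
    """
    n_edges = len(parent)
    rows = []
    for masses in sample_masses:
        distal = [0.0] * n_edges
        for e, m in enumerate(masses):
            # Assumption: mass on edge e counts as distal to e and to
            # every edge between e and the root.
            node = e
            while node is not None:
                distal[node] += m
                node = parent[node]
        total = sum(masses)
        # Entry (s, e): distal mass minus proximal mass.
        rows.append([d - (total - d) for d in distal])
    return rows


# Toy tree: edges 0 and 1 hang below edge 2; edges 2 and 3 meet the root.
parent = [2, 2, None, None]
matrix = split_matrix(parent, [[0.5, 0.25, 0.0, 0.25]])
```

Column 0 of the result corresponds to the edge labeled 0, matching the indexing described above.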

Specifying --rep-edges x keeps only representatives from collections of neighboring edges whose splitified columns are less than Euclidean distance x apart.
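
One possible reading of this clustering is sketched below. The exact procedure used by splitify is not described here, so the sketch substitutes a simple single-linkage grouping: neighboring edges are merged whenever their columns are closer than x, and the lowest-numbered edge of each group is kept. The names column_distance and representative_edges, the neighbors list, and the choice of representative are all assumptions.

```python
import math


def column_distance(matrix, a, b):
    """Euclidean distance between columns a and b of a row-major matrix."""
    return math.sqrt(sum((row[a] - row[b]) ** 2 for row in matrix))


def representative_edges(matrix, neighbors, x):
    """Return one representative edge per group of similar neighbors.

    neighbors is a list of (a, b) pairs of adjacent edges (hypothetical
    input). Groups are tracked with a small union-find structure whose
    root is always the lowest edge number in the group.
    """
    n_edges = len(matrix[0])
    group = list(range(n_edges))

    def find(e):
        while group[e] != e:
            group[e] = group[group[e]]  # path halving
            e = group[e]
        return e

    for a, b in neighbors:
        if column_distance(matrix, a, b) < x:
            ra, rb = find(a), find(b)
            if ra != rb:
                group[max(ra, rb)] = min(ra, rb)
    return sorted({find(e) for e in range(n_edges)})


matrix = [[0.0, -0.5, 0.5, -0.5],
          [0.0, -0.5, 0.5, 0.5]]
neighbors = [(0, 2), (1, 2), (2, 3), (0, 1)]
kept = representative_edges(matrix, neighbors, 0.8)
```

A smaller x merges fewer edges, so more columns survive.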

The --kappa option enables a componentwise transformation \varphi_\kappa on the entries of this matrix:

\varphi_\kappa(x) = \mathrm{sgn}(x) |x|^\kappa

where the \kappa parameter can be any non-negative number. This parameter scales between ignoring abundance information (\kappa = 0), using it linearly (\kappa = 1), and emphasizing it (\kappa > 1).
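
A minimal sketch of \varphi_\kappa applied to a single entry, assuming the usual convention \mathrm{sgn}(0) = 0. The function name phi_kappa is made up for the example.

```python
import math


def phi_kappa(x, kappa):
    """Return sgn(x) * |x|**kappa, with sgn(0) taken to be 0."""
    if x == 0:
        # Handled separately so that kappa = 0 maps 0 to 0, not to 1.
        return 0.0
    return math.copysign(abs(x) ** kappa, x)


entry = -0.5
unweighted = phi_kappa(entry, 0)   # keeps only the sign
linear = phi_kappa(entry, 1)       # leaves the entry unchanged
emphasized = phi_kappa(entry, 2)   # sign times the squared magnitude
```

With \kappa = 0 every nonzero entry becomes +1 or -1, so only the side of the split carrying more mass is recorded.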