lpca

lpca performs length principal components analysis.

usage: lpca [options] placefiles

Options

--out-dir Specify the directory to write files to.
--prefix Specify a string to be prepended to filenames. Required.
--point-mass Treat every pquery as a point mass concentrated on the highest-weight placement.
--pp Use posterior probability for the weight.
-c Reference package path.
--min-fat The minimum branch length for fattened edges (to increase their visibility). To turn off set to 0. Default: 0.01
--total-width Set the total pixel width for all of the branches of the tree. Default: 300
--width-factor Override total-width by directly setting the number of pixels per unit of thing displayed.
--node-numbers Put the node numbers in where the bootstraps usually go.
--gray-black Use gray/black in place of red/blue to signify the sign of the coefficient for that edge.
--min-width Specify the minimum width for a branch to be colored and thickened. Default is 1.
--write-n The number of principal coordinates to calculate (default is 5).
--som The number of dimensions to rotate for support overlap minimization (default is 0; options are 0, 2, 3).
--scale Scale variances to one before performing principal components.
--symmv Use a complete eigendecomposition rather than power iteration.
--raw-eval Output the raw eigenvalue rather than the fraction of variance.
--kappa Specify the exponent for scaling between weighted and unweighted splitification. default: 1
--rep-edges Cluster neighboring edges that have splitified euclidean distance less than the argument.
--epsilon The epsilon to use to determine if a split matrix’s column is constant for filtering. default: 1e-05
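
As an illustration of how the options above combine, here is a minimal Python sketch that assembles an argument list and checks the two constraints stated in the option list (--prefix is required; --som accepts only 0, 2 or 3). The function name `build_lpca_command`, the file names, and the `guppy lpca` executable prefix are assumptions made for this example; the usage line above shows only `lpca`.

```python
import shlex

ALLOWED_SOM = (0, 2, 3)  # values listed for --som in the option list


def build_lpca_command(placefiles, prefix, out_dir=None, som=0, write_n=None,
                       scale=False, executable=("guppy", "lpca")):
    """Assemble an lpca argument list from a few of the documented options.

    `executable` is an assumption: the usage line only shows `lpca`.
    """
    if not prefix:
        raise ValueError("--prefix is required")
    if som not in ALLOWED_SOM:
        raise ValueError(f"--som must be one of {ALLOWED_SOM}, got {som}")
    if not placefiles:
        raise ValueError("at least one placefile is needed")

    argv = list(executable) + ["--prefix", prefix]
    if out_dir is not None:
        argv += ["--out-dir", out_dir]
    if som:
        argv += ["--som", str(som)]
    if write_n is not None:
        argv += ["--write-n", str(write_n)]
    if scale:
        argv.append("--scale")
    # Placefiles go last, matching "lpca [options] placefiles".
    return argv + list(placefiles)


if __name__ == "__main__":
    # Hypothetical placefile names, for illustration only.
    cmd = build_lpca_command(["a.jplace", "b.jplace"], prefix="out", som=2)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` unchanged; building a list rather than a single string avoids shell quoting problems with file names.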

Details

This is an experimental feature. We are currently validating and writing a paper about it.

Perform length principal components analysis (“length PCA”). Length PCA takes the special structure of phylogenetic placement data into account. Consequently, it is possible to visualize the principal component eigenvectors, and it can find consistent differences between samples which may not be so far apart in the tree. Length PCA is similar to edge PCA (epca), but it is invariant to edge subdivision on the reference tree, whereas edge PCA is invariant to branch length.

Running this command with --prefix set to out produces the following files:

out.trans
The top eigenvalues (first column) then their corresponding eigenvectors.
out.proj
The samples projected into principal coordinate space.
out.xml
The eigenvectors visualized as fattened and colored trees.
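
The layout described above for out.trans (eigenvalue in the first column, eigenvector in the remaining columns) can be read with a few lines of Python. This is a sketch only: the comma delimiter is an assumption, not something stated here, and `read_trans` is a hypothetical helper name. The numbers in the example are made up.

```python
import csv
import io


def read_trans(text, delimiter=","):
    """Split each row into (eigenvalue, eigenvector).

    The first column is taken as the eigenvalue and the remaining columns as
    the eigenvector. The delimiter is an assumption; adjust it if needed.
    """
    components = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        values = [float(x) for x in row]
        components.append((values[0], values[1:]))
    return components


if __name__ == "__main__":
    # Made-up contents standing in for a real out.trans file.
    sample = "0.6,0.0,0.8,-0.6\n0.3,1.0,0.0,0.0\n"
    for eigenvalue, vector in read_trans(sample):
        print(eigenvalue, vector)
```

For a file on disk, pass the result of `open("out.trans").read()` as `text`. Unless --raw-eval is given, the first value in each row is the fraction of variance rather than the raw eigenvalue.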

The --som flag triggers a Support Overlap Minimization (SOM) rotation of the principal components. Setting this value to n triggers a rotation of the first n principal component vectors such that the overlap in support (non-zero vector entries) between the vectors is minimized. This can make the projections easier to interpret from a biological perspective, but care should be taken not to rotate noise into more meaningful components (a good rule of thumb is not to rotate vectors with less than 10% of the variance).

Acceptable values are 0 (no rotation; the default), 2 or 3. Looking at the output from the non-rotated principal components can help you determine what is most appropriate here. If 2 or 3 is specified, the following files will also be output:

out.som
The rotated eigenvectors and corresponding variance values.
out.som.xml
The rotated vectors visualized as fattened and colored trees.
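
The 10% rule of thumb and the acceptable --som values can be combined into a small helper for choosing a starting value. The sketch below is a heuristic written for this page, not part of lpca itself; `suggest_som` and its `threshold` parameter are hypothetical names. It expects fractions of variance from a non-rotated run (that is, output produced without --raw-eval).

```python
def suggest_som(variance_fractions, threshold=0.10):
    """Return 0, 2 or 3 as a candidate value for --som.

    Counts the leading components whose fraction of variance is at least
    `threshold`, caps the count at 3, and falls back to 0 when fewer than
    two components qualify, since --som accepts only 0, 2 or 3.
    """
    leading = 0
    for frac in variance_fractions:
        if frac < threshold:
            break  # stop at the first component below the threshold
        leading += 1
    leading = min(leading, 3)
    return leading if leading >= 2 else 0


if __name__ == "__main__":
    # Made-up fractions of variance, for illustration only.
    print(suggest_som([0.55, 0.25, 0.08, 0.05]))
```

Treat the result as a starting point only; inspecting the non-rotated components remains the better guide.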

See the splitify documentation for information about the --kappa flag.