compress¶
compress compresses a placefile’s pqueries.
usage: compress [options] placefile
Options¶
--point-mass | Treat every pquery as a point mass concentrated on the highest-weight placement. |
--pp | Use posterior probability for the weight. |
-o | Specify the filename to write to. |
--out-dir | Specify the directory to write files to. |
--prefix | Specify a string to be prepended to filenames. |
--cutoff | The cutoff parameter for mass compression |
--discard-below | |
In island clustering, ignore pquery locations with a mass less than the specified value. | |
--mcl | Use MCL clustering instead of island clustering. |
--inflation | If specified, pass this as the inflation value to MCL. |
Details¶
A cutoff c is specified via a command line flag. The compress command merges pairs of pqueries that have KR distance between them less than c.
To compress the pqueries:
- divide the pqueries into sets via islands or mcl
- for each pquery set, calculate all of the pairwise distances between the pqueries for that island and put them in a matrix
- build a graph such that the nodes are pqueries, and there is an edge between them if their distance is less than c
- merge pqueries according to this graph as described below.
Each original pquery (=node) will get merged into one of the the selected pqueries. This will happen as follows. Maintain a set of unmerged pqueries, and a set of pairs (w, d(w)), where w is a selected pquery and d(w) is the degree of w in the graph.
- find the (w, d(w)) pair with the greatest d(w) and remove it from the set
- find all of the unmerged pqueries that are adjacent to w in the graph, and merge their mass into w. Remove w and all of the adjacent pqueries from the unmerged pquery set.
- repeat!
Stop when the unmerged pquery set is empty.