:tocdepth: 3

.. _scripts:

=======
scripts
=======

The `pplacer` package comes with a few scripts to perform common tasks on
reference packages and placements.

Installing
==========

All scripts can be used by specifying the full path. For convenience, a Python
``setup.py`` file is provided, which will install them globally. To install,
run the following from the ``scripts/`` subdirectory, prefixed with ``sudo``
if the Python installation directory is not writable::

    $ python setup.py install

``refpkg_align.py``
===================

``refpkg_align.py`` works with reference package alignments and alignment
profiles, providing methods to align sequences to a reference package
alignment and to extract an alignment from a reference package.

``refpkg_align.py`` depends on `BioPython `_, as well as the external tools
used for alignment: HMMER3_, Infernal_, and PyNAST_.

.. _HMMER3: http://hmmer.janelia.org
.. _Infernal: http://infernal.janelia.org
.. _PyNAST: http://qiime.org/pynast

List of subcommands
-------------------

``align``
*********

``refpkg_align.py align`` aligns sequences to a reference package alignment
for use with pplacer. For reference packages built with Infernal_,
``cmalign`` is used for alignment. For packages built using HMMER3_,
alignment is performed with ``hmmalign``. Reference packages lacking a
``profile`` entry are aligned using PyNAST_.

By default, an alignment :ref:`mask` is applied if it exists.

The output format varies: Stockholm for Infernal- and HMMER-based reference
packages, FASTA for all others.

For Infernal-based reference packages, MPI may be used.

::

    usage: refpkg_align.py align [options] refpkg seqfile outfile

Options
^^^^^^^

::

    -h, --help            show this help message and exit
    --align-opts OPTS     Alignment options, such as "--mapali $aln_sto". '$'
                          characters will need to be escaped if using
                          template variables. Available template variables
                          are $aln_sto, $profile.
                          Defaults are as follows for the different profiles:
                          (PyNAST: "-l 150 -f /dev/null -g /dev/null")
                          (INFERNAL: "-1 --hbanded --sub --dna")
    --alignment-method {PyNAST,HMMER3,INFERNAL}
                          Profile version to use. [default: Guess. PyNAST is
                          used if a valid CM or HMM is not found in the
                          reference package.]
    --no-mask             Do not trim the alignment to unmasked columns.
                          [default: apply mask if it exists]
    --debug               Enable debug output
    --verbose             Enable verbose output

    MPI Options
      --use-mpi           Use MPI [infernal only]
      --mpi-arguments MPI_ARGUMENTS
                          Arguments to pass to mpirun
      --mpi-run MPI_RUN   Name of mpirun executable

``extract``
***********

``extract`` extracts a reference alignment from a reference package and, by
default, applies a :ref:`mask` if one exists.

::

    usage: refpkg_align.py extract [options] refpkg output_file

Options
^^^^^^^

::

    positional arguments:
      refpkg                Reference package directory
      output_file           Destination

    optional arguments:
      -h, --help            show this help message and exit
      --output-format OUTPUT_FORMAT
                            output format [default: stockholm]
      --no-mask             Do not apply mask to alignment [default: apply
                            mask if it exists]

.. _mask:

mask
----

*Warning:* masking is experimental and we may change our mind about how it
gets implemented.

Alignment masks may be specified through an entry named "mask" in the
``CONTENTS.json`` file of a reference package, pointing to a file with a
comma-delimited set of 0-based indices in an alignment to **keep** after
masking. For example, a mask specification of ``0,1,2,3,4,5,6,28,29`` would
discard all columns in an alignment except for columns 0-6, 28, and 29.

``sort_placefile.py``
=====================

``sort_placefile.py`` takes a placefile and sorts and formats its contents so
that placefiles can then be compared with a visual diff. Output is written to
stdout by default.

::

    usage: sort_placefile.py [-h] [-o FILE] infile

..

``update_refpkg.py``
====================

``update_refpkg.py`` updates a reference package from the 1.0 format to the
1.1 format.
It takes the ``CONTENTS.json`` file in the reference package as its parameter
and updates it in place, after making a backup copy.

::

    usage: update_refpkg.py [-h] CONTENTS.json

..

``check_placements.py``
=======================

``check_placements.py`` checks a placefile for potential issues, including:

* Any ``like_weight_ratio`` being equal to 0.
* The sum of the ``like_weight_ratios`` not being equal to 1.
* Any ``post_prob`` being equal to 0.
* The sum of the ``post_probs`` being equal to 0.
* The sum of the ``post_probs`` not being equal to 1.

::

    usage: check_placements.py example.jplace

..

.. _deduplicate-sequences:

``deduplicate_sequences.py``
============================

``deduplicate_sequences.py`` deduplicates a sequence file and produces a dedup
file suitable for use with ``guppy redup -m``. See the :ref:`redup `
documentation for details.

``pca_for_qiime.py``
====================

``pca_for_qiime.py`` converts the ``trans`` file output by ``guppy pca`` into
the tab-delimited format expected by QIIME's plotting functions.

::

    usage: pca_for_qiime.py [-h] trans tsv

``extract_taxonomy_from_biom.py``
=================================

``extract_taxonomy_from_biom.py`` extracts the taxonomy information from a BIOM
file, producing seqinfo and taxonomy files which can then be placed into a
reference package.

::

    usage: extract_taxonomy_from_biom.py [-h] biom taxtable seqinfo

``hrefpkg_query.py``
====================

``hrefpkg_query.py`` classifies sequences using a hrefpkg. The output is a
sqlite database with the same schema as created by :ref:`rppr prep_db `.
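
The classification database is an ordinary SQLite file. As a minimal sketch
(a helper that is not part of the ``scripts/`` package), Python's standard
``sqlite3`` module can list the tables and views it contains. The function
name ``list_db_objects`` is hypothetical, and the file is opened read-only
through a SQLite URI so that a mistyped path raises an error instead of
silently creating an empty database:

.. code-block:: python

    import sqlite3
    from pathlib import Path


    def list_db_objects(db_path):
        """Return sorted (name, type) pairs for the tables and views in db_path.

        Hypothetical helper: the file is opened read-only via a SQLite URI,
        so a missing file raises sqlite3.OperationalError instead of
        creating an empty database.
        """
        uri = Path(db_path).resolve().as_uri() + "?mode=ro"
        con = sqlite3.connect(uri, uri=True)
        try:
            return con.execute(
                "SELECT name, type FROM sqlite_master "
                "WHERE type IN ('table', 'view') ORDER BY name"
            ).fetchall()
        finally:
            con.close()

Passing the ``classification_db`` path given to ``hrefpkg_query.py`` to this
helper lists the objects it created, which can then be explored with ordinary
SQL.
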

::

    usage: hrefpkg_query.py [options] hrefpkg query_seqs classification_db

    positional arguments:
      hrefpkg               hrefpkg to classify using
      query_seqs            input query sequences
      classification_db     output sqlite database

    optional arguments:
      -h, --help            show this help message and exit
      -j CORES, --ncores CORES
                            number of cores to tell commands to use
      -r RANK, --classification-rank RANK
                            rank to perform the initial NBC classification at
      --workdir DIR         directory to write intermediate files to (default:
                            a temporary directory)
      --disable-cleanup     don't remove the work directory as the final step
      --use-mpi             run refpkg_align with MPI
      --alignment {align-each,merge-each,none}
                            respectively: align each input sequence; subset an
                            input stockholm alignment and merge each sequence
                            to a reference alignment; only subset an input
                            stockholm alignment (default: align-each)
      --cmscores FILE       in align-each mode, write out a file containing
                            the cmalign scores

    external binaries:
      --pplacer PROG        pplacer binary to call
      --guppy PROG          guppy binary to call
      --rppr PROG           rppr binary to call
      --refpkg-align PROG   refpkg_align binary to call
      --cmalign PROG        cmalign binary to call

..

``multiclass_concat.py``
========================

``multiclass_concat.py`` takes a database which has been classified using
:ref:`guppy classify ` and creates a view named ``multiclass_concat``. This
view has the same schema as ``multiclass``, with the addition of an
``id_count`` column. However, where ``multiclass`` has multiple rows for a
sequence with multiple classifications at a rank, ``multiclass_concat`` has a
single row whose ``tax_id`` column contains all of those tax_ids concatenated
together, delimited by ``,``.

To keep ``multiclass_concat`` easy to join to the ``taxa`` table, a row is
inserted into ``taxa`` for each concatenated tax_id present in
``multiclass_concat``; its ``tax_name`` is created by concatenating the names
of all the constituent tax_ids.
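
The view described above can be approximated with SQLite's ``GROUP_CONCAT``
aggregate. The following minimal sketch uses Python's ``sqlite3`` module and
is not the script's implementation. It assumes hypothetical, stripped-down
``multiclass`` (``name``, ``want_rank``, ``tax_id``) and ``taxa``
(``tax_id``, ``tax_name``) tables holding only the columns the sketch needs,
and it joins constituent names with ``,``, which is a guess rather than
documented behavior:

.. code-block:: python

    import sqlite3

    # Hypothetical, stripped-down tables; the real ones have more columns.
    SCHEMA = """
    CREATE TABLE taxa (tax_id TEXT PRIMARY KEY, tax_name TEXT);
    CREATE TABLE multiclass (name TEXT, want_rank TEXT, tax_id TEXT);
    """

    # One row per (sequence, rank): tax_ids joined with ',' plus their count.
    # Sorting in the subquery orders GROUP_CONCAT's input in practice, but
    # SQLite does not formally guarantee the concatenation order.
    CONCAT_VIEW = """
    CREATE VIEW multiclass_concat AS
    SELECT name, want_rank,
           GROUP_CONCAT(tax_id, ',') AS tax_id,
           COUNT(*) AS id_count
    FROM (SELECT * FROM multiclass ORDER BY name, want_rank, tax_id)
    GROUP BY name, want_rank
    """


    def add_multiclass_concat(con):
        """Create the view, then add a taxa row for each concatenated tax_id."""
        con.execute(CONCAT_VIEW)
        concat_ids = [row[0] for row in con.execute(
            "SELECT DISTINCT tax_id FROM multiclass_concat WHERE id_count > 1")]
        for concat_id in concat_ids:
            names = []
            for tax_id in concat_id.split(","):
                row = con.execute(
                    "SELECT tax_name FROM taxa WHERE tax_id = ?", (tax_id,)
                ).fetchone()
                # Fall back to the bare tax_id if it has no taxa entry.
                names.append(row[0] if row else tax_id)
            # Joining names with ',' is an assumption of this sketch.
            con.execute(
                "INSERT OR IGNORE INTO taxa (tax_id, tax_name) VALUES (?, ?)",
                (concat_id, ",".join(names)))
        con.commit()

Collecting the concatenated tax_ids into a list before inserting avoids
modifying ``taxa`` while a cursor over the view is still open.
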

::

    usage: multiclass_concat.py [options] database

    positional arguments:
      database    sqlite database (output of `rppr prep_db` after `guppy classify`)

    optional arguments:
      -h, --help  show this help message and exit

..

``split_qiime.py``
==================

``split_qiime.py`` takes sequences in `QIIME's preprocessed FASTA format`_ and
generates a FASTA file which contains the original sequence names. Optionally,
a specimen map can also be written out which maps from the original sequence
names to their specimens as listed in the QIIME file.

For example, an incoming sequence identified by ``>PC.634_1 FLP3FBN01ELBSX``
will be written out as ``>FLP3FBN01ELBSX`` with an entry in the specimen_map
of ``FLP3FBN01ELBSX,PC.634``.

::

    usage: split_qiime.py [-h] [qiime] [fasta] [specimen_map]

    Extract the original sequence names from a QIIME FASTA file.

    positional arguments:
      qiime         input QIIME file (default: stdin)
      fasta         output FASTA file (default: stdout)
      specimen_map  if specified, output specimen map (default: don't write)

    optional arguments:
      -h, --help    show this help message and exit

.. _QIIME's preprocessed FASTA format: http://qiime.org/tutorials/tutorial.html#assign-samples-to-multiplex-reads
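
The header rewriting described in this section can be sketched in a few lines
of Python. This is an illustration, not the script's source: it assumes the
original read name is the second whitespace-separated field of each QIIME
header and that the specimen is the first field with its trailing ``_<n>``
counter removed, which matches the ``PC.634_1`` example but may not cover
every QIIME variant. The function names ``split_qiime`` and
``write_specimen_map`` are hypothetical:

.. code-block:: python

    import csv


    def split_qiime(lines):
        """Rewrite QIIME FASTA lines to use the original sequence names.

        Returns the rewritten FASTA lines and a list of
        (original_name, specimen) pairs for a specimen map.
        """
        fasta_lines = []
        specimen_map = []
        for line in lines:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # ">PC.634_1 FLP3FBN01ELBSX ..." -> ["PC.634_1", "FLP3FBN01ELBSX", ...]
                fields = line[1:].split()
                if len(fields) < 2:
                    raise ValueError("unexpected QIIME header: " + line)
                sample_id, original_name = fields[0], fields[1]
                # Drop the trailing "_<n>" counter: "PC.634_1" -> "PC.634".
                specimen = sample_id.rsplit("_", 1)[0]
                fasta_lines.append(">" + original_name)
                specimen_map.append((original_name, specimen))
            else:
                # Sequence lines pass through unchanged.
                fasta_lines.append(line)
        return fasta_lines, specimen_map


    def write_specimen_map(pairs, handle):
        """Write (original_name, specimen) pairs as comma-separated lines."""
        csv.writer(handle, lineterminator="\n").writerows(pairs)

Because the counter is removed with ``rsplit("_", 1)``, underscores elsewhere
in a sample ID (for example ``PC_634_1``) are preserved; whether the script
treats such IDs the same way is not documented here.
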