:tocdepth: 3
.. _scripts:
=======
scripts
=======
The `pplacer` package comes with a few scripts that perform common tasks on
reference packages and placements.
Installing
==========
All scripts can be used by specifying the full path. For convenience, a Python
``setup.py`` file is provided, which will install them globally. To install,
run::

    $ python setup.py install

from the ``scripts/`` subdirectory, prefixed with ``sudo`` if the Python
installation directory is not writable.
``refpkg_align.py``
===================
``refpkg_align.py`` works with reference package alignments and alignment
profiles, providing subcommands to align sequences to a reference package
alignment and to extract an alignment from a reference package.
``refpkg_align.py`` depends on `BioPython `_, as
well as the external tools used for alignment: HMMER3_, Infernal_, and PyNAST_.
.. _HMMER3: http://hmmer.janelia.org
.. _Infernal: http://infernal.janelia.org
.. _PyNAST: http://qiime.org/pynast
List of subcommands
-------------------
``align``
*********
``refpkg_align.py align`` aligns sequences to a reference package alignment for
use with pplacer. For reference packages built with Infernal_, ``cmalign`` is
used for alignment. For packages built using HMMER3_, alignment is performed
with ``hmmalign``. Reference packages lacking a ``profile`` entry are aligned
using PyNAST_. By default, an alignment :ref:`mask` is applied if it exists.
The output format varies: Stockholm for Infernal- and HMMER-based reference
packages, FASTA for all others.
For Infernal-based reference packages, MPI may be used.
::

    usage: refpkg_align.py align [options] refpkg seqfile outfile
Options
^^^^^^^
::

    -h, --help            show this help message and exit
    --align-opts OPTS     Alignment options, such as "--mapali $aln_sto". '$'
                          characters will need to be escaped if using template
                          variables. Available template variables are $aln_sto,
                          $profile. Defaults are as follows for the different
                          profiles: (PyNAST: "-l 150 -f /dev/null -g /dev/null")
                          (INFERNAL: "-1 --hbanded --sub --dna")
    --alignment-method {PyNAST,HMMER3,INFERNAL}
                          Profile version to use. [default: Guess. PyNAST is
                          used if a valid CM or HMM is not found in the
                          reference package.]
    --no-mask             Do not trim the alignment to unmasked columns.
                          [default: apply mask if it exists]
    --debug               Enable debug output
    --verbose             Enable verbose output

    MPI Options
    --use-mpi             Use MPI [infernal only]
    --mpi-arguments MPI_ARGUMENTS
                          Arguments to pass to mpirun
    --mpi-run MPI_RUN     Name of mpirun executable
``extract``
***********
``extract`` extracts a reference alignment from a reference package, applying a
:ref:`mask` by default if one exists.
::

    usage: refpkg_align.py extract [options] refpkg output_file
Options
^^^^^^^
::

    positional arguments:
      refpkg                Reference package directory
      output_file           Destination

    optional arguments:
      -h, --help            show this help message and exit
      --output-format OUTPUT_FORMAT
                            output format [default: stockholm]
      --no-mask             Do not apply mask to alignment [default: apply mask
                            if it exists]
.. _mask:
mask
----
*Warning:* masking is experimental and we may change our mind about how it gets
implemented.
An alignment mask is specified through an entry named "mask" in the
``CONTENTS.json`` file of a reference package. The entry points to a file
containing a comma-delimited set of 0-based alignment column indices to
**keep** after masking.
For example, a mask specification of ``0,1,2,3,4,5,6,28,29`` would discard all
columns in an alignment except for 0-6, 28, and 29.
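
The effect of a mask can be illustrated with a short Python sketch. The helper
name ``apply_mask`` and the in-memory dictionary of aligned sequences are
assumptions made for illustration; they are not part of ``refpkg_align.py``.

.. code-block:: python

    def apply_mask(mask_text, aligned_seqs):
        """Keep only the alignment columns listed in a comma-delimited mask.

        mask_text is the content of the mask file (0-based column indices);
        aligned_seqs maps sequence names to equal-length aligned strings.
        """
        keep = [int(field) for field in mask_text.strip().split(",")]
        return {name: "".join(seq[i] for i in keep)
                for name, seq in aligned_seqs.items()}

    # Hypothetical 10-column alignment; the mask keeps columns 0-2, 8, and 9.
    alignment = {"seq1": "ACGTACGT-A", "seq2": "AC-TACGTTA"}
    masked = apply_mask("0,1,2,8,9", alignment)

Every column not listed in the mask is dropped, so each masked sequence here is
five characters long.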
``sort_placefile.py``
=====================
``sort_placefile.py`` takes a placefile, then sorts and formats its contents so
that placefiles can be compared with a visual diff. Output is written to stdout
by default.
::

    usage: sort_placefile.py [-h] [-o FILE] infile
..
``update_refpkg.py``
====================
``update_refpkg.py`` updates a reference package from the 1.0 format to the 1.1
format. It takes the ``CONTENTS.json`` file in the reference package as its
parameter and updates it in place, after making a backup copy.
::

    usage: update_refpkg.py [-h] CONTENTS.json
..
``check_placements.py``
=======================
``check_placements.py`` checks a placefile for potential issues, including:
* Any ``like_weight_ratio`` being equal to 0.
* The sum of the ``like_weight_ratios`` not being equal to 1.
* Any ``post_prob`` being equal to 0.
* The sum of the ``post_probs`` being equal to 0.
* The sum of the ``post_probs`` not being equal to 1.
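
The checks listed above can be sketched in Python as follows. This is an
illustration rather than the script's actual code: the function name, the
tolerance ``tol``, and the report format are assumptions. It expects an already
parsed jplace document whose ``fields`` list names the columns of each row in a
placement's ``p`` list, and it assumes both columns hold numeric values.

.. code-block:: python

    import math

    def check_placements(jplace, tol=1e-6):
        """Return (placement_index, message) pairs for suspicious placements."""
        fields = jplace["fields"]
        problems = []
        for idx, placement in enumerate(jplace["placements"]):
            for key in ("like_weight_ratio", "post_prob"):
                if key not in fields:
                    continue  # column absent from this file, nothing to check
                col = fields.index(key)
                values = [row[col] for row in placement["p"]]
                if any(v == 0 for v in values):
                    problems.append((idx, key + " equal to 0"))
                total = sum(values)
                if key == "post_prob" and total == 0:
                    problems.append((idx, "sum of post_prob is 0"))
                elif not math.isclose(total, 1.0, abs_tol=tol):
                    problems.append((idx, "sum of " + key + " is not 1"))
        return problems

    # Hypothetical two-placement document; only the second one has issues.
    example = {
        "fields": ["edge_num", "like_weight_ratio", "post_prob"],
        "placements": [
            {"p": [[0, 0.75, 0.5], [1, 0.25, 0.5]], "n": ["read_a"]},
            {"p": [[2, 0.5, 0.0], [3, 0.25, 0.0]], "n": ["read_b"]},
        ],
    }
    problems = check_placements(example)

A tolerance is used for the "sum equals 1" checks because floating-point sums
rarely equal 1 exactly; the value chosen here is arbitrary.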
::

    usage: check_placements.py example.jplace
..
.. _deduplicate-sequences:
``deduplicate_sequences.py``
============================
``deduplicate_sequences.py`` deduplicates a sequence file and produces a dedup
file suitable for use with ``guppy redup -m``. See the
:ref:`redup ` documentation for details.
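
The idea behind deduplication can be sketched in Python as below. This is not
the script's implementation. In particular, the three-column layout of the
dedup rows (kept name, original name, count) is an assumption made for
illustration, so consult the redup documentation for the format that
``guppy redup -m`` actually expects. The function name ``deduplicate`` is
likewise hypothetical.

.. code-block:: python

    import csv
    import io

    def deduplicate(records):
        """Group (name, sequence) pairs by identical sequence.

        Returns the unique records (the first name seen is kept) and one
        (kept_name, original_name, count) row per input sequence. The row
        layout is an assumed format, not taken from the pplacer documentation.
        """
        groups = {}  # sequence -> names, in first-seen order
        for name, seq in records:
            groups.setdefault(seq, []).append(name)
        unique = [(names[0], seq) for seq, names in groups.items()]
        rows = [(names[0], name, 1)
                for names in groups.values() for name in names]
        return unique, rows

    records = [("a", "ACGT"), ("b", "ACGT"), ("c", "TTGA")]
    unique, rows = deduplicate(records)

    # Write the dedup rows as CSV to an in-memory buffer.
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    dedup_csv = buffer.getvalue()

Here ``a`` and ``b`` share a sequence, so only ``a`` and ``c`` remain in the
deduplicated set, while the rows record that ``b`` was folded into ``a``.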
``pca_for_qiime.py``
====================
``pca_for_qiime.py`` converts the ``trans`` file output by ``guppy pca`` into
the tab-delimited format expected by QIIME's plotting functions.
::

    usage: pca_for_qiime.py [-h] trans tsv
``extract_taxonomy_from_biom.py``
=================================
``extract_taxonomy_from_biom.py`` extracts the taxonomy information from a BIOM
file, producing seqinfo and taxonomy files which can then be placed into a
reference package.
::

    usage: extract_taxonomy_from_biom.py [-h] biom taxtable seqinfo
``hrefpkg_query.py``
====================
``hrefpkg_query.py`` classifies sequences using a hrefpkg. The output is a
sqlite database with the same schema as created by :ref:`rppr prep_db
`.
::

    usage: hrefpkg_query.py [options] hrefpkg query_seqs classification_db

    positional arguments:
      hrefpkg               hrefpkg to classify using
      query_seqs            input query sequences
      classification_db     output sqlite database

    optional arguments:
      -h, --help            show this help message and exit
      -j CORES, --ncores CORES
                            number of cores to tell commands to use
      -r RANK, --classification-rank RANK
                            rank to perform the initial NBC classification at
      --workdir DIR         directory to write intermediate files to (default: a
                            temporary directory)
      --disable-cleanup     don't remove the work directory as the final step
      --use-mpi             run refpkg_align with MPI
      --alignment {align-each,merge-each,none}
                            respectively: align each input sequence; subset an
                            input stockholm alignment and merge each sequence to a
                            reference alignment; only subset an input stockholm
                            alignment (default: align-each)
      --cmscores FILE       in align-each mode, write out a file containing the
                            cmalign scores

    external binaries:
      --pplacer PROG        pplacer binary to call
      --guppy PROG          guppy binary to call
      --rppr PROG           rppr binary to call
      --refpkg-align PROG   refpkg_align binary to call
      --cmalign PROG        cmalign binary to call
..
``multiclass_concat.py``
========================
``multiclass_concat.py`` takes a database which has been classified using
:ref:`guppy classify ` and creates a view
``multiclass_concat``. This view has the same schema as ``multiclass``, with
the addition of an ``id_count`` column. However, instead of getting multiple
rows when a sequence has multiple classifications at a rank, the ``tax_id``
column will be all of the tax_ids concatenated together, delimited by ``,``.
To ensure that it's still easy to join ``multiclass_concat`` to the ``taxa``
table, rows are inserted into the ``taxa`` table for each concatenated tax_id
present in the ``multiclass_concat`` table which have a ``tax_name`` created by
concatenating the names of all the constituent tax_ids.
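
The concatenation described above can be illustrated with SQLite's
``group_concat`` aggregate. The sketch below uses a deliberately simplified,
hypothetical ``multiclass`` table with only three columns and a view named
``multiclass_concat_sketch``. The real schema has more columns, the script's
actual SQL may differ, and treating ``id_count`` as the number of concatenated
tax_ids is an assumption.

.. code-block:: python

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Simplified stand-in for the real multiclass table.
        CREATE TABLE multiclass (name TEXT, want_rank TEXT, tax_id TEXT);
        INSERT INTO multiclass VALUES
            ('read_a', 'species', '1280'),
            ('read_a', 'species', '1282'),
            ('read_b', 'species', '562');

        -- One row per sequence and rank, with all tax_ids joined by commas.
        CREATE VIEW multiclass_concat_sketch AS
            SELECT name,
                   want_rank,
                   group_concat(tax_id, ',') AS tax_id,
                   count(tax_id) AS id_count
              FROM multiclass
             GROUP BY name, want_rank;
    """)

    rows = {
        name: (tax_id, id_count)
        for name, tax_id, id_count in conn.execute(
            "SELECT name, tax_id, id_count FROM multiclass_concat_sketch")
    }

In this sketch ``read_a`` collapses from two rows into one whose ``tax_id``
holds both identifiers. SQLite does not guarantee the order in which
``group_concat`` joins its elements, so code consuming the column should not
depend on it.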
::

    usage: multiclass_concat.py [options] database

    positional arguments:
      database    sqlite database (output of `rppr prep_db` after `guppy
                  classify`)

    optional arguments:
      -h, --help  show this help message and exit
..
``split_qiime.py``
==================
``split_qiime.py`` takes sequences in `QIIME's preprocessed FASTA format`_ and
generates a FASTA file which contains the original sequence names. Optionally,
a specimen map can also be written out which maps from the original sequence
names to their specimens as listed in the QIIME file.
For example, an incoming sequence identified by ``>PC.634_1 FLP3FBN01ELBSX``
will be written out as ``>FLP3FBN01ELBSX`` with an entry in the specimen_map of
``FLP3FBN01ELBSX,PC.634``.
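
The header transformation in this example can be sketched in a few lines of
Python. The function name ``split_qiime_header`` is hypothetical, and the
sketch assumes that the specimen is everything before the last underscore of
the first header field, which matches the example above but is not taken from
the script itself.

.. code-block:: python

    def split_qiime_header(header):
        """Split a QIIME FASTA header into (original_name, specimen)."""
        # The first field is "<specimen>_<counter>", the second the original
        # name; any further fields on the header line are ignored.
        qiime_id, original_name = header.lstrip(">").split()[:2]
        specimen = qiime_id.rsplit("_", 1)[0]
        return original_name, specimen

    name, specimen = split_qiime_header(">PC.634_1 FLP3FBN01ELBSX")
    fasta_header = ">" + name
    specimen_map_line = name + "," + specimen

Splitting on the last underscore rather than the first keeps specimen names
that themselves contain underscores intact.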
::

    usage: split_qiime.py [-h] [qiime] [fasta] [specimen_map]

    Extract the original sequence names from a QIIME FASTA file.

    positional arguments:
      qiime         input QIIME file (default: stdin)
      fasta         output FASTA file (default: stdout)
      specimen_map  if specified, output specimen map (default: don't write)

    optional arguments:
      -h, --help    show this help message and exit
.. _QIIME's preprocessed FASTA format: http://qiime.org/tutorials/tutorial.html#assign-samples-to-multiplex-reads