**Docker installation**

```shell
wget https://github.com/cjneely10/BioMetaPipeline/blob/master/run_pipedm.py
alias pipedm="/path/to/run_pipedm.py"
docker pull cjneely10/biometapipeline:v0.0.2
```

**Source code installation**

Clone or download this repository.
```shell
cd /path/to/BioMetaPipeline
pip3 install -r requirements.txt
python3 setup.py build_ext --inplace
export PYTHONPATH=/path/to/BioMetaPipeline:$PYTHONPATH
alias pipedm="/path/to/BioMetaPipeline/pipedm.py"
```

Adding the export and alias statements to a user's `.bashrc` file will maintain these settings on the next log-in.
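For example, the settings can be appended to `~/.bashrc` (the paths are placeholders for your actual install location):

```shell
# Persist the PYTHONPATH and alias across sessions (placeholder paths)
cat >> ~/.bashrc <<'EOF'
export PYTHONPATH=/path/to/BioMetaPipeline:$PYTHONPATH
alias pipedm="/path/to/BioMetaPipeline/pipedm.py"
EOF
```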
See the wiki page for instructions on installing either the standalone or docker version of BioMetaPipeline. Python dependencies are best maintained within a separate Python virtual environment. BioMetaDB and BioMetaPipeline must be installed and built within the same Python environment. However, BioMetaPipeline data pipelines are managed through config files that allow direct input of the paths to the Python 2/3 environments that house external programs (such as CheckM). The docker version of this script relies only on the database files for these external programs, bypassing the need for the user to download the individual programs.
BioMetaPipeline is a wrapper script for genome/metagenome evaluation tasks. It runs common evaluation and annotation programs and creates a BioMetaDB project with the integrated results. The wrapper was built using the luigi Python package.

Running a pipeline outputs a BibTeX citation file and sample in-text citations for use in academic writing and research. Users are strongly encouraged to cite all items listed in this file.
Each genome pipeline has an associated configuration file that is needed to properly call the underlying programs. Default files are available in the Examples/Config directory. To preserve these defaults for future use, edit only copies of them. Flags and arguments that are typically passed to the individual programs can be provided here.
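The exact layout is defined by the templates themselves; as a rough sketch, an INI-style section per program might look like the following (the section and key names here are hypothetical, not the real template keys):

```ini
; Hypothetical config section: one block per external program
[CHECKM]
PATH = /path/to/checkm-env/bin/checkm
FLAGS = --reduced_tree
```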
Although each data pipeline is designed to run from start to finish, skipping completed steps as they are found, each major step ("pipe") can be rerun if needed. Delete the pipe's output directory and call the given pipeline as listed in the pipedm section. If a BioMetaDB project is provided in the config file or passed as a command-line argument, its contents will also be updated with the new results of the pipeline.
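For instance, a single pipe could be rerun like this (the directory names are illustrative; the actual layout depends on your `-o` output prefix and the pipe being rerun):

```shell
# Delete the completed pipe's output, then invoke the pipeline again
rm -r results/met_eval
pipedm MET_EVAL -d /path/to/genomes -c my_met_eval.ini -o results
```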
BioMetaPipeline outputs a BioMetaDB project containing the completed results. By passing the `-a` flag, users can skip the creation or update of a BioMetaDB project. Each pipeline also writes a final `.tsv` file with the results of each individual pipe.
Multiple pipelines can be run using the same project - the results of each pipeline are stored as a new database table, and re-running a pipeline will update the existing table within the BioMetaDB project.
Some programs in each pipeline can have very high memory requirements (>100 GB) or long completion times, depending on the system used to run the pipeline. Users are advised to run this pipeline with a tool such as `screen` or `nohup`, and to redirect stderr to a separate file.
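For example, a long run could be launched in the background with stderr captured separately (paths and file names are illustrative):

```shell
# Survive log-out and keep the error stream in its own file
nohup pipedm MET_EVAL -d /path/to/genomes -c my_met_eval.ini -o results 2> met_eval.err &
```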
`pipedm.py` is the calling script for the standalone version of BioMetaPipeline. `run_pipedm.py` is the calling script for the docker installation.
```
usage: pipedm.py [-h] -d DIRECTORY -c CONFIG_FILE [-a] [-o OUTPUT_DIRECTORY]
                 [-b BIOMETADB_PROJECT] [-t TYPE_FILE]
                 program

pipedm: Run meta/genome evaluation and annotation pipelines

Available Programs:

MET_ANNOT: Runs gene callers and annotation programs on MAGs
        (Flags: --directory --config_file --cancel_autocommit --output_directory --biometadb_project --type_file)
MET_EVAL: Evaluates completion, contamination, and redundancy of MAGs
        (Flags: --directory --config_file --cancel_autocommit --output_directory --biometadb_project)

positional arguments:
  program               Program to run

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Directory containing genomes
  -c CONFIG_FILE, --config_file CONFIG_FILE
                        Config file
  -a, --cancel_autocommit
                        Cancel commit to database
  -o OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
                        Output directory prefix, default out
  -b BIOMETADB_PROJECT, --biometadb_project BIOMETADB_PROJECT
                        /path/to/BioMetaDB_project (updates values of existing database)
  -t TYPE_FILE, --type_file TYPE_FILE
                        /path/to/type_file formatted as 'file_name.fna\t[Archaea/Bacteria]\t[gram+/gram-]\n'
```

The typical workflow involves creating a configuration file based on the templates in Sample/Config. This config
file is then used to call the given pipeline by passing to each program any flags specified by the user. This setup
allows users to customize the calling programs to better fit their needs, as well as provides a useful documentation
step for researchers.
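A minimal end-to-end invocation might look like the following (the template file name, genome directory, and output prefix are illustrative):

```shell
# Copy a template so the default stays intact, then edit the copy
cp Sample/Config/MET_EVAL.ini my_met_eval.ini
# ... edit my_met_eval.ini: set program paths and any extra flags ...
pipedm MET_EVAL -d /path/to/genomes -c my_met_eval.ini -o results
```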
Both `pipedm.py` and `run_pipedm.py` use the same set of command-line arguments.
- Docker installation
  - Users must edit the `run_pipedm.py` script to provide paths to downloaded data. See the wiki page for instructions.
  - Users must use the config file templates found in `Sample/Config/Docker`.
- Source code installation
  - Users must edit the config file template found in `Sample/Config` to provide paths to working program installations.
- MET_EVAL
  - Evaluate MAGs for completion and contamination using `CheckM`, and evaluate the set of passed genomes for redundancy using `FastANI`. Optionally predict phylogeny using `GTDBtk`. A final BioMetaDB project is generated, or updated, with a table that provides a summary of the results.
- MET_ANNOT
  - Structurally and functionally annotate MAGs using several available annotation programs. Identify peptidase proteins, determine KEGG pathways, run the PROKKA pipeline, predict viral sequences, and run InterProScan. A final BioMetaDB project is generated, or updated, with a table that provides a summary of the results.
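A MET_ANNOT run takes a tab-separated type file via `-t`, matching the format shown in the usage text. The genome names below are hypothetical, and the `pipedm` call is shown commented out since it requires a working installation and config file:

```shell
# Build a tab-separated type file: file name, domain, gram stain
printf 'mag_001.fna\tBacteria\tgram+\n' >  genome_types.list
printf 'mag_002.fna\tArchaea\tgram-\n'  >> genome_types.list

# Then pass it to MET_ANNOT (config path is a placeholder):
# pipedm MET_ANNOT -d /path/to/genomes -c my_met_annot.ini -t genome_types.list -o annot_results
```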
The use of `signalp` and `RNAmmer` requires an additional academic license agreement upon download. Binaries for these programs are therefore not distributed with BioMetaPipeline.