Quick description¶
MetaMobilePicker is a Snakemake pipeline designed to identify mobile genetic elements (MGEs), specifically plasmids, insertion sequences (IS) and phages, as well as antimicrobial resistance (AMR) genes, in metagenomics samples. It runs preprocessing steps, a metagenomics assembly, several MGE identification tools and two AMR database lookups, and combines the output into one YAML file. Additionally, it creates two essential Anvi’o files for visualization purposes.
Quick installation guide¶
The quickest way to install MetaMobilePicker is to create a conda environment and install using mamba:
> conda create --name metamobilepicker python=3.10
> conda activate metamobilepicker
> conda install mamba
> mamba install -c bioconda metamobilepicker
> conda install -c bioconda biopython
Note: Currently, biopython needs to be installed manually. To run the pipeline, you need Singularity version 3.7 or higher installed on your system. If Singularity is not installed, it can be installed in one of the ways described here.
After installation, the fastest way to test the installation is to use the included test data. This dataset consists of 10,000 reads and should run relatively fast. To test the pipeline, run the following command:
> metamobilepicker run --dryrun --test
Running the pipeline¶
Within the repository, the main script (metamobilepicker.py) can be found in the MetaMobilePicker directory. This is the file that runs when using the metamobilepicker command. It is not suitable for running as a standalone script (as of version 0.6)!
Metamobilepicker.py¶
The start script has two callable submodules.
config
run
Config¶
The config module generates a config file (see “config file” for more info). This step needs to be performed before running the pipeline.
Usage: metamobilepicker config [OPTIONS]

Options:
  -s, --samples TEXT     Supply a custom text file with sample names and read
                         locations [default: samples.txt]
  -a, --assembly TEXT    Supply custom assembly for all samples in samples.txt
  -o, --output TEXT      Name of the config file
  -t, --threads INTEGER  Maximum number of threads to use [default: 16]
  -h, --host TEXT        Fasta file with host sequences
  -m, --memory INTEGER   Set the maximum available memory (make sure to have
                         enough for your assembly!) [default: 8]
  -d, --datadir TEXT     Set the path to your data [default: current directory]
  -O, --outdir TEXT      Set the path to your output directory [default: current directory]
  --help                 Show this message and exit
To make use of this feature, first create a samples file. This is a comma-separated (CSV) file with one line per sample, listing the sample name, the path to the forward reads and the path to the reverse reads. The example below is named samples.csv; by default, metamobilepicker expects a file named samples.txt.
testsample,../data/raw/ERR2241639_1_test.fastq.gz,../data/raw/ERR2241639_2_test.fastq.gz
Running
metamobilepicker config --samples samples.csv
will create a config file from the samples file that can be used to run the pipeline.
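For runs with many samples, the samples file can be written by a script instead of by hand. The following is a minimal sketch of the three-column layout described above, using only the Python standard library. The helper names write_samples_file and read_samples_file are hypothetical and are not part of MetaMobilePicker.

```python
import csv
from pathlib import Path


def write_samples_file(samples, path="samples.txt"):
    """Write one 'name,forward_reads,reverse_reads' line per sample."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        for name, fwd, rev in samples:
            writer.writerow([name, fwd, rev])
    return Path(path)


def read_samples_file(path):
    """Read the file back and check that every line has exactly three fields."""
    samples = []
    with open(path, newline="") as handle:
        for line_number, row in enumerate(csv.reader(handle), start=1):
            if not row:
                continue  # skip blank lines
            if len(row) != 3:
                raise ValueError(
                    f"line {line_number}: expected 3 fields, got {len(row)}"
                )
            samples.append(tuple(field.strip() for field in row))
    return samples
```

Reading the file back before calling metamobilepicker config is a cheap way to catch a missing column early.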
Run¶
The run module runs the pipeline.
Usage: metamobilepicker run [OPTIONS]

  Runs the MetaMobilePicker pipeline.

Options:
  -n, --dryrun             Test the script without running the pipeline
  -a, --assembly TEXT      Run MetaMobilePicker with a preexisting assembly
                           file
  -A, --assemblyfile TEXT  Run MetaMobilePicker with a set of preexisting
                           assemblies
  -s, --snakefile TEXT     Change the name of the Snakefile to be used
                           [default: Snakefile]
  -p, --profile TEXT       Use Snakemake profile
  -c, --config TEXT        Specify config file [default: config/config.yaml]
  -u, --unlock             Unlock directory after failed Snakemake attempt
  -t, --test               Run MetaMobilePicker with a small test set
  -C, --cores INTEGER      Specify the number of available cores [default:
                           16]
  --help                   Show this message and exit.
By default, the pipeline uses the basic config file in the config directory. However, we recommend specifying a config file generated with the config submodule; the file format is described below. To use a premade assembly, use the -a parameter. To use a set of assemblies for multiple samples, use the -A parameter and supply a text file listing the path to each assembly on its own line. The pipeline pairs the assemblies with the samples in the order in which they appear in the assembly file.
If the pipeline is interrupted before an error can be displayed, for example by being disconnected from a server, Snakemake will lock the directory. To unlock the directory and try again, use the --unlock or -u flag.
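Because assemblies are matched to samples purely by position, a mismatch in order or count is easy to miss. The sketch below illustrates the order-based pairing described above; it is not the pipeline's own code. The function name pair_samples_with_assemblies and the length check are assumptions added for illustration.

```python
def pair_samples_with_assemblies(sample_names, assembly_lines):
    """Pair each sample with the assembly path on the same line position."""
    # Ignore blank lines and surrounding whitespace in the assembly file.
    paths = [line.strip() for line in assembly_lines if line.strip()]
    if len(paths) != len(sample_names):
        # Hypothetical safeguard: the pipeline's own behaviour on a
        # mismatch is not documented here.
        raise ValueError(
            f"{len(sample_names)} samples but {len(paths)} assemblies"
        )
    return dict(zip(sample_names, paths))
```

Printing the resulting mapping before a long run makes it easy to confirm that each sample is paired with the intended assembly.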
The Config file¶
MetaMobilePicker uses the Snakemake framework to run. Config files are therefore in the YAML format. The MetaMobilePicker config files include the samples on which to run the pipeline, some information on the host contamination database and information on how many cores can be used in the process. An example can be found in the test directory:
datadir: /current/directory/
host: /current/directory/human_genome.fna
max_mem: 8
outdir: /current/directory/
samples:
  test:
    fwd: data/sample_R1.fastq.gz
    rev: data/sample_R2.fastq.gz
threads:
  big: 8
  huge: 16
  medium: 8
  small: 4
All samples are placed within the samples section with their forward (fwd) and reverse (rev) read paths. In the threads section, there are four ‘sizes’ of jobs with the number of CPUs that correspond to them. If the system does not have this many CPUs, Snakemake will use whatever is available. The host section contains the path to a FASTA file with host sequences. The max_mem section contains the maximum amount of memory that can be used (in gigabytes) so that tools like ATLAS know how much they can ask for. This config file can be created manually or generated using the config module of the pipeline.
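When writing the config file manually, it can help to build the structure in code first. The following is a minimal sketch that mirrors the keys of the example above; it does not reproduce the config module. The names build_config, render_yaml and EXAMPLE_THREADS are hypothetical, the thread counts are copied from the example, and the small emitter only handles plain values without YAML quoting.

```python
import csv
import io

# Thread counts copied from the example config; how the config module
# derives them from --threads is not covered here (assumption).
EXAMPLE_THREADS = {"big": 8, "huge": 16, "medium": 8, "small": 4}


def build_config(samples_csv_text, datadir, outdir, host, max_mem=8, threads=None):
    """Build a dict with the same keys as the example config file."""
    samples = {}
    for row in csv.reader(io.StringIO(samples_csv_text)):
        if not row:
            continue  # skip blank lines
        name, fwd, rev = (field.strip() for field in row)
        samples[name] = {"fwd": fwd, "rev": rev}
    return {
        "datadir": datadir,
        "host": host,
        "max_mem": max_mem,
        "outdir": outdir,
        "samples": samples,
        "threads": dict(threads or EXAMPLE_THREADS),
    }


def render_yaml(mapping, indent=0):
    """Render nested dicts of plain values as indented 'key: value' lines."""
    lines = []
    for key, value in mapping.items():
        pad = " " * indent
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(render_yaml(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)
```

For paths or values that contain special characters, a full YAML library is the safer choice than this hand-written emitter.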
Output¶
The output of the pipeline is stored in a per-sample folder in the directory specified in the config file as ‘outdir’.
In the sample results folder, there are several subdirectories. First, there is the ATLAS preprocessing folder. This folder contains the preprocessing output files, most importantly the quality-checked (QC) reads. The next subdirectory is the MetaSPAdES directory, which contains all output files from the metagenomics assembly, including the contigs file with sequences larger than 1000 bp. In the “MGEs” folder, the results of all the MGE identification steps can be found. The final YAML file of the pipeline (called $sample_metamobilepicker.out) and the FASTA file containing all putative MGEs can also be found in this directory. The annotation folder contains the AMR gene annotation results and the gene prediction results. The gene predictions are unused at the moment but could be used for functional annotation. The ANVIO directory contains a contigs database and a profile database that can be used for further analyses using Anvi’o. The mapping directory contains BAM files with the alignment of the reads to the assembly.
Final output file¶
The final output file of MetaMobilePicker is a custom output file that can be parsed like a YAML file. Example:
contig:
  contig_id: NODE_2_length_12828_cov_50.403664
  length: 12828
  number_IS: 1
  number_annotations: 2
  plasmid:
    ID: plasmid
    score: 0.966940194363968
  Annotation_1:
    name: Drugs:Glycopeptides:VanA-type_regulator:VANRA
    start: 235
    stop: 720
    source: megares
    gene: VANRA
    accession: MEG_7452
  Annotation_2:
    name: Drugs:Glycopeptides:VanA-type_regulator:VANRA
    start: 873
    stop: 1783
    source: megares
    gene: VANRA
    accession: MEG_7458
  IS_1:
    ID: IS6_292
    class: IS6
    start: 2107
    stop: 2921
This format allows for hierarchical annotation of contigs with optional fields like ISs and AMR genes.
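A record like the one above can be read into nested dictionaries with a few lines of standard-library Python. This is a minimal sketch, not an official parser: it assumes that nesting is expressed by space indentation and that the text holds a single contig record, and the names parse_record and amr_annotations are hypothetical. A full YAML library is the more robust option for real output files.

```python
def parse_record(text):
    """Parse indented 'key: value' lines into nested dicts (values stay strings)."""
    root = {}
    stack = [(-1, root)]  # (indentation level, mapping at that level)
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        # Split on the first colon only, so values may contain colons.
        key, _, value = raw.strip().partition(":")
        value = value.strip()
        # Close every block that is at the same or a deeper indentation.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if value == "":
            child = {}
            parent[key] = child
            stack.append((indent, child))
        else:
            parent[key] = value
    return root


def amr_annotations(record):
    """Return (gene, start, stop) for every Annotation_N block of a contig."""
    contig = record["contig"]
    return [
        (block["gene"], int(block["start"]), int(block["stop"]))
        for key, block in contig.items()
        if key.startswith("Annotation_")
    ]
```

Because the IS and annotation blocks are optional, code that consumes a record should look the keys up by prefix, as amr_annotations does, rather than expect a fixed set.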
Additionally, a FASTA file is generated containing only the contigs that are predicted as plasmid or phage, or that have an annotated IS with at least 200 bp on either side of the IS. IS with flanking regions shorter than 200 bp are removed to make sure the contig carries information other than the IS itself that can be used for further analysis. The details of the annotations and classifications can be found in the FASTA header.
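The flanking-region rule can be expressed as a small check. This is a sketch under stated assumptions, not the pipeline's implementation: it reads "on either side" as requiring 200 bp on both sides, treats coordinates as 1-based and inclusive, and uses the hypothetical names has_sufficient_flanks and keep_contig.

```python
MIN_FLANK = 200  # minimum number of bases required on each side of an IS


def has_sufficient_flanks(contig_length, is_start, is_stop, min_flank=MIN_FLANK):
    """Check that an IS leaves at least min_flank bases on both sides."""
    left_flank = is_start - 1             # bases before the IS (1-based start)
    right_flank = contig_length - is_stop  # bases after the IS
    return left_flank >= min_flank and right_flank >= min_flank


def keep_contig(contig_length, is_plasmid, is_phage, is_coordinates):
    """Keep plasmid or phage contigs, or contigs with a well-flanked IS."""
    if is_plasmid or is_phage:
        return True
    return any(
        has_sufficient_flanks(contig_length, start, stop)
        for start, stop in is_coordinates
    )
```

For the example record, the IS at 2107 to 2921 on a 12828 bp contig leaves 2106 bp on the left and 9907 bp on the right, so it passes the check.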
Tutorial¶
Check if the installation of MetaMobilePicker was successful using this tutorial.
Testing the Pipeline¶
Technical test¶
After installation, the fastest way to test the installation is to use the included test data. This dataset consists of 5,000 reads and should run relatively fast. To test the pipeline, run the following command:
> metamobilepicker run --test --dryrun
If this doesn’t give any errors, run the pipeline with the following command
> metamobilepicker run --test
If this is the first run of the pipeline, it will create the appropriate conda environments and download the used containers, which can take a while.
Testing a run from scratch¶
To make sure everything is working as intended, you can create a new run using the same test data.
Config files¶
MetaMobilePicker is a Snakemake pipeline that works with YAML config files. The easiest way is to let MetaMobilePicker generate its own config file. Before we can do this, we need to create the samples file. This comma-separated file contains your sample names and the paths to the paired-end reads. Before making the config file, copy the test reads to a location where you can easily find them. In your preferred directory, run the following commands:
> mkdir mmp_data
> cp {PATH TO REPOSITORY}/MetaMobilePicker/test/test_reads_R1.fastq mmp_data
> cp {PATH TO REPOSITORY}/MetaMobilePicker/test/test_reads_R2.fastq mmp_data
> mkdir mmp_test_output # Our output files will go here
Now we can create our samples.txt file to look like this
testsample,mmp_data/test_reads_R1.fastq,mmp_data/test_reads_R2.fastq
Save this file as samples.txt for now. Next, we generate the config file
> metamobilepicker config --samples samples.txt --output test_config.yaml --outdir mmp_test_output
This should give you a file in the current directory called test_config.yaml that contains all the information we need to run MetaMobilePicker.
Running the pipeline using our config file¶
Next, to test the installation of the pipeline, run the following command
> metamobilepicker run -c test_config.yaml --dryrun
If this doesn’t give errors, go ahead and run the same command without the --dryrun flag.
Troubleshooting¶
MetaMobilePicker depends on the (conda) installation of several tools. This can lead to unexpected errors when trying to install all environments on a different system. Here we show some issues we found during testing.
Installing the Anvi’o environment
From the Anvi’o documentation: While setting up your environment to track the development branch, especially on Ubuntu systems (first observed on Ubuntu 20.04 LTS), you may run into issues related to package conflicts that produce error messages like this one:
Encountered problems while solving:
- nothing provides r 3.2.2* needed by r-magrittr-1.5-r3.2.2_0
- nothing provides icu 54.* needed by r-base-3.3.1-1
- package sqlite-3.32.3-h4cf870e_1 requires readline >=8.0,<9.0a0, but none of the providers can be installed
- package samtools-1.9-h8ee4bcc_1 requires ncurses >=6.1,<6.2.0a0, but none of the providers can be installed
These problems can be solved by explicitly setting conda with flexible channel priority setting. Run these commands to change the channel priority setting:
conda config --describe channel_priority
conda config --set channel_priority flexible
And re-run the commands to install conda packages. You can set the priority back to ‘strict’ at any time.
Singularity issues
The installation of Singularity can be the source of some bugs. When you are not the administrator of the system you are trying to install MetaMobilePicker on, first check whether the administrator has installed a system-wide version of Singularity. One way to check this is to ask Singularity for its version, for example with singularity --version.
If this command gives an error, Singularity is not installed. Please try to install it from the Singularity website or from conda.
Please note that the minimal required Singularity version for MetaMobilePicker is v3.7.
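The version requirement can also be checked from a script. The sketch below is a hedged example rather than part of MetaMobilePicker: it assumes that singularity --version prints a line containing a dotted version number (such as "singularity version 3.8.0"), and the names parse_version, meets_minimum and installed_singularity_ok are hypothetical.

```python
import re
import shutil
import subprocess

MIN_VERSION = (3, 7)  # minimal Singularity version required by MetaMobilePicker


def parse_version(output):
    """Extract (major, minor) from the first dotted number in the output."""
    match = re.search(r"(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in: {output!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(output, minimum=MIN_VERSION):
    """Compare the parsed version against the required minimum."""
    return parse_version(output) >= minimum


def installed_singularity_ok():
    """Return True if a singularity executable exists and is recent enough."""
    if shutil.which("singularity") is None:
        return False  # not on PATH, so not installed for this user
    result = subprocess.run(
        ["singularity", "--version"], capture_output=True, text=True, check=True
    )
    return meets_minimum(result.stdout)
```

Installations that report a different version scheme under the singularity command, such as Apptainer, would need their own comparison; this sketch does not cover that case.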
Contributions¶
We welcome contributions in the form of GitLab issues or pull requests, or additions to this documentation.