This section provides instructions for tasks not covered in my workshops. It covers installing some programs, using “helper” scripts to get data into the proper format for further processing, analyzing data in R, downloading SRA files, training RDP and QIIME2 classifiers, and processing 16S and ITS sequences with the QIIME2 DADA2 plugin.
RDP’s web-based tools are also available on GitHub as a set of command line programs named RDPTools. Advantages of the command line versions are that you can avoid repeatedly uploading and downloading data, you can handle larger data sets, and you can write scripts to automate sequence processing and analysis.
Installing Modified PANDAseq
PANDAseq is a program for merging paired Illumina reads. The RDP modified the original PANDAseq to improve assembly accuracy by applying a modified statistical analysis to the sequencer-supplied Q scores to find the most likely region of overlap. These modifications are now largely available in the original PANDAseq through the options -A rdp_mle and -C min_readscore:25, but the modified version provides more options important for running the RDP algorithm.
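As a rough illustration of how the two options named above fit into a call to the original PANDAseq, the following Python sketch assembles a command line for one pair of read files. The -A and -C values are taken from the text; the -f, -r, and -w flags (forward reads, reverse reads, output file) reflect my understanding of PANDAseq's standard options and should be checked against the installed version. The file names are hypothetical.

```python
import shlex


def pandaseq_command(forward_fastq, reverse_fastq, output_fasta, min_readscore=25):
    """Build the argument list for merging one pair of read files.

    Assumption: -f, -r and -w are PANDAseq's flags for forward reads,
    reverse reads and the output file. -A and -C come from the text above.
    """
    return [
        "pandaseq",
        "-f", forward_fastq,
        "-r", reverse_fastq,
        "-A", "rdp_mle",                         # RDP maximum-likelihood assembly
        "-C", f"min_readscore:{min_readscore}",  # read score filter named in the text
        "-w", output_fasta,
    ]


# Hypothetical file names, for illustration only.
cmd = pandaseq_command("sample1_R1.fastq", "sample1_R2.fastq", "sample1_merged.fasta")

# Print a copy-and-paste form of the command; nothing is executed here.
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once PANDAseq is installed. Building the command as a list rather than a single string avoids quoting problems with unusual file names.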
Installing FunGene Pipeline
The FunGene Pipeline is for processing functional gene sequences. It makes use of FrameBot to correct sequencing errors (insertions and deletions) and translate nucleotide sequences into protein sequences. It is available as a web-based tool at this link, but can also be installed locally.
For ecologists, the most common use of Procrustes analysis is to compare ordinations, for example ordinations of the same samples by species data and by environmental data. This tutorial explains how to carry out such an analysis in R with functions in the
Downloading Sequence Files from NCBI’s SRA
Sequences can be quickly downloaded from NCBI’s Sequence Read Archive (SRA) and given meaningful file names using the directions in this tutorial.
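The general idea of downloading runs and giving them meaningful names can be sketched in Python as follows. This is not the tutorial's own procedure. It assumes the SRA Toolkit's fasterq-dump is used with its --split-files and --outdir options, and that paired-end runs are written as <run>_1.fastq and <run>_2.fastq. The run accessions and sample names are invented placeholders.

```python
from pathlib import Path


def fasterq_dump_command(run_accession, out_dir="fastq"):
    """Argument list for downloading one run (assumes the SRA Toolkit is installed)."""
    return ["fasterq-dump", run_accession, "--split-files", "--outdir", out_dir]


def rename_plan(run_to_sample, out_dir="fastq"):
    """Pair each expected download file with a sample-based file name.

    Assumption: paired-end runs are written as <run>_1.fastq and <run>_2.fastq.
    """
    plan = []
    for run, sample in sorted(run_to_sample.items()):
        for read in ("1", "2"):
            src = Path(out_dir) / f"{run}_{read}.fastq"
            dst = Path(out_dir) / f"{sample}_R{read}.fastq"
            plan.append((src, dst))
    return plan


# Placeholder accessions and sample names, for illustration only.
runs = {"SRR0000001": "soil_plot_A", "SRR0000002": "soil_plot_B"}

commands = [fasterq_dump_command(run) for run in sorted(runs)]
plan = rename_plan(runs)

for src, dst in plan:
    print(f"{src} -> {dst}")
```

After the downloads finish, each pair in the plan can be applied with src.rename(dst). Keeping the plan separate from the renaming makes it easy to review the mapping before any file is touched.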
Training the RDP Classifier
The RDP Classifier includes databases for bacterial and archaeal 16S rRNA and fungal LSU gene sequences, and for fungal ITS sequences by both Warcup and UNITE taxonomy. It is possible to update these databases and to create your own databases for other genes and/or taxonomies using the directions in this tutorial.
Conda and Virtual Environments
Solve program installation problems by using conda, a package and virtual environment manager.
Processing 16S Sequences with QIIME2 and DADA2
Instructions for processing 16S sequence data with the DADA2 plug-in for QIIME2 and creating files that are easily imported into R and phyloseq.
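For orientation, the denoising step at the center of this workflow might be assembled as in the Python sketch below. It assumes paired-end data and the denoise-paired action of the QIIME2 DADA2 plug-in. The artifact file names and the truncation lengths (240 and 200) are placeholders chosen for illustration, not recommendations from the tutorial.

```python
def dada2_denoise_paired_command(demux_qza, trunc_len_f, trunc_len_r, prefix="dada2"):
    """Build the argument list for `qiime dada2 denoise-paired`.

    Assumption: the input is a demultiplexed paired-end artifact; the
    output names derived from `prefix` are arbitrary placeholders.
    """
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux_qza,
        "--p-trunc-len-f", str(trunc_len_f),   # forward truncation position
        "--p-trunc-len-r", str(trunc_len_r),   # reverse truncation position
        "--o-table", f"{prefix}-table.qza",
        "--o-representative-sequences", f"{prefix}-rep-seqs.qza",
        "--o-denoising-stats", f"{prefix}-stats.qza",
    ]


# Placeholder values, for illustration only.
denoise_cmd = dada2_denoise_paired_command("demux.qza", 240, 200)
print(" ".join(denoise_cmd))
```

The command would be run inside an activated QIIME2 environment, for example with subprocess.run(denoise_cmd, check=True).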
FIGARO is a program from Zymo Research for finding the optimal truncation parameters when using the DADA2 plug-in for QIIME2.
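One constraint any choice of truncation parameters has to satisfy is that the truncated forward and reverse reads still overlap enough to be merged. The Python sketch below illustrates only that constraint; it is not FIGARO's algorithm, which also takes read quality into account. The amplicon length, read lengths, and step size are hypothetical, and the minimum overlap of 12 bases is an assumed value based on DADA2's documented default for merging pairs.

```python
def merged_overlap(amplicon_len, trunc_len_f, trunc_len_r):
    """Bases shared by the forward and reverse reads after truncation."""
    return trunc_len_f + trunc_len_r - amplicon_len


def viable_truncations(amplicon_len, read_len_f, read_len_r, min_overlap=12, step=10):
    """List (forward, reverse) truncation lengths that keep enough overlap.

    Assumption: min_overlap=12 mirrors DADA2's default; adjust as needed.
    """
    pairs = []
    for f in range(read_len_f, 0, -step):
        for r in range(read_len_r, 0, -step):
            if merged_overlap(amplicon_len, f, r) >= min_overlap:
                pairs.append((f, r))
    return pairs


# Hypothetical 250-base amplicon sequenced with 2 x 150 reads.
pairs = viable_truncations(amplicon_len=250, read_len_f=150, read_len_r=150)
for f, r in pairs:
    print(f, r, merged_overlap(250, f, r))
```

A tool such as FIGARO goes further by ranking the viable combinations, so a sketch like this is mainly useful for checking that a proposed pair of truncation lengths cannot cause merging to fail outright.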
Training the QIIME2 Classifier with UNITE ITS Reference Sequences
Instructions for creating a classifier file to be used by QIIME2 for the classification of fungal ITS sequences.
Processing ITS Sequences with QIIME2 and DADA2
Instructions for processing ITS sequence data with the DADA2 plug-in for QIIME2 and creating files that are easily imported into R and phyloseq.