This section provides instructions for tasks not covered in my workshops. They include how to install some programs, how to use “helper” scripts to get data into the proper format for further processing, examples of data analysis using R, how to download SRA files, how to train the RDP and QIIME2 classifiers, and how to process 16S and ITS sequences with the QIIME2 DADA2 plugin.
Installing RDPTools
RDP’s web-based tools are also available on GitHub as a set of command line programs named RDPTools. Advantages of the command line versions are that you can avoid repeatedly uploading and downloading data, you can handle larger data sets, and you can write scripts to automate sequence processing and analysis.
Installing the Stand-alone RDP Classifier
You can use the RDP Classifier without installing all of the RDPTools; just install the stand-alone version of the classifier from SourceForge.
Add SeqMatch Databases
SeqMatch is part of RDPTools but the installation instructions above do not include adding a SeqMatch database. See this page for how to do so.
Installing Modified PANDAseq
PANDAseq is a program for merging paired Illumina reads. The RDP modified the original PANDAseq to improve assembly accuracy by using the sequencer-supplied Q scores in a modified statistical analysis that finds the most likely region of overlap. These modifications are now largely available in the original PANDAseq through the options -A rdp_mle and -C min_readscore:25, but the modified version provides more options that are important for running the RDP algorithm.
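As a rough illustration of how the two options quoted above might fit into a run of the original PANDAseq, the sketch below assembles a command line in Python. The file names are hypothetical placeholders, and the -f, -r and -w flags (forward reads, reverse reads, output file) are assumed from PANDAseq's standard usage rather than taken from this page, so check them against your installed version before running anything.

```python
import shlex

def pandaseq_command(forward, reverse, output, min_readscore=25):
    """Build an argument list for stock PANDAseq using the RDP-related options."""
    return [
        "pandaseq",
        "-f", forward,                           # forward reads (FASTQ); assumed flag
        "-r", reverse,                           # reverse reads (FASTQ); assumed flag
        "-A", "rdp_mle",                         # RDP-derived algorithm named in the text
        "-C", f"min_readscore:{min_readscore}",  # filter option quoted in the text
        "-w", output,                            # merged sequences; assumed flag
    ]

# Hypothetical file names, for illustration only.
cmd = pandaseq_command("sample_R1.fastq", "sample_R2.fastq", "sample_merged.fasta")
print(shlex.join(cmd))
```

Printing the command instead of executing it makes it easy to inspect before pasting it into a shell or passing the list to subprocess.run.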
Installing FunGene Pipeline
The FunGene Pipeline is for processing functional gene sequences. It makes use of FrameBot to correct sequencing errors (insertions and deletions) and translate nucleotide sequences into protein sequences. It is available as a web-based tool at this link, but can also be installed locally.
Procrustes Analysis
For ecologists, the most common use of Procrustes analysis is to compare ordinations, for example of samples by species and environmental data. This tutorial explains how to carry out such an analysis in R with functions in the vegan package.
Downloading Sequence Files from NCBI’s SRA
Sequences can be quickly downloaded from NCBI’s Sequence Read Archive (SRA) and given meaningful file names using the directions in this tutorial.
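The idea of pairing a download with a rename step can be sketched as follows. This is not the tutorial's own script: it assumes the SRA Toolkit's fasterq-dump is the download tool, that paired reads arrive as ACCESSION_1.fastq and ACCESSION_2.fastq, and that the accession numbers and sample names shown are made-up placeholders.

```python
from pathlib import Path

def download_command(accession, download_dir="sra_fastq"):
    """Argument list for fetching one run with fasterq-dump (assumed tool)."""
    return ["fasterq-dump", accession, "--split-files", "--outdir", download_dir]

def rename_plan(sample_names, download_dir="sra_fastq"):
    """Pair each downloaded file with a sample-based file name.

    sample_names maps an SRA run accession to a meaningful sample name.
    Returns (old_path, new_path) tuples without touching the file system.
    """
    plan = []
    for accession, sample in sorted(sample_names.items()):
        for read in ("1", "2"):
            old = Path(download_dir) / f"{accession}_{read}.fastq"
            new = Path(download_dir) / f"{sample}_R{read}.fastq"
            plan.append((old, new))
    return plan

# Placeholder accessions and sample names, for illustration only.
samples = {"SRR0000001": "soil_A", "SRR0000002": "soil_B"}
for old, new in rename_plan(samples):
    print(f"{old.name} -> {new.name}")
```

Building the plan first and renaming afterwards (for example with old.rename(new)) leaves room to check the mapping before any file is changed.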
Training the RDP Classifier
The RDP Classifier includes databases for bacterial and archaeal 16S rRNA and fungal LSU gene sequences, and for fungal ITS sequences by both Warcup and UNITE taxonomy. It is possible to update these databases and to create your own databases for other genes and/or taxonomies using the directions in this tutorial.
Conda and Virtual Environments
Solve program installation problems by using conda, a package and virtual environment manager.
Processing 16S Sequences with QIIME2 and DADA2
Instructions for processing 16S sequence data with the DADA2 plug-in for QIIME2 and creating files that are easily imported into R and phyloseq.
FIGARO
FIGARO is a program from Zymo Research for finding the optimal truncation parameters when using the DADA2 plug-in for QIIME2.
Training the QIIME2 Classifier with UNITE ITS Reference Sequences
Instructions for creating a classifier file to be used by QIIME2 for the classification of fungal ITS sequences.
Processing ITS Sequences with QIIME2 and DADA2
Instructions for processing ITS sequence data with the DADA2 plug-in for QIIME2 and creating files that are easily imported into R and phyloseq.
Merging DADA2 Results in QIIME2
You can merge the ASV tables from different DADA2 analyses provided the sample sequences were processed identically, i.e., with the same QIIME2 version and parameters. See instructions here.
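The requirement for identical processing follows from how ASVs are matched: two tables share a row only when the ASV sequences are exactly the same, so a different truncation length in one run would yield sequences that never line up with the other run. The conceptual sketch below shows that matching with plain Python dictionaries. It is not QIIME2's implementation; the table layout, the short sequences, the sample IDs, and the choice to reject sample IDs that occur in more than one table are all assumptions made for illustration.

```python
def merge_asv_tables(*tables):
    """Merge ASV tables shaped as {asv_sequence: {sample_id: count}}.

    Rows are matched on the exact ASV sequence. A sample ID that occurs
    in more than one input table is treated as an error (an assumption
    of this sketch).
    """
    seen_samples = set()
    merged = {}
    for table in tables:
        samples = {s for counts in table.values() for s in counts}
        overlap = samples & seen_samples
        if overlap:
            raise ValueError(f"samples present in more than one table: {sorted(overlap)}")
        seen_samples |= samples
        for asv, counts in table.items():
            # Same sequence means same row; new sequences get a new row.
            merged.setdefault(asv, {}).update(counts)
    return merged

# Toy tables with made-up sequences and sample IDs.
run_a = {"ACGT": {"S1": 10, "S2": 3}, "GGCA": {"S1": 5}}
run_b = {"ACGT": {"S3": 7}, "TTAG": {"S3": 2}}
merged = merge_asv_tables(run_a, run_b)
print(merged["ACGT"])
```

In this toy case the ASV "ACGT" ends up with counts from all three samples, while "GGCA" and "TTAG" each remain specific to one run.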
FastQC for Determining Sequence Quality
Instructions for creating a conda environment containing the programs FastQC and MultiQC and using them to generate quality reports for your sequences prior to processing them.
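A minimal sketch of that workflow, written as a Python function that lists the shell commands in order, might look like this. The environment name, file pattern, and report directory are hypothetical, and the conda-forge and bioconda channels are an assumption about where the two programs would be installed from; the linked instructions may differ.

```python
def qc_commands(env_name="qc", reads_glob="*.fastq.gz", report_dir="qc_reports"):
    """Return, in order, the shell commands for a basic FastQC/MultiQC run."""
    return [
        # Create an environment holding both programs (channels are assumed).
        f"conda create -y -n {env_name} -c conda-forge -c bioconda fastqc multiqc",
        f"conda activate {env_name}",
        # FastQC needs the output directory to exist already.
        f"mkdir -p {report_dir}",
        # One quality report per sequence file.
        f"fastqc {reads_glob} -o {report_dir}",
        # Combine the individual reports into one summary.
        f"multiqc {report_dir} -o {report_dir}",
    ]

for command in qc_commands():
    print(command)
```

The commands are printed so that they can be reviewed and then run one at a time in a terminal.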
