This section provides instructions for tasks not covered in my workshops. They include how to install some programs, how to use “helper” scripts to get data into the proper format for further processing, examples of data analysis using R, how to download SRA files, how to train RDP and QIIME2 classifiers, and how to process 16S and ITS sequences with the QIIME2 DADA2 plugin.
RDP’s web-based tools are also available on GitHub as a set of command line programs named RDPTools. Advantages of the command line versions are that you can avoid repeatedly uploading and downloading data, you can handle larger data sets, and you can write scripts to automate sequence processing and analysis.
You can use the RDP Classifier without installing all of the RDPTools – just install the stand-alone version of the classifier from SourceForge.
SeqMatch is part of RDPTools but the installation instructions above do not include adding a SeqMatch database. See this page for how to do so.
PANDAseq is a program for merging paired Illumina reads. The RDP modified the original PANDAseq to improve the accuracy of assembly by performing a modified statistical analysis that uses the sequencer-supplied Q scores to find the most likely region of overlap. These modifications are now largely available in the original PANDAseq by using the options -A rdp_mle and -C min_readscore:25, but the modified version provides more options important for running the RDP algorithm.
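As a rough illustration, a call to the original PANDAseq with the two options above might be assembled like this. The -A and -C values are taken from the text; -f, -r, and -w are PANDAseq's flags for the forward reads, reverse reads, and output file. The file names are placeholders, and the sketch only builds and prints the command rather than running it.

```python
import shlex


def pandaseq_command(forward, reverse, output):
    """Build a PANDAseq command line that uses the RDP-style options."""
    return [
        "pandaseq",
        "-f", forward,             # forward reads (FASTQ)
        "-r", reverse,             # reverse reads (FASTQ)
        "-A", "rdp_mle",           # assembly algorithm named in the text
        "-C", "min_readscore:25",  # filter option named in the text
        "-w", output,              # merged sequences are written here
    ]


# Placeholder file names; substitute your own paired read files.
cmd = pandaseq_command("sample_R1.fastq", "sample_R2.fastq", "sample_merged.fasta")
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to loop over many samples and pass each command to subprocess.run once the file names are real.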
The FunGene Pipeline is for processing functional gene sequences. It makes use of FrameBot to correct sequencing errors (insertions and deletions) and translate nucleotide sequences into protein sequences. It is available as a web-based tool at this link, but can also be installed locally.
For ecologists, the most common use of Procrustes analysis is to compare ordinations, for example of samples by species and environmental data. This tutorial explains how to carry out such an analysis in R with functions in the
Sequences can be quickly downloaded from NCBI’s Sequence Read Archive (SRA) and given meaningful file names using the directions in this tutorial.
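The general idea can be sketched as follows. This is not necessarily the tutorial's method. It assumes paired-end runs downloaded with the SRA Toolkit's fasterq-dump, which writes files ending in _1.fastq and _2.fastq when --split-files is used. The accessions, sample names, and output directory are hypothetical.

```python
from pathlib import Path

# Hypothetical run accessions mapped to the sample names you want to see.
RUNS = {"SRR0000001": "soil_plot1", "SRR0000002": "soil_plot2"}


def download_command(accession, outdir="fastq"):
    """Command for fasterq-dump, writing one file per read direction."""
    return ["fasterq-dump", accession, "--split-files", "--outdir", outdir]


def rename_plan(runs, outdir="fastq"):
    """Pair each expected download with a sample-based file name."""
    plan = []
    for accession, sample in runs.items():
        for read in ("1", "2"):
            old = Path(outdir) / f"{accession}_{read}.fastq"
            new = Path(outdir) / f"{sample}_R{read}.fastq"
            plan.append((old, new))
    return plan


for old, new in rename_plan(RUNS):
    # After the downloads finish, apply each step with old.rename(new).
    print(f"{old} -> {new}")
```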
The RDP Classifier includes databases for bacterial and archaeal 16S rRNA and fungal LSU gene sequences, and for fungal ITS sequences by both Warcup and UNITE taxonomy. It is possible to update these databases and to make up your own databases for other genes and/or taxonomies using the directions in this tutorial.
Solve program installation problems by using conda, a package and virtual environment manager.
Instructions for processing 16S sequence data with the DADA2 plug-in for QIIME2 and creating files that are easily imported into R and phyloseq.
FIGARO is a program from Zymo Research for finding the optimal truncation parameters when using the DADA2 plug-in for QIIME2.
Instructions for creating a classifier file to be used by QIIME2 for the classification of fungal ITS sequences.
Instructions for processing ITS sequence data with the DADA2 plug-in for QIIME2 and creating files that are easily imported into R and phyloseq.
You can merge the ASV tables from different DADA2 analyses provided the sample sequences were processed identically, i.e., with the same QIIME2 version and parameters. See instructions here.
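To show what merging means, here is a small conceptual sketch in plain Python. It is not QIIME2's implementation. Tables are assumed to be dictionaries of the form {asv_id: {sample_id: count}}, and the ASV and sample names are made up. ASVs are matched by identifier across tables, which only works if both analyses produced comparable ASVs, hence the requirement for identical processing.

```python
def merge_asv_tables(table_a, table_b):
    """Merge two ASV tables stored as {asv_id: {sample_id: count}}."""
    samples_a = {s for counts in table_a.values() for s in counts}
    samples_b = {s for counts in table_b.values() for s in counts}

    # This sketch refuses to merge tables that share sample names.
    overlap = samples_a & samples_b
    if overlap:
        raise ValueError(f"samples present in both tables: {sorted(overlap)}")

    # Union of ASVs; an ASV found in both tables gets one combined row.
    merged = {}
    for table in (table_a, table_b):
        for asv, counts in table.items():
            merged.setdefault(asv, {}).update(counts)

    # Fill in zeros where an ASV was not observed in a sample.
    all_samples = sorted(samples_a | samples_b)
    return {
        asv: {s: counts.get(s, 0) for s in all_samples}
        for asv, counts in merged.items()
    }


run1 = {"ASV_1": {"S1": 10, "S2": 0}, "ASV_2": {"S1": 3, "S2": 7}}
run2 = {"ASV_1": {"S3": 5}, "ASV_3": {"S3": 2}}
merged = merge_asv_tables(run1, run2)
print(merged["ASV_1"])  # {'S1': 10, 'S2': 0, 'S3': 5}
```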
Instructions for creating a conda environment containing the programs FastQC and MultiQC and using them to generate quality reports for your sequences prior to processing them.
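The commands involved might look roughly like the ones built below. This sketch assumes FastQC and MultiQC are installed from the bioconda and conda-forge channels. The environment name, directory names, and file names are hypothetical. It only prints the commands; run them in a shell, activating the environment before the last two.

```python
import shlex

ENV_NAME = "qc"  # hypothetical environment name


def create_env_command(env=ENV_NAME):
    """Create a conda environment holding both QC programs."""
    return ["conda", "create", "-n", env,
            "-c", "conda-forge", "-c", "bioconda",
            "fastqc", "multiqc"]


def fastqc_command(fastq_files, outdir="fastqc_reports"):
    """Per-file quality reports; the output directory must already exist."""
    return ["fastqc", "-o", outdir, *fastq_files]


def multiqc_command(report_dir="fastqc_reports", outdir="multiqc_report"):
    """Combine the FastQC reports in report_dir into one summary."""
    return ["multiqc", report_dir, "-o", outdir]


commands = [
    create_env_command(),
    fastqc_command(["sample_R1.fastq", "sample_R2.fastq"]),
    multiqc_command(),
]
for command in commands:
    print(shlex.join(command))
```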