This section provides instructions for tasks not covered in my workshops. Most involve installing programs and using “helper” scripts to convert data into the format needed for further processing. Others involve data analysis in R.

Installing RDPTools

RDP’s web-based tools are also available on GitHub as a set of command line programs named RDPTools. Advantages of the command line versions are that you can avoid repeatedly uploading and downloading data, you can handle larger data sets, and you can write scripts to automate sequence processing and analysis.

Installing Modified PANDAseq

PANDAseq is a program for merging paired Illumina reads. The RDP modified the original PANDAseq to improve assembly accuracy, using a modified statistical analysis of the sequencer-supplied quality (Q) scores to find the most likely region of overlap. These modifications are now largely available in the original PANDAseq through the options -A rdp_mle and -C min_readscore:25, but the modified version provides additional options that are important for running the RDP algorithm.
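To make the idea of using Q scores to pick the overlap concrete, here is a simplified Python sketch. It is not the RDP's rdp_mle algorithm or PANDAseq's code; the function names are hypothetical, and it assumes sequencing errors are spread evenly over the three wrong bases and that the forward read ends inside the reverse-complemented reverse read. For each candidate overlap length it sums the log-odds that the aligned bases came from a true overlap versus random sequence, keeps the best length, and then takes the higher-quality base at each overlapping position.

```python
import math

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def phred_to_error(q):
    """Convert a Phred Q score to an error probability."""
    return 10 ** (-q / 10)


def overlap_log_odds(fwd, fq, rev_rc, rq_rc, k):
    """Log-odds that the last k bases of fwd truly overlap the first k of rev_rc.

    Assumption: errors are uniform over the 3 wrong bases, and unrelated
    sequence matches by chance with probability 0.25.
    """
    score = 0.0
    for b1, q1, b2, q2 in zip(fwd[-k:], fq[-k:], rev_rc[:k], rq_rc[:k]):
        e1, e2 = phred_to_error(q1), phred_to_error(q2)
        # Both reads correct, or both wrong in the same way
        p_match = (1 - e1) * (1 - e2) + e1 * e2 / 3
        if b1 == b2:
            score += math.log(p_match / 0.25)
        else:
            score += math.log((1 - p_match) / 0.75)
    return score


def merge_pair(fwd, fq, rev, rq, min_overlap=10):
    """Merge a read pair; return None if no overlap scores above zero.

    fq and rq are lists of integer Q scores for fwd and rev as sequenced.
    """
    rev_rc = reverse_complement(rev)
    rq_rc = rq[::-1]  # qualities follow the reverse-complemented read
    best_k, best_score = None, 0.0
    for k in range(min_overlap, min(len(fwd), len(rev_rc)) + 1):
        s = overlap_log_odds(fwd, fq, rev_rc, rq_rc, k)
        if s > best_score:
            best_k, best_score = k, s
    if best_k is None:
        return None
    # In the overlap, keep whichever read has the higher Q score
    overlap = []
    for i in range(best_k):
        j = len(fwd) - best_k + i
        overlap.append(fwd[j] if fq[j] >= rq_rc[i] else rev_rc[i])
    return fwd[:len(fwd) - best_k] + "".join(overlap) + rev_rc[best_k:]
```

A low-quality miscall in one read is outvoted by the high-quality base in the other read, which is the practical benefit of weighing overlaps by Q scores rather than by raw mismatch counts.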

Installing FunGene Pipeline

The FunGene Pipeline is for processing functional gene sequences. It uses FrameBot to correct sequencing errors (insertions and deletions) and to translate nucleotide sequences into protein sequences. It is available as a web-based tool but can also be installed locally.

Procrustes Analysis

For ecologists, the most common use of Procrustes analysis is to compare ordinations, for example an ordination of samples based on species composition with one based on environmental data. This tutorial explains how to carry out such an analysis in R with functions in the vegan package.
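The tutorial works in R with vegan (for example procrustes() and protest()). As a hedged illustration of what the symmetric Procrustes statistic measures, the Python sketch below computes m² for two sets of ordination scores for the same samples. It is limited to two ordination axes, uses hypothetical function names, and omits the permutation test that protest() adds. Both configurations are centered and scaled to unit size, then the best rotation (reflection allowed) is found from the singular values of the 2×2 cross-product matrix A = XᵀY, using the identity σ₁ + σ₂ = √(‖A‖²_F + 2|det A|).

```python
import math


def _center(points):
    """Translate (x, y) points so their centroid is at the origin."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return [(x - mx, y - my) for x, y in points]


def _unit_scale(points):
    """Scale centered points so the sum of squared coordinates is 1."""
    norm = math.sqrt(sum(x * x + y * y for x, y in points))
    return [(x / norm, y / norm) for x, y in points]


def procrustes_m2(X, Y):
    """Symmetric Procrustes sum of squares m^2 for two 2-D configurations.

    X and Y are lists of (axis1, axis2) scores for the same samples in the
    same order. 0 means identical shape; 1 means no correspondence.
    """
    if len(X) != len(Y):
        raise ValueError("configurations must have the same number of samples")
    X = _unit_scale(_center(X))
    Y = _unit_scale(_center(Y))
    # Entries of the 2 x 2 cross-product matrix A = X^T Y
    a = sum(x[0] * y[0] for x, y in zip(X, Y))
    b = sum(x[0] * y[1] for x, y in zip(X, Y))
    c = sum(x[1] * y[0] for x, y in zip(X, Y))
    d = sum(x[1] * y[1] for x, y in zip(X, Y))
    # Sum of singular values of a 2 x 2 matrix
    sigma_sum = math.sqrt(a * a + b * b + c * c + d * d + 2 * abs(a * d - b * c))
    return 1.0 - sigma_sum ** 2
```

For example, a unit square compared with the same square after one corner is pulled out to (5, 5) gives m² = 8/17 ≈ 0.47, while any rotated, reflected, rescaled, or shifted copy of a configuration gives m² ≈ 0.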

Training the RDP Classifier

The RDP Classifier includes databases for bacterial and archaeal 16S rRNA and fungal LSU gene sequences, and for fungal ITS sequences with both the Warcup and UNITE taxonomies. You can update these databases, or create your own databases for other genes and/or taxonomies, using the directions in this tutorial.
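As background on what “training” produces, the RDP Classifier is a naive Bayesian classifier built on overlapping 8-base words (Wang et al. 2007): training counts, for each taxon, how many reference sequences contain each word. The Python sketch below is a simplified illustration of that idea, not the RDP training program; the function names are hypothetical, it classifies to a single rank, and it omits the bootstrap confidence estimate and the full taxonomic hierarchy. Word priors follow P(w) = (n(w) + 0.5)/(N + 1) and genus likelihoods P(w | G) = (m(w) + P(w))/(M + 1), where N is the number of training sequences, n(w) how many contain word w, and m(w) and M the corresponding counts within genus G.

```python
import math
from collections import Counter

K = 8  # word size used by the RDP Classifier


def words(seq, k=K):
    """Set of overlapping k-mers, each counted once per sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training_set, k=K):
    """training_set maps a genus name to a list of reference sequences."""
    seq_words = {g: [words(s, k) for s in seqs] for g, seqs in training_set.items()}
    n_seqs = sum(len(ws) for ws in seq_words.values())  # N
    # n(w): number of training sequences (any genus) containing word w
    overall = Counter(w for ws in seq_words.values() for s in ws for w in s)
    priors = {w: (c + 0.5) / (n_seqs + 1) for w, c in overall.items()}
    model = {}
    for g, ws in seq_words.items():
        counts = Counter(w for s in ws for w in s)  # m(w) within genus
        model[g] = (counts, len(ws))  # (word counts, M)
    return priors, n_seqs, model


def classify(seq, trained, k=K):
    """Return the genus with the highest summed log-probability."""
    priors, n_seqs, model = trained
    query = words(seq, k)
    best, best_score = None, -math.inf
    for g, (counts, m_seqs) in model.items():
        score = 0.0
        for w in query:
            p_w = priors.get(w, 0.5 / (n_seqs + 1))  # prior for unseen words
            score += math.log((counts.get(w, 0) + p_w) / (m_seqs + 1))
        if score > best_score:
            best, best_score = g, score
    return best
```

Adding or relabeling reference sequences changes these word counts, which is why a custom gene or taxonomy requires retraining rather than simply editing the output labels.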

Downloading Sequence Files from NCBI’s SRA

Sequences can be quickly downloaded from NCBI’s Sequence Read Archive (SRA) and given meaningful file names using the directions in this tutorial.
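One way to combine downloading and renaming is sketched below in Python. It is an assumption-laden illustration, not the tutorial's own procedure: it assumes the SRA Toolkit's fasterq-dump is installed and on the PATH, that the runs are paired-end (so fasterq-dump --split-files writes <run>_1.fastq and <run>_2.fastq), and that you supply a run-to-sample table yourself. The function names, the sra_reads directory, and the _R1/_R2 naming scheme are hypothetical.

```python
import subprocess
from pathlib import Path


def download_commands(accessions, outdir):
    """Build one fasterq-dump command per SRA run accession."""
    return [["fasterq-dump", "--split-files", "-O", str(outdir), acc]
            for acc in accessions]


def rename_plan(run_to_sample, outdir):
    """Map <run>_1/_2.fastq outputs to <sample>_R1/_R2.fastq (paired-end assumed)."""
    outdir = Path(outdir)
    plan = {}
    for run, sample in run_to_sample.items():
        for mate in (1, 2):
            plan[outdir / f"{run}_{mate}.fastq"] = outdir / f"{sample}_R{mate}.fastq"
    return plan


def fetch_and_rename(run_to_sample, outdir="sra_reads"):
    """Download each run, then rename its files after the sample."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    for cmd in download_commands(run_to_sample, outdir):
        subprocess.run(cmd, check=True)  # stop on the first failed download
    for old, new in rename_plan(run_to_sample, outdir).items():
        if old.exists():  # single-end runs will not produce _2 files
            old.rename(new)
```

Keeping command construction and renaming separate from the download step makes it easy to check the planned file names before fetching anything.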

Conda and Virtual Environments

Solve program installation problems by using conda, a package and virtual environment manager.