Import QIIME2 Results into R

After processing amplicon sequencing data with QIIME2, I like to import QIIME2 results into R as an experiment level phyloseq object. Such phyloseq objects have slots for an OTU or ASV table, a taxonomy table, representative sequences, a tree file of the representative sequences and a sample data table. Thus, all of the components can be saved together in a single file. The QIIME2 scripts on this website include outputting QIIME2 artifacts into files that are easily imported into phyloseq. Below is an R script for importing all of the components into an experiment level phyloseq object.

If necessary, install the phyloseq and Biostrings packages from Bioconductor. The speedytax package is installed from CRAN.

library(phyloseq)
library(Biostrings)
library(speedytax)

# Read in the otu/asv table
asv <- read.table("otu_table.tsv", header = TRUE, row.names = 1, sep = "\t")
# Convert the otu/asv table into a phyloseq object.
asv <- phyloseq::otu_table(asv, taxa_are_rows = TRUE)
# Read in the sample data table; row names must match otu/asv table sample names.
sam <- read.table(file = "sample_data.tsv", header = TRUE, row.names = 1, sep = "\t")
# Convert the sample data into a phyloseq object
sam <- phyloseq::sample_data(sam)
# Read in the fasta file of representative sequences.
rep_seqs <- Biostrings::readDNAStringSet("dna-sequences.fasta", format = "fasta")
# Read in the classification table. In this example the representative sequences
# were classified with the RDP Classifier. speedytax includes other functions for
# importing QIIME2 and USEARCH sintax classification files.
rdp_taxa <- speedytax::import_rdp_tax_table("rdp_214_classified.txt", confidence = 0.5)
# Read in the tree file.
my_tree <- phyloseq::read_tree("rooted_tree.nwk")
# Package all into a single experiment level phyloseq object.
expt <- phyloseq::phyloseq(asv, sam, rep_seqs, rdp_taxa, my_tree)
expt

An often performed  subsequent step is to filter the results taxonomically, for example  removing taxa/representative sequences that are not classified or that may be chloroplast sequences. In the example above, the representative sequences were classified with the RDP Classifier, in which case I filter the results with the following R script.

# Copy expt to expt_filt
expt_filt <- expt
# Filter taxonomically.
# Get the rank names.
rank_names(expt_filt)
# Check classification at the domain level.
get_taxa_unique(expt_filt, taxonomic.rank = "Domain")
# Keep only taxa/sequences classified as Bacteria or Archaea.
expt_filt <- subset_taxa(expt_filt, Domain %in% c("Bacteria", "Archaea"))
# Re-check classification at the Domain level.
get_taxa_unique(expt_filt, taxonomic.rank = "Domain")
# Check classification at the phylum level
get_taxa_unique(expt_filt, taxonomic.rank = "Phylum")
# Remove taxa/sequences not classified to the phylum level
expt_filt <- subset_taxa(expt_filt, Phylum != "uncl_Bacteria")
expt_filt <- subset_taxa(expt_filt, Phylum != "uncl_Archaea")
# Re-check classification at the phylum level
get_taxa_unique(expt_filt, taxonomic.rank = "Phylum")
# Subset to taxa/sequences classified to Cyanobacteria.
cyan <- subset_taxa(expt_filt, Phylum == "Cyanobacteriota")
# Check classification of Cyanobacteriota at the class level
get_taxa_unique(cyan, taxonomic.rank = "Class")
# Remove taxa/sequences classified to chloroplasts 
# or potentially to chloroplasts.
expt_filt <- subset_taxa(expt_filt, Class != "Chloroplast")
expt_filt <- subset_taxa(expt_filt, Class != "uncl_Cyanobacteriota")
# Check tht potential chloroplast taxa/sequences have been removed.
cyan <- subset_taxa(expt_filt, Phylum == "Cyanobacteriota")
get_taxa_unique(cyan, taxonomic.rank = "Class")
# Print summary of the experiment level phyloseq object
expt_filt
# Save the taxonomically filtered experiment level object.
saveRDS(expt_filt, file = "expt_filt.rds")

A similar procedure can be applied when another classifier is used to classify the representative sequences. For example, potential chloroplast sequences will not be found if chloroplast sequences are not part of the database used to build the classifier.

RDP Classifier Updated

The RDP Classifier was updated to version 2.13 and released 30 July 2020 on SourceForge (https://sourceforge.net/projects/rdp-classifier/). The update is based on  bacterial and archaeal training set No.18 with over 800 new genera and significant rearrangements of several phyla and genera based on the latest genome analyses.  For detailed explanations of these revisions, please see the release notes.

As in earlier editions, databases are also included for classification of  fungal ITS sequences by UNITE and Warcup taxonomies and for  classification of fungal 28S rRNA gene sequences.

Web Version

The web version of the classifier has been updated to use training set No. 18. The taxonomy browser has also been updated to comply with the new training set.

Installing as a stand-alone application

Written in Java, the RDP Classifier may be installed and run in Windows, Mac OS, Linux, Unix and Solaris environments.

  1. Test if the Java Runtime Environment (JRE) is already installed by opening a terminal and entering:
 java -version
  1. If necessary, install JRE  downloaded from here. While Oracle now charges for the development version of Java (JDK), JRE is still free.
  2. Download RDP Classifier version 2.13 from here and extract the compressed file.

The classifier.jar file will be in sub-directory dist.

Installing as part of RDPTools

The only way to update the classifier in RDPTools is to remove and re-install RDPTools following the instructions here. Updating RDPTools will not work because the new databases are not part of the code but are downloaded during RDPTools installation.

Updating the QIIME2 version of the RDP Classifier

Reference sequences and corresponding taxonomy file for re-training the RDP Classifier included in QIIME2 can be downloaded by clicking here. See this page for how to re-train the classifier.