ASV Tables Created in R
ASV tables created with the Bioconductor/R version of DADA2 are matrices with samples as rows and taxa (ASVs) as columns. The taxa names (the column names) are the sequences themselves. Because these matrices can be quite large, they are most conveniently saved as compressed .rds files (saveRDS() compresses with gzip by default). Read such a file into R and create an experiment-level phyloseq object containing an OTU (or ASV) table and representative sequences with the following R script:
# Load libraries:
library(phyloseq)
library(Biostrings)
library(RDPutils)
# Read in the ASV file:
otu <- readRDS("seqtab_collapsed_nochim.rds")
# Get the representative sequences:
rep.seqs <- colnames(otu)
rep.seqs <- Biostrings::DNAStringSet(rep.seqs)
# Generate taxa names and assign them to the representative
# sequences and the ASV table taxa names (i.e. column names):
otu.names <- RDPutils::make_otu_names(1:length(rep.seqs))
colnames(otu) <- otu.names
names(rep.seqs) <- otu.names
# Import into phyloseq (DADA2 tables have samples as rows, so taxa are not rows):
otu.table <- otu_table(otu, taxa_are_rows = FALSE)
expt <- phyloseq(otu.table, rep.seqs)
expt
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 19891 taxa and 75 samples ]
## refseq() DNAStringSet: [ 19891 reference sequences ]
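If RDPutils is not available, the renaming step can be sketched in base R. The toy example below uses hypothetical counts and sequences that mimic the structure of a DADA2 sequence table (samples as rows, sequences as column names). The "ASV" plus zero-padded index naming scheme is an assumption for illustration and may not match the names produced by make_otu_names().

```r
# Toy DADA2-style sequence table (hypothetical data): samples are rows and
# the column names are the ASV sequences themselves.
seqtab <- matrix(c(12, 0, 7,
                   4, 9, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("S1", "S2"),
                                 c("ACGTACGTAA", "ACGTTTGTCC", "TTGTACGAGG")))

# Save the sequences before the column names are overwritten.
seqs <- colnames(seqtab)

# Hypothetical naming scheme: "ASV" plus an index zero-padded to the width
# of the largest index, so names sort correctly (ASV01 ... ASV10, etc.).
pad <- nchar(length(seqs))
asv.names <- paste0("ASV", formatC(seq_along(seqs), width = pad, flag = "0"))

# Apply the same labels to the table columns and to the sequences.
colnames(seqtab) <- asv.names
names(seqs) <- asv.names

# A lookup table makes it easy to trace a label back to its sequence.
lookup <- data.frame(asv = asv.names, sequence = unname(seqs))
lookup
```

Keeping the lookup table (or the named sequences) is what lets the short labels be matched back to the exported FASTA records later.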
The representative sequences can then be exported to a FASTA file, classified by your preferred method, used to build a phylogenetic tree if appropriate, and the results read back into R and combined with the phyloseq object. Export the representative sequences with the R code:
Biostrings::writeXStringSet(rep.seqs, filepath = "rep_seqs.fasta", format = "fasta")
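Once the sequences have been classified, the assignments can be added to the phyloseq object as a tax_table. The sketch below builds a small toy object and a hypothetical two-rank taxonomy; in practice the taxonomy matrix would be parsed from whatever file your classifier writes, and its row names must match taxa_names() of the phyloseq object.

```r
library(phyloseq)
library(Biostrings)

# Toy stand-ins for the objects built above (hypothetical data).
otu <- matrix(c(10, 0, 5,
                3, 8, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("S1", "S2"), c("ASV1", "ASV2", "ASV3")))
rep.seqs <- DNAStringSet(c(ASV1 = "ACGTACGT", ASV2 = "ACGTTTGT", ASV3 = "TTGTACGA"))
expt <- phyloseq(otu_table(otu, taxa_are_rows = FALSE), rep.seqs)

# Hypothetical classifier output: one row per ASV, one column per rank.
tax <- data.frame(
  Kingdom = c("Bacteria", "Bacteria", "Archaea"),
  Phylum  = c("Firmicutes", "Proteobacteria", "Euryarchaeota"),
  row.names = c("ASV1", "ASV2", "ASV3")
)

# tax_table() expects a character matrix whose row names are the taxa names.
expt <- merge_phyloseq(expt, tax_table(as.matrix(tax)))
expt
```

A tree built from the same FASTA file can be added the same way, as long as its tip labels are the same taxa names.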
ASV Tables Created in QIIME2
QIIME2 saves its objects, termed “artifacts,” as .qza files. These are actually zip archives that contain the data along with some extra information about the object (such as its provenance). You can extract the OTU (or ASV) table simply by unzipping the table artifact, or you can use QIIME2 commands to export it and then convert it to text. If you use the dada2 plug-in, the taxa names in the ASV table are hashes of the sequences rather than the sequences themselves. Therefore, if you want to include representative sequences in your phyloseq object, you will have to extract or export them separately. Here are the QIIME2 commands I use to put the required files in the sub-directory phyloseq:
# Export OTU table:
mkdir phyloseq
qiime tools export \
--input-path table.qza \
--output-path phyloseq
# Convert biom format to tab-separated text format:
biom convert \
-i phyloseq/feature-table.biom \
-o phyloseq/otu_table.tsv \
--to-tsv
# Modify otu_table.tsv to make it easier to read into R.
# Use sed to delete the first (comment) line and to remove "#OTU ID"
# from the header line.
cd phyloseq
sed -i '1d' otu_table.tsv
sed -i 's/#OTU ID//' otu_table.tsv
cd ../
# Export representative sequences:
qiime tools export \
--input-path rep-seqs.qza \
--output-path phyloseq
The text files are then readily read into R (here with the working directory set to phyloseq) and combined into a phyloseq object. Just remember that in this case taxa are the rows of the OTU table:
# Load libraries:
library(phyloseq)
library(Biostrings)
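# Read the OTU table; check.names = FALSE keeps sample IDs such as
# "sample-1" from being changed to "sample.1":
otu <- read.table("otu_table.tsv", row.names = 1, header = TRUE, sep = "\t", check.names = FALSE)
otu.table <- phyloseq::otu_table(otu, taxa_are_rows = TRUE)
# Read the representative sequences exported by QIIME2:
rep.seqs <- Biostrings::readDNAStringSet("dna-sequences.fasta", format = "fasta")
expt <- phyloseq::phyloseq(otu.table, rep.seqs)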
expt
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 19887 taxa and 75 samples ]
## refseq() DNAStringSet: [ 19887 reference sequences ]
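As an alternative to editing the file with sed, read.table() can read the unmodified biom convert output directly by skipping the first line and turning off comment handling. The sketch below writes a small toy file that mimics that layout (the feature IDs, sample IDs, and counts are hypothetical) and reads it back.

```r
# Toy file mimicking `biom convert --to-tsv` output (hypothetical data):
# a comment line, then a header line that starts with "#OTU ID".
tsv <- tempfile(fileext = ".tsv")
writeLines(c("# Constructed from biom file",
             "#OTU ID\tsample-1\tsample-2",
             "a1b2c3\t10.0\t0.0",
             "d4e5f6\t3.0\t8.0"), tsv)

# skip = 1 drops the first line; comment.char = "" stops read.table from
# treating the "#OTU ID" header as a comment; check.names = FALSE keeps
# the sample IDs exactly as written.
otu <- read.table(tsv, sep = "\t", header = TRUE, skip = 1,
                  comment.char = "", row.names = 1, check.names = FALSE)
otu
```

Reading the file this way leaves it untouched and avoids `sed -i`, whose syntax differs between GNU sed and the BSD sed shipped with macOS.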