Import DADA2 ASV Tables into phyloseq

ASV Tables Created in R

ASV tables created with the Bioconductor/R version of DADA2 are matrices with samples as rows and taxa as columns, and the taxa names are the sequences themselves. Because these matrices can be quite large, they are most conveniently saved as compressed RDS files. The following R script reads such a file into R and creates an experiment-level phyloseq object containing an OTU (or ASV) table and the representative sequences:

# Load libraries (Biostrings and RDPutils are called with :: and only need to be installed):
library(phyloseq)

# Read in the ASV file:
otu <- readRDS("seqtab_collapsed_nochim.rds")

# Get the representative sequences:
rep.seqs <- colnames(otu)
rep.seqs <- Biostrings::DNAStringSet(rep.seqs)

# Generate taxa names and assign them to the representative
# sequences and the ASV table taxa names (i.e. column names):
otu.names <- RDPutils::make_otu_names(seq_along(rep.seqs))
colnames(otu) <- otu.names
names(rep.seqs) <- otu.names

# Import into phyloseq:
otu.table <- otu_table(otu, taxa_are_rows = FALSE)
expt <- phyloseq(otu.table, rep.seqs)

# Print a summary of the object:
expt

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 19891 taxa and 75 samples ]
refseq()      DNAStringSet:      [ 19891 reference sequences ]

The representative sequences can then be exported to a fasta file, classified with your preferred method, used to build a phylogenetic tree if appropriate, and the results read back into R and combined with the phyloseq object. Export the representative sequences with the R code:

Biostrings::writeXStringSet(rep.seqs, file = "rep_seqs.fasta", format = "fasta")
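As a quick check that the export worked, the number of records in the fasta file should equal the number of taxa in the phyloseq object (19891 in the example above). A minimal sketch, assuming standard fasta in which every record starts with a header line beginning with `>`; the function name `count_fasta_records` is hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the records in a fasta file by counting
# header lines (lines that begin with ">").
count_fasta_records() {
  grep -c '^>' "$1"
}

# Usage (assumes rep_seqs.fasta is in the current directory):
# count_fasta_records rep_seqs.fasta
```

Counting header lines rather than all lines matters here because writeXStringSet wraps long sequences over several lines by default, so one record can span more than two lines.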

ASV Tables Created in QIIME2

QIIME 2 saves its objects, termed “artifacts,” as .qza files. These are ordinary zip archives that contain the data file itself along with metadata and provenance information about the object. You can extract the OTU (or ASV) table simply by unzipping the table artifact, or you can use QIIME2 commands to export it. If you use the dada2 plug-in, the taxa names in the ASV table are MD5 hashes of the sequences rather than the sequences themselves. Therefore, if you want to include representative sequences in your phyloseq object, you will have to extract or export them separately. Here are the QIIME2 commands I use to put the required files in the sub-directory phyloseq:

# Export OTU table:
mkdir phyloseq
qiime tools export \
--input-path table.qza \
--output-path phyloseq

# Convert biom format to tab-separated text format:
biom convert \
-i phyloseq/feature-table.biom \
-o phyloseq/otu_table.tsv \
--to-tsv

# Modify otu_table.tsv to make it easier to read into R.
# Use sed to delete the first line and "#OTU ID" from the
# second line.
cd phyloseq
sed -i '1d' otu_table.tsv
sed -i 's/#OTU ID//' otu_table.tsv
cd ../

# Export representative sequences:
qiime tools export \
--input-path rep-seqs.qza \
--output-path phyloseq
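To see what the two sed edits do, here is a small self-contained demonstration on a made-up file laid out the way `biom convert --to-tsv` output is assumed to look (biom 2.x): a comment line `# Constructed from biom file`, then a header line beginning with `#OTU ID`. The sample and feature names are invented, and the helper `clean_biom_tsv` is hypothetical; it applies both edits in a single sed call and anchors the pattern to the start of the line. Like the commands above, it relies on GNU sed's `-i`; BSD/macOS sed needs `-i ''` instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: delete line 1 and strip a leading "#OTU ID" from the
# header line, editing the file in place (GNU sed syntax).
clean_biom_tsv() {
  sed -i -e '1d' -e 's/^#OTU ID//' "$1"
}

# Build a small made-up table in the assumed biom convert --to-tsv layout.
demo=$(mktemp)
printf '# Constructed from biom file\n#OTU ID\tS1\tS2\nASV1\t10.0\t0.0\nASV2\t3.0\t7.0\n' > "$demo"

clean_biom_tsv "$demo"
cat "$demo"
rm -f "$demo"
```

After cleaning, the header line starts with a tab, so its first field is empty; read.table with header = TRUE and row.names = 1 pairs that empty field with the column of feature IDs.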

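The exported files are then easily read into R and combined into a phyloseq object. Just remember that in this case taxa are rows of the OTU table. The paths below assume R's working directory is the one that contains the phyloseq sub-directory:

# Read the OTU table (check.names = FALSE keeps sample names as written):
otu <- read.table("phyloseq/otu_table.tsv", row.names = 1, header = TRUE,
                  sep = "\t", check.names = FALSE)
otu.table <- phyloseq::otu_table(otu, taxa_are_rows = TRUE)
# Read the representative sequences and combine them with the table:
rep.seqs <- Biostrings::readDNAStringSet("phyloseq/dna-sequences.fasta", format = "fasta")
expt <- phyloseq::phyloseq(otu.table, rep.seqs)
expt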

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 19887 taxa and 75 samples ]
refseq()      DNAStringSet:      [ 19887 reference sequences ]
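Because phyloseq relates the OTU table to the reference sequences through the taxa names, it can be worth confirming before the import that otu_table.tsv and dna-sequences.fasta list the same feature IDs. A minimal sketch, assuming the cleaned table from the sed step (one header line, feature IDs in the first column) and fasta headers whose first word is the feature ID; the function name `compare_feature_ids` is hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print feature IDs found in only one of the two files.
# $1 = cleaned OTU table (header on line 1, feature IDs in column 1)
# $2 = fasta file whose header lines start with ">ID"
compare_feature_ids() {
  local table_ids fasta_ids
  table_ids=$(mktemp)
  fasta_ids=$(mktemp)
  tail -n +2 "$1" | cut -f 1 | sort > "$table_ids"
  grep '^>' "$2" | sed 's/^>//' | cut -d ' ' -f 1 | sort > "$fasta_ids"
  # comm -3 drops IDs common to both sorted lists; empty output means a match.
  comm -3 "$table_ids" "$fasta_ids"
  rm -f "$table_ids" "$fasta_ids"
}

# Usage, from the directory that contains the phyloseq sub-directory:
# compare_feature_ids phyloseq/otu_table.tsv phyloseq/dna-sequences.fasta
```

IDs printed flush left occur only in the table; IDs indented by a tab occur only in the fasta file.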


Get Execution Time for a Shell Script

If you need to know the execution time of a bash script, place your commands inside the code below. The total run time, formatted as days:hours:minutes:seconds, will be printed to the screen after the script finishes.

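# Record the start time as seconds since the epoch, with nanoseconds
# (%N requires GNU date):
res1=$(date +%s.%N)

<your script here>

# Record the end time and compute the elapsed seconds with bc:
res2=$(date +%s.%N)
dt=$(echo "$res2 - $res1" | bc)
# Split into days, hours, minutes, and seconds (bc's default scale of 0
# makes each division an integer division):
dd=$(echo "$dt/86400" | bc)
dt2=$(echo "$dt-86400*$dd" | bc)
dh=$(echo "$dt2/3600" | bc)
dt3=$(echo "$dt2-3600*$dh" | bc)
dm=$(echo "$dt3/60" | bc)
ds=$(echo "$dt3-60*$dm" | bc)
printf "Total runtime: %d:%02d:%02d:%07.4f\n" "$dd" "$dh" "$dm" "$ds"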

Shamelessly copied from a post by jwchew on August 30, 2013.
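If whole-second resolution is enough, bash's built-in `SECONDS` variable, which counts the seconds since the shell started (or since it was last assigned), gives the same days:hours:minutes:seconds breakdown using shell integer arithmetic, with no calls to `date` or `bc`. A minimal sketch; the helper name `format_duration` is hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical helper: format a whole number of seconds as D:HH:MM:SS.
format_duration() {
  local total=$1
  local d=$(( total / 86400 ))
  local h=$(( total % 86400 / 3600 ))
  local m=$(( total % 3600 / 60 ))
  local s=$(( total % 60 ))
  printf '%d:%02d:%02d:%02d\n' "$d" "$h" "$m" "$s"
}

SECONDS=0   # reset bash's built-in elapsed-seconds counter

# <your script here>

echo "Total runtime: $(format_duration "$SECONDS")"
```

The trade-off is resolution: this version reports whole seconds only, while the bc version above keeps fractions of a second.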

Commenting Code in gedit

gedit is the default text editor included in Ubuntu and other Linux distributions. Its functionality can be extended with plugins, as explained in the post linked below. I installed the Code Comment plugin because it lets you comment or un-comment selected lines of text. I find this useful when I want to include two configuration blocks in a script, say one for a local installation of a program and another for a remote installation on a cluster. If you do this, just make sure the right block is un-commented and the other is commented out before you run the script.
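For example, a script might carry one settings block for a local installation and one for a cluster, with the unused block commented out. A minimal sketch; the variable names, paths, and thread counts below are made up for illustration:

```shell
#!/usr/bin/env bash

# --- Local installation (active) ---
TOOL_DIR="$HOME/tools/mytool"   # hypothetical path
THREADS=4                       # hypothetical value

# --- Cluster installation (commented out) ---
# TOOL_DIR="/opt/apps/mytool"   # hypothetical path
# THREADS=32                    # hypothetical value

echo "Using $TOOL_DIR with $THREADS threads"
```

To switch to the cluster settings, select both blocks and toggle their comments so that only the cluster lines are active.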

Source: Code Comment – gedit Plugin | Delightly Linux