Web-Based Supervised Approach

Introduction

RDP’s Classifier is also a multi-classifier: it can classify sequences from several samples at the same time. The Supervised Approach uses this capability to classify sorted and trimmed sequences for several samples. Two of the output files can be imported directly into experiment-level phyloseq objects, each with an otu_table and a tax_table, using the hier2phyloseq function of the RDPutils package.

Objectives

To learn how to use RDP’s web-based Classifier tool to do the following:

  • Classify multiple samples
  • Import the result into phyloseq

Web-Based Multiclassifier – 16S

Download the file classifier_input.zip from here to a directory of your choice. The
file contains four fastq files:

  • Native_1_2_A_trimmed.fastq
  • Native_1_4_A_trimmed.fastq
  • Native_1_7_A_trimmed.fastq
  • Native_2_7_A_trimmed.fastq
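Before uploading, you can confirm from R that the archive holds the expected fastq files. This is a minimal sketch using base R only; it assumes classifier_input.zip is in the current working directory, and the helper name fastq_entries is made up for illustration.

```r
# Assumption: the downloaded archive sits in the current working directory.
zip_path <- "classifier_input.zip"

# Hypothetical helper: keep only the entries whose names end in ".fastq".
fastq_entries <- function(entry_names) {
  grep("\\.fastq$", entry_names, value = TRUE)
}

if (file.exists(zip_path)) {
  # list = TRUE returns a table of the archive contents without extracting.
  contents <- unzip(zip_path, list = TRUE)
  print(fastq_entries(contents$Name))
}
```

There is no need to extract the files for the steps that follow, since the archive itself is uploaded.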

Go to RDP’s home page at http://rdp.cme.msu.edu/ and do the following:

  • Click on the RDPipeline tile.
  • Click on CLASSIFIER under Data Processing Steps.
  • Log into your account if necessary.

Fill in the form with the following:

  • A job name so you may easily identify the result later, e.g. test_classifier.
  • In the Select a gene field, select Bacterial 16S from the pulldown menu.
  • In the Select a format field, select fixrank from the pulldown menu.
  • Leave the Treat all input files as one sample box unchecked.
  • In the Confidence Cutoff field, enter 50. This is the recommended value for short
    16S rRNA gene sequences.
  • Select your file for upload. You can do this more than once if necessary. Compressed
    files containing sequences for several samples, as in this example, are accepted.
    The names of the submitted files will appear in the lower left-hand corner of the
    screen.
  • Click on Submit For Classification.

Your job will be submitted to the queue. You may check its status by clicking on my
jobs near the top of the screen. When the job is complete, you may download the
results as a compressed file in either zip or tgz format. Save the result to the directory of your choice and decompress it.

You will get several files:

  • bootstrap_conf.txt
  • classifications.txt
  • cnadjusted_hierarchy.txt
  • failed_sequences.txt
  • hierarchy.txt
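If you prefer to decompress the download from within R, the step can be sketched as follows. The archive name classifier_result.zip and both helper names are hypothetical, and the check assumes the result files land directly in the extraction directory rather than in a subfolder.

```r
# Hypothetical archive name; use the name of the file you downloaded.
archive <- "classifier_result.zip"

# Extract a zip or tgz archive with base R, chosen by file extension.
extract_result <- function(archive, exdir = ".") {
  if (grepl("\\.zip$", archive, ignore.case = TRUE)) {
    unzip(archive, exdir = exdir)
  } else if (grepl("\\.(tgz|tar\\.gz)$", archive, ignore.case = TRUE)) {
    untar(archive, exdir = exdir)
  } else {
    stop("Unrecognized archive type: ", archive)
  }
  invisible(exdir)
}

# Report which of the expected result files are absent from a directory.
missing_results <- function(dir = ".", expected = "hierarchy.txt") {
  setdiff(expected, list.files(dir))
}

if (file.exists(archive)) {
  extract_result(archive)
  print(missing_results("."))  # character(0) means hierarchy.txt was found
}
```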

You may import the hierarchy.txt and cnadjusted_hierarchy.txt files into an
experiment-level phyloseq object with the RDPutils function hier2phyloseq:

In R, set your working directory to that containing the classifier result files and
enter:

library(RDPutils)
expt <- hier2phyloseq("hierarchy.txt")
expt

R should respond with:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 81 taxa and 4 samples ] 
tax_table()   Taxonomy Table:    [ 81 taxa by 7 taxonomic ranks ]
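Once the object exists, a few phyloseq accessors give a quick sanity check on what was imported. The sketch below assumes that expt was created as shown above and that the phyloseq package is installed; the wrapper name describe_expt is made up for illustration.

```r
# Hypothetical wrapper around standard phyloseq accessors.
describe_expt <- function(x) {
  list(
    n_taxa    = phyloseq::ntaxa(x),       # number of taxa (rows of the otu_table)
    n_samples = phyloseq::nsamples(x),    # number of samples
    ranks     = phyloseq::rank_names(x),  # column names of the tax_table
    totals    = phyloseq::sample_sums(x)  # total count per sample
  )
}

# Only runs when 'expt' exists and phyloseq is available.
if (exists("expt") && requireNamespace("phyloseq", quietly = TRUE)) {
  print(describe_expt(expt))
}
```

The taxa and sample counts reported here should agree with the summary that R printed for expt.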

The “_trimmed.fastq” portion of each input fastq file name is automatically removed to form the sample names. To see them, enter:

sample_names(expt)

Which gives:

[1] "Native.1.2.A" "Native.1.4.A" "USGA.1.7.A" "USGA.2.7.A"
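The relationship between file names and sample names can be illustrated with base R. This is only a sketch that mimics the pattern visible in the output above (suffix removed, underscores shown as periods); it is not the RDPutils implementation, and the function name clean_sample_name is hypothetical.

```r
# Hypothetical helper that mimics the observed naming pattern.
clean_sample_name <- function(file_names) {
  # Drop the "_trimmed.fastq" suffix.
  stems <- sub("_trimmed\\.fastq$", "", file_names)
  # Replace every underscore with a period.
  gsub("_", ".", stems, fixed = TRUE)
}

clean_sample_name(c("Native_1_2_A_trimmed.fastq", "Native_1_4_A_trimmed.fastq"))
# [1] "Native.1.2.A" "Native.1.4.A"
```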

The cnadjusted_hierarchy.txt file may be imported in the same way. It does not
contain integer counts; instead, the counts have been divided by the average copy number per genus, where known.
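The arithmetic behind the copy number adjustment can be sketched with toy values. The genus names, counts, and copy numbers below are invented for illustration and are not taken from RDP; the sketch only shows why the adjusted values are generally not integers.

```r
# Hypothetical raw counts for two genera in one sample.
counts <- c(GenusA = 70, GenusB = 10)

# Hypothetical average 16S rRNA gene copy numbers for the same genera.
copy_number <- c(GenusA = 7, GenusB = 4)

# Divide each count by the copy number of the matching genus.
adjusted <- counts / copy_number[names(counts)]
adjusted
# GenusA GenusB
#   10.0    2.5
```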