The first step in processing the data returned by a sequencing facility should be to assess its quality. The results help you decide how to filter the data before any downstream analysis. FastQC analyzes sequence quality and produces visual reports as HTML files that can be opened in a browser.
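FastQC's quality plots are built from the quality line of each FASTQ record, where every base has one quality character. As a minimal sketch, assuming the common Phred+33 encoding (Sanger / Illumina 1.8+, where a character's ASCII code minus 33 is its Phred score), the hypothetical helper `mean_read_quality` below prints the mean quality of each read in a gzipped FASTQ file. FastQC detects the encoding on its own; this sketch simply assumes Phred+33 and standard four-line records.

```shell
# mean_read_quality FILE.fastq.gz
# Hypothetical helper: prints the mean Phred score of each read, one per line.
# Assumes Phred+33 quality encoding and standard 4-line FASTQ records.
mean_read_quality() {
    gzip -cd "$1" | awk '
        BEGIN {
            # Build a character -> ASCII code lookup table (awk has no ord()).
            for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
        }
        NR % 4 == 0 {
            # Line 4 of each record holds one quality character per base.
            total = 0
            n = length($0)
            for (j = 1; j <= n; j++) total += ord[substr($0, j, 1)] - 33
            if (n > 0) printf "%.2f\n", total / n
        }'
}
```

For example, `mean_read_quality forward.fastq.gz | head` shows the mean quality of the first ten reads.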
Create a conda environment containing the programs FastQC and MultiQC. FastQC generates an HTML report for each sequence file, which can be opened in a browser, along with a compressed data file. MultiQC scans the FastQC results and compiles them into a single summary report, also in HTML format.
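cd
# Load conda into the current shell (assumes Miniconda is installed in ~/miniconda3)
source miniconda3/etc/profile.d/conda.sh
# FastQC and MultiQC are distributed through the bioconda channel
conda create --name fastqc -c conda-forge -c bioconda 'fastqc>=0.12' 'multiqc>=1.18'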
You only need to create this environment once.
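Once the environment is active, a quick way to confirm that both programs are available is to look them up on `PATH`. The helper below is a hypothetical sketch built on the POSIX `command -v` builtin; running `fastqc --version` and `multiqc --version` directly works just as well.

```shell
# require_tools NAME...
# Hypothetical helper: reports any program not found on PATH and
# returns non-zero if at least one is missing.
require_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

After `conda activate fastqc`, running `require_tools fastqc multiqc` should print nothing and return 0.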
Activate the environment and run FastQC with the code below.
conda activate fastqc
# Create the directory test_fastqc in your home directory and move into it
cd
mkdir test_fastqc
cd test_fastqc
# Create an output directory
mkdir fastqc
# Download example files from the QIIME2 tutorial pages
wget https://data.qiime2.org/2020.2/tutorials/atacama-soils/1p/forward.fastq.gz
wget https://data.qiime2.org/2020.2/tutorials/atacama-soils/1p/reverse.fastq.gz
# Run FastQC on each compressed FASTQ file, writing the results to fastqc/
for f in *.fastq.gz; do
    fastqc "$f" -o fastqc
done
# Run multiqc
cd fastqc
multiqc .
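An incomplete download is worth ruling out before reading the reports. As a minimal sketch, the hypothetical helper below tests each archive with `gzip -t` and estimates the number of reads by dividing the line count by four, assuming standard four-line FASTQ records with no wrapped sequence lines.

```shell
# check_fastq_gz FILE...
# Hypothetical helper: verifies each gzip file and prints "<file><TAB><reads>".
# Assumes standard 4-line FASTQ records (no wrapped sequence lines).
check_fastq_gz() {
    local f lines
    for f in "$@"; do
        # gzip -t exits non-zero if the archive is corrupt or truncated.
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt or incomplete: $f" >&2
            return 1
        fi
        # tr strips the padding some wc implementations add.
        lines=$(gzip -cd "$f" | wc -l | tr -d ' ')
        printf '%s\t%d\n' "$f" $((lines / 4))
    done
}
```

Run it from test_fastqc as `check_fastq_gz *.fastq.gz`.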
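Open the HTML files in the directory test_fastqc/fastqc in a browser: FastQC writes one report per input file (forward_fastqc.html and reverse_fastqc.html), and MultiQC writes the combined multiqc_report.html. There are ten different kinds of reports.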
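Besides the HTML report, each FastQC zip archive contains a plain-text summary.txt that lists every analysis module with a PASS, WARN or FAIL status. A minimal sketch, assuming that file's tab-separated layout (status, module name, input file name), prints only the modules that were flagged; the function name `flagged_modules` is hypothetical.

```shell
# flagged_modules
# Hypothetical helper: reads a FastQC summary.txt on standard input and prints
# the modules that did not PASS, as "STATUS<TAB>module".
# Assumes the tab-separated layout: status, module name, input file name.
flagged_modules() {
    awk -F'\t' '$1 != "PASS" { print $1 "\t" $2 }'
}
```

For example, from the test_fastqc directory: `unzip -p fastqc/forward_fastqc.zip forward_fastqc/summary.txt | flagged_modules`.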