SeqKit

SeqKit is an excellent command-line toolkit with more than 30 subcommands for manipulating FASTA and FASTQ files. Its abilities include converting FASTQ to FASTA, extracting amplicon regions, dereplicating sequences, filtering by length, removing gaps and reverse-complementing sequences, to name just a few. I find it particularly useful for subsampling large sequence files to create smaller datasets for developing processing pipelines. It can even re-pair forward and reverse reads should they somehow fall out of sync in your workflow.
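As a rough sketch of the subsampling and re-pairing workflow mentioned above, the script below builds two tiny paired FASTQ files and, if seqkit is on the PATH, subsamples them, re-pairs the result and converts one file to FASTA. The file names, the made-up reads, the 50% proportion and the seed value 11 are all illustrative assumptions, not part of SeqKit itself. Using the same seed for both files is what should keep the forward and reverse samples in step.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Build two toy paired-end FASTQ files of 100 reads each (made-up data).
i=1
while [ "$i" -le 100 ]; do
  printf '@read%d\nACGTACGT\n+\nIIIIIIII\n' "$i" >> "$workdir/reads_1.fastq"
  printf '@read%d\nTTGGCCAA\n+\nIIIIIIII\n' "$i" >> "$workdir/reads_2.fastq"
  i=$((i + 1))
done

if command -v seqkit >/dev/null 2>&1; then
  # Subsample roughly half of the reads; the same seed (-s) is used for both files.
  seqkit sample -p 0.5 -s 11 "$workdir/reads_1.fastq" -o "$workdir/sub_1.fastq"
  seqkit sample -p 0.5 -s 11 "$workdir/reads_2.fastq" -o "$workdir/sub_2.fastq"

  # Re-pair forward and reverse reads by ID, writing results to a new directory.
  seqkit pair -1 "$workdir/sub_1.fastq" -2 "$workdir/sub_2.fastq" -O "$workdir/paired"

  # Convert the subsampled forward reads from FASTQ to FASTA.
  seqkit fq2fa "$workdir/sub_1.fastq" -o "$workdir/sub_1.fasta"
else
  echo "seqkit not found; install and activate it first" >&2
fi
```

On real data you would replace the toy files with your own (gzipped input is also accepted) and choose a proportion that gives a dataset small enough to iterate on quickly.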

SeqKit is most easily installed using one of the package managers conda, mamba or micromamba. For example:

micromamba create --name seqkit -c bioconda seqkit

After installing SeqKit and activating the environment (for example with micromamba activate seqkit), get a list of available commands with:

seqkit --help

and help on any of the individual commands with:

seqkit <command> --help

For example, to see the options for the seq subcommand:

seqkit seq --help
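To give a feel for what the seq subcommand can do, here is a minimal sketch that covers three of the tasks mentioned earlier: filtering by length, removing gaps and reverse complementing. The toy FASTA file and the length threshold of 7 are assumptions made up for illustration. The flags used (-m, -g, -r, -p) are the ones listed in the seq help text.

```shell
# Write a small made-up FASTA file to a throwaway directory.
seqdir=$(mktemp -d)
printf '>a\nACGT--AC\n>b\nAAACCC\n' > "$seqdir/toy.fasta"

if command -v seqkit >/dev/null 2>&1; then
  # Keep only sequences at least 7 characters long (-m is the minimum length).
  seqkit seq -m 7 "$seqdir/toy.fasta" -o "$seqdir/long.fasta"

  # Remove gap characters from every sequence.
  seqkit seq -g "$seqdir/toy.fasta" -o "$seqdir/nogaps.fasta"

  # Reverse (-r) and complement (-p) together give the reverse complement.
  seqkit seq -r -p "$seqdir/toy.fasta" -o "$seqdir/revcomp.fasta"
else
  echo "seqkit not found; install and activate it first" >&2
fi
```

Each command writes to its own output file with -o, so the input is never modified and the three results can be compared side by side.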
