Conda and Virtual Environments

Anaconda, Miniconda, Conda, Bioconda: what’s with all of these condas? Anaconda is a full-blown Python distribution that includes over 720 open source packages. Miniconda comes in two versions: Miniconda2 is a Python 2.7 distribution, while Miniconda3 is a Python 3 distribution. The Minicondas (you probably guessed it) are smaller and take up less drive space because they do not include all of the packages found in Anaconda. All three of these Python distributions include conda, a package and virtual environment manager. Conda installs programs from repositories called channels, and Bioconda is a channel devoted to bioinformatics programs.

Conda solves problems often encountered when installing bioinformatics programs. You may have already found that you could not install a particular program, or that adding another program broke one or more previously installed programs. The first problem may arise because the installation depends on a specific compiler version you do not have, because the program has dependencies you have not installed, or because it requires versions of dependencies that differ from those you have installed. The second problem usually results from programs having dependencies that conflict with one another. With conda you create virtual environments into which you install binary (i.e., previously compiled) versions of programs and their dependencies that do not conflict with one another. If this cannot be done, conda says so. You can then create multiple environments, each containing compatible programs and dependencies, and switch between them as needed.

Installing Conda

To get conda, I suggest that you install Miniconda3. You will never need most of the packages in Anaconda, and with the conda included in Miniconda3 you can install any version of Python and any Python packages you need to support your bioinformatics programs. Follow the installation instructions under the heading “Regular Installation” at https://docs.conda.io/projects/conda/en/latest/user-guide/install/.

Add channels from which to install packages in the following order:

conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

I suggest this order because each channel you add goes to the top of the priority list, so the last one added (bioconda) ends up with the highest priority. You can re-order or extend the list later if you wish.
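
To confirm the resulting order, you can ask conda to print its channel list. This is a minimal sketch, assuming a fresh installation with no channels configured beforehand; the guard around the command and the conda_status variable are my additions so that the snippet fails gracefully if conda is not yet on your PATH.

```shell
# Print the configured channels, highest priority first.
# After the three commands above, the expected order is
# bioconda, then conda-forge, then defaults.
if command -v conda >/dev/null 2>&1; then
    conda config --show channels
    conda_status="found"
else
    echo "conda is not on your PATH yet; open a new terminal and try again"
    conda_status="missing"
fi
```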

Creating Environments

The minimal command to create an environment is:

conda create --name my_env

This creates an environment named my_env. Optionally, you may specify a version of Python and add packages at the same time. For example:

conda create -n my_env python=2.7 biopython
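
After creating an environment, you can check that conda knows about it with conda env list, which prints one environment per line with the name in the first column. The helper below is a sketch of wrapping that check in a shell function; env_exists is a hypothetical name of my own, not part of conda.

```shell
# env_exists NAME: succeed if conda lists an environment called NAME.
# (env_exists is an illustrative helper, not a conda command.)
env_exists() {
    # Skip the "#" header lines and blank lines, then compare the
    # first column (the environment name) exactly.
    conda env list | awk '!/^#/ && NF { print $1 }' | grep -Fqx -- "$1"
}

# Example use, once conda is installed:
#   env_exists my_env && echo "my_env is ready"
```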

Using Environments

To use an environment, you must first activate it. To change environments, deactivate the first before activating the second:

conda activate my_env
<run some analysis>
conda deactivate
conda activate my_env2
<run some other analysis>
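
If you lose track of which environment is active, conda records its name in the CONDA_DEFAULT_ENV shell variable when you activate it. Here is a small sketch that reports it; active_env is a hypothetical helper name, not a conda command.

```shell
# active_env: print the name of the active conda environment,
# or "none" if CONDA_DEFAULT_ENV is unset (no environment active).
active_env() {
    echo "${CONDA_DEFAULT_ENV:-none}"
}

active_env
```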

Using conda with pip

You can add Python packages to an active environment with pip. If you need to do this, it is best to first install pip in the environment with conda. That way, conda will keep track of pip-installed packages as well as conda-installed packages. This becomes important if you need to share or transfer environments.
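
In practice this means running conda install pip inside the activated environment before the first pip install. As a sketch, the check below confirms that the pip found on your PATH belongs to the active environment, using the CONDA_PREFIX variable that conda sets on activation. pip_in_env is a hypothetical helper name, and some_package is a placeholder.

```shell
# pip_in_env: succeed only if the pip on PATH lives inside the active
# conda environment, whose location conda stores in CONDA_PREFIX.
pip_in_env() {
    # No active environment means there is nothing to match against.
    [ -n "${CONDA_PREFIX:-}" ] || return 1
    case "$(command -v pip)" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical sequence at the prompt (some_package is a placeholder):
#   conda activate my_env
#   conda install pip
#   pip_in_env && pip install some_package
```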

Transferring Environments

If you want to share an environment with a colleague or transfer it to another computer, you can use conda’s export function to create a yml file that lists all of the packages and their dependencies in the environment. For an environment named assemblers:

conda env export -n assemblers > assemblers.yml

The environment can then be installed on a second computer with:

conda env create -f path_to/assemblers.yml
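
The exported file is plain YAML containing the environment name, the channels, and the list of dependencies. The sketch below writes a hand-trimmed illustration of such a file; the file name and the dependency entries are placeholders I chose for the example, not the contents of a real export, which would list every dependency with its exact version.

```shell
# Write an illustrative, hand-trimmed environment file.
# The dependency entries are placeholders, not real export output.
cat > assemblers_example.yml <<'EOF'
name: assemblers
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - python=3.9
  - spades
EOF

# Recreate the environment from the file (run where conda is installed):
#   conda env create -f assemblers_example.yml
```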

Popular Bioinformatic Packages

Some developers, those of Qiime and Qiime2 for example, include instructions for installing their programs using conda on their web sites. The instructions for installing Qiime include the following command:

conda create -n qiime1 python=2.7 qiime matplotlib=1.4.3 mock nose -c bioconda

You can download yml files for installing Qiime2 from the web site. There are different versions for macOS and Linux, and they are updated frequently, so it is best to go to the Qiime2 site for the latest instructions.

In cases where the developers do not provide such instructions, you can search Bioconda for the program. Searching for RDPTools provides the following instruction:

conda install -c bioconda rdptools

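This is much easier than following the instructions on GitHub or the more detailed instructions on this site. It provides the same result, but contained in a conda environment: the command installs into whichever environment is active, so activate the one you want first.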

More Detailed Instructions

More detailed instructions for using conda can be found on its Read the Docs pages, and there is an excellent summary on the conda cheat sheet.