Working with Python usually means installing a number of packages, often at specific versions, and those versions change over time. On Koa, software is already installed on the cluster as modules, which you can use to install the libraries and software you need and to choose which version to install.
Use the following commands to see which modules are available on the cluster, to list the modules that are already loaded, or to load a specific module into your environment:
module avail
module list
module load <MODULE_NAME>
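Once Python is running, it can be handy to check from inside the session which modules are loaded. The sketch below assumes the cluster's module system exports a colon-separated `LOADEDMODULES` environment variable, which is the usual behavior of Lmod and Environment Modules but has not been confirmed for Koa. `loaded_modules` is a hypothetical helper name.

```python
import os


def loaded_modules(environ=None):
    """Return the names of the currently loaded modules as a list.

    Assumption: the module system stores them, colon-separated, in the
    LOADEDMODULES environment variable.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("LOADEDMODULES", "")
    # Drop empty entries so an unset or empty variable yields [].
    return [name for name in raw.split(":") if name]


print(loaded_modules())
```

If the list comes back empty on the cluster, either nothing is loaded or the variable is not set there, and `module list` remains the authoritative check.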
Sometimes different applications require different versions of Python packages than the ones you’ve been using. This is where a Python environment comes in handy.
An environment (or specifically a conda environment, which we’ll discuss later) is a directory, isolated to a project, that contains a specific collection of Python packages at the versions you have installed. The two most popular tools for setting up an environment are:
Pip: a tool that installs Python software packages only.
Anaconda (or Conda): a cross-platform package and environment manager that gives you access to C and C++ libraries and R packages for scientific computing, along with Python.
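Because an environment is just a directory, you can ask a running interpreter which one it belongs to. The following is a minimal sketch using only the standard library. `describe_environment` is a hypothetical helper, and it assumes conda sets the `CONDA_DEFAULT_ENV` variable when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the environment this interpreter runs in."""
    return {
        # Path of the Python executable that is running this code.
        "executable": sys.executable,
        # Root directory of the active environment.
        "prefix": sys.prefix,
        # True inside a venv-style virtual environment.
        "in_venv": sys.prefix != sys.base_prefix,
        # Name of the active conda environment, or None if conda has not set it.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this before and after activating an environment should show the `prefix` and `executable` paths change.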
Note on packages
A package contains all the files needed for the modules it supplies.
Anaconda is a popular package manager in scientific computing that handles dependencies for the Python and R programming languages rather easily. It is often preferred because of:
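Since the installed version of a package often matters, here is a small sketch for looking it up from Python. It uses `importlib.metadata` from the standard library (Python 3.8 or newer). `installed_version` is a hypothetical helper, and the package names are only examples.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


for name in ["matplotlib", "keras"]:
    print(name, installed_version(name))
```

The result depends on whichever environment the interpreter is running in, so the same call can give different answers in different environments.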
Environment isolation
If your projects need different versions of the same library, pip may throw an error. To create isolated environments with pip, you can use a virtual environment (venv).
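As an illustration of the venv approach, the sketch below creates a virtual environment with the standard library `venv` module, which is what `python -m venv` uses. The directory name `demo-env` and the helper `create_virtual_env` are hypothetical, and the environment is created in a temporary directory so nothing is left behind.

```python
import tempfile
import venv
from pathlib import Path


def create_virtual_env(env_dir):
    """Create a virtual environment in env_dir and return its config file path."""
    # with_pip=False keeps the sketch fast; use with_pip=True if you want
    # pip available inside the new environment.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return Path(env_dir) / "pyvenv.cfg"


with tempfile.TemporaryDirectory() as tmp:
    cfg = create_virtual_env(Path(tmp) / "demo-env")
    print(cfg.exists())
```

In day-to-day use you would create the environment in your project directory and activate it from the shell before running pip.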
Exercise: Load Anaconda and libraries
First, create a conda environment:
module load lang/Anaconda3
conda create --name tf2
source activate tf2
Second, install relevant libraries:
mamba install -c conda-forge -c nvidia tensorflow=2.10 matplotlib keras cudatoolkit
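After the install finishes, one quick sanity check is to ask Python whether it can locate the new libraries. This is a sketch, not part of the original exercise, and `missing_modules` is a hypothetical helper. Note that `cudatoolkit` is not a Python module, so it is left out of the list.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names that cannot be found by the current interpreter."""
    return [
        name
        for name in module_names
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None
    ]


print(missing_modules(["tensorflow", "matplotlib", "keras"]))
```

An empty list means all three modules can be found from the active environment. Any name that is printed is not visible to this interpreter.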
Although we created a conda environment, the Jupyter notebook still cannot access it. The conda environment is only the directory that contains the installed conda packages. It is the kernel that runs the user’s code, and a kernel can be set up to use a particular conda environment.
A kernel is the computational engine that executes the code contained in a Jupyter notebook. Registering a kernel for an environment tells Jupyter which Python installation to use, and therefore which packages and software the notebook can access.
Exercise: Create an ipykernel
Install ipykernel in the active environment and register the environment as a Jupyter kernel:
conda install ipykernel
python -m ipykernel install --user --name tf2 --display-name tf2
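Registering a kernel writes a small `kernel.json` file into a directory named after the kernel. The sketch below reads those files to show which kernels are registered. It assumes the usual Linux location for `--user` installs, `~/.local/share/jupyter/kernels`, which may differ on your system. `list_kernels` is a hypothetical helper.

```python
import json
from pathlib import Path


def list_kernels(kernels_dir):
    """Map each kernel name in kernels_dir to its display name."""
    kernels_dir = Path(kernels_dir)
    if not kernels_dir.is_dir():
        return {}
    kernels = {}
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        # The directory name is the kernel name; display_name is what
        # appears in the Jupyter kernel menu.
        kernels[spec_file.parent.name] = spec.get("display_name")
    return kernels


# Assumed default location for --user installs on Linux.
user_kernels = Path.home() / ".local" / "share" / "jupyter" / "kernels"
print(list_kernels(user_kernels))
```

If the exercise worked, a `tf2` entry should appear in the output. The `jupyter kernelspec list` command reports the same information from the shell.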
Bio Break!
Let’s take a brief break to stretch before moving on to the next page. See you in a few minutes.