#Tutorial on Parallel/fast ML pipelines in Python

##First ssh connection to the Lengau cluster

The first step is to log in to the CHPC cluster. To do this, open a terminal (CLI) on your laptop and type the command below for login node 2 (a shared login node):

ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password [email protected]

ssh -i /home/serafina/.ssh/id_Sera_Leng [email protected]

Replace 'user' with your own username. You will be asked for the password associated with your user account.

##Automatic connection to LENGAU CLUSTER

To connect to the Lengau cluster automatically and securely, we need to create a pair of SSH keys.

step 1: ssh key creation

You can use the command:

ssh-keygen -t ecdsa -b 521

step 2: copy public key to Lengau

scp -o PubkeyAuthentication=no -o PreferredAuthentications=password  .ssh/id_Lengau_user.pub   [email protected]:/home/user/.ssh/

##Setting Python environment

Before starting the tutorial activities, we need to set up the conda environment, in which we install all the modules and packages needed to run ML pipelines in Python in parallel with the Dask scheduler.

First, we need to load the conda module and all the libraries that the Python libraries use to interface with GPUs (graphics processing units).

module load chpc/cuda/11.2/SXM2/11.2 chpc/openmpi/4.0.0/gcc-7.3.0 chpc/openblas/0.2.19/gcc-6.1.0 chpc/astro/anaconda/3

Now we can create the Python environment using conda.

For those who are not expert conda users, the conda documentation is a good reference.

Before we get started, some of you might be wondering what the difference is between conda, pip, and venv.

I can’t put it any better than this:

pip is a package manager for Python
venv is an environment manager for Python
conda is both a package and environment manager and is language agnostic.

Whereas pip only installs Python packages from PyPI, conda can both

Install packages (written in any language) from repositories like Anaconda Repository and Anaconda Cloud.
Install packages from PyPI by using pip in an active Conda environment.

To automate this process, I created the file env_tutorial.yml, from which you can create the conda environment by running:

conda env create -f env_tutorial.yml
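Once the environment has been created and activated, a quick check that the expected packages can be imported may save time later. The sketch below is only an illustration: the contents of env_tutorial.yml are not shown in this README, so the module names in `REQUIRED_MODULES` and the helper `missing_modules` are assumptions to be adapted to that file.

```python
import importlib.util

# Hypothetical package list: the real contents of env_tutorial.yml are not
# shown in this README, so adjust these names to match that file.
REQUIRED_MODULES = ["dask", "distributed", "numpy"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If the script reports missing modules, the environment is probably not active in the current shell, or the package was not listed in the YAML file.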

##Running jobs on Lengau computing nodes

The Lengau cluster is managed through a PBS scheduler, with queues and projects.

To see the names of the available queues we can use the command:

qstat -q

To see which nodes the jobs are currently running on:

qstat -rn

To see which nodes are free at the moment:

pbsnodes -ajS

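To start an interactive PBS session, you can type (the job name is set with -N; the -j option only controls how the output and error streams are joined):

qsub -P CHPC -I -l nodes=1:ppn=4,walltime=00:15:00 -N Sera_test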
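When the same kind of request is needed with different resources, it can help to assemble the command from parameters. The following is a minimal sketch, not part of the tutorial itself: the helper name `build_qsub_command` is hypothetical, the values are taken from the command above, and the job name is passed with `-N`, the PBS option for naming a job. The command is only printed, since `qsub` exists only on the cluster.

```python
import shlex


def build_qsub_command(project, nodes, ppn, walltime, job_name):
    """Assemble an interactive qsub request as an argument list."""
    resources = f"nodes={nodes}:ppn={ppn},walltime={walltime}"
    return ["qsub", "-P", project, "-I", "-l", resources, "-N", job_name]


if __name__ == "__main__":
    cmd = build_qsub_command(
        "CHPC", nodes=1, ppn=4, walltime="00:15:00", job_name="Sera_test"
    )
    # Print the command instead of running it; paste it into a Lengau shell.
    print(shlex.join(cmd))
```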

##Overview of the notebooks

00_overview.ipynb : A brief overview of the Dask API.
01_dataframe.ipynb : Performing operations on dataframes with Dask.
02_local_cluster_monte_carlo_estimate_of_pi.ipynb : A short recap of LocalClusters that runs a Monte Carlo estimate of the number pi.
03_Image_classification_with_RESNET.ipynb : This Jupyter notebook shows how to run the RESNET model training on GPUs with Dask. As input data for the model training and evaluation we used the Stanford Dogs Dataset (see the dataset webpage).
04_Galaxy_classification.ipynb
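To give an idea of what the Monte Carlo estimate of pi looks like, here is a minimal sketch that is not taken from the notebook. It uses only the Python standard library: `concurrent.futures` stands in for a Dask LocalCluster, whose `Client.submit` follows a similar submit-then-collect pattern. The function names, chunk sizes, and worker count are assumptions chosen for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(n_samples, seed):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_chunks, samples_per_chunk):
    """Estimate pi from independent chunks, each drawn with its own seed."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(count_hits, samples_per_chunk, seed)
            for seed in range(n_chunks)
        ]
        total_hits = sum(f.result() for f in futures)
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * total_hits / (n_chunks * samples_per_chunk)


if __name__ == "__main__":
    print(estimate_pi(n_chunks=8, samples_per_chunk=50_000))
```

Threads do not speed up pure Python loops because of the global interpreter lock, so this sketch only shows how the work splits into independent tasks. Running those tasks on separate worker processes or nodes is what the Dask cluster adds.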

Depending on what type of learner you are, you might want to learn more about Dask itself before diving in here. The https://examples.dask.org website, and especially its Binder with all the examples ready to run interactively, is a great place to start.
