ndungundegwa / dask-tutorial-sa

This project forked from sera91/dask-tutorial-sa


This repo contains the scripts for the tutorial activity on Dask-ML for the School 'High Performance Computing for sustainable development' organized by ICTP/CHPC in SA.


# Tutorial on Parallel/fast ML pipelines in Python

## First SSH connection to the Lengau cluster

The first step is to log in to the CHPC cluster. Open the command line (terminal) on your laptop and type the command below to connect to login node 2 (a shared login node):

ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password [email protected]

or

ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password [email protected]

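Replace 'user' with your own username. ssh will then ask for the password associated with your account.

Once you have set up an SSH key pair (see the next section), you can log in with the key instead of the password:

ssh -i ~/.ssh/id_Lengau_user [email protected]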

## Automatic connection to the Lengau cluster

To connect to the Lengau cluster automatically and securely, without typing your password every time, we need to create a pair of SSH keys.

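Step 1: create the SSH key pair

You can use the command below. The -f option saves the private key as ~/.ssh/id_Lengau_user and the public key as ~/.ssh/id_Lengau_user.pub, the file copied in step 2:

ssh-keygen -t ecdsa -b 521 -f ~/.ssh/id_Lengau_user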

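Step 2: copy the public key to Lengau

scp -o PubkeyAuthentication=no -o PreferredAuthentications=password ~/.ssh/id_Lengau_user.pub [email protected]:/home/user/.ssh/

Step 3: authorize the key

Copying the file is not enough on its own: by default, the SSH server only accepts keys listed in ~/.ssh/authorized_keys. Log in to Lengau with your password and run:

cat ~/.ssh/id_Lengau_user.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys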
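Once Lengau accepts the key, a common way to make the connection fully automatic is an entry in ~/.ssh/config, so that a plain `ssh lengau` picks the username, host and key by itself. The sketch below appends such an entry only once. The alias `lengau`, the function name `add_lengau_host` and the host placeholder `LENGAU_HOST` are assumptions for illustration; use the login-node address from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: add a "lengau" alias to an OpenSSH client config file.
# Hypothetical names: the alias "lengau" and this function. The key path
# assumes the key was generated as ~/.ssh/id_Lengau_user.
add_lengau_host() {
    local config_file="$1"   # e.g. ~/.ssh/config
    local user="$2"          # your Lengau username
    local host="$3"          # login-node address (placeholder, set it yourself)

    mkdir -p "$(dirname "$config_file")"
    touch "$config_file"
    chmod 600 "$config_file"   # ssh rejects a config file writable by others

    # Append the entry only if the alias is not defined yet (idempotent).
    if grep -qx "Host lengau" "$config_file"; then
        return 0
    fi

    # Unquoted heredoc: $host and $user expand, "~" is written literally
    # and expanded later by ssh itself.
    cat >> "$config_file" <<EOF

Host lengau
    HostName $host
    User $user
    IdentityFile ~/.ssh/id_Lengau_user
    IdentitiesOnly yes
EOF
}
```

Usage, after replacing the placeholders: `add_lengau_host ~/.ssh/config user LENGAU_HOST`, then simply `ssh lengau`. IdentitiesOnly stops ssh from offering other keys first, which can otherwise exhaust the server's allowed authentication attempts.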

## Setting up the Python environment

Before starting the tutorial activities, we need to set up the conda environment in which we install all the packages needed to run ML pipelines in Python in parallel with the Dask scheduler.

First, we need to load the Anaconda module and the libraries the Python packages rely on: CUDA for GPUs (graphics processing units), Open MPI and OpenBLAS.

module load chpc/cuda/11.2/SXM2/11.2 chpc/openmpi/4.0.0/gcc-7.3.0 chpc/openblas/0.2.19/gcc-6.1.0 chpc/astro/anaconda/3

Now we can create the Python environment using conda.

If you are not an experienced conda user, refer to the official conda documentation.

Before we get started, some of you might be wondering what the difference is between conda, pip, and venv.

I can’t put it any better than this:

  • pip is a package manager for Python
  • venv is an environment manager for Python
  • conda is both a package and environment manager and is language agnostic.

Whereas pip only installs Python packages from PyPI, conda can both

  • Install packages (written in any language) from repositories like Anaconda Repository and Anaconda Cloud.
  • Install packages from PyPI by using pip in an active Conda environment.

To automate this process, I created the file env_tutorial.yml, from which you can create the conda environment with:

conda env create -f env_tutorial.yml
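Unless -n is given, `conda env create` normally takes the environment name from the top-level `name:` key of the YAML file. The helper below is a hypothetical convenience function, not part of the repository: it reads that key so the environment can be activated without looking the name up. It assumes an unquoted name on its own unindented line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the environment name declared in a conda
# environment file (the unindented "name:" line).
env_name_from_yml() {
    local yml_file="$1"
    # Match only the top-level key, strip "name:" and trailing whitespace,
    # print the first match and stop.
    awk '/^name:/ { sub(/^name:[ \t]*/, ""); sub(/[ \t\r]+$/, ""); print; exit }' "$yml_file"
}

# Usage on Lengau, after "conda env create -f env_tutorial.yml":
#   conda activate "$(env_name_from_yml env_tutorial.yml)"
# If the Anaconda module is not initialised for "conda activate",
# "source activate <name>" is the usual fallback.
```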

## Running jobs on Lengau compute nodes

Jobs on the Lengau cluster are managed by a PBS scheduler: each job runs in a queue and is submitted under a project.

To see the names of the available queues we can use the command:

qstat -q

To see which nodes the jobs are currently running on:

qstat -rn

To see which nodes are free at the moment:

pbsnodes -ajS

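To start an interactive PBS session (here one node with 4 cores for 15 minutes, under the CHPC project), type:

qsub -P CHPC -I -l nodes=1:ppn=4,walltime=00:15:00 -N Sera_test

where -P selects the project, -I makes the session interactive, -l requests the resources and -N sets the job name.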
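For longer runs, the same resources can be requested non-interactively with a batch script passed to `qsub`. The sketch below writes such a script using the PBS Professional `select=1:ncpus=N` resource syntax. The function `write_pbs_script`, the queue name `serial`, the environment name and the Python script name are assumptions for illustration, as is activating the environment with `source activate`; check the real queue names with `qstat -q` first.

```shell
#!/usr/bin/env bash
# Sketch: write a PBS batch script for a single-node Python/Dask run.
# Hypothetical: this function, and whatever queue, environment and
# script names you pass in.
write_pbs_script() {
    local out_file="$1"    # where to write the job script
    local project="$2"     # project code, e.g. CHPC
    local queue="$3"       # queue name, see: qstat -q
    local ncpus="$4"       # cores on one node
    local walltime="$5"    # hh:mm:ss
    local env_name="$6"    # conda environment to activate (assumption)
    local py_script="$7"   # Python script to run

    # \$PBS_O_WORKDIR is escaped so it is expanded when the job runs,
    # not when the script is written.
    cat > "$out_file" <<EOF
#!/bin/bash
#PBS -P $project
#PBS -q $queue
#PBS -l select=1:ncpus=$ncpus
#PBS -l walltime=$walltime
#PBS -N dask_job
#PBS -j oe

# PBS starts the job in the home directory; move to the submission directory.
cd "\$PBS_O_WORKDIR"

module load chpc/astro/anaconda/3
source activate $env_name
python $py_script
EOF
}
```

For example, `write_pbs_script job.pbs CHPC serial 4 00:15:00 <your_env> my_pipeline.py` followed by `qsub job.pbs` submits the job, and `qstat -u $USER` then lists it. The `-j oe` directive merges standard output and standard error into a single output file.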

## Overview of the notebooks

Depending on what type of learner you are, you might want to learn more about Dask itself before diving in here. The https://examples.dask.org website, and especially its Binder, where all the examples can be run interactively, is a great place to start.

Contributors: sera91
