Llama-on-babel

Setup

  • Connect to babel by ssh [andrewid]@babel.lti.cs.cmu.edu
  • Install miniconda for managing Python virtual environments
  • Clone this repo and install the requirements with pip install -r requirements.txt
  • Log in to Hugging Face with huggingface-cli login. The Llama 2 models are gated, so you may need to request access if you haven't done so before.
  • (Optional) setup passwordless login
  • (Optional, but recommended) Hugging Face caches files in the home directory, which eats up disk space quickly. You can tell it to cache model files on /scratch, a large shared storage space available on compute nodes, by adding the following lines to your ~/.bashrc file:
if [ -d /scratch ]; then
    mkdir -p /scratch/$USER
    export TRANSFORMERS_CACHE="/scratch/$USER/hf_cache/models"
fi
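
For the optional passwordless login step, one common approach is an SSH key plus a host alias on your local machine. The alias name and key path below are assumptions, not something prescribed by this repo; a ~/.ssh/config entry might look like:

```
# ~/.ssh/config on your LOCAL machine (hypothetical alias; substitute your andrewid)
Host babel
    HostName babel.lti.cs.cmu.edu
    User andrewid
    IdentityFile ~/.ssh/id_ed25519
```

After generating a key with ssh-keygen and copying it to the cluster with ssh-copy-id babel, plain ssh babel should log in without a password.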

Start an interactive job and run llama inference

First, request an interactive session with GPU using the following command:

srun    --time 1:00:00 \
        --gres=gpu:1 \
        --mem=30GB \
        --exclude=babel-3-[3,11,32,36],babel-4-[7,11,13,18] \
        --pty \
        bash

The srun command starts a job in real time: this one requests a node with 1 GPU, 30GB of memory, and a 1-hour time limit. The --exclude flag is optional; it excludes nodes without A6000 GPUs, effectively restricting the job to A6000 nodes on babel.

Note: slurm documentation can be found here.

With the appropriate Python environment activated, python src/llama-pipeline.py should run a Llama generation pipeline.
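
The actual contents of src/llama-pipeline.py are not shown here; as a rough sketch of what such a script might do, the following builds a Llama 2 chat-format prompt and feeds it to a transformers text-generation pipeline. The function names are illustrative, and the prompt template is the standard Llama 2 chat format, which this repo may or may not use:

```python
def format_chat_prompt(user_message: str,
                       system_message: str = "You are a helpful assistant.") -> str:
    """Build a prompt string in the Llama 2 chat format."""
    return (
        f"<s>[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

def run_pipeline(prompt: str) -> str:
    """Generate a completion with a gated Llama 2 chat model (requires a GPU node)."""
    # Heavy imports live here so the prompt helper stays importable without a GPU.
    from transformers import pipeline
    generator = pipeline(
        "text-generation",
        model="meta-llama/Llama-2-7b-chat-hf",
        device_map="auto",
    )
    return generator(prompt, max_new_tokens=128)[0]["generated_text"]

# On a GPU node with model access, something like:
#   print(run_pipeline(format_chat_prompt("What does the LTI at CMU do?")))
```

Loading the model requires both a GPU allocation (via srun as above) and the Hugging Face access granted during setup.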

Submit a job and let it run in the background

You might want to submit a job that runs in the background. sbatch does exactly this: it takes a script file that describes how a node is set up and what commands to run, and submits that job for execution.

Note: the environment inside sbatch is by default inherited from the environment where sbatch is called. So you need to activate the appropriate conda environment before calling sbatch.

srun and sbatch accept largely the same arguments. sbatch scripts/submit.sh python src/llama-pipeline.py submits a job that runs python src/llama-pipeline.py in the background. Stdout and stderr are redirected to a file named slurm-[jobid].out.
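
The repository's scripts/submit.sh is not reproduced here. As a hypothetical sketch, a minimal wrapper mirroring the resource flags from the srun example above could look like:

```bash
#!/bin/bash
# Hypothetical scripts/submit.sh: same resources as the interactive srun example.
#SBATCH --time=1:00:00
#SBATCH --gres=gpu:1
#SBATCH --mem=30GB
#SBATCH --exclude=babel-3-[3,11,32,36],babel-4-[7,11,13,18]

# Run whatever command was passed after the script name, e.g.
#   sbatch scripts/submit.sh python src/llama-pipeline.py
"$@"
```

Because the sbatch environment is inherited as noted above, the conda environment active at submission time is the one the job runs in.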

Contributors

  • y0mingzhang
