Ultraviolet-Tutorial

This is an unofficial quick start tutorial for NYU's UltraViolet HPC, intended for statistical analysts with no prior experience with remote SSH or Linux.

Official guide: https://hpcmed.org/guide
Wiki: http://bigpurple-ws.nyumc.org/wiki/index.php/BigPurple_HPC_Cluster

Before getting started: software recommendations

Windows

MobaXTerm (console access + file transfer + light editing): https://mobaxterm.mobatek.net/

Also consider:

WinSCP (file transfer): https://www.winscp.net/

Mac

Terminal (console access): Included

Also consider:

Filezilla (file transfer): https://filezilla-project.org/
XQuartz (enables the graphical user interface for Stata and SAS): https://www.xquartz.org/

Web interface

OnDemand: https://ondemand.hpc.nyumc.org/

Browse your directories and submit batch jobs from your web browser. VPN access is required.

Text editors

Consider Sublime Text or VSCode to edit Stata or R scripts.

Logging into UltraViolet

VPN access is needed when you’re off-campus: https://atnyulmc.org/help-documentation/NYU-Langone-Advanced-Access-App

Read this first: https://hpcmed.org/guide/bigpurple

Once you log in, you'll see:

   __  __  __  __                   _    __  _           __         __
  / / / / / / / /_  _____  _____   | |  / / (_) ____    / / ___    / /_
 / / / / / / / __/ / ___/ / __  |  | | / / / / / __ \  / / / _ \  / __/
/ /_/ / / / / /_  / /    / /_/  |  | |/ / / / / /_/ / / / /  __/ / /_
\____/ /_/  \__/ /_/     \____/\_\ |___/ /_/  \____/ /_/  \___/  \__/
                                           NYU Langone Health HPC

Use the following commands to adjust your environment:

module avail            - show available modules
module add <module>     - adds a module to your environment for this session
module initadd <module> - configure module to be loaded at every login


    BigPurple User Guide available at: http://bigpurple-ws.nyumc.org/wiki
    New HPC Portal: https://hpcmed.org/


    HPC Community town hall is held every Thursday from 12 to 1 PM.
    Meeting link: https://nyumc.webex.com/meet/siavoa01

    You may email <[email protected]> for any further assistance.


    Quarterly maintenance, schaduled for June 9th, 2024 IS POSTPONED.
    The new date will be announced well in advance to reduce the
    impact to computational researches.

Loading default-environment
  Loading requirement: slurm/current

[baes03@bigpurple-ln2 ~]$

You are now on a login node; the ln2 in your command prompt confirms this.
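
If you are ever unsure where you are, the hostname tells you. The sketch below classifies a hostname using the naming pattern seen in the prompts in this tutorial (bigpurple-ln2 for a login node, cn-0012 for a compute node). The patterns and the function name node_kind are assumptions made for illustration, not an official convention, and other node types may use other prefixes.

```shell
# Classify a hostname as "login", "compute", or "unknown".
# The patterns are inferred from the prompts shown in this tutorial.
node_kind() {
    case "$1" in
        *-ln[0-9]*) echo "login" ;;
        cn-[0-9]*)  echo "compute" ;;
        *)          echo "unknown" ;;
    esac
}

# HOSTNAME is set by bash; strip any domain suffix before classifying.
host="${HOSTNAME:-unknown}"
node_kind "${host%%.*}"
```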

Developing your script in an interactive session

Use this workflow only for development purposes. Your final results must come from a batch job.

Read this first: https://hpcmed.org/guide/slurm#headings2

1. Log in to an actual computing node.

Run this command: srun -p cpu_short --mem-per-cpu=4G -t 00-02:00:00 --pty bash

[baes03@bigpurple-ln2 ~]$ srun -p cpu_short --mem-per-cpu=4G -t 00-02:00:00  --pty bash
srun: job 48302015 queued and waiting for resources
srun: job 48302015 has been allocated resources
[baes03@cn-0012 ~]$

Now your command prompt shows cn-0012.

You can request more memory by increasing --mem-per-cpu and longer runtime by increasing -t.
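
To make the two knobs concrete, here is a small sketch that prints (rather than runs) the srun command for a given memory and time request. The function name interactive_cmd is hypothetical, and whether cpu_short accepts a particular time limit depends on the partition's configuration, which this sketch does not check.

```shell
# Print (not run) an srun command for an interactive session.
# Arguments: memory per CPU, then time limit as DD-HH:MM:SS.
interactive_cmd() {
    mem="${1:-4G}"
    walltime="${2:-00-02:00:00}"
    echo "srun -p cpu_short --mem-per-cpu=$mem -t $walltime --pty bash"
}

# Example: 8 GB per CPU for four hours.
interactive_cmd 8G 00-04:00:00
```

Copy the printed line and run it on the login node once the values look right.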

2. Load modules

Loading stata: module load stata

Loading R: module load r

List of available modules: https://hpcmed.org/guide/modules

3. Starting programs

Stata (CLI): stata

Stata (GUI): xstata (You need X11. Use MobaXTerm or XQuartz.)

R (CLI): R
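
If one of these commands is reported as not found, the module is probably not loaded in the current session. A minimal check, assuming the program names used above (the helper name check_tools is made up for this example):

```shell
# Report whether each named program is on the PATH.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: not found (is its module loaded?)"
        fi
    done
}

check_tools stata xstata R
```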

Where to find datasets

SRTR

Go to /gpfs/data/massielab/data/srtr. You’ll see many subdirectories named srtrYYMM.

YY and MM are the year and month of the release. SRTR releases datasets quarterly, with additional releases when necessary. In most cases we use the latest release that includes the standard analysis files (SAFs).

You’ll probably start with the TX_KI dataset.

Data dictionary available here: https://www.srtr.org/requesting-srtr-data/saf-data-dictionary/
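
Because the release directories are named srtrYYMM, the newest one can be found by sorting the names. The sketch below does that; the function name latest_srtr is hypothetical, and it only looks at directory names, so you still need to confirm that the release it returns actually contains the SAFs.

```shell
# Print the srtrYYMM directory with the highest YYMM under a base directory.
# Glob matches come back in sorted order, so the last match is the latest
# (valid while all releases fall in the same century).
latest_srtr() {
    base="${1:-/gpfs/data/massielab/data/srtr}"
    latest=""
    for dir in "$base"/srtr[0-9][0-9][0-9][0-9]; do
        if [ -d "$dir" ]; then
            latest="$dir"
        fi
    done
    echo "$latest"
}

# Prints an empty line if the base directory is not reachable.
latest_srtr
```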

USRDS

The latest release is available here: /gpfs/data/easelab/data/USRDS/stata

Previous versions (e.g., the 2022 release) are here, though you probably won't need them: /gpfs/data/easelab/data/USRDS/USRDS2022

USRDS researcher’s guide has data file descriptions, data dictionary, and more: https://www.niddk.nih.gov/about-niddk/strategic-plans-reports/usrds/for-researchers/researchers-guide

Running your completed script as a batch job

R

Suppose you want to run an R script called demo.r, which is stored at ~/demo_project.

Create a text file called run_demo.sh containing the script below. Replace [YOUR EMAIL] with your email address.

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=10:00:00
#SBATCH --mem=8GB
#SBATCH --job-name=demo
#SBATCH --mail-user=[YOUR EMAIL]
#SBATCH --mail-type=END
#SBATCH --output=/dev/null
#SBATCH --error=/dev/null

module load r

cd ~/demo_project
R CMD BATCH --no-restore --no-save demo.r

Now, go to the directory where you stored run_demo.sh, and run sbatch run_demo.sh

Stata

Suppose you want to run a Stata script called demo.do, which is stored at ~/demo_project.

Create a text file called run_demo.sh containing the script below. Replace [YOUR EMAIL] with your email address.

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=10:00:00
#SBATCH --mem=8GB
#SBATCH --job-name=demo
#SBATCH --mail-user=[YOUR EMAIL]
#SBATCH --mail-type=END
#SBATCH --output=/dev/null
#SBATCH --error=/dev/null

module load stata

cd ~/demo_project
stata-mp -b do demo.do

Now, go to the directory where you stored run_demo.sh, and run sbatch run_demo.sh
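
A common slip is submitting the job script with the email placeholder still in it. The sketch below is one way to check a job script before calling sbatch; the function name check_job_script and the specific checks are illustrative choices, not part of Slurm.

```shell
# Check a job script before submitting: it must start with a shebang and
# must not still contain the [YOUR EMAIL] placeholder.
check_job_script() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "missing file: $file"
        return 1
    fi
    if ! head -n 1 "$file" | grep -q '^#!'; then
        echo "first line is not a shebang: $file"
        return 1
    fi
    if grep -qF '[YOUR EMAIL' "$file"; then
        echo "email placeholder not replaced: $file"
        return 1
    fi
    echo "ok: $file"
}

# Submit only when the checks pass.
if check_job_script run_demo.sh; then
    sbatch run_demo.sh
fi
```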

Check the status of your batch jobs

squeue -u $USER
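
When you have many jobs, a per-state count is easier to read than the full table. The sketch below counts jobs by the ST column of squeue's default output (for example R for running, PD for pending). The helper name summarize_jobs is made up, and it assumes the default column layout and job names without spaces.

```shell
# Count jobs per state from squeue's default table (state is column 5, "ST").
# Assumes job names contain no spaces.
summarize_jobs() {
    awk 'NR > 1 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# Run against the live queue only where squeue exists.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER" | summarize_jobs
fi
```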

Cancel batch jobs

Basic command: scancel [JOBID]

In action:

Check your JOBID first

[baes03@cn-0032 bin]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          48302494 cpu_short     bash   baes03  R       0:27      1 cn-0032

Then cancel it! scancel 48302494
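
If you need to cancel several jobs that share a name, you can pull their JOBIDs out of the squeue table instead of copying them by hand. This is a dry-run sketch: it prints the scancel commands rather than running them. The helper job_ids_named is hypothetical, the job name demo is just an example, and the match is against the NAME column as displayed, where long names may be truncated.

```shell
# Print the JOBID of every job whose NAME column equals the given name.
job_ids_named() {
    awk -v name="$1" 'NR > 1 && $3 == name { print $1 }'
}

# Dry run: show the scancel commands instead of executing them.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER" | job_ids_named demo | while read -r id; do
        echo "scancel $id"
    done
fi
```

Review the printed commands, then run the ones you want.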

Creating shortcuts for `sbatch`

Go to ~/bin, creating it first if it doesn't exist.

mkdir -p ~/bin
cd ~/bin
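
Scripts in ~/bin can be run by name only if that directory is on your PATH. Many Linux setups add it automatically, but that is an assumption worth checking. A minimal check (the function name on_path is made up for this example):

```shell
# Succeed if the given directory is one of the entries in PATH.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if on_path "$HOME/bin"; then
    echo "$HOME/bin is on your PATH"
else
    echo "Add this line to ~/.bashrc: export PATH=\"\$HOME/bin:\$PATH\""
fi
```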

Create two files.

First, create srun_r (no extension). Open the file in any text editor (like the MobaXTerm internal editor). Put the script below into srun_r and save it.

echo '#!/bin/bash
module load r
R CMD BATCH --no-restore --no-save '$1 | sbatch --output=/dev/null --error=/dev/null --job-name=$1 --time=24:00:00 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=6G --mail-user=[YOUR EMAIL HERE] --mail-type=END,FAIL

Second, create srun_stata (no extension) and put the script below into it.

echo '#!/bin/bash
module load stata
stata-mp -b do '$1 | sbatch --output=/dev/null --error=/dev/null --job-name=$1 --time=24:00:00 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=6G --mail-user=[YOUR EMAIL HERE] --mail-type=END,FAIL

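Make both files executable by running chmod +x srun_r srun_stata. All done! Let's try these. (This assumes ~/bin is on your PATH.)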

Go to your project directory and run your script. Suppose it is named demo.r:

cd ~/demo_project
srun_r demo.r

For a Stata script:

cd ~/demo_project
srun_stata demo.do
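
If you would rather not remember two shortcut names, the choice between R and Stata can be made from the file extension. The sketch below only prints the batch command that matches a script; the function name runner_for is hypothetical, and wiring it into a single sbatch shortcut is left as an exercise.

```shell
# Map a script file to the batch command used in this tutorial.
runner_for() {
    case "$1" in
        *.r|*.R) echo "R CMD BATCH --no-restore --no-save $1" ;;
        *.do)    echo "stata-mp -b do $1" ;;
        *)       echo "unsupported file type: $1" >&2; return 1 ;;
    esac
}

runner_for demo.r
runner_for demo.do
```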
