

galyleo

galyleo is a shell utility to help you launch Jupyter notebooks on high-performance computing (HPC) systems in a simple, secure way. It works with SDSC's Satellite reverse proxy service and a batch job scheduler like Slurm to provide each Jupyter notebook server you start with its own one-time, token-authenticated HTTPS connection between the compute resources of the HPC system the notebook server is running on and your web browser. This secure connection affords both privacy and integrity to the data exchanged between the notebook server and your browser, helping protect you and your work against network eavesdropping and data tampering.

Quick Start Guide

galyleo is currently deployed on several HPC systems at SDSC, including Expanse and TSCC.

To use galyleo, you first need to prepend its install location to your PATH environment variable. This path is different for each HPC system at SDSC.

On Expanse, use:

export PATH="/cm/shared/apps/sdsc/galyleo:${PATH}"

On TSCC, there is now a software module available for loading galyleo into your environment.

[mkandes@login1 ~]$ module load galyleo/0.7.4 
[mkandes@login1 ~]$ module list

Currently Loaded Modules:
  1) shared   2) cpu/0.17.3   3) slurm/tscc/23.02.7   4) sdsc/1.0   5) DefaultModules   6) galyleo/0.7.4

[mkandes@login1 ~]$ which galyleo
/cm/shared/apps/spack/0.17.3/cpu/opt/spack/linux-rocky9-cascadelake/gcc-11.2.0/galyleo-0.7.4/galyleo
[mkandes@login1 ~]$ echo $PATH
/cm/shared/apps/spack/0.17.3/cpu/opt/spack/linux-rocky9-cascadelake/gcc-11.2.0/galyleo-0.7.4:/tscc/nfs/home/mkandes/.local/bin:/tscc/nfs/home/mkandes/bin:/cm/shared/apps/sdsc/1.0/bin:/cm/shared/apps/sdsc/1.0/sbin:/cm/shared/apps/slurm/current/sbin:/cm/shared/apps/slurm/current/bin:/cm/shared/apps/spack/0.17.3/cpu/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
[mkandes@login1 ~]$

Once galyleo is in your PATH, you can then use its launch command to create a secure Jupyter notebook session. A number of command-line options will allow you to configure:

  • the compute resources required to run your Jupyter notebook session;
  • the type of Jupyter interface you want to use for the session and the location of the notebook working directory; and
  • the software environment that contains the jupyter notebook server and the other software packages you want to work with during the session.

For example, the following launch command will create a 30-minute JupyterLab session with two CPU cores and 4 GB of memory on one of Expanse's shared AMD compute nodes, using the base anaconda3 distribution available in its default cpu software module environment.

galyleo launch --account abc123 --partition shared --cpus 2 --memory 4 --time-limit 00:30:00 --env-modules cpu/0.17.3b,anaconda3/2021.05

When the launch command completes successfully, you will be issued a unique HTTPS URL generated for your secure Jupyter notebook session.

https://wages-astonish-recapture.expanse-user-content.sdsc.edu?token=1abe04ac1703ca623e4e907cc37678ae

Copy and paste this HTTPS URL into your web browser. Your Jupyter notebook session will begin once the requested compute resources are allocated to your job by the scheduler.

Command-line options

The most commonly used command-line options for the launch command are described below.

Scheduler options:

  • -A, --account: charge the compute resources required by this job to the specified account or allocation project id
  • -p, --partition: select the resource partition or queue the job should be submitted to
  • -c, --cpus: number of cpus to request for the job
  • -m, --memory: amount of memory (in GB) required for the job
  • -g, --gpus: number of GPUs required for the job
  • -t, --time-limit: set a maximum runtime (in HH:MM:SS) for the job
  • -C, --constraint: apply a feature constraint to specify the type of compute node required for the job

Jupyter options:

  • -i, --interface: select the user interface for the Jupyter notebook session; the available options are lab, notebook, and voila
  • -d, --notebook-dir: path to the working directory where the Jupyter notebook session will start; default value is your $HOME directory

Software environment options:

  • -e, --env-modules: comma-separated list of environment modules that will be loaded to create the software environment for the Jupyter notebook session
  • -s, --sif: path to a Singularity container image file that will be run to create the software environment for the Jupyter notebook session
  • -B, --bind: comma-separated list of user-defined bind paths to be mounted within a Singularity container
  • --nv: enable NVIDIA GPU support when running a Singularity container
  • --conda-env: name of a conda environment to activate to create the software environment for the Jupyter notebook session
  • --conda-yml: path to an environment.yml file
  • --mamba: use mamba instead of miniconda to create your conda environment from an environment.yml file.
  • --cache: cache your conda environment created from an environment.yml file using conda-pack; a cached environment will be unpacked and reused if the environment.yml file does not change
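The --cache behavior described above can be sketched roughly as follows. This is an illustration of the general checksum-then-reuse idea, not galyleo's actual implementation; the cache_env function, file names, and paths are hypothetical.

```shell
#!/usr/bin/env bash
# Rough sketch of the --cache idea (hypothetical function and file names):
# rebuild the conda environment only when environment.yml's checksum changes.
cache_env () {
  local env_yml="$1" cache_dir="$2"
  local new_md5 old_md5
  mkdir -p "${cache_dir}"
  new_md5="$(md5sum "${env_yml}" | awk '{ print $1 }')"
  old_md5="$(cat "${cache_dir}/environment.md5" 2> /dev/null)"
  if [[ "${new_md5}" == "${old_md5}" ]]; then
    echo 'cache hit'    # here galyleo would unpack the conda-pack tarball
  else
    echo 'cache miss'   # here: conda env create + conda pack, then record the checksum
    echo "${new_md5}" > "${cache_dir}/environment.md5"
  fi
}

# Demonstration with a throwaway file:
tmp="$(mktemp -d)"
printf 'name: demo\ndependencies:\n  - python\n' > "${tmp}/environment.yml"
cache_env "${tmp}/environment.yml" "${tmp}/.galyleo"   # → cache miss
cache_env "${tmp}/environment.yml" "${tmp}/.galyleo"   # → cache hit
```

The key design point is that the checksum, not a timestamp, decides reuse: editing and then reverting environment.yml still hits the cache.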

Defining your software environment

After you specify the compute resources required for your Jupyter notebook session using the Scheduler options outlined above, the next most important set of command-line options for the launch command are those that help you define the software environment. Listed in the Software environment options section above, these command-line options are discussed in detail in the next few subsections below.

Environment modules

Most HPC systems use a software module system like Lmod or Environment Modules to provide you with a convenient way to dynamically load pre-installed software applications, libraries, and other packages into your shell's environment.

If you need to module load any software to create the environment for your Jupyter notebook session, you can do so by passing the modules as a comma-separated list to the --env-modules option in your launch command. Each module in the list will be loaded prior to starting jupyter. In some cases, the --env-modules command-line option may be the only one you need to define your software environment. For example, if you have a standard Python-based data science workflow that you want to run on Expanse, then you might only need to load one of the Anaconda distributions available in its software module environment.

galyleo launch --account abc123 --partition shared --cpus 2 --memory 4 --time-limit 00:30:00 --env-modules cpu/0.17.3b,anaconda3/2021.05

By default, each Anaconda distribution comes with over 250 of the most popular data science software packages pre-installed, including jupyter.

Singularity containers

Singularity containers bring operating system-level virtualization to scientific and high-performance computing, allowing you to package complete software environments --- including operating systems, software applications, libraries, and data --- in a simple, portable, and reproducible way, which can then be executed and run almost anywhere.

If you have a Singularity container that you would like to run your Jupyter notebook session within, then you simply need to provide a path to the container with the --sif option in your launch command. This will start jupyter within the container using the singularity exec command. If necessary, you can also pass user-defined --bind mounts to the container and enable NVIDIA GPU support via the --nv flag.
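To make the mapping concrete, here is a rough sketch of how these options might translate into a singularity command line. This is not galyleo's actual code: the variable values are examples, and `jupyter lab` stands in for whichever interface was requested.

```shell
#!/usr/bin/env bash
# Illustrative sketch (not galyleo's actual code) of how --sif, --bind, and
# --nv might map onto a singularity command line. All values are examples.
sif='r-notebook_latest.sif'
bind='/expanse,/scratch'
nv='--nv'                       # empty when GPU support is not requested

cmd=(singularity exec)
[[ -n "${bind}" ]] && cmd+=(--bind "${bind}")
[[ -n "${nv}" ]] && cmd+=("${nv}")
cmd+=("${sif}" jupyter lab)

echo "${cmd[*]}"
# → singularity exec --bind /expanse,/scratch --nv r-notebook_latest.sif jupyter lab
```

Building the command as a Bash array keeps paths with spaces intact and makes it easy to omit options that were not requested.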

One of the most powerful features of Singularity is its ability to convert an existing Docker container to a Singularity container. So, even if you are not familiar with how to build your own Singularity container, you can always search public container registries like Docker Hub for an existing container that may help you get your work done.

For example, let's say you need an R environment for your Jupyter notebook session. Why not try the latest r-notebook container from the Jupyter Docker Stacks project? To get started, you first use the singularity pull command to download and convert the Docker container to a Singularity container.

singularity pull docker://jupyter/r-notebook:latest

Once all of the layers of the Docker container have been downloaded and the container conversion process is complete, you can then launch your Jupyter notebook session with the newly built Singularity container.

galyleo launch --account abc123 --cpus 2 --memory 4 --time-limit 00:30:00 --sif r-notebook_latest.sif

On some systems like Expanse, you may need to load Singularity via the software module environment as well.

galyleo launch --account abc123 --cpus 2 --memory 4 --time-limit 00:30:00 --env-modules singularitypro --sif r-notebook_latest.sif --bind /expanse,/scratch

Here, the user-defined --bind mount option also enables access to both the /expanse network filesystems (e.g., /expanse/lustre) and the local NVMe /scratch disk(s) available on each compute node from within the container. By default, only your $HOME directory is accessible from within the container.

Singularity also provides native support for running containerized applications on NVIDIA GPUs. If you have a GPU-accelerated application you would like to run during your Jupyter notebook session, please make sure your container includes a CUDA-enabled version of the application that can utilize NVIDIA GPUs.

NVIDIA distributes a number of GPU-optimized containers via their container registry. This includes containers for all of the most popular deep learning frameworks --- PyTorch, TensorFlow, and MXNet --- with jupyter pre-installed. Like the containers available from Docker Hub, you can pull these containers to the HPC system you are working on

singularity pull docker://nvcr.io/nvidia/pytorch:21.07-py3

and then launch your Jupyter notebook session with galyleo. For example, you might want to run this PyTorch container on a single NVIDIA V100 GPU available in Expanse's gpu-shared partition.

galyleo launch --account abc123 --partition gpu-shared --cpus 10 --memory 93 --gpus 1 --time-limit 00:30:00 --env-modules singularitypro --sif pytorch_21.07-py3.sif --bind /expanse,/scratch --nv 

Note, however, that how you request GPU resources with galyleo may differ from one HPC system to another. For example, on Comet you must use the --gres command-line option to specify both the type and number of GPUs required for your Jupyter notebook session. The following launch command would create a session within the NVIDIA PyTorch container on a single P100 GPU available in Comet's gpu-shared partition.

galyleo launch --account abc123 --partition gpu-shared --cpus 7 --gres gpu:p100:1 --time-limit 00:30:00 --sif pytorch_21.07-py3.sif --bind /oasis,/scratch --nv

In contrast, on TSCC you never explicitly request a specific number of GPUs for your Jupyter notebook session. All GPUs on TSCC are currently allocated implicitly, in proportion to the number of CPU cores requested by a job and available on the type of GPU-accelerated compute node you expect it to run on. If you would like your notebook session to be scheduled on a certain type of GPU, then you must pass the GPU type listed in the pbsnodes properties via the --constraint command-line option. For example, the following launch command will schedule your session on one of the NVIDIA GeForce RTX 2080 Ti GPUs available in the gpu-hotel queue on TSCC.

galyleo launch --account abc123 --partition gpu-hotel --cpus 2 --constraint gpu2080ti --time-limit 00:30:00 --sif pytorch_21.07-py3.sif --bind /oasis --nv

Whatever you do, whenever you're launching your Jupyter notebook session with galyleo from a Singularity container on compute resources with NVIDIA GPUs, don't forget to include the --nv flag.

Conda environments

Conda is an open-source software package and environment manager developed by Anaconda, Inc. Its ease of use, compatibility across multiple operating systems, and comprehensive support for both the Python and R software ecosystems have made it one of the most popular ways to build and maintain custom software environments in the data science and machine learning communities. And because of the constantly evolving software landscape in these spaces, which can involve quite complex software dependencies, conda is often the simplest way to get your custom Python or R software environment up and running on an HPC system.

galyleo supports the use of conda environments to configure the software environment for your Jupyter notebook session. If you've already installed a conda distribution --- we recommend Miniconda --- and configured a custom conda environment within it, then you should only need to specify the name of the conda environment you want to activate for your notebook session with the --conda-env command-line option.

For example, let's imagine you've already created a custom conda environment from the following environment.yml file.

name: notebooks-sharing

channels:
  - conda-forge
  - anaconda

dependencies:
  - python=3.7
  - jupyterlab=3
  - pandas=1.2.4
  - matplotlib=3.4.2
  - seaborn=0.11.0
  - scikit-learn=0.23.2

You should then be able to launch a 30-minute JupyterLab session with four CPU cores and 8 GB of memory on one of Expanse's shared AMD compute nodes by simply activating the notebooks-sharing environment.

galyleo launch --account abc123 --partition shared --cpus 4 --memory 8 --time-limit 00:30:00 --conda-env notebooks-sharing

Note, however, that the use of the --conda-env command-line option here assumes you've already configured your ~/.bashrc file with the conda init command. If you have not done so (or choose not to), then you can also initialize any conda distribution in your launch command by providing the path to its conda.sh initialization script in the etc/profile.d directory via the --conda-init command-line option.

galyleo launch --account abc123 --partition shared --cpus 4 --memory 8 --time-limit 00:30:00 --conda-env notebooks-sharing --conda-init miniconda3/etc/profile.d/conda.sh

While creating your own custom software environment with conda may be convenient, it can also generate a high metadata load on the types of shared network filesystems you'll often find on an HPC system. At a minimum, if you install your conda distribution on a network filesystem, you can expect this to increase the installation time of software packages into your conda environment when compared to a local filesystem installation you may have done previously on your laptop. Under some circumstances, this metadata issue can lead to a serious degradation of the aggregate I/O performance across a filesystem, affecting the performance of all user jobs on the system.

If you have not yet installed your conda environment on a shared filesystem (such as in your $HOME directory), galyleo now also allows you to dynamically create the environment at runtime from an environment.yml file. To use this feature, you simply need to provide the name of the environment.yml file with the --conda-yml command-line option. For example, if you wanted to start a Jupyter notebook session with the notebooks-sharing environment, you would use the following command:

galyleo launch --account abc123 --partition shared --cpus 4 --memory 8 --time-limit 00:30:00 --conda-env notebooks-sharing --conda-yml environment.yml

You can further improve the installation performance and reuse of these dynamically generated conda environments with the new --mamba and --cache command-line options: --mamba uses Mamba to speed up software installs, and --cache saves the completed conda environment with conda-pack for future reuse.

galyleo launch --account abc123 --partition shared --cpus 4 --memory 8 --time-limit 00:30:00 --conda-env notebooks-sharing --conda-yml environment.yml --mamba --cache

Debugging your session

If you experience a problem launching your Jupyter notebook session with galyleo, you may be able to debug the issue yourself by reviewing the batch job script generated by galyleo or the standard output/error file generated by the job itself. These files are stored in the hidden ~/.galyleo directory in your HOME directory.

Additional Information

Expanse User Portal

galyleo has been integrated with the Open OnDemand-based Expanse User Portal to help simplify launching Jupyter notebooks on Expanse. After logging into the portal, you can access this web-based interface to galyleo from the Interactive Apps tab in the toolbar across the top of your browser, then select Jupyter.

Containers

SDSC builds and maintains a number of custom Singularity containers for use on its HPC systems. Pre-built copies of many of these containers are made available from a central storage location on each HPC system. Please check the following locations for the latest containers. If you do not find the container you're looking for, please feel free to contact us and make a request for a container to be made available.

On Expanse:

  • /cm/shared/apps/containers/singularity

Status

A work in progress.

Contribute

If you would like to contribute to the project, then please submit a pull request via GitHub. If you have a feature request or a problem to report, then please create a GitHub issue.

Author

Marty Kandes, Ph.D.
Computational & Data Science Research Specialist
High-Performance Computing User Services Group
Data-Enabled Scientific Computing Division
San Diego Supercomputer Center
University of California, San Diego

Version

0.7.6

Last Updated

Monday, May 6th, 2024


galyleo's Issues

Support multiple conda installations

In addition to supporting multiple conda environments from the conda installation configured in a user's ~/.bashrc file, allow a user to source the conda.sh file from another installation, which would override the installation sourced in their ~/.bashrc file.

Potential conda-pack issue?

@StevenYeu and @kenneth59715 reported that NSG is having a problem using the --cache feature when testing HNN + Voila, where they need to remove (rm) the ~/.galyleo directory from their HOME prior to subsequent launches for the environment to start up correctly. The galyleo launch command is:

galyleo launch --account csd403 --partition debug --cpus 16 --memory 32 --time-limit 00:30:00 --conda-env hnn --conda-yml hnn.yaml --mamba --cache

and the conda environment file is ...

name: hnn

channels:
  - defaults

dependencies:
  - python=3.11
  - numpy
  - scipy
  - matplotlib
  - jupyterlab
  - mpi4py
  - ipywidgets
  - lxml
  - pip:
      - voila
      - neuron
      - hnn_core[gui]

I've asked them for a set of the job script(s) and their corresponding standard output to confirm where the issue may be occurring.

Add user-specified module paths via a new option --modulepathadd

Allow users to use their own module files by adding them to MODULEPATH. The option --modulepathadd would allow one or more user-specified paths to be added to MODULEPATH by inserting a command line export MODULEPATH=<specified_path(s)>:$MODULEPATH after module reset in the galyleo script.
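The proposed behavior amounts to something like the following; the paths and the existing MODULEPATH value are hypothetical examples.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of --modulepathadd: prepend one or more user-specified,
# colon-separated paths to MODULEPATH after `module reset`. Values are examples.
MODULEPATH='/cm/shared/modulefiles'     # example value left by `module reset`
modulepathadd='/home/user/modulefiles:/projects/shared/modulefiles'

if [[ -n "${modulepathadd}" ]]; then
  export MODULEPATH="${modulepathadd}:${MODULEPATH}"
fi

echo "${MODULEPATH}"
# → /home/user/modulefiles:/projects/shared/modulefiles:/cm/shared/modulefiles
```

Prepending (rather than appending) lets a user's own module files shadow system-provided modules of the same name, which is the usual Lmod convention.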

Remove support for PBS/Torque schedulers

This limited support for PBS/Torque schedulers was originally provided to deploy galyleo on TSCC @ SDSC. However, TSCC has recently undergone a hardware refresh and system overhaul. The new TSCC2 now runs Slurm like most of the other HPC systems at SDSC. Any future multi-scheduler support for galyleo is only planned to be included once a complete Python-based re-write can be made, which is tentatively named galyleo2. Planned scheduler/job/cloud submission support in galyleo2 will include:

  • Slurm
  • HTCondor
  • K8s
  • OpenStack
  • AWS
  • Azure

Running in debug queue needs specifying amount of memory

The sample script you provide in the README works fine even if I do not specify memory for compute and for shared. However, if I use debug:

galyleo launch --account sds166 --partition debug --cpus 1 --time-limit 00:30:00 --env-modules cpu,gcc,anaconda3

I get:

sbatch: error: Batch job submission failed: Requested node configuration is not available

The workaround is to always specify --memory.

-d, --notebook-dir option not quite working as expected

If you open two notebook sessions on the same system one after another with different working directory paths specified with the -d, --notebook-dir option, the second notebook session still starts in the working directory of the first notebook session. The problem may be Jupyter caching the working directory of the first notebook session in the ~/.jupyter directory. Investigate and provide a fix, if and when possible.

Support all common SHELLs

The current version of galyleo is not compatible with all SHELLs. For example, users of zsh on Expanse have recently reported problems with the new validation checks/tests now performed prior to submitting the galyleo job to the scheduler.

This will not be a simple issue to resolve. But it may be one of the first key issues that could motivate transitioning galyleo from (Ba)sh to Python in the future.

Remove the use of miniconda from galyleo; restrict use of default channels

Anaconda, Inc. has recently changed their terms of service.

https://legal.anaconda.com/policies/en/?name=terms-of-service#terms-of-service

We should plan to replace the use of the miniconda installer and/or make the installer change an option for the end user. And we should attempt to restrict the use of the default Anaconda channels, except to users of galyleo that acknowledge they are aware of the new terms of service from Anaconda, Inc.

Conda-yaml should not be required if a conda environment is specified

While debugging an issue with one of our users, I came across a problem. The user specified a conda-init that does not include Jupyter, but can be used to launch conda. They also specified a conda environment that contains Jupyter. Launching the notebook fails with the error:

/cm/shared/apps/sdsc/galyleo/galyleo: line 459: jupyter: command not found ERROR :: No jupyter executable was found within the software environment. ERROR :: galyleo launch command failed.

On line 414 of galyleo.sh the check only activates the environment specified if a conda-yaml file is also specified.

Why is the yaml file required?

Make command-line parsing of launch options more robust

For example, the --env-modules option will fail when loading multiple modules if there is a space in the comma-separated list.

[mkandes@login02 ~]$ export PATH="/cm/shared/apps/sdsc/galyleo:${PATH}"
[mkandes@login02 ~]$ galyleo launch --account use300 --partition shared --cpus 2 --memory 4 --time-limit 00:30:00 --env-modules cpu/0.17.3b, anaconda3/2021.05
ERROR :: Command-line option anaconda3/2021.05 not recognized or not supported.
ERROR :: galyleo_launch command failed.
[mkandes@login02 ~]$ galyleo launch --account use300 --partition shared --cpus 2 --memory 4 --time-limit 00:30:00 --env-modules cpu/0.17.3b,anaconda3/2021.05
Preparing galyleo for launch into Jupyter orbit ...
Listing all launch parameters ...
  command-line options       : values
    -A | --account           : use300
    -R | --reservation       : 
    -p | --partition         : shared
    -q | --qos               : 
    -N | --nodes             : 1
    -c | --cpus              : 2
    -m | --memory            : 4 GB
    -g | --gpus              : 
       | --gres              : 
    -t | --time-limit        : 00:30:00
    -C | --constraint        : 
    -j | --jupyter           : lab
    -d | --notebook-dir      : 
       | --scratch-dir       : "/scratch/${USER}/job_${SLURM_JOB_ID}"
    -e | --env-modules       : cpu/0.17.3b,anaconda3/2021.05
    -s | --sif               : 
    -B | --bind              : 
       | --nv                : 
       | --conda-init        : 
       | --conda-env         : 
       | --conda-yml         : 
       | --conda-version     : latest
       | --mamba             : false
       | --cache             : false
       | --spark-home        : 
       | --disable-checklist : false
       | --checklist-timeout : 10 s
    -Q | --quiet             : 1
Your token is 
hurling-impose-headstand
200
Generating Jupyter launch script ...
Submitted Jupyter launch script to Slurm. Your SLURM_JOB_ID is 29140433.
Success! Token linked to jobid.
Please copy and paste the HTTPS URL provided below into your web browser.
Do not share this URL with others. It is the password to your Jupyter notebook session.
Your Jupyter notebook session will begin once compute resources are allocated to your job by the scheduler.
https://hurling-impose-headstand.expanse-user-content.sdsc.edu?token=7261f043f897ad2253f875f7e

See https://sdsc.zendesk.com/agent/tickets/34202
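One piece of such hardening could be to strip stray whitespace before splitting the module list. The split_modules function below is a hypothetical sketch; note it only helps when the whitespace arrives inside a single argument, since a bare, unquoted space splits the list into separate arguments before galyleo ever sees them.

```shell
#!/usr/bin/env bash
# Hypothetical hardening sketch: strip stray whitespace around commas before
# splitting the --env-modules list, so 'cpu/0.17.3b, anaconda3/2021.05'
# (quoted as one argument) still parses correctly.
split_modules () {
  local raw="${1//[[:space:]]/}"   # drop all whitespace within the argument
  local IFS=','                    # scoped to this function; no global reset
  local -a modules
  read -r -a modules <<< "${raw}"
  printf '%s\n' "${modules[@]}"
}

split_modules 'cpu/0.17.3b, anaconda3/2021.05'
# → cpu/0.17.3b
# → anaconda3/2021.05
```

Declaring IFS local to the function also avoids the global-IFS pitfall described in the next issue.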

Create command-line option to bypass pre-launch validation checklist

New validation checks/tests added in the past month are not SHELL agnostic and can create problems with users not using BASH as their SHELL. A simple bypass command-line option to avoid this validation checklist prior to job submission to the scheduler should help these users continue to use galyleo in the near-term. However, long-term the best solution would be to either create SHELL agnostic checks/tests and/or create checks/tests that are specific to the most common SHELLs in use and have galyleo use the appropriate checks based on the user's SHELL.

Potential IFS-related bug in galyleo launch command's checklist section

@mulroony reported a potential IFS-related bug in the galyleo launch command's checklist section of the code. This was his full report and suggested fix provided in ZEN-33452.

Was looking at an issue with getting Galyleo to work and found a bug. In
the section where you do the checklist (starts at line 350) in the
section that checks the passed environment modules (starting at line
398). The variable IFS is changed and then unset. Doing so changes the
behavior of line 476. Instead of being one line it splits it into 3
which is unexpected by the parsing.

I would recommend changing the IFS code, line 398 from:

    # Check if all environment modules specified by the user, if
    # any, are available and can be loaded successfully. If not,
    # then halt the launch.
    if [[ -n "${env_modules}" ]]; then
      IFS=','
      read -r -a modules <<< "${env_modules}"
      unset IFS
      for module in "${modules[@]}"; do
        module load "${module}"
        if [[ $? -ne 0 ]]; then
          slog error -m "module not found: ${module}"
          return 1
        fi
      done
    fi

To:

    # Check if all environment modules specified by the user, if
    # any, are available and can be loaded successfully. If not,
    # then halt the launch.
    if [[ -n "${env_modules}" ]]; then
      OLDIFS="${IFS}"
      IFS=','
      read -r -a modules <<< "${env_modules}"
      IFS="${OLDIFS}"
      unset OLDIFS
      for module in "${modules[@]}"; do
        module load "${module}"
        if [[ $? -ne 0 ]]; then
          slog error -m "module not found: ${module}"
          return 1
        fi
      done
    fi

This bug is likely the root cause of this previously reported launch issue:

Your token is
nuptials-recopy-epidermal
200
/cm/shared/apps/sdsc/galyleo/galyleo: line 476: is
nuptials-recopy-epidermal
200: syntax error in expression (error token is "nuptials-recopy-epidermal
200")

For example, see these other Zendesk tickets for reference:

The prior quick fix solution was simply to utilize the --disable-checklist command-line option.
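The underlying pitfall can be demonstrated in isolation. This standalone snippet is not galyleo code, just a minimal illustration of why restoring IFS differs from unsetting it: with IFS unset, word splitting falls back to the default whitespace behavior, which silently breaks any later code relying on a custom IFS.

```shell
#!/usr/bin/env bash
# Minimal demonstration (not galyleo code) of why restoring IFS differs
# from unsetting it: later code that relies on a custom IFS silently breaks.
IFS=':'                         # pretend earlier code set this on purpose
saved_ifs="${IFS}"

# Buggy pattern: unset IFS after the comma split.
IFS=','
read -r -a modules <<< 'cpu/0.17.3b,anaconda3/2021.05'
unset IFS                       # IFS now falls back to default whitespace
read -r -a fields <<< 'x:y:z'
echo "${#fields[@]}"            # → 1 (colon splitting is gone)

# Fixed pattern: restore the saved value instead.
IFS="${saved_ifs}"
read -r -a fields <<< 'x:y:z'
echo "${#fields[@]}"            # → 3
```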

Implement error check to make sure environment.yml file exists? And path problem?

WARNING :: Using a packaged conda environment; cannot check if Jupyter is available prior to launch.
Your token is 
atom-stoning-barmaid
200
Generating Jupyter launch script ...
cp: cannot stat ‘/home/mkandes/galyleo/notebooks-sharing.yml’: No such file or directory
md5sum: notebooks-sharing.md5: No such file or directory
Submitted Jupyter launch script to Slurm. Your SLURM_JOB_ID is 43024488.
Success! Token linked to jobid.
Please copy and paste the HTTPS URL provided below into your web browser.
Do not share this URL with others. It is the password to your Jupyter notebook session.
Your Jupyter notebook session will begin once compute resources are allocated to your job by the scheduler.
https://atom-stoning-barmaid.comet-user-content.sdsc.edu?token=63d94214399ab284db5c8e9732c71728
[mkandes@comet-ln3 ~]$
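A minimal version of the missing pre-flight check might look like the following; check_conda_yml is a hypothetical function name, and the error message merely mimics galyleo's log style.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail fast when the file passed via
# --conda-yml does not exist, instead of failing later at the cp/md5sum
# stage after the job has already been submitted to the scheduler.
check_conda_yml () {
  local conda_yml="$1"
  if [[ ! -f "${conda_yml}" ]]; then
    echo "ERROR :: conda environment file not found: ${conda_yml}" >&2
    return 1
  fi
}

check_conda_yml '/path/that/does/not/exist.yml' 2> /dev/null \
  || echo 'halting launch'
# → halting launch
```

Resolving the path with something like `readlink -f` before the check would also address the relative-vs-absolute path confusion visible in the `cp` error above.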

job doesn't leave queue on TSCC

Hi @mkandes,

Thanks for creating such a useful tool! I am really looking forward to using this on a day to day basis.

I've been having some trouble with it, and I'm wondering if you might have any suggestions. Basically, the notebook job never seems to leave the queue. I'm running on TSCC with the following command:

galyleo launch --account amassara --partition home-gymrek --cpus 1 --memory 2 --time-limit 00:30:00 --conda-env jupyter --conda-init /home/amassara/miniconda3/etc/profile.d/conda.sh

I've installed jupyter lab in a conda environment named jupyter. The galyleo launch command seems to work fine, but when I navigate to the webpage that it gives me, it just says that it's been submitted to the queue. It never leaves the queue after that.
I was worried that this was just because the queue was overwhelmed, so I tried to submit another job to the queue with a longer requested walltime. The longer job got executed but the notebook job never did. Also, I looked to see if there were any other jobs queued up at the time, and there weren't.

I would appreciate any suggestions you might be able to give.

Also, quick question: Does the -A option refer to our own account username? Or should that be the lab's ID, like gymreklab or something similar?
