laion_idle_cap's Introduction

NOTE: INTENDED FOR INTERNAL USE ONLY

This is an internal repository of LAION. Using this script outside the LAION cluster will fail.

Dockerfile for Idle Captioning

This script generates synthetic captions for images of the LAION text-image datasets to utilize GPUs during 'idle' periods.

Setup & Run

Installing and running the captioning script on a fresh machine:

  1. Run git clone https://github.com/andreaskoepf/laion_idle_cap.git to clone this repository on the new machine.
  2. Run cd laion_idle_cap to change to the newly created directory.
  3. Run ./install_docker.sh to install nvidia-docker.
  4. Run ./pull.sh to pull the captioning docker image which contains all dependencies.
  5. Run ./start.sh --gpus 0-7 --workers 2 to start the captioning script (detached) in a new docker container. If the --gpus option is omitted, all available GPUs are used. To select specific devices, use comma-separated device indices or index ranges (e.g. 1-3 or 0,2,4). The --workers option launches more than one worker per GPU (2 is recommended for full GPU utilization; the default is 1).
  6. Optionally run ./attach.sh to attach your terminal to the running instance of the captioning script and see its output.
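
The device selection syntax from step 5 can be illustrated with a small sketch. This is not the parsing code used by start.sh; the function name parse_gpu_spec is hypothetical, and whether start.sh accepts indices and ranges mixed in one value (e.g. 0,2-3) is an assumption.

```python
from typing import List


def parse_gpu_spec(spec: str) -> List[int]:
    """Expand a spec such as "0-7" or "0,2,4" into sorted, unique device indices.

    Hypothetical helper for illustration; start.sh may parse its --gpus value differently.
    """
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in GPU spec {spec!r}")
        if "-" in part:
            # A range such as "1-3" is inclusive on both ends.
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
            if start > end:
                raise ValueError(f"descending range {part!r} in GPU spec {spec!r}")
            indices.update(range(start, end + 1))
        else:
            indices.add(int(part))
    return sorted(indices)


if __name__ == "__main__":
    print(parse_gpu_spec("0-7"))
    print(parse_gpu_spec("0,2,4"))
```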

Note: Some of the scripts (e.g. start.sh and stop.sh) fail when they are launched by a user who is not a member of the docker group (e.g. you might see an error like 'permission denied to connect to the docker daemon socket'). In this case, use sudo to run them as superuser.

Stopping the Docker Container

  • run ./stop.sh or docker stop laion_cah

Other Script Files

  • start_bash.sh starts the docker container and launches bash (starts attached; the source is mounted at /mnt/src)
  • start_dev.sh maps the file docker/c_h2.py into a new docker container and starts the script attached (useful for testing changes made outside the docker container, e.g. during development).
  • build.sh builds the docker image (e.g. laion_idle_cah:v0)
  • save_image.sh writes the docker image into a tar file
  • push.sh pushes the docker image to Docker Hub

laion_idle_cap's People

Contributors

andreaskoepf

laion_idle_cap's Issues

option to spin up single-GPU jobs respectively bound to idle GPUs

Rather than a single container allocating all of the available GPUs, assigning a container to each GPU makes it a bit more flexible. Full motivation:

So one of the nodes I'm on is a general-purpose compute node that I share with other members of a working group.
I'm imagining that when the node is idle, we spin up the LAION idle-cap container and point it at however many GPUs are idle.
But that means that if someone wants to grab one of those GPUs, they need to stop the container and restart it with a different GPU count, which stops the process everywhere.
So my thought is: what if we modify the start.sh script to map one container to each GPU individually? That way, when someone wants GPUs, they can just stop the idle-cap jobs mapped to those GPUs, which in turn could be trivially resumed since the respective containers were just stopped.
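
A rough sketch of the one-container-per-GPU idea, assuming the image name from the build.sh example (laion_idle_cah:v0). The helper per_gpu_run_commands and the container name prefix laion_cah_gpu are hypothetical, and the real start.sh may pass additional options (mounts, worker counts) that are not shown here. The sketch only builds the docker run argument lists; it does not start anything.

```python
from typing import Iterable, List

# Image name taken from the build.sh example; adjust to the image actually in use.
IMAGE = "laion_idle_cah:v0"


def per_gpu_run_commands(
    gpu_indices: Iterable[int],
    image: str = IMAGE,
    name_prefix: str = "laion_cah_gpu",  # hypothetical naming scheme
) -> List[List[str]]:
    """Build one detached `docker run` command per GPU index."""
    commands = []
    for idx in gpu_indices:
        commands.append([
            "docker", "run", "--detach",
            "--name", f"{name_prefix}{idx}",
            # Expose exactly one device to this container.
            "--gpus", f"device={idx}",
            image,
        ])
    return commands


if __name__ == "__main__":
    for cmd in per_gpu_run_commands([0, 1, 2, 3]):
        print(" ".join(cmd))
```

With such a layout, freeing GPU 3 would only require docker stop laion_cah_gpu3, and docker start laion_cah_gpu3 would bring that container back later, leaving the other containers untouched.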

Additionally, we could probably add a background daemon that checks whether any GPUs have been idle for a while and incrementally annexes them into the captioning job pool.
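
The idle check for such a daemon might look roughly like the following sketch. It relies on the nvidia-smi CSV query interface; the function names and the 5% utilization threshold are assumptions made for illustration. A single sample does not show that a GPU has been idle "for a while", so a real daemon would need to take several samples over time before claiming a device.

```python
import subprocess
from typing import List

# Query index and utilization per GPU as bare CSV, e.g. "0, 0\n1, 97\n".
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_idle_gpus(csv_text: str, max_util: int = 5) -> List[int]:
    """Return indices of GPUs whose utilization is at or below max_util percent.

    max_util=5 is an arbitrary threshold chosen for this sketch.
    """
    idle = []
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        if not util.isdigit():
            # Skip devices that do not report a numeric utilization value.
            continue
        if int(util) <= max_util:
            idle.append(int(index))
    return idle


def query_idle_gpus(max_util: int = 5) -> List[int]:
    """Take one utilization sample from nvidia-smi and return the idle GPU indices."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_idle_gpus(result.stdout, max_util)
```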

File system usage

Hi, could you please explain where the results are stored? Is it off-node? We do not want the disk to fill up completely.

detecting defective GPU

In nvidia-smi we see a GPU staying at 0% at all times. Is there a test we can run to determine whether the GPU is defective?

ubuntu in docker group

Note: Some of the scripts (e.g. start.sh and stop.sh) fail when they are launched by a user who is not a member of the docker group (e.g. you might see an error like 'permission denied to connect to the docker daemon socket'). In this case, use sudo to run them as superuser.

From my experiments, the installer asks whether the ubuntu user should be added to the docker group, with Y as the default answer. This requires a reboot, after which pull.sh and the rest can be run as the ubuntu user.
