
deepspeed-windows's Introduction

DeepSpeed Version 0.13.1 with CUDA 12.1 - Installation Instructions:

  1. Download the 0.13.1 release of DeepSpeed and extract it to a folder.

  2. Install Visual C++ build tools, such as VS2019 C++ x64/x86 build tools.

  3. Download and install the NVIDIA CUDA Toolkit 12.1.

  4. Edit your Windows environment variables so that CUDA_HOME and CUDA_PATH are both set to your NVIDIA CUDA Toolkit path (the folder above the bin folder that nvcc.exe is installed in). Examples are:
    set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
    set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1

  5. OPTIONAL: If you do not already have a Python environment, you can install Miniconda, then at a command prompt create and activate your environment with:
    conda create -n pythonenv python=3.11
    activate pythonenv

  6. Launch the Command Prompt (cmd) with Administrator privileges, because admin rights are required to create symlink folders.

  7. Install PyTorch 2.1.2 with CUDA 12.1 into your Python 3.11 environment, e.g.:
    activate pythonenv (activate your python environment)
    conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia

  8. In your Python environment, check that CUDA_HOME and CUDA_PATH still point to the correct location:
    set (lists the Windows environment variables so you can check them; refer to step 4 if they are wrong)
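
The check in step 8 can also be done from Python. The following is a minimal sketch, not part of DeepSpeed: the helper name check_cuda_env is hypothetical, and it only tests the rule given in step 4, namely that each variable points to a folder whose bin subfolder contains nvcc.exe.

```python
import os
from pathlib import Path


def check_cuda_env(env=None, nvcc_name="nvcc.exe"):
    """Return a list of problems found with CUDA_HOME and CUDA_PATH (empty if none)."""
    env = os.environ if env is None else env
    problems = []
    for name in ("CUDA_HOME", "CUDA_PATH"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not (Path(value) / "bin" / nvcc_name).is_file():
            # Step 4: the variable should name the folder above the bin folder.
            problems.append(f"{name}={value} has no bin/{nvcc_name}")
    return problems


if __name__ == "__main__":
    issues = check_cuda_env()
    for issue in issues:
        print("[WARNING]", issue)
    if not issues:
        print("CUDA_HOME and CUDA_PATH look correct.")
```

Run it inside the activated environment. An empty problem list means both variables resolve to a toolkit folder that contains nvcc.exe.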

  9. Navigate to your deepspeed folder in the Command Prompt:
    cd c:\deepspeed (wherever you extracted it to)

  10. Modify the following files:

deepspeed-0.13.1/build_win.bat - at the top of the file, add:

set DS_BUILD_EVOFORMER_ATTN=0
set DS_BUILD_CUTLASS_OPS=0
set DS_BUILD_RAGGED_DEVICE_OPS=0
set DS_BUILD_INFERENCE_CORE_OPS=0
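
The build_win.bat edit above can also be scripted. This is a rough sketch under assumptions: the helper name prepend_flags is hypothetical, and the script simply writes any of the four lines that are missing at the very top of the file, so running it twice does not duplicate them.

```python
from pathlib import Path

# The four lines listed above for build_win.bat.
FLAGS = [
    "set DS_BUILD_EVOFORMER_ATTN=0",
    "set DS_BUILD_CUTLASS_OPS=0",
    "set DS_BUILD_RAGGED_DEVICE_OPS=0",
    "set DS_BUILD_INFERENCE_CORE_OPS=0",
]


def prepend_flags(bat_path, flags=FLAGS):
    """Insert any missing flag lines at the top of the batch file; return how many were added."""
    path = Path(bat_path)
    text = path.read_text()
    # Compare case-insensitively, since batch commands are not case-sensitive.
    existing = {line.strip().lower() for line in text.splitlines()}
    missing = [flag for flag in flags if flag.lower() not in existing]
    if missing:
        path.write_text("\n".join(missing) + "\n" + text)
    return len(missing)


if __name__ == "__main__":
    target = Path("deepspeed-0.13.1/build_win.bat")
    if target.is_file():
        print(f"Added {prepend_flags(target)} line(s) to {target}")
    else:
        print(f"{target} not found; run this from the folder that contains deepspeed-0.13.1")
```

Keep a copy of the original build_win.bat before running a script like this, so the change can be reverted by hand.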

deepspeed-0.13.1/csrc/quantization/pt_binding.cpp - lines 244-250 - change to:

    std::vector<int64_t> sz_vector(input_vals.sizes().begin(), input_vals.sizes().end());
    sz_vector[sz_vector.size() - 1] = sz_vector.back() / devices_per_node;  // num of GPU per nodes
    at::IntArrayRef sz(sz_vector);
    auto output = torch::empty(sz, output_options);

    const int elems_per_in_tensor = at::numel(input_vals) / devices_per_node;
    const int elems_per_in_group = elems_per_in_tensor / (in_groups / devices_per_node);
    const int elems_per_out_group = elems_per_in_tensor / out_groups;
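
To make the arithmetic in the replacement lines easier to follow, here is a small Python sketch that mirrors it. It is an illustration only and not DeepSpeed code: the function name and the example values are hypothetical, and plain integers stand in for the tensor shape.

```python
from math import prod


def reduced_shape_and_group_sizes(shape, devices_per_node, in_groups, out_groups):
    """Mirror the integer arithmetic of the replacement C++ lines."""
    # Copy the input shape and divide the last dimension by the GPUs per node.
    out_shape = list(shape)
    out_shape[-1] = out_shape[-1] // devices_per_node

    # Same integer divisions as in the C++ code.
    elems_per_in_tensor = prod(shape) // devices_per_node
    elems_per_in_group = elems_per_in_tensor // (in_groups // devices_per_node)
    elems_per_out_group = elems_per_in_tensor // out_groups
    return out_shape, elems_per_in_group, elems_per_out_group


if __name__ == "__main__":
    # Hypothetical values: a 4 x 1024 input, 8 GPUs per node, 16 input groups, 2 output groups.
    print(reduced_shape_and_group_sizes((4, 1024), 8, 16, 2))
    # prints ([4, 128], 256, 256)
```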

deepspeed-0.13.1/csrc/transformer/inference/csrc/pt_binding.cpp - lines 541-542 - change to:

    {static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
     static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),

lines 550-551 - change to:

    {static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
     static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),

line 1581 - change to:

    at::from_blob(intermediate_ptr, {input.size(0), input.size(1), static_cast<int64_t>(mlp_1_out_neurons)}, options);

deepspeed-0.13.1/deepspeed/env_report.py - line 10 - add:

import psutil

lines 83-100 - change to:

def get_shm_size():
    try:
        temp_dir = os.getenv('TEMP') or os.getenv('TMP') or os.path.join(os.path.expanduser('~'), 'tmp')
        shm_stats = psutil.disk_usage(temp_dir)
        shm_size = shm_stats.total
        shm_hbytes = human_readable_size(shm_size)
        warn = []
        if shm_size < 512 * 1024**2:
            warn.append(
                f" {YELLOW} [WARNING] Shared memory size might be too small, consider increasing it. {END}"
            )
            # Add additional warnings specific to your use case if needed.
        return shm_hbytes, warn
    except Exception as e:
        return "UNKNOWN", [f"Error getting shared memory size: {e}"]

  11. While still in your command prompt with the Python environment activated, run:
    build_win.bat

  12. Once the build is done, there should be a .whl file in:
    deepspeed-0.13.1/dist/
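
If you are unsure which file the build produced, a short script can list it. This is a minimal sketch: the helper name newest_wheel is hypothetical, and it assumes the script is run from the folder that contains deepspeed-0.13.1.

```python
from pathlib import Path


def newest_wheel(dist_dir):
    """Return the most recently modified .whl file in dist_dir, or None if there is none."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    wheel = newest_wheel("deepspeed-0.13.1/dist")
    if wheel is None:
        print("No .whl file found; check the build output for errors.")
    else:
        print(wheel)
```

The printed path is the file to copy in the next step.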

  13. Copy that file to the root of your Oobabooga folder and run:
    cmd_windows.bat
    pip install deepspeed-YOURFILENAME.whl (or whatever name the .whl you just created has)

  14. To check that it is working correctly, you can type the following:
    set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
    set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
    (This is only needed to make ds_report work and to check that DeepSpeed is correctly installed; it shouldn't be needed for TTS generation.)
    ds_report
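
As a lighter check before running ds_report, you can ask Python whether the wheel was installed into the active environment. This sketch uses only the standard library. The helper name installed_version is hypothetical, and it only reads the package metadata written by pip, so it does not replace ds_report for checking the compiled ops.

```python
from importlib import metadata


def installed_version(package="deepspeed"):
    """Return the installed version string of the package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("deepspeed is not installed in this environment")
    else:
        print(f"deepspeed {version} is installed")
```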

deepspeed-windows's People

Contributors

s95sedan
