openhackathons-org / end-to-end-llm

This repository contains AI Bootcamp material consisting of an end-to-end workflow for LLMs.

License: Apache License 2.0

Jupyter Notebook 55.73% Shell 6.22% Python 37.79% HTML 0.26% JavaScript 0.01%
deep-learning natural-language-processing p-tuning prompt-tuning nemo-megatron llm nemo-guardrails question-answering tensorrt-llm genai

end-to-end-llm's Introduction

End-to-End LLM Bootcamp

The End-to-End LLM (Large Language Model) Bootcamp is designed from a real-world perspective and follows the data processing, development, and deployment pipeline paradigm. Attendees walk through the workflow of preprocessing the SQuAD (Stanford Question Answering Dataset) dataset for the Question Answering task, training BERT (Bidirectional Encoder Representations from Transformers) on the dataset, and executing a prompt learning strategy using NVIDIA® NeMo™ and a transformer-based language model, NVIDIA Megatron. Attendees will also learn to optimize an LLM using NVIDIA TensorRT™, an SDK for high-performance deep learning inference; to guardrail prompts to and responses from the LLM using NeMo Guardrails; and to deploy the AI pipeline using NVIDIA Triton™ Inference Server, open-source software that standardizes AI model deployment and execution across every workload.

Bootcamp Content

This content contains three labs, plus an introductory notebook and two lab activity notebooks:

  • Overview of End-To-End LLM bootcamp
  • Lab 1: Megatron-GPT
  • Lab 2: TensorRT-LLM and Triton Deployment with the Llama-2-7B Model
  • Lab 3: NeMo Guardrails
  • Lab Activity 1: Question Answering task
  • Lab Activity 2: P-tuning/Prompt tuning task

Tools and Frameworks

The tools and frameworks used in the Bootcamp material are NVIDIA NeMo, NVIDIA Megatron, NVIDIA TensorRT-LLM, NeMo Guardrails, and NVIDIA Triton Inference Server.

Tutorial duration

The total Bootcamp material takes approximately 8 hours and 45 minutes to teach. We recommend dividing the material into two days, covering Lab 1 in one session and the rest in the next session.

Deploying the Bootcamp Material

To deploy the Labs, please refer to the Deployment guide presented here

Attribution

This material originates from the OpenHackathons Github repository. Check out additional materials here

Don't forget to check out additional Open Hackathons Resources and join our OpenACC and Hackathons Slack Channel to share your experience and get more help from the community.

Licensing

Copyright © 2023 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials may include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.

end-to-end-llm's People

Contributors

muntasers, programmah

end-to-end-llm's Issues

Feature Request - Addition of Benchmarking for TRT-LLM

The current TRT-LLM materials discuss the hands-on aspects of getting from a model to deployment on a Triton server.

Given that TRT-LLM focuses on performance, we could add a section that discusses the performance aspects of TRT-LLM and the various optimisations available to the end user.

Issue: Deployment guide is outdated or incorrect

The deployment guide states the following:

When you are inside the container, launch JupyterLab: jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace. Open the browser at http://localhost:8888 and click on the Start_here.ipynb notebook.

But when building the container there is no actual Start_here.ipynb (unless you go to archived/workspace, which suggests that it is either deprecated or that it is not well defined where the notebook should be found).

Issue: NeMo container library issues and Start_Here.ipynb links conflict issues for different containers

NeMo container issues:

  • Unable to download the dataset due to a gdown library issue; the gdown library requires an upgrade within the NeMo container.
  • Unable to connect to the server with the NeMo-LLM service; to solve the issue, the NeMo Guardrails library requires an upgrade within the container.
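A minimal sketch of the upgrade workaround, assuming pip is available inside the NeMo container; the package names come from the issue text above, and the command is built but not executed so it can be inspected first:

```python
# Sketch of the workaround: upgrade gdown and nemoguardrails inside the
# running NeMo container (run from a notebook cell or container shell).
import subprocess
import sys

def pip_upgrade(packages):
    """Build a `pip install --upgrade` command for the given packages."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", *packages]

cmd = pip_upgrade(["gdown", "nemoguardrails"])
# subprocess.run(cmd, check=True)  # uncomment to actually perform the upgrade
```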

Start_Here.ipynb links conflict issues for different containers:

  • Users sometimes click on labs that run on different containers and get errors. To avoid this issue, separate Start_Here.ipynb notebooks should be created for each lab.

Issue: Nemo_primer.ipynb imports not working.

In Nemo_primer.ipynb, running import nemo.collections.asr as nemo_asr, import nemo.collections.nlp as nemo_nlp, and import nemo.collections.tts as nemo_tts raises the following error:

```
ImportError: tokenizers>=0.11.1,!=0.11.3,<0.14 is required for a normal functioning of this module, but found tokenizers==0.15.2.
```

If I try to solve it with pip install tokenizers==0.13.1, I get this other error:

```
File /usr/local/lib/python3.10/dist-packages/pytorch_lightning/_graveyard/utilities.py:25
     17 def _get_gpu_memory_map() -> None:
     18     # TODO: Remove in v2.0.0
     19     raise RuntimeError(
     20         "pytorch_lightning.utilities.memory.get_gpu_memory_map was deprecated in v1.5 and is no longer supported"
     21         " as of v1.9. Use pytorch_lightning.accelerators.cuda.get_nvidia_gpu_stats instead."
     22     )
---> 25 pl.utilities.memory.get_gpu_memory_map = _get_gpu_memory_map

AttributeError: partially initialized module 'pytorch_lightning' has no attribute 'utilities' (most likely due to a circular import)
```

It might be helpful to pin the desired package versions in the pip installs inside Dockerfile_nemo, because

```
RUN pip install lightning
RUN pip install megatron.core
RUN pip install --upgrade nemoguardrails
RUN pip install openai
RUN pip install ujson
RUN pip install --upgrade --no-cache-dir gdown
```

installs new and incompatible versions of the libraries (incompatible with the tutorials shown in the notebooks).
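As a quick sanity check before running the notebooks, the version range quoted in the ImportError above (tokenizers>=0.11.1,!=0.11.3,<0.14) can be tested against an installed version. This helper is purely illustrative, not part of the repository:

```python
# Hypothetical helper: check a tokenizers version string against the range
# NeMo's error message demands (tokenizers>=0.11.1,!=0.11.3,<0.14).
def satisfies_nemo_tokenizers(ver: str) -> bool:
    parts = tuple(int(p) for p in ver.split(".")[:3])
    if parts == (0, 11, 3):          # the explicitly excluded version
        return False
    return (0, 11, 1) <= parts < (0, 14, 0)

print(satisfies_nemo_tokenizers("0.13.1"))  # True  -> compatible pin
print(satisfies_nemo_tokenizers("0.15.2"))  # False -> the version the issue reports
```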

Issue: 98 - Address already in use, Unable to download MegatronGPT 1.3B, and Triton Server issue

  • Downloading the MegatronGPT 1.3B model from Google Drive fails intermittently, delaying the Lab Activity 2 notebook. Google Drive restricts permissions when it detects multiple download requests. The solution is to download the files ahead of time, before mounting the workspace into the container.
  • An "errno: 98 - Address already in use" error occurs when running the trainer.fit() cell within the Prompt/p-tuning notebook. The solution is to set the DDP master port to an unused value before calling trainer.fit() (e.g., via os.environ['MASTER_PORT']).
  • A Triton Server error, "mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated", occurs due to several jobs running on the same node. The issue can be resolved by modifying a line in the launch_triton_server.py script, from:

```python
cmd += ' -n 1 {} --model-repository={} --disable-auto-complete-config --backend-config=python,shm-region-prefix-name=prefix{} : '.format(tritonserver, model_repo, i)
```

to

```python
cmd += ' -n 1 {} --model-repository={} --disable-auto-complete-config --backend-config=python,shm-region-prefix-name=prefix{} : '.format(tritonserver, model_repo, str(i) + os.environ['USER'])
```
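The DDP-port workaround described above can be sketched as follows. The free-port selection via a throwaway socket is an illustrative approach, not the repository's own code; any unused port value would do:

```python
# Sketch of the "Address already in use" workaround: pick a free port and
# export it as MASTER_PORT before calling trainer.fit().
import os
import socket

def set_free_master_port():
    """Bind to port 0 so the OS picks a free port, then export it for DDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        port = s.getsockname()[1]
    os.environ["MASTER_PORT"] = str(port)
    return port

port = set_free_master_port()
# trainer.fit(model)  # would now use the freshly chosen DDP port
```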

Feature Request: Fine-tune Llama-2-7B with Custom Dataset

This feature request is required as part of an end-to-end pipeline. The process should include:

  • dataset preprocessing
  • use of a PEFT method to fine-tune Llama-2-7B for a text generation task
  • merging the fine-tuned and base models
  • inferencing

Issue: Many unnecessary files and folders within the NeMo Guardrails lab

Many unnecessary files and folders are included within the NeMo Guardrails lab, making navigation difficult. The lab should not contain the entire cloned repository, but only a folder with the needed files, folders, and notebooks. The Deployment_Guide.md file should explicitly state the services and requirements (OpenAI and NeMo LLM Service) needed to run the lab.

Feature Request: Validating prompt response from Triton server using NeMo Guardrails

This feature request is about creating content that demonstrates how to connect NeMo Guardrails to a Llama-2-7b-chat TensorRT engine deployed on Triton Inference Server. This approach avoids the need for an OpenAI key and bypasses the NeMo-LLM Service when using NeMo Guardrails to guard user prompts to/from the deployed model. The LangChain framework can be used to achieve the task.
The feature is required to complete the end-to-end LLM pipeline.
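A rough sketch of the first half of this idea: querying a TensorRT-LLM engine behind Triton over HTTP so a local wrapper, rather than OpenAI, backs the guardrails. The endpoint URL, model name ("ensemble"), and request fields ("text_input", "max_tokens", "text_output") are assumptions based on a typical TensorRT-LLM deployment and must be adjusted to the actual setup:

```python
# Hypothetical client for a Llama-2-7B TensorRT-LLM engine served by Triton.
# All endpoint and field names are assumptions, not the repository's code.
import json
import urllib.request

TRITON_URL = "http://localhost:8000/v2/models/ensemble/generate"  # assumed

def build_generate_payload(prompt: str, max_tokens: int = 64) -> dict:
    """Request body for the assumed generate-endpoint schema."""
    return {"text_input": prompt, "max_tokens": max_tokens}

def query_triton(prompt: str) -> str:
    """POST a prompt to Triton and return the generated text."""
    req = urllib.request.Request(
        TRITON_URL,
        data=json.dumps(build_generate_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text_output"]
```

A custom LangChain LLM wrapper around query_triton() could then be handed to NeMo Guardrails so prompts and responses are guarded entirely locally.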
