
FalconTamers

28th June:- I know how to fine-tune open-source LLMs

8th July:- Deployed fine-tuned Falcon 7B on AWS SageMaker

14th July:- Increased the inference speed of LLaMA 7B by 24x using vLLM and deployed it

20th July:- Production-ready fine-tuned LLM deployed using vLLM

17th May'24:- Adjusted the dependencies to bring the AI back to life

Start a GPU-enabled instance and run `pip install vllm`. If that doesn't work, build vLLM from source (consult the vLLM docs at vllm.ai). Start the server with `python -m vllm.entrypoints.api_server --model FarziBuilder/LLama-remark-try2 --host 127.0.0.1 --port 8080`, then run inference.py (a client sketch follows below).
Change the `--model` argument to run the right model; consult my HF profile (FarziBuilder).
Don't use the above command if instantiating multiple workers; in that case, update api_server.py, add the run.py script, and run that instead (see the sketch under "What each doc does").
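
For reference, a minimal inference.py-style client (a sketch: the request/response shape matches the legacy `vllm.entrypoints.api_server` demo, and the `generate` helper name is my own):

```python
# Minimal client for the vLLM demo api_server started above (a sketch).
import requests

API_URL = "http://127.0.0.1:8080/generate"  # host/port from the server command

def generate(prompt: str, max_tokens: int = 100) -> str:
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,   # sampling params are passed straight through
        "temperature": 0.7,
        "stream": False,
    }
    resp = requests.post(API_URL, json=payload)
    resp.raise_for_status()
    # The demo server returns {"text": ["<prompt + completion>"]}.
    return resp.json()["text"][0]

if __name__ == "__main__":
    print(generate("Write a short remark about falcons:"))
```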

Note:-

  • The time to first token is affected by the max number of tokens requested: generating everything at once may take >2.5 sec, while generating in small steps (20 tokens at a time) may take only ~0.5 sec per step (see the sketch below).
  • LLaMA v1 has been fine-tuned for only 80 steps on a 790-example dataset, so the quality can easily be improved.
  • You can only specify the max number of tokens to generate and hope the whole statement fits within that limit; sometimes it ends mid-sentence.
  • This is an API endpoint, so it needs to be hosted on AWS EC2 (Sashakt said this wouldn't be an issue). I still need to learn how to host multiple workers.
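
To illustrate the small-steps point, a hedged sketch that requests 20 tokens per call and feeds each result back in as the next prompt (it reuses the hypothetical `generate()` helper from the client sketch above):

```python
# Generate in chunks of 20 tokens so the first chunk arrives in ~0.5 sec
# instead of waiting >2.5 sec for one large completion (a sketch).
def generate_in_steps(prompt: str, total_tokens: int = 100, step: int = 20) -> str:
    text = prompt
    for _ in range(total_tokens // step):
        text = generate(text, max_tokens=step)  # server echoes prompt + new tokens
        if text.rstrip().endswith((".", "!", "?")):  # crude end-of-statement check
            break
    return text
```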

What each doc does

finalVLLMtrainer:- Fine-tunes LLaMA v1 models and hosts them on HF
fastInference.py:- Use this script once the model inference endpoint is set up
settingSagemaker.py:- Deploys on AWS SageMaker and creates an endpoint there. You need the fine-tuned model packaged as a model.tar.gz file for that (a sketch follows below).
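
A sketch of what such a SageMaker deployment looks like with the `sagemaker` SDK (the S3 path and container versions below are placeholders, not the repo's actual values):

```python
# Deploy a fine-tuned model packaged as model.tar.gz to a SageMaker endpoint.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # assumes this runs inside SageMaker

model = HuggingFaceModel(
    model_data="s3://<your-bucket>/model.tar.gz",  # placeholder S3 path
    role=role,
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
)

# Creates a real-time inference endpoint.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")
print(predictor.endpoint_name)
```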

LLaMaTrainer.ipynb / falcon7-try3 / falcon-try3-works.ipynb:- Can be ignored; these are the notebooks first used for fine-tuning LLaMA and Falcon
api_server.py:- The script in vllm/vllm/entrypoints that needs to be changed to instantiate multiple workers
run.py:- Run this new script to instantiate multiple workers (a guessed sketch follows below)
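
A guess at what a minimal run.py could look like, assuming api_server.py was edited so the app and engine are created at import time (required for more than one Uvicorn worker):

```python
# run.py — hypothetical multi-worker launcher (a sketch, not the repo's script).
# Assumes vllm/entrypoints/api_server.py was edited so the AsyncLLMEngine is
# built at import time rather than only under `if __name__ == "__main__":`.
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "vllm.entrypoints.api_server:app",  # import string required for workers > 1
        host="127.0.0.1",
        port=8080,
        workers=2,  # each worker loads its own copy of the model weights
    )
```

Note that each worker holds its own copy of the model, so GPU memory caps the worker count.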

New learnings

  • Earlier versions of bitsandbytes don't support CUDA 12.4, so update it accordingly.
  • Remember to replace the CPU bitsandbytes binary (libbitsandbytes_cpu.so) with the matching CUDA build (a quick check follows below).
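
A quick sanity check (a sketch; recent bitsandbytes wheels also ship a self-diagnostic via `python -m bitsandbytes`):

```python
# Check that torch sees the GPU and that bitsandbytes loads a CUDA backend.
import torch
import bitsandbytes as bnb  # warns on import if it falls back to the CPU binary

print("torch CUDA version:", torch.version.cuda)   # e.g. "12.4"
print("GPU available:", torch.cuda.is_available())
```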
