
ReplitLM

Guides, code and configs for the ReplitLM model family.

This is being continuously updated to add more ways to use and build on top of our models.

Table of Contents

Models

Model | Checkpoint [CC BY-SA 4.0] | Vocabulary [CC BY-SA 4.0] | Code [Apache 2.0]
replit-code-v1-3b | Download Link | Download | Repo
replit-code-v1_5-3b | (Coming Soon) | (Coming Soon) | (Coming Soon)

Releases

May 2, 2023: replit-code-v1-3b

Usage

Hosted Demo

We also have a GPU-powered Space for the replit-code-v1-3b model where you can use the model directly!

GPU-powered Hosted Demo

Using with Hugging Face Transformers

All released Replit models are available on Hugging Face under the Replit organization page and can be used with the Hugging Face Transformers library.

The README for each released model has instructions on how to use it with Hugging Face Transformers. Make sure you set clean_up_tokenization_spaces=False when decoding with the tokenizer, and use the recommended post-processing given in the README; a minimal usage sketch follows below.

Model | README
replit-code-v1-3b | Documentation
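
For convenience, here is a minimal usage sketch with Transformers (the prompt and generation parameters are illustrative; see the model README for the recommended settings and post-processing):

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required because the model and tokenizer ship custom code.
tokenizer = AutoTokenizer.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)

x = tokenizer.encode('def fibonacci(n): ', return_tensors='pt')
y = model.generate(x, max_length=100, do_sample=True, top_p=0.95, top_k=4,
                   temperature=0.2, num_return_sequences=1,
                   eos_token_id=tokenizer.eos_token_id)

# clean_up_tokenization_spaces=False preserves whitespace, which matters for generated code.
generated_code = tokenizer.decode(y[0], skip_special_tokens=True,
                                  clean_up_tokenization_spaces=False)
print(generated_code)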

Training and Fine-tuning

Training with LLM Foundry

We recommend using MosaicML's LLM Foundry and Composer for any further training, pre-training, and fine-tuning of the Replit models.

Our Replit models are compatible with LLM Foundry and can be trained/tuned in a highly optimized way with LLM Foundry + Composer, using state-of-the-art training techniques, architectural components, optimizers, and more. All models, LLM Foundry, and the Composer training framework are PyTorch-based. Using these, you can train the Replit models on your own datasets.

The following steps outline what needs to be done to train the models, with links to the LLM Foundry documentation sections needed for each step:

(0) Install LLM Foundry and Requirements

Install LLM Foundry

To get started with LLM Foundry, you can follow the LLM Foundry README to:

  1. Set up the prerequisites; the Docker file is recommended to avoid environment issues
  2. Perform the installation steps as recommended
  3. (Optional) Run the Quickstart steps out of the box to check that everything is working

At a high level, LLM Foundry is used by defining a configuration YAML and then running the scripts/train/train.py training script in the LLM Foundry repo with that configuration YAML, using a command like composer train/train.py <configuration_yaml_path> <extra_args>. The scripts/train/yamls dir contains example YAMLs for both finetuning and pretraining.

Install Other Requirements for the Replit Models

You will then have to install a few other dependencies specified in the requirements.txt.

(1) Convert and Save Your Dataset

To train with LLM Foundry, you need to convert your dataset to the Mosaic StreamingDataset format.

The types of dataset sources supported are JSON datasets and Hugging Face Datasets.

The Data Preparation documentation in LLM Foundry gives the steps on how to do this.

⚠️ Important ⚠️

When running the convert_dataset_hf.py or convert_dataset_json.py scripts in the steps above, you will have to specify that you are using the Replit tokenizer by passing the argument --tokenizer replit/replit-code-v1-3b. A key step (due to the current implementation of llm-foundry) is to edit scripts/data_prep/convert_dataset_hf.py to pass the trust_remote_code=True kwarg to the AutoTokenizer.from_pretrained call where the tokenizer is loaded in the main() method, as sketched below.
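
For illustration, the edited call looks roughly like this; the surrounding code and variable names vary across llm-foundry versions, so treat it as a sketch of the one-line change rather than a verbatim patch:

from argparse import Namespace
from transformers import AutoTokenizer

# Stand-in for the parsed CLI args in convert_dataset_hf.py; the real script builds these with argparse.
args = Namespace(tokenizer='replit/replit-code-v1-3b')

# The change: pass trust_remote_code=True so Transformers can load the custom Replit tokenizer code.
tokenizer = AutoTokenizer.from_pretrained(args.tokenizer, trust_remote_code=True)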

Testing Your Converted Dataset

To test the converted dataset and check that it's working with the dataloader, you can follow the Test the Dataloader section in LLM Foundry docs.
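
If you just want a quick sanity check on the converted data itself, a small sketch like the following (assuming the mosaicml-streaming package is installed and that ./my-converted-dataset/train is the local output directory from the conversion step) prints the fields of the first few samples:

from streaming import StreamingDataset

# Point at the local MDS output of the conversion step (the path is illustrative).
dataset = StreamingDataset(local='./my-converted-dataset/train', shuffle=False)

# Inspect the first few samples; the available fields depend on the conversion options you used.
for i, sample in enumerate(dataset):
    print(i, list(sample.keys()))
    if i >= 2:
        break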

(2) Define a Run Configuration YAML with the Replit Models

To train with LLM Foundry, you need to define a run configuration yaml. This yaml defines the model, training dataset, eval dataset and metric, training parameters and more.

Using the Replit Models

For any config YAML you define to train/tune with LLM Foundry, you can plug in and use the Replit model by replacing the model and tokenizer keys in your YAML as follows:

...
model:
  name: hf_causal_lm
  pretrained: true
  pretrained_model_name_or_path: replit/replit-code-v1-3b
  config_overrides:
    attn_config:
      attn_impl: triton
      attn_uses_sequence_id: false

tokenizer:
  name: replit/replit-code-v1-3b
  kwargs:
    model_max_length: ${max_seq_len}
    trust_remote_code: true
...

This will load our model with its weights from Hugging Face for your config.

(3) Running Training with LLM Foundry and Composer

After having converted your dataset and defined a run configuration yaml, you can run training with LLM Foundry.

Follow the How to Start Training section in the LLM Foundry docs to run training. The section shows you how to run single-node and multi-node training. Effectively, you will run the scripts/train/train.py training script in the LLM Foundry repo with the defined configuration yaml using a command like composer train/train.py <configuration_yaml_path> <extra_args>.

⚠️ Important ⚠️

There is some hardcoded logic in Composer that we need to circumvent in order to save the checkpoints. In the scripts/train/train.py training script, add the line model.tokenizer = None just after the model is initialized and before the train dataloader is set up, i.e., at the moment of writing, line 147 in main(). This effectively ensures that we don't save out the tokenizer with the checkpoint state. We need this workaround because currently Composer cannot handle saving checkpoints with tokenizers that include *.py files.
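
For orientation, the patched region of scripts/train/train.py ends up looking roughly like the excerpt below; the function name and exact position are illustrative and shift between llm-foundry versions:

# Excerpt from main() in scripts/train/train.py (illustrative, not a verbatim copy):
model = build_composer_model(model_config, tokenizer)  # existing model initialization

model.tokenizer = None  # added workaround: don't carry the tokenizer into the checkpoint state

# ... the train dataloader is built after this point ...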

Relevant Documentation

  • The Composer Docs are your best friend for using the Composer training framework and its options, and for configuring integrations such as WandB in your configuration YAMLs, including how to set up checkpointing, logging, etc.
  • The LLM Foundry README and the LLM Foundry Training Documentation are great starting points. As a heads up, the LLM Foundry documentation is spread across several locations in the repo, so we did our best to directly link to the relevant sections above.

Instruction Tuning

You can instruct-tune our ReplitLM models for your own use case. For most instruct-tuning use cases, we recommend starting from the Hugging Face examples below. Otherwise, we also provide a detailed guide to do Instruction Tuning with LLM Foundry.

Alpaca-style Instruct Tuning with Hugging Face Transformers

You can instruct-tune the replit-code-v1-3b model on Alpaca-style datasets using the transformers library.

To accomplish that, you will need an instruct tuning dataset that is already in Alpaca-style format, such as the Code Alpaca dataset.

Open source contributor Teknium has forked the original Alpaca repo to the stanford_alpaca-replit repo that is pre-configured to run with our models. We strongly recommend you use this as your starting point.

The repo contains instructions on how to set up and run the trainer. The required Alpaca-style dataset format is described here, and an example record is shown below. Any dataset formatted Alpaca-style will work with the trainer. For example, the Code Alpaca dataset can be used to instruct-tune our model using the training script in Teknium's repo.
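
For reference, an Alpaca-style record is simply a JSON object with instruction, input, and output fields, along these lines (the contents are made up for illustration):

# One Alpaca-style training example (field values are illustrative).
example = {
    "instruction": "Write a Python function that returns the nth Fibonacci number.",
    "input": "",  # optional extra context; left empty when the instruction stands alone
    "output": "def fibonacci(n):\n    a, b = 0, 1\n    for _ in range(n):\n        a, b = b, a + b\n    return a",
}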

Instruct Tuning with LLM Foundry

You can also use LLM Foundry to do instruction tuning. To do so, you need to follow these steps at a high level, with the specific details and steps you need to follow linked to as needed:

(0) Install LLM Foundry and Requirements

Install LLM Foundry

To get started with LLM Foundry, you can follow the LLM Foundry README to:

  1. Set up the prerequisites; the Docker file is recommended to avoid environment issues
  2. Perform the installation steps as recommended
  3. (Optional) Run the Quickstart steps out of the box to check that everything is working

At a high level, LLM Foundry is used by defining a configuration YAML and then running the scripts/train/train.py training script in the LLM Foundry repo with that configuration YAML, using a command like composer train/train.py <configuration_yaml_path> <extra_args>. The scripts/train/yamls dir contains example YAMLs for both finetuning and pretraining.

Install Other Requirements for the Replit Models

You will then have to install a few other dependencies specified in the requirements.txt.

(1) Find an instruct tuning dataset

It can be any of the following:

  • some instruct tuning dataset on the Hugging Face Hub
  • a local dataset in a JSONL file
  • a local or remote streaming dataset, i.e., a dataset in the specific MDS format used by Mosaic Streaming, available locally or in a cloud store such as a GCS/S3 bucket. You will likely not have this dataset unless you have already been customizing your training and datasets for use with the Mosaic ecosystem.

(2) Format the Dataset with a Custom Preprocessing Function

Depending on the dataset you are using, you may or may not need to format the dataset into the format expected by LLM Foundry.

Datasets for which Custom Preprocessing is Not Needed

Some datasets like mosaicml/dolly_hhrlhf already come with a preprocessing function that you can use right away. As of the time of publishing, the following Hugging Face datasets came with a pre-registered preprocessing function: HuggingFaceH4/databricks_dolly_15k, Muennighoff/P3, Muennighoff/flan, bigscience/P3, tatsu-lab/alpaca.

Datasets for which Custom Preprocessing is Needed

If you're not using any of the above datasets, you will need to write your own preprocessing function and register it.

For any dataset, you need each example formatted as a dictionary with the following keys:

formatted_example = {'prompt': <prompt_text>, 'response': <response_text>}

i.e., each sample is a dictionary with the two keys. This is the format the finetuning dataloader expects downstream.

Guide for Formatting Your Dataset

The Data Formatting section in the original LLM Foundry repo describes how to do this.

If you need to create a custom preprocessing function to get your data into the right format and the steps in the LLM Foundry documentation are confusing, the paraphrased TL;DR is as follows:

  1. Create a file (for example, preprocess.py) somewhere in your codebase, e.g., in the same directory as your training script, as long as it can be imported by your training script.
  2. Define a function preprocess_function() that takes one sample from your dataset as input and returns a dictionary with the keys prompt and response as described above, according to your logic for how to format the sample into the required format (see the sketch after this list).
  3. In the YAML config you set up for your training run, point to the file (for example, preprocess.py) and the function (for example, preprocess_function()) you created.
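
As a concrete sketch (the file name, function name, and field mapping are placeholders that depend on your dataset):

# preprocess.py -- minimal sketch of a custom preprocessing function; names and source fields are illustrative.

def preprocess_function(example: dict) -> dict:
    # Map your dataset's own fields into the prompt/response keys the finetuning dataloader expects.
    prompt = example["instruction"]
    if example.get("input"):
        prompt += "\n" + example["input"]
    return {"prompt": prompt, "response": example["output"]}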

(3) Using your Dataset and Finetuning the Replit Model

Now you can use your dataset to finetune the Replit model.

Guide

The Usage section in the original LLM Foundry repo describes how to use your dataset and finetune the Replit model.

If you are using option 1) or 2) in that section, you will modify the train_loader, and eval_loader if applicable, in your training YAML based on what you did in the previous two steps. If you are using option 3) (i.e., a streaming dataset), you will first convert the dataset into the right format with prompt and response keys and write it out to a local MDS dataset; then you can modify your YAML to point to it.

FAQs

  • What dataset was this trained on?
  • What languages was the model trained on?
    • The training mixture includes 20 different languages, listed here in descending order of number of tokens: Markdown, Java, JavaScript, Python, TypeScript, PHP, SQL, JSX, reStructuredText, Rust, C, CSS, Go, C++, HTML, Vue, Ruby, Jupyter Notebook, R, Shell
  • How many GPUs do I need to train an LLM?
  • Optimizing Performance


replitlm's Issues

Using replit_lm_tokenizer locally

Hi community, I just got familiar with LLMs recently, so sorry if my question doesn't make sense.

As my work requires processing data locally, is there a way to use replit_lm_tokenizer locally instead of having to set trust_remote_code=True as described in the README? Thanks a lot.

Finetuning fails with `RuntimeError: Please install flash-attn==1.0.3.post0 and triton==2.0.0.dev20221202`

Following the instructions in the README, running through the docker container mosaicml/llm-foundry:1.13.1_cu117-latest, finetuning fails with RuntimeError: Please install flash-attn==1.0.3.post0 and triton==2.0.0.dev20221202.

pip install triton==2.0.0.dev20221202 fixes the problem. Ideally this hint should be part of the README. Furthermore, there are a few more hurdles to clear when going through Docker: which Docker image to use? With which parameters to run it?

This command currently works for me (and then following instructions in llm-foundry and installing the expected triton version as outlined above).

docker run --rm -it --gpus all --shm-size=512m -v.:/root/replit mosaicml/llm-foundry:1.13.1_cu117-latest

Maybe you want to update the README accordingly? I imagine a few other people will get stuck along the way.
(ideally specify a fixed docker image, not latest)

Model fails with AttributeError: 'ReplitLMTokenizer' object has no attribute 'sp_model'

Here is the full error prompt
2024-01-13 22:58:35,407: rank0[1152][MainThread]: INFO: __main__: Building tokenizer...
Traceback (most recent call last):
  File "/llm-foundry/scripts/train/train.py", line 653, in <module>
    main(cfg)
  File "/llm-foundry/scripts/train/train.py", line 454, in main
    tokenizer = build_tokenizer(tokenizer_name, tokenizer_kwargs)
  File "/llm-foundry/llmfoundry/utils/builders.py", line 404, in build_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name,
  File "/usr/lib/python3/dist-packages/transformers/models/auto/tokenization_auto.py", line 774, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
    return cls._from_pretrained(
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 66, in __init__
    super().__init__(bos_token=bos_token, eos_token=eos_token, unk_token=unk_token, pad_token=pad_token, sep_token=sep_token, sp_model_kwargs=self.sp_model_kwargs, **kwargs)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 76, in get_vocab
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 73, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'ReplitLMTokenizer' object has no attribute 'sp_model'

I am using the following versions:
Python - 3.10.13
Transformers - 4.36.0

I followed each step as described in llm-foundry and am running the instance in the docker container. The same error occurs when running the model directly from Hugging Face.

Warnings and Errors when generating with the given code

A nice model for code generation!
I'd like to test this model on other languages of HumanEval, and here is my code:

from transformers import AutoModelForCausalLM, AutoTokenizer
from tqdm import tqdm
import os
import json
from loguru import logger
import logging
import torch

logger.add("output_go.log")

os.environ['CURL_CA_BUNDLE'] = ""
os.environ["CUDA_VISIBLE_DEVICES"] = "7"

logger.info('loading model...')
tokenizer = AutoTokenizer.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
device = 'cuda:7'
model.to(device=device )
logger.info('model loaded.')

lines = []
with open('humaneval_go.jsonl', 'r') as fr:
    lines = fr.readlines()

logger.info(len(lines))

fw = open('output_go.jsonl', 'a', encoding='utf-8')
for i, line in tqdm(enumerate(lines)):
    logger.info(i)
    task = json.loads(line)
    x = tokenizer.encode(task['prompt'], return_tensors='pt').to(device=device)
    y = model.generate(x, max_length=768, do_sample=True, top_p=0.95, top_k=4, temperature=0.2, num_return_sequences=1,
                       eos_token_id=tokenizer.eos_token_id)

    # decoding, clean_up_tokenization_spaces=False to ensure syntactical correctness
    generated_code = tokenizer.decode(y[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
    result = {
        'question_id': f'HumanEval/{i}',
        'snippets': [
            generated_code
        ]
    }
    fw.write(json.dumps(result) + '\n')
    logger.info(generated_code)

fw.close()

Running on CPU seems slow but OK, if we ignore the following warnings (how can I avoid them?):

/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/9eceafb041eb8abd565dabfbfadd328869140011/attention.py:290: UserWarning: Using `attn_impl: torch`. If your model does not use `alibi` or `prefix_lm` we recommend using `attn_impl: flash` otherwise we recommend using `attn_impl: triton`.
  warnings.warn(
You are using config.init_device='cpu', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.

(Apologies in advance, since these may be newbie issues, but I do believe that a complete demo inference script would save us a lot of time!)

Expected eval results on Multiple-E?

Hi, thanks for releasing this awesome codebase!

I wonder if there is documentation on what results we should expect when running bash scripts/multiple_eval.sh.

cuda use and out of memory

Hey! So, to use CUDA,

I had to go here:
https://developer.nvidia.com/cuda-downloads

then uninstall torch
pip uninstall torch

then download torch with cuda from here
https://pytorch.org/get-started/locally/

but now I am getting

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 8.00 GiB total capacity; 7.30 GiB already allocated; 0 bytes free; 7.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I couldn't figure out how to fix that error. Any clues? I'm on a Windows 10 laptop with a 3070.

I'm also not sure if the configuration is still correct when I try to run it with CUDA, since I have to change the device. I'm using the following code as a test.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda:0"
dtype = torch.int8

tokenizer = AutoTokenizer.from_pretrained(
    "replit/replit-code-v1-3b", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "replit/replit-code-v1-3b",
    trust_remote_code=True,
    # attn_impl="triton",
    # init_device="meta",
    init_device=device,
)

model.to(device=device, dtype=dtype)


x = tokenizer.encode("def fibonacci(n): ", return_tensors="pt")
x = x.to(device=device, dtype=dtype)
y = model.generate(
    x,
    max_length=100,
    do_sample=True,
    top_p=0.95,
    top_k=4,
    temperature=0.2,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

# decoding, clean_up_tokenization_spaces=False to ensure syntactical correctness
generated_code = tokenizer.decode(
    y[0], skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(generated_code)

The config seems to be the default from config.json

Thanks!

where can it run? hardware specs, performance data

Got it running on my laptop with an i5-1135G7, 16GB RAM, and an RTX 3060 connected via Thunderbolt in an eGPU enclosure, running Windows 10.

It loads the model in about 45s from my SSD.

When generating 100 tokens from class AVeryLongClass: (which is 8 tokens long), I'm getting numbers around ~8.47s for those tokens so about 11.8 tok/s. It gets slower with a bigger context of course. You can see my script here https://gist.github.com/elikoga/c300b9bf6b090fda9187644766347348

Just wanted to share some numbers and where I got it running :D I like the generation results I'm seeing so far

Maybe you can share some of your numbers too

`ImportError: This modeling file requires... flash_attn`

Trying to follow the instructions on an M1 Mac, I get the above error.

Unfortunately, attempting to install flash_attn does not succeed, due to RuntimeError: flash_attn was requested, but nvcc was not found., which may just be an unfortunate aspect of not having an Nvidia card.

Anyway, the point is that you should probably add flash_attn to your list of required modules?

Could you guys provide some official example tutorials in README.md for this? Thank you.

I am very optimistic about the large models and commercial agreements you provide, and I appreciate your contributions to this world.

Additionally, could you provide some tutorials or instructions in the README file? I couldn't find any examples of how to use this model on either Hugging Face or GitHub. I hope you can add them when you have time, just like with llama or Moss.

Thank you!

How to start the server mode?

Thank you for your work. Any chance you could give a demo of how to start the server mode?

Due to our information risk policy, we cannot use the server mode the way you provide it in "Inference Endpoints".

CC BY-SA-4.0 license

It appears that the CC BY-SA-4.0 license is not included in the repository, which raises concerns about its licensing status. Could you please provide a link to the license or clarify where it can be found? As per my understanding, the license must be included with the code, and its absence may render the code unlicensed.

errors in generation, TypeError: gelu()

When I run the following code, it fails with this error:

y = model(x)
return torch._C._nn.gelu(input, approximate)
TypeError: gelu(): argument 'approximate' (position 2) must be bool, not str

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


tokenizer = AutoTokenizer.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
model.to(device='cuda:1')

x = torch.tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
x = x.to(device='cuda:1')
y = model(x)
print(y)

How to train the ReplitLM model

Training is based on the transformers training procedure. It is started on an A100-80G machine, but the per-GPU batch size can be set to at most 2, and there is extremely unbalanced memory usage across the cards, e.g., 60GB+ on card 0 and 30GB+ on the other cards.
In addition, are there recommended training parameters? With the current training strategy, the loss value is very large and drops only slowly.
