
ReplitLM

Guides, code and configs for the ReplitLM model family.

This is being continuously updated to add more ways to use and build on top of our models.

Table of Contents

Models

Model | Checkpoint [CC BY-SA 4.0] | Vocabulary [CC BY-SA 4.0] | Code [Apache 2.0]
replit-code-v1-3b | Download Link | Download | Repo
replit-code-v1_5-3b | (Coming Soon) | (Coming Soon) | (Coming Soon)

Releases

May 2, 2023: replit-code-v1-3b

Usage

Hosted Demo

We also have a GPU-powered Space for the replit-code-v1-3b model where you can use the model directly!

GPU-powered Hosted Demo

Using with Hugging Face Transformers

All released Replit models are available on Hugging Face under the Replit organization page and can be used with the Hugging Face Transformers library.

The README for each released model has instructions on how to use it with Hugging Face Transformers. Make sure you set clean_up_tokenization_spaces=False when decoding with the tokenizer, and use the recommended post-processing given in the README; a minimal usage sketch follows below.

Model | README
replit-code-v1-3b | Documentation
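
For convenience, here is a minimal usage sketch with Transformers (the prompt and generation parameters are illustrative; see the model README for the recommended settings and post-processing):

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required because the model and tokenizer ship custom code.
tokenizer = AutoTokenizer.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)

x = tokenizer.encode('def fibonacci(n): ', return_tensors='pt')
y = model.generate(x, max_length=100, do_sample=True, top_p=0.95, top_k=4,
                   temperature=0.2, num_return_sequences=1,
                   eos_token_id=tokenizer.eos_token_id)

# clean_up_tokenization_spaces=False preserves whitespace, which matters for generated code.
generated_code = tokenizer.decode(y[0], skip_special_tokens=True,
                                  clean_up_tokenization_spaces=False)
print(generated_code)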

Training and Fine-tuning

Training with LLM Foundry

We recommend using MosaicML's LLM Foundry and Composer for any further training, pre-training, and fine-tuning of the Replit models.

Our Replit models are compatible with LLM Foundry and can be trained/tuned in a highly optimized way with LLM Foundry + Composer, using state-of-the-art training techniques, architectural components, optimizers, and more. All models, LLM Foundry, and the Composer training framework are PyTorch-based. Using these, you can train the Replit models on your own datasets.

The following steps outline what needs to be done to train the models, with links to the LLM Foundry documentation sections needed for each step:

(0) Install LLM Foundry and Requirements

Install LLM Foundry

To get started with LLM Foundry, you can follow the LLM Foundry README to:

  1. Set up the prerequisites; the Docker file is recommended to avoid environment issues
  2. Perform the installation steps as recommended
  3. (Optional) Run the Quickstart steps out of the box to check that everything is working

At a high level, LLM Foundry is used by defining a configuration YAML and then running the scripts/train/train.py training script in the LLM Foundry repo with that configuration YAML, using a command like composer train/train.py <configuration_yaml_path> <extra_args>. The scripts/train/yamls dir contains example YAMLs for both finetuning and pretraining.

Install Other Requirements for the Replit Models

You will then have to install a few other dependencies specified in the requirements.txt.

(1) Convert and Save Your Dataset

To train with LLM Foundry, you need to convert your dataset to the Mosaic StreamingDataset format.

The types of dataset sources supported are JSON datasets and Hugging Face Datasets.

The Data Preparation documentation in LLM Foundry gives the steps on how to do this.

⚠️ Important ⚠️

When running the convert_dataset_hf.py or convert_dataset_json.py scripts in the steps above, you will have to specify that you are using the Replit tokenizer by passing the argument --tokenizer replit/replit-code-v1-3b. A key step (due to the current implementation of llm-foundry) is to edit scripts/data_prep/convert_dataset_hf.py to pass the trust_remote_code=True kwarg to the AutoTokenizer.from_pretrained call where the tokenizer is loaded in the main() method, as sketched below.
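
For illustration, the edited call looks roughly like this; the surrounding code and variable names vary across llm-foundry versions, so treat it as a sketch of the one-line change rather than a verbatim patch:

from argparse import Namespace
from transformers import AutoTokenizer

# Stand-in for the parsed CLI args in convert_dataset_hf.py; the real script builds these with argparse.
args = Namespace(tokenizer='replit/replit-code-v1-3b')

# The change: pass trust_remote_code=True so Transformers can load the custom Replit tokenizer code.
tokenizer = AutoTokenizer.from_pretrained(args.tokenizer, trust_remote_code=True)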

Testing Your Converted Dataset

To test the converted dataset and check that it's working with the dataloader, you can follow the Test the Dataloader section in LLM Foundry docs.
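
If you just want a quick sanity check on the converted data itself, a small sketch like the following (assuming the mosaicml-streaming package is installed and that ./my-converted-dataset/train is the local output directory from the conversion step) prints the fields of the first few samples:

from streaming import StreamingDataset

# Point at the local MDS output of the conversion step (the path is illustrative).
dataset = StreamingDataset(local='./my-converted-dataset/train', shuffle=False)

# Inspect the first few samples; the available fields depend on the conversion options you used.
for i, sample in enumerate(dataset):
    print(i, list(sample.keys()))
    if i >= 2:
        break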

(2) Define a Run Configuration YAML with the Replit Models

To train with LLM Foundry, you need to define a run configuration yaml. This yaml defines the model, training dataset, eval dataset and metric, training parameters and more.

Using the Replit Models

For any config YAML you define to train/tune with LLM Foundry, you can plug in and use the Replit model by replacing the model and tokenizer keys in your YAML as follows:

...
model:
  name: hf_causal_lm
  pretrained: true
  pretrained_model_name_or_path: replit/replit-code-v1-3b
  config_overrides:
    attn_config:
      attn_impl: triton
      attn_uses_sequence_id: false

tokenizer:
  name: replit/replit-code-v1-3b
  kwargs:
    model_max_length: ${max_seq_len}
    trust_remote_code: true
...

This will load our model with its weights from Hugging Face for your config.

(3) Running Training with LLM Foundry and Composer

After having converted your dataset and defined a run configuration yaml, you can run training with LLM Foundry.

Follow the How to Start Training section in the LLM Foundry docs to run training. The section shows you how to run single-node and multi-node training. Effectively, you will run the scripts/train/train.py training script in the LLM Foundry repo with the defined configuration yaml using a command like composer train/train.py <configuration_yaml_path> <extra_args>.

⚠️ Important ⚠️

There is some hardcoded logic in Composer that we need to circumvent in order to save the checkpoints. In the scripts/train/train.py training script, add the line model.tokenizer = None just after the model is initialized and before the train dataloader is set up, i.e., at the moment of writing, line 147 in main(). This effectively ensures that we don't save out the tokenizer with the checkpoint state. We need this workaround because currently Composer cannot handle saving checkpoints with tokenizers that include *.py files.
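
For orientation, the patched region of scripts/train/train.py ends up looking roughly like the excerpt below; the function name and exact position are illustrative and shift between llm-foundry versions:

# Excerpt from main() in scripts/train/train.py (illustrative, not a verbatim copy):
model = build_composer_model(model_config, tokenizer)  # existing model initialization

model.tokenizer = None  # added workaround: don't carry the tokenizer into the checkpoint state

# ... the train dataloader is built after this point ...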

Relevant Documentation

  • The Composer Docs are your best friend for using the Composer training framework and its options, and for configuring integrations such as WandB in your configuration YAMLs, including how to set up checkpointing, logging, etc.
  • The LLM Foundry README and the LLM Foundry Training Documentation are great starting points. As a heads up, the LLM Foundry documentation is spread across several locations in the repo, so we did our best to directly link to the relevant sections above.

Instruction Tuning

You can instruct-tune our ReplitLM models for your own use case. For most instruct-tuning use cases, we recommend starting from the Hugging Face examples below. Otherwise, we also provide a detailed guide to do Instruction Tuning with LLM Foundry.

Alpaca-style Instruct Tuning with Hugging Face Transformers

You can instruct-tune the replit-code-v1-3b model on Alpaca-style datasets using the transformers library.

To accomplish that, you will need an instruct tuning dataset that is already in Alpaca-style format, such as the Code Alpaca dataset.

Open source contributor Teknium has forked the original Alpaca repo to the stanford_alpaca-replit repo that is pre-configured to run with our models. We strongly recommend you use this as your starting point.

The repo contains instructions on how to set up and run the trainer. The required Alpaca-style dataset format is described here, and an example record is shown below. Any dataset formatted Alpaca-style will work with the trainer. For example, the Code Alpaca dataset can be used to instruct-tune our model using the training script in Teknium's repo.
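
For reference, an Alpaca-style record is simply a JSON object with instruction, input, and output fields, along these lines (the contents are made up for illustration):

# One Alpaca-style training example (field values are illustrative).
example = {
    "instruction": "Write a Python function that returns the nth Fibonacci number.",
    "input": "",  # optional extra context; left empty when the instruction stands alone
    "output": "def fibonacci(n):\n    a, b = 0, 1\n    for _ in range(n):\n        a, b = b, a + b\n    return a",
}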

Instruct Tuning with LLM Foundry

You can also use LLM Foundry to do instruction tuning. To do so, you need to follow these steps at a high level, with the specific details and steps you need to follow linked to as needed:

(0) Install LLM Foundry and Requirements

Install LLM Foundry

To get started with LLM Foundry, you can follow the LLM Foundry README to:

  1. Set up the prerequisites; the Docker file is recommended to avoid environment issues
  2. Perform the installation steps as recommended
  3. (Optional) Run the Quickstart steps out of the box to check that everything is working

At a high level, LLM Foundry is used by defining a configuration YAML and then running the scripts/train/train.py training script in the LLM Foundry repo with that configuration YAML, using a command like composer train/train.py <configuration_yaml_path> <extra_args>. The scripts/train/yamls dir contains example YAMLs for both finetuning and pretraining.

Install Other Requirements for the Replit Models

You will then have to install a few other dependencies specified in the requirements.txt.

(1) Find an instruct tuning dataset

It can be any of the following:

  • some instruct tuning dataset on the Hugging Face Hub
  • a local dataset in a JSONL file
  • a local or remote streaming dataset, i.e., a dataset in the specific MDS format used by Mosaic Streaming, available locally or in a cloud store such as a GCS/S3 bucket. You will likely not have this dataset unless you have already been customizing your training and datasets for use with the Mosaic ecosystem.

(2) Format the Dataset with a Custom Preprocessing Function

Depending on the dataset you are using, you may or may not need to format the dataset into the format expected by LLM Foundry.

Datasets for which Custom Preprocessing is Not Needed

Some datasets like mosaicml/dolly_hhrlhf already come with a preprocessing function that you can use right away. As of the time of publishing, the following Hugging Face datasets came with a pre-registered preprocessing function: HuggingFaceH4/databricks_dolly_15k, Muennighoff/P3, Muennighoff/flan, bigscience/P3, tatsu-lab/alpaca.

Datasets for which Custom Preprocessing is Needed

If you're not using any of the above datasets, you will need to write your own preprocessing function and register it.

For any dataset, you need each example formatted as a dictionary with the following keys:

formatted_example = {'prompt': <prompt_text>, 'response': <response_text>}

i.e., each sample is a dictionary with the two keys. This is the format the finetuning dataloader expects downstream.

Guide for Formatting Your Dataset

The Data Formatting section in the original LLM Foundry repo describes how to do this.

If you need to create a custom preprocessing function to get your data into the right format and the steps in the LLM Foundry documentation are confusing, the paraphrased TL;DR is as follows:

  1. Create a file (for example, preprocess.py) somewhere in your codebase, e.g., in the same directory as your training script, as long as it can be imported by your training script.
  2. Define a function preprocess_function() that takes one sample from your dataset as input and returns a dictionary with the keys prompt and response as described above, according to your logic for how to format the sample into the required format (see the sketch after this list).
  3. In the YAML config you set up for your training run, point to the file (for example, preprocess.py) and the function (for example, preprocess_function()) you created.
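
As a concrete sketch (the file name, function name, and field mapping are placeholders that depend on your dataset):

# preprocess.py -- minimal sketch of a custom preprocessing function; names and source fields are illustrative.

def preprocess_function(example: dict) -> dict:
    # Map your dataset's own fields into the prompt/response keys the finetuning dataloader expects.
    prompt = example["instruction"]
    if example.get("input"):
        prompt += "\n" + example["input"]
    return {"prompt": prompt, "response": example["output"]}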

(3) Using your Dataset and Finetuning the Replit Model

Now you can use your dataset to finetune the Replit model.

Guide

The Usage section in the original LLM Foundry repo describes how to use your dataset and finetune the Replit model.

If you are using option 1) or 2) in that section, you will modify the train_loader, and eval_loader if applicable, in your training YAML based on what you did in the previous two steps. If you are using option 3) (i.e., a streaming dataset), you will first convert the dataset into the right format with prompt and response keys and write it out to a local MDS dataset; then you can modify your YAML to point to it.

FAQs

  • What dataset was this trained on?
  • What languages was the model trained on?
    • The training mixture includes 20 different languages, listed here in descending order of number of tokens: Markdown, Java, JavaScript, Python, TypeScript, PHP, SQL, JSX, reStructuredText, Rust, C, CSS, Go, C++, HTML, Vue, Ruby, Jupyter Notebook, R, Shell
  • How many GPUs do I need to train an LLM?
  • Optimizing Performance


replitlm's Issues

Using replit_lm_tokenizer locally

Hi community, I just got familiar with LLMs recently, so sorry if my question doesn't make sense.

As my work requires processing data locally, is there a way to use replit_lm_tokenizer locally instead of having to set trust_remote_code=True as described in the README? Thanks a lot.

Finetuning fails with `RuntimeError: Please install flash-attn==1.0.3.post0 and triton==2.0.0.dev20221202`

Following the instructions in the README, running through the docker container mosaicml/llm-foundry:1.13.1_cu117-latest, finetuning fails with RuntimeError: Please install flash-attn==1.0.3.post0 and triton==2.0.0.dev20221202.

pip install triton==2.0.0.dev20221202 fixes the problem. Ideally this hint should be part of the README. Furthermore, there are a few more hurdles to clear when going through Docker: which Docker image to use? With which parameters to run it?

This command currently works for me (and then following instructions in llm-foundry and installing the expected triton version as outlined above).

docker run --rm -it --gpus all --shm-size=512m -v.:/root/replit mosaicml/llm-foundry:1.13.1_cu117-latest

Maybe you want to update the README accordingly? I imagine a few other people will get stuck along the way.
(ideally specify a fixed docker image, not latest)

Model fails with AttributeError: 'ReplitLMTokenizer' object has no attribute 'sp_model'

Here is the full error prompt
2024-01-13 22:58:35,407: rank0[1152][MainThread]: INFO: __main__: Building tokenizer...
Traceback (most recent call last):
  File "/llm-foundry/scripts/train/train.py", line 653, in <module>
    main(cfg)
  File "/llm-foundry/scripts/train/train.py", line 454, in main
    tokenizer = build_tokenizer(tokenizer_name, tokenizer_kwargs)
  File "/llm-foundry/llmfoundry/utils/builders.py", line 404, in build_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name,
  File "/usr/lib/python3/dist-packages/transformers/models/auto/tokenization_auto.py", line 774, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
    return cls._from_pretrained(
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 66, in __init__
    super().__init__(bos_token=bos_token, eos_token=eos_token, unk_token=unk_token, pad_token=pad_token, sep_token=sep_token, sp_model_kwargs=self.sp_model_kwargs, **kwargs)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 76, in get_vocab
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 73, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'ReplitLMTokenizer' object has no attribute 'sp_model'

I am using the following versions:
Python - 3.10.13
Transformers - 4.36.0

I followed each step as described in llm-foundry and am running the instance in the docker container. The same error occurs when running the model directly from Hugging Face.

Warnings and Errors when generating with the given code

A nice model for code generation!
I'd like to test this model on other languages of HumanEval, and here is my code:

from transformers import AutoModelForCausalLM, AutoTokenizer
from tqdm import tqdm
import os
import json
from loguru import logger
import logging
import torch

logger.add("output_go.log")

os.environ['CURL_CA_BUNDLE'] = ""
os.environ["CUDA_VISIBLE_DEVICES"] = "7"

logger.info('loading model...')
tokenizer = AutoTokenizer.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
device = 'cuda:7'
model.to(device=device )
logger.info('model loaded.')

lines = []
with open('humaneval_go.jsonl', 'r') as fr:
    lines = fr.readlines()

logger.info(len(lines))

fw = open('output_go.jsonl', 'a', encoding='utf-8')
for i, line in tqdm(enumerate(lines)):
    logger.info(i)
    task = json.loads(line)
    x = tokenizer.encode(task['prompt'], return_tensors='pt').to(device=device)
    y = model.generate(x, max_length=768, do_sample=True, top_p=0.95, top_k=4, temperature=0.2, num_return_sequences=1,
                       eos_token_id=tokenizer.eos_token_id)

    # decoding, clean_up_tokenization_spaces=False to ensure syntactical correctness
    generated_code = tokenizer.decode(y[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
    result = {
        'question_id': f'HumanEval/{i}',
        'snippets': [
            generated_code
        ]
    }
    fw.write(json.dumps(result) + '\n')
    logger.info(generated_code)

fw.close()

Running on CPU seems slow but OK, if we ignore the following warnings (how can I avoid them?):

/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/9eceafb041eb8abd565dabfbfadd328869140011/attention.py:290: UserWarning: Using `attn_impl: torch`. If your model does not use `alibi` or `prefix_lm` we recommend using `attn_impl: flash` otherwise we recommend using `attn_impl: triton`.
  warnings.warn(
You are using config.init_device='cpu', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.

(Apologies in advance, since these may be newbie issues, but I do believe that a complete demo inference script would save us a lot of time!)

Expected eval results on Multiple-E?

Hi, thanks for releasing this awesome codebase!

I wonder if there is documentation on what results we should expect when running bash scripts/multiple_eval.sh.

cuda use and out of memory

Hey! So, to use CUDA,

I had to go here:
https://developer.nvidia.com/cuda-downloads

then uninstall torch
pip uninstall torch

then download torch with cuda from here
https://pytorch.org/get-started/locally/

but now I am getting

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 8.00 GiB total capacity; 7.30 GiB already allocated; 0 bytes free; 7.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I couldn't figure out how to fix that error. Any clues? I'm on a Windows 10 laptop with a 3070.

I'm also not sure if the configuration is still correct when I try to run it with CUDA, since I have to change the device. I'm using the following code as a test.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda:0"
dtype = torch.int8

tokenizer = AutoTokenizer.from_pretrained(
    "replit/replit-code-v1-3b", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "replit/replit-code-v1-3b",
    trust_remote_code=True,
    # attn_impl="triton",
    # init_device="meta",
    init_device=device,
)

model.to(device=device, dtype=dtype)


x = tokenizer.encode("def fibonacci(n): ", return_tensors="pt")
x = x.to(device=device, dtype=dtype)
y = model.generate(
    x,
    max_length=100,
    do_sample=True,
    top_p=0.95,
    top_k=4,
    temperature=0.2,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

# decoding, clean_up_tokenization_spaces=False to ensure syntactical correctness
generated_code = tokenizer.decode(
    y[0], skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(generated_code)

The config seems to be the default from config.json

Thanks!

where can it run? hardware specs, performance data

Got it running on my laptop with an i5-1135G7, 16GB RAM, and an RTX 3060 connected via Thunderbolt in an eGPU enclosure, running Windows 10.

It loads the model in about 45s from my SSD.

When generating 100 tokens from class AVeryLongClass: (which is 8 tokens long), I'm getting numbers around ~8.47s for those tokens so about 11.8 tok/s. It gets slower with a bigger context of course. You can see my script here https://gist.github.com/elikoga/c300b9bf6b090fda9187644766347348

Just wanted to share some numbers and where I got it running :D I like the generation results I'm seeing so far

Maybe you can share some of your numbers too

`ImportError: This modeling file requires... flash_attn`

Trying to follow the instructions on an M1 Mac, I get the above error.

Unfortunately, attempting to install flash_attn does not succeed, due to RuntimeError: flash_attn was requested, but nvcc was not found., which may just be an unfortunate aspect of not having an Nvidia card.

Anyway, the point is that you should probably add flash_attn to your list of required modules?

Could you guys provide some official example tutorials in README.md for this? Thank you.

I am very optimistic about the large models and commercial agreements you provide, and I appreciate your contributions to this world.

Additionally, could you provide some tutorials or instructions in the README file? I couldn't find any examples of how to use this model on either Hugging Face or GitHub. I hope you can add them when you have time, just like with llama or Moss.

Thank you!

How to start the server mode?

Thank you for your work. Any chance you could give a demo of how to start the server mode?

Due to our information risk policy, we cannot use the server mode the way you provide it in "Inference Endpoints".

CC BY-SA-4.0 license

It appears that the CC BY-SA-4.0 license is not included in the repository, which raises concerns about its licensing status. Could you please provide a link to the license or clarify where it can be found? As per my understanding, the license must be included with the code, and its absence may render the code unlicensed.

errors in generation, TypeError: gelu()

When I run the following code, it fails with this error:

y = model(x)
return torch._C._nn.gelu(input, approximate)
TypeError: gelu(): argument 'approximate' (position 2) must be bool, not str

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


tokenizer = AutoTokenizer.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
model.to(device='cuda:1')

x = torch.tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
x = x.to(device='cuda:1')
y = model(x)
print(y)

How to train the ReplitLM model

Training is based on the transformers training procedure. It is started on an A100-80G machine, but the per-GPU batch size can be set to at most 2, and there is extremely unbalanced memory usage across the cards, e.g., 60GB+ on card 0 and 30GB+ on the other cards.
In addition, are there recommended training parameters? With the current training strategy, the loss value is very large and drops only slowly.
