declare-lab / flan-alpaca

337 stars · 7 watchers · 37 forks · 34 KB

This repository contains code for extending the Stanford Alpaca synthetic instruction tuning to existing instruction-tuned models such as Flan-T5.

License: Apache License 2.0

Language: Python (100.00%)
Topics: flan-t5, alpaca, language-model, llm, transformers

flan-alpaca's People

Contributors

chiayewken, soujanyaporia

flan-alpaca's Issues

I don't see any progress logs

I am training Flan-XL on 4× A6000 GPUs. A few days ago, when I trained a large model, I used to see progress logs; now I don't. (screenshot omitted)

Unable to train on 4-5 gtx 1070s

(screenshot omitted)
I've got five 1070s here that I'm trying to train on. The memory disappears quickly at the start: 40 GB of VRAM total but only 64 GB of system memory. I added swap, but I imagine loading will take forever that way. Are there other flags I can use to reduce memory usage?
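A hedged suggestion, not from the maintainers: the training flags quoted in other issues on this page include --use_lora, --train_batch_size, and --gradient_accumulation_steps, and lowering the batch size while raising accumulation (optionally with LoRA) is the usual way to trade speed for memory. A sketch with illustrative values:

python training.py --output_dir outputs/model/base \
    --use_lora \
    --train_epochs 3 \
    --data_path data/train.json \
    --model_name_or_path "google/flan-t5-base" \
    --train_batch_size 1 \
    --gradient_accumulation_steps 64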

What are the available parameters for generator?

What are the available parameters for the generator? I see that temperature needs to be a float, but I am not aware of the others:

generator = pipeline(model="declare-lab/flan-alpaca-xl")
generated_text = generator(prompt, temperature=0.1, max_length=784, do_sample=False)
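Since transformers pipelines forward extra keyword arguments to the underlying generate() call, the standard Hugging Face generation parameters should all apply here (note that temperature only takes effect when do_sample=True). A sketch with illustrative values:

from transformers import pipeline

generator = pipeline(model="declare-lab/flan-alpaca-xl")
generated_text = generator(
    "Write an email about an alpaca that likes flan",
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,         # softmax temperature (float)
    top_k=50,                # keep only the 50 most likely next tokens
    top_p=0.9,               # nucleus sampling threshold
    num_beams=1,             # set > 1 to enable beam search
    max_length=784,          # maximum output length in tokens
    repetition_penalty=1.2,  # discourage repeated phrases
    num_return_sequences=1,  # number of completions to return
)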

Loss value is NaN

I am trying to finetune the model using a V100 and a lower version of torch (1.13.0).

  1. After removing "--use_compile" and updating "precision" to "16-mixed", I got a [nan.0] value at each step during training.
  2. I tried to set "precision" to "32-true", then I can get the loss values. However, I cannot see the convergence after the first epoch.

All the other settings are the same as in the readme. Could anyone give me some suggestions on this? Thanks very much!
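On a V100 this is a known quirk of the T5 family rather than of this repo: T5 weights frequently overflow in fp16, producing NaN losses, while bf16 (which avoids the overflow) requires Ampere or newer GPUs. Assuming training.py builds a pytorch_lightning.Trainer internally (an assumption, not verified), the relevant knob is:

import pytorch_lightning as pl

# sketch: "16-mixed" overflows for T5-family weights, "bf16-mixed" needs
# Ampere or newer, so "32-true" (full fp32) is the safe fallback on a V100,
# matching what the reporter observed above.
trainer = pl.Trainer(precision="32-true")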

use gpt4all dataset

I am impressed by the quality and speed of the XL model, and it fits very easily on my 24 GB GPU.
I recently found this dataset: https://github.com/nomic-ai/gpt4all. However, my GPU is unable to train the model on it due to insufficient memory. Can anyone train it, please?

RuntimeError: Trying to resize storage that is not resizable

During inference or upload I get this RuntimeError: Trying to resize storage that is not resizable

I trained with use_fsdp on 4× RTX A4000. It ran for 8 hours on the clean alpaca dataset and then gave this error; it seems all that work was wasted.

Error occurs at this line

model: LightningModel = LightningModel.load_from_checkpoint(path)

From there the traceback leads into torch utils.
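A hedged workaround, not verified against this repo: a Lightning checkpoint is a plain dict with "state_dict" and "hyper_parameters" keys, so loading it manually on CPU sometimes sidesteps storage-resize errors in FSDP-saved checkpoints:

import torch

# sketch: bypass load_from_checkpoint and restore the weights by hand.
# LightningModel is the repo's module class from the line above; passing
# hyper_parameters to its constructor is an assumption about its __init__.
ckpt = torch.load(path, map_location="cpu")
model = LightningModel(**ckpt["hyper_parameters"])
model.load_state_dict(ckpt["state_dict"])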

Performance of the model on gsm8k/SVAMP/MultiArith.

Thank you for your excellent project. I have evaluated Flan-Alpaca-Base/Large/XL on the gsm8k, SVAMP, and MultiArith datasets, with the following results:

| Model             | gsm8k | MultiArith | SVAMP |
| ----------------- | ----- | ---------- | ----- |
| Flan-Alpaca-Base  | 13.42 | 20.33      | 19.50 |
| Flan-Alpaca-Large | 14.40 | 19.83      | 17.80 |
| Flan-Alpaca-XL    | 9.25  | 13.83      | 14.30 |

Overall, the larger the model, the worse its performance. What do you think is the reason for this? Also, did you use the test sets of these three datasets to train the model? If so, could the reason be that the smaller models overfit the test data?
Thank you~

Trouble training

Having trouble running your training script. I haven't looked into debugging the code, but I assumed it works with the Alpaca dataset as-is. Should the alpaca JSON file be preprocessed in some way before running the training script?

File "/home/voiduser/pydev/flan-alpaca/flan-alpaca/data_loading.py", line 45, in
samples = [TextToTextSample(**json.loads(line)) for line in f]
File "/usr/local/lib/python3.10/json/init.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 2)
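The error suggests data_loading.py expects JSON Lines (one JSON object per line), while the raw Alpaca file is a single JSON array. Judging from the preprocessing command quoted in the OMP issue further down this page, converting the file first should produce the expected format:

python data_loading.py preprocess_alpaca \
    --path_in data/alpaca_clean.json \
    --path_out data/train.json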

unable to use new flan-alpaca-gpt4-xl in pipeline

Hi,
I've tried to use the new model, but I get the following error:
ValueError: Could not load model declare-lab/flan-alpaca-gpt4-xl with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.t5.modeling_t5.T5ForConditionalGeneration'>).

code to reproduce:
from transformers import pipeline

model = pipeline(model="declare-lab/flan-alpaca-gpt4-xl")

thank you!
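A hedged guess, not confirmed by the maintainers: this error often just means the installed transformers version cannot read the checkpoint format (for example, a safetensors or sharded checkpoint that older releases don't handle), so upgrading transformers is the first thing to try. Naming the task explicitly also rules out task auto-detection as the culprit:

from transformers import pipeline

# explicit task; the model id is the same one that failed above
model = pipeline("text2text-generation", model="declare-lab/flan-alpaca-gpt4-xl")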

OMP: Error #100: Fatal system error detected.

(base) root@anaconda-statefulset-0:~/flan-alpaca# python data_loading.py preprocess_alpaca \
--path_in data/alpaca_clean.json \
--path_out data/train.json
OMP: Error #100: Fatal system error detected.
OMP: System error #22: Invalid argument
Aborted (core dumped)

I'm inside a k8s pod when doing this, but running export KMP_AFFINITY=disabled resolved it for me. Not sure if it's specific to running in a container. I'm using the anaconda3 container.
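For anyone hitting the same thing, this is the reporter's workaround applied to the failing command (KMP_AFFINITY belongs to the OpenMP runtime; scoping it to a single invocation instead of exporting it is just a stylistic choice):

KMP_AFFINITY=disabled python data_loading.py preprocess_alpaca \
    --path_in data/alpaca_clean.json \
    --path_out data/train.json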

Quantized

First off, this model is pretty good and I'm enjoying it almost as much as ChatGPT.

I'd like to ask whether it is possible to quantize the model using Optimum. The following code throws an ONNX error:

[ONNXRuntimeError] : 1 : FAIL : Deserialize tensor onnx::MatMul_5230 failed.tensorprotoutils.cc:640 onnxruntime::utils::TensorProtoToTensor External initializer: onnx::MatMul_5230 offset: 0 size to read: 41943040 given file_length: 16777216 are out of bounds or can not be read in full.

from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_id = "declare-lab/flan-alpaca-xl"
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)

Thank you!

is there any plan for flan-ul2?

I just wonder: is there going to be a new checkpoint released based on Flan-UL2?

Anyway, I really love the work on this repo 😃 you guys did a really great job!

Usage example in readme doesn't work

I cloned the repo, installed the requirements, and put this in a test script:

from transformers import pipeline

prompt = "Write an email about an alpaca that likes flan"
model = pipeline(model="declare-lab/flan-alpaca-xl")
model(prompt, max_length=128, do_sample=True)
print(model)

As output, I get <transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x7fdd5559de20>
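The script prints the pipeline object itself rather than the generation result: model(prompt, ...) returns a list of dicts with a generated_text key, and that return value is what needs printing. A corrected sketch:

from transformers import pipeline

model = pipeline(model="declare-lab/flan-alpaca-xl")
prompt = "Write an email about an alpaca that likes flan"
output = model(prompt, max_length=128, do_sample=True)
# output looks like [{"generated_text": "..."}]
print(output[0]["generated_text"])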

Flan Data

The readme file says the models use Alpaca + Flan data; however, I do not see Flan data in the data loaders. Does this simply mean that the loaded Hugging Face checkpoint had already been fine-tuned on Flan data?

Thanks

LoRA + FSDP -- issue

Hi, dumb question: when using LoRA and FSDP together for faster training on an ml.p3.16xlarge instance (8 GPUs), I'm encountering an error.

This works fine
python training.py --output_dir outputs/model/xl \
    --use_fsdp \
    --train_epochs 2 \
    --data_path data/train.json \
    --model_name_or_path "google/flan-t5-xl" \
    --train_batch_size 8 \
    --gradient_accumulation_steps 32

However, running this throws an error:
python training.py --output_dir outputs/model/xl \
    --use_fsdp \
    --use_lora \
    --train_epochs 2 \
    --data_path data/train.json \
    --model_name_or_path "google/flan-t5-xl" \
    --train_batch_size 8 \
    --gradient_accumulation_steps 32

File "/opt/conda/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 440, in _init_flat_param
self._init_flat_param(params, fully_sharded_module, use_orig_params)
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 440, in _init_flat_param
raise ValueError(
ValueError: FlatParameter requires uniform requires_grad

Will this fix the issue?

for param in model.parameters():
    param.requires_grad_()
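A hedged answer, not verified against this repo's training.py: enabling requires_grad on every parameter would silence the error but would also train all the base weights, defeating the point of LoRA. The error itself comes from FSDP flattening parameters with mixed requires_grad into one FlatParameter; recent torch releases expose use_orig_params=True on the FSDP wrapper to allow exactly this mix:

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# sketch: wrap the LoRA-augmented model so that frozen base weights and
# trainable adapter weights can coexist inside one FSDP unit
model = FSDP(model, use_orig_params=True)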

wget https://raw.githubusercontent.com/tloen/alpaca-lora/main/alpaca_data_cleaned.json -O data/alpaca_clean.json

I received the error below. The URL https://raw.githubusercontent.com/tloen/alpaca-lora/main/alpaca_data_cleaned.json returned a 404 Not Found response, indicating that the requested resource was not found on the server.

This error could occur for several reasons, including:

The URL may be incorrect or outdated, and the resource may have been moved or deleted.
The server hosting the resource may be down or inaccessible.
The resource may have been intentionally removed or blocked.
In this case, it seems that the alpaca_data_cleaned.json file could not be found at the specified location. You may want to check if the file exists at the URL or if the URL has changed.

The file alpaca_data_cleaned.json seems to be no longer available.

wget https://raw.githubusercontent.com/tloen/alpaca-lora/main/alpaca_data_cleaned.json -O data/alpaca_clean.json
--2023-04-10 09:39:10-- https://raw.githubusercontent.com/tloen/alpaca-lora/main/alpaca_data_cleaned.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-04-10 09:39:11 ERROR 404: Not Found.

issue in training declare-lab/flan-alpaca-base model

Hi,
I'm able to train google/flan-t5-base or large models on my data after tweaking the training code (my GPU does not support bfloat16), but this time I want to train declare-lab/flan-alpaca-base on my data, and it throws a long error message:

File "/home/ec2-user/SageMaker/conda_envs/llm/lib/python3.8/site-packages/triton/language/semantic.py", line 633, in bitcast raise ValueError("Cannot bitcast data-type of size " + str(src_bits) + "to " ValueError: Cannot bitcast data-type of size 64to data-type of size 32

Can anyone suggest anything?
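A hedged observation, not a confirmed fix: the traceback comes from Triton, i.e. from torch.compile-generated kernels, and the "Loss value is NaN" issue above reports problems going away after removing the --use_compile flag. Assuming the training command accepts the flags quoted elsewhere on this page, a compile-free run would look like:

# no --use_compile flag, so no Triton kernels are generated (values illustrative)
python training.py --output_dir outputs/model/base \
    --train_epochs 3 \
    --data_path data/train.json \
    --model_name_or_path "declare-lab/flan-alpaca-base" \
    --train_batch_size 8 \
    --gradient_accumulation_steps 32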
