declare-lab / flan-alpaca

337 stars · 7 watchers · 37 forks · 34 KB

This repository contains code for extending the Stanford Alpaca synthetic instruction tuning to existing instruction-tuned models such as Flan-T5.

License: Apache License 2.0

Language: Python (100.00%)
Topics: flan-t5, alpaca, language-model, llm, transformers

flan-alpaca's People

Contributors

chiayewken, soujanyaporia

flan-alpaca's Issues

I don't see any progress logs

I am training Flan-XL on 4× A6000 GPUs. A few days ago, when I trained a large model, I used to see progress logs; now I don't. (screenshot omitted)

Unable to train on 4-5 gtx 1070s

(screenshot omitted)
I've got five 1070s here that I'm trying to train on. The memory disappears quickly at the start: 40 GB of VRAM total but only 64 GB of system memory. I added swap, but I imagine loading will take forever that way. Are there other flags I can use to reduce memory usage?
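A hedged suggestion, not from the maintainers: the training flags quoted in other issues on this page include --use_lora, --train_batch_size, and --gradient_accumulation_steps, and lowering the batch size while raising accumulation (optionally with LoRA) is the usual way to trade speed for memory. A sketch with illustrative values:

python training.py --output_dir outputs/model/base \
    --use_lora \
    --train_epochs 3 \
    --data_path data/train.json \
    --model_name_or_path "google/flan-t5-base" \
    --train_batch_size 1 \
    --gradient_accumulation_steps 64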

What are the available parameters for generator?

What are the available parameters for the generator? I see that temperature needs to be a float, but I am not aware of the others:

generator = pipeline(model="declare-lab/flan-alpaca-xl")
generated_text = generator(prompt, temperature=0.1, max_length=784, do_sample=False)
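Since transformers pipelines forward extra keyword arguments to the underlying generate() call, the standard Hugging Face generation parameters should all apply here (note that temperature only takes effect when do_sample=True). A sketch with illustrative values:

from transformers import pipeline

generator = pipeline(model="declare-lab/flan-alpaca-xl")
generated_text = generator(
    "Write an email about an alpaca that likes flan",
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,         # softmax temperature (float)
    top_k=50,                # keep only the 50 most likely next tokens
    top_p=0.9,               # nucleus sampling threshold
    num_beams=1,             # set > 1 to enable beam search
    max_length=784,          # maximum output length in tokens
    repetition_penalty=1.2,  # discourage repeated phrases
    num_return_sequences=1,  # number of completions to return
)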

Loss value is NaN

I am trying to finetune the model using a V100 and a lower version of torch (1.13.0).

  1. After removing "--use_compile" and updating "precision" to "16-mixed", I got a [nan.0] value at each step during training.
  2. I tried to set "precision" to "32-true", then I can get the loss values. However, I cannot see the convergence after the first epoch.

All the other settings are the same as in the readme. Could anyone give me some suggestions on this? Thanks very much!
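On a V100 this is a known quirk of the T5 family rather than of this repo: T5 weights frequently overflow in fp16, producing NaN losses, while bf16 (which avoids the overflow) requires Ampere or newer GPUs. Assuming training.py builds a pytorch_lightning.Trainer internally (an assumption, not verified), the relevant knob is:

import pytorch_lightning as pl

# sketch: "16-mixed" overflows for T5-family weights, "bf16-mixed" needs
# Ampere or newer, so "32-true" (full fp32) is the safe fallback on a V100,
# matching what the reporter observed above.
trainer = pl.Trainer(precision="32-true")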

use gpt4all dataset

I am impressed by the quality and speed of the XL model, and it fits very easily on my 24 GB GPU.
I recently found this dataset: https://github.com/nomic-ai/gpt4all. However, my GPU is unable to train the model on it due to insufficient memory. Can anyone train it, please?

RuntimeError: Trying to resize storage that is not resizable

During inference or upload I get this RuntimeError: Trying to resize storage that is not resizable

I trained with use_fsdp on 4× RTX A4000. It ran for 8 hours on the clean alpaca dataset and then gave this error; it seems all that work was wasted.

Error occurs at this line

model: LightningModel = LightningModel.load_from_checkpoint(path)

From there the traceback leads into torch utils.
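A hedged workaround, not verified against this repo: a Lightning checkpoint is a plain dict with "state_dict" and "hyper_parameters" keys, so loading it manually on CPU sometimes sidesteps storage-resize errors in FSDP-saved checkpoints:

import torch

# sketch: bypass load_from_checkpoint and restore the weights by hand.
# LightningModel is the repo's module class from the line above; passing
# hyper_parameters to its constructor is an assumption about its __init__.
ckpt = torch.load(path, map_location="cpu")
model = LightningModel(**ckpt["hyper_parameters"])
model.load_state_dict(ckpt["state_dict"])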

Performance of the model on gsm8k/SVAMP/MultiArith.

Thank you for your excellent project. I have evaluated Flan-Alpaca-Base/Large/XL on the gsm8k, SVAMP, and MultiArith datasets, with the following results:

| Model             | gsm8k | MultiArith | SVAMP |
| ----------------- | ----- | ---------- | ----- |
| Flan-Alpaca-Base  | 13.42 | 20.33      | 19.50 |
| Flan-Alpaca-Large | 14.40 | 19.83      | 17.80 |
| Flan-Alpaca-XL    | 9.25  | 13.83      | 14.30 |

Overall, the larger the model, the worse its performance. What do you think is the reason for this? Also, did you use the test sets of these three datasets to train the model? If so, could the reason be that the smaller models overfit the test data?
Thank you~

Trouble training

Having trouble running your training script. I haven't looked into debugging the code, but I assumed it works with the Alpaca dataset as-is. Should the alpaca JSON file be preprocessed in some way before running the training script?

File "/home/voiduser/pydev/flan-alpaca/flan-alpaca/data_loading.py", line 45, in
samples = [TextToTextSample(**json.loads(line)) for line in f]
File "/usr/local/lib/python3.10/json/init.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 2)
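The error suggests data_loading.py expects JSON Lines (one JSON object per line), while the raw Alpaca file is a single JSON array. Judging from the preprocessing command quoted in the OMP issue further down this page, converting the file first should produce the expected format:

python data_loading.py preprocess_alpaca \
    --path_in data/alpaca_clean.json \
    --path_out data/train.json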

unable to use new flan-alpaca-gpt4-xl in pipeline

Hi,
I've tried to use the new model, but I get the following error:
ValueError: Could not load model declare-lab/flan-alpaca-gpt4-xl with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.t5.modeling_t5.T5ForConditionalGeneration'>).

code to reproduce:
from transformers import pipeline

model = pipeline(model="declare-lab/flan-alpaca-gpt4-xl")

thank you!
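A hedged guess, not confirmed by the maintainers: this error often just means the installed transformers version cannot read the checkpoint format (for example, a safetensors or sharded checkpoint that older releases don't handle), so upgrading transformers is the first thing to try. Naming the task explicitly also rules out task auto-detection as the culprit:

from transformers import pipeline

# explicit task; the model id is the same one that failed above
model = pipeline("text2text-generation", model="declare-lab/flan-alpaca-gpt4-xl")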

OMP: Error #100: Fatal system error detected.

(base) root@anaconda-statefulset-0:~/flan-alpaca# python data_loading.py preprocess_alpaca \
--path_in data/alpaca_clean.json \
--path_out data/train.json
OMP: Error #100: Fatal system error detected.
OMP: System error #22: Invalid argument
Aborted (core dumped)

I'm inside a k8s pod when doing this, but running export KMP_AFFINITY=disabled resolved it for me. Not sure if it's specific to running in a container. I'm using the anaconda3 container.
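For anyone hitting the same thing, this is the reporter's workaround applied to the failing command (KMP_AFFINITY belongs to the OpenMP runtime; scoping it to a single invocation instead of exporting it is just a stylistic choice):

KMP_AFFINITY=disabled python data_loading.py preprocess_alpaca \
    --path_in data/alpaca_clean.json \
    --path_out data/train.json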

Quantized

First off, this model is pretty good and I'm enjoying it almost as much as ChatGPT.

I'd like to ask whether it is possible to quantize the model using Optimum. The following code throws an ONNX error:

[ONNXRuntimeError] : 1 : FAIL : Deserialize tensor onnx::MatMul_5230 failed.tensorprotoutils.cc:640 onnxruntime::utils::TensorProtoToTensor External initializer: onnx::MatMul_5230 offset: 0 size to read: 41943040 given file_length: 16777216 are out of bounds or can not be read in full.

from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_id = "declare-lab/flan-alpaca-xl"
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)

Thank you!

is there any plan for flan-ul2?

I just wonder: is there going to be a new checkpoint released based on Flan-UL2?

Anyway, I really love the work on this repo 😃 you guys did a really great job!

Usage example in readme doesn't work

I cloned the repo, installed the requirements, and put this in a test script:

from transformers import pipeline

prompt = "Write an email about an alpaca that likes flan"
model = pipeline(model="declare-lab/flan-alpaca-xl")
model(prompt, max_length=128, do_sample=True)
print(model)

As output, I get <transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x7fdd5559de20>
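The script prints the pipeline object itself rather than the generation result: model(prompt, ...) returns a list of dicts with a generated_text key, and that return value is what needs printing. A corrected sketch:

from transformers import pipeline

model = pipeline(model="declare-lab/flan-alpaca-xl")
prompt = "Write an email about an alpaca that likes flan"
output = model(prompt, max_length=128, do_sample=True)
# output looks like [{"generated_text": "..."}]
print(output[0]["generated_text"])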

Flan Data

The readme file says the models use Alpaca + Flan data; however, I do not see Flan data in the data loaders. Does this simply mean that the loaded Hugging Face checkpoint had already been fine-tuned on Flan data?

Thanks

LoRA + FSDP -- issue

Hi, dumb question: when using LoRA and FSDP together for faster training on an ml.p3.16xlarge instance (8 GPUs), I'm encountering an error.

This works fine
python training.py --output_dir outputs/model/xl \
    --use_fsdp \
    --train_epochs 2 \
    --data_path data/train.json \
    --model_name_or_path "google/flan-t5-xl" \
    --train_batch_size 8 \
    --gradient_accumulation_steps 32

However, running this throws an error:
python training.py --output_dir outputs/model/xl \
    --use_fsdp \
    --use_lora \
    --train_epochs 2 \
    --data_path data/train.json \
    --model_name_or_path "google/flan-t5-xl" \
    --train_batch_size 8 \
    --gradient_accumulation_steps 32

File "/opt/conda/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 440, in _init_flat_param
self._init_flat_param(params, fully_sharded_module, use_orig_params)
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/fsdp/flat_param.py", line 440, in _init_flat_param
raise ValueError(
ValueError: FlatParameter requires uniform requires_grad

Will this fix the issue?

for param in model.parameters():
    param.requires_grad_()
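A hedged answer, not verified against this repo's training.py: enabling requires_grad on every parameter would silence the error but would also train all the base weights, defeating the point of LoRA. The error itself comes from FSDP flattening parameters with mixed requires_grad into one FlatParameter; recent torch releases expose use_orig_params=True on the FSDP wrapper to allow exactly this mix:

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# sketch: wrap the LoRA-augmented model so that frozen base weights and
# trainable adapter weights can coexist inside one FSDP unit
model = FSDP(model, use_orig_params=True)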

wget https://raw.githubusercontent.com/tloen/alpaca-lora/main/alpaca_data_cleaned.json -O data/alpaca_clean.json

I received the error below. The URL https://raw.githubusercontent.com/tloen/alpaca-lora/main/alpaca_data_cleaned.json returned a 404 Not Found response, indicating that the requested resource was not found on the server.

This error could occur for several reasons, including:

The URL may be incorrect or outdated, and the resource may have been moved or deleted.
The server hosting the resource may be down or inaccessible.
The resource may have been intentionally removed or blocked.
In this case, it seems that the alpaca_data_cleaned.json file could not be found at the specified location. You may want to check if the file exists at the URL or if the URL has changed.

The file alpaca_data_cleaned.json seems to be no longer available.

wget https://raw.githubusercontent.com/tloen/alpaca-lora/main/alpaca_data_cleaned.json -O data/alpaca_clean.json
--2023-04-10 09:39:10-- https://raw.githubusercontent.com/tloen/alpaca-lora/main/alpaca_data_cleaned.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-04-10 09:39:11 ERROR 404: Not Found.

issue in training declare-lab/flan-alpaca-base model

Hi,
I'm able to train google/flan-t5-base or large models on my data after tweaking the training code (my GPU does not support bfloat16), but this time I want to train declare-lab/flan-alpaca-base on my data, and it throws a long error message:

File "/home/ec2-user/SageMaker/conda_envs/llm/lib/python3.8/site-packages/triton/language/semantic.py", line 633, in bitcast raise ValueError("Cannot bitcast data-type of size " + str(src_bits) + "to " ValueError: Cannot bitcast data-type of size 64to data-type of size 32

Can anyone suggest anything?
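A hedged observation, not a confirmed fix: the traceback comes from Triton, i.e. from torch.compile-generated kernels, and the "Loss value is NaN" issue above reports problems going away after removing the --use_compile flag. Assuming the training command accepts the flags quoted elsewhere on this page, a compile-free run would look like:

# no --use_compile flag, so no Triton kernels are generated (values illustrative)
python training.py --output_dir outputs/model/base \
    --train_epochs 3 \
    --data_path data/train.json \
    --model_name_or_path "declare-lab/flan-alpaca-base" \
    --train_batch_size 8 \
    --gradient_accumulation_steps 32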
