nlpxucan / WizardLM
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
WizardCoder currently has weaknesses in understanding and using OpenAPI specs to create code that queries the specified API to answer a specific question. It would be great to have an OpenAPI-finetuned version.
The Llamax code knows how to handle Alpaca-formatted QA data, but I didn't see anything in there to handle ShareGPT-format data.
How do I fine-tune with the new format? Your fine-tuning guide (https://github.com/nlpxucan/WizardLM/blob/main/WizardLM/README.md#fine-tuning) still references the 70k dataset.
WizardLM can't take a chunk of text and rewrite it as specified. I use a simple prompt template:
TEXT: {book}
DIRECTIVE: read the text and formulate N relevant questions with answers. All questions should be independent and answerable without context. Start questions strictly with "Q:", and answers with "A:". Begin:
### Response
Q:
It does follow the format, but it hallucinates questions and answers that are irrelevant to the text.
I am trying to fine-tune on 8x 4090s.
I use this command, copied from the readme.md:
deepspeed --num_gpus 8 --num_nodes 1 train_freeform.py \
--model_name_or_path /workspace/llama-7b-hf \
--data_path /workspace/WizardLM_alpaca_evol_instruct_70k_unfiltered/WizardLM_alpaca_evol_instruct_70k_unfiltered.json \
--output_dir /workspace/WizardLM-7B-unfiltered/ \
--num_train_epochs 3 \
--model_max_length 2048 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 800 \
--save_total_limit 3 \
--learning_rate 2e-5 \
--warmup_steps 2 \
--logging_steps 2 \
--lr_scheduler_type "cosine" \
--report_to "tensorboard" \
--gradient_checkpointing True \
--deepspeed configs/deepspeed_config.json \
--fp16 True
I am confused because I don't see any output on stdout, but I see all 8 GPUs at 100% utilization while VRAM is only at 7% utilization. Do I need to modify any of these parameters to use the full 8x 4090 VRAM? Can you please tell me which ones I need to change, and how do I get it to display the status on stdout?
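In case it helps anyone hitting the same wall: low VRAM use with all GPUs busy usually points at the DeepSpeed config rather than the command-line flags, and periodic status lines on stdout are also controlled there. A minimal sketch of a ZeRO stage-2 config written from Python, assuming configs/deepspeed_config.json is free to be replaced; this is not the authors' actual file, and the field names follow the public DeepSpeed config schema, with "auto" deferring to the HF trainer's own flags:

import json

# Hypothetical replacement for configs/deepspeed_config.json -- a sketch,
# not the repo's shipped config. "auto" lets the HF trainer fill values in
# from its flags (--per_device_train_batch_size, --fp16, etc.).
ds_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,                      # partition optimizer state + gradients across GPUs
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "steps_per_print": 10,               # forces periodic status lines on stdout
}

with open("configs/deepspeed_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)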
Hello, your models are very interesting.
However, the work would have a bigger impact if you also trained a FalconLM-derived model, which has an Apache-2.0 license.
Thanks, Vaclav
The weight_diff_wizard.py script imports from a train module, but there is no train module in the repo.
When I try to load WizardLM into my CMD it says "Loading wizardLM-7B-GPTQ-4bit-128g... Can't determine model type from model name. Please specify it manually using --model_type argument" and then closes. I do not have these issues with 2 other models that I load.
Hi WizardCoder team,
I find that your evaluation script for HumanEval is mainly based on the one we released in the CodeT5+ repo. Specifically, the provided example bash script is almost identical to our run_generate.sh, and your humaneval_gen.py is mainly based on our generate_codet5p.py. Please add proper acknowledgement to our CodeT5+ if you leverage our open-source scripts in your project. Thanks.
Thank you for your wonderful work.
The paper introduces a data volume of 78k, of which 20K comes from Alpaca. Where does the other instruction data come from?
resume_from_checkpoint doesn't work
Please confirm whether there are any licensing terms for Evol-Instruct 70k, especially for commercial use.
I want to use WizardLM with AutoGPT. Does anyone have an idea how to do it?
Are you planning a WizardLM 65B model?
Just asking ;)
Nowadays running such big models is quite easy with a cheap RTX 3090 and llama.cpp, for instance, getting 2 tokens/s.
The model very often writes "As an AI language model, I do not have opinions", even though the question was factual and did not ask for the model's personal opinion. I think there is too much of that stuff in the dataset. Think about removing it before you start training other models.
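For anyone who wants to try this before retraining, a rough sketch of such a filter, assuming the released Evol-Instruct JSON layout of {"instruction": ..., "output": ...} records (the file names here are illustrative):

import json

# Drop training examples whose output contains common refusal boilerplate.
REFUSALS = ("as an ai language model", "i do not have opinions")

with open("WizardLM_alpaca_evol_instruct_70k.json") as f:
    data = json.load(f)

kept = [ex for ex in data if not any(p in ex["output"].lower() for p in REFUSALS)]
print(f"kept {len(kept)} of {len(data)} examples")

with open("evol_instruct_filtered.json", "w") as f:
    json.dump(kept, f, indent=2)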
Hi authors, thank you for releasing code and data for this project. I am confused about the following part in the paper.
For fair comparison, we replace Alpaca's original Davinci-003 response with ChatGPT's response, and also sample 70K instructions subset to train WizardLM.
(1) In your Alpaca baseline, do you use their original data or responses generated by ChatGPT?
(2) Since you use the Alpaca dataset as the seed dataset, do you change the original Alpaca responses to train WizardLM?
(3) What does the following sentence mean? My understanding is that you sampled 70k examples from the full 250k data, which already include Alpaca's seed dataset. Why do you say "also sample 70K instructions subset to train WizardLM"?
Thank you for your clarification.
Please, what GPUs did you use to train?
Can you please share your deepspeed config json?
I can't get fine-tuning to work with the command you gave in your readme.md and with the deepspeed config json in Llamax.
I tried 8x 4090 and 4x A100; neither worked.
I will need the exact hardware, exact hyperparameters, and exact deepspeed config file you used.
First of all, thanks for releasing the model. It is way better than anything else that was available up to this point.
I see the prompt used to get WizardCoder to answer instructions. I was wondering: can unprompted WizardCoder be used for code completion like base StarCoder (e.g., if I processed its output to make it compatible with the HF VSCode extension)?
Secondly, does it retain the capability of performing fill-in-the-middle like the original StarCoder model? If so, are the needed special tokens the same?
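For context, base StarCoder's fill-in-the-middle prompting uses three special tokens; whether WizardCoder's fine-tuning preserved them is exactly what is being asked, so treat this as the baseline format to test against:

# Base StarCoder FIM format (untested on WizardCoder -- that is the open question):
prefix = "def mean(xs):\n    "
suffix = "\n    return total / len(xs)"
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
# The model is expected to generate the missing middle, e.g. "total = sum(xs)".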
Hi guys,
The model and the tokenizer have to be passed in as arguments to the evaluate() method within inference_wizardlm.py.
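For reference, the corrected call would look like the one that appears in the traceback further down this page:

# Inside inference_wizardlm.py's main(): pass both objects explicitly.
_output = evaluate(tokenizer, model, instruction)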
I'm having an issue where my computer can't load the 7B parameter model. I downloaded the whole folder and tried to run it but kept getting this issue. When I run it I use "--auto-devices --chat --wbit 4 --groupsize 128 --pre_layer 27", and from the interface I load in this mode; I've also tried --model wizardLM-7B-GPTQ-4bit-128g with the same result. I don't think it's a memory issue, as I've gotten those before and typically they inform you that the required space couldn't be allocated. Here it doesn't give me that prompt; it just tells me it can't determine the model type. I've run other models like the gpt4-x-alpaca model, so I know it shouldn't be a location issue.
Can't determine model type from model name. Please specify it manually using --model_type argument
Traceback (most recent call last):
File "C:\Users\Name\Downloads\Super_SD2\Ooba\oobabooga_windows\text-generation-webui\server.py", line 102, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name)
File "C:\Users\Name\Downloads\Super_SD2\Ooba\oobabooga_windows\text-generation-webui\modules\models.py", line 158, in load_model
model = load_quantized(model_name)
File "C:\Users\Name\Downloads\Super_SD2\Ooba\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 147, in load_quantized
exit()
File "C:\Users\Name\Downloads\Super_SD2\Ooba\oobabooga_windows\installer_files\env\lib\_sitebuiltins.py", line 26, in __call__
raise SystemExit(code)
SystemExit: None
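The error text itself names the fix: the GPTQ loader cannot infer the architecture from this folder name, so specify it on the command line. A hedged example, assuming the usual text-generation-webui flags:

python server.py --model wizardLM-7B-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type llama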
I very much hope the dataset will be released, any info on this?
Thank you for sharing this amazing model.
Could you please guide me on how I can train this model for another language?
Hi guys,
I followed the instructions to generate the weights from source. There is no file named "utils" for the import at line 25 of train_freeform.py. Please provide that in the repo.
Thanks for your excellent work!
I wonder how you keep your Gradio demo link active, since Gradio share links expire after 72 hours (according to the official Gradio documentation).
After releasing GPT-4, OpenAI found that there was a significant lack of standardized benchmarks for how LLMs perform at common tasks. Thus, OpenAI created Evals, an open-source collection of crowdsourced tests for LLMs. Unless there are other projects I'm unaware of, this is the closest thing to a standardized benchmark for how we've now come to use LLMs.
I believe that it is vital for our open-access LLM ecosystem to use the same standardized testing methodology, allowing us to objectively compare our results.
While surveys are good, they aren't trivial to run. Automated evaluation can help measure a model's performance throughout the training process or compare different model sizes, instruction datasets, or iterations.
It is also vital to push this kind of robust testing to other models, especially those who didn't have the time or resources to run surveys, and instead opted to ask ChatGPT to "compare" the responses, an obviously flawed approach for many reasons. The ability to objectively compare WizardLM against Vicuna or WizardVicunaLM may, for example, help test if mixing these two works well or if some changes to the dataset generation are required.
I cannot speak much on the matter of implementation; however, it is worth mentioning that someone has already used OpenAI Evals on OpenAssistant models. I suppose the implementation would be rather similar.
https://tatsu-lab.github.io/alpaca_eval/
Just curious why only the 13B version is included here, would the 30B version perform better?
Hi, great work! I was curious if the code_instruction_data is released somewhere, or otherwise when it might be released!
WizardLM/WizardCoder/README.md
Line 140 in 2c900fe
I use these LLM's for creative writing and with Vicuna I'm able to get it to revise sections, like add or remove details, make things longer or shorter, etc (with varying degrees of success of course).
With WizardLM, it sometimes gives me a response saying "sure, I'll do such and such", but it never changes anything.
@nlpxucan, could you upload the current Gradio source scripts for the current WizardLM 13B trained on the 250k dataset?
It would also be cool if you guys could make a chat UI.
It looks like you are doing a full fine-tuning run; have you experimented with LoRA?
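For anyone wanting to try this, a minimal sketch with the PEFT library, swapped in for the repo's full fine-tune; the hyperparameters are illustrative, not tuned, and the model path reuses the one from the command earlier on this page:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("/workspace/llama-7b-hf")
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # LLaMA attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of all weights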
I see you have no LICENSE file for this project; the repository mentions Apache-2.0. When a standard LICENSE file containing the full license text is added, the license appears clearly in the About section on the first pageview.
I could make a patch for this if you would like help.
What I mean is that the answer doesn't differ by a single letter.
Why is this happening? Usually, language models like GPT generate slightly different results for the same prompt, not output that is identical to the letter.
All modern transformer models have problems with math; that's because they tokenize numbers weirdly.
Your model is supposed to be better at math than others, but the main reason transformers are so bad at it wasn't fixed.
To fix it properly, you'd need a special tokenizer for numbers, but that's too much headache.
It's possible to fix it with less effort: just remove all tokens containing numbers except single digits, and train the model a bit more to recover the loss.
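A rough sketch of the first half of that idea, just enumerating which vocabulary entries would be affected; it does not attempt the risky part (resizing the tokenizer and retraining), and the model id is one public conversion, used purely for illustration:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("TheBloke/wizardLM-7B-HF")
# SentencePiece marks word starts with "▁"; strip it before testing for digits.
# Note: for LLaMA this list may already be empty, since its tokenizer splits
# numbers digit-by-digit by design.
multi_digit = [
    t for t in tok.get_vocab()
    if t.lstrip("▁").isdigit() and len(t.lstrip("▁")) > 1
]
print(f"{len(multi_digit)} multi-digit tokens, e.g. {multi_digit[:10]}")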
Asking "Why is the sky blue?" to the linked demo causes the message "Error" without any further elaboration.
Your README claims this model outperforms ChatGPT on "complex tasks." How should we define complex tasks? I asked it to write me shader code for OpenGL ES 2.0, and it generated gibberish.
Once models fail a test like that, I like to ask them science questions because most of them will spew garbage (such as the Earth having 5 moons, or Jupiter being 10x larger than it actually is). I asked Wizard via this interface to describe the Moon... so it told me the Moon directly influences human menstrual cycles:
I then completely changed the parameters from the default, using temp 0.69, top p 0.5, top k 0, and it basically repeated itself:
There is no universe in which ChatGPT would say this.
Please don't misunderstand - I'm by no means trying to degrade your work. It really does seem incredible, and I can't wait to see how this turns out. It kicks ass on some other tests. But claiming you've defeated the industry-standard god only harms progress. The Alpaca team did the same and claimed they too had beaten ChatGPT. Please be honest in your README. Perhaps this model fulfilled your exact use case, but the goal of defeating ChatGPT has not been achieved.
The current performance of Evol-Instruct on math, geometry, and physics problem solving is rather poor. To enhance the overall reasoning/basic math capabilities of WizardLM, I believe more high-school level physics, algebra, or geometry problems should be present within the dataset. GPT-4 seems to do mostly fine on them, so it seems doable, though being a smaller model, it would be quite interesting to see how far it can get.
The dataset I found didn't contain many physics questions in particular, which tracks well with the hallucinated formulas and the inability to reason step by step to find intermediate values.
Sorry if this question has been asked and answered but I cannot find anything related to it.
Is there a way to download and use the model through the transformers and PyTorch APIs, just like Dolly or Vicuna, as explained on their model card pages below:
https://huggingface.co/databricks/dolly-v2-3b
https://huggingface.co/CarperAI/stable-vicuna-13b-delta
The WizardLM model card on Hugging Face (https://huggingface.co/TheBloke/wizardLM-7B-GPTQ) only shows how to configure it for text-generation-webui.
Can anyone shed some light on this please?
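A minimal sketch that has worked for similar LLaMA-based models, assuming the fp16 HF-format weights (e.g. TheBloke/wizardLM-7B-HF) rather than the GPTQ files, which need a quantization-aware loader:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/wizardLM-7B-HF"  # assumed fp16 conversion, not the GPTQ repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # needs accelerate
)

prompt = "Why is the sky blue?\n\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))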
Hey, no matter what I do (I installed the transformers lib from the git repo directly), this error pops up when trying to load the inference code.
Could you please provide a basic example of how to install all dependencies via conda and which ones are required? Thank you very much
Could you provide an example of how to use the prompt with the 7B model? I can hardly understand the readme file. Many thanks.
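For what it's worth, the repo's inference script appears to build the prompt as the bare instruction followed by a response header; hedged, so confirm against inference_wizardlm.py:

# Assumed WizardLM-7B prompt template -- verify against inference_wizardlm.py.
instruction = "Write a haiku about the sea."
prompt = f"{instruction}\n\n### Response:"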
Hi guys,
I have used your fine-tuning script on a custom dataset to fine-tune the WizardLM model. The training works without problems, the loss decreases, and all the relevant model files are stored in the specified output directory.
However, I checked the size of pytorch_model.bin and it is only 623 KB, so I guess the error must be in saving the model. I used "TheBloke/wizardLM-7B-HF" from Hugging Face as the base model for fine-tuning.
If I try to load the model for inference using the inference_wizardlm.py script, giving it the output directory specified during training, it turns up this error:
Traceback (most recent call last):
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dolly-gpu-a100/code/Users/adrian.gabriel/WizardLM/src/inference_wizardlm.py", line 132, in <module>
fire.Fire(main)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dolly-gpu-a100/code/Users/adrian.gabriel/WizardLM/src/inference_wizardlm.py", line 121, in main
_output = evaluate(tokenizer, model, instruction)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dolly-gpu-a100/code/Users/adrian.gabriel/WizardLM/src/inference_wizardlm.py", line 57, in evaluate
generation_output = model.generate(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/generation/utils.py", line 1437, in generate
return self.greedy_search(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/generation/utils.py", line 2248, in greedy_search
outputs = self(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
outputs = self.model(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 530, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 158, in forward
return F.embedding(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/functional.py", line 2199, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D
Where could the problem be?
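A hedged diagnostic: a 623 KB pytorch_model.bin is far too small for a 7B model, which suggests placeholder tensors were saved (e.g. weights still partitioned or on the meta device at save time), and an empty embedding weight would produce exactly this 'weight' must be 2-D error. This just inspects what was actually written; the path stands in for your training output directory:

import torch

state = torch.load("output_dir/pytorch_model.bin", map_location="cpu")
for name, tensor in list(state.items())[:10]:
    print(name, tuple(tensor.shape), tensor.numel())
# A healthy LLaMA-7B checkpoint holds large 2-D tensors such as
# model.embed_tokens.weight with shape (32000, 4096); empty or 0-d entries
# confirm the save step, not inference, is at fault.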
Great work. I guess the SFT dataset can affect the performance of the model. Did you do supervised fine-tuning using your high-quality data on Falcon-40B? Thanks a lot.
CodeTest.txt
Performs relatively well analyzing this code (much better than StarChat). It gets close to figuring out the compression being used but tends to miss slightly on the particulars of what data is removed when the data is compressed. The answer is every 64-bit value at the beginning of the set with all bits set to true. It got close in one run, saying every 64-bit value set to true, but this is too vague to be correct, since a 64-bit value with just one bit set would still satisfy that statement, yet the compression would in fact leave such a value in.
Thank you for providing two excellent models to the open-source community. Since you created WizardCoder after WizardLM, I wonder: do you have any plans to optimize those models further or introduce new ones?
It would be nice if new insights from the Orca paper were used to improve these models even further.
Hi guys,
What LLM are you using to generate the training data? It is the proprietary ChatGPT 3.5 Turbo, right? Have you tried open-source alternatives for data generation as well? How does the use of ChatGPT for training-data generation affect the use of WizardLM?
Firstly, thanks for publicly releasing the instruction dataset.
While looking through the dataset, I've noticed several examples where the output is already present in the instruction with barely any changes. Example (index 47821 from the dataset):
Instruction:
Can you provide a table of 6 different tools used for data science along with their descriptions and features?
| Tool | Description | Features |
|------|-------------|----------|
| Python | A high-level programming language used for data analysis and machine learning | Large standard library, easy to learn, supports multiple paradigms |
| R | A programming language and software environment for statistical computing and graphics | Large community, extensive library of statistical and graphical methods, open-source |
| SQL | A domain-specific language used in programming and designed for managing data held in a relational database management system | Easy to learn, widely used, powerful for managing large amounts of structured data |
| Tableau | A data visualization and business intelligence tool used to create interactive dashboards and reports | User-friendly interface, drag-and-drop functionality, ability to connect to multiple data sources |
| Apache Spark | An open-source distributed computing system used for big data processing and machine learning | Fast processing speed, ability to handle large datasets, supports multiple languages |
| TensorFlow | An open-source software library for dataflow and differentiable programming across a range of tasks | Popular for machine learning and deep learning, supports distributed computing, flexible architecture |
Output:
Sure, here's a table of 6 different tools used for data science along with their descriptions and features:
| Tool | Description | Features |
|------|-------------|----------|
| Python | A high-level programming language used for data analysis and machine learning | Large standard library, easy to learn, supports multiple paradigms |
| R | A programming language and software environment for statistical computing and graphics | Large community, extensive library of statistical and graphical methods, open-source |
| SQL | A domain-specific language used in programming and designed for managing data held in a relational database management system | Easy to learn, widely used, powerful for managing large amounts of structured data |
| Tableau | A data visualization and business intelligence tool used to create interactive dashboards and reports | User-friendly interface, drag-and-drop functionality, ability to connect to multiple data sources |
| Apache Spark | An open-source distributed computing system used for big data processing and machine learning | Fast processing speed, ability to handle large datasets, supports multiple languages |
| TensorFlow | An open-source software library for dataflow and differentiable programming across a range of tasks | Popular for machine learning and deep learning, supports distributed computing, flexible architecture |
Indices of some similar samples: 17421, 17926, 24717, 29166, 41705, 61826, 62587
Is this due to a bug during the creation of the dataset? Was the WizardLM model trained with such examples?
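A rough sketch of how one might flag such examples across the whole file, using a crude line-overlap ratio; the file name and the 0.9 threshold are illustrative, and the {"instruction": ..., "output": ...} layout is assumed from the released dataset:

import json

def overlap_ratio(instruction: str, output: str) -> float:
    # Fraction of output lines that appear verbatim in the instruction.
    inst_lines = set(instruction.strip().splitlines())
    out_lines = [l for l in output.strip().splitlines() if l]
    if not out_lines:
        return 0.0
    return sum(l in inst_lines for l in out_lines) / len(out_lines)

with open("WizardLM_alpaca_evol_instruct_70k.json") as f:
    data = json.load(f)

dupes = [i for i, ex in enumerate(data)
         if overlap_ratio(ex["instruction"], ex["output"]) > 0.9]
print(f"{len(dupes)} suspect examples, e.g. indices {dupes[:10]}")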