nlpxucan / WizardLM
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
WizardCoder currently has weaknesses in understanding and using OpenAPI specs to create code that queries the specified API to answer a specific question. It would be great to have an OpenAPI-finetuned version.
The Llamax code knows how to handle Alpaca-formatted QA data, but I didn't see anything in there to handle ShareGPT-format data.
How do I fine-tune with the new format? Your fine-tuning guide (https://github.com/nlpxucan/WizardLM/blob/main/WizardLM/README.md#fine-tuning) still references the 70k dataset.
WizardLM can't take a chunk of text and rewrite it as specified. I use a simple prompt template:
TEXT: {book}
DIRECTIVE: read the text and formulate N relevant questions with answers. All questions should be independent and answerable without context. Start questions strictly with "Q:", and answers with "A:". Begin:
### Response
Q:
It does follow the format, but it hallucinates questions and answers that are irrelevant to the text.
I am trying to fine-tune on 8x 4090s.
I use this command, copied from the readme.md:
deepspeed --num_gpus 8 --num_nodes 1 train_freeform.py \
--model_name_or_path /workspace/llama-7b-hf \
--data_path /workspace/WizardLM_alpaca_evol_instruct_70k_unfiltered/WizardLM_alpaca_evol_instruct_70k_unfiltered.json \
--output_dir /workspace/WizardLM-7B-unfiltered/ \
--num_train_epochs 3 \
--model_max_length 2048 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 800 \
--save_total_limit 3 \
--learning_rate 2e-5 \
--warmup_steps 2 \
--logging_steps 2 \
--lr_scheduler_type "cosine" \
--report_to "tensorboard" \
--gradient_checkpointing True \
--deepspeed configs/deepspeed_config.json \
--fp16 True
I am confused because I don't see any output on stdout, but I see all 8 GPUs at 100% utilization while VRAM is only at 7% utilization. Do I need to modify any of these parameters to use the full 8x 4090 VRAM? Can you please tell me which ones I need to change, and how do I get it to display the status on stdout?
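In case it helps anyone hitting the same wall: low VRAM use with all GPUs busy usually points at the DeepSpeed config rather than the command-line flags, and periodic status lines on stdout are also controlled there. A minimal sketch of a ZeRO stage-2 config written from Python, assuming configs/deepspeed_config.json is free to be replaced; this is not the authors' actual file, and the field names follow the public DeepSpeed config schema, with "auto" deferring to the HF trainer's own flags:

import json

# Hypothetical replacement for configs/deepspeed_config.json -- a sketch,
# not the repo's shipped config. "auto" lets the HF trainer fill values in
# from its flags (--per_device_train_batch_size, --fp16, etc.).
ds_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,                      # partition optimizer state + gradients across GPUs
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "steps_per_print": 10,               # forces periodic status lines on stdout
}

with open("configs/deepspeed_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)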
Hello, your models are very interesting.
However, the work would have a bigger impact if you also trained a FalconLM-derived model, which has an Apache-2.0 license.
Thanks, Vaclav
The weight_diff_wizard.py script imports from a train module, but there is no train module in the repo.
When I try to load WizardLM into my CMD it says "Loading wizardLM-7B-GPTQ-4bit-128g... Can't determine model type from model name. Please specify it manually using --model_type argument" and then closes. I do not have these issues with 2 other models that I load.
Hi WizardCoder team,
I find that your evaluation script for HumanEval is mainly based on the one we released in the CodeT5+ repo. Specifically, the provided example bash script is almost identical to our run_generate.sh, and your humaneval_gen.py is mainly based on our generate_codet5p.py. Please add proper acknowledgement to our CodeT5+ if you leverage our open-source scripts in your project. Thanks.
Thank you for your wonderful work.
The paper introduces a data volume of 78k, of which 20K comes from Alpaca. Where does the other instruction data come from?
resume_from_checkpoint doesn't work
Please confirm whether there are any licensing terms for Evol-Instruct 70k, especially for commercial use.
I want to use WizardLM with AutoGPT. Does anyone have an idea how to do it?
Are you planning a WizardLM 65B model?
Just asking ;)
Nowadays running such big models is quite easy with a cheap RTX 3090 and llama.cpp, for instance, getting 2 tokens/s.
The model very often writes "As an AI language model, I do not have opinions", even though the question was factual and did not ask for the model's personal opinion. I think there is too much of that stuff in the dataset. Think about removing it before you start training other models.
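For anyone who wants to try this before retraining, a rough sketch of such a filter, assuming the released Evol-Instruct JSON layout of {"instruction": ..., "output": ...} records (the file names here are illustrative):

import json

# Drop training examples whose output contains common refusal boilerplate.
REFUSALS = ("as an ai language model", "i do not have opinions")

with open("WizardLM_alpaca_evol_instruct_70k.json") as f:
    data = json.load(f)

kept = [ex for ex in data if not any(p in ex["output"].lower() for p in REFUSALS)]
print(f"kept {len(kept)} of {len(data)} examples")

with open("evol_instruct_filtered.json", "w") as f:
    json.dump(kept, f, indent=2)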
Hi authors, thank you for releasing code and data for this project. I am confused about the following part in the paper.
For fair comparison, we replace Alpaca's original Davinci-003 response with ChatGPT's response, and also sample 70K instructions subset to train WizardLM.
(1) In your Alpaca baseline, do you use their original data or responses generated by ChatGPT?
(2) Since you use the Alpaca dataset as the seed dataset, do you change the original Alpaca responses to train WizardLM?
(3) What does the following sentence mean? My understanding is that you sampled 70k examples from the full 250k data, which already include Alpaca's seed dataset. Why do you say "also sample 70K instructions subset to train WizardLM"?
Thank you for your clarification.
Please, what GPUs did you use to train?
Can you please share your deepspeed config json?
I can't get fine-tuning to work with the command you gave in your readme.md and with the deepspeed config json in Llamax.
I tried 8x 4090 and 4x A100; neither worked.
I will need the exact hardware, exact hyperparameters, and exact deepspeed config file you used.
First of all, thanks for releasing the model. It is way better than anything else that was available up to this point.
I see the prompt used to get WizardCoder to answer instructions. I was wondering: can unprompted WizardCoder be used for code completion like base StarCoder (e.g., if I processed its output to make it compatible with the HF VSCode extension)?
Secondly, does it retain the capability of performing fill-in-the-middle like the original StarCoder model? If so, are the needed special tokens the same?
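For context, base StarCoder's fill-in-the-middle prompting uses three special tokens; whether WizardCoder's fine-tuning preserved them is exactly what is being asked, so treat this as the baseline format to test against:

# Base StarCoder FIM format (untested on WizardCoder -- that is the open question):
prefix = "def mean(xs):\n    "
suffix = "\n    return total / len(xs)"
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
# The model is expected to generate the missing middle, e.g. "total = sum(xs)".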
Hi guys,
The model and the tokenizer have to be passed in as arguments to the evaluate() method within inference_wizardlm.py.
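For reference, the corrected call would look like the one that appears in the traceback further down this page:

# Inside inference_wizardlm.py's main(): pass both objects explicitly.
_output = evaluate(tokenizer, model, instruction)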
I'm having an issue where my computer can't load the 7B parameter model. I downloaded the whole folder and tried to run it but kept getting this issue. When I run it I use "--auto-devices --chat --wbit 4 --groupsize 128 --pre_layer 27", and from the interface I load in this mode; I've also tried --model wizardLM-7B-GPTQ-4bit-128g with the same result. I don't think it's a memory issue, as I've gotten those before and typically they inform you that the required space couldn't be allocated. Here it doesn't give me that prompt; it just tells me it can't determine the model type. I've run other models like the gpt4-x-alpaca model, so I know it shouldn't be a location issue.
Can't determine model type from model name. Please specify it manually using --model_type argument
Traceback (most recent call last):
File "C:\Users\Name\Downloads\Super_SD2\Ooba\oobabooga_windows\text-generation-webui\server.py", line 102, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name)
File "C:\Users\Name\Downloads\Super_SD2\Ooba\oobabooga_windows\text-generation-webui\modules\models.py", line 158, in load_model
model = load_quantized(model_name)
File "C:\Users\Name\Downloads\Super_SD2\Ooba\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 147, in load_quantized
exit()
File "C:\Users\Name\Downloads\Super_SD2\Ooba\oobabooga_windows\installer_files\env\lib\_sitebuiltins.py", line 26, in __call__
raise SystemExit(code)
SystemExit: None
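The error text itself names the fix: the GPTQ loader cannot infer the architecture from this folder name, so specify it on the command line. A hedged example, assuming the usual text-generation-webui flags:

python server.py --model wizardLM-7B-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type llama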
I very much hope the dataset will be released, any info on this?
Thank you for sharing this amazing model.
Could you please guide me on how I can train this model for another language?
Hi guys,
I followed the instructions to generate the weights from source. There is no file named "utils" for the import at line 25 of train_freeform.py. Please provide that in the repo.
Thanks for your excellent work!
I wonder how you keep your Gradio demo link active, since Gradio share links expire after 72 hours (according to the official Gradio documentation).
After releasing GPT-4, OpenAI found that there was a significant lack of standardized benchmarks for how LLMs perform at common tasks. Thus, OpenAI created Evals, an open-source collection of crowdsourced tests for LLMs. Unless there are other projects I'm unaware of, this is the closest thing to a standardized benchmark for how we've now come to use LLMs.
I believe that it is vital for our open-access LLM ecosystem to use the same standardized testing methodology, allowing us to objectively compare our results.
While surveys are good, they aren't trivial to run. Automated evaluation can help measure a model's performance throughout the training process or compare different model sizes, instruction datasets, or iterations.
It is also vital to push this kind of robust testing to other models, especially those who didn't have the time or resources to run surveys, and instead opted to ask ChatGPT to "compare" the responses, an obviously flawed approach for many reasons. The ability to objectively compare WizardLM against Vicuna or WizardVicunaLM may, for example, help test if mixing these two works well or if some changes to the dataset generation are required.
I cannot speak much on the matter of implementation; however, it is worth mentioning that someone has already used OpenAI Evals on OpenAssistant models. I suppose the implementation would be rather similar.
https://tatsu-lab.github.io/alpaca_eval/
Just curious why only the 13B version is included here, would the 30B version perform better?
Hi, great work! I was curious if the code_instruction_data is released somewhere, or otherwise when it might be released!
WizardLM/WizardCoder/README.md
Line 140 in 2c900fe
I use these LLM's for creative writing and with Vicuna I'm able to get it to revise sections, like add or remove details, make things longer or shorter, etc (with varying degrees of success of course).
With WizardLM, it sometimes gives me a response saying "sure, I'll do such and such", but it never changes anything.
@nlpxucan, could you upload the current Gradio source scripts for the current WizardLM 13B trained on the 250k dataset?
It would also be cool if you guys could make a chat UI.
It looks like you are doing a full fine-tuning run; have you experimented with LoRA?
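For anyone wanting to try this, a minimal sketch with the PEFT library, swapped in for the repo's full fine-tune; the hyperparameters are illustrative, not tuned, and the model path reuses the one from the command earlier on this page:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("/workspace/llama-7b-hf")
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # LLaMA attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of all weights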
I see you have no LICENSE file for this project; the repository mentions Apache-2.0. When a standard LICENSE file containing the full license text is added, the license appears clearly in the About section on the first pageview.
I could make a patch for this if you would like help.
What I mean is that the answer doesn't differ by a single letter.
Why is this happening? Usually, language models like GPT generate slightly different results for the same prompt, not output that is identical to the letter.
All modern transformer models have problems with math; that's because they tokenize numbers weirdly.
Your model is supposed to be better at math than others, but the main reason transformers are so bad at it wasn't fixed.
To fix it properly, you'd need a special tokenizer for numbers, but that's too much headache.
It's possible to fix it with less effort: just remove all tokens containing numbers except single digits, and train the model a bit more to recover the loss.
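A rough sketch of the first half of that idea, just enumerating which vocabulary entries would be affected; it does not attempt the risky part (resizing the tokenizer and retraining), and the model id is one public conversion, used purely for illustration:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("TheBloke/wizardLM-7B-HF")
# SentencePiece marks word starts with "▁"; strip it before testing for digits.
# Note: for LLaMA this list may already be empty, since its tokenizer splits
# numbers digit-by-digit by design.
multi_digit = [
    t for t in tok.get_vocab()
    if t.lstrip("▁").isdigit() and len(t.lstrip("▁")) > 1
]
print(f"{len(multi_digit)} multi-digit tokens, e.g. {multi_digit[:10]}")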
Asking "Why is the sky blue?" to the linked demo causes the message "Error" without any further elaboration.
Your README claims this model outperforms ChatGPT on "complex tasks." How should we define complex tasks? I asked it to write me shader code for OpenGL ES 2.0, and it generated gibberish.
Once models fail a test like that, I like to ask them science questions because most of them will spew garbage (such as the Earth having 5 moons, or Jupiter being 10x larger than it actually is). I asked Wizard via this interface to describe the Moon... so it told me the Moon directly influences human menstrual cycles:
I then completely changed the parameters from the default, using temp 0.69, top p 0.5, top k 0, and it basically repeated itself:
There is no universe in which ChatGPT would say this.
Please don't misunderstand - I'm by no means trying to degrade your work. It really does seem incredible, and I can't wait to see how this turns out. It kicks ass on some other tests. But claiming you've defeated the industry-standard god only harms progress. The Alpaca team did the same and claimed they too had beaten ChatGPT. Please be honest in your README. Perhaps this model fulfilled your exact use case, but the goal of defeating ChatGPT has not been achieved.
The current performance of Evol-Instruct on math, geometry, and physics problem solving is rather poor. To enhance the overall reasoning/basic math capabilities of WizardLM, I believe more high-school level physics, algebra, or geometry problems should be present within the dataset. GPT-4 seems to do mostly fine on them, so it seems doable, though being a smaller model, it would be quite interesting to see how far it can get.
The dataset I found didn't contain many physics questions in particular, which tracks well with the hallucinated formulas and the inability to reason step by step to find intermediate values.
Sorry if this question has been asked and answered but I cannot find anything related to it.
Is there a way to download and use the model through the transformers and PyTorch APIs, just like Dolly or Vicuna, as explained on their model card pages below:
https://huggingface.co/databricks/dolly-v2-3b
https://huggingface.co/CarperAI/stable-vicuna-13b-delta
The WizardLM model card on Hugging Face (https://huggingface.co/TheBloke/wizardLM-7B-GPTQ) only shows how to configure it for text-generation-webui.
Can anyone shed some light on this please?
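A minimal sketch that has worked for similar LLaMA-based models, assuming the fp16 HF-format weights (e.g. TheBloke/wizardLM-7B-HF) rather than the GPTQ files, which need a quantization-aware loader:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/wizardLM-7B-HF"  # assumed fp16 conversion, not the GPTQ repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # needs accelerate
)

prompt = "Why is the sky blue?\n\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))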
Hey, no matter what I do (I installed the transformers lib from the git repo directly), this error pops up when trying to load the inference code.
Could you please provide a basic example of how to install all dependencies via conda and which ones are required? Thank you very much
Could you provide an example of how to use the prompt with the 7B model? I can hardly understand the readme file. Many thanks.
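For what it's worth, the repo's inference script appears to build the prompt as the bare instruction followed by a response header; hedged, so confirm against inference_wizardlm.py:

# Assumed WizardLM-7B prompt template -- verify against inference_wizardlm.py.
instruction = "Write a haiku about the sea."
prompt = f"{instruction}\n\n### Response:"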
Hi guys,
I have used your fine-tuning script on a custom dataset to fine-tune the WizardLM model. The training works without problems, the loss decreases, and all the relevant model files are stored in the specified output directory.
However, I checked the size of pytorch_model.bin and it is only 623 KB, so I guess the error must be in saving the model. I used "TheBloke/wizardLM-7B-HF" from Hugging Face as the base model for fine-tuning.
If I try to load the model for inference using the inference_wizardlm.py script, giving it the output directory specified during training, it turns up this error:
Traceback (most recent call last):
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dolly-gpu-a100/code/Users/adrian.gabriel/WizardLM/src/inference_wizardlm.py", line 132, in <module>
fire.Fire(main)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dolly-gpu-a100/code/Users/adrian.gabriel/WizardLM/src/inference_wizardlm.py", line 121, in main
_output = evaluate(tokenizer, model, instruction)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dolly-gpu-a100/code/Users/adrian.gabriel/WizardLM/src/inference_wizardlm.py", line 57, in evaluate
generation_output = model.generate(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/generation/utils.py", line 1437, in generate
return self.greedy_search(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/generation/utils.py", line 2248, in greedy_search
outputs = self(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
outputs = self.model(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 530, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 158, in forward
return F.embedding(
File "/anaconda/envs/llamax/lib/python3.10/site-packages/torch/nn/functional.py", line 2199, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D
Where could the problem be?
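A hedged diagnostic: a 623 KB pytorch_model.bin is far too small for a 7B model, which suggests placeholder tensors were saved (e.g. weights still partitioned or on the meta device at save time), and an empty embedding weight would produce exactly this 'weight' must be 2-D error. This just inspects what was actually written; the path stands in for your training output directory:

import torch

state = torch.load("output_dir/pytorch_model.bin", map_location="cpu")
for name, tensor in list(state.items())[:10]:
    print(name, tuple(tensor.shape), tensor.numel())
# A healthy LLaMA-7B checkpoint holds large 2-D tensors such as
# model.embed_tokens.weight with shape (32000, 4096); empty or 0-d entries
# confirm the save step, not inference, is at fault.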
Great work. I guess the SFT dataset can affect the performance of the model. Did you do supervised fine-tuning using your high-quality data on Falcon-40B? Thanks a lot.
CodeTest.txt
Performs relatively well analyzing this code (much better than StarChat). It gets close to figuring out the compression being used but tends to miss slightly on the particulars of what data is removed when the data is compressed. The answer is every 64-bit value at the beginning of the set with all bits set to true. It got close in one run, saying every 64-bit value set to true, but this is too vague to be correct, since a 64-bit value with just one bit set would still satisfy that statement, yet the compression would in fact leave such a value in.
Thank you for providing two excellent models to the open-source community. Since you created WizardCoder after WizardLM, I wonder: do you have any plans to optimize those models further or introduce new ones?
It would be nice if new insights from the Orca paper were used to improve these models even further.
Hi guys,
What LLM are you using to generate the training data? It is the proprietary ChatGPT 3.5 Turbo, right? Have you tried open-source alternatives for data generation as well? How does the use of ChatGPT for training-data generation affect the use of WizardLM?
Firstly, thanks for publicly releasing the instruction dataset.
While looking through the dataset, I've noticed several examples where the output is already present in the instruction with barely any changes. Example (index 47821 from the dataset):
Instruction:
Can you provide a table of 6 different tools used for data science along with their descriptions and features?
| Tool | Description | Features |
|------|-------------|----------|
| Python | A high-level programming language used for data analysis and machine learning | Large standard library, easy to learn, supports multiple paradigms |
| R | A programming language and software environment for statistical computing and graphics | Large community, extensive library of statistical and graphical methods, open-source |
| SQL | A domain-specific language used in programming and designed for managing data held in a relational database management system | Easy to learn, widely used, powerful for managing large amounts of structured data |
| Tableau | A data visualization and business intelligence tool used to create interactive dashboards and reports | User-friendly interface, drag-and-drop functionality, ability to connect to multiple data sources |
| Apache Spark | An open-source distributed computing system used for big data processing and machine learning | Fast processing speed, ability to handle large datasets, supports multiple languages |
| TensorFlow | An open-source software library for dataflow and differentiable programming across a range of tasks | Popular for machine learning and deep learning, supports distributed computing, flexible architecture |
Output:
Sure, here's a table of 6 different tools used for data science along with their descriptions and features:
| Tool | Description | Features |
|------|-------------|----------|
| Python | A high-level programming language used for data analysis and machine learning | Large standard library, easy to learn, supports multiple paradigms |
| R | A programming language and software environment for statistical computing and graphics | Large community, extensive library of statistical and graphical methods, open-source |
| SQL | A domain-specific language used in programming and designed for managing data held in a relational database management system | Easy to learn, widely used, powerful for managing large amounts of structured data |
| Tableau | A data visualization and business intelligence tool used to create interactive dashboards and reports | User-friendly interface, drag-and-drop functionality, ability to connect to multiple data sources |
| Apache Spark | An open-source distributed computing system used for big data processing and machine learning | Fast processing speed, ability to handle large datasets, supports multiple languages |
| TensorFlow | An open-source software library for dataflow and differentiable programming across a range of tasks | Popular for machine learning and deep learning, supports distributed computing, flexible architecture |
Indices of some similar samples: 17421, 17926, 24717, 29166, 41705, 61826, 62587
Is this due to a bug during the creation of the dataset? Was the WizardLM model trained with such examples?
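A rough sketch of how one might flag such examples across the whole file, using a crude line-overlap ratio; the file name and the 0.9 threshold are illustrative, and the {"instruction": ..., "output": ...} layout is assumed from the released dataset:

import json

def overlap_ratio(instruction: str, output: str) -> float:
    # Fraction of output lines that appear verbatim in the instruction.
    inst_lines = set(instruction.strip().splitlines())
    out_lines = [l for l in output.strip().splitlines() if l]
    if not out_lines:
        return 0.0
    return sum(l in inst_lines for l in out_lines) / len(out_lines)

with open("WizardLM_alpaca_evol_instruct_70k.json") as f:
    data = json.load(f)

dupes = [i for i, ex in enumerate(data)
         if overlap_ratio(ex["instruction"], ex["output"]) > 0.9]
print(f"{len(dupes)} suspect examples, e.g. indices {dupes[:10]}")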