
magicoder's Introduction

🎩 Magicoder: Source Code Is All You Need

Paper

🎩 Models | 📚 Dataset | 🚀 Quick Start | 👀 Demo | 📝 Citation | 🙏 Acknowledgements

We are thrilled that Magicoder and OSS-Instruct have inspired many amazing projects.

Contact: Yuxiang Wei, Zhe Wang, Yifeng Ding, Jiawei Liu, Lingming Zhang.

About

  • 🎩Magicoder is a model family empowered by 🪄OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets so that they generate low-bias, high-quality instruction data for code.
  • 🪄OSS-Instruct mitigates the inherent bias of LLM-synthesized instruction data by grounding the LLM in a wealth of open-source references, producing more diverse, realistic, and controllable data (see the sketch below).
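To make the idea concrete, here is a minimal sketch of the OSS-Instruct loop, assuming the OpenAI chat API; the prompt wording is illustrative only, not the authors' exact prompt (see this repository for the real one):

```python
# Hedged OSS-Instruct sketch: seed the teacher LLM with a real open-source
# code snippet and ask it to invent a self-contained coding problem plus a
# solution. The prompt text below is illustrative, not the authors' exact prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

snippet = "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": (
            "Gain inspiration from the following code snippet and create a "
            "self-contained coding problem together with a correct solution.\n\n"
            f"```python\n{snippet}\n```"
        ),
    }],
)
print(response.choices[0].message.content)
```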

[Figures: overview of OSS-Instruct and overview of results.]

🎩 Models

| Model | Checkpoint | Size | HumanEval (+) | MBPP (+) | License |
|---|---|---|---|---|---|
| Magicoder-CL-7B | 🤗 HF Link | 7B | 60.4 (55.5) | 64.2 (52.6) | Llama2 |
| Magicoder-S-CL-7B | 🤗 HF Link | 7B | 70.7 (66.5) | 68.4 (56.6) | Llama2 |
| Magicoder-DS-6.7B | 🤗 HF Link | 6.7B | 66.5 (60.4) | 75.4 (61.9) | DeepSeek |
| Magicoder-S-DS-6.7B | 🤗 HF Link | 6.7B | 76.8 (70.7) | 75.7 (64.4) | DeepSeek |

👀 Demo

Online Gradio Demo

Quickly try out our Magicoder Playground powered by Gradio! Huge thanks to AK (@_akhaliq) and the Hugging Face team for their support!

Local Gradio Demo

We follow WizardCoder and provide a script to build a local demo server. You can launch your local Gradio demo as follows:

```bash
cd demo
CUDA_VISIBLE_DEVICES=0 python magicoder_demo.py \
   --base_model "ise-uiuc/Magicoder-S-DS-6.7B" \
   --device "cuda:0" \
   --port 8080
```
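Once the server is running, you can also query it programmatically. A small sketch using `gradio_client`, assuming the app exposes a single default endpoint (check `magicoder_demo.py` for the actual input signature):

```python
# Hypothetical client call against the local demo launched above.
# The endpoint and argument list are assumptions; inspect the Gradio app
# (or call client.view_api()) to confirm them.
from gradio_client import Client

client = Client("http://127.0.0.1:8080/")
result = client.predict("Write a quicksort function in Python.")
print(result)
```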

📚 Dataset
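The training sets referenced throughout this repo are hosted on the Hugging Face Hub. A minimal loading sketch with the `datasets` library, using the dataset IDs mentioned in the issues below:

```python
# Load the two Magicoder training sets from the Hugging Face Hub.
from datasets import load_dataset

oss_instruct = load_dataset("ise-uiuc/Magicoder-OSS-Instruct-75K", split="train")
evol_instruct = load_dataset("ise-uiuc/Magicoder-Evol-Instruct-110K", split="train")

# Column names per the issues below: OSS-Instruct-75K pairs a generated
# "problem" with its "solution".
print(oss_instruct)
print(oss_instruct[0]["problem"][:200])
```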

🚀 Quick Start

```python
from transformers import pipeline
import torch

MAGICODER_PROMPT = """You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
{instruction}

@@ Response
"""

instruction = "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."

# Fill the user instruction into the Magicoder prompt template.
prompt = MAGICODER_PROMPT.format(instruction=instruction)
generator = pipeline(
    model="ise-uiuc/Magicoder-S-DS-6.7B",
    task="text-generation",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Deterministic decoding (temperature=0.0, sampling disabled by default).
result = generator(prompt, max_length=2048, num_return_sequences=1, temperature=0.0)
print(result[0]["generated_text"])
```

This code snippet will generate the following output:

Here is a simple Python implementation of a TODO list API:

```python
class TodoList:
    def __init__(self):
        self.todo_list = []

    def add_task(self, task):
        if not isinstance(task, str):
            raise ValueError("Task must be a string")
        self.todo_list.append(task)

    def remove_task(self, task):
        if task not in self.todo_list:
            raise ValueError("Task not found in the list")
        self.todo_list.remove(task)

    def get_tasks(self):
        return self.todo_list

    def update_task(self, old_task, new_task):
        if old_task not in self.todo_list:
            raise ValueError("Old task not found in the list")
        if not isinstance(new_task, str):
            raise ValueError("New task must be a string")
        index = self.todo_list.index(old_task)
        self.todo_list[index] = new_task

    def clear_list(self):
        self.todo_list = []
```

This API allows you to add tasks, remove tasks, get all tasks, update tasks, and clear the list. It also raises exceptions for invalid operations.

You can use this API like this:

```python
todo = TodoList()
todo.add_task("Buy groceries")
todo.add_task("Finish project")
print(todo.get_tasks())  # Output: ['Buy groceries', 'Finish project']
todo.update_task("Buy groceries", "Buy fruits")
print(todo.get_tasks())  # Output: ['Buy fruits', 'Finish project']
todo.remove_task("Finish project")
print(todo.get_tasks())  # Output: ['Buy fruits']
todo.clear_list()
print(todo.get_tasks())  # Output: []
```

📝 Citation

```bibtex
@article{wei2023magicoder,
  title={Magicoder: Source Code Is All You Need},
  author={Wei, Yuxiang and Wang, Zhe and Liu, Jiawei and Ding, Yifeng and Zhang, Lingming},
  journal={arXiv preprint arXiv:2312.02120},
  year={2023}
}
```

🙏 Acknowledgements

We thank AK (@_akhaliq) and the Hugging Face team for their support with the Magicoder Playground! We also thank the amazing projects that truly inspired us.

⚠️ Important Note

  • Bias, Risks, and Limitations: Magicoder models may make errors, produce misleading content, or struggle with tasks unrelated to coding.

  • Usage: Magicoder models are trained on synthetic data generated by OpenAI models. Please pay attention to OpenAI's terms of use when using the models and the datasets. Magicoder models will not compete with any of OpenAI's commercial products.

⭐️ Star History

[Figure: Star History chart.]

magicoder's People

Contributors

ganler, natedingyifeng, universefly, zhewang2001


magicoder's Issues

The templates used in reproducing the eval results: why is the instruction added again after "@@ Response"?

There is an input format mismatch between the eval and training process. Do you intend to emphasize the problem before the model generates its output?

When doing the HumanEval(+) eval, the compiled inputs look like this, e.g.:


````
@@ Instruction
Write a solution to the following problem:
```python
def fib(n: int):
    """Return n-th Fibonacci number.
    >>> fib(10)
    55
    >>> fib(1)
    1
    >>> fib(8)
    21
    """

@@ Response

def fib(n: int):
    """Return n-th Fibonacci number.
    >>> fib(10)
    55
    >>> fib(1)
    1
    >>> fib(8)
    21
    """```
````

But in the data-processing and training code, the instruction data is compiled as:

```
You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
Write a solution to the following coding problem:
{problem}

@@ Response
{response}
```

There is no such **_rephrasing/emphasizing_** in Magicoder's training data.
From the eval results, this mismatch does not seem to have obvious negative effects, but was it deliberate?

The evaluation result of Magicoder is not aligned with the result in the paper

Hi, thanks for your great work.

I tested the performance of Magicoder; however, it does not align with the result in the paper (68.9 vs. 76.8). I guess it is because I used different hyperparameters for inference, e.g., --top_p 1, --temperature 1, and so on. I would be grateful if the authors could provide the specific inference hyperparameters. Thank you. I list the script I used below:

I tried to evaluate Magicoder with the script you provided in the 'experiments' folder using the following command:

```bash
python experiments/text2code.py \
    --model_key deepseek-ai/deepseek-coder-6.7b-base \
    --dataset humaneval \
    --save_path output_dir/mc_6_7_ds \
    --n_batches 1 \
    --n_problems_per_batch 1 \
    --n_samples_per_problem 1 \
    --model_name_or_path ~/weight/magicoders-s-ds-6.7/ \
    --top_p 1 \
    --max_new_tokens 4096 \
    --temperature 1
```

Then I used the following command for EvalPlus:

```bash
docker run -v $(pwd):/app ganler/evalplus:latest --dataset humaneval --samples output_dir/mc_6_7_ds.jsonl
```

Finally, I got this result:

```
Base
{'pass@1': 0.6890243902439024}
Base + Extra
{'pass@1': 0.6158536585365854}
```

Can you consider adding my explanation of how to use Magicoder in text-generation-webui?

After experimenting with text-generation-webui by oobabooga, I found the following:

  • Magicoder models are all instruct-only models (no chat/chat-instruct).
  • You need to create a new custom template under the Parameters/Instruction template tab.
  • You also need to change these values under the Parameters/Generation tab: max_new_tokens=1024, top_p=0.9, top_k=50, repetition_penalty=1, repetition_penalty_range=1024.
  • Copy the content of the attached text file into the custom instruction template: instruction-template-magicode.txt

I took the generation parameters from deepseek-coder/demo/app.py.
The instruction template is edited from airoboros-v1.2 after comparing it with Magicoder's prompt template.

Confusion about the training code

First of all, thank you for your amazing work!
I'm attempting to replicate the training process, and I have a question about the train.py file. In your paper you mention using two A100-80G GPUs, but I couldn't find any mention of multiprocessing or distributed training in your code. Did you use DeepSpeed for training? If not, could you provide guidance on modifying the code to make it compatible with a multi-GPU setup?
Thanks once again!

The correctness of solutions

How good is the quality of the Python code in the solutions?
Can they be used directly to train the model?

Any environment requirements for the model? It doesn't work on a MacBook Air M1 (16 GB)

I tried to use Magicoder with Ollama on a MacBook Air M1 (16 GB). It works for other models, but when I run this one, I get an error:

```
...
ggml_metal_init: GPU name:   Apple M1
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 10922.67 MiB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 1083.07 MiB
llama_new_context_with_model: max tensor size =   102.54 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  3648.58 MiB, ( 3649.20 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =  8192.00 MiB, offs =            0
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =     0.03 MiB, offs =   8589918208, (11841.23 / 10922.67)ggml_metal_add_buffer: warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =  1080.02 MiB, (12921.25 / 10922.67)ggml_metal_add_buffer: warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: /tmp/ollama-20231213-4188-jpu97j/llm/llama.cpp/gguf/ggml-metal.m:1623: false
2023/12/23 16:46:59 llama.go:451: signal: abort trap
2023/12/23 16:46:59 llama.go:459: error starting llama runner: llama runner process has terminated
2023/12/23 16:46:59 llama.go:525: llama runner stopped successfully
```

I googled it; is it similar to ggerganov/llama.cpp#2048?

I'm not sure whether it can be tuned to work on this Mac; if not, it would be better to add the limitation (or requirement) to the README.

Are the training loss and validation loss recorded?

Hi,

Thank you very much for your code. I am reproducing your training process. I wonder what your training and validation losses were during training; I want to align my runs with your training on the Magicoder-OSS-Instruct-75K and ise-uiuc/Magicoder-Evol-Instruct-110K datasets.

Thanks!

How to write a prompt for code completion tasks

I've run a prompt for Python code completion using:

````python
prompt_template = f"""Write a solution to the following problem:
```python
{code}
```"""
````
But the LLM's output adds nothing new to the code from the input prompt; it just generates some other information.
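For comparison, the eval format quoted in the first issue above wraps the partial code inside the full Magicoder template. A sketch of that formatting, with an illustrative `code` value:

```python
# Wrap a code-completion problem in the full Magicoder template, mirroring
# the eval format quoted in the first issue above.
MAGICODER_PROMPT = """You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
{instruction}

@@ Response
"""

code = 'def fib(n: int):\n    """Return n-th Fibonacci number."""\n'
instruction = f"Write a solution to the following problem:\n```python\n{code}```"
print(MAGICODER_PROMPT.format(instruction=instruction))
```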

Data collection and generation

Thanks for releasing your research code to everyone.
I found it a bit difficult to figure out what the variables here are for.
Can you please explain them?
And in general, could you please add a more comprehensive README about the data collection/generation parts of the codebase?
Thanks!

Using dilated attention instead of vanilla attention in the Llama model and fine-tuning

I want to ask whether I can replace the attention used in the base model with dilated attention and then do the fine-tuning. The idea is to reduce the complexity of attention and increase the context window. Does DeepSeek use Llama 2 as the base model with the same architecture? If so, can I load the checkpoint of layers such as the norm layers and feed-forward layers, or do I need to re-factor the LLM from scratch?
Or is there any method to adapt or share weights?

HuggingFace Playground has failed

Hello,
I have used the Hugging Face playground for this model in the past, but now it hits a runtime error. Should I expect that a fix is coming?

Thank You!

Possibility for a Mixture-of-Experts Model?

With the recent release of Mixtral 8x7B, there's a lot of hope and excitement around open-source MoE models.

It would be very interesting to see how a narrowly focused MoE model performs.

Text-gen prompt template?

Sorry if this was mentioned before, but is there a stock prompt template in ooba's text-gen that works with this?

Quantised finetuning on 4×22 GB GPUs

Hello
I am trying to finetune CodeLlama-Python-hf on 4 GPUs with 22 GB of memory each. Using the training process described in the Magicoder README gives a CUDA out-of-memory error.

How can I quantise the model or optimise memory usage so it fits on my machine?
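Not part of this repo's training recipe, but one common workaround is QLoRA-style 4-bit loading with LoRA adapters. A minimal sketch using `bitsandbytes` and `peft`; the model ID and hyperparameters below are illustrative assumptions:

```python
# Hypothetical 4-bit + LoRA setup to shrink memory usage on small GPUs.
# This swaps in QLoRA-style finetuning; it is not the repo's own recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-Python-hf",  # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters train
```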

Catastrophic forgetting problem

Hi,

Thanks for open-sourcing this, but when I fine-tuned (whether full-parameter or LoRA) on my dataset, catastrophic forgetting kept coming up (performance on HumanEval decreased). I don't know how to solve it; do you have any tips?

Training data format for Magicoder-OSS-Instruct-75K

Hi, thanks for the work!

I was wondering how you format the OSS-75K data for training. Is it in an Alpaca-like format, such as:

```
You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
{instruction}  # problem column of the OSS-75K dataset

@@ Response
{response}  # solution column of the OSS-75K dataset
```

Thanks
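For what it's worth, a sketch of how a row could be mapped into the README's prompt format; the column names follow the issue text above, and whether this matches the authors' exact training code is an assumption:

```python
# Map an OSS-Instruct-75K row into the Magicoder prompt format from the
# Quick Start section. Column names ("problem"/"solution") per the issue above.
MAGICODER_PROMPT = """You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
{instruction}

@@ Response
"""

def format_example(row: dict) -> str:
    """Concatenate the templated problem with its solution for training."""
    return MAGICODER_PROMPT.format(instruction=row["problem"]) + row["solution"]

print(format_example({"problem": "Write hello world.", "solution": "print('Hello, world!')"}))
```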

Evaluation codes not found

Hello,

Thanks for providing such an amazing repo for code-generating LLMs.
I am impressed that Magicoder got a great result on HumanEval; however, I can't find the evaluation code for it.
It would be great if the evaluation code were made available.

Optimizer selection

Hi, thanks for the brilliant work!

I am curious about the decision to use Adafactor as the optimizer for Magicoder. Have other options been explored or tried in this context? 🤔

Reproducing Magicoder-S-DS-6.7B on 8 A40 GPUs

Because running the README-DEV.md script directly with accelerate reported insufficient training memory, I switched to launching with DeepSpeed stage 1, keeping all other parameters at their defaults. Since this is an 8-GPU setup, the iteration step size was reduced to 1/4.

After experimenting, I found:

  1. Training speed dropped significantly.
  2. Neither stage-1 nor stage-2 training could reach 60%.

I'd like to ask: can different hardware, plus adding DeepSpeed, really make the results this much worse?

Achieved performance close to MagicoderS by finetuning only with `evol-codealpaca-v1`

Thanks to your amazing tutorial, we reproduced the training process and the experiments in the paper. The models we finetuned ourselves achieved performance close to yours: for HumanEval(+), we got 57.32% / 52.44% pass@1 for Magicoder and 70.12% / 67.07% for MagicoderS.
Moreover, we conducted ablation studies to clarify the contribution of OSS-Instruct relative to Evol-Instruct in the training of MagicoderS.

  • We got performance close to MagicoderS (70.12% / 65.24% on HumanEval(+), and similar results on DS-1000) by finetuning ONLY with the evol-codealpaca-v1 dataset under the same training settings mentioned in the tutorial.
  • We got even worse results (66.46% / 62.20% on HumanEval(+)) by swapping the training order of oss-instruct-75k and evol-codealpaca-v1.

We noticed that oss-instruct-75k was generated by the base model gpt-3.5-turbo-1106, whereas evol-codealpaca-v1 was generated by GPT-4, so the MagicoderS experiments may be unfair. I think additional evidence is needed to show the contribution of OSS-Instruct when it is combined with other data generation methods.

Any plans for a 33B fine-tune?

Awesome models! Great job, guys! :)
I am wondering if you plan to also finetune on top of deepseek-coder-33b-instruct. I wonder how high the evaluations would go with that model :)

Max tokens = ?

After reading through the page, I didn't see any mention of a max token limit for Magicoder-S-DS-6.7B and Magicoder-DS-6.7B. Is it safe to assume it's the same as DeepSeek Coder (16k)?
