pycodegpt's Issues

Demo notebook

Can you provide a demo notebook for how to use PyCodeGPT for code completion and code generation?

Training data

Have you released the training data that is used to train APIRetriever?

No module named 'nl2code.dynamic_block_dataset' when running run_generating_codes.sh

Hello.
When I execute run_generating_codes.sh, the following error occurs:

 File "/data/gmkim/PyCodeGPT/apicoder/CodeGenAPI/eval_private.py", line 13, in <module>
    from nl2code.modeling_codegen import CodeGenForCausalLM
  File "/data/gmkim/PyCodeGPT/apicoder/CodeGenAPI/nl2code/__init__.py", line 3, in <module>
    from .code_dataset import CodeBlockDataset, CodeDatasetCallBack
  File "/data/gmkim/PyCodeGPT/apicoder/CodeGenAPI/nl2code/code_dataset.py", line 14, in <module>
    from .dynamic_block_dataset import DynamicBlockDataset
ModuleNotFoundError: No module named 'nl2code.dynamic_block_dataset'

I couldn't find dynamic_block_dataset in this repository. Could you suggest a solution?
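Until the missing file is restored, a quick way to confirm which submodules are actually importable before launching the full script is `importlib.util.find_spec` (a generic diagnostic sketch, not code from the repo):

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` can be imported, without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec itself raises when a parent package is missing
        return False

# e.g. check module_available("nl2code.dynamic_block_dataset")
# before running run_generating_codes.sh
print(module_available("json"))  # True (stdlib module)
```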

Could you clarify how to get `data/Cleaned-Private-Code-Files`?

From the logic diagram linking data and code for apicoder, it seems that data/Cleaned-Private-Code-Files is needed in advance before running APIRetriever/scripts/run_extract_apiretriever_corpus.sh.

However, these files are not provided by default, and I couldn't find a script to generate them.
Could you clarify how to get data/Cleaned-Private-Code-Files?
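For reference, a minimal sketch of the kind of filtering a "cleaned" code corpus usually implies (dropping unparsable files and exact duplicates). The repo does not document the actual pipeline, so the criteria here are assumptions:

```python
import ast
import hashlib

def clean_code_files(files):
    """Keep only syntactically valid, non-duplicate Python sources.

    `files` maps path -> source text.  This is a guess at the cleaning
    behind data/Cleaned-Private-Code-Files, not the repo's real script.
    """
    seen, kept = set(), {}
    for path, src in files.items():
        try:
            ast.parse(src)                      # drop unparsable files
        except SyntaxError:
            continue
        digest = hashlib.sha1(src.encode("utf-8")).hexdigest()
        if digest in seen:                      # drop exact duplicates
            continue
        seen.add(digest)
        kept[path] = src
    return kept
```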

Hugging Face / other web app request

(Feature request) It would be helpful to deploy a web app using this code, as was done for Visual ChatGPT, to make it easily accessible.

How can I get the trained APIRetriever model (in apicoder)?

In the apicoder part, I notice that although the API embeddings and scores are public, the weights of the trained APIRetriever are not released.
I would appreciate it if you could kindly release the weights 😍.
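While the weights are unreleased, the published API embeddings can still be reused for retrieval if queries can be embedded the same way. A minimal ranking sketch (dense retrievers like APIRetriever typically score by inner product or cosine similarity; the exact scoring function here is an assumption):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_apis(query_emb, api_embs):
    """Return API names sorted by similarity to the query embedding."""
    return sorted(api_embs,
                  key=lambda name: cosine(query_emb, api_embs[name]),
                  reverse=True)
```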

pass@1 = 1.0 for HumanEval, pass@1 = 0.0 for TorchDataEval

I am trying to validate the evaluation of apicoder.

I simply made a "perfect" evaluation file by setting the "completion" field in the evaluation file to the same value as "canonical_solution" in the problem file.
However, all of the examples in TorchDataEval fail with a "result": "failed: 'NoneType' object is not callable" error, while HumanEval passes all examples.
Any suggestion to solve this issue?

I attach two examples from the problem and evaluation files for the HumanEval and TorchDataEval datasets for reference.

HumanEval

Problem file

{"task_id": "HumanEval/0", "prompt": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n", "entry_point": "has_close_elements", "canonical_solution": "    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False\n", "test": "\n\nMETADATA = {\n    'author': 'jt',\n    'dataset': 'test'\n}\n\n\ndef check(candidate):\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\n    assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\n\n"}
{"task_id": "HumanEval/1", "prompt": "from typing import List\n\n\ndef separate_paren_groups(paren_string: str) -> List[str]:\n    \"\"\" Input to this function is a string containing multiple groups of nested parentheses. Your goal is to\n    separate those group into separate strings and return the list of those.\n    Separate groups are balanced (each open brace is properly closed) and not nested within each other\n    Ignore any spaces in the input string.\n    >>> separate_paren_groups('( ) (( )) (( )( ))')\n    ['()', '(())', '(()())']\n    \"\"\"\n", "entry_point": "separate_paren_groups", "canonical_solution": "    result = []\n    current_string = []\n    current_depth = 0\n\n    for c in paren_string:\n        if c == '(':\n            current_depth += 1\n            current_string.append(c)\n        elif c == ')':\n            current_depth -= 1\n            current_string.append(c)\n\n            if current_depth == 0:\n                result.append(''.join(current_string))\n                current_string.clear()\n\n    return result\n", "test": "\n\nMETADATA = {\n    'author': 'jt',\n    'dataset': 'test'\n}\n\n\ndef check(candidate):\n    assert candidate('(()()) ((())) () ((())()())') == [\n        '(()())', '((()))', '()', '((())()())'\n    ]\n    assert candidate('() (()) ((())) (((())))') == [\n        '()', '(())', '((()))', '(((())))'\n    ]\n    assert candidate('(()(())((())))') == [\n        '(()(())((())))'\n    ]\n    assert candidate('( ) (( )) (( )( ))') == ['()', '(())', '(()())']\n"}
...

Evaluation file

{"task_id": "HumanEval/0", "completion": "    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False\n"}
{"task_id": "HumanEval/1", "completion": "    result = []\n    current_string = []\n    current_depth = 0\n\n    for c in paren_string:\n        if c == '(':\n            current_depth += 1\n            current_string.append(c)\n        elif c == ')':\n            current_depth -= 1\n            current_string.append(c)\n\n            if current_depth == 0:\n                result.append(''.join(current_string))\n                current_string.clear()\n\n    return result\n"}
...

TorchDataEval

Problem file

{"task_id": "TorchDataEval/0", "prompt": "from torchdata.datapipes.iter import IterableWrapper\ndatapipe = IterableWrapper([1,2,3])\n# How to augument the datapipe by repeating it six times.\nnew_datapipe =", "entry_point": "none", "canonical_solution": [" Cycler(datapipe, 6)", " datapipe.cycle(6)"], "test": "\n\nMETADATA = {\n    'author': 'msra-v-dazan',\n    'dataset': 'test',\n    'type': 'Cycler'\n}\n\n\ndef check():\n    assert list(new_datapipe) == [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]\n\n"}
{"task_id": "TorchDataEval/1", "prompt": "from torchdata.datapipes.iter import IterableWrapper\n\ndp = IterableWrapper(['a', 'b', 'c'])\n# Assign indexs to the datepipe object.\nnew_dp =", "entry_point": "none", "canonical_solution": [" dp.enumerate()", " Enumerator(dp)"], "test": "\n\nMETADATA = {\n    'author': 'msra-v-dazan',\n    'dataset': 'test',\n    'type': 'Enumerator'\n}\n\n\ndef check():\n    assert list(new_dp) == [(0, 'a'), (1, 'b'), (2, 'c')]\n\n"}
...

Evaluation file

{"task_id": "TorchDataEval/0", "completion": " datapipe.cycle(6)"}
{"task_id": "TorchDataEval/1", "completion": " dp.enumerate()"}
...
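One plausible cause (an assumption, since the harness code is not shown here): execution-based harnesses usually exec prompt + completion + test in one namespace and then invoke the entry point. For TorchDataEval, entry_point is the literal string "none" and check() takes no arguments, so a harness that still looks up the entry point with .get("none") receives None; calling it would raise exactly "'NoneType' object is not callable". A simplified sketch of the two calling conventions:

```python
def run_check(prompt, completion, test, entry_point):
    """Exec prompt+completion+test in one namespace, then call check().

    Simplified sketch of an execution-based harness, not the repo's
    actual eval code.  HumanEval passes the candidate function to
    check(candidate); TorchDataEval's check() takes no arguments and
    reads module-level variables (e.g. new_datapipe) instead.  A harness
    that did env.get(entry_point)(...) for entry_point "none" would get
    None and fail with "'NoneType' object is not callable".
    """
    env = {}
    try:
        exec(prompt + completion + "\n" + test, env)
        if entry_point != "none":
            env["check"](env[entry_point])   # HumanEval style
        else:
            env["check"]()                   # TorchDataEval style
        return "passed"
    except Exception as e:
        return f"failed: {e}"
```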

Run eval on CPU instead of GPU

Set --gpu_device to -1:

python eval_human_eval.py \
	--model_name_or_path PyCodeGPT-110M/ \
	--output_dir results/ \
	--num_completions 100 \
	--temperature 0.2 \
	--top_p 0.95 \
	--max_new_tokens 100 \
	--gpu_device -1
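For reference, the likely mapping behind the flag (an assumption; check eval_human_eval.py for the script's actual handling):

```python
def resolve_device(gpu_device: int) -> str:
    """Map --gpu_device to a torch-style device string: -1 means CPU."""
    return "cpu" if gpu_device < 0 else f"cuda:{gpu_device}"
```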
