
codium-ai / alphacodium


Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"

Home Page: https://www.codium.ai

License: GNU Affero General Public License v3.0

Dockerfile 0.49% Python 99.51%
code-generation flow-engineering paper-implementations state-of-the-art broader-impacts

alphacodium's People

Contributors

almog-lv, antonosika, barnett-yuxiang, cassini-chris, coditamar, didier-durand, eltociear, majdoddin, mrt23, openmenta, tombrewsviews



alphacodium's Issues

Problems during the AlphaCodium installation process

Hi,

I encountered some problems during the AlphaCodium installation process.

  • OS: macOS Sonoma 14.3
  • Python: 3.12.1

When I executed the command `pip install -r requirements.txt`, I got the following error log:

Collecting PyYAML==6.0 (from -r requirements.txt (line 12))
Downloading PyYAML-6.0.tar.gz (124 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.0/125.0 kB 11.7 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [54 lines of output]
running egg_info
writing lib/PyYAML.egg-info/PKG-INFO
writing dependency_links to lib/PyYAML.egg-info/dependency_links.txt
writing top-level names to lib/PyYAML.egg-info/top_level.txt
Traceback (most recent call last):
File "/Users/jholim/workspace/codeai/demo/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in
main()
File "/Users/jholim/workspace/codeai/demo/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jholim/workspace/codeai/demo/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/k8/27k28wmn09j3109z45qxk9xr0000gn/T/pip-build-env-gondi16m/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I got past that point using information I found elsewhere, but other issues arose.

Could you give some advice on how to solve this problem?
Alternatively, could you share your environment details, including the Python version?

Thanks in advance.
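A likely cause, inferred here rather than confirmed in the thread: PyYAML 6.0 cannot be built from source with Cython 3, which fresh Python 3.12 environments pull in during build isolation, and PyYAML 6.0.1 was released specifically to fix that build. A possible workaround is to relax the pin in requirements.txt:

```
# requirements.txt (sketch): replace the line
#   PyYAML==6.0
# with a pin that includes the Cython-3 build fix:
PyYAML>=6.0.1,<7
```

If the exact 6.0 version must be kept, installing `cython<3` first and then re-running `pip install --no-build-isolation PyYAML==6.0` has also been reported to work around this class of failure.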

Custom Problem

Is it possible to modify the code so that it works for a custom problem that is not included in the dataset?
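For what it's worth, a later log in this thread references a `solve_my_problem` code path, which suggests the repo already supports user-supplied problems. Below is a sketch of what a custom problem definition might look like; the field names are assumptions and should be checked against the repo's own example file:

```json
{
  "name": "my_custom_problem",
  "description": "Read an integer n, then n integers, and print the largest one.",
  "public_tests": {
    "input": ["3\n1 5 2\n"],
    "output": ["5\n"]
  }
}
```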

Enhancements to the Iterative Flow Mechanism in AlphaCodium for Robust Code Generation

Dear Tal Ridnik, Dedy Kredo, and Itamar Friedman,

I have been thoroughly engrossed in the study of your work on AlphaCodium as detailed in your recent GitHub repository. The methodology you have proposed for code generation through the use of a test-based, multi-stage iterative flow is indeed revolutionary and appears to have the potential to significantly improve the accuracy of language models on code-related tasks.

However, upon delving into the intricacies of your approach, I have identified a few areas where the iterative flow mechanism could possibly be enhanced to ensure even more robust code generation. I am listing these below, along with suggestions for potential improvements:

  1. Context Management Optimisation: As noted in your Technical Q&A section, the model tends to overlook certain details in the problem description when the context grows too large. Would it be feasible to implement a more dynamic context management strategy that prioritises the most relevant information from previous iterations, ensuring that the model retains focus on the key aspects of the problem?

  2. Enhanced Feedback Loop for Test Generation: While iterating on the generated code is the current focus, could there be merit in establishing a feedback loop for the AI-generated tests as well? For instance, tests that consistently fail could trigger a deeper analysis of specific code segments, potentially uncovering subtle bugs that are not immediately apparent.

  3. Granular Control Over Iterative Steps: Could the configuration file expose more granular control over the iterative steps? For example, allowing users to specify different iteration strategies for certain types of problems or to adjust the iteration count based on the complexity of the task at hand.

  4. Integration with Real-world Development Environments: How might AlphaCodium be integrated into real-world development environments to support live coding scenarios? Would it be possible to create plugins or extensions for popular Integrated Development Environments (IDEs) that utilise AlphaCodium's flow to assist developers in real-time?

  5. Cross-language Applicability and Testing: While the flow is language-agnostic, have there been any efforts to test its efficacy across a broader range of programming languages? Insights gained from such tests could help refine the flow to better accommodate the idiosyncrasies of different programming paradigms.

I believe that addressing these points could further elevate the practicality and effectiveness of AlphaCodium in real-world coding applications. I am eager to hear your thoughts on these suggestions and whether they could be incorporated into your future work.

Thank you for your pioneering contributions to the field of AI-driven code generation. I look forward to your response and am excited about the potential advancements that your continued research will bring to the developer community.

Best regards,
yihong1120

Got errors when using gpt-3.5-turbo-1106

Hi,

python -m alpha_codium.solve_problem --dataset_name valid_and_test_processed --split_name test --problem_number 0

The above command works well when I use gpt-3.5-turbo-0613, but when I use 'gpt-3.5-turbo-1106' it always produces the following error. Could you try 'gpt-3.5-turbo-1106' to see whether you get the same error? Thanks!

2024-01-25 21:11:46.480 | INFO | alpha_codium.gen.coding_competitor:solve_problem:118 - problem['name']: 1575_A. Another Sorting Problem
2024-01-25 21:11:46.484 | INFO | alpha_codium.gen.coding_competitor:run:60 - Running code contests competitor, model gpt-3.5-turbo-1106
2024-01-25 21:11:46.485 | INFO | alpha_codium.gen.stages.run_self_reflect:run_self_reflect:18 - --reflection stage--
2024-01-25 21:11:46.491 | INFO | alpha_codium.llm.ai_handler:chat_completion:86 - -----------------
2024-01-25 21:11:46.491 | INFO | alpha_codium.llm.ai_handler:chat_completion:87 - Running inference ...
2024-01-25 21:11:56.045 | INFO | alpha_codium.llm.ai_handler:chat_completion:133 - done
2024-01-25 21:11:56.045 | INFO | alpha_codium.llm.ai_handler:chat_completion:134 - -----------------
ERROR:root:'run_self_reflect' stage, counter_retry 0, Error: while scanning for the next token
found character '`' that cannot start any token
in "", line 1, column 1:
```yaml
^
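The error indicates the model wrapped its YAML answer in a Markdown code fence (a leading ```yaml line), which the 1106 snapshot does more often than 0613, and the raw backticks then break the YAML parser. A minimal sketch of a workaround, stripping the fence before parsing (illustrative only, not the repo's actual fix):

```python
import re
import yaml  # PyYAML

def parse_model_yaml(response: str):
    """Strip a Markdown code fence that some models wrap around their
    YAML output, then parse the remainder. Illustrative workaround."""
    match = re.search(r"```(?:yaml)?\s*\n(.*?)```", response, re.DOTALL)
    cleaned = match.group(1) if match else response
    return yaml.safe_load(cleaned)

raw = "```yaml\nname: example\ntests:\n  - '1 2'\n```"
print(parse_model_yaml(raw))  # {'name': 'example', 'tests': ['1 2']}
```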

Meaning of parameters in 'configuration.toml'

Hi,

In the 'configuration.toml' file, I see a range of parameters, but I am not sure what they control. Could you please provide one example config file that can reproduce the results in Table 1 and Table 2 of the paper? One example for each table would be great! Thank you!


Support for Claude 3

Can AlphaCodium run on Claude 3 Opus?

It would be great to see how AlphaCodium with Claude 3 performs compared to AlphaCodium with GPT-4.
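Since other issues in this thread show AlphaCodium routing all model calls through LiteLLM, which supports Anthropic models, switching may only require a model-name change, roughly as below. The file path and whether any extra glue is needed are assumptions, and an `ANTHROPIC_API_KEY` environment variable would also have to be set:

```toml
# alpha_codium/settings/configuration.toml (untested sketch)
[config]
model = "claude-3-opus-20240229"
```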

How to use deepseek

When I modified the configuration file to use the model="deepseek-coder-33b-instruct" and ran the code for model inference, it failed with the following error:

alpha_codium.llm.ai_handler:chat_completion:87 - Running inference ...

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

ERROR:root:Error during OpenAI inference

It seems that litellm does not support deepseek, and I would like to know how to resolve this issue.
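LiteLLM can generally reach any OpenAI-compatible server by prefixing the model name with `openai/` and overriding the base URL. A sketch, assuming deepseek-coder is served behind an OpenAI-compatible endpoint (the URL and served model name are placeholders):

```shell
# Point LiteLLM's OpenAI client at the server hosting deepseek-coder.
export OPENAI_API_BASE="http://localhost:8000/v1"  # placeholder endpoint
export OPENAI_API_KEY="dummy-key"                  # many local servers ignore it
# Then, in configuration.toml, set:
#   model = "openai/deepseek-coder-33b-instruct"
```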

Benchmark on SWE-Bench

It would be interesting to see the performance on SWE-Bench benchmarks, so that this project can be more clearly differentiated from the increasing number of other coding agents.

From the abstract of the SWE-bench paper, "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?", by Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan:
Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We consider real-world software engineering to be a rich, sustainable, and challenging testbed for evaluating the next generation of language models. We therefore introduce SWE-bench, an evaluation framework including 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories. Given a codebase along with a description of an issue to be resolved, a language model is tasked with editing the codebase to address the issue. Resolving issues in SWE-bench frequently requires understanding and coordinating changes across multiple functions, classes, and even files simultaneously, calling for models to interact with execution environments, process extremely long contexts and perform complex reasoning that goes far beyond traditional code generation. Our evaluations show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues. Claude 2 and GPT-4 solve a mere 4.8% and 1.7% of instances respectively, even when provided with an oracle retriever. Advances on SWE-bench represent steps towards LMs that are more practical, intelligent, and autonomous.

Recommended approach for local models? i.e. Swappable model support.

Looks like litellm does a (too) good job of encapsulating the calls to openai, making calls to local openai-api-based models require a proxy to intercept and re-route.

Is this the recommended approach for the time being? Any plans to drop the litellm dependency, use one that's a little more open, or write your own layer?

It would be nice to use this with swappable models especially since AC seems to generalize across general instruct models and not require function-calling models.
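One way to get swappable models without depending on any particular client library is a thin backend interface; the sketch below illustrates the idea only and is not the repo's actual design:

```python
from typing import Protocol


class ChatBackend(Protocol):
    """Anything that can turn a (system, user) prompt pair into text."""
    def complete(self, system: str, user: str) -> str: ...


class EchoBackend:
    """Stand-in backend for testing; a real one would call an LLM API
    (OpenAI-compatible HTTP, a local server, etc.)."""
    def complete(self, system: str, user: str) -> str:
        return f"[{system}] {user}"


def run_stage(backend: ChatBackend, prompt: str) -> str:
    # Each flow stage only ever sees the narrow interface, so backends
    # can be swapped without touching the flow logic.
    return backend.complete("You are a coding assistant.", prompt)


print(run_stage(EchoBackend(), "hello"))  # [You are a coding assistant.] hello
```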

Invalid IPC stream: negative continuation token

I followed all the steps in the readme to set up the environment, then extracted the folder to the AlphaCodium directory. I ran this command: python -m alpha_codium.solve_problem --dataset_name "C:\Users\****\Downloads\py\AlphaCodium\valid_and_test_processed" --split_name valid --problem_number 0
and got the following output:

  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\****\Downloads\py\AlphaCodium\alpha_codium\solve_problem.py", line 16, in <module>
    solve_problem(dataset_name=args.dataset_name,
  File "C:\Users\****\Downloads\py\AlphaCodium\alpha_codium\gen\coding_competitor.py", line 109, in solve_problem
    data_provider = CodeContestDataProvider(dataset_location=dataset_name)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\alpha_codium\code_contests\data\provider.py", line 29, in __init__
    self.dataset = self.load_dataset()
                   ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\alpha_codium\code_contests\data\provider.py", line 131, in load_dataset
    return f(self.dataset_location)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\load.py", line 2636, in load_from_disk
    return DatasetDict.load_from_disk(dataset_path, keep_in_memory=keep_in_memory, storage_options=storage_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\dataset_dict.py", line 1369, in load_from_disk
    dataset_dict[k] = Dataset.load_from_disk(
                      ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\arrow_dataset.py", line 1706, in load_from_disk
    arrow_table = concat_tables(
                  ^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\table.py", line 1765, in concat_tables
    tables = list(tables)
             ^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\arrow_dataset.py", line 1707, in <genexpr>
    table_cls.from_file(posixpath.join(dest_dataset_path, data_file["filename"]))
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\table.py", line 1022, in from_file
    table = _memory_mapped_arrow_table_from_file(filename)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\table.py", line 64, in _memory_mapped_arrow_table_from_file
    opened_stream = _memory_mapped_record_batch_reader_from_file(filename)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\datasets\table.py", line 50, in _memory_mapped_record_batch_reader_from_file
    return pa.ipc.open_stream(memory_mapped_stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\pyarrow\ipc.py", line 190, in open_stream
    return RecordBatchStreamReader(source, options=options,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\****\Downloads\py\AlphaCodium\env\Lib\site-packages\pyarrow\ipc.py", line 52, in __init__
    self._open(source, options=options, memory_pool=memory_pool)
  File "pyarrow\ipc.pxi", line 974, in pyarrow.lib._RecordBatchStreamReader._open
  File "pyarrow\error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\error.pxi", line 91, in pyarrow.lib.check_status
OSError: Invalid IPC stream: negative continuation token

httpx.ConnectError: All connection attempts failed

Hi,

I run
python -m alpha_codium.solve_problem --dataset_name /workspace/xxx/codes/AlphaCodium/valid_and_test_processed --split_name test --problem_number 1

It always fails with the following:
2024-03-28 14:14:21.151 | INFO | alpha_codium.gen.coding_competitor:solve_problem:116 - problem_name: 1575_B. Building an Amusement Park
2024-03-28 14:14:21.156 | INFO | alpha_codium.gen.coding_competitor:solve_problem:120 - problem['name']: 1575_B. Building an Amusement Park
2024-03-28 14:14:21.159 | INFO | alpha_codium.gen.coding_competitor:run:63 - Running code contests competitor, model gpt-3.5-turbo-16k
2024-03-28 14:14:21.164 | INFO | alpha_codium.llm.ai_handler:chat_completion:86 - -----------------
2024-03-28 14:14:21.164 | INFO | alpha_codium.llm.ai_handler:chat_completion:87 - Running inference ...

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

...
httpcore.ConnectError: All connection attempts failed.
...

Traceback (most recent call last):
File "/workspace/xxx/codes/AlphaCodium/alpha_codium/llm/ai_invoker.py", line 15, in send_inference
return await f(model)
^^^^^^^^^^^^^^
File "/workspace/xxx/codes/AlphaCodium/alpha_codium/gen/coding_competitor.py", line 52, in _run
response, finish_reason = await self.ai_handler.chat_completion(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/xxx/codes/AlphaCodium/alpha_codium/llm/ai_handler.py", line 127, in chat_completion
raise APIError from e
TypeError: APIError.__init__() missing 5 required positional arguments: 'status_code', 'message', 'llm_provider', 'model', and 'request'

ERROR:root:Error: APIError.__init__() missing 5 required positional arguments: 'status_code', 'message', 'llm_provider', 'model', and 'request'

...

2024-03-28 14:43:50.504 | INFO | alpha_codium.gen.coding_competitor:solve_my_problem:184 - evaluating solution on generated tests...
Process Process-6:
Traceback (most recent call last):
File "/workspace/xxx/miniconda3/envs/llm_coder_py3.11/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/workspace/xxx/miniconda3/envs/llm_coder_py3.11/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/workspace/xxx/codes/AlphaCodium/alpha_codium/code_contests/eval/local_exec.py", line 89, in unsafe_execute
with create_tempdir():
File "/workspace/xxx/miniconda3/envs/llm_coder_py3.11/lib/python3.11/contextlib.py", line 144, in __exit__
next(self.gen)
File "/workspace/xxx/codes/AlphaCodium/alpha_codium/code_contests/eval/local_exec.py", line 278, in create_tempdir
with tempfile.TemporaryDirectory() as dirname:
File "/workspace/xxx/miniconda3/envs/llm_coder_py3.11/lib/python3.11/tempfile.py", line 943, in __exit__
self.cleanup()
File "/workspace/xxx/miniconda3/envs/llm_coder_py3.11/lib/python3.11/tempfile.py", line 947, in cleanup
self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
File "/workspace/xxx/miniconda3/envs/llm_coder_py3.11/lib/python3.11/tempfile.py", line 929, in _rmtree
_shutil.rmtree(name, onerror=onerror)
TypeError: 'NoneType' object is not callable
2024-03-28 14:43:50.627 | INFO | alpha_codium.gen.coding_competitor:solve_my_problem:188 -
test_passed_generate: 0, test_passed_private: 0, test_passed_public: 0
test_failed_generate: 0, test_failed_private: 0, test_failed_public: 0
test_timeout_generate: 0, test_timeout_private: 0, test_timeout_public: 0

Test timeout is hard-coded to 3 seconds

Hello, I noticed that while running the AI-generated tests, some tests were mistakenly marked as failed due to timeouts. Upon reviewing the code, I found that the timeout duration is set to 3 seconds and is not configurable.

However, when I run this test case individually with the generated code, the result is as expected; the execution time is just a bit long.
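A small sketch of how the timeout could be made configurable, e.g. via an environment variable, is below. The variable name is an assumption, not something the repo currently reads:

```python
import os

# Hard-coded today; the env var below is a hypothetical override.
DEFAULT_TEST_TIMEOUT_S = 3.0

def get_test_timeout() -> float:
    """Per-test execution timeout in seconds, overridable via env var."""
    return float(os.environ.get("ALPHA_CODIUM_TEST_TIMEOUT", DEFAULT_TEST_TIMEOUT_S))

print(get_test_timeout())  # 3.0 unless the env var is set
```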
