nus-apr / auto-code-rover
A project-structure-aware autonomous software engineer aiming for autonomous program improvement. Resolved 15.95% of tasks in full SWE-bench.
License: Other
Is there a reason why you are doing two inferences instead of only one to analyze and generate the JSON with API calls?
With a good prompt, we could do both together and save costs and noise.
Do you want me to send a PR?
Hi, Anaconda is blocked for my department. Is there any way I can run this using plain Python instead?
Hello!
I am the author of PullRequestBenchmark, and I am wondering if you have any thoughts on it?
Best Regards
I am trying to understand results for Auto Code Rover and SWE-Agent.
Can you please let me know the format of the SWE-Agent test results in:
https://github.com/nus-apr/auto-code-rover/tree/main/results/swe-agent-results
What are all these cost_2_1, cost_2_2, and cost_2_3?
How should I interpret the results in this directory?
Also, for Auto Code Rover, I see acr-run-1, acr-run-2, and acr-run-3. Which one should I take? Which result are you reporting in the paper?
Currently, the fresh issue mode only supports cloning a remote project and working on issues from GitHub links. Sometimes one may want to pre-download the codebase and write the issue description in a file instead.
On Win32, shutil.move() fails because the logger handlers are kept open:
(auto-code-rover) C:\Users\kripp\source\repos\auto-code-rover>python app/main.py --enable-layered --model gpt-4-0125-preview --setup-map ../SWE-bench/setup_result/setup_map.json --tasks-map ../SWE-bench/setup_result/tasks_map.json --output-dir output --task django__django-11133
[2024-04-13 09:39:49] Total number of tasks: 1
[2024-04-13 09:39:49] Total number of processes: 1
[2024-04-13 09:39:49] Task group info: (number of groups: 1)
[2024-04-13 09:39:49] setup_django__django__3.0: 1 tasks
[2024-04-13 09:39:49] Running in single process mode.
[2024-04-13 09:39:49] ============= Running task django__django-11133 =============
Error running command: ['git', 'apply', 'C:\\Users\\kripp\\source\\repos\\SWE-bench\\testbed\\django__django\\setup_django__django__3.0\\swe_bench_tests.patch'], Command '['git', 'apply', 'C:\\Users\\kripp\\source\\repos\\SWE-bench\\testbed\\django__django\\setup_django__django__3.0\\swe_bench_tests.patch']' returned non-zero exit status 1.
[2024-04-13 09:39:50] Finished all tasks sequentially.
[2024-04-13 09:39:50] Post-processing completed experiment results.
[2024-04-13 09:39:50] SWE-Bench input file created: C:\Users\kripp\source\repos\auto-code-rover\output\predictions_for_swebench.json
(auto-code-rover) C:\Users\kripp\source\repos\auto-code-rover>python app/main.py --enable-layered --model gpt-4-0125-preview --setup-map ../SWE-bench/setup_result/setup_map.json --tasks-map ../SWE-bench/setup_result/tasks_map.json --output-dir output --task django__django-11133
[2024-04-13 09:40:14] Total number of tasks: 1
[2024-04-13 09:40:14] Total number of processes: 1
[2024-04-13 09:40:14] Task group info: (number of groups: 1)
[2024-04-13 09:40:14] setup_django__django__3.0: 1 tasks
[2024-04-13 09:40:14] Running in single process mode.
[2024-04-13 09:40:14] ============= Running task django__django-11133 =============
Error running command: ['git', 'apply', 'C:\\Users\\kripp\\source\\repos\\SWE-bench\\testbed\\django__django\\setup_django__django__3.0\\swe_bench_tests.patch'], Command '['git', 'apply', 'C:\\Users\\kripp\\source\\repos\\SWE-bench\\testbed\\django__django\\setup_django__django__3.0\\swe_bench_tests.patch']' returned non-zero exit status 1.
[2024-04-13 09:40:15] Finished all tasks sequentially.
[2024-04-13 09:40:15] Post-processing completed experiment results.
Traceback (most recent call last):
File "C:\Users\kripp\miniconda3\envs\auto-code-rover\Lib\shutil.py", line 886, in move
os.rename(src, real_dst)
PermissionError: [WinError 5] Access is denied: 'C:\\Users\\kripp\\source\\repos\\auto-code-rover\\output\\django__django-11133_2024-04-13_09-40-14' -> 'C:\\Users\\kripp\\source\\repos\\auto-code-rover\\output\\no_patch\\django__django-11133_2024-04-13_09-40-14'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\kripp\source\repos\auto-code-rover\app\main.py", line 477, in <module>
main()
File "C:\Users\kripp\source\repos\auto-code-rover\app\main.py", line 472, in main
swe_input_file = organize_and_form_input(globals.output_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\kripp\source\repos\auto-code-rover\app\post_process.py", line 477, in organize_and_form_input
organize_experiment_results(expr_dir)
File "C:\Users\kripp\source\repos\auto-code-rover\app\post_process.py", line 275, in organize_experiment_results
shutil.move(task_dir, corresponding_dir)
File "C:\Users\kripp\miniconda3\envs\auto-code-rover\Lib\shutil.py", line 904, in move
rmtree(src)
File "C:\Users\kripp\miniconda3\envs\auto-code-rover\Lib\shutil.py", line 820, in rmtree
return _rmtree_unsafe(path, onexc)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\kripp\miniconda3\envs\auto-code-rover\Lib\shutil.py", line 648, in _rmtree_unsafe
onexc(os.unlink, fullname, err)
File "C:\Users\kripp\miniconda3\envs\auto-code-rover\Lib\shutil.py", line 646, in _rmtree_unsafe
os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\kripp\\source\\repos\\auto-code-rover\\output\\django__django-11133_2024-04-13_09-40-14\\info.log'
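A common workaround, sketched here under the assumption that the open info.log handler is the culprit (this is not necessarily how the maintainers would fix it, and the logger name "acr-task" is made up for illustration), is to close and detach the logger's handlers before calling shutil.move():

```python
import logging
import os
import shutil
import tempfile

# On Windows, a directory cannot be moved while a FileHandler inside it
# still holds an open handle, so close and detach the handlers first.
def close_log_handlers(logger_name):
    logger = logging.getLogger(logger_name)
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)

# Minimal demonstration with a throwaway task directory:
task_dir = tempfile.mkdtemp()
logger = logging.getLogger("acr-task")
logger.addHandler(logging.FileHandler(os.path.join(task_dir, "info.log")))
logger.warning("task finished")

close_log_handlers("acr-task")  # release info.log before moving the directory
dest = task_dir + "_moved"
shutil.move(task_dir, dest)     # would raise WinError 32 on Windows otherwise
```

Calling close_log_handlers at the end of each task, before the post-processing step that reorganizes output directories, should let os.rename (and the rmtree fallback) succeed.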
I am trying to get ACR running on my local machine but the Docker image (Dockerfile.scratch since I am on Apple Silicon) will not build.
First error:
$ docker build -f Dockerfile.scratch -t acr .
(...)
2.048 E: Package 'python-tk' has no installation candidate
------
Dockerfile.scratch:10
--------------------
9 |
10 | >>> RUN apt update && apt install -y \
11 | >>> git wget vim \
12 | >>> libffi-dev python3-pytest pkg-config build-essential libssl-dev \
13 | >>> libfreetype6-dev libqhull-dev \
14 | >>> texlive cm-super dvipng python-tk ffmpeg \
15 | >>> imagemagick fontconfig ghostscript inkscape graphviz \
16 | >>> optipng fonts-comic-neue python3-pikepdf
17 |
--------------------
ERROR: failed to solve: process "/bin/sh -c apt update && apt install -y git wget vim libffi-dev python3-pytest pkg-config build-essential libssl-dev libfreetype6-dev libqhull-dev texlive cm-super dvipng python-tk ffmpeg imagemagick fontconfig ghostscript inkscape graphviz optipng fonts-comic-neue python3-pikepdf" did not complete successfully: exit code: 100
On Apple Silicon it seems tkinter is no longer installable via pip but is bundled with Python (unless I'm misunderstanding something). I verified that I do have python-tk on my machine, so I removed the dependency from the apt install list, hoping that would fix the issue.
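For reference, on recent Debian/Ubuntu base images the Python 2 package `python-tk` has no installation candidate; the Python 3 Tk binding is packaged as `python3-tk`. Swapping the name in the quoted Dockerfile line (a sketch, assuming a Debian/Ubuntu base) should make the apt step succeed without dropping the dependency:

```dockerfile
# `python-tk` (Python 2) was removed from recent Debian/Ubuntu releases;
# use `python3-tk` instead.
RUN apt update && apt install -y \
    git wget vim \
    libffi-dev python3-pytest pkg-config build-essential libssl-dev \
    libfreetype6-dev libqhull-dev \
    texlive cm-super dvipng python3-tk ffmpeg \
    imagemagick fontconfig ghostscript inkscape graphviz \
    optipng fonts-comic-neue python3-pikepdf
```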
In any case, I get a different error now:
$ docker build -f Dockerfile.scratch -t acr .
(...)
=> [ 7/11] RUN conda env create -f environment.yml 29.4s
=> [ 8/11] RUN ln -sf /bin/bash /bin/sh 0.3s
=> [ 9/11] COPY . /opt/auto-code-rover 6.2s
=> [10/11] WORKDIR /opt/auto-code-rover 0.0s
=> ERROR [11/11] RUN conda env create -f environment.yml 9.6s
------
> [11/11] RUN conda env create -f environment.yml:
0.663 Channels:
0.663 - conda-forge
0.663 - defaults
0.663 Platform: linux-aarch64
0.663 Collecting package metadata (repodata.json): ...working... done
4.591 Solving environment: ...working... failed
5.119 Channels:
5.119 - conda-forge
5.119 - defaults
5.119 Platform: linux-aarch64
5.119 Collecting package metadata (repodata.json): ...working... done
9.014 Solving environment: ...working... failed
9.533
9.533 LibMambaUnsatisfiableError: Encountered problems while solving:
9.533 - package unidiff-0.7.5-py38he3eb160_0 requires python >=3.8,<3.9.0a0 *_cpython, but none of the providers can be installed
9.533
9.533 Could not solve for environment specs
9.533 The following packages are incompatible
9.533 ├─ libuuid 1.41.5** is requested and can be installed;
9.533 ├─ python 3.11.7** is installable with the potential options
9.533 │ ├─ python [3.10.11|3.10.12|...|3.9.19] would require
9.533 │ │ └─ libuuid >=2.38.1,<3.0a0 , which conflicts with any installable versions previously reported;
9.533 │ └─ python 3.11.7, which can be installed;
9.533 ├─ unidiff 0.7.5** is installable with the potential options
9.533 │ ├─ unidiff 0.7.5 would require
9.533 │ │ └─ python >=3.10,<3.11.0a0 *_cpython but there are no viable options
9.533 │ │ ├─ python [3.10.0|3.10.1|...|3.9.9] would require
9.533 │ │ │ └─ libuuid >=2.32.1,<3.0a0 , which conflicts with any installable versions previously reported;
9.533 │ │ └─ python [3.10.11|3.10.12|...|3.9.19], which cannot be installed (as previously explained);
9.533 │ ├─ unidiff 0.7.5 would require
9.533 │ │ └─ python >=3.11,<3.12.0a0 *_cpython with the potential options
9.533 │ │ ├─ python [3.10.0|3.10.1|...|3.9.9], which cannot be installed (as previously explained);
9.533 │ │ ├─ python [3.10.11|3.10.12|...|3.9.19], which cannot be installed (as previously explained);
9.533 │ │ └─ python 3.11.0 would require
9.533 │ │ └─ xz >=5.2.6,<5.3.0a0 , which can be installed;
9.533 │ ├─ unidiff 0.7.5 would require
9.533 │ │ └─ python >=3.12,<3.13.0a0 *_cpython, which cannot be installed (as previously explained);
9.533 │ ├─ unidiff 0.7.5 would require
9.533 │ │ └─ python >=3.8,<3.9.0a0 *_cpython but there are no viable options
9.533 │ │ ├─ python [3.8.10|3.8.12|...|3.8.8] conflicts with any installable versions previously reported;
9.533 │ │ ├─ python [3.10.0|3.10.1|...|3.9.9], which cannot be installed (as previously explained);
9.533 │ │ └─ python [3.10.11|3.10.12|...|3.9.19], which cannot be installed (as previously explained);
9.533 │ ├─ unidiff 0.7.5 would require
9.533 │ │ └─ python_abi 3.9 *_pypy39_pp73, which requires
9.533 │ │ └─ python 3.9.* *_73_pypy, which conflicts with any installable versions previously reported;
9.533 │ └─ unidiff 0.7.5 would require
9.533 │ └─ python >=3.9,<3.10.0a0 with the potential options
9.533 │ ├─ python [3.10.0|3.10.1|...|3.9.9], which cannot be installed (as previously explained);
9.533 │ ├─ python [3.10.11|3.10.12|...|3.9.19], which cannot be installed (as previously explained);
9.533 │ ├─ python [3.9.0|3.9.1|...|3.9.7] conflicts with any installable versions previously reported;
9.533 │ └─ python 3.9.19 would require
9.533 │ └─ xz >=5.4.6,<6.0a0 , which can be installed;
9.533 └─ xz 5.4.5** is not installable because it conflicts with any installable versions previously reported.
9.533
------
Dockerfile.scratch:30
--------------------
28 | COPY . /opt/auto-code-rover
29 | WORKDIR /opt/auto-code-rover
30 | >>> RUN conda env create -f environment.yml
31 |
--------------------
ERROR: failed to solve: process "/bin/sh -c conda env create -f environment.yml" did not complete successfully: exit code: 1
I am not sure what the problem is. I have Python 3.12.2 installed as my default version, but the ACR README doesn't seem to specify a requirement on a particular Python version.
Anyway, any help would be most appreciated.
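One workaround worth trying (a hypothetical environment.yml adjustment, not the project's official file): the solver failure is about the conda unidiff 0.7.5 package conflicting with the pinned python/libuuid/xz builds on linux-aarch64, so installing unidiff from PyPI instead sidesteps the conda dependency resolution entirely:

```yaml
# Sketch: move unidiff from the conda dependency list to a pip section
# so the linux-aarch64 conda solver no longer has to satisfy its pins.
name: auto-code-rover
dependencies:
  - python=3.11
  - pip
  - pip:
      - unidiff==0.7.5
```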
Great job! How about creating a Discord discussion group so the community can have real-time discussions?
Just like https://github.com/princeton-nlp/SWE-agent?tab=readme-ov-file#-contributions-
When testing the llama3 model and ollama, I encountered an error indicating that communication with the ollama server is unreachable:
httpx.ConnectError: [Errno 111] Connection refused
This issue arises because ollama.chat(model=self.name, messages=[]) invokes chat = _client.chat (located in site-packages/ollama/__init__.py), where _client = Client(). The Client() constructor defaults to 'http://localhost:11434', which, within a Docker container, refers to the container itself rather than the host machine, while I installed ollama on the host.
To resolve this, I propose two options:
Update the README: Suggest that ollama should be installed within the same Docker container as the agent. This approach requires users to configure a GPU environment within the container if they wish to utilize GPU capabilities for running llama3, which might be cumbersome.
Host Installation with Custom Client Configuration: Recommend installing ollama on the host machine and using client.chat, where client = Client(host='http://host.docker.internal:11434'). Here, host.docker.internal points to the host within the Docker network.
I hope the maintainers acknowledge this issue. Considering that llama3 is a cost-effective option, its popularity is likely to increase, potentially affecting many users with this connectivity problem.
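The second option above can be sketched as follows. The resolve_ollama_host helper is hypothetical (not part of ACR), and the OLLAMA_HOST environment variable is an assumed convention; the idea is simply to make the endpoint configurable with the Docker-internal alias as the fallback:

```python
import os

# Hypothetical helper: resolve the ollama endpoint from the environment,
# falling back to the Docker-internal host alias so that an agent running
# inside a container can reach an ollama server installed on the host.
def resolve_ollama_host(default="http://host.docker.internal:11434"):
    return os.environ.get("OLLAMA_HOST", default)

# Usage sketch with the `ollama` package (not executed here):
# from ollama import Client
# client = Client(host=resolve_ollama_host())
# reply = client.chat(model="llama3",
#                     messages=[{"role": "user", "content": "hello"}])
```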
Please implement LLM inference using LiteLLM; otherwise the project cannot grow efficiently.
Not possible to install libraries into the image
$DPKG_HOOK_ACTION" = remove-architecture; } && test -x /usr/share/pkg-config-dpkghook; then /usr/share/pkg-config-dpkghook update; fi', exit code 32512
E: Sub-process /usr/bin/dpkg returned an error code (2)
E: Problem executing scripts DPkg::Post-Invoke 'if [ -d /var/lib/update-notifier ]; then touch /var/lib/update-notifier/dpkg-run-stamp; fi; /usr/lib/update-notifier/update-motd-updates-available 2>/dev/null || true'
E: Sub-process returned an error code
The command '/bin/sh -c apt install -y vim build-essential libssl-dev' returned a non-zero code: 100
I have evaluated your predictions using my Docker-based SWE-bench evaluator. I achieve 26% on pass@3, compared to the 22% you reported. It might be worthwhile to review the logs for the failed benchmarks to see if your agent can achieve even better results :D
You can find the logs and the report here.
And here's a sheet I use to compare the results.
Hello, I see you added new supported models. Can you provide an evaluation of them on SWE-bench so that it can be compared with the evaluations already done?
Thank you
The rich library is missing; running pip install rich inside the container fixed it.
PYTHONPATH=. python app/main.py swe-bench --model gpt-4-0125-preview --setup-map ../SWE-bench/setup_result/setup_map.json --tasks-map ../SWE-bench/setup_result/tasks_map.json --output-dir output --task django__django-11133
Traceback (most recent call last):
File "/opt/auto-code-rover/app/main.py", line 16, in
from app import globals, globals_mut, inference, log
File "/opt/auto-code-rover/app/inference.py", line 11, in
from app.api.manage import ProjectApiManager
File "/opt/auto-code-rover/app/api/manage.py", line 12, in
from app import log
File "/opt/auto-code-rover/app/log.py", line 5, in
from rich.console import Console
ModuleNotFoundError: No module named 'rich'
(auto-code-rover) root@aa1d1cf79120:/opt/auto-code-rover# pip install rich
(base) root@26c024020254:/opt/auto-code-rover# cd /opt/SWE-bench
(base) root@26c024020254:/opt/SWE-bench# echo opendevin__ssh-connection-issue-911 > tasks.txt
(base) root@26c024020254:/opt/SWE-bench# conda activate swe-bench
(swe-bench) root@26c024020254:/opt/SWE-bench# python harness/run_setup.py --log_dir logs --testbed testbed --result_dir setup_result --subset_file tasks.txt
2024-04-09 09:11:13,566 - INFO - env_name for all setup entries: []
2024-04-09 09:11:13,566 - INFO - No setup needed.
(swe-bench) root@26c024020254:/opt/SWE-bench# cd /opt/auto-code-rover
(swe-bench) root@26c024020254:/opt/auto-code-rover# conda activate auto-code-rover
(auto-code-rover) root@26c024020254:/opt/auto-code-rover# PYTHONPATH=. python app/main.py --enable-layered --model gpt-4-0125-preview --setup-map /opt/SWE-bench/setup_result/setup_map.json --tasks-map /opt/SWE-bench/setup_result/tasks_map.json --output-dir /mnt/c/Users/pierr/output --task opendevin__ssh-connection-issue-911
Traceback (most recent call last):
File "/opt/auto-code-rover/app/main.py", line 477, in
main()
File "/opt/auto-code-rover/app/main.py", line 399, in main
with open(setup_map_file, "r") as f:
^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/opt/SWE-bench/setup_result/setup_map.json'
(auto-code-rover) root@26c024020254:/opt/auto-code-rover#
Hello,
In your paper, how do you run SWE-agent in your Docker env? I saw the comparison between your Docker env and theirs.
Thank you
I noticed that the AutoCodeRover has been implemented from scratch. There are several existing frameworks, such as AutoGPT and Baby AGI, that provide robust functionality for creating LLM-based agents. These frameworks could potentially save development time and leverage existing solutions for common challenges.
Could you please provide more details on the rationale behind the decision to develop this from scratch? Specifically, I am curious to know:
Understanding these points would be really helpful in appreciating the design choices and the potential advantages of the custom implementation.
Thank you!
Hello,
I have noticed that in the code, the project restricts the use of OpenAI parallel tool calls. Specifically, when using the OpenAI function calling, the agent can only make one function call at a time. Could you please provide some insight into the reason behind this restriction?
Thank you for your time and assistance.
Hi, so I added a custom issue (one not in the conf/swe_lite_tasks.txt file) from GitHub and got this error:
vercel__next.js-64413
2024-04-12 17:11:58,722 - INFO - env_name for all setup entries: []
2024-04-12 17:11:58,722 - INFO - No setup needed.
So what should I do?
Hey, I would like to suggest support for integrating additional language model APIs beyond just OpenAI. Specifically, it would be very helpful to have the ability to use:
These models rank among the top 10 AI language models according to benchmarks like https://chat.lmsys.org/ and provide capabilities complementary to OpenAI's models.
The recent Command-R model from Cohere is particularly compelling for its strong retrieval-augmented capabilities using its embeddings, and the Claude model from Anthropic has received acclaim for its coherence and coding ability.
Having this flexibility would be incredibly valuable. Would be amazing if you consider adding it!
For easy and better self-hosting, a docker-compose setup is needed for this awesome project ;-)
Thank you for developing and maintaining this inspiring project!
I'm using harness/run_setup.py to obtain different versions of a repository (e.g., Django) for testing, but noticed that the clone_repo function in harness/utils.py doesn't switch to specific branches/tags. This results in always getting the latest version of the codebase. Is there a way to clone specific versions of a repo (e.g., tags) using the current setup, or did I miss something?
I am looking forward to your help and thanks again!
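One way to get version-specific checkouts is to follow the clone with an explicit checkout of the desired ref. The function below is a hypothetical variant, not the actual clone_repo in harness/utils.py (which has its own signature); it only illustrates the two-step idea:

```python
import subprocess

# Hypothetical sketch: clone the repository, then check out the given tag
# or branch so the working tree matches the version under test instead of
# the default branch head.
def clone_at_version(repo_url, dest, ref):
    subprocess.run(["git", "clone", repo_url, dest], check=True)
    subprocess.run(["git", "checkout", ref], cwd=dest, check=True)
```

For release tags, a single `git clone --branch <tag> --depth 1 <url>` achieves the same result in one step and avoids downloading the full history.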
A lower barrier to entry would enhance the adoption and usefulness of the project. Consider the following scenarios that a repo owner might want:
These links could be added manually or by bots.
There are lots of potential options for where instances might run -- Replit, Colab, Github, ...
The important thing is that more people will have the time and ability to use SWE-agent for working issues if doing so is as simple as possible. That is true even if they need to have a paid account somewhere to use the link.
In the example.mp4 file, it replays the planning and reasoning trajectories, but I can't find that mode in the main execution file.
Can you give me some pointers on how to replay it?
Thanks
I am planning on writing an article on Auto Code Rover and I was wondering if you could tell me about the format of the SWE-bench test results in: https://github.com/nus-apr/auto-code-rover/tree/main/results/swe-agent-results
How am I to interpret the results in this directory? Specifically for Devin they formatted diffs for their SWE-bench run into separate pass/fail directories: https://github.com/CognitionAI/devin-swebench-results/tree/main/output_diffs
How is this done for your results? Thanks in advance and thanks for publishing your work.
-Harry
Can you make something like SWE-Agent, where you can just run a simple command inside WSL with the necessary info like model, API key, and a link to the GitHub issue?