Comments (5)
I'm down to 2 failing tests now in pydata/xarray 0.12. I probably need to compare to logs from a successful run to fix those effectively.
I'm also testing the testbeds for the regular (full) dataset using the check-harness predictions now.
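For anyone doing the same comparison, here's a minimal sketch of diffing a failing run's log against a passing run's log with Python's difflib. The paths are placeholders, not the harness's actual log layout:

```python
import difflib
from pathlib import Path

# Placeholder paths: point these at the per-instance evaluation logs
# your harness writes (names here are illustrative only).
good_log = Path("logs/pydata__xarray-3364.passing.eval.log")
bad_log = Path("logs/pydata__xarray-3364.failing.eval.log")

# Produce a unified diff so differing test output and setup steps
# stand out between the two runs.
diff = difflib.unified_diff(
    good_log.read_text().splitlines(keepends=True),
    bad_log.read_text().splitlines(keepends=True),
    fromfile=str(good_log),
    tofile=str(bad_log),
)
print("".join(diff))
```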
I'll chime in that @aorwall's Docker images and `run_evaluation.py` script have worked very well for me. I was able to run ~all of the "lite" tests without problems, whereas with the original conda testbeds, most tests of the gold patches were failing to build or pass.
Also, the Docker testbeds launch and execute very quickly compared to re-building the conda testbeds.
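For reference, a minimal sketch of how such a run can be driven from Python. The flag names and dataset id below are assumptions based on my reading of the SWE-bench-docker README and may have changed; verify them with `python run_evaluation.py --help` before relying on this:

```python
import subprocess

# Sketch of invoking SWE-bench-docker's run_evaluation.py.
# All flag names and values here are assumptions, not a verified API.
cmd = [
    "python", "run_evaluation.py",
    "--predictions_path", "predictions.json",             # your model's predictions
    "--swe_bench_tasks", "princeton-nlp/SWE-bench_Lite",  # assumed dataset id
    "--namespace", "aorwall",                             # Docker Hub namespace for the images
    "--log_dir", "logs/",
]
subprocess.run(cmd, check=True)
```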
"~all" of the lite tests, meaning not quite all? I've been struggling to get much to run.
I got all except for pydata__xarray-4094 and pydata__xarray-4493 to run.
@PandelisZ sorry, I should have been clearer: I got 298 out of 300 test cases to work out of the box with @aorwall's dockerized SWE-bench-docker tooling. The 2 that fail are known not to work, so that was expected.
With the original/official conda testbeds, I only got a few of the test cases to work, after half a day of trying.
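If you want to reproduce that 298/300 run while skipping the two known-failing instances, here's a minimal sketch that filters them out of a predictions file, assuming the usual SWE-bench predictions format (a JSON list of dicts keyed by instance_id):

```python
import json

# The two instances known not to work, per the comments above.
SKIP = {"pydata__xarray-4094", "pydata__xarray-4493"}

# Assumes predictions are a JSON list of dicts with an "instance_id"
# key; adapt the loading if yours is JSONL instead.
with open("predictions.json") as f:
    preds = json.load(f)

kept = [p for p in preds if p["instance_id"] not in SKIP]
print(f"kept {len(kept)} of {len(preds)} predictions")

with open("predictions.filtered.json", "w") as f:
    json.dump(kept, f, indent=2)
```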
Related Issues (20)
- Dataset field & set up reliable environment
- swe-bench eval stops running after a point
- improve eval performance by caching per-repo/version conda environments
- get_eval_refs doesn't work with a dataset that's been `save_to_disk`'d
- environment is lost when running pip install
- Reproducer Docker image
- Why AutoCodeRover not mentioned?
- Is it possible to evaluate the train set?
- run_live.py: clone_repo() takes 3 positional arguments but 5 were given
- swe-bench eval stops running after a point
- Has anyone successfully ran an eval on patches against early versions of astropy, sympy, scipy etc? I'm really struggling to run things from earlier python versions
- Using `uv pip` instead of `pip` for significant speedup
- How can one participate in the SWE-bench leaderboard?
- What's the best way to browse the SWE-bench dataset?
- how to download one task instance from SWE-bench dataset?
- what's the difference between environment_setup_commit and base_commit?
- `model_name_or_path` is None when running models without adapters, causing an error in `run_evaluation.py`
- Install failed on instances from astropy__astropy
- Clarification Needed on Removal of Instances with Error Message Checks in SWE-bench Lite Dataset