Giter Site home page Giter Site logo

minif2f's Introduction

MiniF2F

MiniF2F is a formal mathematics benchmark (translated across multiple formal systems) consisting of exercise statements from olympiads (AMC, AIME, IMO) as well as high-school and undergraduate maths classes.

The goal of the project is to provide a shared benchmark to evaluate and directly compare automated theorem proving systems based on the formal systems targeted, initially Lean, and Metamath (targeting also Hol Light and Isabelle).

The benchmark (released under permissive licenses (MIT for Metamath, Apache for Lean)) is a work in progress and contributions are welcome and encouraged through pull requests.

Citation

The benchmark is described in detail in the following pre-print:

@article{zheng2021minif2f,
  title={MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics},
  author={Zheng, Kunhao and Han, Jesse Michael and Polu, Stanislas},
  journal={arXiv preprint arXiv:2109.00110},
  year={2021}
}

Statistics

Test Valid
Lean 244 244
Metamath 244 244
Isabelle 244 244
Hol Light 165 165

Structure

Each problem is represented by a unique name and a file for each of the formal systems we target. Each file consists at minima in the problem statement and optionally one or more example proofs associated with it. The benchmark is divided in two splits:

  • valid: validation set that can be used while designing automated theorem proving systems (early-stopping, reinforcement learning, data-augmentation, curriculum design, ...).
  • test: held-out test set reserved for final evaluation.

Naming conventions are still a work in progress. Olympiads problems are generally named after their competition year and problem number (eg. imo-1990-p3 or aime-1983-p2). Problems coming from a particular dataset (eg the MATH dataset) are named to ease their retrieval (eg. mathd-algebra-125). Other problems are prefixed by a category hint and a unique name in the style of Metamath naming conventions (eg. induction-11div10tonmn1ton).

Each exercise file complies to the following system-specific conventions.

Lean

To install the project make sure you have elan installed, then in the directory where you want the project installed run:

git clone https://github.com/openai/miniF2F
cd miniF2F
leanpkg configure
leanpkg build

Since having one file per statement causes slowness in Lean parsing stage, all Lean statements are exceptionally aggregated in two files (valid.lean and test.lean). These files contain a list of the problem statements defined as theorems. Optionally, proofs for these statements are provided as well as potential lemmas to support the ground-truth proof.

No theorem should appear that do not correspond to a problem statement; use lemma instead.

Please use lean/scripts/lint_style.py to check all the statements pass the linter. You can also make use of lean/scripts/simple_formatter.sh to enforce a few basic formatting rules.

The lean folder is released under the Apache License (so that it is aligned with Lean's mathlib license).

Metamath

Each file contains the problem statement with the same name as the problem unique name. The statement is commented (using Metamath convention) if provided without proof.

The metamath folder is released under the MIT License.

HOL Light

Each file contains the problem statement defined as a HOL Light term whose name must match the file name.

The hollight folder is released under the FreeBSD License.

Isabelle

Each file contains the problem statement defined as a theorem whose name must match the file name, optionally with a proof for it as well as the necessary imports.

The isabelle folder is released under the Apache License.

Code of Conduct and Contributions

MiniF2F is meant to serve as a shared and useful resource for the machine learning community working on formal mathematics.

There is no obligation tied with the use and reporting of a result based on miniF2F. But if you're using it and discovering new proofs (manually or automatically) please contribute them back to the benchmark.

All contributions, such as new statements for later versions, addition of missing statements for existing versions, bug fixes, additional proofs are all welcome.

Versioning

A version of miniF2F is defined by a frozen set of statements. The goal for each version is to get full coverage on all formal systems for that version even if that might not be the case when the version is frozen.

When reporting a result based on miniF2F please always specify the version you used. The current version is v1, frozen as of August 2021, including 244 statements (fully translated to Lean and Metamath but still WIP in other formal systems).

Each version will live in its own branch to allow later additions of translated statements or fixes to existing statements as needed. The main branch remains reserved for active development and should not be used when reporting results.

Active version

  • Version: v1
  • Freeze date: August 2021
  • Branch: v1

minif2f's People

Contributors

albertqjiang avatar cuppajoeman avatar dwrensha avatar dyekuu avatar javier-m avatar mantasbaksys avatar marco-dossantos avatar wenda302 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

minif2f's Issues

Lean aime_1988_p8

There appears to be a typo in the statement of aime_1988_p8 in test.lean.
Namely, h₂ is stated as ∀ x y, f x y = y * (f x (x + y)) which I believe should instead by ∀ x y, (x + y) * f x y = y * (f x (x + y)) as indicated by the problem statement in metamath/test/aime-1988-p8.mm.

Suggestion: declare that this repository is no longer maintained.

Dear @DyeKuu ,

I guess nobody has the admin right to handle pull requests to this repository and the development of miniF2F is continuing at the corresponding repository at Facebook Research.

Can you, as the first author of the miniF2F paper, declare that you are no longer maintaining this repository (openai/miniF2F) and that you welcome new contributions and issues at the corresponding repository at Facebook Research to avoid confusions?

(Sorry if I guessed wrong.)

Regards,
Yutaka

mathd_algebra_293

mathd_algebra_293
It should be: sqrt (60 * x) * sqrt (12 * x) * sqrt (63 * x) = 36 * |x| * sqrt (35 * x)

Missing constraint in amc12a_2013_p8

theorem amc12a_2013_p8

theorem amc12a_2013_p8
(x y : ℝ)
(h₀ : x ≠ 0)
(h₁ : y ≠ 0)
(h₂ : x + 2 / x = y + 2 / y) :
x * y = 2 :=

Original problem:

Problem
Given that $x$ and $y$ are distinct nonzero real numbers such that $x+\tfrac{2}{x} = y + \tfrac{2}{y}$, what is $xy$?

Missing a constraint on x and y being distinct.

FOL/SMT classical tools support?

I know Lean is a higher order logic...but it would be nice to have a benchmark where classical SMT methods can be benchmarked too. What is the possibility/effort/path to translate miniF2F to a format compatible with classical SMT/ATP solvers?

amc12_2000_p1

amc12_2000_p1
The condition i =/= 0, m =/= 0, k =/= 0 should become i =/= m, m =/= k, k =/= i.

imo_1965_p2

imo_1965_p2
The condition "a 7 < 0 ∧ a 9 < 0" should be "a 6 < 0 ∧ a 7 < 0"

Possibility of updating lean/mathlib versions

Hello,

I've been working on two proofs lately, I'm one line away from algebra_others_exirrpowirrrat, but norm_cast isn't working properly on the current lean version so I can't close the goal. Someone else tested and was able to get it to work with the same proof using a newer version.

Also I'm working on induction_pprime_pdvdapowpma for which I'd like to use fermats little theorem which would help immensely (and probably in other number theory proofs). Is it possible we can use a more up to date version of mathlib as well?

Thank you for your consideration.

Typo in theorem mathd_algebra_185?

theorem mathd_algebra_185

The original description of the problem is: How many integers are in the solution of the inequality $|x + 4|&lt; 9$?
Here the function f is defined as over real numbers: (f : ℝ → ℝ). I suspect that this is a typo, but I'm no lean expert. Hopefully someone can verify this.

Assumptions in theorem mathd_algebra_96 appear inconsistent

miniF2F/lean/src/valid.lean

Lines 1061 to 1070 in b4c6a46

theorem mathd_algebra_96
(x y z a : ℝ)
(h₀ : 0 < x ∧ 0 < y ∧ 0 < z ∧ 0 < a)
(h₁ : real.log x - real.log y = a)
(h₂ : real.log y - real.log z = 15)
(h₃ : real.log z - real.log x = -7) :
a = -8 :=
begin
nlinarith [h₁, h₂, h₃],
end

The assumption a>0 contradicts the conclusion (i.e., a=-8), which means that we should be able to derive a=n for all n. Additionally, it appears that the assumption 0 < x ∧ 0 < y ∧ 0 < z is also not needed.

This issue was first identified by @tonywu95.

Errors to be fixed in miniF2F-Isabelle / Lean / Metamath

I received an email from Daniel Sebastian Goc and Yutaka Nagashima. They report some errors on the Isabelle part.

@Wenda302 @albertqjiang mind having a look at this one if you have time? Thanks, Thanks 🙏. Happy to help if there's something I can do.

cc @spolu @jesse-michael-han

aime_1999_p11
need an additional assumption m > 0, and the condidtion dd/nn < 90 should be nn/dd < 90 instead.

amc12_2000_p1
The condition  i =/= 0, m =/= 0, k =/= 0  should become i =/= m, m =/= k, k =/= i.

amc12a_2019_p12
There is a square (^2) missing

amc12a_2020_p4
The numbers a, b, c, d are fixed!!! And they shouldn't be!

amc12a_2021_p12
The thesis should be b = -88.

amc12b_2021_p4
Is simply wrong.

amc12b_2021_p18
The norms are missing squares.

imo_1965_p2
The condition "a 7 < 0 ∧ a 9 < 0" should be "a 6 < 0 ∧ a 7 < 0"

imo_1974_p3
Bad summation range

imo_2007_p6
Check summation - the term  (a 100)^2 * a 1  is probably inside the summation.

imo_2019_p1
Missing brackets around the first quantifier

mathd_algebra_17
add the assumption 1 + a > 0.

mathd_algebra_276
The sum should have been 7.

mathd_algebra_288 
This simply doesn't hold.

mathd_algebra_293
It should be: sqrt (60 * x) * sqrt (12 * x) * sqrt (63 * x) = 36 * |x| * sqrt (35 * x)

mathd_algebra_320
We need to normalize the variables a, b, c, for example by fixing c = 2.

mathd_algebra_332
Is wrong.

mathd_algebra_452
Value a_5 should be 6/5

mathd_algebra_487
Is wrong.

mathd_numbertheory_135
There is missing assumption that a, b, c lie in the interval [0, 9]

mathd_numbertheory_321
Remove the quantifier!

mathd_numbertheory_447
Is wrong.

mathd_numbertheory_728
The remainder should be 3.

numbertheory_notequiv2i2jasqbsqdiv8
Is wrong. Counterexample: a = b = 0

bug in Lean problem statement of mathd_numbertheory_780

The Lean statement of mathd_numbertheory_780 has a bug in it:

theorem mathd_numbertheory_780
(m x : ℕ)
(h₀ : 10 ≤ m)
(h₁ : m ≤ 99)
(h₂ : (6 * x) % m = 1)
(h₃ : (x - 6^2) % m = 0) :
m = 43 :=
begin
sorry
end

The subtraction in h₃ can be degenerate when x < 36, in which case x - 6^2 = 0 (because Lean defines subtraction as a total function taking values on the natural numbers).

Assuming that the theorem is true as stated, we can derive a contradiction:

theorem mathd_numbertheory_780
    (m x : ℕ)
    (h₀ : 10 ≤ m)
    (h₁ : m ≤ 99)
    (h₂ : (6 * x) % m = 1)
    (h₃ : (x - 6^2) % m = 0) :
    m = 43 := sorry

example : false :=
begin
  have h := mathd_numbertheory_780 11 2
           (by norm_num) (by norm_num) (by norm_num) (by norm_num),
  norm_num at h
end

One way around this could be to use integers rather than natural numbers in the problem statement.

mathd_algebra_320

mathd_algebra_320
We need to normalize the variables a, b, c, for example by fixing c = 2.

Project causes huge lag and tactic window doesn't load

Hi there, I've been trying to get this project set up on my computer to no avail.

My computer specs are this:

OS: Manjaro Linux x86_64 
Kernel: 5.13.19-2-MANJARO 
CPU: AMD Ryzen 5 1600 (12) @ 3.200GHz 
GPU: NVIDIA GeForce GTX 1060 3GB 
Memory: 3449MiB / 15937MiB 

My elan version is:

elan 1.3.1 (4363704ad 2021-11-01)

I am able to work with other projects just fine.

Here is a video with htop running in the back, you can see that it looks like a whole bunch of elan servers are spawned, but I am not sure if that is expected behavior:

https://streamable.com/by37ir

Thanks for checking this out.

imo_2019_p1

imo_2019_p1
Missing brackets around the first quantifier

aime_1999_p11

aime_1999_p11
need an additional assumption m > 0, and the condidtion dd/nn < 90 should be nn/dd < 90 instead.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.