jasonrute / lean_proof_recording Goto Github PK
View Code? Open in Web Editor NEWProof recording for Lean 3
License: Apache License 2.0
Proof recording for Lean 3
License: Apache License 2.0
The core .lean
files used to be "-rw-r--r--", but I they are now "-r--r--r--" (as per a recently change to elan
). So when I copy them over they keep these permissions. I need to change the permissions back, either when copying them over or write before I modify the file. (Note, the mathlib files still have write permissions, but that isn't guaranteed to last, so I should force them to be writable.)
Hi thanks for the lib! After upgrading Lean and Mathlib it fails to compile:
[12:10:57]: $ lean --make src/empty.lean (shell=True, cwd=/Users/tom/Main/research/code/third_party/lean_proof_recording, extra_env=None)
[12:11:12]: /Users/tom/Main/research/code/third_party/lean_proof_recording/_target/deps/lean/library/init/meta/interactive.lean:972:20: error: failed to synthesize type class instance for
[12:11:12]: ⊢ (pr.recorded (with_desc ↑"{...}" parser.itactic)?).reflectable
[12:11:12]: /Users/tom/Main/research/code/third_party/lean_proof_recording/_target/deps/lean/library/init/meta/interactive.lean:989:22: error: failed to synthesize type class instance for
[12:11:12]: ⊢ (pr.recorded (with_desc ↑"{...}" parser.itactic)?).reflectable
...
I will PR after learning more about Lean :)
Currently, we get the following error when trying to process the data for bernstein.lean.
data_mathlib/lean_files/mathlib/src/analysis/special_functions/bernstein.lean
Traceback (most recent call last):
File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/pipeline/extract_proof_data.py", line 487, in run
parser_ast = parser.read_begin()
File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 1108, in read_begin
tactics = self.read_tactic_list()
File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 1040, in read_tactic_list
self.raise_error(f'Expected "," or "{right}"')
File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 384, in raise_error
raise Exception(f"{msg}:\n{self.format_file_location()}")
Exception: Expected "," or "end":
data_mathlib/lean_files/mathlib/src/analysis/special_functions/bernstein.lean:240:5
0239: |(bernstein_approximation n f - f) x|
^
The issue, is two fold:
|
, which is normally used to delimit cases in a match
or in an inductively
defined term proof.calc
which is one of the few cases where we have no information from Lean on how to parse, so I have to just scan along until we reach a reasonable end point. Currently that is set as one of "end", ";", ",", "|", "<|>", a right bracket, or a user command (but taking nesting parentheses and match statements into account).Hence it is done scanning too early on the calc tactic. Luckily this is caught as an error, since the calc tactic is inside a tactic list so we know that the calc tactic can actually only end with "," or "end" (or ";", or "<|>", if it was inside a semicolon list or alternative inside that outer tactic list.)
I see two solutions:
calc xxx : xxx ... xxx : xxx ... xxx : xxx
(where the xxx
are terms), I could first scan until :
, and then scan until ...
or one of the end delimiters. In this particular case, that should catch the user notation since it is always on the left of the :
.by
.I think I like option (1) better, but let me think about it for a bit first.
Here is an error message :
data_mathlib/lean_files/mathlib/src/geometry/euclidean/basic.lean
Traceback (most recent call last):
File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/pipeline/extract_proof_data.py", line 485, in run
parser_ast = parser.read_by()
File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 1098, in read_by
tactic = self.read_maybe_semicolon_tactic()
File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 1049, in read_maybe_semicolon_tactic
tactic = self.read_maybe_alt_tactic()
File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 999, in read_maybe_alt_tactic
first_tactic = self.read_single_tactic()
File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 991, in read_single_tactic
return self.read_named_tactic()
File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 852, in read_named_tactic
assert (param_end_line, param_end_column) == self.start_pos()
AssertionError
In particular it is an issue with this bit of lean code:
lemma eq_reflection_of_eq_subspace {s s' : affine_subspace ℝ P} [nonempty s]
[nonempty s'] [complete_space s.direction] [complete_space s'.direction] (h : s = s') (p : P) :
(reflection s p : P) = (reflection s' p : P) :=
by unfreezingI { subst h }
/-- Reflection is its own inverse. -/
@[simp] lemma reflection_symm (s : affine_subspace ℝ P) [nonempty s] [complete_space s.direction] :
(reflection s).symm = reflection s :=
rfl
The issue is in two ways to parse the argument of unfreezingI
:
We do (1) but then check with an assert
that it agrees with (2). The reason we do this is that the parsing is quite finicky, so this gives us assurance that we are parsing correctly. Also, there are other times, e.g. conv
tactics, where Lean can't give us information about the position of the argument and we just have to parse it using (1). This check makes sure we are using the right method in those cases too.
The problem here is that doc strings shouldn't count toward the parameter (while line comments and normal block comments should).
There are two possible fixes:
consume_space
method. To do that, we would have to change the tokenizer to treat doc strings as different from other block comments. We would also have to probably add doc strings to places in the code where we are looking for a command (e.g. def
) since a doc string would mean we are at the end of a proof.(2) is technically better and ideally it should still be done since it would prevent doc strings from being recorded as part of the tactics, but it is also pedantic and a lot more work. (1) should be sufficient for this problem, especially since all comments including doc strings are stripped out of the training data. I recommend doing (1) which should be fast and easy to implement.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.