Giter Site home page Giter Site logo

lean_proof_recording's People

Contributors

jasonrute avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

lean_proof_recording's Issues

Lean file permissions have changed

The core .lean files used to be "-rw-r--r--", but I they are now "-r--r--r--" (as per a recently change to elan). So when I copy them over they keep these permissions. I need to change the permissions back, either when copying them over or write before I modify the file. (Note, the mathlib files still have write permissions, but that isn't guaranteed to last, so I should force them to be writable.)

`error: failed to synthesize type class instance for`

Hi thanks for the lib! After upgrading Lean and Mathlib it fails to compile:

[12:10:57]: $ lean --make src/empty.lean (shell=True, cwd=/Users/tom/Main/research/code/third_party/lean_proof_recording, extra_env=None)
[12:11:12]: /Users/tom/Main/research/code/third_party/lean_proof_recording/_target/deps/lean/library/init/meta/interactive.lean:972:20: error: failed to synthesize type class instance for
[12:11:12]: ⊢ (pr.recorded (with_desc ↑"{...}" parser.itactic)?).reflectable
[12:11:12]: /Users/tom/Main/research/code/third_party/lean_proof_recording/_target/deps/lean/library/init/meta/interactive.lean:989:22: error: failed to synthesize type class instance for
[12:11:12]: ⊢ (pr.recorded (with_desc ↑"{...}" parser.itactic)?).reflectable
...

I will PR after learning more about Lean :)

Parsing error using |x| local notation in calc tactic

Currently, we get the following error when trying to process the data for bernstein.lean.

data_mathlib/lean_files/mathlib/src/analysis/special_functions/bernstein.lean
Traceback (most recent call last):
  File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/pipeline/extract_proof_data.py", line 487, in run
    parser_ast = parser.read_begin()
  File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 1108, in read_begin
    tactics = self.read_tactic_list()
  File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 1040, in read_tactic_list
    self.raise_error(f'Expected "," or "{right}"')
  File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 384, in raise_error
    raise Exception(f"{msg}:\n{self.format_file_location()}")
Exception: Expected "," or "end":
data_mathlib/lean_files/mathlib/src/analysis/special_functions/bernstein.lean:240:5
0239:     |(bernstein_approximation n f - f) x|
          ^

The issue, is two fold:

  • There is a local notation in the file using |, which is normally used to delimit cases in a match or in an inductively defined term proof.
  • It is used inside calc which is one of the few cases where we have no information from Lean on how to parse, so I have to just scan along until we reach a reasonable end point. Currently that is set as one of "end", ";", ",", "|", "<|>", a right bracket, or a user command (but taking nesting parentheses and match statements into account).

Hence it is done scanning too early on the calc tactic. Luckily this is caught as an error, since the calc tactic is inside a tactic list so we know that the calc tactic can actually only end with "," or "end" (or ";", or "<|>", if it was inside a semicolon list or alternative inside that outer tactic list.)

I see two solutions:

  1. Scan the calc tactic in stages. Since the calc tactic is of the form calc xxx : xxx ... xxx : xxx ... xxx : xxx (where the xxx are terms), I could first scan until :, and then scan until ... or one of the end delimiters. In this particular case, that should catch the user notation since it is always on the left of the :.
  2. Make the end points for scanning the calc tactic depend on if I am inside a tactic list or a by.

I think I like option (1) better, but let me think about it for a bit first.

Issues with doc strings when parsing itactic arguments

Here is an error message :

data_mathlib/lean_files/mathlib/src/geometry/euclidean/basic.lean
Traceback (most recent call last):
  File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/pipeline/extract_proof_data.py", line 485, in run
    parser_ast = parser.read_by()
  File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 1098, in read_by
    tactic = self.read_maybe_semicolon_tactic()
  File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 1049, in read_maybe_semicolon_tactic
    tactic = self.read_maybe_alt_tactic()
  File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 999, in read_maybe_alt_tactic
    first_tactic = self.read_single_tactic()
  File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 991, in read_single_tactic
    return self.read_named_tactic()
  File "/root/code/formal/formal/lean/lean_proof_recording/lean_proof_recording/parser.py", line 852, in read_named_tactic
    assert (param_end_line, param_end_column) == self.start_pos()
AssertionError

In particular it is an issue with this bit of lean code:

lemma eq_reflection_of_eq_subspace {s s' : affine_subspace ℝ P} [nonempty s]
  [nonempty s'] [complete_space s.direction] [complete_space s'.direction] (h : s = s') (p : P) :
  (reflection s p : P) = (reflection s' p : P) :=
by unfreezingI { subst h }

/-- Reflection is its own inverse. -/
@[simp] lemma reflection_symm (s : affine_subspace ℝ P) [nonempty s] [complete_space s.direction] :
  (reflection s).symm = reflection s :=
rfl

The issue is in two ways to parse the argument of unfreezingI:

  1. Parse the tactic block manually by first matching parentheses and then taking any whitespace and comments.
  2. Use the beginning and end positions of the parameters as provided by Lean.

We do (1) but then check with an assert that it agrees with (2). The reason we do this is that the parsing is quite finicky, so this gives us assurance that we are parsing correctly. Also, there are other times, e.g. conv tactics, where Lean can't give us information about the position of the argument and we just have to parse it using (1). This check makes sure we are using the right method in those cases too.

The problem here is that doc strings shouldn't count toward the parameter (while line comments and normal block comments should).

There are two possible fixes:

  1. Modify the parsing code (around line 851 of parser.py) to not consume all white space. Instead, (similarly to the parsing of other arguments) we should parse itactics as follows:
    1. Parse the tactic block (matching parentheses).
    2. Test that we are before the end of the Lean-recorded parameter. If not raise an error (not using an assert).
    3. Take white space and comment tokens one at a time until we get to the desired end (like the code for parsing other parameters). Raise an error if a non-whitespace token is present.
  2. Remove doc strings from consume_space method. To do that, we would have to change the tokenizer to treat doc strings as different from other block comments. We would also have to probably add doc strings to places in the code where we are looking for a command (e.g. def) since a doc string would mean we are at the end of a proof.

(2) is technically better and ideally it should still be done since it would prevent doc strings from being recorded as part of the tactics, but it is also pedantic and a lot more work. (1) should be sufficient for this problem, especially since all comments including doc strings are stripped out of the training data. I recommend doing (1) which should be fast and easy to implement.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.