
"SHOW RESTRICTED" in Travis about set.mm HOT 16 CLOSED

metamath commented on July 21, 2024
"SHOW RESTRICTED" in Travis

from set.mm.

Comments (16)

david-a-wheeler commented on July 21, 2024

I agree, a separate file is the cleanest solution. In one sense, this is a big change because set.mm will now be defined in multiple files. And it's about time. This massive file needs to be split up anyway, so this is a good time to start.

digama0 commented on July 21, 2024

I wouldn't say "set.mm is defined in several files". This is more like a compiled version of the file; the data passes strictly in one direction: set.mm -> set.usage.

nmegill commented on July 21, 2024

On 7/7/16 2:21 AM, Mario Carneiro wrote:

The latest version of metamath.exe includes a command SHOW RESTRICTED
that will allow us to verify that (New usage/Proof modification is
discouraged.) marks are honored. Norm writes:

...

MM> help show restricted
Syntax: SHOW RESTRICTED

This command shows the usage and proof statistics of statements with
"(Proof modification is discouraged.)" and "(New usage is discouraged.)"
markup tags in their description comments. The output is intended to
be used by scripts that compare a modified .mm file to a previous
version.

MM> show restricted
SHOW RESTRICTED: Proof of "id1" is [etc.]

BTW I used "restricted" instead of "discouraged" because at the time I
wasn't sure that "discouraged" would be the final tag keyword we would
agree on. Without too much trouble, I can change it to "discouraged"
everywhere, so the command would be "show discouraged" with output "SHOW
DISCOURAGED: ..." etc. What do you think?

Something to be aware of: unfortunately 'show restricted' currently
takes about 1.5 minutes to run. I think most of the time is spent
finding the usage for around 5000 statements that have usage
discouraged. I can probably speed that up.

This command can easily be converted to a usage generator script:

./metamath 'read set.mm' 'show restricted' exit | grep 'SHOW RESTRICTED' > set.usage

will calculate the usage in set.mm and output it to set.usage.
Treated as a function from .mm files to usage files: whenever a change is
introduced that changes the usage output, that change ought to be marked
as an "override" or something similar.

The problem is that Travis is not designed for evaluating a build in the
context of previous builds. It's not that it's impossible - it's as
simple as checking out the HEAD^ version of set.mm and comparing the
generated set.usage files. But if this fails, it represents a
non-repeatable "error", since pushing the exact same commit a second
time will resolve the failure.
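That differential check could look something like the sketch below. The gen_usage helper is hypothetical: it wraps the generator pipeline quoted earlier and needs a built ./metamath binary, so this demo substitutes two hand-written usage files to show only the exit-status logic.

```shell
# Hypothetical helper wrapping the usage-generator pipeline above.
# Requires a built ./metamath binary; not invoked in this demo.
gen_usage() {  # $1 = input .mm file, $2 = output usage file
  ./metamath "read $1" 'show restricted' exit | grep 'SHOW RESTRICTED' > "$2"
}

# In a CI job one would generate old.usage from the parent commit:
#   git show 'HEAD^:set.mm' > previous.mm
#   gen_usage previous.mm old.usage
#   gen_usage set.mm new.usage
# Here two sample files stand in for the generated output:
printf 'SHOW RESTRICTED: Proof of "id1" is discouraged (5 uses).\n' > old.usage
printf 'SHOW RESTRICTED: Proof of "id1" is discouraged (4 uses).\n' > new.usage

# diff exits 0 when usage is unchanged and 1 when it differs, so the
# build can fail exactly when a "discouraged" override slipped in.
if diff old.usage new.usage; then
  echo "usage unchanged"
else
  echo "usage changed: needs explicit override"
fi
```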

I have some questions about how Travis works. What is done with the output
of Travis? If there is a failure, how does it notify the user? Wouldn't
the user know there was a "discouraged" override from looking at the
current log? Is the log kept for each run, or is it wiped each time?

Second, there is the issue of to what extent a "discouraged" override
should be considered an error. Usually it would not be, I think,
assuming the proof assistant software prevents "discouraged" usage
unless overridden. And if it is overridden, it was obviously done on
purpose for some reason. "Discouraged" means discouraged, not
prohibited, and we already have (in metamath.exe and hopefully other
proof assistants eventually) mechanisms to block casual or accidental usage.

Third, to pin down a non-obvious discouraged override, a human may have
to inspect the previous set.mm vs. the current set.mm. If another
commit is done, the previous set.mm no longer has the difference, and
the user will have to go back to some unknown earlier set.mm that the
set.usage file corresponds to.

I guess what I'm asking is, why would another commit be done without
inspecting the Travis output for the current commit and fixing any
problems? I think in almost all cases a discouraged override would be
valid, even with manual edits like deleting a *OLD, changing a label,
adding a new statement to a discouraged section, etc. and all we would
need is just a quick sanity check by a human.

(Question: in the Travis script, what is the purpose of:
&& [ `egrep -q '?Error|?Warning' < mm.log; echo $?` -eq 1 ]
I'm not a bash expert, but this seems to do a test for an error or
warning without doing anything with the result. I thought maybe it sets
the script exit status, but that would be overridden by the next
verifier run.)

One way to solve this is to store the usage file in the repo. Then the
correctness check that can be performed by Travis is that the generated
usage matches the stored usage file, and the equivalent of an "override"
in this context is an update to the usage file in the commit using the
above command. This does mean that anyone who is not capable of running
this command (adapted for their platform) will not be able to override
usage checks, but this seems like a reasonable compromise.
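A minimal sketch of that stored-file check, under the assumption that the committed copy is named set.usage and the regenerated one set.usage-calc (both names illustrative); the generation step is simulated with printf so the check itself runs:

```shell
# Committed copy (would live in the repo) and regenerated copy (would be
# produced by the metamath pipeline above); simulated here with printf.
printf 'SHOW RESTRICTED: New usage of "ax-meredith" is discouraged (2 uses).\n' > set.usage
printf 'SHOW RESTRICTED: New usage of "ax-meredith" is discouraged (2 uses).\n' > set.usage-calc

# A bare diff is the whole check: exit status 0 passes the build, 1 fails
# it, and the printed lines show the author exactly what changed.
diff set.usage set.usage-calc && echo "usage file is in sync"
```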

Thoughts?

I'll go along with it if others want it. I am just wondering if it is
overkill per my discussion above. It would be one more thing we have to
maintain.

Maybe for now we can just put in a simple current vs. previous set.mm
'discouraged' check and see how it goes. It would already be better
than what we have now, which is nothing. I don't think 'discouraged'
overrides will happen very often anyway. If it ends up being a problem
like you describe, then we can upgrade the Travis script to use set.usage.

Norm

digama0 commented on July 21, 2024

On Thu, Jul 7, 2016 at 9:43 AM, nmegill [email protected] wrote:

BTW I used "restricted" instead of "discouraged" because at the time I
wasn't sure that "discouraged" would be the final tag keyword we would
agree on. Without too much trouble, I can change it to "discouraged"
everywhere, so the command would be "show discouraged" with output "SHOW
DISCOURAGED: ..." etc. What do you think?

Seems reasonable.

Something to be aware of: unfortunately 'show restricted' currently
takes about 1.5 minutes to run. I think most of the time is spent
finding the usage for around 5000 statements that have usage
discouraged. I can probably speed that up.

It's 2.5 minutes on my machine. Another downside of doing differential
usage calculation in the Travis script is that it has to calculate usage
twice (once for the new and the old), and there's no obvious place to cache
the old result.

Maybe smm3 will have a better go at it.

I have some questions about how Travis works. What is done with the output
of Travis? If there is a failure, how does it notify the user? Wouldn't
the user know there was a "discouraged" override from looking at the
current log? Is the log kept for each run, or is it wiped each time?

The travis log is displayed at https://travis-ci.org/metamath/set.mm (you
may need to log in to see it). As far as I'm aware there is no way to
recover additional output besides the log itself, so I would probably set
it up to output "diff set.usage set.usage-calc" so that you can see what
specifically changed in the 1MB file.

You can view all historical logs of previous builds at
https://travis-ci.org/metamath/set.mm/builds. There are also links to each
build in a pull request (click on the green checkmark / red X next to the
commit hash, for example in #57),
along with builds for the overall PR in "show details", as well as builds
on each branch, for example
https://github.com/metamath/set.mm/compare/develop.

If a Travis build fails, the author and the committer (usually the same
person) are sent a "build is broken" email from Travis, which is resent
with each commit until the build is fixed.

The log will indicate the failure, but that doesn't mean the user will look
at the log. We want to make sure that errors that are flagged by Travis
have to be explicitly acknowledged by the user in order to get the
go-ahead.

Second, there is the issue of to what extent a "discouraged" override
should be considered an error. Usually it would not be, I think,
assuming the proof assistant software prevents "discouraged" usage
unless overridden. And if it is overridden, it was obviously done on
purpose for some reason. "Discouraged" means discouraged, not
prohibited, and we already have (in metamath.exe and hopefully other
proof assistants eventually) mechanisms to block casual or accidental
usage.

Indeed, this is a point against using "discouraged" in Travis at all. The
main issue is that we don't require any particular authoring tool to be
used, and Travis is a funneling point for any kind of edit to go through
the same rigorous checks.

Third, to pin down a non-obvious discouraged override, a human may have
to inspect the previous set.mm vs. the current set.mm. If another
commit is done, the previous set.mm no longer has the difference, and
the user will have to go back to some unknown earlier set.mm that the
set.usage file corresponds to.

If the Travis tool ensures that all (acceptable) commits have an in-sync
set.mm/set.usage, then the first errored commit will correctly identify the
usage change. Ideally, the user should at this point run the usage
generator in order to "accept" the change, or revert the set.mm change if
it was not intended. If the user instead chooses to ignore the Travis
failure, and changes even more, Travis will continue to fail, and at this
point the diff will be pointing out changes since the last successful build.

I think this is a better target than comparing against the last commit,
because if a large change is enacted with many small commits, the usage
diff will be fragmented. More importantly, earlier commits may not even be
built (if you push two commits at once the older one is not built), so
some usage changes may go unnoticed. By keeping the usage file in the repo
we can ensure that no changes slip by in this manner.

I guess what I'm asking is, why would another commit be done without
inspecting the Travis output for the current commit and fixing any
problems? I think in almost all cases a discouraged override would be
valid, even with manual edits like deleting a *OLD, changing a label,
adding a new statement to a discouraged section, etc. and all we would
need is just a quick sanity check by a human.

Presumably, our users are smart enough to do this. But we're all human and
make mistakes, and Travis errors on a positive proportion of commits, which
just goes to show that you can't assume any invariant holds unless you
check it mechanically.

(Question: in the Travis script, what is the purpose of:

&& [ `egrep -q '?Error|?Warning' < mm.log; echo $?` -eq 1 ]

I'm not a bash expert, but this seems to do a test for an error or
warning without doing anything with the result. I thought maybe it sets
the script exit status, but that would be overridden by the next
verifier run.)

After running "metamath | tee mm.log", which prints the command output live
to the log so you can watch it while also printing to a file, I parse the
output file looking for errors. grep will print the found lines to stdout
(which I don't care about), and will report error code 0 on found matches,
1 on no matches, and 2 or other codes if there was an error (maybe IO
problems). The "success" condition is error code 1, so I test $? after
running egrep to get the error code. If both the original metamath command
and the error check succeed, then there are no errors.
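Those exit codes can be checked directly. A small demo (the literal '?' in the metamath error prefix is escaped here, which the original script's pattern leaves bare):

```shell
# grep exit codes: 0 = match found, 1 = no match, 2 = error (e.g. bad file).
printf 'verification completed\n' > mm.log
grep -Eq '\?Error|\?Warning' < mm.log; echo "clean log:    $?"   # prints 1

printf '?Error: bad proof\n' >> mm.log
grep -Eq '\?Error|\?Warning' < mm.log; echo "error in log: $?"   # prints 0

grep -Eq 'x' no-such-file 2>/dev/null; echo "grep failure: $?"   # prints 2

# The Travis test accepts only status 1, i.e. "searched cleanly, found
# no errors or warnings":
printf 'verification completed\n' > mm.log
[ `grep -Eq '\?Error|\?Warning' < mm.log; echo $?` -eq 1 ] && echo "build ok"
```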

I'll go along with it if others want it. I am just wondering if it is
overkill per my discussion above. It would be one more thing we have to
maintain.

I thought about this. Another option is that Travis itself will commit the
updated usage file if it sees a disconnect. This will free us from having
to update it ourselves. I was going to say that this makes it too easy to
override usage without noticing, but actually it would still be fairly
obvious, since I read all the commits that come through anyway.

Mario

digama0 commented on July 21, 2024

(Question: in the Travis script, what is the purpose of:

&& [ `egrep -q '?Error|?Warning' < mm.log; echo $?` -eq 1 ]

I'm not a bash expert, but this seems to do a test for an error or
warning without doing anything with the result. I thought maybe it sets
the script exit status, but that would be overridden by the next
verifier run.)

Also, I forgot to mention that each line in the Travis script is run separately, timed, and a nonzero error code halts the build as "failed". (That's why the error code does not need to be checked in the next line.) By contrast, if a step in the install section fails, the build is stopped as "errored" instead. The idea is that the script section checks the validity of set.mm itself (which is the content of the new commit), while problems in other steps of the process (like compiling metamath) are marked as build errors rather than failures.
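For readers unfamiliar with that layout, here is a hypothetical .travis.yml fragment illustrating the errored/failed split; the commands are examples, not the repository's actual script.

```yaml
install:
  # A failure here marks the build "errored": a tooling problem,
  # not a problem with the commit's content.
  - make metamath

script:
  # Each line is run and timed separately; the first nonzero exit
  # status marks the build "failed", i.e. set.mm itself is at fault.
  - ./metamath 'read set.mm' 'verify proof *' exit | tee mm.log
  - '[ `grep -E -q "?Error|?Warning" < mm.log; echo $?` -eq 1 ]'
```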

nmegill commented on July 21, 2024

On 7/7/16 10:21 AM, Norman Megill wrote:

For master releases, I think that I will start to do a 'show restricted'
comparison to the previous master release as part of my overall QA
process (using the Travis log or by hand). I have been doing that informally

Actually it would be by hand, since I wouldn't want to update the master
until I inspect all 'discouraged' differences.

in the past every now and then, checking to see that axiomatic studies
are isolated with 'show usage ax-meredith/rec' etc. Even with
set.usage, I think inspection by someone like me who "knows" what the
purpose is (especially since I added them in the first place) is a good
idea, and by comparing master releases I can do everything at once
instead of having to inspect every develop commit.

On 7/7/16 10:32 AM, Mario Carneiro wrote:

unless overridden. And if it is overridden, it was obviously done
on purpose for some reason. "Discouraged" means discouraged, not
prohibited, and we already have (in metamath.exe and hopefully
other proof assistants eventually) mechanisms to block casual or
accidental usage.

Indeed, this is a point against using "discouraged" in Travis at all.
The main issue is that we don't require any particular authoring tool
to be used, and Travis is a funneling point for any kind of edit to
go through the same rigorous checks.

I am also thinking that we shouldn't put it in Travis. Checking master
vs. previous master by hand isn't a big deal for me, and I would end up
with more confidence than doing incremental develop checks where I may
have missed one.

Norm

digama0 commented on July 21, 2024

Regarding larger comparisons such as master-to-master updates, if the set.usage file is in the repo then this can be done easily. If you take any two Travis-certified commits and compare them, the difference in set.mm will be what it always has, and the difference in set.usage will be the change in usage between the two versions.

One problem with usage comparisons of any sort is that theorem reordering will cause the usage file to also be reordered, which is a "spurious" usage change. This can be mitigated by alphabetizing the file (I'm sure there's a linux utility for that), although that wouldn't help against theorem renames.
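The utility alluded to is presumably sort(1). Canonicalizing both usage files before diffing could look something like this, with invented sample data for illustration:

```shell
# Two usage files listing the same statements in different orders.
printf 'SHOW RESTRICTED: New usage of "zfpair" is discouraged (3 uses).\nSHOW RESTRICTED: New usage of "ax-meredith" is discouraged (2 uses).\n' > a.usage
printf 'SHOW RESTRICTED: New usage of "ax-meredith" is discouraged (2 uses).\nSHOW RESTRICTED: New usage of "zfpair" is discouraged (3 uses).\n' > b.usage

# Sorting in a fixed locale removes the spurious reordering diff
# (it does nothing for renames, as noted above).
LC_ALL=C sort -o a.usage a.usage
LC_ALL=C sort -o b.usage b.usage
diff a.usage b.usage && echo "no real usage change"
```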

nmegill commented on July 21, 2024

On 7/7/16 10:53 AM, Mario Carneiro wrote:

Regarding larger comparisons, if the set.usage file is in the repo then
this can be done easily. If you take any two Travis-certified commits
and compare them, the difference in set.mm will be what it always has,
and the difference in set.usage will be the change in usage between the
two versions.

One problem with usage comparisons of any sort is that theorem
reordering will cause the usage file to also be reordered, which is a
"spurious" usage change. This can be mitigated by alphabetizing the file
(I'm sure there's a linux utility for that), although that wouldn't help
against theorem renames.

I understand that. I don't think it will be a big problem (for me at
least), although experience will tell. I think keeping set.usage
synchronized with moves and label changes (without cheating by
re-running 'show restricted') sounds like a pain and would probably be
more work overall.

Almost all of the 'discouraged' tags are in long-stable sections that
are unlikely to get moved or have their labels changed. In any case,
since I'm familiar with most of those sections, I'll probably recognize
a move or label change easily.

For fun, I just compared the 'show restricted' output of a 5/22 set.mm
and the current one. The output of set.mm diff had 146K lines, and the
'show restricted' diff had 138 lines. All but one were due to new
'discouraged' tags I added (and their usages therefore appearing,
causing most of the diff lines). There is a mystery one, mulge0OLD,
whose proof got shorter in spite of a 'proof modification is
discouraged' tag, and I'll have to do some research for that one to see
what happened.

Norm

digama0 commented on July 21, 2024

On Thu, Jul 7, 2016 at 11:58 AM, nmegill [email protected] wrote:

On 7/7/16 10:53 AM, Mario Carneiro wrote:

Regarding larger comparisons, if the set.usage file is in the repo then
this can be done easily. If you take any two Travis-certified commits
and compare them, the difference in set.mm will be what it always has,
and the difference in set.usage will be the change in usage between the
two versions.

One problem with usage comparisons of any sort is that theorem
reordering will cause the usage file to also be reordered, which is a
"spurious" usage change. This can be mitigated by alphabetizing the file
(I'm sure there's a linux utility for that), although that wouldn't help
against theorem renames.

I understand that. I don't think it will be a big problem (for me at
least), although experience will tell. I think keeping set.usage
synchronized with moves and label changes (without cheating by
re-running 'show restricted') sounds like a pain and would probably be
more work overall.

I guess I should say that I have zero intention of manually editing the
file. For all I care it could be an opaque binary file (except that the
diffs ought to be somewhat readable). Manually adjusting the file to match
what it should look like "with cheating" is much too hard to get right to
be worthwhile. The interaction is "change set.mm" -> "break Travis"
-> "examine diff" -> "regenerate usage" -> "Travis is happy" (or "change
set.mm" -> "break Travis" -> "Travis fixes itself" with automatic Travis
commits, with "examine diff" in there concurrently).

For fun, I just compared the 'show restricted' output of a 5/22 set.mm
and the current one. The output of set.mm diff had 146K lines, and the
'show restricted' diff had 138 lines. All but one were due to new
'discouraged' tags I added (and their usages therefore appearing,
causing most of the diff lines). There is a mystery one, mulge0OLD,
whose proof got shorter in spite of a 'proof modification is
discouraged' tag, and I'll have to do some research for that one to see
what happened.

That was me, around the time I was adding the algebra deduction theorems. The
old version was deprecated because of its hypothesis order, not because it
had an interesting proof, and I just referred it to the newer proof.

Mario

sorear commented on July 21, 2024
&& [ `egrep -q '?Error|?Warning' < mm.log; echo $?` -eq 1 ]

Can't this be better written as:

&& ! grep -qE '?Error|?Warning' mm.log

There's no need to drag $? into this: plain old status negation works; grep takes a filename as an argument; and egrep is vaguely deprecated in favor of grep -E.

digama0 commented on July 21, 2024
  • I had problems with ! in the middle of the command, but I think quoting could fix that;
  • That tests for nonzero exit code, not specifically exit code 1. Although the chance of grep erroring is slim...
  • I hadn't heard of egrep deprecation, I'm no bash expert myself. Either way is fine with me.

david-a-wheeler commented on July 21, 2024

YAML treats "!" specially, so you sometimes have to quote. I did that elsewhere in the same file.

If grep errors for any reason there's a problem, so I think it's fine to fail then.

"egrep" isn't in the POSIX spec; "grep -E" is in POSIX spec. So I'd use "grep -E" instead.

digama0 commented on July 21, 2024

On Thu, Jul 7, 2016 at 8:00 PM, David A. Wheeler [email protected]
wrote:

If grep errors for any reason there's a problem, so I think it's fine
to fail then.

Right, that's the problem. If grep fails with error code 2, then "! grep"
will return error code 0 which is a success, not a failure. (Ideally this
should fail as "errored", because the grep error is probably not related to
the contents of set.mm, but it is important to never mistakenly mark a
failure as success.)
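A runnable illustration of that pitfall (again escaping the literal '?' that the original pattern leaves bare):

```shell
printf 'all good\n' > mm.log

# Plain negation: grep's "no match" (1) and "error" (2) both negate to 0.
! grep -Eq '\?Error' mm.log;                    echo "negated, clean log:  $?"  # 0, pass
! grep -Eq '\?Error' no-such-file 2>/dev/null;  echo "negated, grep error: $?"  # 0, wrongly a pass

# Testing for exit status 1 exactly keeps the two cases apart.
[ `grep -Eq '\?Error' mm.log; echo $?` -eq 1 ];                   echo "strict, clean log:  $?"  # 0, pass
[ `grep -Eq '\?Error' no-such-file 2>/dev/null; echo $?` -eq 1 ]; echo "strict, grep error: $?"  # 1, fail
```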

nmegill commented on July 21, 2024

On 7/7/16 10:32 AM, Mario Carneiro wrote:

On Thu, Jul 7, 2016 at 9:43 AM, nmegill [email protected] wrote:

BTW I used "restricted" instead of "discouraged" because at the time I
wasn't sure that "discouraged" would be the final tag keyword we would
agree on. Without too much trouble, I can change it to "discouraged"
everywhere, so the command would be "show discouraged" with output "SHOW
DISCOURAGED: ..." etc. What do you think?

Seems reasonable.

This is now done in metamath.exe version 0.132. I changed the text
of the 11-May-2016 entry in http://us2.metamath.org:88/mpegif/mmnotes.txt
to reflect all of the changes.

digama0 commented on July 21, 2024

Addressed in pull request #91, merged in 7414a93.
