Check proof length as part of CI
Prosa has unfortunately accumulated a significant number of "long" proofs. Originally, we agreed to keep proofs down to 30-50 lines at most (and ideally shorter), as longer proofs tend to hide reusable helper lemmas and/or interesting intermediate steps that should be exposed at the spec level. Unfortunately, this rule was soon forgotten or ignored when inconvenient, since it was not frequently checked or strictly enforced.
To prevent the problem from getting bigger (i.e., to prevent further long proofs from slipping in), this merge request adds a script (scripts/proofloc.py
) that automatically flags new proofs that are too long as part of the continuous integration setup. By default, "too long" means 40 lines. Given the current distribution of proof lengths (see below, obtained with scripts/proofloc.py -b 5
), this seems like a reasonable cutoff (and it is squarely in the middle of the informal 30-50 LOC range, too). It bears repeating that this is intended as a maximum proof length; shorter proofs are of course always welcome. (It should also be noted that it is of course possible to squeeze any proof onto a single line; the idea here is that all contributors adhere to a reasonable coding style, so such extreme cases shouldn't be a problem in practice.)
To deal with existing long proofs, the script checks a database of "known long proofs" (scripts/known-long-proofs.json
) before complaining about a proof exceeding 40 lines. The idea is that, over time, these legacy long proofs will go away as parts of the library are refactored and improved.
Finally, for the rare case where a new proof really needs to span more than 40 lines, it is possible to add manual exceptions to the long-proofs database.
Please holler if you see any problem with this, or if you think the cutoff should be something different from 40 LOC.
LOC | #Proofs |
---|---|
<= 5 | 475 |
<= 10 | 220 |
<= 15 | 133 |
<= 20 | 116 |
<= 25 | 88 |
<= 30 | 77 |
<= 35 | 60 |
<= 40 | 50 |
<= 45 | 29 |
<= 50 | 24 |
<= 55 | 25 |
<= 60 | 9 |
<= 65 | 6 |
<= 70 | 5 |
<= 75 | 4 |
<= 80 | 8 |
<= 85 | 9 |
<= 90 | 4 |
<= 95 | 2 |
<= 100 | 7 |
<= 105 | 2 |
<= 110 | 2 |
<= 115 | 3 |
<= 120 | 2 |
<= 125 | 2 |
<= 130 | 3 |
<= 135 | 1 |
<= 140 | 1 |
<= 145 | 1 |
<= 150 | 1 |
<= 155 | 1 |
<= 160 | 3 |
<= 165 | 2 |
<= 170 | 2 |
<= 175 | 1 |
Merge request reports
Activity
Nice ! This should indeed be an efficient way to avoid adding more and more oversized proofs. I guess, we'll have to be careful not to multiply adhoc not very reusable lemmas that could be produced when cuting long proofs in smaller bits to please the CI.
Anyway, the risk is definitely worth taking and I fully agree with the current proposal.
added 4 commits
-
6dd796c6...03863f8f - 2 commits from branch
RT-PROOFS:master
- dedc4ea4 - add scripts/proofloc.py, a utility for checking proof length
- f90bdb7e - enable proof-length checking during CI
-
6dd796c6...03863f8f - 2 commits from branch
Thanks for the positive feedback, Pierre.
I agree that we shouldn't litter the spec with technical, uninteresting intermediate lemmas. I hope that we'll be able to identify reusable and interesting lemmas to be factored out, and in cases where that doesn't work out, we can always add an exception. The 40-line threshold is a strongly encouraged recommendation, not an iron rule.