Investigate performance difference caused by merging BI and SBI
@robbertkrebbers, in an experiment, merged the BI and SBI canonical structures to see if removing the inheritance and coercion between the two makes any difference. Turns out it does! The weak_mem branch becomes 37% faster. Iron, another monPred
user, gets 23% faster. Even lambdaRust master gains 5%.
This has triggered a flurry of discussion in Mattermost, and @janno entered the rabbit hole that is called "unification". In this issue, I hope to collect some of the conclusions and the other data we gathered in that discussion, in case we want to look at this again in the future. Feel free to edit this initial post like a Wiki page to keep it up-to-date.
Cc #3
RefCell::try_borrow
A lot of time was spent looking at refcell_try_borrow_type
, which went down in type-checking time from 50s to less than 2s.
Lemma refcell_try_borrow_type ty `{!TyWf ty} :
typed_val refcell_try_borrow
(fn(∀ α, ∅; &shr{α}(refcell ty)) → option (ref α ty)).
Why was it so slow?
The culprit seems to be Coq trying to unify ref
with refmut
during typeclass search.
@janno spent lot of time trying to figure out what Coq is doing in these 50s. (TODO: insert conclusion here)
When did it become so slow?
After some digging, we narrowed the regression down to this commit range (confirmed by building first and last commit in Coq 8.8.2 and timing the relevant lemma statement). The final commit here introduces a hack by @jjourdan that temporarily masked the problem:
(* FIXME : HACK *)
Local Hint Opaque ref : typeclass_instances.
That hack got removed soon thereafter, for unclear reasons, as the slowdown was still present. So starting with this commit (confirmed with Coq 8.8.2), type-checking the refcell_try_borrow_type
statement took 60s (going down to 50s for later Coq versions).
I think this demonstrates that the refcell RA and invariant changes in that range caused the problem (and not some other change in the wider ecosystem).
type_sum.v
rewrite
@janno also took a closer look at this rewrite in type_sum.v
, concluding:
We start with the task
bi_car (sbi_bi ?M6108) ~= bi_car (monPredI view_bi_index (uPredI (iResUR Σ)))
.
Here's a rundown of the steps of the version with
sbi
.
bi_car (sbi_bi X) ~= bi_car <something monPredI>
- (try to unify head & arguments)
bi_car ~= bi_car
✅ That leaves the argumentsbi_bi X ~= monPredI _ _
- (reduce)
BI <many terms of the form (sbi_.. X)> ~= BI <many terms with monPred>
.
- (unify head & arguments)
BI ~= BI
✅ & a bunch of arguments. The following steps are repeated for every argument! The first one issbi_car X ~= monPred view_bi_index (uPredI (iResUR Σ))
- 4.1. (CS search finds
monPredSI
) Sadly,uPredI (iResUR Σ)
is abi
, not ansbi
. To applymonPredSI
, we need to find the rightsbi
. Here is where the fun begins! Our new unification task is:uPredI (iResUR Σ) ~= sbi_bi ?Z
.- 4.2 (After running around aimlessly for a bit Coq reduces both sides):
BI <many terms with uPredI > ~= BI <many terms with sbi_ projections>
. EASY! Let's compare all fields separately! The following steps are repeated (this is an inner loop!!) for every argument ofBI
. Let's do the first one forbi_car
(the projection is not present on the left-hand side).- 4.2.1.
uPred (iResUR Σ) ~= sbi_car ?Z
: solved by CS search, unifies?Z ~= uPredSI _
- 4.2.2. repeat for every
bi
field- 4.3. repeat for every
bi
field
Note that this unification task pops up as part of a type class search for Proper
, spawned by one of the lemmas used in the rewrite. Note also that the explanation above applies to weak memory lambda-rust with sbi
. The SC version with sbi
does not have the inner loop. The SC version without sbi
performs only constant work; i.e. the unification task does not scale with the number of fields in bi
.