Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New evaluator that doesn't depend on flattening #1495

Merged
merged 39 commits into from
Sep 9, 2024
Merged

Conversation

bugarela
Copy link
Collaborator

@bugarela bugarela commented Sep 2, 2024

Hello :octocat:

This is a huge PR because it changes the core of our evaluation process to not depend on flattening anymore. I've moved a lot of things around and attempted many different things throughout this process, and I finally have a version that I think is good enough to ship.

Since this is pretty much a rewrite, we are getting rid of a lot of tech debt and fixing many bugs. Here's a list of what I could find:

The following issues are consequences of flattening bugs. After this PR, the issues stop happening for test, run, and repl, and only happen in verify and compile commands, as we still need flattening for Apalache. So the impact of these issues was significantly reduced.

Change overview

The main change is that, while the old evaluator can evaluate a single flattened module, this version can evaluate expressions, definitions, simulations and tests over the original modules, avoiding the flattening pass (which is slow and buggy).

Related to that, the new version compiles from a starting point: either an expression/definition typed into the REPL, a combiation of init/step/inv given to simulate, or from a test definition. It won't even touch code that is not a dependency of this entrypoint. The old evaluator would go through the entire flattened module and compile everything beforehand.

The biggest challenge is to handle constants. This was previously done by the flattener, and now we need to take care of them while simulating. A good example of this is the instances.qnt file:

action init = all { V1::init, V2::init }
action step = all {
V1::counter' = V1::counter + 1,
V2::counter' = V2::counter + 2,
}
// The two variables should increment independently
val inv = V1::counter == 0 or V2::counter > V1::counter

Evaluating V1::init and V2::init are two different things, even though both point to the same definition in the lookup table. To make sure we use different constant values for different constants, and that we use different variable registers for variables that come from the same module but through different instances, we need to track a some extra information while building (compiling) expressions.

Structure

The new evaluator has two main steps:

  1. Building (builder.ts). Take Quint expressions and build typescript arrow functions that can be called to obtain the value of that expression under a certain context
  2. Evaluating (`evaluator.ts). Take the built arrow functions and call them, keeping track of the context.

Performance

Although performance changes are not consistent, my evaluation is that it it overall better.

For the REPL, it is a complete game changer. For the malachite project, evaluating any expression in the REPL would take around 20 seconds, while now it is completely instant.

For runs and tests, it is faster for most cases, sometimes taking half of the time. For some cases, it is slower, but I couldn't find any case where it takes double of the time or more. I want to further improve this version so we can get faster results for all scenarios, but I think this version is good enough to release, specially considering the REPL performance and how many bugs it fixes. See benchmark details below.

Benchmarks "old version" in the benchmarks is the last released version. Ideally I should benchmark against main, but this is easier. Also, we are measuring simulation/testing time, and performance changes outside of this execution scope are not accounted.

Cases where the new version is better than the old one:

quint run --main=ics20Test --invariant=BalanceNonNegative ../examples/cosmos/ics20/ics20.qnt  --seed=0xbf95eeae47ada
old: (50803ms)
new: (25292ms)

quint run --main=prisoners3 --invariant='countInv and safetyInv' ../examples/puzzles/prisoners/prisoners.qnt
old: (8033ms)
new: (4848ms)

quint run --main=prisoners3 --invariant='countInv and safetyInv' ../examples/puzzles/prisoners/prisoners.qnt --max-samples=50000
old: (39185ms)
new: (23397ms)

quint test mysticeti_paper_test.qnt --max-samples=100
old: (24329ms)
new: (12704ms)

quint run mysticeti_paper_test.qnt --init=init_with_dag --max-steps=1 --max-samples=100
old: (9157ms)
new: (4187ms)

PS: The mysticeti ones actually have a flattening bug in the old version and I have to make a workaround

quint test --max-samples 1000 specs/quint/tests/disagreement/disagreementTest.qnt
old: (17998ms)
new: (16574ms)

This is the command that runs on Malachite's CI:
time bash scripts/quint-forall.sh "test --max-samples 100" specs/quint/tests/**/*Test.qnt
new:
Executed in  117.45 secs    fish           external
   usr time  152.82 secs    0.00 micros  152.82 secs
   sys time    6.73 secs  663.00 micros    6.73 secs
old:
Executed in  163.25 secs    fish           external
   usr time  196.00 secs  348.00 micros  196.00 secs
   sys time    7.89 secs  210.00 micros    7.89 secs

Cases where the new version is worse than the old one:

quint test --max-samples=100 --main TendermintModels ../examples/cosmos/tendermint/TendermintModels.qnt
old: (3653ms)
new: (6870ms)

quint test --main ics20Test ../examples/cosmos/ics20/ics20.qnt
old: (61450ms)
new: (101720ms)

quint test --main=properChannelsTests ../examples/cosmos/ics20/denomTrace.qnt
old: (4635ms)
new: (6770ms)

quint test --max-samples 100 specs/quint/tests/line28/line28Test.qnt
old: (20624ms)
new: (25692ms)
  • Tests added for any new code
  • Documentation added for any new functionality
  • Entries added to the respective CHANGELOG.md for any new functionality
  • Feature table on README.md updated for any listed functionality

…stance-constant-callgraph"

This change is no longer necessary as this dependency no longer exists.
I've checked that the reported problem doesn't happen after the revert

This reverts commit 384c28e, reversing
changes made to 576d72b.
This test case describes a scenario where we don't get the desired
result. The behavior changed, not for the better nor for the worse.
@bugarela bugarela marked this pull request as ready for review September 2, 2024 18:32
@bugarela
Copy link
Collaborator Author

bugarela commented Sep 2, 2024

Hi @romac @MahtabNorouzi! I realize this is way too big to review. Unfortunately, the nature of this work and my current resources made it so this is the only feasible option. @josef-widder has been using a version of this since last week as a beta tester, and I'll also be using this heavily over this week. I've also been testing this with our set of examples and I'm confident with its current state.

So I'd like to ask that you give this a look, however it fits in your time and priorities right now. Let me know if you think this is worth merging/releasing, or if there are any other requirements I should meet before putting it out.

I'm also happy to answer any questions or go through stuff in detail either in here, slack or zoom.

@bugarela
Copy link
Collaborator Author

bugarela commented Sep 2, 2024

@konnov if you want to take a look at this, I'm sure you'll have valuable feedback. I think this is an evolution of what you wrote, and I would not be able to come up with it without the base you provided through your implementation.

@konnov
Copy link
Contributor

konnov commented Sep 3, 2024

@konnov if you want to take a look at this, I'm sure you'll have valuable feedback. I think this is an evolution of what you wrote, and I would not be able to come up with it without the base you provided through your implementation.

It's a massive change. I think it would make more sense to break it down into much smaller PRs. Reviewing such a big change in the simulator is a full-time job for a week. If you need help with that, Informal management knows my hourly rates.

@bugarela
Copy link
Collaborator Author

bugarela commented Sep 3, 2024

It's a massive change. I think it would make more sense to break it down into much smaller PRs. Reviewing such a big change in the simulator is a full-time job for a week. If you need help with that, Informal management knows my hourly rates.

Of course, yes. I've always be very in favor of small PRs and feel bad that this is so big. However, in this particular case, I don't think the effort taken (probably more than a week) to break this down would be compensated in significantly more informed reviews, considering the people available to review it at this point.

On top of that, I mentioned you here to make you aware of this change. I've tested the new version in many of our internal and public Quint projects and it's all good. I know you are also writing Quint specs, and I'd rather have you informed than surprised by such meaningful change in the simulator.

And sorry if I implied that I expected you to work for free! I'm just trying to be considerate of your participation in this as best as I can.

@konnov
Copy link
Contributor

konnov commented Sep 4, 2024

It's a massive change. I think it would make more sense to break it down into much smaller PRs. Reviewing such a big change in the simulator is a full-time job for a week. If you need help with that, Informal management knows my hourly rates.

Of course, yes. I've always be very in favor of small PRs and feel bad that this is so big. However, in this particular case, I don't think the effort taken (probably more than a week) to break this down would be compensated in significantly more informed reviews, considering the people available to review it at this point.

On top of that, I mentioned you here to make you aware of this change. I've tested the new version in many of our internal and public Quint projects and it's all good. I know you are also writing Quint specs, and I'd rather have you informed than surprised by such meaningful change in the simulator.

And sorry if I implied that I expected you to work for free! I'm just trying to be considerate of your participation in this as best as I can.

Hey @bugarela! No worries! I am just saying that I could have a look at smaller PRs. Understanding a massive change is quite hard without having context. Normally the context exchange happens during regular meetings. Since we don't have those anymore, reviewing such a big change would be similar to a security audit to me, of course, the focus would not be on the security bugs.

If you have no other choice to refactor this big PR into smaller chunks, I would recommend you to introduce a flag to run both the old simulator and the new one in a lockstep, so you would be able to quickly detect when they start to deviate.

Comment on lines -146 to -147
// Update the definition to point to the expression being overriden
constDef.id = ex.id
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a kind of a hack so the compiler would evaluate the constant as the expression it was overridden to. We now do it on the level of registers on the evaluator.

Comment on lines -193 to -195
decl.overrides.forEach(([_param, expr]) => {
this.graphAddAll(expr.id, Set([decl.id]))
})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from a reverted commit, which added a hack for something we don't need anymore

@bugarela bugarela enabled auto-merge September 9, 2024 18:15
@bugarela bugarela merged commit e9a0b38 into main Sep 9, 2024
14 checks passed
@bugarela bugarela deleted the gabriela/new-evaluator branch September 9, 2024 18:53
@bugarela bugarela mentioned this pull request Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment