SMT test cases rework #3592

WolframPfeifer · 2025-04-11T11:51:51Z

Motivation

Currently, we have a category of SMT test cases (TestCvc4, TestZ3) which work by loading a .key file with a single formula, for example \forall s x; p(x) -> \exists s y; p(y), directly converting it to SMT-LIB, and running a solver.

These test cases have some problems:

They can take a long time: for sequents that are not provable, the solver is expected to run into a timeout, and since the timeout is hardcoded to 5 min per individual test case (in SMTTestSettings), each such test always takes that long.
The test cases only use the legacy (aka non-modular) translation.
The test cases only distinguish two cases: valid (unsat returned by the solver) or not valid (sat returned by the solver). However, as currently implemented, a test case also always succeeds if the solver returns unknown or runs into a timeout!
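To illustrate the last point, here is a hypothetical reconstruction in Java (not the actual KeY test code) of a two-valued check that only fails on a definite contradiction. The names SolverAnswer and TwoValuedCheck are assumptions made for this sketch.

```java
// Hypothetical stand-in for what a solver run can report; not a KeY type.
enum SolverAnswer { UNSAT, SAT, UNKNOWN, TIMEOUT }

final class TwoValuedCheck {
    private TwoValuedCheck() {}

    /**
     * Sketch of a two-valued check: the test only records whether the
     * formula is expected to be valid (i.e., the solver should answer unsat).
     */
    static boolean passes(boolean expectValid, SolverAnswer answer) {
        switch (answer) {
            case UNSAT:
                return expectValid;   // validity proven: correct only if expected
            case SAT:
                return !expectValid;  // counterexample found: correct only if not expected valid
            default:
                return true;          // UNKNOWN or TIMEOUT: nothing is checked, test passes
        }
    }
}
```

With such a check, both expectations pass whenever the solver answers UNKNOWN or TIMEOUT, which is the gap a three-valued expectation closes.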

The question is: What do we intend with the SMT test cases? My take on this would be:

We want to detect regressions in our translation(s) of sequents to SMT-LIB.
We want to have certain smoke tests with simple examples to ensure that the SMT solvers are usable.
We do not want to test the SMT solvers per se (regressions in the solvers internally).

Intended Change

Add separate test cases for modular and legacy translation.
Specify the solver's expected result more precisely in each test case: ThreeValuedTruth is now used, so a test case states explicitly whether unsat, sat, or unknown/timeout is expected from the solver.
Many test cases are expected to run into timeouts, so it is crucial to keep the timeout as small as possible for the tests to run through quickly. I changed the setting to 50 sec per individual test case (down from 5 min!). This turned out to be enough for all tests expecting unsat or an explicit unknown to return reliably. On my machine, 50 sec was always more than four times the time actually needed, but the margin has to be this large because tests run slower on GitHub CI. The tests are still much faster than before, when they blocked the pipeline for a long time without producing real insights.
Apart from the functional changes, I also did some refactoring: the parameterized test case now lives in class SMTSolverTest, and the subclasses Z3Test, Z3LegacyTest, ... define the actual parameters by overriding the abstract method provideTestData. This makes the test cases much more concise and readable.
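The structure described above can be sketched in plain Java. This is a hypothetical, framework-free version: the PR describes a parameterized test, whereas this sketch just loops over the parameters. SMTSolverTestSketch, SmtTestCase, runSolver, FakeZ3Test, and the .key file names are illustrative assumptions, and ThreeValuedTruth is a local stand-in whose constant names are assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Local stand-in for ThreeValuedTruth; constant names are assumptions.
enum ThreeValuedTruth { VALID, FALSIFIABLE, UNKNOWN }

// One parameter set: a .key file and the result expected from the solver.
final class SmtTestCase {
    final String keyFile;
    final ThreeValuedTruth expected;

    SmtTestCase(String keyFile, ThreeValuedTruth expected) {
        this.keyFile = keyFile;
        this.expected = expected;
    }
}

// Shared test logic; subclasses only provide data and the solver hook.
abstract class SMTSolverTestSketch {
    // Each subclass (one per solver/translation pair) supplies its parameters.
    protected abstract List<SmtTestCase> provideTestData();

    // Hypothetical hook: translate the file to SMT-LIB and run the solver.
    protected abstract ThreeValuedTruth runSolver(String keyFile);

    // Runs all cases and collects a message for every mismatch.
    final List<String> runAll() {
        List<String> failures = new ArrayList<>();
        for (SmtTestCase tc : provideTestData()) {
            ThreeValuedTruth actual = runSolver(tc.keyFile);
            if (actual != tc.expected) {
                failures.add(tc.keyFile + ": expected " + tc.expected + ", got " + actual);
            }
        }
        return failures;
    }
}

// Hypothetical subclass with canned answers, standing in for e.g. Z3Test.
final class FakeZ3Test extends SMTSolverTestSketch {
    private final Map<String, ThreeValuedTruth> answers;

    FakeZ3Test(Map<String, ThreeValuedTruth> answers) {
        this.answers = answers;
    }

    @Override
    protected List<SmtTestCase> provideTestData() {
        return List.of(
                new SmtTestCase("exists.key", ThreeValuedTruth.VALID),
                new SmtTestCase("hard.key", ThreeValuedTruth.UNKNOWN));
    }

    @Override
    protected ThreeValuedTruth runSolver(String keyFile) {
        // Files without a canned answer behave like a timeout.
        return answers.getOrDefault(keyFile, ThreeValuedTruth.UNKNOWN);
    }
}
```

Because the expected result is stored per case, a solver that answers unknown where unsat is expected now produces a failure message instead of a silent pass.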

Caveats

Apparently, there is quite a difference in provability between the modular and the legacy translation, i.e., certain sequents can easily be proven with the legacy translation but not with the new one. I also noticed that with the legacy translation it is possible to get sat (i.e., a counterexample), which is not possible with the modular translation. But I think that is expected ...
With our current solver infrastructure, it is not possible to distinguish between a solver timeout and an explicit unknown returned by the solver. In the future, we might want to separate the two so that the expected result can be defined more precisely ...
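A minimal Java sketch of why the two cases are indistinguishable once results are reduced to three values. SolverOutcome is a hypothetical four-valued type that the infrastructure would need to report; ThreeValuedTruth is a local stand-in whose constant names are assumptions.

```java
// Hypothetical, more detailed outcome of a solver run; not an existing KeY type.
enum SolverOutcome { UNSAT, SAT, EXPLICIT_UNKNOWN, TIMEOUT }

// Local stand-in for the three-valued result used by the tests.
enum ThreeValuedTruth { VALID, FALSIFIABLE, UNKNOWN }

final class OutcomeMapping {
    private OutcomeMapping() {}

    // Collapsing to three values merges EXPLICIT_UNKNOWN and TIMEOUT,
    // so a test that only sees ThreeValuedTruth cannot tell them apart.
    static ThreeValuedTruth collapse(SolverOutcome outcome) {
        switch (outcome) {
            case UNSAT:
                return ThreeValuedTruth.VALID;
            case SAT:
                return ThreeValuedTruth.FALSIFIABLE;
            default:
                return ThreeValuedTruth.UNKNOWN;
        }
    }
}
```

Keeping the four-valued outcome in the test data, instead of collapsing it, would let a test expect a timeout specifically.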

Open TODOs

~~More test cases for other solvers (?)~~
~~Discussion: Disable the test cases that are expected to run into a timeout?~~ To avoid long waits, the timeout is now set to a significantly lower value when a timeout is expected (2 sec instead of 50 sec).
~~Discussion: What should happen if a solver is not installed? Skip or fail tests?~~ see Warn on missing SMT solvers if flag is set #3600
~~Maybe: Do not use hardcoded settings in SMTTestSettings, but respect the settings set in the individual file?~~
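The resulting timeout rule (50 sec when sat/unsat is expected, 2 sec when a timeout/unknown is expected) can be sketched as a small Java helper. TestTimeouts and timeoutFor are hypothetical names, and ThreeValuedTruth is a local stand-in with assumed constant names; the real values live in the test settings.

```java
import java.time.Duration;

// Local stand-in for the three-valued expected result; constant names are assumptions.
enum ThreeValuedTruth { VALID, FALSIFIABLE, UNKNOWN }

final class TestTimeouts {
    private TestTimeouts() {}

    // Generous budget for cases that must return sat or unsat (slow CI machines).
    static final Duration DEFINITE_RESULT = Duration.ofSeconds(50);

    // Short budget for cases that are expected to time out or return unknown anyway.
    static final Duration EXPECTED_UNKNOWN = Duration.ofSeconds(2);

    static Duration timeoutFor(ThreeValuedTruth expected) {
        return expected == ThreeValuedTruth.UNKNOWN ? EXPECTED_UNKNOWN : DEFINITE_RESULT;
    }
}
```

Since most of the time of an expected-unknown test is spent waiting, the short budget bounds the cost of these cases.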

Type of pull request

Bug fix (non-breaking change which fixes an issue)
Refactoring (behaviour should not change or only minimally change)
New feature (non-breaking change which adds functionality)
There are changes to the (Java) code

Ensuring quality

I made sure that introduced/changed code is well documented (javadoc and inline comments).
I added new test case(s) for new functionality.
I have checked that runtime performance has not deteriorated.

The contributions within this pull request are licensed under GPLv2 (only) for inclusion in KeY.

Drodt · 2025-04-11T12:00:16Z

If the goals for the tests are these three points (and I think they are good!)

We want to detect regressions in our translation(s) of sequents to SMT-LIB.

We want to have certain smoke tests with simple examples to ensure that the SMT solvers are usable.

We do not want to test the SMT solvers per se (regressions in the solvers internally).

Would it then not make more sense to run only a few of the test cases through the SMT solvers to test the basics of the bridge, and, for the majority of cases, only compare the output of the SMT translation against snapshots?
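The snapshot idea could look roughly like the following golden-file comparison. This Java sketch is hypothetical: SmtSnapshot, the record-on-first-run behavior, and the whitespace normalization are illustrative choices, and producing the SMT-LIB text from a .key file is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SmtSnapshot {
    private SmtSnapshot() {}

    /**
     * Compares freshly generated SMT-LIB text with a stored snapshot file.
     * If no snapshot exists yet, the current output is recorded and accepted.
     */
    static boolean matches(String generatedSmtLib, Path snapshotFile) {
        try {
            if (!Files.exists(snapshotFile)) {
                Files.writeString(snapshotFile, generatedSmtLib);
                return true;
            }
            String expected = Files.readString(snapshotFile);
            return normalize(expected).equals(normalize(generatedSmtLib));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Ignore line-ending differences and leading/trailing whitespace.
    private static String normalize(String s) {
        return s.replace("\r\n", "\n").strip();
    }
}
```

Such comparisons need no installed solver and run in milliseconds, so only a handful of smoke tests would still have to call Z3 or CVC5.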

changed expected results to ThreeValuedTruth, separate tests for legacy and new modular translation

8e8737d

WolframPfeifer added 🐞 Bug Feature New feature or request SMT Test cases labels Apr 11, 2025

WolframPfeifer self-assigned this Apr 11, 2025

WolframPfeifer added 2 commits April 11, 2025 13:52

minor comment changes

33d93c4

spotless

76d496e

WolframPfeifer added 3 commits April 11, 2025 16:50

cleanup

20fee79

20sec timeout was not enough, increased to 30sec

9fd9504

removed all test cases for CVC4, make CVC4 experimental, minor refactoring

80bd8fd

FliegendeWurst mentioned this pull request Apr 11, 2025

Enable SMT focus goals (unsat cores) for CVC5 #3594

Merged

wadoon reviewed Apr 14, 2025

key.core/src/main/resources/de/uka/ilkd/key/smt/solvertypes/CVC4_legacy.props

WolframPfeifer added 4 commits April 25, 2025 13:11

Merge branch 'main' into pfeifer/smtTestRework

55d7061

fix typo

4300de5

better error message for failed tests, default timeout to 50s for sat/unsat and 2s for timeout/unknown cases

745348f

disabled two test cases that show strange behavior between Z3 versions and CI

8e7666d

wadoon approved these changes May 9, 2025

Merge branch 'main' into pfeifer/smtTestRework

84f1896

WolframPfeifer enabled auto-merge May 9, 2025 12:41

mattulbrich mentioned this pull request May 9, 2025

Warn on missing SMT solvers if flag is set #3600

Merged

WolframPfeifer added this pull request to the merge queue May 9, 2025

Merged via the queue into main with commit 2fb07b6 May 9, 2025
6 of 7 checks passed

WolframPfeifer deleted the pfeifer/smtTestRework branch May 9, 2025 14:57

4 participants

SMT test cases rework #3592

SMT test cases rework #3592

Uh oh!

Conversation

WolframPfeifer commented Apr 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Intended Change

Caveats

Open TODOs

Type of pull request

Ensuring quality

Uh oh!

Drodt commented Apr 11, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

WolframPfeifer commented Apr 11, 2025 •

edited

Loading