Skip to content

Conversation

@WolframPfeifer
Copy link
Member

@WolframPfeifer WolframPfeifer commented Apr 11, 2025

Motivation

Currently, we have a category of SMT test cases (TestCvc4, TestZ3) which work by loading a .key file with a single formula, for example \forall s x; p(x) -> \exists s y; p(y), directly converting it to SMT-LIB, and running a solver.

These test cases have some problems:

  • They can last quite long, since the solver is run into a timeout (expected behavior since the corresponding sequent is not provable). However, since the timeout is set to 5 min per individual test case (hardcoded in SMTTestSettings), the test always takes that time.
  • The test cases only use the legacy (aka non-modular) translation.
  • The test cases only distinguish two cases: Valid (unsat returned by solver), or not (sat returned by solver). However, as it is implemented, the test case also always succeeds if the solver returns unknown or runs into a timeout!

The question is: What do we intend with the SMT test cases? My take on this would be:

  • We want to detect regressions in our translation(s) of sequents to SMT-LIB.
  • We want to have certain smoke tests with simple examples to ensure that the SMT solvers are usable.
  • We do not want to test the SMT solvers per se (regressions in the solvers internally).

Intended Change

  • Add separate test cases for modular and legacy translation.
  • Specify the expected result by the solver more precisely in the test case: Now, ThreeValuedTruth is used, so the test cases actually specify whether unsat, sat, or unknown/timeout is expected by the solver.
  • Many test cases are expected to have timeouts, so it is crucial to keep the timeout as small as possible such that the tests run through quickly. I changed the settings to 50 sec per individual test case (from 5 min!). That turned out to be enough such that all tests with unsat or explicit unknown returned easily (on my machine this was always more than four times the time needed, but it needs to be that high since tests run slower on GitHub CI), but still much faster than the current situation, where the test cases block the pipeline for a long time without producing real insights.
  • Apart from the functional changes, I also did some refactoring: The parameterized test case is in class SMTSolverTest now, the subclasses Z3Test, Z3LegacTest, ... define the actual parameters by overwriting the abstract method provideTestData. This way the test cases are much more concise and readable.

Caveats

  • Apparently, there is quite a difference in provability between the modular and the legacy translation, i.e. certain sequents can easily be proven in the legacy case, but not with the new translation. Also, I noticed that with the legacy translation, it is possible to get sat (i.e., there is a counterexample), which is not possible with the modular translation. But I think that is expected ...
  • From our solver infrastructure, it is currently not possible to distinguish between solver timeout and explicit unknown returned by the solver. In the future, we might want to separate that to be able to define the expected result more precisely ...

Open TODOs

  • More test cases for other solvers (?)
  • Discussion: Disable the test cases that are expected to run into a timeout? to avoid long waiting, if a timeout is expected the timeout is set to a significantly lower value (2 sec instead of 50 sec at the moment)
  • Discussion: What should happen if a solver is not installed? Skip or fail tests? see Warn on missing SMT solvers if flag is set #3600
  • Maybe: Do not use hardcoded settings in SMTTestSettings, but respect the settings set in the individual file?

Type of pull request

  • Bug fix (non-breaking change which fixes an issue)
  • Refactoring (behaviour should not change or only minimally change)
  • New feature (non-breaking change which adds functionality)
  • There are changes to the (Java) code

Ensuring quality

  • I made sure that introduced/changed code is well documented (javadoc and inline comments).
  • I added new test case(s) for new functionality.
  • I have checked that runtime performance has not deteriorated.

The contributions within this pull request are licensed under GPLv2 (only) for inclusion in KeY.

@Drodt
Copy link
Member

Drodt commented Apr 11, 2025

If the goal for test are these three points (and I think they are good!)

  • We want to detect regressions in our translation(s) of sequents to SMT-LIB.
  • We want to have certain smoke tests with simple examples to ensure that the SMT solvers are usable.
  • We do not want to test the SMT solvers per se (regressions in the solvers internally).

Would it then not make more sense to only run a few of the test cases through the SMT solvers to test the basics of the bridge, while for the majority of cases, we only compare the output of the SMT translation to snapshots?

@WolframPfeifer WolframPfeifer enabled auto-merge May 9, 2025 12:41
@WolframPfeifer WolframPfeifer added this pull request to the merge queue May 9, 2025
Merged via the queue into main with commit 2fb07b6 May 9, 2025
6 of 7 checks passed
@WolframPfeifer WolframPfeifer deleted the pfeifer/smtTestRework branch May 9, 2025 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants