-
Notifications
You must be signed in to change notification settings - Fork 6
strategy comparison
TODO: Rerun because of interference with vdiff-viewer, and because of a cbmc false positive bug.
Changelog:: Use data of a re-run
Executive Summary: All “weighted-positions”-strategies perform similarly.
smart performs worse. The runs with random-walk should be repeated because
it is (unfairly) not using the constant pool.
What troubles me is the following: The only unsoundness that we found (without
looking into --disagreement because for those we have no way to establish the
ground truth) are variations of the
reducercommutativity/avg20_false-def-behavior.i by cpachecker.
Unfortunately, this is not a really surprising finding as cpachecker already
reached a “false unsat” in SV-Comp18 on the original file. In fact, there are
multiple cases like that for cpachecker that can be found in SV-Comp’s
results table.
> du -sh *
3.4M bfs.db
3.4M dfs.db
2.7M random-uniform.db
2.7M random-walk.db
2.2M smart.dbOr in absolute numbers (vdiff-viewer --stat):
| data | runs | programs | used source files |
|---|---|---|---|
| bfs.db | 12446 | 1523 | 100 |
| dfs.db | 12446 | 1519 | 100 |
| random-uniform.db | 12446 | 1524 | 100 |
| random-walk.db | 12062 | 1504 | 99 |
| smart.db | 9724 | 1161 | 91 |
the invocation happened through random100_fast_3p1.sh, and
random100_fat_3p2.sh inside two docker containers that were initialized with
vdiff-docker.hs --cpuset-cpus=0,2,4,6 --memory=18GB and
vdiff-docker.hs --cpuset-cpus=1,3,5,7 --memory=18GB.
The parameters for each vdiff run were: budget=20, timeout=15, seed=123.
| data | –unsound | –incomplete | –disagreement | –unsound-klee-cbmc-smack |
|---|---|---|---|---|
| bfs.db | 252 | 26 | 302 | 8 |
| dfs.db | 319 | 22 | 392 | 2 |
| random-uniform.db | 292 | 28 | 280 | 10 |
| random-walk.db | 349 | 2 | 375 | 0 |
| smart.db | 159 | 4 | 169 | 0 |
| data | –unsound(files) | –incomplete(files) | –disagreement(files) | –unsound-klee-cbmc-smack(files) |
|---|---|---|---|---|
| bfs.db | 56 | 5 | 60 | 1 |
| dfs.db | 59 | 6 | 63 | 1 |
| random-uniform.db | 51 | 5 | 58 | 1 |
| random-walk.db | 50 | 2 | 55 | 0 |
| smart.db | 43 | 2 | 45 | 0 |
All cases of unsoundness were caused by KLEE or Smack. In general, KLEE causes a lot more unsoundness because of its ‘special needs’ regarding instrumentation.
TODO: Check Smart
BFS: By verifier:
./c/array-tiling/poly1_true-unreach-call.i
./c/bitvector/gcd_4_true-unreach-call_true-no-overflow_true-termination.i
./c/bitvector/num_conversion_2_true-unreach-call_true-no-overflow.i
./c/loop-new/gauss_sum_true-unreach-call_true-termination.inoneIn the following I am excluding smack as it is not even supposed to be sound.
Apparently seahorn and seacrab are unsound on c/reducercommutativity/avg20_false-def-behavior.i.
E.g. fb5466303335a75fb2b20bcdbbcdb8da75aaf304 (cbmc sat, klee/sea* unsat). In
short, the assertion is that the average of an array is unequal to 1, which
surely is possible. The rest of the unsoundess cases are actually caused by the
incompleteness bug of cbmc on the callfpointer benchmark.
Some cases that I liked:
bfs: 06fc45843ff949427305b3893d84257e56b571f3
(derive the fact that n * (n+1) is always an even number)