Skip to content

strategy comparison

Christian Klinger edited this page Jun 12, 2018 · 14 revisions

EXP04: Comparison between Strategies

TODO: Rerun because of interference with vdiff-viewer, and because of a cbmc false positive bug.

Changelog:: Use data of a re-run

Executive Summary: All “weighted-positions”-strategies perform similarly. smart performs worse. The runs with random-walk should be repeated because it is (unfairly) not using the constant pool.

What troubles me is the following: The only unsoundness that we found (without looking into --disagreement because for those we have no way to establish the ground truth) are variations of the reducercommutativity/avg20_false-def-behavior.i by cpachecker. Unfortunately, this is not a really surprising finding as cpachecker already reached a “false unsat” in SV-Comp18 on the original file. In fact, there are multiple cases like that for cpachecker that can be found in SV-Comp’s results table.

Setup

> du -sh *
3.4M    bfs.db
3.4M    dfs.db
2.7M    random-uniform.db
2.7M    random-walk.db
2.2M    smart.db

Or in absolute numbers (vdiff-viewer --stat):

data runs programs used source files
bfs.db 12446 1523 100
dfs.db 12446 1519 100
random-uniform.db 12446 1524 100
random-walk.db 12062 1504 99
smart.db 9724 1161 91

the invocation happened through random100_fast_3p1.sh, and random100_fat_3p2.sh inside two docker containers that were initialized with vdiff-docker.hs --cpuset-cpus=0,2,4,6 --memory=18GB and vdiff-docker.hs --cpuset-cpus=1,3,5,7 --memory=18GB. The parameters for each vdiff run were: budget=20, timeout=15, seed=123.

Results

data –unsound –incomplete –disagreement –unsound-klee-cbmc-smack
bfs.db 252 26 302 8
dfs.db 319 22 392 2
random-uniform.db 292 28 280 10
random-walk.db 349 2 375 0
smart.db 159 4 169 0
data –unsound(files) –incomplete(files) –disagreement(files) –unsound-klee-cbmc-smack(files)
bfs.db 56 5 60 1
dfs.db 59 6 63 1
random-uniform.db 51 5 58 1
random-walk.db 50 2 55 0
smart.db 43 2 45 0

Details

Unsoundness by Consent

All cases of unsoundness were caused by KLEE or Smack. In general, KLEE causes a lot more unsoundness because of its ‘special needs’ regarding instrumentation.

TODO: Check Smart

Incompleteness by Consent

BFS: By verifier:

seacrab/seahorn

 ./c/array-tiling/poly1_true-unreach-call.i
 ./c/bitvector/gcd_4_true-unreach-call_true-no-overflow_true-termination.i
 ./c/bitvector/num_conversion_2_true-unreach-call_true-no-overflow.i
 ./c/loop-new/gauss_sum_true-unreach-call_true-termination.i

Klee, Cpachecker, Uautomizer/taipan

none

Unsound by Klee or Cbmc (–unsound-klee-cbmc)

In the following I am excluding smack as it is not even supposed to be sound. Apparently seahorn and seacrab are unsound on c/reducercommutativity/avg20_false-def-behavior.i. E.g. fb5466303335a75fb2b20bcdbbcdb8da75aaf304 (cbmc sat, klee/sea* unsat). In short, the assertion is that the average of an array is unequal to 1, which surely is possible. The rest of the unsoundess cases are actually caused by the incompleteness bug of cbmc on the callfpointer benchmark.

Appendix

Some cases that I liked: bfs: 06fc45843ff949427305b3893d84257e56b571f3 (derive the fact that n * (n+1) is always an even number)

Clone this wiki locally