strategy comparison

EXP04: Comparison between Strategies

TODO: Rerun because of interference with vdiff-viewer, and because of a cbmc false positive bug.

Changelog:: Use data of a re-run

Executive Summary: All “weighted-positions”-strategies perform similarly. smart performs worse. The runs with random-walk should be repeated because it is (unfairly) not using the constant pool.

What troubles me is the following: The only unsoundness that we found (without looking into --disagreement because for those we have no way to establish the ground truth) are variations of the reducercommutativity/avg20_false-def-behavior.i by cpachecker. Unfortunately, this is not a really surprising finding as cpachecker already reached a “false unsat” in SV-Comp18 on the original file. In fact, there are multiple cases like that for cpachecker that can be found in SV-Comp’s results table.

Setup

> du -sh *
3.4M    bfs.db
3.4M    dfs.db
2.7M    random-uniform.db
2.7M    random-walk.db
2.2M    smart.db

Or in absolute numbers (vdiff-viewer --stat):

data	runs	programs	used source files
bfs.db	12446	1523	100
dfs.db	12446	1519	100
random-uniform.db	12446	1524	100
random-walk.db	12062	1504	99
smart.db	9724	1161	91

the invocation happened through random100_fast_3p1.sh, and random100_fat_3p2.sh inside two docker containers that were initialized with vdiff-docker.hs --cpuset-cpus=0,2,4,6 --memory=18GB and vdiff-docker.hs --cpuset-cpus=1,3,5,7 --memory=18GB. The parameters for each vdiff run were: budget=20, timeout=15, seed=123.

Results

data	–unsound	–incomplete	–disagreement	–unsound-klee-cbmc-smack
bfs.db	252	26	302	8
dfs.db	319	22	392	2
random-uniform.db	292	28	280	10
random-walk.db	349	2	375	0
smart.db	159	4	169	0

data	–unsound(files)	–incomplete(files)	–disagreement(files)	–unsound-klee-cbmc-smack(files)
bfs.db	56	5	60	1
dfs.db	59	6	63	1
random-uniform.db	51	5	58	1
random-walk.db	50	2	55	0
smart.db	43	2	45	0

Details

Unsoundness by Consent

All cases of unsoundness were caused by KLEE or Smack. In general, KLEE causes a lot more unsoundness because of its ‘special needs’ regarding instrumentation.

TODO: Check Smart

Incompleteness by Consent

BFS: By verifier:

seacrab/seahorn

 ./c/array-tiling/poly1_true-unreach-call.i
 ./c/bitvector/gcd_4_true-unreach-call_true-no-overflow_true-termination.i
 ./c/bitvector/num_conversion_2_true-unreach-call_true-no-overflow.i
 ./c/loop-new/gauss_sum_true-unreach-call_true-termination.i

Klee, Cpachecker, Uautomizer/taipan

none

Unsound by Klee or Cbmc (–unsound-klee-cbmc)

In the following I am excluding smack as it is not even supposed to be sound. Apparently seahorn and seacrab are unsound on c/reducercommutativity/avg20_false-def-behavior.i. E.g. fb5466303335a75fb2b20bcdbbcdb8da75aaf304 (cbmc sat, klee/sea* unsat). In short, the assertion is that the average of an array is unequal to 1, which surely is possible. The rest of the unsoundess cases are actually caused by the incompleteness bug of cbmc on the callfpointer benchmark.

Appendix

Some cases that I liked: bfs: 06fc45843ff949427305b3893d84257e56b571f3 (derive the fact that n * (n+1) is always an even number)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

strategy comparison

EXP04: Comparison between Strategies

Setup

Results

Details

Unsoundness by Consent

Incompleteness by Consent

seacrab/seahorn

Klee, Cpachecker, Uautomizer/taipan

Unsound by Klee or Cbmc (–unsound-klee-cbmc)

Appendix

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally