Diff Detail
- Build Status
Buildable 4989 Build 4989: arc lint + arc unit
Event Timeline
I don't like an algorithm were we determine the blocks to instrument while traversing and instrumenting other blocks.
It's much harder to reasons about.
lib/Transforms/Instrumentation/SanitizerCoverage.cpp | ||
---|---|---|
422 | what if you add here something like this: if (PDT->dominates(SUCC, BB)) return false; } |
lib/Transforms/Instrumentation/SanitizerCoverage.cpp | ||
---|---|---|
422 | Done. Also rearranged comments. PTAL. |
I think we can make a much simpler change: what if we just skip the optimization of not instrumenting post-dominators?
We apply optimizations in order to instrument less blocks without loss of precision, thus we don't instrument blocks
the path through which is uniquely identifiable using other blocks.
The problems start once this reasoning gets circular, and to me it's not obvious how this patch would solve this problem
in general (it very well may, and maybe I'm not looking close enough, but at least it's not obvious).
However, if we just apply the optimization for full dominators (not instrumenting these nodes)
we'll never get circular arguments, as domination relation is never circular.
From a limited set of examples I've tried, the post-dominator optimization produced a very tiny benefit
after the dominator optimization was applied.
Moreover, I could only see the benefit for a corner case of nodes with no successors (and if not instrumenting them
does speed fuzzing up, it might be easier to just check the is-strong-postdominator condition on those).
It's clearly very simple.
The question is whether it's too pessimistic.
We apply optimizations in order to instrument less blocks without loss of precision, thus we don't instrument blocks
the path through which is uniquely identifiable using other blocks.
The problems start once this reasoning gets circular, and to me it's not obvious how this patch would solve this problem
in general (it very well may, and maybe I'm not looking close enough, but at least it's not obvious).However, if we just apply the optimization for full dominators (not instrumenting these nodes)
we'll never get circular arguments, as domination relation is never circular.
From a limited set of examples I've tried, the post-dominator optimization produced a very tiny benefit
after the dominator optimization was applied.
Can you get some exact numbers on real code?
E.g. take https://github.com/google/fuzzer-test-suite/blob/master/sqlite-2016-11-14/sqlite3.c
(a single-file large chunk of code)
The numbers I remember were like DOM gives 30% saving, PDOM gives 20% more, which is a lot.
Moreover, I could only see the benefit for a corner case of nodes with no successors (and if not instrumenting them
does speed fuzzing up, it might be easier to just check the is-strong-postdominator condition on those).
The numbers I remember were like DOM gives 30% saving, PDOM gives 20% more, which is a lot.
But how would we know whether those numbers are good?
E.g. is it saving 20% of unneeded instrumentation, or missing 20% of code which actually needs to be instrumented?
LibFuzzer would find lots of bugs regardless, right?
With programs that large it would be hard to simply look at IR, and check whether instrumentation is spurious.
On your example the difference is 16k vs 22k, but it's not clear whether those extra calls are spurious.
Good question...
I was thinking about using libFuzzer itself to decide.
E.g. take a large corpus for some target (sqlite would be fine) and minimize it w/ and w/o optimization.
E.g. is it saving 20% of unneeded instrumentation, or missing 20% of code which actually needs to be instrumented?
My expectation is most of those 20% are really redundant.
But I did not invest much time into this problem.
LibFuzzer would find lots of bugs regardless, right?
Maybe not. When Mike implemented this optimization I compared a couple of targets and did not see any difference.
With programs that large it would be hard to simply look at IR, and check whether instrumentation is spurious.
After an offline discussion...
Looks like me and Vitaly both are busy with other stuff and this thing seems to be blocking you and a few others (and maybe prevents us from finding more bugs).
So let's just delete the PDOM part as George suggests and then later come up with a better strategy (and a way to test it).
George, want to send a separate CL?
Somewhat unrelated, just want to mention it here: I hope to get to implementing yet another way of coverage instrumentation which will not involve callbacks, just a single increment.
It's easy to do just that, but I also want to preserve the association bettween the counter and the BB, which is a bit trickier (and requires http://llvm.org/docs/LangRef.html#addresses-of-basic-blocks to work reliably).
This task is orthogonal to the optimization discussed here.
@kcc: sure, great!
In the spirit of disclosure, I'm now pretty certain I'm actually hitting a separate bug where the critical edges are not being split (even though SanitizerCoverage pass calls the proper function. Probably another after instrumentation adds them?.. ).
(but my stuff actually does work if I disable PDT optimization; it's the case of two bugs interacting)
what if you add here something like this:
for (const BasicBlock *SUCC : make_range(succ_begin(BB), succ_end(BB))) {