This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Create attribute for fuzzing-specific optimizations.
Closed, Public

Authored by morehouse on Mar 7 2018, 3:24 PM.

Details

Summary

When building with libFuzzer, converting control flow to selects or obscuring the original operands of CMPs reduces the effectiveness of libFuzzer's heuristics.

This patch provides an attribute to disable or modify certain optimizations for optimal fuzzing signal.

Provides a less aggressive alternative to https://reviews.llvm.org/D44057.

Event Timeline

morehouse created this revision. Mar 7 2018, 3:24 PM
morehouse updated this revision to Diff 137502. Mar 7 2018, 3:27 PM
  • Update test to new attribute name
junbuml added a subscriber: junbuml. Mar 8 2018, 6:23 AM

Another question: Do we actually want to disable select formation, or, do we want to expand all selects into control flow late in the pipeline (i.e., during instruction selection)? The issue here, as I understand it, is that fuzzing depends on control flow paths to differentiate executions. As a result, we really just don't want to have any selects (we don't want ones that the frontend might generate either).
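To illustrate the point about control flow paths, here is a minimal sketch. The `Trace` type and the block numbers are hypothetical stand-ins for coverage instrumentation, not SanitizerCoverage's real interface, and the ternary only models a select (a compiler is free to lower it either way).

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for coverage instrumentation: the set of basic
// blocks (identified by made-up numbers) that one execution passed through.
using Trace = std::set<int>;

// Branchy form: each side of the condition is its own block.
int absBranchy(int x, Trace &trace) {
  trace.insert(0);  // entry block
  if (x < 0) {
    trace.insert(1);  // block reached only for negative inputs
    return -x;
  }
  trace.insert(2);  // block reached only for non-negative inputs
  return x;
}

// Select form: the choice is a data dependency inside a single block,
// so there is no second block for the instrumentation to record.
int absSelect(int x, Trace &trace) {
  trace.insert(0);  // the only block
  return x < 0 ? -x : x;
}

// Usage: a negative and a positive input leave different traces only in
// the branchy version.
void demoCoverageSignal() {
  Trace negBranchy, posBranchy, negSelect, posSelect;
  absBranchy(-3, negBranchy);
  absBranchy(3, posBranchy);
  absSelect(-3, negSelect);
  absSelect(3, posSelect);
  assert(negBranchy != posBranchy);  // the fuzzer can tell the inputs apart
  assert(negSelect == posSelect);    // the fuzzer sees identical coverage
}
```

Under these assumptions, the select form computes the same result but gives a coverage-guided fuzzer nothing to distinguish the two inputs.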

I don't think we could do it during instruction selection, since SanitizerCoverage instrumentation is inserted before that. But if we could expand selects right before the SanitizerCoverage instrumentation happens (maybe even during the SanitizerCoverage pass?), that would provide even better coverage signal for fuzzing. Of course, that would be significantly more work.

Another concern comes from https://github.com/google/sanitizers/issues/893#issuecomment-350036791, where simplifyCFG takes two conditions and combines them into a single CMP, resulting in libFuzzer's TraceCMP heuristic becoming useless. So we would probably still want to disable part of simplifyCFG to avoid that.

Shouldn't be too much work. Just turn the logic in CodeGenPrepare::optimizeSelectInst into a utility function, add an aggressive mode, and call it.

I took a quick look at the bug report, but I'm still not exactly sure what's going on. Can you explain? Is the problem that the coverage instrumentation looks at the arguments to a comparison, somehow, but doesn't look through boolean operations?

Well that's easier than I thought. Thanks for the insight.

The coverage instrumentation passes both arguments of every comparison to a __sanitizer_cov_trace[_const]_cmp callback. The callbacks are implemented in libFuzzer. libFuzzer uses a simple (but effective) heuristic that searches the program input for either argument to the comparison and then mutates matches to be close (-1, ==, or +1) to the other argument.
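The heuristic described above can be sketched roughly as follows. This is an illustration only, not libFuzzer's implementation: `mutateTowards`, `onTraceCmp`, and `readU32` are hypothetical names, and the sketch assumes 4-byte values stored in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical helper: wherever the 4 bytes of `observed` occur in `input`,
// emit three mutants with those bytes replaced by target - 1, target, and
// target + 1 (unsigned arithmetic, so the edges simply wrap around).
std::vector<Bytes> mutateTowards(const Bytes &input, uint32_t observed,
                                 uint32_t target) {
  std::vector<Bytes> mutants;
  const uint32_t candidates[] = {target - 1, target, target + 1};
  for (std::size_t pos = 0; pos + sizeof(observed) <= input.size(); ++pos) {
    if (std::memcmp(input.data() + pos, &observed, sizeof(observed)) != 0)
      continue;  // the observed value is not stored at this offset
    for (uint32_t value : candidates) {
      Bytes mutant = input;
      std::memcpy(mutant.data() + pos, &value, sizeof(value));
      mutants.push_back(mutant);
    }
  }
  return mutants;
}

// Hypothetical stand-in for what a comparison callback could do with its
// two arguments: search the input for either one and mutate it toward the
// other.
std::vector<Bytes> onTraceCmp(const Bytes &input, uint32_t arg1,
                              uint32_t arg2) {
  std::vector<Bytes> mutants = mutateTowards(input, arg1, arg2);
  std::vector<Bytes> reverse = mutateTowards(input, arg2, arg1);
  mutants.insert(mutants.end(), reverse.begin(), reverse.end());
  return mutants;
}

// Reads the 4 bytes at `pos` back as a uint32_t in host byte order.
uint32_t readU32(const Bytes &bytes, std::size_t pos) {
  assert(pos + sizeof(uint32_t) <= bytes.size());
  uint32_t value;
  std::memcpy(&value, bytes.data() + pos, sizeof(value));
  return value;
}
```

If the input contains the value 5 and the program compares it against 16, the sketch proposes inputs carrying 15, 16, and 17 at the same offset, which is enough to flip a comparison such as `x > 16`.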

In the bug report, if x > 16 && x < 32 had been translated into a comparison with 16 and a comparison with 32, and if x were found in the program input, libFuzzer would be able to quickly find x==17 or x==31 to take the true branch. But instead, x > 16 && x < 32 is translated to a single unsigned comparison between x - 17 and 15, thereby defeating our heuristic.
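The equivalence between the two forms can be checked with a small sketch. The function names are hypothetical, and a 32-bit `x` is assumed; the folded form mirrors the "x - 17 compared against 15" shape described above rather than the exact IR the optimizer emits.

```cpp
#include <cassert>
#include <cstdint>

// Source-level form: two comparisons, each exposing a constant (16 and 32)
// that a comparison-tracing callback could observe next to x itself.
bool inRangeTwoCompares(int32_t x) { return x > 16 && x < 32; }

// Folded form: one unsigned comparison between x - 17 and 15. Values below
// 17 wrap around to large unsigned numbers and fail the test, so the result
// is unchanged, but the callback now sees x - 17 and 15 instead of x, 16,
// and 32.
bool inRangeFolded(int32_t x) {
  return static_cast<uint32_t>(x) - 17u < 15u;
}

// Checks that both forms agree for every x in [lo, hi].
bool formsAgree(int32_t lo, int32_t hi) {
  for (int64_t x = lo; x <= hi; ++x) {
    int32_t v = static_cast<int32_t>(x);
    if (inRangeTwoCompares(v) != inRangeFolded(v))
      return false;
  }
  return true;
}

// Usage: the forms agree around the interesting boundaries.
void checkForms() {
  assert(formsAgree(-1000, 1000));
  assert(inRangeFolded(17) && inRangeFolded(31));
  assert(!inRangeFolded(16) && !inRangeFolded(32));
}
```

The folded comparison is cheaper, but neither of its operands appears verbatim in the program input, so a search-and-replace heuristic has nothing to match.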

LGTM

llvm/lib/AsmParser/LLParser.cpp
1127

no_cfg_select_formation -> nocfgselectformation ?

Ping. Any objections to moving forward with this patch?

vitalybuka accepted this revision. Mar 21 2018, 2:02 PM
This revision is now accepted and ready to land. Mar 21 2018, 2:02 PM

Yea, I think that we should name this something else. It's more than just disabling select formation (which, as we discussed, we'd probably just want to undo in CGP). The problem is optimizations that interfere with the libFuzzer input-value-matching heuristics -- that's really the thing that we need to disable earlier in the pipeline. Thoughts?

no_cfg_select_formation -> no_cfg_cmp_simplification?

opt_for_fuzzing?

morehouse updated this revision to Diff 139395. Mar 21 2018, 4:43 PM
  • Rename attribute to OptForFuzzing.
morehouse retitled this revision from [SimplifyCFG] Create attribute to disable select formation. to [SimplifyCFG] Create attribute to for fuzzing-specific optimizations. Mar 21 2018, 4:48 PM
morehouse edited the summary of this revision.
morehouse retitled this revision from [SimplifyCFG] Create attribute to for fuzzing-specific optimizations. to [SimplifyCFG] Create attribute for fuzzing-specific optimizations..
This revision was automatically updated to reflect the committed changes.

I'd also add this attribute to docs/BitCodeFormat.rst and docs/LangRef.rst

Thanks, added in https://reviews.llvm.org/rL328236.