This is an archive of the discontinued LLVM Phabricator instance.

Introduce llvm-opt-fuzzer for fuzzing optimization passes
ClosedPublic

Authored by igor-laevsky on Nov 2 2017, 8:18 AM.

Details

Summary

Hi,

Seeing recent advances in fuzzing technologies for llvm (libFuzzer, FuzzMutate, OSSFuzz)
it's become reasonably simple to extend this approach for the general optimization
passes.

In this review I would like to propose a generic fuzzing target intended for the
optimization passes and various combinations of them. This is very initial implementation
which I tried to keep simple. Most of it's code is inherited from the llvm-isel-fuzzer.

This tool is intended to be run by the OSSFuzz, so interface is rather primitive.
User is only required to specify target triple and optimization pipeline using
the new pass manager syntax.

In general our primary goal here is to continuously run OSSFuzz testing for some of the llvm passes
which are not widely used (i.e passes which are not part of the default clang pipeline, IRCE,
Loop Predication, RS4GC and so on).

However I expect it would be simpler to start with the more popular passes just
to have a chance to stabilize infrastructure and figure out good workflow for the discovered bugs.
So after this tool is integrated to the tree (if no one will have objections), next step would be
to start OSSFuzz project for the InstCombine, as being single most used pass.

Diff Detail

Event Timeline

igor-laevsky created this revision.Nov 2 2017, 8:18 AM
kcc edited edge metadata.Nov 2 2017, 11:57 AM

Answering the questions from e-mail (not on the code review tool): OSS-Fuzz does not support run-time flags and we don't plan to -- it will complicate so many things on our side and it will also encourage heavy-weight fuzzers (like this one).
So, yes, if we want this on oss-fuzz, it should require no flags (same hack as llmv-isel-fuzzer--foo-bar is ok).

Thanks for the comments. I updated the patch to support embedded "instcombine" option.

bogner edited edge metadata.Nov 7 2017, 3:50 PM

Repeating my comments from email, as it looks like they got missed.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
135–140

These don't change from input to input do they? We should probably set this up in the initialize function.

142–161

Similarly here - we probably don't need to reconfigure the pass pipeline every time.

bogner added a comment.Nov 7 2017, 3:53 PM

Please add a note about this fuzzer in docs/FuzzingLLVM.rst

lib/FuzzMutate/FuzzerCLI.cpp
48–49

We should add a new/separate handleExecNameEncodedFEOpts or handleExecNameEncodedPassList instead of adding this to BE opts. This fuzzer shouldn't need the backend options AFAIK.

Justin, I must have missed your comments in the email. I added documentation and replied inline.

igor-laevsky added inline comments.Nov 8 2017, 7:19 AM
lib/FuzzMutate/FuzzerCLI.cpp
48–49

Yes, I agree. I think best option is to write parametrised version of the handleExecNameEncodedBEOpts which would receive the desired mapping between "exec" options and command line options. However I would prefer to do this in a separate change to not clobber this one with the refactoring.
Do you think we can leave this code as-is for the initial submission?

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
135–140

I don't think we can do this. This is freshly created module, so we need to set correct target parameters during each invocation.

142–161

That was what I did initially. However libFuzzer complained about "number of mallocs didn't match number of free's" and disabled "LeakDetection after each iteration".

I suspect that somewhere in the pass manager something accumulates some state on each run, maybe to cache analysis results or something in this way. I don't really have good knowledge of this area, but having pass manager initialization on each iteration seems like a bulletproof solution from this kind of problems.

bogner added inline comments.Nov 8 2017, 10:43 AM
lib/FuzzMutate/FuzzerCLI.cpp
48–49

If that's what we end up doing this function has the wrong name, since IR passes really aren't BE (ie backend) options. I'd originally been thinking that there wouldn't be any real overlap between the option sets that are allowed here, but I guess the -march handling might be useful for some of the IR passes.

I'd really rather we just make a second function for "pass" options for now on the assumption that we won't really need the complexity of a fully general solution to embedding options in the filename.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
135–140

Good point, this bit is fine.

142–161

My main concern here is that we'd like to minimize the work we do here as much as possible so that TestOneInput is fast. This is fine for now and we can maybe try to improve it later.

Added separate function to parse exec encoded options. It's almost direct copy of the backend variant, but I hope to refactor it in a follow up changes.

bogner added inline comments.Nov 9 2017, 10:19 AM
lib/FuzzMutate/FuzzerCLI.cpp
80–84

Why not assume everything that isn't an arch is a pass name and pass them all as a comma separated list to -passes=?

igor-laevsky added inline comments.Nov 9 2017, 10:51 AM
lib/FuzzMutate/FuzzerCLI.cpp
80–84

Some passes contain dashes in their names, and this doesn't work well with the option parser. Also syntax of the new pass manager includes braces and I'm not sure how good they are going to work in the executable file name.

So instead I decided to treat each option as an a alias for the predefined pipeline configuration. So for example someday we may have "safepoints" option which will expand into pipeline where PlaceSafepoints is followed by the RS4GC pass.

bogner accepted this revision.Nov 9 2017, 12:13 PM

LGTM. We can iterate on this further in tree.

lib/FuzzMutate/FuzzerCLI.cpp
80–84

Fair enough, this seems reasonable for now.

This revision is now accepted and ready to land.Nov 9 2017, 12:13 PM
This revision was automatically updated to reflect the committed changes.
kcc added a comment.Nov 13 2017, 5:21 PM

I've added llvm-opt-fuzzer--x86_64-instcombine to oss-fuzz.
What would be other option combinations (expressed as binary names) worth adding?

Hi Kostya,

Thanks for doing that! My original thought was to start instcombine runs with the IR corpus. In my local experiments I found that using simple corpus gathered from the llvm-lit tests proves to be way more productive than non corpus runs. I found zero issues in about 10 hour runs without the corpus and with the corpus there were couple of failures during the first two minutes. However I also observed couple of crashes in the FuzzMutate itself, which I planned to fix first.

So the short answer to your question - next step is to use reasonable corpus for the instcombine, but first to fix all the immediate issues.

kcc added a comment.Nov 14 2017, 7:32 AM

First trophy:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=4210
Looks real. Did you see it?

Thanks for doing that! My original thought was to start instcombine runs with the IR corpus. In my local experiments I found that using simple corpus gathered from the llvm-lit tests proves to be way more productive than non corpus runs.

No surprise here!

I found zero issues in about 10 hour runs without the corpus and with the corpus there were couple of failures during the first two minutes.

Nice!

However I also observed couple of crashes in the FuzzMutate itself, which I planned to fix first.

So the short answer to your question - next step is to use reasonable corpus for the instcombine, but first to fix all the immediate issues.

Good strategy.

Ideally, a fuzz target would have an optional extra build rule.
that will create ${FUZZ_TARGET_NAME}_seed_corpus.zip so that
in https://github.com/google/oss-fuzz/blob/master/projects/llvm/build.sh
we just copy all such files to $OUT

In D39555#924678, @kcc wrote:

First trophy:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=4210
Looks real. Did you see it?

Wow, that's cool :) I haven't seen this failure, think it's a new one.

Ideally, a fuzz target would have an optional extra build rule.
that will create ${FUZZ_TARGET_NAME}_seed_corpus.zip so that
in https://github.com/google/oss-fuzz/blob/master/projects/llvm/build.sh
we just copy all such files to $OUT

Thanks, I was just wondering on how to better handle corpus creation.