This is an archive of the discontinued LLVM Phabricator instance.

[llvm-opt-fuzzer] Avoid adding incorrect inputs to the fuzzer corpus
ClosedPublic

Authored by igor-laevsky on Jan 23 2018, 3:48 AM.

Details

Summary

In general during continuous fuzzing we want to avoid adding invalid inputs into the corpus because they will test the wrong thing (error handling in the bitcode reader). Instead it would be better to concentrate on the actual fuzzing target by only producing valid inputs.

In the perfect world llvm mutator will always produce correct llvm ir and everything would work flawlessly. However there are number of cases when mutator fails to guarantee that. We catch them by running module verification after mutation. In theory this should be sufficient to prevent exposing incorrect inputs to the libFuzzer. However I noticed that still occasionally incorrect inputs would flow into the fuzzer corpus.

Problem lurks with some rare invariants which are only checked by the llvm reader. This means that verification after mutation will not catch them. Ideal solution would be to first fix all those issues in the verifier and then fix the mutator to not produce such mutations. However it's unclear how much of those are there and debugging each of them might prove to be complicated.

So until those are fixed I think it's reasonable to add explicit save/reload step as part of the after-mutation verification. This should produce cleaner continuous runs with clear indications of the mutator problems. On my machine this decreases exec/s by 10-30% but it seems like reasonable cost to pay for the correct runs.

Diff Detail

Repository
rL LLVM

Event Timeline

igor-laevsky created this revision.Jan 23 2018, 3:48 AM

Hi. Any comments on this?

I have mixed feelings about this, but I guess it's probably better than the status quo. My two main concerns are the time cost and whether we'll stop noticing the issues instead of fixing them.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
68–73 ↗(On Diff #131028)

Can we drop this part and only verify after the reload?

89–95 ↗(On Diff #131028)

Worth breaking out the parseAndVerify bit into its own function? We do it a lot now.

92–93 ↗(On Diff #131028)

Where does this output go when running the fuzzer? Will we see / be able to act on this information?

igor-laevsky marked an inline comment as done.

Thanks for the comments! Added parseAndVerify function.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
68–73 ↗(On Diff #131028)

I'm not sure how bitcode writer will behave on the invalid module. I would suspect that this verification is fast compared to the save/reload part.

92–93 ↗(On Diff #131028)

This goes to the stderr and I assume this will be visible in the ClusterFuzz logs.

bogner accepted this revision.Feb 2 2018, 10:00 AM

I think this is okay to go in, with the caveat that we need a plan for fixing up the verifier and removing this.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
83–85 ↗(On Diff #132605)

Interestingly, it would be trivial to write a fuzzer that finds these verifier bugs at this point. Maybe we should do that as a way to move towards a state where we can remove the redundant checks?

68–73 ↗(On Diff #131028)

Are you set up to measure this? It'd be nice to qualify that claim.

This revision is now accepted and ready to land.Feb 2 2018, 10:00 AM
This revision was automatically updated to reflect the committed changes.

With regards to the performance concerns - I will keep an eye on the fuzzer statistics and revert the change if the slowdown becomes too much,

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
83–85 ↗(On Diff #132605)

Yes, that's an interesting idea. Actually we can even write more general fuzzer for the FuzzMutate itself which will cover this kinds of issues and all other possible bugs in the FuzzMutate.

68–73 ↗(On Diff #131028)

I based it mainly on the fact that verification is faster than save/reload + verification. In any case I don't think there is a way to avoid the first verification since bitcode writer may be unfriendly to the invalid modules.