This is an archive of the discontinued LLVM Phabricator instance.

Add a Fuzzer library
ClosedPublic

Authored by kcc on Jan 26 2015, 11:59 AM.

Details

Summary

A simple genetic in-process coverage-guided fuzz testing library.

I've used this fuzzer to test clang-format
(it found 12+ bugs, thanks djasper@ for the fixes!)
and it may also help us test other parts of LLVM.
So why not keep it in the LLVM repository?

I plan to add the cmake build rules later (in a separate patch, if that's ok)
and also add a clang-format-fuzzer target.

See README.txt for details.

Diff Detail

Event Timeline

kcc updated this revision to Diff 18780.Jan 26 2015, 11:59 AM
kcc retitled this revision from to Add a Fuzzer library.
kcc updated this object.
kcc edited the test plan for this revision. (Show Details)
kcc added reviewers: rnk, djasper, chandlerc.
kcc added a subscriber: Unknown Object (MLST).

It'd probably be good to have the build infrastructure go in with the source - having non-building/non-testing source in tree isn't terribly useful.

lib/Fuzzer/FuzzerIO.cpp
17

Dir could be StringRef here, maybe?

35

Path should be passed by const ref here?

lib/Fuzzer/FuzzerInternal.h
39

I suspect MSVC 2012 (2013? whatever our min support is) doesn't support non-static data member initializers.

53

StringPiece

63

StringPiece? Maybe?

lib/Fuzzer/FuzzerLoop.cpp
114

idiomatic != rather than <? And prefix increment rather than post increment?

(& maybe use a name like 'i' which I think is idiomatic LLVM for index variables - since this is an index, not an iterator)

(similar comments on lots of other loops)

lib/Fuzzer/FuzzerMain.cpp
26

StringRef (for both of these)?

61

Compute the bound once rather than on every iteration?

lib/Fuzzer/FuzzerMutate.cpp
35

pass U by reference?

39

Is this clang-format/LLVM's formatting for case? I would've expected the 'case' to be at the same level as the 'switch'.

lib/Fuzzer/FuzzerUtil.cpp
43

Could be marginally more efficient to use an ostringstream here to avoid two allocations? (Well, I guess they'll be short strings, etc... - so maybe not a problem)

rnk edited edge metadata.Jan 26 2015, 4:27 PM

Looks good. Specifically, it doesn't seem worth trying to get access to any of the LLVM helpers I would suggest due to the desire to remain isolated.

lib/Fuzzer/FuzzerFlags.h
1 ↗(On Diff #18780)

LLVM typically uses a .def suffix for such repeated include files. Also, you can change the header comment to match.

ygribov added inline comments.
lib/Fuzzer/FuzzerFlags.h
16 ↗(On Diff #18780)

I wonder if people will mostly be interested to run fuzzer until coverage keeps increasing.

lib/Fuzzer/FuzzerIO.cpp
20

Perhaps allow files as well?

lib/Fuzzer/FuzzerLoop.cpp
119

Just curious, have you considered making this a heap to explore the most covering units first?

lib/Fuzzer/FuzzerMain.cpp
71

You know the len so perhaps just do memcmp for both checks?

86

Extra ;

119

Perhaps add an option for this?

lib/Fuzzer/FuzzerMutate.cpp
35

Ditto for other places (CrossOver, MutateAndTestOne, etc.)?

lib/Fuzzer/FuzzerUtil.cpp
57

Don't you need SA_SIGINFO and perhaps SA_ONSTACK here?

ygribov added inline comments.Jan 26 2015, 11:10 PM
lib/Fuzzer/FuzzerMutate.cpp
37

These asserts are not very user-friendly, perhaps check for them in main() instead? Actually I wonder if MaxLen should be MaxSizeIncreasePercentage...

kcc added a comment.Jan 27 2015, 9:43 AM

It'd probably be good to have the build infrastructure go in with the source - having non-building/non-testing source in tree isn't terribly useful.

I'll send the patch for cmake support later today (I already have it working, but would like to disentangle the two commits)

lib/Fuzzer/FuzzerFlags.h
1 ↗(On Diff #18780)

done.

16 ↗(On Diff #18780)

How do you know that coverage keeps increasing?
And generally, no. Even if the coverage is steady fuzzing is still interesting (it finds bugs :))

lib/Fuzzer/FuzzerIO.cpp
17

This lib should not have any external dependencies.

20

Maybe in next steps, or maybe not.
Allowing files trades (small) flexibility to (small) complexity.

35

done

lib/Fuzzer/FuzzerInternal.h
39

Grrrr. I'd prefer not to bother about old MSVC at this point...
The whole thing (even sanitizer coverage) does not work on windows yet.
And the fuzzer is a strictly tool for developers, they can use the newest MSVC.

lib/Fuzzer/FuzzerLoop.cpp
114

changed back to 'i'. the rest is bikesheding (?)

119

Sure, I wanted to add a suite minimizer here. (later)

lib/Fuzzer/FuzzerMain.cpp
61

done

71

will it make the code more readable?

86

done

119

Err. Maybe. need to see how this will be used.
If we can get away w/o extra options -- we should.

lib/Fuzzer/FuzzerMutate.cpp
37

Well, asserts are not for users, they are for developers.

MaxSizeIncreasePercentage is something to consider, yes.

39

reformatted.

lib/Fuzzer/FuzzerUtil.cpp
43

This is a very cold part of code.
Much better to keep things simple.
Also, I want to depend on STL less, not more.
I actually may need to get rid of all of STL later.

Reason: this fuzzer is in-process, so it will be testing other code that also uses STL in the same process,
i.e. it will receive coverage feedback from STL functions. We don't want to mix coverage data from the library under test and the fuzzer itself.

57

What for? For now all I need it to die on timeout.
I may need these later, sure.

kcc updated this revision to Diff 18821.Jan 27 2015, 9:43 AM
kcc edited edge metadata.

addressed most of the comments

kcc updated this revision to Diff 18824.Jan 27 2015, 10:01 AM

add cmake rules to build the Fuzzer lib.

majnemer added inline comments.
lib/Fuzzer/FuzzerIO.cpp
19

Could you instead use fs::directory_iterator so that this works on Windows too?

42

Perhaps use sys::path::get_separator() instead of hardcoding '/'

kcc added inline comments.Jan 27 2015, 10:09 AM
lib/Fuzzer/FuzzerIO.cpp
19

No. This lib has to be independent from the rest of LLVM.
Same below

Well, asserts are not for users, they are for developers.

Still, you don't verify inputs in main() so user will face an assert if she passes invalid arguments.

How do you know that coverage keeps increasing?

Does not __sanitizer_get_total_unique_coverage tell you this?

And generally, no. Even if the coverage is steady fuzzing is still interesting (it finds bugs :))

Hm, interesting.

kcc added a comment.Jan 27 2015, 10:21 AM

Well, asserts are not for users, they are for developers.

Still, you don't verify inputs in main() so user will face an assert if she passes invalid arguments.

The inputs are truncated in Fuzzer::ShuffleAndMinimize

If Mutate gets inputs of unexpected length this is a bug => assertion should fail.

How do you know that coverage keeps increasing?

Does not __sanitizer_get_total_unique_coverage tell you this?

Of course.
But how do you know that you are not going to discover new coverage after 100000000 iterations?
The only way you may be certain of that is if you've already covered *all* basic blocks. In practice this never happens.

And generally, no. Even if the coverage is steady fuzzing is still interesting (it finds bugs :))

Hm, interesting.

edge and bb coverage is a rather weak signal.
Even if you have a full edge coverage you still may find lots of funny stuff with simple mutations.

rnk accepted this revision.Jan 27 2015, 2:01 PM
rnk edited edge metadata.

lgtm

lib/Fuzzer/CMakeLists.txt
1

Maybe this should be LLVMFuzzer to namespace it better? You're bypassing the add_llvm_library wrapper which will add this prefix.

This revision is now accepted and ready to land.Jan 27 2015, 2:01 PM
kcc closed this revision.Jan 27 2015, 2:10 PM