This is an archive of the discontinued LLVM Phabricator instance.

[libFuzzer] Add an option to keep initial seed inputs around.
ClosedPublic

Authored by dokyungs on Aug 25 2020, 2:32 PM.

Details

Summary

This patch adds an option "keep_seed" to keep all initial seed inputs. Previously, only the initial seed inputs that find new coverage were added to the corpus, and all the other initial inputs were discarded. We observed in some circumstances that useful initial seed inputs are discarded as they find no new coverage, even though they contain useful fragments in them (e.g., SQLITE3 FuzzBench benchmark). This newly added option provides a way to keeping seed inputs for those circumstances. With this patch, and with -keep_seed=1, all initial seed inputs are kept regardless of whether they find new coverage or not. Further, these seed inputs are not replaced with smaller inputs even if -reduce_inputs=1.

Diff Detail

Event Timeline

dokyungs created this revision.Aug 25 2020, 2:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2020, 2:32 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
dokyungs requested review of this revision.Aug 25 2020, 2:32 PM
dokyungs updated this revision to Diff 287779.Aug 25 2020, 2:41 PM

Add missing code.

hctim added a comment.Aug 25 2020, 6:07 PM

Is it possible to fork the cross_over_uniformdist changes into a supplementary patchset easily? Seems like an related but indepentent change (prefer one patchset per feature if possible please -- reviewing is O(n^3) in lines-of-code-per-patch and it's also nice to have different commits for different features).

compiler-rt/lib/fuzzer/FuzzerFlags.def
27

When used with |reduce_inputs==1|, the seed inputs will never be reduced.

compiler-rt/lib/fuzzer/FuzzerFork.cpp
316

nit: spaces not tabs (and throughout)

compiler-rt/lib/fuzzer/FuzzerLoop.cpp
490

Can't this be derived by II->SeedInput (as elsewhere)? It would be much cleaner to not have to maintain this variable in the class just for initialization. If it can't be derived by that, can you please make it a function paramater (you can even default it to false to avoid having to change all the other instances)

793

nit: number of corpus inputs = %d\n (just because it's absolutely 100% clear that it's number-of-units, not number-of-bytes)

compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp
1061–1063

std::unique_ptr<InputCorpus> C(new InputCorpus("", Entropic, /* KeepSeed */ false)); (and elsewhere)

dokyungs updated this revision to Diff 288016.Aug 26 2020, 9:46 AM

Addressed comments. Added keep-seed.test to support/test usefulness of the -keep_seed=1 flag.

dokyungs marked 4 inline comments as done.Aug 26 2020, 9:49 AM
dokyungs marked an inline comment as done.
dokyungs added inline comments.
compiler-rt/lib/fuzzer/FuzzerFork.cpp
316

I just checked that these are actually spaces.

dokyungs marked an inline comment as done.Aug 26 2020, 9:53 AM

Is it possible to fork the cross_over_uniformdist changes into a supplementary patchset easily? Seems like an related but indepentent change (prefer one patchset per feature if possible please -- reviewing is O(n^3) in lines-of-code-per-patch and it's also nice to have different commits for different features).

Yes, it is definitely possible. I will do in the next upload though, because I'd like to see if we can have some discussion here about how to make a good use the kept seed inputs.

hctim added inline comments.Aug 26 2020, 10:42 AM
compiler-rt/lib/fuzzer/FuzzerCorpus.h
284

nit: ChooseUnweightedInputToMutate or something like that. Make it clear that this is a uniform selection, not a bias-weighted selection.

compiler-rt/lib/fuzzer/FuzzerFlags.def
29

nit: cross_over_uniform_dist

compiler-rt/lib/fuzzer/FuzzerFork.cpp
316

Ah - I see, it's just the way that phabricator displays that the indentation level was increased. Interesting...

compiler-rt/lib/fuzzer/FuzzerLoop.cpp
774

nit: please also give these unnamed constants the /* ArgName */ treatment

compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp
602

While you're here can you give these the /* ArgName */ treatment as well, thanks

compiler-rt/test/fuzzer/keep-seed.test
8 ↗(On Diff #288016)

Hmm - maybe a better test would be to run with and without keep_seed and verify that keep_seed can find it in less iterations? Can you also add a quick comment with the amount of runs required on your machine to trigger this bug with and without this patch? Thanks.

dokyungs updated this revision to Diff 288973.Aug 31 2020, 9:55 AM

Addressed the comments except for the test comment (will follow up on the next upload), and removed uniform dist changes which will be uploaded as a separate patch.

dokyungs edited the summary of this revision. (Show Details)Aug 31 2020, 9:57 AM
dokyungs marked 4 inline comments as done.
dokyungs added inline comments.
compiler-rt/lib/fuzzer/FuzzerCorpus.h
284

Will reflect this comment in a separate patch.

compiler-rt/lib/fuzzer/FuzzerFlags.def
29

Reflect this comment in a separate patch.

dokyungs updated this revision to Diff 288982.Aug 31 2020, 10:23 AM
dokyungs marked 2 inline comments as done.

Addressed the comment on the test.

With -keep_seed=0, it takes 5,150,128 execs to find the crash.
With -keep_seed=1, it takes 1,049,074 execs to find the crash. (because the seed input is not reduced.)

dokyungs marked an inline comment as done.Aug 31 2020, 10:24 AM
dokyungs added inline comments.
compiler-rt/test/fuzzer/keep-seed.test
8 ↗(On Diff #288016)

Done.

With -keep_seed=0, it takes 5,150,128 execs to find the crash.
With -keep_seed=1, it takes 1,049,074 execs to find the crash. (because the seed input is not reduced.)

dokyungs updated this revision to Diff 288983.Aug 31 2020, 10:26 AM
dokyungs marked an inline comment as done.

Fix comment in KeepSeedTest.cpp

morehouse added inline comments.Aug 31 2020, 5:13 PM
compiler-rt/lib/fuzzer/FuzzerCorpus.h
186

Feel free to disregard, but I'd like to protest the length of this parameter list.

I would happily review a separate patch that cleaned it up.

compiler-rt/lib/fuzzer/FuzzerFlags.def
28

Please also document the intended use case (i.e. when seeds are not properly formed for the fuzz target but still have useful snippets.

compiler-rt/lib/fuzzer/FuzzerInternal.h
71

Nit: Input params should come before output params

compiler-rt/lib/fuzzer/FuzzerLoop.cpp
481

I think it's cleaner if we avoid checking Options.KeepSeed in this function, and instead check it in the caller. Then we could rename SeedInput to ForceAddToCorpus for clarity.

793

Isn't this information already printed above?

compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp
597

Side note, feel free to ignore: Now that LLVM uses C++14, all these unique_ptr patterns can be simplified with std::make_unique:

auto C = std::make_unique<InputCorpus>("", Entropic, false);

(Separate patch is welcome)

compiler-rt/test/fuzzer/KeepSeedTest.cpp
34 ↗(On Diff #288983)

Please clang-format this test.

dokyungs updated this revision to Diff 289187.Sep 1 2020, 8:45 AM

Partially addressed comments.

dokyungs marked 6 inline comments as done.Sep 1 2020, 8:50 AM
dokyungs added inline comments.
compiler-rt/lib/fuzzer/FuzzerCorpus.h
186

Thanks! will send a separate patch!

compiler-rt/lib/fuzzer/FuzzerFlags.def
28

Done!

compiler-rt/lib/fuzzer/FuzzerInternal.h
71

Done.

compiler-rt/lib/fuzzer/FuzzerLoop.cpp
481

I tried to do this, but wasn't sure the scope of change you were thinking of. Do you want the SeedInput member variable of InputInfo to be changed to ForceAddToCorpus too? We need to keep this information in InputInfo to prevent it being reduced to a smaller input here.

793

Removed. Printing wasn't necessary here.

compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp
597

Thanks for the suggestion! Will send a separate patch.

compiler-rt/test/fuzzer/KeepSeedTest.cpp
34 ↗(On Diff #288983)

Done. compiler-rt/test/.clang-format has ColumnLimit: 0 so I had to temporarily remove it to get 80 chars in each line.

morehouse added inline comments.Sep 1 2020, 10:37 AM
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
481

I just meant we could simplify slightly in this function by changing the param to ForceAddToCorpus and then unconditionally adding to corpus if its true. We can check the KeepSeed flag in the caller to determine if we want to force add the input or not.

dokyungs marked 6 inline comments as done.Sep 1 2020, 11:18 AM
dokyungs added inline comments.
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
481

I am not sure if I can simplify this that way, because SeedInput needs to propagate; it's stored in the InputInfos of all seed inputs when InputInfos are created by AddToCorpus. Storing this in a member variable SeedInput of InputInfo is needed to prevent them from being replaced. The above line prevents them from being replaced with new, non-seed inputs that's based on the seed input. It seems to me that, if we change the argument SeedInput to ForceAddToCorpus, SeedInput of InputInfo needs changing too. Please correct me if I am still misled :)

morehouse added inline comments.Sep 1 2020, 11:31 AM
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
481

Ok, I think I was confused because we have both SeedInput and II->SeedInput used in this function.

Is there any way we can make this simpler? Having to check 3 things that seem similar (Options.KeepSeed, SeedInput, II->SeedInput) in this single function is tricky.

dokyungs added inline comments.Sep 2 2020, 9:11 AM
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
481

Yes, I share the sense that this is confusing. What do you think of adding a new flag called reduce_seed_inputs, and check !Options.ReduceSeedInputs && II->SeedInput here rather than Options.KeepSeed && II->SeedInput? This is probably not simplifying things since it's more code, but adding a new flag could probably reduce possible confusion because it's more explanatory.

hctim added inline comments.Sep 2 2020, 2:27 PM
compiler-rt/lib/fuzzer/FuzzerCorpus.h
135

Vestigal - please delete.

compiler-rt/test/fuzzer/keep-seed.test
8 ↗(On Diff #288016)

Can you also explicitly check that the number of runs with -keep_seed=1 uses less iterations? You can get the data by providing -print_final_stats=1, you can grep the number by stat::number_of_executed_units

morehouse added inline comments.Sep 2 2020, 2:40 PM
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
481

One idea:

  • Rename II->KeepSeed to II->NeverReduce, and only set it to true if Options.KeepSeed == true.
  • Rename the parameter KeepSeed to ForceAddToCorpus.

This seems more understandable to me, and we don't need to check Options.KeepSeed at all in this function.

dokyungs updated this revision to Diff 289579.Sep 2 2020, 3:01 PM
dokyungs marked an inline comment as done.

Removed unused var.

compiler-rt/test/fuzzer/keep-seed.test
8 ↗(On Diff #288016)

Currently it's implicitly implied that -keep_seed=1 uses less iterations (i) with not and 2 million runs above, and (ii) without not and 4 million runs below.

Do you have any idea how I can explicitly check this in a lit test? Can I use bash's -lt?

hctim added inline comments.Sep 2 2020, 3:33 PM
compiler-rt/test/fuzzer/keep-seed.test
8 ↗(On Diff #288016)

Actually - what you have currently seems reasonable to me - but would you mind adding a comment to clarify? I missed the "works with 2m runs with this flag, doesn't work with 4m runs without it" context at first glance :(.

dokyungs updated this revision to Diff 289592.Sep 2 2020, 3:36 PM

Give more general but descriptive variable names as suggested.

Options.KeepSeed -> ForceAddToCorpus -> InputInfo::NeverReduce

dokyungs updated this revision to Diff 289593.Sep 2 2020, 3:41 PM

Add comment in keep-seed.test

dokyungs marked 6 inline comments as done.Sep 2 2020, 3:42 PM
dokyungs added inline comments.
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
481

Thanks for the idea. This seems a lot more descriptive, which is now reflected in the code.

compiler-rt/test/fuzzer/keep-seed.test
8 ↗(On Diff #288016)

Done.

dokyungs updated this revision to Diff 289598.Sep 2 2020, 3:47 PM
dokyungs marked 2 inline comments as done.

Reflect variable name change in FuzzerInternal.h

This revision is now accepted and ready to land.Sep 2 2020, 5:20 PM
This revision was landed with ongoing or failed builds.Sep 3 2020, 8:55 AM
This revision was automatically updated to reflect the committed changes.