This patch adds an option "keep_seed" to keep all initial seed inputs. Previously, only the initial seed inputs that find new coverage were added to the corpus, and all the other initial inputs were discarded. We observed in some circumstances that useful initial seed inputs are discarded as they find no new coverage, even though they contain useful fragments in them (e.g., SQLITE3 FuzzBench benchmark). This newly added option provides a way to keeping seed inputs for those circumstances. With this patch, and with -keep_seed=1, all initial seed inputs are kept regardless of whether they find new coverage or not. Further, these seed inputs are not replaced with smaller inputs even if -reduce_inputs=1.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Is it possible to fork the cross_over_uniformdist changes into a supplementary patchset easily? Seems like an related but indepentent change (prefer one patchset per feature if possible please -- reviewing is O(n^3) in lines-of-code-per-patch and it's also nice to have different commits for different features).
compiler-rt/lib/fuzzer/FuzzerFlags.def | ||
---|---|---|
27 | When used with |reduce_inputs==1|, the seed inputs will never be reduced. | |
compiler-rt/lib/fuzzer/FuzzerFork.cpp | ||
316 | nit: spaces not tabs (and throughout) | |
compiler-rt/lib/fuzzer/FuzzerLoop.cpp | ||
490 | Can't this be derived by II->SeedInput (as elsewhere)? It would be much cleaner to not have to maintain this variable in the class just for initialization. If it can't be derived by that, can you please make it a function paramater (you can even default it to false to avoid having to change all the other instances) | |
799 | nit: number of corpus inputs = %d\n (just because it's absolutely 100% clear that it's number-of-units, not number-of-bytes) | |
compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp | ||
1061–1062 | std::unique_ptr<InputCorpus> C(new InputCorpus("", Entropic, /* KeepSeed */ false)); (and elsewhere) |
Addressed comments. Added keep-seed.test to support/test usefulness of the -keep_seed=1 flag.
compiler-rt/lib/fuzzer/FuzzerFork.cpp | ||
---|---|---|
316 | I just checked that these are actually spaces. |
Yes, it is definitely possible. I will do in the next upload though, because I'd like to see if we can have some discussion here about how to make a good use the kept seed inputs.
compiler-rt/lib/fuzzer/FuzzerCorpus.h | ||
---|---|---|
284 | nit: ChooseUnweightedInputToMutate or something like that. Make it clear that this is a uniform selection, not a bias-weighted selection. | |
compiler-rt/lib/fuzzer/FuzzerFlags.def | ||
29 | nit: cross_over_uniform_dist | |
compiler-rt/lib/fuzzer/FuzzerFork.cpp | ||
316 | Ah - I see, it's just the way that phabricator displays that the indentation level was increased. Interesting... | |
compiler-rt/lib/fuzzer/FuzzerLoop.cpp | ||
780 | nit: please also give these unnamed constants the /* ArgName */ treatment | |
compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp | ||
602 | While you're here can you give these the /* ArgName */ treatment as well, thanks | |
compiler-rt/test/fuzzer/keep-seed.test | ||
8 ↗ | (On Diff #288016) | Hmm - maybe a better test would be to run with and without keep_seed and verify that keep_seed can find it in less iterations? Can you also add a quick comment with the amount of runs required on your machine to trigger this bug with and without this patch? Thanks. |
Addressed the comments except for the test comment (will follow up on the next upload), and removed uniform dist changes which will be uploaded as a separate patch.
Addressed the comment on the test.
With -keep_seed=0, it takes 5,150,128 execs to find the crash.
With -keep_seed=1, it takes 1,049,074 execs to find the crash. (because the seed input is not reduced.)
compiler-rt/test/fuzzer/keep-seed.test | ||
---|---|---|
8 ↗ | (On Diff #288016) | Done. With -keep_seed=0, it takes 5,150,128 execs to find the crash. |
compiler-rt/lib/fuzzer/FuzzerCorpus.h | ||
---|---|---|
186 | Feel free to disregard, but I'd like to protest the length of this parameter list. I would happily review a separate patch that cleaned it up. | |
compiler-rt/lib/fuzzer/FuzzerFlags.def | ||
28 | Please also document the intended use case (i.e. when seeds are not properly formed for the fuzz target but still have useful snippets. | |
compiler-rt/lib/fuzzer/FuzzerInternal.h | ||
71 | Nit: Input params should come before output params | |
compiler-rt/lib/fuzzer/FuzzerLoop.cpp | ||
481 | I think it's cleaner if we avoid checking Options.KeepSeed in this function, and instead check it in the caller. Then we could rename SeedInput to ForceAddToCorpus for clarity. | |
799 | Isn't this information already printed above? | |
compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp | ||
597 | Side note, feel free to ignore: Now that LLVM uses C++14, all these unique_ptr patterns can be simplified with std::make_unique: auto C = std::make_unique<InputCorpus>("", Entropic, false); (Separate patch is welcome) | |
compiler-rt/test/fuzzer/KeepSeedTest.cpp | ||
34 ↗ | (On Diff #288983) | Please clang-format this test. |
compiler-rt/lib/fuzzer/FuzzerCorpus.h | ||
---|---|---|
186 | Thanks! will send a separate patch! | |
compiler-rt/lib/fuzzer/FuzzerFlags.def | ||
28 | Done! | |
compiler-rt/lib/fuzzer/FuzzerInternal.h | ||
71 | Done. | |
compiler-rt/lib/fuzzer/FuzzerLoop.cpp | ||
481 | I tried to do this, but wasn't sure the scope of change you were thinking of. Do you want the SeedInput member variable of InputInfo to be changed to ForceAddToCorpus too? We need to keep this information in InputInfo to prevent it being reduced to a smaller input here. | |
799 | Removed. Printing wasn't necessary here. | |
compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp | ||
597 | Thanks for the suggestion! Will send a separate patch. | |
compiler-rt/test/fuzzer/KeepSeedTest.cpp | ||
34 ↗ | (On Diff #288983) | Done. compiler-rt/test/.clang-format has ColumnLimit: 0 so I had to temporarily remove it to get 80 chars in each line. |
compiler-rt/lib/fuzzer/FuzzerLoop.cpp | ||
---|---|---|
481 | I just meant we could simplify slightly in this function by changing the param to ForceAddToCorpus and then unconditionally adding to corpus if its true. We can check the KeepSeed flag in the caller to determine if we want to force add the input or not. |
compiler-rt/lib/fuzzer/FuzzerLoop.cpp | ||
---|---|---|
481 | I am not sure if I can simplify this that way, because SeedInput needs to propagate; it's stored in the InputInfos of all seed inputs when InputInfos are created by AddToCorpus. Storing this in a member variable SeedInput of InputInfo is needed to prevent them from being replaced. The above line prevents them from being replaced with new, non-seed inputs that's based on the seed input. It seems to me that, if we change the argument SeedInput to ForceAddToCorpus, SeedInput of InputInfo needs changing too. Please correct me if I am still misled :) |
compiler-rt/lib/fuzzer/FuzzerLoop.cpp | ||
---|---|---|
481 | Ok, I think I was confused because we have both SeedInput and II->SeedInput used in this function. Is there any way we can make this simpler? Having to check 3 things that seem similar (Options.KeepSeed, SeedInput, II->SeedInput) in this single function is tricky. |
compiler-rt/lib/fuzzer/FuzzerLoop.cpp | ||
---|---|---|
481 | Yes, I share the sense that this is confusing. What do you think of adding a new flag called reduce_seed_inputs, and check !Options.ReduceSeedInputs && II->SeedInput here rather than Options.KeepSeed && II->SeedInput? This is probably not simplifying things since it's more code, but adding a new flag could probably reduce possible confusion because it's more explanatory. |
compiler-rt/lib/fuzzer/FuzzerCorpus.h | ||
---|---|---|
135 | Vestigal - please delete. | |
compiler-rt/test/fuzzer/keep-seed.test | ||
8 ↗ | (On Diff #288016) | Can you also explicitly check that the number of runs with -keep_seed=1 uses less iterations? You can get the data by providing -print_final_stats=1, you can grep the number by stat::number_of_executed_units |
compiler-rt/lib/fuzzer/FuzzerLoop.cpp | ||
---|---|---|
481 | One idea:
This seems more understandable to me, and we don't need to check Options.KeepSeed at all in this function. |
Removed unused var.
compiler-rt/test/fuzzer/keep-seed.test | ||
---|---|---|
8 ↗ | (On Diff #288016) | Currently it's implicitly implied that -keep_seed=1 uses less iterations (i) with not and 2 million runs above, and (ii) without not and 4 million runs below. Do you have any idea how I can explicitly check this in a lit test? Can I use bash's -lt? |
compiler-rt/test/fuzzer/keep-seed.test | ||
---|---|---|
8 ↗ | (On Diff #288016) | Actually - what you have currently seems reasonable to me - but would you mind adding a comment to clarify? I missed the "works with 2m runs with this flag, doesn't work with 4m runs without it" context at first glance :(. |
Give more general but descriptive variable names as suggested.
Options.KeepSeed -> ForceAddToCorpus -> InputInfo::NeverReduce
clang-tidy: warning: private field 'KeepSeed' is not used [clang-diagnostic-unused-private-field]
not useful