Page MenuHomePhabricator

[libFuzzer] Initial implementation of weighted mutation leveraging during runtime.
ClosedPublic

Authored by kodewilliams on Jul 20 2018, 3:36 PM.

Details

Summary

Added functions that calculate stats while fuzz targets are running and give
mutations weight based on how much new coverage they provide, and choose better
performing mutations more often.

Patch by Kodé Williams (@kodewilliams).

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Dor1s added inline comments.Jul 27 2018, 1:43 PM
lib/fuzzer/FuzzerMutate.cpp
581–582

It would be much safer to use Stats.size() here instead of Mutators.size() since you're using i to access elements in Stats, not in Mutators.

582

Move Printf call to the next line please. Also, this stat::... syntax is used for final stats. If we decide to print something during runtime, we should probably follow the style of the other log messages.

583

Again, overcomplication. We should start with some initial weights, and later on only overwrite those.

585

+1

585

Actually, that seems to be unnecessary complicated. Initially, we should have some default values in Stats. Probably zero, since none of the stats have been useful at that point. Later on, we should be able to overwrite default Stats value with a new one. That's it, there is no way we'll have to put the default value back.

Also, I don't find it good to check whether TotalCount is 0 or not on every run. Let's just initialize it with 1 in the very beginning, so it will never be 0 and it always will be safe to use that value as a divisor.

588

I don't understand why we need + kDefaultMutationWeight, as well as why we multiply the value by 1000.

590

This code is duplicated in MutationDispatcher::PrintMutationStats, can you please have it in a one place and call it from different places as needed.

600

I'd ask you to move SumOfStats += Stat; to the next line just to follow the style, but actually you should use std::accumulate instead of manual looping through the values.

610

It would be very inefficient to create a new random generator every time when we need to select an index. There is Random Rand(Seed); declared in the FuzzerDriver and I believe all the other code is using it, please use it as well.

611

Also, it's not the first time it looks like you've just copied the code from somewhere without even changing variable names to follow project's style. I guess, https://en.cppreference.com/w/cpp/numeric/random/discrete_distribution this time.

Also, please don't forget to run clang format before uploading a CL.

616

This is an overkill, we never change sizes of Mutators and MutationWeights from the very beginning of the fuzzing. If you want to assert that their sizes are equal, somewhere after FuzzerMutate.cpp:68 would be the best place, but that wouldn't make any sense since you've just called resize there. Just remove the assert and return SelectIndex(gen).

kodewilliams marked 18 inline comments as done.
lib/fuzzer/FuzzerMutate.cpp
23

It is just there to represent a usefulness ratio that is near to but not entirely useless. So that it still gets weight instead of calculating to 0.

526–529

AssignMutationWeights is called initially and sets all the mutation weights to the default, therefore they would be all be weighted the same and have the same probability before 10k runs.

588

I add the default to ensure that mutations that are any amount more useful than the default get weighted higher in the distribution. No longer multiplying by 1000.

test/fuzzer/fuzzer-weightedmutations.test
5

Real: 0m 0.157s
User: 0m 0.036s
Sys: 0m 0.076s

metzman added inline comments.Jul 30 2018, 3:54 PM
lib/fuzzer/FuzzerMutate.cpp
36

Can you leave a comment here about why we are starting with a UsefulCount and a TotalCount of 1?

43

Why do ChangeASCIIInt and ChangeBinInt start with a UsefulCount of 0 when everything else starts with 1?

603

I don't think we need to multiply by 1.0, right? I don't think it (or the parentheses) does anything, please remove them if I am correct.

607

Kode, I don't think this is what Max meant. I believe that here you need to use the rand obtained by calling GetRand.
It's possible, though unlikely, that not doing this was causing the test flakiness you were seeing before.
I would run the previously flaky test in a loop and ensure that the tests pass every time, since if there is any flake we don't know about, it will be more painful to get rid of later rather than now.

metzman added inline comments.Jul 30 2018, 3:56 PM
test/fuzzer/fuzzer-weightedmutations.test
5

Amount of time LGTM.

Dor1s added inline comments.Jul 30 2018, 5:27 PM
lib/fuzzer/FuzzerMutate.cpp
23

That doesn't seem right to me. Let's say we have a tough target -- where it's hard to reach any new coverage. In fact, we have a lot of them, when we use a full corpus.

So, after running for a while, it finally finds a couple useful mutations, all of them get weights, say, 10^(-6) (again, totally possible), while all "useless" mutations get the default weight of 10^(-5). That would mean, "useless" mutations would be chosen much more often than "useful" ones.

IMO, the better approach would be something like what we've discussed in the past:

  1. make random decision whether we use weighted mutations or default selection, 80% vs 20% should be fine. Once start testing, can change to 90 vs 10 or any other proportion
  2. if weighted mutations was chosen, call WeightedIndex(); otherwise, use default case

That approach would be universal as it doesn't use any magic numbers which correspond to a particular target, as your 100*1000 can behave very differently with targets having different speed.

43

+1

588

That won't work well for all targets. If mutations stats are very low numbers, that default mutation weight constant will bias the selection towards more uniform distribution, rather than giving more favor to more useful mutations.

595

here and on line 603, what's the point of multiplying by 1.0?

607

Yes, don't create a new Random object, use an existing one from FuzzerDriver. That's crucial for keeping deterministic behavior, as Jonathan has pointed out.

kodewilliams marked 11 inline comments as done.
metzman added inline comments.Jul 30 2018, 5:37 PM
lib/fuzzer/FuzzerMutate.cpp
23

I think it will be harder to determine if this technique is useful with the 80/20 strategy.
If the concern is that the weight is too high, why don't we use the smallest positive double as the default?

Dor1s added inline comments.Jul 31 2018, 11:26 AM
lib/fuzzer/FuzzerMutate.cpp
23

I think it will be harder to determine if this technique is useful with the 80/20 strategy.

Why will it be harder? That percentage would just control the factor of how strongly our strategy affects fuzzing process. We should still be able to see either a negative a positive impact. Changing the distribution (e.g. use 90/10 after 80/20) would multiple that impact a little bit.

If the concern is that the weight is too high, why don't we use the smallest positive double as the default?

I can't think of a value that would work well for both cases when useful mutations have stats like 10^(-3) and 10^(-6). We should avoid relying on anything that depends on fuzz targets speed / complexity of finding a new input. We already have magic threshold of 10000, let's not add more of those.

526–529

With my proposal it will be something like:

// Even when use weighted mutations, fallback to the default selection in 20% of cases.
if (Options.UseWeightedMutations && Rand(100) < 80) 
  M = &Mutators[WeightedIndex()];
else
  M = &Mutators[Rand(Mutators.size())];
kodewilliams marked 11 inline comments as done.
kodewilliams added inline comments.Jul 31 2018, 3:57 PM
lib/fuzzer/FuzzerMutate.cpp
23

@metzman @Dor1s PTAL. Made changes and ran tests. Test in question is no longer flaky (ran about 50 times) and ran two experiments locally. In both cases, the corpus grew more with the option enabled :) so maybe a good sign.

595

Both UsefulCount and TotalCount are uint64_t but Stats is a vector of doubles, so I multiply by 1.0 to make sure it gets calculated as double.

Dor1s added inline comments.Jul 31 2018, 4:21 PM
lib/fuzzer/FuzzerMutate.cpp
595

It's an obscure way to cast a value to another type. Please use static_cast<double>(Mutators[i].UsefulCount) instead.

kodewilliams marked 2 inline comments as done.
Dor1s added a comment.Aug 1 2018, 8:54 AM

I don't like the test as it only tests that we do not completely break libFuzzer, but doesn't test the feature itself. I'll play with some ideas locally, will share those if anything works out. Otherwise, I guess we'll proceed with this test.

lib/fuzzer/FuzzerMutate.cpp
36–37

"zero division" -> "division by zero"

67

add ", init with uniform weights distribution." to the comment please

68

Let's remove kDefaultMutationWeight variable, just use 1.0 here.

69

We should start with 0 values in Stats, remove kDefaultMutationStat and leave Stats.resize(Mutators.size() here.

524

Please do Mutator *M = nullptr; just to be safe in case there ever will be any bug and the pointer won't be initialized.

609

We should keep the distribution object inside of MutationDispatcher class and update it only every kUpdateMutationWeightRuns, not for every weighted index retrieval.

lib/fuzzer/FuzzerMutate.h
104

Let's rename this method to UpdateMutationStats, that name would make more sense.

174

This line and line 175 feel slightly inconsistent, both should be MutationWeights and MutationStats or just Weights and Stats. I'd prefer the latter, as we're already inside MutationDispatcher class and clearly deal with mutations here.

174

Actually, we don't need MutationWeights here, we need only the distribution. Could you please do the following:

  1. Remove this MutationWeights member
  1. Add Distribution member (also see my comment for WeightedIndex() function)
  1. Rename MutationDispatcher::AssignMutationWeights() to MutationDispatcher::UpdateDistribution()
  1. In MutationDispatcher::UpdateDistribution() create a local vector Weights and update it using Stats value as you do now
  1. Update the Distribution using the Weights, e.g. see how it's done for the corpus distribution (FuzzerCorpus.h:294)
kodewilliams marked 13 inline comments as done.
Dor1s added a subscriber: kcc.

Left a couple minor comments. Looks good otherwise. Still not happy with the test, but can't think of anything better so far.

Matt, please take a look when you get a change.

lib/fuzzer/FuzzerMutate.cpp
600

This variable is not needed until line 605. And may be not needed at all if we do an early return on line 603. Please declare it between lines 603 and 605.

601

We don't need to do this, just pass Mutators.size()to constructor of Vector<double> Weights. Default weight should be 0, so Vector<double> Weights(Mutators.size()); would be enough

kodewilliams marked 2 inline comments as done.
Dor1s accepted this revision.Aug 2 2018, 7:51 AM

LGTM

morehouse added inline comments.Aug 2 2018, 8:56 AM
lib/fuzzer/FuzzerMutate.cpp
526–529

use -> using

528

Maybe if (... && Rand(5)) would be simpler.

606

Is normalization necessary? I think discrete_distribution might do this internally.

607

I think you can do Distribution.param(Weights) rather than reconstructing. Might be faster.

lib/fuzzer/FuzzerMutate.h
103–111

What is the "most previous run"? Please clarify comment

Dor1s added inline comments.Aug 2 2018, 9:21 AM
lib/fuzzer/FuzzerMutate.cpp
607

Oh, right. Thanks for spotting, Matt! Kode, please remove lines 603-605 and try passing Stats to the distribution.

kodewilliams marked 5 inline comments as done.
morehouse added inline comments.Aug 2 2018, 10:53 AM
lib/fuzzer/FuzzerMutate.cpp
603

Does Distribution.param(Stats) work instead?

lib/fuzzer/FuzzerMutate.h
104

There's nothing inside UpdateMutationStats that refers to the number 10,000.

Maybe something like "Recalculates mutation stats based on latest run data."

test/fuzzer/fuzzer-weightedmutations.test
7

This really only tests that there's no major regression with the flag on. If this flag performs better under some circumstances, we should add a test to demonstrate that.

morehouse added inline comments.Aug 2 2018, 10:56 AM
test/fuzzer/fuzzer-weightedmutations.test
7

Also, we can probably add this to fuzzer-mutationstats.test instead.

kodewilliams marked 2 inline comments as done.Aug 2 2018, 11:20 AM
kodewilliams added inline comments.
lib/fuzzer/FuzzerMutate.cpp
603

Please see other comment on Distribution.param(Weights).

607

One param function doesnt take arguments (returns a param_type object) and the other one sets the new params based on a passed in param_type object, but there would have to be another discrete_distribution object from which the param_type object would be obtained so I think that's why they do it this way in https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm/projects/compiler-rt/lib/fuzzer/FuzzerCorpus.h?rcl=206864844&l=294

Quite literally I would have to do Distribution.param(Distribution.param()) but I'm not sure that would update the distribution.

morehouse added inline comments.Aug 2 2018, 11:35 AM
lib/fuzzer/FuzzerMutate.cpp
602

SumOfStats doesn't need to be calculated.

kodewilliams marked an inline comment as done.
morehouse accepted this revision.Aug 2 2018, 1:44 PM
Dor1s added inline comments.Aug 2 2018, 1:52 PM
lib/fuzzer/FuzzerMutate.cpp
602

Matt, we needed SumOfStats to be non zero to ensure that we have at least some distribution. Until that moment is reached, we should be using the initial weights. I like this simplification though, but in that case we'll have to initialize UsefulCount with 1, so that initial Stats will be 1 / 1 for every mutation.

morehouse added inline comments.Aug 2 2018, 2:15 PM
lib/fuzzer/FuzzerMutate.cpp
602

Either approach SGTM.

kodewilliams marked 6 inline comments as done.

All comments addressed @morehouse

Dor1s updated this revision to Diff 158843.Aug 2 2018, 3:14 PM

Getting ready to land.

Dor1s edited the summary of this revision. (Show Details)Aug 2 2018, 3:19 PM
Dor1s updated this revision to Diff 158847.Aug 2 2018, 3:28 PM
Dor1s edited the summary of this revision. (Show Details)

Rebase

This revision was not accepted when it landed; it landed in state Needs Review.Aug 2 2018, 3:30 PM
This revision was automatically updated to reflect the committed changes.