This is an archive of the discontinued LLVM Phabricator instance.


Differential D73776

Entropic: Boosting LibFuzzer Performance
ClosedPublic

Authored by marcel on Jan 31 2020, 4:48 AM.


Details

Reviewers

kcc
metzman
morehouse
Dor1s
vitalybuka

Commits

rGe2e38fca64e4: Entropic: Boosting LibFuzzer Performance

Summary

This is a collaboration between Marcel Boehme @ Monash, Australia and Valentin Manès plus Sang Kil Cha @ KAIST, South Korea.

We have made a few modifications to boost LibFuzzer performance by changing how weights are assigned to the seeds in the corpus. Essentially, seeds that reveal more "information" about globally rare features are assigned a higher weight. Our results on the Fuzzer Test Suite seem quite promising. In terms of bug finding, our Entropic patch usually finds the same errors much faster and in more runs. In terms of coverage, Entropic achieves the same coverage in less than half the time for the majority of subjects. For lack of space, we shared more detailed performance results directly with @kcc. We'll publish the preprint with all the technical details as soon as it is accepted. Happy to share if you drop us an email.

There should be plenty of opportunities to optimise further. For instance, while Entropic achieves the same coverage in less than half the time, Entropic has a much lower #execs per second. We ran the perf-tool and found a few performance bottlenecks.

Thanks for open-sourcing LibFuzzer (and the entire LLVM Compiler Infrastructure)! This has been such a tremendous help to my research.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

marcel created this revision. Jan 31 2020, 4:48 AM

Herald added a project: Restricted Project. Jan 31 2020, 4:48 AM

Herald added a subscriber: llvm-commits.

marcel edited the summary of this revision. Jan 31 2020, 4:58 AM

It's exciting that such a small change can bring such a great improvement.
Thanks for the contribution.

I left several comments in the code and here are some higher level comments:

- put the new functionality under a flag, off by default, so that we can properly A/B test it once it's committed.
  If it proves to be superior we'll enable the flag by default and then remove the old algorithm and remove the flag.
- try to unit-test the most interesting functionality (see fuzzer/tests/FuzzerUnittest.cpp)
- write a top-level comment explaining the algorithm.
- try to avoid unordered_map on the hot code: sadly, the STL hash table is almost always the wrong choice of a data structure for hot code.
  Once there is a top-level comment I may be able to suggest a better data structure.
  But I expect it to be something like an array of pairs, organized somehow (sorted? heap?)
  How many rare features do we need per input? Maybe just have a fixed size array?
- don't use size_t where you can get away with uint32_t - memory footprint is important.
- don't use long double (unless you have to -- in which case explain why)
- there are mentions of local vs global frequencies. Please explain in a comment what that is (corpus vs individual input?) and reflect this in the variable names (like, RareFeaturesFreqMap => LocalFreqMap)

compiler-rt/lib/fuzzer/FuzzerCorpus.h
44	why not just 'double'?
45	Any chance to have a planer data structure here? (sorted array or some such) Also, why size_t? The feature is 4 bytes and the frequency can probably be 4 or even 2 bytes. (2 bytes is hard for alignment).
127	why is this public? Also, why the g_ prefix?
142	do you need this? the following line should take care of it
164	Please try to follow the coding style, i.e. capitalize the names: Singletons. Also, the term Singleton may be a bit too overloaded. How about NumFeaturesCoveredByASingleInput or some such?
322	Try not to use constants like this. At the very least, use `const size_t kSomething = 100; .. if (xxx.size() > kSomething)`. If there is any reason to play with different values, you may want to use a flag instead. Similar for 0xFF. Probably the best here is to replace these two constants with function parameters so that this function becomes unit-testable. Which reminds me: please consider adding unit tests to the key functionality like this one.
395	This is the hotspot, right? My guess is that using a hash map here causes most of the slowdown, so we need a faster data structure... Need to think... It will help if you describe the algorithm in a comment.
442	a top-level comment explaining the computations would be nice
449	use () just in case; replace 10000 with kSomething; don't use random(), pass (Random &Rand) instead
455	why not for (auto It : I->RareFeaturesFreqMap) ?
465	don't use libc's rand(), pass (Random &Rand) instead
525	does this have to be 8-byte per feature?
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
607	do you need this change?

Thanks so much for your feedback!

We maintain feature abundance counts globally (for the entire fuzzing campaign) and locally (for each InputInfo).
- Whenever an input generated by fuzzing II executes feature f, we increment the global abundance count for f as well as the local abundance count for f @ II.
- The global abundance counts are stored in FeaturesFreqMap, an array of size kFeatureSetSize.
  - It requires one read/write for *each* UpdateFeatureFrequency which is very hot (Line 331). To maximize efficiency, I decided to use a fixed-size array. There is only one global map. So, even though the map is fairly sparse, memory overhead might be okay.
  - Currently, each element is of size_t because a feature might be executed as often as new inputs are generated. The total number of inputs generated (g_NumExecutedMutations) will likely not fit into uint32_t.
  - However, if memory footprint is a concern over efficiency, we might even go down as much as uint16_t since only abundance counts below MostAbundant_RareFeature are relevant. This adds an overflow check to each (hot!) write (Line 331).
- The local abundance counts are stored with each InputInfo in RareFeaturesFreqMap, an unordered_map<size_t,size_t>
  - It is written to in UpdateFeatureFrequency but only if the current feature is globally rare. In most cases UpdateFeatureFrequency returns in Line 337 before the local feature count is updated.
  - It is read when we iterate over all keys (and values) to compute the energy for an input in UpdateEnergy.
  - We can go faster with other hashmaps. Most efficient would be a fixed size array that is simply indexed with the feature ID (like the global abundance counts). However, the arrays are very sparse and the memory footprint would be too high.
- There is also some reading of the global and local abundance counts in AddRareFeature (Line 244), but that should happen quite rarely.
How do we decide what is considered a globally rare feature?
- Currently, we fixed this definition quite arbitrarily as: {the top 100 least abundant features} UNION {all features with abundance below 0xFF}.
- When a feature becomes abundant, it is removed globally and from each II locally. This is handled in AddRareFeature (which also updates MostAbundant_RareFeature)
- If you allow too many features to be considered as rare, you will spend more time computing the energy for a seed.
- If you allow too few features to be considered as rare, your performance gain from Entropic might decrease.
- Not sure whether we already hit the sweet spot. We can pull these out as command line options to see what works.

In addition, we will

add a command line flag to enable entropic for libfuzzer,
pull out the constants either as command line options or as global constants,
turn most occurrences of long double to double (except in UpdateEnergy where we need the precision, I think).
update variable names, add comments, and
see how we can unit test our patch (@Valentin?)

Let me know what you think.

compiler-rt/lib/fuzzer/FuzzerCorpus.h
45	Whether or not unordered_map, see top-level comment. If unordered_map, then uint32_t for keys and values should work.
127	`g_NumExecutedMutations` contains the total number of inputs generated. It is updated in FuzzerLoop.cpp in `MutateAndTestOne`. I can drop the `g_` prefix.
322	We will pull these constants out as command line options.
395	Yes. This is the hot spot. However, most executions should exit in Line 337. FeaturesFreqMap is an array. Don't think you can get much faster here.
525	Each entry is upper-bounded by the total number of generated inputs `g_NumExecutedMutations` which likely won't fit into `uint32_t`. That's why I chose size_t. However, we really only need abundance information for features with an abundance below `MostAbundant_RareFeature`. If memory footprint is a concern, we can go down to `uint16_t` at the cost of an overflow check in the hot code in `UpdateFeatureFrequency`. See top-level comment for more details.
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
607	Unrelated. This is just fixing a problem where LibFuzzer prints REDUCE more often than it should.

I'm happy to take a review pass too. Just ping me when / if this is needed.

Sounds good. Max and I will do the next round(s) of review.

Re RareFeaturesFreqMap, I would consider two more options:

1. a vector of pairs (feature, freq), ordered by feature. Updating the frequency would be log(N) and removing a feature would be N*log(N), but it may still be better than unordered map.
2. same as above, but just use a fixed-size array of some small size, e.g. 8. My assumption is that most inputs have very few, if any, rare features, and in cases where a given input has more than 8 rare features, it doesn't matter if we drop some of them.

option 2 is an optimization of option 1, so can just go with option 1 and see if it's too slow.
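
For illustration, the fixed-size variant (option 2) could look roughly like the sketch below. The names `SmallFeatureFreqs` and `kMaxLocalRareFeatures` are hypothetical and do not appear in the patch; the capacity of 8 is only the "e.g. 8" from the suggestion above, and a linear scan is assumed to be cheap enough for so few entries.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical capacity, following the "e.g. 8" suggestion (not a libFuzzer constant).
constexpr size_t kMaxLocalRareFeatures = 8;

// Hypothetical fixed-capacity store of (feature, frequency) pairs for one input.
struct SmallFeatureFreqs {
  struct Entry {
    uint32_t Feature;
    uint16_t Freq;
  };
  std::array<Entry, kMaxLocalRareFeatures> Entries{};
  size_t Size = 0;

  // Increments the count of Feature. Returns false if the feature had to be
  // dropped because all slots are taken.
  bool Increment(uint32_t Feature) {
    assert(Size <= kMaxLocalRareFeatures);
    for (size_t I = 0; I < Size; ++I) {
      if (Entries[I].Feature == Feature) {
        if (Entries[I].Freq != UINT16_MAX) // saturate instead of wrapping
          ++Entries[I].Freq;
        return true;
      }
    }
    if (Size == kMaxLocalRareFeatures)
      return false;
    Entries[Size++] = {Feature, 1};
    return true;
  }

  // Returns the count of Feature, or 0 if it is not tracked.
  uint16_t Get(uint32_t Feature) const {
    for (size_t I = 0; I < Size; ++I)
      if (Entries[I].Feature == Feature)
        return Entries[I].Freq;
    return 0;
  }

  // Removes Feature by moving the last entry into its slot (order is not kept).
  bool Remove(uint32_t Feature) {
    for (size_t I = 0; I < Size; ++I) {
      if (Entries[I].Feature == Feature) {
        Entries[I] = Entries[Size - 1];
        --Size;
        return true;
      }
    }
    return false;
  }
};
```

The trade-off is the one named in the suggestion: an input with more than 8 rare features silently loses some of them, in exchange for a constant, small memory footprint per input.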

compiler-rt/lib/fuzzer/FuzzerCorpus.h
127	yea, please drop g_.
525	Yea, I'd prefer uint16_t and a saturated add. Just to save some RAM
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
607	I'd prefer to not mix unrelated changes in one diff -- makes the code review quadratic. Please contribute this one separately (I am not 100% sure I understand it)
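
A minimal sketch of the `uint16_t` counters with a saturated add requested for line 525 might look as follows. `DemoGlobalFreqs` and `kDemoFeatureSetSize` are made-up names and the array size is arbitrary; this is not the code from the patch, only an illustration of the overflow check on the hot write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical size for the demo; libFuzzer's kFeatureSetSize is a different constant.
constexpr size_t kDemoFeatureSetSize = 1 << 12;

// Hypothetical global frequency table with 2-byte, saturating counters.
struct DemoGlobalFreqs {
  uint16_t Counts[kDemoFeatureSetSize] = {};

  // Increments the counter for Idx, but stops at the maximum value instead of
  // wrapping around to zero.
  void Increment(uint32_t Idx) {
    assert(Idx < kDemoFeatureSetSize);
    uint16_t &C = Counts[Idx];
    if (C != std::numeric_limits<uint16_t>::max())
      ++C;
  }
};
```

Saturation is acceptable here under the assumption stated earlier in the thread: only counts below the rarity threshold matter, so a counter stuck at its maximum still reads as "abundant".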

vitalybuka added a reviewer: vitalybuka. Feb 3 2020, 5:52 PM

marcel updated this revision to Diff 246591. Feb 25 2020, 4:09 PM

Changes since last commit:

Better memory footprint
- Global feature frequencies is a fixed size array with uint16_t-width elements: uint16_t GlobalFeatureFreqs[kFeatureSetSize] (instead of size_t-width)
- Local feature frequencies is a sorted vector of pairs: Vector<std::pair<uint32_t,uint16_t>> FeatureFreqs (instead of unordered_map<size_t,size_t>)
Added options to enable/disable entropic and pulled some constants out:
- Option entropic (default: 0): Enables entropic power schedule.
- Option considered_rare (default: 0xFF): If entropic is enabled, all features which are observed less often than the specified value are considered as rare.
- Option topX_rarest_features (default: 100): If entropic is enabled, we keep track of the frequencies only for the Top-X least abundant features (union features that are considered as rare).
- Option sparse_energy_updates (default: 10000): If entropic is enabled, the inverse value specifies the probability to update the corpus distribution even though no features or seeds were added or deleted. A larger value means fewer updates.
- Constant kProbAgressiveSchedule (default: 80): Determines the probability to choose a more aggressive schedule (assigning zero weight to below average seeds).
Turned most occurrences of long double to double.
Better variable names and comments.
Removed some redundant code.
Fuzzer unit tests (Thanks @Valentin!)

The patch applies to LLVM-trunk. We tested these changes by repeating all our FTS experiments (20 runs).
The performance results look pretty much the same as with the other patch, which is good.

vitalybuka added inline comments. Feb 27 2020, 2:04 PM

compiler-rt/lib/fuzzer/FuzzerCorpus.h
41	it would be nice to manipulate these fields only through reasonably named methods
43	could you please initialize them?
443	It would be nice to rename variables in such a way that a reader without background can understand what is going on.
451	please don't reuse variables like Y here; just declare them as close as possible to first use, or even better, together with the assignment
525	use uint16_t GlobalFeatureFreqs[kFeatureSetSize] = {}; instead of memsets. It would be nice to do the same for other arrays here, but in a separate patch
compiler-rt/lib/fuzzer/FuzzerFlags.def
156	entropic -> focus_rare_features. Not sure how, but it would be nice to rename sparse_energy_updates to something meaningful to a libFuzzer user, so that it explains the behavior change rather than implementation details as it does now.

dgg5503 added a subscriber: dgg5503. Mar 9 2020, 1:56 PM

In this *third revision*, we accommodate Vitaly's comments:

Better naming (e.g., s/Sum_Y/SumIncidence/g)
Initialized variables on struct InputInfo
Locally, don't reuse variables but declare them anew.
Instead of memset'ing our global array, we declare uint16_t GlobalFeatureFreqs[kFeatureSetSize] = {};
We keep the entropic option, though. Hope this is okay.

In addition, we removed option "sparse_energy_updates" and made it a constant kSparseEnergyUpdates = 10000.

Also, good news: Entropic is coming in second @ Fuzzbench: https://www.fuzzbench.com/reports/2020-03-04/index.html

Ping

Dor1s added inline comments. Mar 18 2020, 9:59 PM

compiler-rt/lib/fuzzer/FuzzerCorpus.h
143	since this is a vector, I don't think we need to manually clear it in the destructor
194	nit: seems like `log` should be sufficient, as Energy is `double`, not `long double`
343	nit: I'd rather use `auto` or `decltype(RareFeatures)::value_type` to avoid type mismatch if we ever change `RareFeatures` definition and forget to change the type here
352	assuming this code gets executed quite often, and the order inside `RareFeatures` isn't important, we can avoid erase-remove and do something like: `RareFeatures[index_from_the_loop] = RareFeatures.back(); RareFeatures.resize(RareFeatures.size() - 1);` but the loop on line 269 would have to use an index into the vector (from 1 to `< RareFeatures.size()`) instead of the iterator. Feel free to ignore though, it's just a suggestion which may or may not be a good one :)
366	since we always do this, `resize()` in my suggestion above might not be even needed, but that's a minor thing
373	can this and the loop on the line 294 be combined?
374	please consider adding a comment why we skip inputs with zero energy
414	invert the condition to reduce indentation: if (!II) return;
416	can do return after this line and avoid `else` with extra indentation below
416	what is `1` here? would it make sense to have a static const variable with a descriptive name, e.g. `kDefaultSomething`?
421	same point, what is `0`? a constant with a descriptive name or (less preferred) comment would be really helpful for others who come to read the code in future
439	From the code below it seems like `Energy` represents entropy and the max value is 0, which we reduce depending on the actual feature frequencies. Is that correct understanding?
451	it seems like the `Rand(kSparseEnergyUpdates)` clause is applicable to the `Entropic` case only, is that correct? Do we really need it in the vanilla case?
480	could you please rewrite this in a more readable if-else form?
482	why `20`? a constant with a descriptive name or a comment would be appreciated
494	so if there is at least one input that touches focus function, we will be always wasting time in this loop (starting on the line 472) and then falling back to the `VanillaSchedule` case? In such case I think we should just check `Options.FocusFunction` and use vanilla schedule if it's set, just because almost always there will be input(s) touching the focus function
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
712	does it always need update, even when new coverage wasn't observed?

Preparing the next patch.

compiler-rt/lib/fuzzer/FuzzerCorpus.h
352	With the subsequent push_back (Line 292), do you mean a swap and pop_back here?
416	We are setting the frequency of Idx32 to 1. Adding a comment.
421	Zero (0) is the default value for lower_bound as binary search over a vector of pairs. In this case, any value would work.
439	Yes, we estimate the entropy over the probabilities of the features in the neighborhood of the seed. Entropy is positive. The maximum entropy is `logl(GlobalNumberOfFeatures)`.
451	Yes. `kSparseEnergyUpdates` should apply only for Entropic.
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
712	For II, the local feature frequencies have changed. So we schedule an update. However, it will only be updated when the distribution needs an update, and we do not set `DistributionNeedsUpdate` here.

Changes in this fourth revision.

Error message + exit if parameters `--focus_function` and `--entropic` are used together.
Refactored a constant (kMaxMutationFactor)
Better code formatting
- early return for better indentation,
- more comments,
- fewer ternaries.

vitalybuka added inline comments. Mar 20 2020, 12:43 AM

compiler-rt/lib/fuzzer/FuzzerCorpus.h
136	please make field private and add access method if it's needed outside
316	DeleteFeatureFreq -> InputInfo::DeleteFeatureFreq
338	uint32_t MostAbundantRareFeatureIdx[2] = {} or just MostAbundantRareFeatureIdx1 MostAbundantRareFeatureIdx2
395	InputInfo::UpdateFeatureFrequency
446	InputInfo::UpdateEnergy
447	"long double" is still there?
compiler-rt/lib/fuzzer/FuzzerFlags.def
156	many of the comments are marked as "Done" but I see no changes.

Dor1s added inline comments. Mar 23 2020, 11:29 PM

compiler-rt/lib/fuzzer/FuzzerCorpus.h

352

yes, swap and pop_back would have the same effect
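
For readers unfamiliar with the idiom, a small sketch of swap and pop_back follows. `SwapAndPop` and `RemoveAtLeast` are hypothetical helpers written for this note, not functions from the patch, and `std::vector` stands in for libFuzzer's `Vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Removes V[Index] in constant time by swapping it with the last element and
// popping the back. The relative order of the remaining elements changes.
inline void SwapAndPop(std::vector<uint32_t> &V, size_t Index) {
  assert(Index < V.size());
  std::swap(V[Index], V.back());
  V.pop_back();
}

// Hypothetical example loop: removes every value >= Limit and returns how many
// were removed. The index is not advanced after a removal because a
// not-yet-inspected element has just been moved into slot I.
inline size_t RemoveAtLeast(std::vector<uint32_t> &V, uint32_t Limit) {
  size_t Removed = 0;
  size_t I = 0;
  while (I < V.size()) {
    if (V[I] >= Limit) {
      SwapAndPop(V, I);
      ++Removed;
    } else {
      ++I;
    }
  }
  return Removed;
}
```

This is why the loop has to be index-based: erasing through an iterator while elements are moved around would skip or revisit entries.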

439

sorry, I don't understand. Below are the code lines changing Energy value:

  II->Energy = 0.0;
  II->SumIncidence = 0;

  // Apply add-one smoothing to locally discovered features.
  for (auto F : II->FeatureFreqs) {
    size_t LocalIncidence = F.second + 1;
    Energy -= LocalIncidence * logl(LocalIncidence);
    SumIncidence += LocalIncidence;
  }

  <...>

  // Add a single locally abundant feature apply add-one smoothing.
  size_t AbdIncidence = II->NumExecutedMutations + 1;
  Energy -= AbdIncidence * logl(AbdIncidence);
  <...>

  // Normalize.
  if (SumIncidence != 0)
    Energy = (Energy / SumIncidence) + logl(SumIncidence);

  II->Energy = (double)Energy;
<...>
}

as I read this, I see that Energy should be negative in many cases?

Just completed a few tests with the revised patch in FuzzBench. Going to upload the revision soon.

compiler-rt/lib/fuzzer/FuzzerCorpus.h
316	Moved directly into the InputInfo struct.
352	Implemented your swap and pop_back idea. Cheers!
439	Sorry for the brevity. This is why `II->Energy` is positive. Entropy is computed as $-\sum_{i=1}^S p_i \log(p_i)$ where $p_i$ is the probability that fuzzing `II` generates an input that exercises feature $i$ and $S$ is the total number of rare features. We could estimate the probability $p_i$ as the proportion of generated inputs that exercise $i$, i.e., $\hat p_i = LocalIncidence_i / SumIncidence$. If you plug this proportion into the formula for entropy, you can compute entropy as $[-\sum_{i=1}^S LocalIncidence_i \log(LocalIncidence_i)] / SumIncidence + log(SumIncidence)$. While Energy is certainly negative before `// Normalize.`, it is positive after. Just drop me a DM. I'll send you the write up.
447	Yes, keeping maximum precision during the processing to minimize the cumulative FP arithmetic error, and downcast to double once the processing is done.
compiler-rt/lib/fuzzer/FuzzerFlags.def
156	Tried to address all comments either inline or in the summary. In this case, I wrote in D73776#1921184: "We keep the entropic option, though. Hope this is okay."
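
The rearrangement described for line 439 can be checked numerically. The sketch below is not taken from the patch: `EntropyDirect` and `EntropyFromIncidences` are hypothetical names, and both functions assume plain incidence counts without the add-one smoothing that the patch applies. It computes the entropy once from the probabilities and once in the "subtract n*log(n), then normalize" form, which agree because -sum (n_i/N) log(n_i/N) = -(sum n_i log n_i)/N + log N.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Entropy computed directly as -sum p_i * log(p_i), with p_i = n_i / N.
inline double EntropyDirect(const std::vector<size_t> &Incidences) {
  double Sum = 0;
  for (size_t N : Incidences)
    Sum += static_cast<double>(N);
  assert(Sum > 0 && "need at least one nonzero incidence");
  double H = 0;
  for (size_t N : Incidences) {
    if (N == 0)
      continue; // 0 * log(0) is taken as 0
    double P = static_cast<double>(N) / Sum;
    H -= P * std::log(P);
  }
  return H;
}

// The same quantity in the rearranged form used in the review discussion:
// accumulate -n_i * log(n_i), then divide by N and add log(N).
inline double EntropyFromIncidences(const std::vector<size_t> &Incidences) {
  double Energy = 0;
  double Sum = 0;
  for (size_t N : Incidences) {
    if (N == 0)
      continue;
    Energy -= static_cast<double>(N) * std::log(static_cast<double>(N));
    Sum += static_cast<double>(N);
  }
  assert(Sum > 0 && "need at least one nonzero incidence");
  return Energy / Sum + std::log(Sum); // negative before, non-negative after
}
```

For a uniform distribution over four features both forms give log(4), the maximum for four outcomes; a skewed distribution gives a smaller but still positive value.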

This is revision number 5. Thanks for all the great feedback!

Moved InputInfo functions (UpdateEnergy, UpdateFeatureFrequency, DeleteFeatureFreq) into InputInfo struct.
Made NumExecutedMutations private. Added public FuzzerCorpus::IncrementNumExecutedMutations()
Set kSparseEnergyUpdates to 100 (instead of 10000). Seems to give better results on some FuzzBench subjects.
Removed kProbAgressiveSchedule. Seems to give better results on some FuzzBench subjects.
Some cleanup in FuzzerCorpus::AddRareFeature(). Use swap and pop_back. Use array to maintain most and second-most abundant rare feature.
Been playing with LF's power schedule, and in preliminary experiments it seems that prioritizing faster seeds brings quite some performance gains.
Submitted PR to FuzzBench for evaluation: https://github.com/google/fuzzbench/pull/226


FuzzBench results for Entropic: https://www.fuzzbench.com/reports/2020-04-14/index.html

Commenting on just two issues, not the whole patch.

compiler-rt/lib/fuzzer/FuzzerCorpus.h
36	this is new in the patch, is it? While I completely understand why we'd want to use execution time as a signal for weights, it makes fuzzing process non-reproducible with a given seed, which I consider pretty bad. If we used 32- or 64- bit edge counters we could have substituted them for time, but alas, we use 8-bit ones.
69	I'm still worried about long double due to portability. Do you actually "know" that it's critical to use long double here?
compiler-rt/lib/fuzzer/FuzzerLoop.cpp
682	for consistency, please use the C++ interface for getting current time (as elsewhere in the code). But see above about my comment on time in general.

marcel marked 2 inline comments as done. Apr 22 2020, 4:40 PM

marcel added inline comments.

compiler-rt/lib/fuzzer/FuzzerCorpus.h
36	"this is new in the patch, is it?" Yes. Been playing with a few smaller tweaks to boost LF performance. "While I completely understand why we'd want to use execution time as a signal for weights, it makes fuzzing process non-reproducible with a given seed, which I consider pretty bad." Do you mean LibFuzzer should be fully deterministic when you start it with the same seed corpus (e.g., by fixing the random seed)? Currently, even without this patch I've been observing quite some variance in the coverage achieved over time. Happy to take it out, though, if this messes with the LF design principles.
69	You are right. After fixing frequencies to `uint16_t`, this can definitely be a `double`.

Please take out the time-related changes for now. If anything, extra changes make the code review process quadratic.

Yes, I expect that given the same seeds corpus and a fixed random seed (-seed=<N>) libFuzzer will produce the same mutations.
There is probably at least one case where this is not actually true today: ASLR (we use PCs to index into arrays, etc),
but other than that I'd expect LF to be deterministic.

Please also replace long double with double.
I'll try to make another review pass ASAP.

Revision 6: Removed time-based performance boost (would render LibFuzzer non-deterministic even if the random seed is fixed with -seed=<n>). Also, sed "s/long double/double/g".

Sorry for the delay. Mostly naming/style nits left.

compiler-rt/lib/fuzzer/FuzzerCorpus.h
42	NeedsEnergyUpdate?
56	Lower (use upper case)
106	Lower
133	I'd like to put these into a struct EntropicOptions { bool Enabled; size_t EntropicFeatureFrequencyThreshold; ... } and pass such a struct (by value) to InputCorpus CTOR. Of course, please match the names with those in Options
compiler-rt/lib/fuzzer/FuzzerOptions.h
47	I'd prefer if these names were more descriptive (ok if longer) and had "entropic" in them. E.g. Entropic, EntropicFeatureFrequencyThreshold, EntropicNumberOfRarestFeatures. Of course, make the command line parameters match.

marcel marked 5 inline comments as done. May 15 2020, 6:47 AM

Thanks again everyone for your time to review. Learned a lot!

Revision 7. Created struct EntropicOptions which lives in the FuzzerCorpus.h header. The EntropicOptions are passed by value from FuzzerDriver (which handles FuzzerOptions) when constructing the InputCorpus. Renamed option parameters as per KCC's suggestions. As always, formatted with clang-format.

Let me know if there is anything else I can do.

kcc added inline comments. May 15 2020, 2:14 PM

compiler-rt/lib/fuzzer/FuzzerCorpus.h
132	here and below: remove 'struct'.
compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp
595–596	When running 'ninja check-fuzzer' this test fails for me. That's weird: why does the functionality change with Entropic off?

[5/9] Running Fuzzer unit tests
FAIL: LLVMFuzzer-Unittest :: ./Fuzzer-x86_64-Test/Corpus.Distribution (48 of 55)
TEST 'LLVMFuzzer-Unittest :: ./Fuzzer-x86_64-Test/Corpus.Distribution' FAILED ****
Note: Google Test filter = Corpus.Distribution
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Corpus
[ RUN      ] Corpus.Distribution
/usr/local/google/home/kcc/llvm-project/compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp:609: Failure
Expected: (Hist[i]) > (TriesPerUnit / N / 3), actual: 0 vs 2184
/usr/local/google/home/kcc/llvm-project/compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp:609: Failure
Expected: (Hist[i]) > (TriesPerUnit / N / 3), actual: 0 vs 2184
/usr/local/google/home/kcc/llvm-project/compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp:609: Failure
1100	Will a simpler syntax work, e.g.: {1,3}, {2,3}

marcel marked 4 inline comments as done. May 15 2020, 9:38 PM

marcel added inline comments.

compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp
595–596	Instead of updating the corpus distribution every time it changes (e.g., FuzzerCorpus.h#L114 and FuzzerCorpus.h#L165), entropic schedules that update by setting a flag. For efficiency, only when (and just before) a new input is chosen, the corpus distribution is actually updated. I had this update in ChooseUnitToMutate which calls ChooseUnitIdxToMutate. Now moved the call to UpdateCorpusDistribution to ChooseUnitIdxToMutate (which is used by the test case). All 40 fuzzer unit tests pass.
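
The failure mode and the fix can be illustrated with a generic "dirty flag" sketch. `LazyDistribution` below is a hypothetical class and not the InputCorpus code: changes only set a flag, and the cumulative weights are recomputed inside the lowest-level choose function, so that no caller (such as a unit test) can bypass the refresh.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical weighted chooser that defers recomputation until a choice is made.
class LazyDistribution {
public:
  void AddWeight(double W) {
    Weights.push_back(W);
    NeedsUpdate = true; // schedule the update, do not perform it yet
  }

  void SetWeight(size_t I, double W) {
    assert(I < Weights.size());
    Weights[I] = W;
    NeedsUpdate = true;
  }

  // Maps U in [0, 1) to an index with probability proportional to its weight.
  // The refresh happens here, in the function every caller goes through.
  size_t Choose(double U) {
    assert(!Weights.empty());
    UpdateIfNeeded();
    double Target = U * Cumulative.back();
    auto It = std::upper_bound(Cumulative.begin(), Cumulative.end(), Target);
    if (It == Cumulative.end())
      --It;
    return static_cast<size_t>(It - Cumulative.begin());
  }

  size_t NumRecomputations() const { return Recomputations; }

private:
  void UpdateIfNeeded() {
    if (!NeedsUpdate)
      return;
    Cumulative.resize(Weights.size());
    std::partial_sum(Weights.begin(), Weights.end(), Cumulative.begin());
    NeedsUpdate = false;
    ++Recomputations;
  }

  std::vector<double> Weights;
  std::vector<double> Cumulative;
  bool NeedsUpdate = false;
  size_t Recomputations = 0;
};
```

Several consecutive weight changes cost a single recomputation, which is the efficiency argument made above.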

Revision 8. Moved the call to UpdateCorpusDistribution from ChooseUnitToMutate to ChooseUnitIdxToMutate and removed some nits.

make -j 32 FuzzerUnitTests && \
projects/compiler-rt/lib/fuzzer/tests/Fuzzer-x86_64-Test

returns with

[==========] 40 tests from 9 test cases ran. (408 ms total)
[  PASSED  ] 40 tests.

Minor. Removed one line I used for debugging.

Thanks for this work, and the effort to make the code better!

This revision is now accepted and ready to land. May 18 2020, 11:47 AM

(let me land it)

Closed by commit rGe2e38fca64e4: Entropic: Boosting LibFuzzer Performance (authored by morehouse). May 19 2020, 10:56 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. May 19 2020, 10:56 AM

Herald added a subscriber: Restricted Project.

dokyungs added a subscriber: dokyungs. Jun 30 2020, 11:27 AM


Diff 264969

compiler-rt/lib/fuzzer/FuzzerCorpus.h

Show All 27 Lines	struct InputInfo {
Unit U; // The actual input data.		Unit U; // The actual input data.
uint8_t Sha1[kSHA1NumBytes]; // Checksum.		uint8_t Sha1[kSHA1NumBytes]; // Checksum.
// Number of features that this input has and no smaller input has.		// Number of features that this input has and no smaller input has.
size_t NumFeatures = 0;		size_t NumFeatures = 0;
size_t Tmp = 0; // Used by ValidateFeatureSet.		size_t Tmp = 0; // Used by ValidateFeatureSet.
// Stats.		// Stats.
size_t NumExecutedMutations = 0;		size_t NumExecutedMutations = 0;
size_t NumSuccessfullMutations = 0;		size_t NumSuccessfullMutations = 0;
bool MayDeleteFile = false;		bool MayDeleteFile = false;
		kccUnsubmitted Not Done Reply Inline Actions this is new in the patch, is it? While I completely understand why we'd want to use execution time as a signal for weights, it makes fuzzing process non-reproducible with a given seed, which I consider pretty bad. If we used 32- or 64- bit edge counters we could have substituted them for time, but alas, we use 8-bit ones. kcc: this is new in the patch, is it? While I completely understand why we'd want to use execution…
		marcelAuthorUnsubmitted Done Reply Inline Actions this is new in the patch, is it? Yes. Been playing with a few smaller tweaks to boost LF performance. While I completely understand why we'd want to use execution time as a signal for weights, it makes fuzzing process non-reproducible with a given seed, which I consider pretty bad. Do you mean LibFuzzer should be fully deterministic when you start it with the same seed corpus (e.g., by fixing the random seed)? Currently, even without this patch I've been observing quite some variance in the coverage achieved over time. Happy to take it out, though, if this messes with the LF design principles. marcel: > this is new in the patch, is it? Yes. Been playing with a few smaller tweaks to boost LF…
bool Reduced = false;		bool Reduced = false;
bool HasFocusFunction = false;		bool HasFocusFunction = false;
Vector<uint32_t> UniqFeatureSet;		Vector<uint32_t> UniqFeatureSet;
Vector<uint8_t> DataFlowTraceForFocusFunction;		Vector<uint8_t> DataFlowTraceForFocusFunction;
		// Power schedule.
		vitalybukaUnsubmitted Done Reply Inline Actions it would be nice to manipulate this fields only with reasonable named methods vitalybuka: it would be nice to manipulate this fields only with reasonable named methods
		bool NeedsEnergyUpdate = false;
		kccUnsubmitted Done Reply Inline Actions NeedsEnergyUpdate? kcc: NeedsEnergyUpdate?
		double Energy = 0.0;
		vitalybukaUnsubmitted Done Reply Inline Actions could you please initialize them? vitalybuka: could you please initialize them?
		size_t SumIncidence = 0;
		kccUnsubmitted Done Reply Inline Actions why not just 'double'? kcc: why not just 'double'?
		Vector<std::pair<uint32_t, uint16_t>> FeatureFreqs;
		kccUnsubmitted Done Reply Inline Actions Any chance to have a planer data structure here? (sorted array or some such) Also, why size_t? The feature is 4 bytes and the frequency can probably be 4 or even 2 bytes. (2 bytes is hard for alignment). kcc: Any chance to have a planer data structure here? (sorted array or some such) Also, why size_t?
		marcelAuthorUnsubmitted Done Reply Inline Actions Whether or not unorder_map, see top-level comment. If unordered_map, then uint32_t for keys and values should work. marcel: Whether or not unorder_map, see top-level comment. If unordered_map, then uint32_t for keys and…

		// Delete feature Idx and its frequency from FeatureFreqs.
		bool DeleteFeatureFreq(uint32_t Idx) {
		if (FeatureFreqs.empty())
		return false;

		// Binary search over local feature frequencies sorted by index.
		auto Lower = std::lower_bound(FeatureFreqs.begin(), FeatureFreqs.end(),
		std::pair<uint32_t, uint16_t>(Idx, 0));

		if (Lower != FeatureFreqs.end() && Lower->first == Idx) {
		kccUnsubmitted Done Reply Inline Actions Lower (use upper case) kcc: Lower (use upper case)
		FeatureFreqs.erase(Lower);
		return true;
		}
		return false;
		}

		// Assign more energy to a high-entropy seed, i.e., that reveals more
		// information about the globally rare features in the neighborhood
		// of the seed. Since we do not know the entropy of a seed that has
		// never been executed we assign fresh seeds maximum entropy and
		// let II->Energy approach the true entropy from above.
		void UpdateEnergy(size_t GlobalNumberOfFeatures) {
		Energy = 0.0;
		kcc (Not Done): I'm still worried about long double due to portability. Do you actually "know" that it's critical to use long double here?
		marcel (Author, Done): You are right. After fixing frequencies to `uint16_t`, this can definitely be a `double`.
		SumIncidence = 0;

		// Apply add-one smoothing to locally discovered features.
		for (auto F : FeatureFreqs) {
		size_t LocalIncidence = F.second + 1;
		Energy -= LocalIncidence * logl(LocalIncidence);
		SumIncidence += LocalIncidence;
		}

		// Apply add-one smoothing to locally undiscovered features.
		// Energy -= 0; // since logl(1.0) == 0
		SumIncidence += (GlobalNumberOfFeatures - FeatureFreqs.size());

		// Add a single locally abundant feature and apply add-one smoothing.
		size_t AbdIncidence = NumExecutedMutations + 1;
		Energy -= AbdIncidence * logl(AbdIncidence);
		SumIncidence += AbdIncidence;

		// Normalize.
		if (SumIncidence != 0)
		Energy = (Energy / SumIncidence) + logl(SumIncidence);
		}

		// Increment the frequency of the feature Idx.
		void UpdateFeatureFrequency(uint32_t Idx) {
		NeedsEnergyUpdate = true;

		// The local feature frequencies are stored as a vector of pairs sorted by
		// feature index. If the vector is empty, push_back preserves the order.
		// Set the frequency of feature Idx to 1.
		if (FeatureFreqs.empty()) {
		FeatureFreqs.push_back(std::pair<uint32_t, uint16_t>(Idx, 1));
		return;
		}

		// Binary search over local feature frequencies sorted by index.
		auto Lower = std::lower_bound(FeatureFreqs.begin(), FeatureFreqs.end(),
		kcc (Done): Lower
		std::pair<uint32_t, uint16_t>(Idx, 0));

		// If feature Idx already exists, increment its frequency.
		// Otherwise, insert a new pair right after the next lower index.
		if (Lower != FeatureFreqs.end() && Lower->first == Idx) {
		Lower->second++;
		} else {
		FeatureFreqs.insert(Lower, std::pair<uint32_t, uint16_t>(Idx, 1));
		}
		}
		};

		struct EntropicOptions {
		bool Enabled;
		size_t NumberOfRarestFeatures;
		size_t FeatureFrequencyThreshold;
};		};

class InputCorpus {		class InputCorpus {
static const size_t kFeatureSetSize = 1 << 21;		static const uint32_t kFeatureSetSize = 1 << 21;
		static const uint8_t kMaxMutationFactor = 20;
		kcc (Done): why is this public? Also, why the g_ prefix?
		marcel (Author, Done): `g_NumExecutedMutations` contains the total number of inputs generated. It is updated in FuzzerLoop.cpp in `MutateAndTestOne`. I can drop the `g_` prefix.
		kcc (Done): yea, please drop g_.
		static const size_t kSparseEnergyUpdates = 100;

		size_t NumExecutedMutations = 0;

		EntropicOptions Entropic;
		kcc (Done): here and below: remove 'struct'.

		kcc (Done): I'd like to put these into a `struct EntropicOptions { bool Enabled; size_t EntropicFeatureFrequencyThreshold; ... }` and pass such a struct (by value) to the InputCorpus CTOR. Of course, please match the names with those in Options.
public:		public:
InputCorpus(const std::string &OutputCorpus) : OutputCorpus(OutputCorpus) {		InputCorpus(const std::string &OutputCorpus, EntropicOptions Entropic)
		: Entropic(Entropic), OutputCorpus(OutputCorpus) {
		vitalybuka (Done): please make field private and add access method if it's needed outside
memset(InputSizesPerFeature, 0, sizeof(InputSizesPerFeature));		memset(InputSizesPerFeature, 0, sizeof(InputSizesPerFeature));
memset(SmallestElementPerFeature, 0, sizeof(SmallestElementPerFeature));		memset(SmallestElementPerFeature, 0, sizeof(SmallestElementPerFeature));
}		}
~InputCorpus() {		~InputCorpus() {
for (auto II : Inputs)		for (auto II : Inputs)
delete II;		delete II;
		kcc (Done): do you need this? the following line should take care of it
}		}
		Dor1s (Done): since this is a vector, I don't think we need to manually clear it in the destructor
size_t size() const { return Inputs.size(); }		size_t size() const { return Inputs.size(); }
size_t SizeInBytes() const {		size_t SizeInBytes() const {
size_t Res = 0;		size_t Res = 0;
for (auto II : Inputs)		for (auto II : Inputs)
Res += II->U.size();		Res += II->U.size();
return Res;		return Res;
}		}
size_t NumActiveUnits() const {		size_t NumActiveUnits() const {
size_t Res = 0;		size_t Res = 0;
for (auto II : Inputs)		for (auto II : Inputs)
Res += !II->U.empty();		Res += !II->U.empty();
return Res;		return Res;
}		}
size_t MaxInputSize() const {		size_t MaxInputSize() const {
size_t Res = 0;		size_t Res = 0;
for (auto II : Inputs)		for (auto II : Inputs)
Res = std::max(Res, II->U.size());		Res = std::max(Res, II->U.size());
return Res;		return Res;
}		}
		void IncrementNumExecutedMutations() { NumExecutedMutations++; }

		kcc (Done): Please try to follow the coding style, i.e. capitalize the names: Singletons. Also, the term Singleton may be a bit too overloaded. How about NumFeaturesCoveredByASingleInput or some such?
size_t NumInputsThatTouchFocusFunction() {		size_t NumInputsThatTouchFocusFunction() {
return std::count_if(Inputs.begin(), Inputs.end(), [](const InputInfo *II) {		return std::count_if(Inputs.begin(), Inputs.end(), [](const InputInfo *II) {
return II->HasFocusFunction;		return II->HasFocusFunction;
});		});
}		}

size_t NumInputsWithDataFlowTrace() {		size_t NumInputsWithDataFlowTrace() {
return std::count_if(Inputs.begin(), Inputs.end(), [](const InputInfo *II) {		return std::count_if(Inputs.begin(), Inputs.end(), [](const InputInfo *II) {
// ... 12 lines not shown ...	if (FeatureDebug)
Printf("ADD_TO_CORPUS %zd NF %zd\n", Inputs.size(), NumFeatures);		Printf("ADD_TO_CORPUS %zd NF %zd\n", Inputs.size(), NumFeatures);
Inputs.push_back(new InputInfo());		Inputs.push_back(new InputInfo());
InputInfo &II = *Inputs.back();		InputInfo &II = *Inputs.back();
II.U = U;		II.U = U;
II.NumFeatures = NumFeatures;		II.NumFeatures = NumFeatures;
II.MayDeleteFile = MayDeleteFile;		II.MayDeleteFile = MayDeleteFile;
II.UniqFeatureSet = FeatureSet;		II.UniqFeatureSet = FeatureSet;
II.HasFocusFunction = HasFocusFunction;		II.HasFocusFunction = HasFocusFunction;
		// Assign maximal energy to the new seed.
		II.Energy = RareFeatures.empty() ? 1.0 : log(RareFeatures.size());
		Dor1s (Done): nit: seems like `log` should be sufficient, as Energy is `double`, not `long double`
		II.SumIncidence = RareFeatures.size();
		II.NeedsEnergyUpdate = false;
std::sort(II.UniqFeatureSet.begin(), II.UniqFeatureSet.end());		std::sort(II.UniqFeatureSet.begin(), II.UniqFeatureSet.end());
ComputeSHA1(U.data(), U.size(), II.Sha1);		ComputeSHA1(U.data(), U.size(), II.Sha1);
auto Sha1Str = Sha1ToString(II.Sha1);		auto Sha1Str = Sha1ToString(II.Sha1);
Hashes.insert(Sha1Str);		Hashes.insert(Sha1Str);
if (HasFocusFunction)		if (HasFocusFunction)
if (auto V = DFT.Get(Sha1Str))		if (auto V = DFT.Get(Sha1Str))
II.DataFlowTraceForFocusFunction = *V;		II.DataFlowTraceForFocusFunction = *V;
// This is a gross heuristic.		// This is a gross heuristic.
// Ideally, when we add an element to a corpus we need to know its DFT.		// Ideally, when we add an element to a corpus we need to know its DFT.
// But if we don't, we'll use the DFT of its base input.		// But if we don't, we'll use the DFT of its base input.
if (II.DataFlowTraceForFocusFunction.empty() && BaseII)		if (II.DataFlowTraceForFocusFunction.empty() && BaseII)
II.DataFlowTraceForFocusFunction = BaseII->DataFlowTraceForFocusFunction;		II.DataFlowTraceForFocusFunction = BaseII->DataFlowTraceForFocusFunction;
UpdateCorpusDistribution();		DistributionNeedsUpdate = true;
PrintCorpus();		PrintCorpus();
// ValidateFeatureSet();		// ValidateFeatureSet();
return &II;		return &II;
}		}

// Debug-only		// Debug-only
void PrintUnit(const Unit &U) {		void PrintUnit(const Unit &U) {
if (!FeatureDebug) return;		if (!FeatureDebug) return;
// ... 34 lines not shown ...	public:
void Replace(InputInfo *II, const Unit &U) {		void Replace(InputInfo *II, const Unit &U) {
assert(II->U.size() > U.size());		assert(II->U.size() > U.size());
Hashes.erase(Sha1ToString(II->Sha1));		Hashes.erase(Sha1ToString(II->Sha1));
DeleteFile(*II);		DeleteFile(*II);
ComputeSHA1(U.data(), U.size(), II->Sha1);		ComputeSHA1(U.data(), U.size(), II->Sha1);
Hashes.insert(Sha1ToString(II->Sha1));		Hashes.insert(Sha1ToString(II->Sha1));
II->U = U;		II->U = U;
II->Reduced = true;		II->Reduced = true;
UpdateCorpusDistribution();		DistributionNeedsUpdate = true;
}		}

bool HasUnit(const Unit &U) { return Hashes.count(Hash(U)); }		bool HasUnit(const Unit &U) { return Hashes.count(Hash(U)); }
bool HasUnit(const std::string &H) { return Hashes.count(H); }		bool HasUnit(const std::string &H) { return Hashes.count(H); }
InputInfo &ChooseUnitToMutate(Random &Rand) {		InputInfo &ChooseUnitToMutate(Random &Rand) {
InputInfo &II = *Inputs[ChooseUnitIdxToMutate(Rand)];		InputInfo &II = *Inputs[ChooseUnitIdxToMutate(Rand)];
assert(!II.U.empty());		assert(!II.U.empty());
return II;		return II;
}		}

// Returns an index of random unit from the corpus to mutate.		// Returns an index of random unit from the corpus to mutate.
size_t ChooseUnitIdxToMutate(Random &Rand) {		size_t ChooseUnitIdxToMutate(Random &Rand) {
		UpdateCorpusDistribution(Rand);
size_t Idx = static_cast<size_t>(CorpusDistribution(Rand));		size_t Idx = static_cast<size_t>(CorpusDistribution(Rand));
assert(Idx < Inputs.size());		assert(Idx < Inputs.size());
return Idx;		return Idx;
}		}

void PrintStats() {		void PrintStats() {
for (size_t i = 0; i < Inputs.size(); i++) {		for (size_t i = 0; i < Inputs.size(); i++) {
const auto &II = *Inputs[i];		const auto &II = *Inputs[i];
// ... 19 lines not shown ...	void DeleteFile(const InputInfo &II) {
if (!OutputCorpus.empty() && II.MayDeleteFile)		if (!OutputCorpus.empty() && II.MayDeleteFile)
RemoveFile(DirPlusFile(OutputCorpus, Sha1ToString(II.Sha1)));		RemoveFile(DirPlusFile(OutputCorpus, Sha1ToString(II.Sha1)));
}		}

void DeleteInput(size_t Idx) {		void DeleteInput(size_t Idx) {
InputInfo &II = *Inputs[Idx];		InputInfo &II = *Inputs[Idx];
DeleteFile(II);		DeleteFile(II);
Unit().swap(II.U);		Unit().swap(II.U);
		II.Energy = 0.0;
		II.NeedsEnergyUpdate = false;
		DistributionNeedsUpdate = true;
if (FeatureDebug)		if (FeatureDebug)
Printf("EVICTED %zd\n", Idx);		Printf("EVICTED %zd\n", Idx);
}		}

		void AddRareFeature(uint32_t Idx) {
		vitalybuka (Done): DeleteFeatureFreq -> InputInfo::DeleteFeatureFreq
		marcel (Author, Done): Moved directly into the InputInfo struct.
		// Maintain at least Entropic.NumberOfRarestFeatures rare features
		// and all features with a frequency below
		// Entropic.FeatureFrequencyThreshold. Remove all other features.
		while (RareFeatures.size() > Entropic.NumberOfRarestFeatures &&
		FreqOfMostAbundantRareFeature > Entropic.FeatureFrequencyThreshold) {

		kcc (Done): try not to use constants like this. At the very least, use `const size_t kSomething = 100; ... if (xxx.size() > kSomething)`. If there is any reason to play with different values, you may want to use a flag instead. Similar for 0xFF. Probably, the best here is to replace these two constants with function parameters so that this function becomes unit-testable. Which reminds me: please consider adding unit tests to the key functionality like this one.
		marcel (Author, Done): We will pull these constants out as command line options.
		// Find most and second most abundant feature.
		uint32_t MostAbundantRareFeatureIndices[2] = {RareFeatures[0],
		RareFeatures[0]};
		size_t Delete = 0;
		for (size_t i = 0; i < RareFeatures.size(); i++) {
		uint32_t Idx2 = RareFeatures[i];
		if (GlobalFeatureFreqs[Idx2] >=
		GlobalFeatureFreqs[MostAbundantRareFeatureIndices[0]]) {
		MostAbundantRareFeatureIndices[1] = MostAbundantRareFeatureIndices[0];
		MostAbundantRareFeatureIndices[0] = Idx2;
		Delete = i;
		}
		}

		// Remove most abundant rare feature.
		RareFeatures[Delete] = RareFeatures.back();
		vitalybuka (Done): `uint32_t MostAbundantRareFeatureIdx[2] = {}` or just `MostAbundantRareFeatureIdx1`, `MostAbundantRareFeatureIdx2`
		RareFeatures.pop_back();

		for (auto II : Inputs) {
		if (II->DeleteFeatureFreq(MostAbundantRareFeatureIndices[0]))
		II->NeedsEnergyUpdate = true;
		Dor1s (Not Done): nit: I'd rather use `auto` or `decltype(RareFeatures)::value_type` to avoid type mismatch if we ever change `RareFeatures` definition and forget to change the type here
		}

		// Set 2nd most abundant as the new most abundant feature count.
		FreqOfMostAbundantRareFeature =
		GlobalFeatureFreqs[MostAbundantRareFeatureIndices[1]];
		}

		// Add rare feature, handle collisions, and update energy.
		RareFeatures.push_back(Idx);
		Dor1s (Not Done): assuming this code gets executed quite often, and the order inside `RareFeatures` isn't important, we can avoid erase-remove and do something like `RareFeatures[index_from_the_loop] = RareFeatures.back(); RareFeatures.resize(RareFeatures.size() - 1);` but the loop on line 269 would have to use index in the vector (from 1 to `< RareFeatures.size()`) instead of the iterator. Feel free to ignore though, it's just a suggestion which may or may not be a good one :)
		marcel (Author, Done): With the subsequent push_back (Line 292), do you mean a swap and pop_back here?
		Dor1s (Done): yes, `swap` and `pop_back` would have the same effect
		marcel (Author, Done): Implemented your swap and pop_back idea. Cheers!
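
The swap-and-pop_back idea from this thread can be sketched in isolation. This is a minimal illustration rather than the patch's code: `UnorderedErase` is a hypothetical helper name, and the sketch assumes, as Dor1s notes, that the order of the remaining elements does not matter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: removes V[Pos] in constant time by overwriting it with
// the last element and shrinking the vector by one. The relative order of the
// remaining elements is not preserved.
void UnorderedErase(std::vector<uint32_t> &V, size_t Pos) {
  assert(Pos < V.size());
  V[Pos] = V.back();
  V.pop_back();
}
```

Compared with `erase`, which shifts every later element down by one slot, this touches a single element, which is why it suits a container such as `RareFeatures` that is scanned linearly anyway.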
		GlobalFeatureFreqs[Idx] = 0;
		for (auto II : Inputs) {
		II->DeleteFeatureFreq(Idx);

		// Apply add-one smoothing to this locally undiscovered feature.
		// Zero energy seeds will never be fuzzed and remain zero energy.
		if (II->Energy > 0.0) {
		II->SumIncidence += 1;
		II->Energy += logl(II->SumIncidence) / II->SumIncidence;
		}
		}

		DistributionNeedsUpdate = true;
		}
		Dor1s (Done): since we always do this, `resize()` in my suggestion above might not be even needed, but that's a minor thing

bool AddFeature(size_t Idx, uint32_t NewSize, bool Shrink) {		bool AddFeature(size_t Idx, uint32_t NewSize, bool Shrink) {
assert(NewSize);		assert(NewSize);
Idx = Idx % kFeatureSetSize;		Idx = Idx % kFeatureSetSize;
uint32_t OldSize = GetFeature(Idx);		uint32_t OldSize = GetFeature(Idx);
if (OldSize == 0 || (Shrink && OldSize > NewSize)) {		if (OldSize == 0 || (Shrink && OldSize > NewSize)) {
if (OldSize > 0) {		if (OldSize > 0) {
		Dor1s (Done): can this and the loop on the line 294 be combined?
size_t OldIdx = SmallestElementPerFeature[Idx];		size_t OldIdx = SmallestElementPerFeature[Idx];
		Dor1s (Done): please consider adding a comment why we skip inputs with zero energy
InputInfo &II = *Inputs[OldIdx];		InputInfo &II = *Inputs[OldIdx];
assert(II.NumFeatures > 0);		assert(II.NumFeatures > 0);
II.NumFeatures--;		II.NumFeatures--;
if (II.NumFeatures == 0)		if (II.NumFeatures == 0)
DeleteInput(OldIdx);		DeleteInput(OldIdx);
} else {		} else {
NumAddedFeatures++;		NumAddedFeatures++;
		if (Entropic.Enabled)
		AddRareFeature((uint32_t)Idx);
}		}
NumUpdatedFeatures++;		NumUpdatedFeatures++;
if (FeatureDebug)		if (FeatureDebug)
Printf("ADD FEATURE %zd sz %d\n", Idx, NewSize);		Printf("ADD FEATURE %zd sz %d\n", Idx, NewSize);
SmallestElementPerFeature[Idx] = Inputs.size();		SmallestElementPerFeature[Idx] = Inputs.size();
InputSizesPerFeature[Idx] = NewSize;		InputSizesPerFeature[Idx] = NewSize;
return true;		return true;
}		}
return false;		return false;
}		}

		// Increment frequency of feature Idx globally and locally.
		kcc (Done): This is the hotspot, right? My guess is that using a hash map here causes most of the slowdown, need a faster data structure... Need to think... It will help if you describe the algorithm in a comment.
		marcel (Author, Done): Yes. This is the hot spot. However, most executions should exit in Line 337. FeaturesFreqMap is an array. Don't think you can get much faster here.
		vitalybuka (Done): InputInfo::UpdateFeatureFrequency
		void UpdateFeatureFrequency(InputInfo *II, size_t Idx) {
		uint32_t Idx32 = Idx % kFeatureSetSize;

		// Saturated increment.
		if (GlobalFeatureFreqs[Idx32] == 0xFFFF)
		return;
		uint16_t Freq = GlobalFeatureFreqs[Idx32]++;

		// Skip if abundant.
		if (Freq > FreqOfMostAbundantRareFeature ||
		std::find(RareFeatures.begin(), RareFeatures.end(), Idx32) ==
		RareFeatures.end())
		return;

		// Update global frequencies.
		if (Freq == FreqOfMostAbundantRareFeature)
		FreqOfMostAbundantRareFeature++;

		// Update local frequencies.
		Dor1s (Not Done): invert the condition to reduce indentation: `if (!II) return;`
		if (II)
		II->UpdateFeatureFrequency(Idx32);
		Dor1s (Done): can do return after this line and avoid `else` with extra indentation below
		Dor1s (Done): what is `1` here? would it make sense to have a static const variable with a descriptive name, e.g. `kDefaultSomething`?
		marcel (Author, Done): We are setting the frequency of Idx32 to 1. Adding a comment.
		}

size_t NumFeatures() const { return NumAddedFeatures; }		size_t NumFeatures() const { return NumAddedFeatures; }
size_t NumFeatureUpdates() const { return NumUpdatedFeatures; }		size_t NumFeatureUpdates() const { return NumUpdatedFeatures; }

		Dor1s (Not Done): same point, what is `0`? a constant with a descriptive name or (less preferred) comment would be really helpful for others who come to read the code in future
		marcel (Author, Done): Zero (0) is the default value for lower_bound as binary search over a vector of pairs. In this case, any value would work.
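
The role of the `0` in the `lower_bound` probe can be seen in a standalone sketch. `BumpFrequency` and the `FeatureFreq` alias are hypothetical names used only for illustration; the logic mirrors the sorted-vector update in the patch. Because `std::pair` compares lexicographically, `(Idx, 0)` is never greater than a stored pair with the same index, so the search lands on the existing entry if there is one. A larger second element could skip past an existing entry whose count is smaller than the probe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using FeatureFreq = std::pair<uint32_t, uint16_t>;

// Hypothetical helper: increments the count for Idx in a vector kept sorted by
// feature index, inserting (Idx, 1) at the sorted position if Idx is absent.
void BumpFrequency(std::vector<FeatureFreq> &Freqs, uint32_t Idx) {
  // (Idx, 0) is the smallest pair with this index, so lower_bound returns the
  // stored entry for Idx if present, or the insertion point otherwise.
  auto Lower =
      std::lower_bound(Freqs.begin(), Freqs.end(), FeatureFreq(Idx, 0));
  if (Lower != Freqs.end() && Lower->first == Idx)
    Lower->second++;
  else
    Freqs.insert(Lower, FeatureFreq(Idx, 1));
}
```

Calling `BumpFrequency` with indices 7, 3, 7 on an empty vector leaves two entries in index order, with feature 7 counted twice.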
private:		private:

static const bool FeatureDebug = false;		static const bool FeatureDebug = false;

size_t GetFeature(size_t Idx) const { return InputSizesPerFeature[Idx]; }		size_t GetFeature(size_t Idx) const { return InputSizesPerFeature[Idx]; }

void ValidateFeatureSet() {		void ValidateFeatureSet() {
if (FeatureDebug)		if (FeatureDebug)
PrintFeatureSet();		PrintFeatureSet();
for (size_t Idx = 0; Idx < kFeatureSetSize; Idx++)		for (size_t Idx = 0; Idx < kFeatureSetSize; Idx++)
if (GetFeature(Idx))		if (GetFeature(Idx))
Inputs[SmallestElementPerFeature[Idx]]->Tmp++;		Inputs[SmallestElementPerFeature[Idx]]->Tmp++;
for (auto II: Inputs) {		for (auto II: Inputs) {
if (II->Tmp != II->NumFeatures)		if (II->Tmp != II->NumFeatures)
Printf("ZZZ %zd %zd\n", II->Tmp, II->NumFeatures);		Printf("ZZZ %zd %zd\n", II->Tmp, II->NumFeatures);
assert(II->Tmp == II->NumFeatures);		assert(II->Tmp == II->NumFeatures);
II->Tmp = 0;		II->Tmp = 0;
}		}
		Dor1s (Done): From the code below it seems like `Energy` represents entropy and the max value is 0, which we reduce depending on the actual feature frequencies. Is that correct understanding?
		marcel (Author, Done): Yes, we estimate the entropy over the probabilities of the features in the neighborhood of the seed. Entropy is positive. The maximum entropy is `logl(GlobalNumberOfFeatures)`.
		Dor1s (Not Done): sorry, I don't understand. Below are the code lines changing `Energy` value:
		```
		II->Energy = 0.0;
		II->SumIncidence = 0;
		// Apply add-one smoothing to locally discovered features.
		for (auto F : II->FeatureFreqs) {
		  size_t LocalIncidence = F.second + 1;
		  Energy -= LocalIncidence * logl(LocalIncidence);
		  SumIncidence += LocalIncidence;
		}
		<...>
		// Add a single locally abundant feature apply add-one smoothing.
		size_t AbdIncidence = II->NumExecutedMutations + 1;
		Energy -= AbdIncidence * logl(AbdIncidence);
		<...>
		// Normalize.
		if (SumIncidence != 0)
		  Energy = (Energy / SumIncidence) + logl(SumIncidence);
		II->Energy = (double)Energy;
		<...>
		}
		```
		as I read this, I see that `Energy` should be negative in many cases?
		marcel (Author, Done): Sorry for the brevity. This is why `II->Energy` is positive. Entropy is computed as $-\sum_{i=1}^S p_i \log(p_i)$, where $p_i$ is the probability that fuzzing `II` generates an input that exercises feature $i$ and $S$ is the total number of rare features. We could estimate the probability $p_i$ as the proportion of generated inputs that exercise $i$, i.e., $\hat p_i = LocalIncidence_i / SumIncidence$. If you plug this proportion into the formula for entropy, you can compute entropy as $[-\sum_{i=1}^S LocalIncidence_i \log(LocalIncidence_i)] / SumIncidence + \log(SumIncidence)$. While Energy is certainly negative before `// Normalize.`, it is positive after. Just drop me a DM. I'll send you the write up.
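
The rearrangement in this reply follows from $\log(n_i/N) = \log n_i - \log N$ and $\sum_i n_i = N$, writing $n_i$ for `LocalIncidence_i` and $N$ for `SumIncidence`. A small sketch can check it numerically. The function names below are hypothetical and the code is a simplified stand-in for `UpdateEnergy` (no smoothing, plain `double`), intended only to show that both forms give the same non-negative value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Direct form: H = -sum_i p_i * log(p_i), with p_i = n_i / N.
double EntropyDirect(const std::vector<size_t> &Incidence) {
  double N = 0;
  for (size_t n : Incidence)
    N += n;
  double H = 0;
  for (size_t n : Incidence) {
    if (n == 0)
      continue; // A zero count contributes nothing to the sum.
    double P = n / N;
    H -= P * std::log(P);
  }
  return H;
}

// Rearranged form: H = (-sum_i n_i * log(n_i)) / N + log(N).
// The running sum is negative; the normalization step makes it non-negative.
double EntropyRearranged(const std::vector<size_t> &Incidence) {
  double N = 0, Acc = 0;
  for (size_t n : Incidence) {
    if (n == 0)
      continue;
    Acc -= n * std::log(static_cast<double>(n));
    N += n;
  }
  return N == 0 ? 0.0 : Acc / N + std::log(N);
}
```

For four equally frequent features both functions return $\log 4$, the maximum for four outcomes, which matches the statement above that a fresh seed starts at maximal entropy.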
}		}

// Updates the probability distribution for the units in the corpus.		// Updates the probability distribution for the units in the corpus.
		kcc (Done): a top-level comment explaining the computations would be nice
// Must be called whenever the corpus or unit weights are changed.		// Must be called whenever the corpus or unit weights are changed.
		vitalybuka (Done): It would be nice to rename variables in such a way that a reader without background can understand what is going on.
//		//
// Hypothesis: units added to the corpus last are more interesting.		// Hypothesis: inputs that maximize information about globally rare features
//		// are interesting.
		vitalybuka (Done): InputInfo::UpdateEnergy
// Hypothesis: inputs with infrequent features are more interesting.		void UpdateCorpusDistribution(Random &Rand) {
		vitalybuka (Not Done): "long double" is still there?
		marcel (Author, Done): Yes, keeping maximum precision during the processing to minimize the cumulative FP arithmetic error, and downcast to double once the processing is done.
void UpdateCorpusDistribution() {		// Skip update if no seeds or rare features were added/deleted.
		// Sparse updates for local change of feature frequencies,
		kcc (Done): use () just in case; replace 10000 with kSomething; don't use random(), pass (Random &Rand) instead
		// i.e., randomly do not skip.
		if (!DistributionNeedsUpdate &&
		vitalybuka (Done): please don't reuse variables like Y here; just declare as close as possible to first use, or even better with assignment
		Dor1s (Done): it seems like the `Rand(kSparseEnergyUpdates)` clause is applicable to the `Entropic` case only, is that correct? Do we really need it in the vanilla case?
		marcel (Author, Done): Yes. `kSparseEnergyUpdates` should apply only for Entropic.
		(!Entropic.Enabled || Rand(kSparseEnergyUpdates)))
		return;
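
The early-return condition above can be read as a small truth table. The sketch below is a hypothetical stand-in, not code from the patch: `ShouldSkipUpdate` is an invented name, and `Draw` models the result of `Rand(kSparseEnergyUpdates)` under the assumption that `Rand(n)` returns a value in `[0, n)`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the early-return test in UpdateCorpusDistribution.
// Draw models Rand(kSparseEnergyUpdates), assumed to lie in
// [0, kSparseEnergyUpdates).
bool ShouldSkipUpdate(bool DistributionNeedsUpdate, bool EntropicEnabled,
                      size_t Draw) {
  // A pending structural change always forces an update. Otherwise the
  // vanilla schedule always skips, and Entropic skips unless Draw is zero.
  return !DistributionNeedsUpdate && (!EntropicEnabled || Draw != 0);
}
```

Under that assumption, and with `kSparseEnergyUpdates` set to 100, an Entropic run with no pending corpus change would refresh the energies on roughly one call in a hundred, which is what makes the updates "sparse".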

		DistributionNeedsUpdate = false;
		kcc (Done): why not `for (auto It : I->RareFeaturesFreqMap)`?

size_t N = Inputs.size();		size_t N = Inputs.size();
assert(N);		assert(N);
Intervals.resize(N + 1);		Intervals.resize(N + 1);
Weights.resize(N);		Weights.resize(N);
std::iota(Intervals.begin(), Intervals.end(), 0);		std::iota(Intervals.begin(), Intervals.end(), 0);

		bool VanillaSchedule = true;
		if (Entropic.Enabled) {
		for (auto II : Inputs) {
		kcc (Done): don't use libc's rand(), pass (Random &Rand) instead
		if (II->NeedsEnergyUpdate && II->Energy != 0.0) {
		II->NeedsEnergyUpdate = false;
		II->UpdateEnergy(RareFeatures.size());
		}
		}

		for (size_t i = 0; i < N; i++) {

		if (Inputs[i]->NumFeatures == 0) {
		// If the seed doesn't represent any features, assign zero energy.
		Weights[i] = 0.;
		} else if (Inputs[i]->NumExecutedMutations / kMaxMutationFactor >
		NumExecutedMutations / Inputs.size()) {
		// If the seed was fuzzed a lot more than average, assign zero energy.
		Weights[i] = 0.;
		Dor1s (Done): could you please rewrite this in a more readable if-else form?
		} else {
		// Otherwise, simply assign the computed energy.
		Dor1s (Done): why `20`? a constant with a descriptive name or a comment would be appreciated
		Weights[i] = Inputs[i]->Energy;
		}

		// If energy for all seeds is zero, fall back to vanilla schedule.
		if (Weights[i] > 0.0)
		VanillaSchedule = false;
		}
		}

		if (VanillaSchedule) {
for (size_t i = 0; i < N; i++)		for (size_t i = 0; i < N; i++)
Weights[i] = Inputs[i]->NumFeatures		Weights[i] = Inputs[i]->NumFeatures
		Dor1s (Done): so if there is at least one input that touches focus function, we will be always wasting time in this loop (starting on the line 472) and then falling back to the `VanillaSchedule` case? In such case I think we should just check `Options.FocusFunction` and use vanilla schedule if it's set, just because almost always there will be input(s) touching the focus function
? (i + 1) * (Inputs[i]->HasFocusFunction ? 1000 : 1)		? (i + 1) * (Inputs[i]->HasFocusFunction ? 1000 : 1)
: 0.;		: 0.;
		}

if (FeatureDebug) {		if (FeatureDebug) {
for (size_t i = 0; i < N; i++)		for (size_t i = 0; i < N; i++)
Printf("%zd ", Inputs[i]->NumFeatures);		Printf("%zd ", Inputs[i]->NumFeatures);
Printf("SCORE\n");		Printf("SCORE\n");
for (size_t i = 0; i < N; i++)		for (size_t i = 0; i < N; i++)
Printf("%f ", Weights[i]);		Printf("%f ", Weights[i]);
Printf("Weights\n");		Printf("Weights\n");
}		}
CorpusDistribution = std::piecewise_constant_distribution<double>(		CorpusDistribution = std::piecewise_constant_distribution<double>(
Intervals.begin(), Intervals.end(), Weights.begin());		Intervals.begin(), Intervals.end(), Weights.begin());
}		}
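
How the weights turn into a choice of seed can be sketched on its own. `PickWeightedIndex` is a hypothetical name; the sketch assumes the same unit-width interval layout that `Intervals` receives from `std::iota` above, so that the integer part of the sampled value is the index of the chosen input.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: picks index i with probability proportional to
// Weights[i]. Interval i is [i, i + 1), so truncating the sample gives i.
template <typename Rng>
size_t PickWeightedIndex(const std::vector<double> &Weights, Rng &Gen) {
  assert(!Weights.empty());
  std::vector<double> Intervals(Weights.size() + 1);
  std::iota(Intervals.begin(), Intervals.end(), 0.0);
  std::piecewise_constant_distribution<double> Dist(
      Intervals.begin(), Intervals.end(), Weights.begin());
  return static_cast<size_t>(Dist(Gen));
}
```

Inputs with weight zero occupy an interval of zero probability mass, which is how evicted or over-fuzzed seeds drop out of the schedule without being removed from `Inputs`.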
std::piecewise_constant_distribution<double> CorpusDistribution;		std::piecewise_constant_distribution<double> CorpusDistribution;

Vector<double> Intervals;		Vector<double> Intervals;
Vector<double> Weights;		Vector<double> Weights;

std::unordered_set<std::string> Hashes;		std::unordered_set<std::string> Hashes;
Vector<InputInfo*> Inputs;		Vector<InputInfo*> Inputs;

size_t NumAddedFeatures = 0;		size_t NumAddedFeatures = 0;
size_t NumUpdatedFeatures = 0;		size_t NumUpdatedFeatures = 0;
uint32_t InputSizesPerFeature[kFeatureSetSize];		uint32_t InputSizesPerFeature[kFeatureSetSize];
uint32_t SmallestElementPerFeature[kFeatureSetSize];		uint32_t SmallestElementPerFeature[kFeatureSetSize];

		bool DistributionNeedsUpdate = true;
		uint16_t FreqOfMostAbundantRareFeature = 0;
		uint16_t GlobalFeatureFreqs[kFeatureSetSize] = {};
		kcc (Done): does this have to be 8-byte per feature?
		marcel (Author, Done): Each entry is upper-bounded by the total number of generated inputs `g_NumExecutedMutations` which likely won't fit into `uint32_t`. That's why I chose size_t. However, we really only need abundance information for features with an abundance below `MostAbundant_RareFeature`. If memory footprint is a concern, we can go down to `uint16_t` at the cost of an overflow check in the hot code in `UpdateFeatureFrequency`. See top-level comment for more details.
		kcc (Done): Yea, I'd prefer uint16_t and a saturated add. Just to save some RAM
		vitalybuka (Not Done): `uint16_t GlobalFeatureFreqs[kFeatureSetSize] = {};` instead of memsets. It would be nice to do the same for other arrays here, but in a separate patch.
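
The saturated add that kcc asks for can be sketched as follows. `SaturatingIncrement` is a hypothetical helper name; the patch inlines the equivalent check against `0xFFFF` in `UpdateFeatureFrequency`. With `kFeatureSetSize = 1 << 21` entries, a `uint16_t` table takes 4 MiB where an 8-byte `size_t` table would take 16 MiB.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: increments Counter unless it already holds the largest
// uint16_t value, so the count sticks at 0xFFFF instead of wrapping to 0.
// Returns the value held before the call.
uint16_t SaturatingIncrement(uint16_t &Counter) {
  uint16_t Old = Counter;
  if (Counter != std::numeric_limits<uint16_t>::max())
    ++Counter;
  return Old;
}
```

Saturating is acceptable here because, as marcel notes, only the frequencies of rare features matter; a feature whose counter has reached the cap is far from rare.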
		Vector<uint32_t> RareFeatures;

std::string OutputCorpus;		std::string OutputCorpus;
};		};

} // namespace fuzzer		} // namespace fuzzer

#endif // LLVM_FUZZER_CORPUS		#endif // LLVM_FUZZER_CORPUS
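
The "uint16_t and a saturated add" suggested in the review thread above can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper name `SaturatingIncrement` is hypothetical, and the real counter update lives in `UpdateFeatureFrequency`, which is not shown in this part of the diff.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the patch): increment a 16-bit frequency
// counter, but stay at the maximum value instead of wrapping around to 0.
inline uint16_t SaturatingIncrement(uint16_t Count) {
  const uint16_t Max = std::numeric_limits<uint16_t>::max();
  return Count == Max ? Max : static_cast<uint16_t>(Count + 1);
}
```

The trade-off discussed by the reviewers is one extra comparison per update in exchange for a quarter of the memory of an 8-byte counter per feature. Saturation is harmless here as long as only counts below the rare-feature threshold matter, which is the argument made in the thread.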

compiler-rt/lib/fuzzer/FuzzerDriver.cpp

[702 lines not shown]	int FuzzerDriver(int argc, char **argv, UserCallback Callback) {
if (Flags.data_flow_trace)		if (Flags.data_flow_trace)
Options.DataFlowTrace = Flags.data_flow_trace;		Options.DataFlowTrace = Flags.data_flow_trace;
if (Flags.features_dir)		if (Flags.features_dir)
Options.FeaturesDir = Flags.features_dir;		Options.FeaturesDir = Flags.features_dir;
if (Flags.collect_data_flow)		if (Flags.collect_data_flow)
Options.CollectDataFlow = Flags.collect_data_flow;		Options.CollectDataFlow = Flags.collect_data_flow;
if (Flags.stop_file)		if (Flags.stop_file)
Options.StopFile = Flags.stop_file;		Options.StopFile = Flags.stop_file;
		Options.Entropic = Flags.entropic;
		Options.EntropicFeatureFrequencyThreshold =
		(size_t)Flags.entropic_feature_frequency_threshold;
		Options.EntropicNumberOfRarestFeatures =
		(size_t)Flags.entropic_number_of_rarest_features;
		if (Options.Entropic) {
		if (!Options.FocusFunction.empty()) {
		Printf("ERROR: The parameters `--entropic` and `--focus_function` cannot "
		"be used together.\n");
		exit(1);
		}
		Printf("INFO: Running with entropic power schedule (0x%X, %d).\n",
		Options.EntropicFeatureFrequencyThreshold,
		Options.EntropicNumberOfRarestFeatures);
		}
		struct EntropicOptions Entropic;
		Entropic.Enabled = Options.Entropic;
		Entropic.FeatureFrequencyThreshold =
		Options.EntropicFeatureFrequencyThreshold;
		Entropic.NumberOfRarestFeatures = Options.EntropicNumberOfRarestFeatures;

unsigned Seed = Flags.seed;		unsigned Seed = Flags.seed;
// Initialize Seed.		// Initialize Seed.
if (Seed == 0)		if (Seed == 0)
Seed =		Seed =
std::chrono::system_clock::now().time_since_epoch().count() + GetPid();		std::chrono::system_clock::now().time_since_epoch().count() + GetPid();
if (Flags.verbosity)		if (Flags.verbosity)
Printf("INFO: Seed: %u\n", Seed);		Printf("INFO: Seed: %u\n", Seed);

if (Flags.collect_data_flow && !Flags.fork && !Flags.merge) {		if (Flags.collect_data_flow && !Flags.fork && !Flags.merge) {
if (RunIndividualFiles)		if (RunIndividualFiles)
return CollectDataFlow(Flags.collect_data_flow, Flags.data_flow_trace,		return CollectDataFlow(Flags.collect_data_flow, Flags.data_flow_trace,
ReadCorpora({}, *Inputs));		ReadCorpora({}, *Inputs));
else		else
return CollectDataFlow(Flags.collect_data_flow, Flags.data_flow_trace,		return CollectDataFlow(Flags.collect_data_flow, Flags.data_flow_trace,
ReadCorpora(*Inputs, {}));		ReadCorpora(*Inputs, {}));
}		}

Random Rand(Seed);		Random Rand(Seed);
auto *MD = new MutationDispatcher(Rand, Options);		auto *MD = new MutationDispatcher(Rand, Options);
auto *Corpus = new InputCorpus(Options.OutputCorpus);		auto *Corpus = new InputCorpus(Options.OutputCorpus, Entropic);
auto F = new Fuzzer(Callback, Corpus, *MD, Options);		auto F = new Fuzzer(Callback, Corpus, *MD, Options);

for (auto &U: Dictionary)		for (auto &U: Dictionary)
if (U.size() <= Word::GetMaxSize())		if (U.size() <= Word::GetMaxSize())
MD->AddWordToManualDictionary(Word(U.data(), U.size()));		MD->AddWordToManualDictionary(Word(U.data(), U.size()));

// Threads are only supported by Chrome. Don't use them with emscripten		// Threads are only supported by Chrome. Don't use them with emscripten
// for now.		// for now.
[100 lines not shown]

compiler-rt/lib/fuzzer/FuzzerFlags.def

[147 lines not shown]	FUZZER_FLAG_STRING(exit_on_item, "Exit if an item with a given sha1 sum"
"Used primarily for testing libFuzzer itself.")		"Used primarily for testing libFuzzer itself.")
FUZZER_FLAG_INT(ignore_remaining_args, 0, "If 1, ignore all arguments passed "		FUZZER_FLAG_INT(ignore_remaining_args, 0, "If 1, ignore all arguments passed "
"after this one. Useful for fuzzers that need to do their own "		"after this one. Useful for fuzzers that need to do their own "
"argument parsing.")		"argument parsing.")
FUZZER_FLAG_STRING(focus_function, "Experimental. "		FUZZER_FLAG_STRING(focus_function, "Experimental. "
"Fuzzing will focus on inputs that trigger calls to this function. "		"Fuzzing will focus on inputs that trigger calls to this function. "
"If -focus_function=auto and -data_flow_trace is used, libFuzzer "		"If -focus_function=auto and -data_flow_trace is used, libFuzzer "
"will choose the focus functions automatically.")		"will choose the focus functions automatically.")
		FUZZER_FLAG_INT(entropic, 0, "Experimental. Enables entropic power schedule.")
		vitalybuka (Done): entropic -> focus_rare_features. Not sure how, but it would be nice to rename sparse_energy_updates to something meaningful to a libFuzzer user, so that it explains the behavior change rather than implementation details as it does now.
		vitalybuka (Not Done): Many of the comments are marked as "Done" but I see no changes.
		marcel (Done): Tried to address all comments either inline or in the summary. In this case, I wrote in D73776#1921184: "We keep the entropic option, though." Hope this is okay.
		FUZZER_FLAG_INT(entropic_feature_frequency_threshold, 0xFF, "Experimental. If "
		"entropic is enabled, all features which are observed less often than "
		"the specified value are considered as rare.")
		FUZZER_FLAG_INT(entropic_number_of_rarest_features, 100, "Experimental. If "
		"entropic is enabled, we keep track of the frequencies only for the "
		"Top-X least abundant features (union features that are considered as "
		"rare).")

FUZZER_FLAG_INT(analyze_dict, 0, "Experimental")		FUZZER_FLAG_INT(analyze_dict, 0, "Experimental")
FUZZER_DEPRECATED_FLAG(use_clang_coverage)		FUZZER_DEPRECATED_FLAG(use_clang_coverage)
FUZZER_FLAG_STRING(data_flow_trace, "Experimental: use the data flow trace")		FUZZER_FLAG_STRING(data_flow_trace, "Experimental: use the data flow trace")
FUZZER_FLAG_STRING(collect_data_flow,		FUZZER_FLAG_STRING(collect_data_flow,
"Experimental: collect the data flow trace")		"Experimental: collect the data flow trace")

compiler-rt/lib/fuzzer/FuzzerLoop.cpp

[469 lines not shown]	bool Fuzzer::RunOne(const uint8_t *Data, size_t Size, bool MayDeleteFile,
ExecuteCallback(Data, Size);		ExecuteCallback(Data, Size);

UniqFeatureSetTmp.clear();		UniqFeatureSetTmp.clear();
size_t FoundUniqFeaturesOfII = 0;		size_t FoundUniqFeaturesOfII = 0;
size_t NumUpdatesBefore = Corpus.NumFeatureUpdates();		size_t NumUpdatesBefore = Corpus.NumFeatureUpdates();
TPC.CollectFeatures([&](size_t Feature) {		TPC.CollectFeatures([&](size_t Feature) {
if (Corpus.AddFeature(Feature, Size, Options.Shrink))		if (Corpus.AddFeature(Feature, Size, Options.Shrink))
UniqFeatureSetTmp.push_back(Feature);		UniqFeatureSetTmp.push_back(Feature);
		if (Options.Entropic)
		Corpus.UpdateFeatureFrequency(II, Feature);
if (Options.ReduceInputs && II)		if (Options.ReduceInputs && II)
if (std::binary_search(II->UniqFeatureSet.begin(),		if (std::binary_search(II->UniqFeatureSet.begin(),
II->UniqFeatureSet.end(), Feature))		II->UniqFeatureSet.end(), Feature))
FoundUniqFeaturesOfII++;		FoundUniqFeaturesOfII++;
});		});
if (FoundUniqFeatures)		if (FoundUniqFeatures)
*FoundUniqFeatures = FoundUniqFeaturesOfII;		*FoundUniqFeatures = FoundUniqFeaturesOfII;
PrintPulseAndReportSlowInput(Data, Size);		PrintPulseAndReportSlowInput(Data, Size);
[111 lines not shown]	void Fuzzer::PrintStatusForNewUnit(const Unit &U, const char *Text) {
PrintStats(Text, "");		PrintStats(Text, "");
if (Options.Verbosity) {		if (Options.Verbosity) {
Printf(" L: %zd/%zd ", U.size(), Corpus.MaxInputSize());		Printf(" L: %zd/%zd ", U.size(), Corpus.MaxInputSize());
MD.PrintMutationSequence();		MD.PrintMutationSequence();
Printf("\n");		Printf("\n");
}		}
}		}

void Fuzzer::ReportNewCoverage(InputInfo *II, const Unit &U) {		void Fuzzer::ReportNewCoverage(InputInfo *II, const Unit &U) {
		kcc (Done): do you need this change?
		marcel (Done): Unrelated. This is just fixing a problem where LibFuzzer prints REDUCE more often than it should.
		kcc (Done): I'd prefer to not mix unrelated changes in one diff -- makes the code review quadratic. Please contribute this one separately (I am not 100% sure I understand it).
II->NumSuccessfullMutations++;		II->NumSuccessfullMutations++;
MD.RecordSuccessfulMutationSequence();		MD.RecordSuccessfulMutationSequence();
PrintStatusForNewUnit(U, II->Reduced ? "REDUCE" : "NEW ");		PrintStatusForNewUnit(U, II->Reduced ? "REDUCE" : "NEW ");
WriteToOutputCorpus(U);		WriteToOutputCorpus(U);
NumberOfNewUnitsAdded++;		NumberOfNewUnitsAdded++;
CheckExitOnSrcPosOrItem(); // Check only after the unit is saved to corpus.		CheckExitOnSrcPosOrItem(); // Check only after the unit is saved to corpus.
LastCorpusUpdateRun = TotalNumberOfRuns;		LastCorpusUpdateRun = TotalNumberOfRuns;
}		}
[58 lines not shown]	void Fuzzer::MutateAndTestOne() {

assert(MaxMutationLen > 0);		assert(MaxMutationLen > 0);

size_t CurrentMaxMutationLen =		size_t CurrentMaxMutationLen =
Min(MaxMutationLen, Max(U.size(), TmpMaxMutationLen));		Min(MaxMutationLen, Max(U.size(), TmpMaxMutationLen));
assert(CurrentMaxMutationLen > 0);		assert(CurrentMaxMutationLen > 0);

for (int i = 0; i < Options.MutateDepth; i++) {		for (int i = 0; i < Options.MutateDepth; i++) {
if (TotalNumberOfRuns >= Options.MaxNumberOfRuns)		if (TotalNumberOfRuns >= Options.MaxNumberOfRuns)
		kcc (Not Done): For consistency, please use the C++ interface for getting the current time (as elsewhere in the code). But see my comment above about time in general.
break;		break;
MaybeExitGracefully();		MaybeExitGracefully();
size_t NewSize = 0;		size_t NewSize = 0;
if (II.HasFocusFunction && !II.DataFlowTraceForFocusFunction.empty() &&		if (II.HasFocusFunction && !II.DataFlowTraceForFocusFunction.empty() &&
Size <= CurrentMaxMutationLen)		Size <= CurrentMaxMutationLen)
NewSize = MD.MutateWithMask(CurrentUnitData, Size, Size,		NewSize = MD.MutateWithMask(CurrentUnitData, Size, Size,
II.DataFlowTraceForFocusFunction);		II.DataFlowTraceForFocusFunction);

// If MutateWithMask either failed or wasn't called, call default Mutate.		// If MutateWithMask either failed or wasn't called, call default Mutate.
if (!NewSize)		if (!NewSize)
NewSize = MD.Mutate(CurrentUnitData, Size, CurrentMaxMutationLen);		NewSize = MD.Mutate(CurrentUnitData, Size, CurrentMaxMutationLen);
assert(NewSize > 0 && "Mutator returned empty unit");		assert(NewSize > 0 && "Mutator returned empty unit");
assert(NewSize <= CurrentMaxMutationLen && "Mutator return oversized unit");		assert(NewSize <= CurrentMaxMutationLen && "Mutator return oversized unit");
Size = NewSize;		Size = NewSize;
II.NumExecutedMutations++;		II.NumExecutedMutations++;
		Corpus.IncrementNumExecutedMutations();

bool FoundUniqFeatures = false;		bool FoundUniqFeatures = false;
bool NewCov = RunOne(CurrentUnitData, Size, /*MayDeleteFile=*/true, &II,		bool NewCov = RunOne(CurrentUnitData, Size, /*MayDeleteFile=*/true, &II,
&FoundUniqFeatures);		&FoundUniqFeatures);
TryDetectingAMemoryLeak(CurrentUnitData, Size,		TryDetectingAMemoryLeak(CurrentUnitData, Size,
/*DuringInitialCorpusExecution*/ false);		/*DuringInitialCorpusExecution*/ false);
if (NewCov) {		if (NewCov) {
ReportNewCoverage(&II, {CurrentUnitData, CurrentUnitData + Size});		ReportNewCoverage(&II, {CurrentUnitData, CurrentUnitData + Size});
break; // We will mutate this input more in the next rounds.		break; // We will mutate this input more in the next rounds.
}		}
if (Options.ReduceDepth && !FoundUniqFeatures)		if (Options.ReduceDepth && !FoundUniqFeatures)
break;		break;
}		}

		Dor1s (Done): does it always need update, even when new coverage wasn't observed?
		marcel (Done): For II, the local feature frequencies have changed, so we schedule an energy update. However, the energy will only be recomputed when the distribution needs an update, and we do not set `DistributionNeedsUpdate` here.
		II.NeedsEnergyUpdate = true;
}		}
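
The deferred update described in the reply above (mark an input's energy as stale, but only recompute when the corpus distribution is rebuilt just before choosing the next input) can be sketched as below. This is a simplified illustration under stated assumptions: `SeedSketch`, `LazyCorpusSketch`, `PendingEnergy`, and `NumRebuilds` are hypothetical names, and the energy value is passed in rather than computed from feature frequencies. Only `NeedsEnergyUpdate`, `DistributionNeedsUpdate`, and the use of `std::piecewise_constant_distribution<double>` come from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for InputInfo; only the fields needed for the sketch.
struct SeedSketch {
  double PendingEnergy;    // value a recomputation would produce
  double Energy;           // value currently used as the sampling weight
  bool NeedsEnergyUpdate;  // set when local statistics changed
};

class LazyCorpusSketch {
public:
  size_t NumRebuilds = 0;  // instrumentation for this example only

  void AddSeed(double Energy) {
    assert(Energy > 0.0);
    SeedSketch S;
    S.PendingEnergy = Energy;
    S.Energy = Energy;
    S.NeedsEnergyUpdate = false;
    Seeds.push_back(S);
    DistributionNeedsUpdate = true;  // corpus changed: schedule a rebuild
  }

  // Local statistics changed: mark the seed stale, but do not force a rebuild.
  void MarkEnergyStale(size_t Idx, double NewEnergy) {
    Seeds[Idx].PendingEnergy = NewEnergy;
    Seeds[Idx].NeedsEnergyUpdate = true;
  }

  void RequestDistributionUpdate() { DistributionNeedsUpdate = true; }

  double EnergyOf(size_t Idx) const { return Seeds[Idx].Energy; }

  // The rebuild happens here, just before a seed is chosen, and only if needed.
  template <class URNG> size_t ChooseIdx(URNG &Rng) {
    assert(!Seeds.empty());
    if (DistributionNeedsUpdate)
      Rebuild();
    return static_cast<size_t>(Distribution(Rng));
  }

private:
  void Rebuild() {
    std::vector<double> Intervals(Seeds.size() + 1);
    std::iota(Intervals.begin(), Intervals.end(), 0.0);  // 0, 1, ..., N
    std::vector<double> Weights;
    for (auto &S : Seeds) {
      if (S.NeedsEnergyUpdate) {  // refresh only the stale seeds
        S.Energy = S.PendingEnergy;
        S.NeedsEnergyUpdate = false;
      }
      Weights.push_back(S.Energy);
    }
    Distribution = std::piecewise_constant_distribution<double>(
        Intervals.begin(), Intervals.end(), Weights.begin());
    DistributionNeedsUpdate = false;
    ++NumRebuilds;
  }

  std::vector<SeedSketch> Seeds;
  std::piecewise_constant_distribution<double> Distribution;
  bool DistributionNeedsUpdate = true;
};
```

In this sketch, calling `MarkEnergyStale` many times between two rebuilds costs only a flag write each time, which mirrors the motivation given in the review: the per-execution path stays cheap and the more expensive recomputation is batched.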

void Fuzzer::PurgeAllocator() {		void Fuzzer::PurgeAllocator() {
if (Options.PurgeAllocatorIntervalSec < 0 || !EF->__sanitizer_purge_allocator)		if (Options.PurgeAllocatorIntervalSec < 0 || !EF->__sanitizer_purge_allocator)
return;		return;
if (duration_cast<seconds>(system_clock::now() -		if (duration_cast<seconds>(system_clock::now() -
LastAllocatorPurgeAttemptTime)		LastAllocatorPurgeAttemptTime)
.count() < Options.PurgeAllocatorIntervalSec)		.count() < Options.PurgeAllocatorIntervalSec)
[152 lines not shown]

compiler-rt/lib/fuzzer/FuzzerOptions.h

[38 lines not shown]	struct FuzzingOptions {
bool Shrink = false;		bool Shrink = false;
bool ReduceInputs = false;		bool ReduceInputs = false;
int ReloadIntervalSec = 1;		int ReloadIntervalSec = 1;
bool ShuffleAtStartUp = true;		bool ShuffleAtStartUp = true;
bool PreferSmall = true;		bool PreferSmall = true;
size_t MaxNumberOfRuns = -1L;		size_t MaxNumberOfRuns = -1L;
int ReportSlowUnits = 10;		int ReportSlowUnits = 10;
bool OnlyASCII = false;		bool OnlyASCII = false;
		bool Entropic = false;
		kcc (Done): I'd prefer if these names were more descriptive (ok if longer) and had "entropic" in them. E.g. Entropic, EntropicFeatureFrequencyThreshold, EntropicNumberOfRarestFeatures. Of course, make the command line parameters match.
		size_t EntropicFeatureFrequencyThreshold = 0xFF;
		size_t EntropicNumberOfRarestFeatures = 100;
std::string OutputCorpus;		std::string OutputCorpus;
std::string ArtifactPrefix = "./";		std::string ArtifactPrefix = "./";
std::string ExactArtifactPath;		std::string ExactArtifactPath;
std::string ExitOnSrcPos;		std::string ExitOnSrcPos;
std::string ExitOnItem;		std::string ExitOnItem;
std::string FocusFunction;		std::string FocusFunction;
std::string DataFlowTrace;		std::string DataFlowTrace;
std::string CollectDataFlow;		std::string CollectDataFlow;
[28 lines not shown]

compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp

[586 lines not shown]	TEST(FuzzerUtil, Base64) {
EXPECT_EQ("YWJjeA==", Base64({'a', 'b', 'c', 'x'}));		EXPECT_EQ("YWJjeA==", Base64({'a', 'b', 'c', 'x'}));
EXPECT_EQ("YWJjeHk=", Base64({'a', 'b', 'c', 'x', 'y'}));		EXPECT_EQ("YWJjeHk=", Base64({'a', 'b', 'c', 'x', 'y'}));
EXPECT_EQ("YWJjeHl6", Base64({'a', 'b', 'c', 'x', 'y', 'z'}));		EXPECT_EQ("YWJjeHl6", Base64({'a', 'b', 'c', 'x', 'y', 'z'}));
}		}

TEST(Corpus, Distribution) {		TEST(Corpus, Distribution) {
DataFlowTrace DFT;		DataFlowTrace DFT;
Random Rand(0);		Random Rand(0);
std::unique_ptr<InputCorpus> C(new InputCorpus(""));		struct EntropicOptions Entropic = {false, 0xFF, 100};
		std::unique_ptr<InputCorpus> C(new InputCorpus("", Entropic));
		kcc (Done): When running 'ninja check-fuzzer' this test fails for me. That's weird: why does the functionality change with Entropic off?
		[5/9] Running Fuzzer unit tests
		FAIL: LLVMFuzzer-Unittest :: ./Fuzzer-x86_64-Test/Corpus.Distribution (48 of 55)
		TEST 'LLVMFuzzer-Unittest :: ./Fuzzer-x86_64-Test/Corpus.Distribution' FAILED
		Note: Google Test filter = Corpus.Distribution
		[==========] Running 1 test from 1 test case.
		[----------] Global test environment set-up.
		[----------] 1 test from Corpus
		[ RUN ] Corpus.Distribution
		/usr/local/google/home/kcc/llvm-project/compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp:609: Failure
		Expected: (Hist[i]) > (TriesPerUnit / N / 3), actual: 0 vs 2184
		/usr/local/google/home/kcc/llvm-project/compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp:609: Failure
		Expected: (Hist[i]) > (TriesPerUnit / N / 3), actual: 0 vs 2184
		/usr/local/google/home/kcc/llvm-project/compiler-rt/lib/fuzzer/tests/FuzzerUnittest.cpp:609: Failure
		marcel (Done): Instead of updating the corpus distribution every time it changes (e.g., FuzzerCorpus.h#L114 and FuzzerCorpus.h#L165), entropic schedules that update by setting a flag. For efficiency, the corpus distribution is actually updated only when (and just before) a new input is chosen. I had this update in ChooseUnitToMutate, which calls ChooseUnitIdxToMutate. I have now moved the call to UpdateCorpusDistribution into ChooseUnitIdxToMutate (which is used by the test case). All 40 fuzzer unit tests pass.
size_t N = 10;		size_t N = 10;
size_t TriesPerUnit = 1<<16;		size_t TriesPerUnit = 1<<16;
for (size_t i = 0; i < N; i++)		for (size_t i = 0; i < N; i++)
C->AddToCorpus(Unit{static_cast<uint8_t>(i)}, 1, false, false, {}, DFT,		C->AddToCorpus(Unit{static_cast<uint8_t>(i)}, 1, false, false, {}, DFT,
nullptr);		nullptr);

Vector<size_t> Hist(N);		Vector<size_t> Hist(N);
for (size_t i = 0; i < N * TriesPerUnit; i++) {		for (size_t i = 0; i < N * TriesPerUnit; i++) {
[441 lines not shown]	TEST(FuzzerCommand, SetOutput) {

Cmd.combineOutAndErr();		Cmd.combineOutAndErr();
EXPECT_TRUE(Cmd.isOutAndErrCombined());		EXPECT_TRUE(Cmd.isOutAndErrCombined());

CmdLine = Cmd.toString();		CmdLine = Cmd.toString();
EXPECT_EQ(CmdLine, makeCmdLine("", ">thud 2>&1"));		EXPECT_EQ(CmdLine, makeCmdLine("", ">thud 2>&1"));
}		}

		TEST(Entropic, UpdateFrequency) {
		const size_t One = 1, Two = 2;
		const size_t FeatIdx1 = 0, FeatIdx2 = 42, FeatIdx3 = 12, FeatIdx4 = 26;
		size_t Index;
		// Create input corpus with default entropic configuration
		struct EntropicOptions Entropic = {true, 0xFF, 100};
		std::unique_ptr<InputCorpus> C(new InputCorpus("", Entropic));
		InputInfo *II = new InputInfo();

		C->AddRareFeature(FeatIdx1);
		C->UpdateFeatureFrequency(II, FeatIdx1);
		EXPECT_EQ(II->FeatureFreqs.size(), One);
		C->AddRareFeature(FeatIdx2);
		C->UpdateFeatureFrequency(II, FeatIdx1);
		C->UpdateFeatureFrequency(II, FeatIdx2);
		EXPECT_EQ(II->FeatureFreqs.size(), Two);
		EXPECT_EQ(II->FeatureFreqs[0].second, 2);
		EXPECT_EQ(II->FeatureFreqs[1].second, 1);

		C->AddRareFeature(FeatIdx3);
		C->AddRareFeature(FeatIdx4);
		C->UpdateFeatureFrequency(II, FeatIdx3);
		C->UpdateFeatureFrequency(II, FeatIdx3);
		C->UpdateFeatureFrequency(II, FeatIdx3);
		C->UpdateFeatureFrequency(II, FeatIdx4);

		for (Index = 1; Index < II->FeatureFreqs.size(); Index++)
		EXPECT_LT(II->FeatureFreqs[Index - 1].first, II->FeatureFreqs[Index].first);

		II->DeleteFeatureFreq(FeatIdx3);
		for (Index = 1; Index < II->FeatureFreqs.size(); Index++)
		EXPECT_LT(II->FeatureFreqs[Index - 1].first, II->FeatureFreqs[Index].first);
		}
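
The test above checks that `FeatureFreqs` stays sorted by feature index after updates and deletions. One way to maintain that invariant is a binary search with `std::lower_bound`, sketched below. The free functions `BumpFeatureFreq` and `RemoveFeatureFreq` are hypothetical; the patch implements the corresponding logic as members used by `UpdateFeatureFrequency` and `DeleteFeatureFreq`, whose bodies are not shown in this part of the diff.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// (feature index, local frequency), kept sorted by feature index.
using FeatureFreq = std::pair<uint32_t, uint16_t>;

// Hypothetical helper: increment the count for Idx, inserting it in sorted
// position if it is not present yet. Note: no saturation is applied here.
void BumpFeatureFreq(std::vector<FeatureFreq> &Freqs, uint32_t Idx) {
  // (Idx, 0) compares <= every entry whose index is Idx, so lower_bound
  // lands on that entry if it exists, or on the insertion point otherwise.
  auto It = std::lower_bound(Freqs.begin(), Freqs.end(), FeatureFreq(Idx, 0));
  if (It != Freqs.end() && It->first == Idx)
    ++It->second;
  else
    Freqs.insert(It, FeatureFreq(Idx, 1));
}

// Hypothetical helper: remove the entry for Idx; returns false if absent.
bool RemoveFeatureFreq(std::vector<FeatureFreq> &Freqs, uint32_t Idx) {
  auto It = std::lower_bound(Freqs.begin(), Freqs.end(), FeatureFreq(Idx, 0));
  if (It == Freqs.end() || It->first != Idx)
    return false;
  Freqs.erase(It);  // erasing from a sorted vector keeps it sorted
  return true;
}
```

Replaying the update sequence from the test (feature 0 twice, 42 once, 12 three times, 26 once) yields the entries in index order 0, 12, 26, 42, which is the ordering the `EXPECT_LT` loops assert.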

		double SubAndSquare(double X, double Y) {
		double R = X - Y;
		R = R * R;
		return R;
		}

		TEST(Entropic, ComputeEnergy) {
		const double Precision = 0.01;
		struct EntropicOptions Entropic = {true, 0xFF, 100};
		std::unique_ptr<InputCorpus> C(new InputCorpus("", Entropic));
		InputInfo *II = new InputInfo();
		Vector<std::pair<uint32_t, uint16_t>> FeatureFreqs = {{1, 3}, {2, 3}, {3, 3}};
		II->FeatureFreqs = FeatureFreqs;
		kcc (Done): Will a simpler syntax work, e.g.: {1,3}, {2,3}
		II->NumExecutedMutations = 0;
		II->UpdateEnergy(4);
		EXPECT_LT(SubAndSquare(II->Energy, 1.450805), Precision);

		II->NumExecutedMutations = 9;
		II->UpdateEnergy(5);
		EXPECT_LT(SubAndSquare(II->Energy, 1.525496), Precision);

		II->FeatureFreqs[0].second++;
		II->FeatureFreqs.push_back(std::pair<uint32_t, uint16_t>(42, 6));
		II->NumExecutedMutations = 20;
		II->UpdateEnergy(10);
		EXPECT_LT(SubAndSquare(II->Energy, 1.792831), Precision);
		}
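
The expected values in the test above are consistent with a Shannon-entropy estimate over smoothed feature counts. The sketch below is a reconstruction under that assumption, not the patch's `InputInfo::UpdateEnergy` (whose body is not shown in this part of the diff): each locally observed rare feature gets add-one smoothing, each unobserved rare feature counts as 1, and the number of executed mutations is treated as a single additional "abundant" count. The function name `EnergySketch` and its parameter layout are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical free function approximating what UpdateEnergy appears to do.
// Computes H = log(S) - (1/S) * sum_i c_i * log(c_i) over smoothed counts c_i,
// where S is the sum of all smoothed counts.
double EnergySketch(
    const std::vector<std::pair<uint32_t, uint16_t>> &FeatureFreqs,
    size_t NumExecutedMutations, size_t GlobalNumberOfFeatures) {
  assert(GlobalNumberOfFeatures >= FeatureFreqs.size());
  double Energy = 0.0;
  double SumIncidence = 0.0;

  // Rare features this input has exercised: add-one smoothing on each count.
  for (const auto &F : FeatureFreqs) {
    double Local = static_cast<double>(F.second) + 1.0;
    Energy -= Local * std::log(Local);
    SumIncidence += Local;
  }

  // Rare features this input has not exercised: smoothed count 1 each.
  // They add to the total but contribute 1 * log(1) = 0 to the sum.
  SumIncidence +=
      static_cast<double>(GlobalNumberOfFeatures - FeatureFreqs.size());

  // One extra count standing in for everything else (assumption).
  double Abundant = static_cast<double>(NumExecutedMutations) + 1.0;
  Energy -= Abundant * std::log(Abundant);
  SumIncidence += Abundant;

  // Normalize to obtain the entropy.
  return Energy / SumIncidence + std::log(SumIncidence);
}
```

With the inputs used in the test, this sketch gives approximately 1.450805, 1.525496, and 1.792831, matching the three `EXPECT_LT` checks. Note that the test compares the squared difference against `Precision = 0.01`, so it tolerates an absolute error of up to 0.1.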

int main(int argc, char **argv) {		int main(int argc, char **argv) {
testing::InitGoogleTest(&argc, argv);		testing::InitGoogleTest(&argc, argv);
return RUN_ALL_TESTS();		return RUN_ALL_TESTS();
}		}

Diff 264969