This is an archive of the discontinued LLVM Phabricator instance.

Refactor mutation strategies into a standalone library

Authored by aarongreen on May 13 2021, 2:52 PM.



This change introduces libMutagen/libclang_rt.mutagen.a as a subset of libFuzzer/libclang_rt.fuzzer.a. This library contains only the fuzzing strategies used by libFuzzer to produce new test inputs from provided inputs, dictionaries, and SanitizerCoverage feedback.

Most of this change is simply moving sections of code to one side or the other of the library boundary. The only meaningful new code is:

  • The Mutagen.h interface and its implementation in Mutagen.cpp.
  • The following methods in MutagenDispatcher.cpp:
    • UseCmp
    • UseMemmem
    • SetCustomMutator
    • SetCustomCrossOver
    • LateInitialize (similar to the MutationDispatcher's original constructor)
    • Mutate_AddWordFromTORC (uses callbacks instead of accessing TPC directly)
    • StartMutationSequence
    • MutationSequence
    • DictionaryEntrySequence
    • RecommendDictionary
    • RecommendDictionaryEntry
  • FuzzerMutate.cpp (which now justs sets callbacks and handles printing)
  • MutagenUnittest.cpp (which adds tests of Mutagen.h)

A note on performance: This change was tested with a 100 passes of test/fuzzer/LargeTest.cpp with 1000 runs per pass, both with and without the change. The running time distribution was qualitatively similar both with and without the change, and the average difference was within 30 microseconds (2.240 ms/run vs 2.212 ms/run, respectively). Both times were much higher than observed with the fully optimized system clang (~0.38 ms/run), most likely due to the combination of CMake "dev mode" settings (e.g. CMAKE_BUILD_TYPE="Debug", LLVM_ENABLE_LTO=OFF, etc.). The difference between the two versions built similarly seems to be "in the noise" and suggests no meaningful performance degradation.

Diff Detail

Event Timeline

aarongreen created this revision.May 13 2021, 2:52 PM
aarongreen requested review of this revision.May 13 2021, 2:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 13 2021, 2:52 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
morehouse added inline comments.May 18 2021, 11:43 AM

These functions appear unused.


I think we have an off-by-one error here.

Needed does not include the null terminator, so ToCopy is the number of bytes *not* including the null byte. But the function comment says it should include the null byte.

i.e. we are writing one extra byte (the null terminator) than we said we would.


Why initialize lazily?

If explicitly initialize, we can avoid the if (!Initialized) check in Mutate.


This is missing a recent bugfix for libFuzzer + MSan:

if (EF->__msan_unpoison)
  EF->__msan_unpoison(Data, Size);
if (EF->__msan_unpoison_param)

Similar MSan bugfix is missing here.


This line is missing a recent off-by-one bugfix.


Maybe a Sequence::clear() function would make sense.


This line is missing a recent bugfix.


It's confusing what's going on in this function. I think it might be clearer if we make Sequence into a class type that manages its own members.

21 ↗(On Diff #345294)

This appears unused. Also, we don't we use the one in FuzzerUtil?




Is the CrossOver test intentionally deleted?

aarongreen marked 14 inline comments as done.
aarongreen added inline comments.

They are used by FuzzerDriver and FuzzerUnittest. They were always implemented in FuzzerUtil.cpp, but declared in the old FuzzerDictionary.h. When the latter file moved "over the line" to mutagen/MutagenDictionary, it made sense to relocate them to FuzzerUtil.h.


Obsolete with MutagenSequence.


I wanted this to be snprintf-like, but I was worried about performance. That's a bit silly: we print mutation and dictionary entry sequences infrequently enough that the right way to go is just to use snprintf and have done.


It was too allow UseCmp, UseMemmem, SetCustomMutator, and SetCustomCrossOver to be called between the constructor and Mutate.

But, the need for msan_unposion and msan_unpoison_param has pushed me over the edge: I've elected to create an 'LLVMMutagenConfiguration` struct instead, that has all these options and function pointers and can be set together by the constructor.


See the new mutagen/MutagenSequence.h.

21 ↗(On Diff #345294)

Whoops, good catch. At one point I tried moving the TORCs "across the line". For performance reasons, I decided to keep them closer to the TPC and use the From* callbacks instead.

More generally, I've tried hard to minimize the dependences of libMutagen on libFuzzer to make it as standalone as possible. At the moment, there's a few header dependencies for things I didn't want to duplicate.

The ideal, final state would be for libMutagen not to depend on any libFuzzer source, and libFuzzer to only depend on Mutagen.h.


I don't disagree, but I think that kind of clean up so happen in a different change. This is a straight copy/paste from FuzzerUtilWindows.cpp on main.


Nope, I think I copied "FuzzerMutate" tests, and then deleted everything that looked like a mutate test, including CrossOver. It's been added back to MutagenUnittest, while the Hash test has been restored to FuzzerUnittest.

3 out of the 4 lines clang-tidy complains about match what's in other source files on main; this change addresses the last one.

morehouse accepted this revision.May 24 2021, 10:21 AM

Please remove libMutagen.a from the diff before landing since we don't have libFuzzer.a in the current tree either.

This revision is now accepted and ready to land.May 24 2021, 10:21 AM

Please remove libMutagen.a from the diff before landing since we don't have libFuzzer.a in the current tree either.

Or better yet, I should fix my gitignore so I stop making that mistake. :|

This revision was landed with ongoing or failed builds.May 26 2021, 1:28 PM
This revision was automatically updated to reflect the committed changes.

Multiples tests are failing on the bots:

Failed Tests (3):
  libFuzzer :: magic-separator.test
  libFuzzer :: noasan-strstr.test
  libFuzzer :: strstr.test

Also on Windows:

Failed Tests (25):
  libFuzzer :: bad-strcmp.test
  libFuzzer :: fork.test
  libFuzzer :: fuzzer-customcrossover.test
  libFuzzer :: fuzzer-customcrossoverandmutate.test
  libFuzzer :: fuzzer-dict.test
  libFuzzer :: fuzzer-dirs.test
  libFuzzer :: fuzzer-fdmask.test
  libFuzzer :: fuzzer-finalstats.test
  libFuzzer :: fuzzer-flags.test
  libFuzzer :: fuzzer-oom.test
  libFuzzer :: fuzzer-seed.test
  libFuzzer :: fuzzer-threaded.test
  libFuzzer :: large.test
  libFuzzer :: len_control.test
  libFuzzer :: max-number-of-runs.test
  libFuzzer :: print-func.test
  libFuzzer :: recommended-dictionary.test
  libFuzzer :: reduce_inputs.test
  libFuzzer :: seed_inputs.test
  libFuzzer :: shrink.test
  libFuzzer :: strstr.test
  libFuzzer :: three-bytes.test
  libFuzzer :: trace-malloc-2.test
  libFuzzer :: trace-malloc-unbalanced.test
  libFuzzer :: trace-malloc.test

Needed to compile on Windows.

aarongreen reopened this revision.May 27 2021, 11:58 AM
This revision is now accepted and ready to land.May 27 2021, 11:58 AM

Fixed an issue with msan function pointers being uninitialized.

aarongreen marked an inline comment as done.

Add no-sanitize-all to from* functions.

The previous patch was a long-shot to try to get clang-tidy to find clang/clang.h. It didn't work, so there's no reason to modify tests/CMakeLists.txt. The patch reverts it.

morehouse added inline comments.Jun 3 2021, 10:29 AM


It can't hurt, I'm just curious. AFAIK we don't normally compile the libFuzzer code itself with sanitizers anymore.


Does the unittest run with MSan currently?

aarongreen marked 2 inline comments as done.

Okay, so after return from a few weeks vacation, I've tried to examine the three tests that failed earlier in detail. I added counters for additions to the TORCs, MMT, and ValueProfileMap to ensure no signal was being lost, and compared the log of successful mutation sequences before and after the change.

strstr.test and nonasan-strstr.test produce exactly the same output with and without this change, modulo things like memory addresses.

The output for the magic-separator.test on the other hand, does differ slightly. With this change, it appears to execute *faster* by the slightest amount and therefore ends up producing a different sequence of mutations. My best guess is the TimeOfUnit for one or more inputs are hovering right around one of the edges in InputInfo::UpdateEnergy, resulting in a different distribution. The result is the failure is found in about half as many iterations with the change as without.

I can get the test to produce the same sequence of mutations by adding microsecond delays after ~1700 runs.

Given all of this, I believe the patch should pass the tests now and can be relanded. I have some residual uncertainty from the fact that I haven't been able to reproduce the failures locally, but given the data I have, I can't find any other discrepancy introduced by the change.


I was concerned that the calls to From* might be introducing noise into the TORCs/MMT. But, as you say, none of this is instrumented. I'll remove them.


Not without a lot of hoop-jumping to rebuild libcxx, gtest, libFuzzer, etc. with MSan enabled. It's not really worth the hassle, so I'm changing the preprocessor condition to instead emit a compilation error if you try to compile the unit test with -fsanitize=memory.

This revision was landed with ongoing or failed builds.Jul 2 2021, 9:20 AM
This revision was automatically updated to reflect the committed changes.
aarongreen reopened this revision.Jul 2 2021, 10:07 AM
This revision is now accepted and ready to land.Jul 2 2021, 10:07 AM

Update: I'm still having trouble reproducing the Windows-only failure in a handful of integration tests. I'm de-prioritizing this just a bit on my end as it's not strictly necessary for a near-term demo. I will return and figure the Windows issue out, though, as it *is* necessary to land this change and make it available to Fuchsia downstream.