This is an archive of the discontinued LLVM Phabricator instance.

[libFuzzer] Extend ChangeBinaryInteger mutator to support overwriting selected input with predefined integers.
Needs ReviewPublic

Authored by dokyungs on Aug 21 2020, 10:23 AM.

Details

Summary

(Experimental - Uploading this to get early feedback before a large-scale experiment.)

This patch extends the ChangeBinaryInteger mutator to support overwriting the selected input with predefined integers. The rationale for this heuristic is that certain byte (word, qword, or qword) overwrite at a specific location (with "magic" integers) in a large input may make an invalid input valid, potentially triggering new neighbor code paths.

Currently, triggering such an overwrite is costly in libFuzzer. ChangeBinaryInteger mutator may do the same, but only with a low probability, because the chosen byte (word, dword, or qword) must already be an integer ranging from -10 to 10.

CopyPart/CrossOver mutator may also effectively do the same, but only if these predefined integers are found in any of the corpus inputs; even if the corpus inputs do contain the predefined integers, the chances are much narrower because a specific location and a specific width have to be selected.

InsertRepeatedBytes combined with EraseBytes mutators (or other combinations of existing mutators) may eventually trigger the desired change, but still the probability is low, as the probabilities of different mutators multiply.

This patch allows to find the desired input in a single mutation (as tested by the accompanying test - overwrite-bytes.test), effectively increasing the probability of finding the desired input given a corpus input.

Diff Detail

Unit TestsFailed

Event Timeline

dokyungs created this revision.Aug 21 2020, 10:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 21 2020, 10:23 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
dokyungs requested review of this revision.Aug 21 2020, 10:23 AM
morehouse added inline comments.Aug 21 2020, 5:29 PM
compiler-rt/lib/fuzzer/FuzzerMutate.cpp
454

I think we should add more. What does honggfuzz use?

compiler-rt/test/fuzzer/OverwriteBytesTest.cpp
26

Probably the large input defeats our memcmp hook, but we should be careful that it isn't the reason we can pass this test.

compiler-rt/test/fuzzer/OverwriteBytesTest.h
2

Why do we need such a large sequence of bytes, and how did you generate it?

compiler-rt/test/fuzzer/overwrite-bytes.test
3

Can we hard code the magics in the C++?

7

100M runs seems high. How long does it take if we hit that limit?

dokyungs added inline comments.Aug 24 2020, 3:19 PM
compiler-rt/lib/fuzzer/FuzzerMutate.cpp
454

afl uses the following magic values.

#define INTERESTING_8 \
  -128,          /* Overflow signed 8-bit when decremented  */ \
  -1,            /*                                         */ \
   0,            /*                                         */ \
   1,            /*                                         */ \
   16,           /* One-off with common buffer size         */ \
   32,           /* One-off with common buffer size         */ \
   64,           /* One-off with common buffer size         */ \
   100,          /* One-off with common buffer size         */ \
   127           /* Overflow signed 8-bit when incremented  */

#define INTERESTING_16 \
  -32768,        /* Overflow signed 16-bit when decremented */ \
  -129,          /* Overflow signed 8-bit                   */ \
   128,          /* Overflow signed 8-bit                   */ \
   255,          /* Overflow unsig 8-bit when incremented   */ \
   256,          /* Overflow unsig 8-bit                    */ \
   512,          /* One-off with common buffer size         */ \
   1000,         /* One-off with common buffer size         */ \
   1024,         /* One-off with common buffer size         */ \
   4096,         /* One-off with common buffer size         */ \
   32767         /* Overflow signed 16-bit when incremented */

#define INTERESTING_32 \
  -2147483648LL, /* Overflow signed 32-bit when decremented */ \
  -100663046,    /* Large negative number (endian-agnostic) */ \
  -32769,        /* Overflow signed 16-bit                  */ \
   32768,        /* Overflow signed 16-bit                  */ \
   65535,        /* Overflow unsig 16-bit when incremented  */ \
   65536,        /* Overflow unsig 16 bit                   */ \
   100663045,    /* Large positive number (endian-agnostic) */ \
   2147483647    /* Overflow signed 32-bit when incremented */

honggfuzz uses the following values for 8B as well as similar values for 1, 2, 4B too:

/* 8B - LE */
{"\x00\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x01\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x02\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x03\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x04\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x05\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x06\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x07\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x08\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x09\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x0A\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x0B\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x0C\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x0D\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x0E\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x0F\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x10\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x20\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x40\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x7E\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x7F\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x80\x00\x00\x00\x00\x00\x00\x00", 8},
{"\x81\x00\x00\x00\x00\x00\x00\x00", 8},
{"\xC0\x00\x00\x00\x00\x00\x00\x00", 8},
{"\xFE\x00\x00\x00\x00\x00\x00\x00", 8},
{"\xFF\x00\x00\x00\x00\x00\x00\x00", 8},
{"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x7E", 8},
{"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x7F", 8},
{"\x00\x00\x00\x00\x00\x00\x00\x80", 8},
{"\x01\x00\x00\x00\x00\x00\x00\x80", 8},
{"\xFE\xFF\xFF\xFF\xFF\xFF\xFF\xFF", 8},

We can probably start with well-described magic values in afl?

compiler-rt/test/fuzzer/OverwriteBytesTest.cpp
26

Right, we need to make sure that. I attempted to do that by using FileCheck that looks for the mutation sequence printed when the test aborts.

compiler-rt/test/fuzzer/OverwriteBytesTest.h
2

This input is the base input used for magic byte mutation that lead to a big subtree that libFuzzer missed in my short experiment. The file was generated by xxd -i.

We do not need this large sequence of bytes for testing purposes. I will trim it down to a lot more compact version in the next upload.

compiler-rt/test/fuzzer/overwrite-bytes.test
3

Will do in the next upload.

7

I will adjust this value after trimming down the base input.

dokyungs updated this revision to Diff 287761.Aug 25 2020, 1:07 PM

Add magic values used in AFL.

dokyungs updated this revision to Diff 287771.Aug 25 2020, 1:51 PM

Addressed comments.

dokyungs marked 4 inline comments as done.Aug 25 2020, 1:52 PM
dokyungs added inline comments.
compiler-rt/test/fuzzer/OverwriteBytesTest.cpp
26

I now use memmem here and invoke fuzzing with -use_memmem=0 to eliminate that possibility.

compiler-rt/test/fuzzer/OverwriteBytesTest.h
2

Reduced to 36 bytes.

compiler-rt/test/fuzzer/overwrite-bytes.test
3

Done.

7

Reduced to 10M.

dokyungs updated this revision to Diff 287772.Aug 25 2020, 1:54 PM

clang-format

dokyungs edited the summary of this revision. (Show Details)Aug 25 2020, 1:54 PM
morehouse added inline comments.Aug 26 2020, 9:49 AM
compiler-rt/lib/fuzzer/FuzzerMutate.cpp
382

Please add a comment saying these are borrowed from AFL.

413

The indentation seems weird. Can we fix it?

441

The templates here seem like overkill. I'd rather have a runtime check if it simplifies the code.

For example, a simple function should suffice:

template<typename T>
static T GetRandomMagic(Random &Rand) {
  switch (sizeof(T)) {
    case 1: {
      constexpr T Magics[] = {INTERESTING_8};
      return Magics[Rand(sizeof(Magics) / sizeof(T))];
    }
    case 2:
      ...
  }

Actually, this probably gets optimized to similar code since sizeof(T) is a compile-time constant.

compiler-rt/test/fuzzer/OverwriteBytesTest.h
2

Now it's probably small enough to be included in the test itself instead of a header.

compiler-rt/test/fuzzer/overwrite-bytes.test
7

How many runs does it take with and without this mutation?

dokyungs updated this revision to Diff 288120.Aug 26 2020, 3:09 PM

Addressed comments.

dokyungs marked 2 inline comments as done.Aug 26 2020, 3:14 PM
dokyungs added inline comments.
compiler-rt/lib/fuzzer/FuzzerMutate.cpp
413

This is actually what clang-format gives us. Is it better to manually fix the indentation?

441

It's been simplified now as you suggested. Yes, it's more concise!

compiler-rt/test/fuzzer/overwrite-bytes.test
7

Without this mutation it takes 172,094,572 execs to find it the crash. With this mutation, it takes 531,941 execs.

dokyungs marked an inline comment as done.Aug 26 2020, 3:14 PM
morehouse added inline comments.Aug 26 2020, 3:22 PM
compiler-rt/lib/fuzzer/FuzzerMutate.cpp
413

Yes, I would prefer manual formatting.

423

Can we avoid the memcpy by just returning SignedVal, possibly with a cast?

429

There's no breaks, so all of these are fallthroughs...

dokyungs updated this revision to Diff 288128.Aug 26 2020, 3:59 PM

Addressed comments.

dokyungs marked 3 inline comments as done.Aug 26 2020, 4:05 PM
dokyungs added inline comments.
compiler-rt/lib/fuzzer/FuzzerMutate.cpp
423

I am not sure what cast to use - static_cast or repinterpret_cast? To figure this out I should probably go read the C++ standard.

I thought that we want the exact machine representation of signed magic values here. Based on this thought, I thought memcpy could be better here.

429

Thanks. I still make this mistake :(

dokyungs marked an inline comment as done.Aug 26 2020, 4:11 PM

New data with switch breakthrough fix.

compiler-rt/test/fuzzer/overwrite-bytes.test
7

-mutate_depth=1:
With magic mutation: 34,024
Without magic mutation: more than 134,217,728

-mutate_depth=5:
With magic mutation: 537,111
Without magic mutation: more than 67,108,864

morehouse added inline comments.Aug 26 2020, 4:20 PM
compiler-rt/lib/fuzzer/FuzzerMutate.cpp
423

static_cast should work as long as T isn't some kind of pointer. In that case we would need reinterpret_cast.

Assuming T is some kind of integer, casting is equivalent to memcpy. So if the compiler lets you, I'd prefer to just assume integer types.

We can add a static_assert(std::is_integral<T>::value) to ensure this isn't abused in the future.

dokyungs updated this revision to Diff 288136.Aug 26 2020, 4:51 PM

Addressed comment - use static_cast instead of memcpy.

LGTM for an experiment.

compiler-rt/lib/fuzzer/FuzzerMutate.cpp
446

Nit: we can save some lines of code by directly returning instead of assigning to Val. That also lets us remove the breaks.