This is an archive of the discontinued LLVM Phabricator instance.


Differential D21218

[LibFuzzer] Avoid using std::random_shuffle() due to platform differences and implement our own version.
Abandoned, Public

Authored by delcypher on Jun 9 2016, 10:22 PM.

Details

Reviewers

kcc
aizatsky

Summary

[LibFuzzer] Avoid using std::random_shuffle() due to platform differences
and implement our own version.

It turns out that the behavior of std::random_shuffle is different between
libstdc++ and libcxx (even with the same random number source).
Therefore, if we want consistent behavior between platforms, we have to
use our own implementation.

This change (plus a change to the number of iterations of the Mutator in
the test required for the particular shuffle implementation used) fixes
the `FuzzerMutate.ShuffleBytes2` unit test on OSX.

I have verified that for the above unittest identical mutations are
now generated on both Linux and on OSX.
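The phrase "even with the same random number source" can be made concrete. The following is a minimal sketch, and the helper name `TenThousandthOutput` is made up for illustration. The C++ standard fixes the output sequence of `std::mt19937` (it requires the 10000th invocation of a default-constructed engine to produce 4123659995), so the engine itself should behave identically under libstdc++ and libc++. What the standard does not pin down is how `std::random_shuffle` consumes that source, which is presumably where the platforms diverge.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper, not part of LibFuzzer: returns the 10000th value
// produced by a default-constructed std::mt19937.
inline std::uint32_t TenThousandthOutput() {
  std::mt19937 Engine;   // default-constructed, i.e. the standard's default seed
  Engine.discard(9999);  // skip the first 9999 outputs
  return static_cast<std::uint32_t>(Engine());
}
```

If this value matches on both platforms while the shuffled bytes do not, the difference lies in the shuffle algorithm rather than in the generator.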

Event Timeline

delcypher updated this revision to Diff 60310. Jun 9 2016, 10:22 PM

delcypher retitled this revision to [LibFuzzer] Avoid using std::random_shuffle() due to platform differences and implement our own version.

delcypher updated this object.

delcypher added reviewers: kcc, aizatsky.

delcypher added subscribers: kcc, aizatsky, zaks.anna and 3 others.

@kcc: Sorry it took me quite a while to figure out exactly where the divergence in mutation behavior was coming from, but it does seem to be the use of std::random_shuffle() that is causing it, which is rather sad. Here's a small test case you can run to see the problem. You should find that the printed result is different on Linux (with libstdc++) and OSX (with libcxx).

#include <algorithm>
#include <iostream>
#include <random>
#include <stdint.h>

void printArray(uint8_t* X, size_t length) {
  for (size_t index=0; index < length; ++index) {
    std::cout << "0x" << std::hex << static_cast<unsigned>(X[index]) << ",";
  }
  std::cout << std::endl;
}

class Random {
 public:
  Random(unsigned int seed) : R(seed) {}
  size_t Rand() { return R(); }
  size_t RandBool() { return Rand() % 2; }
  size_t operator()(size_t n) { return n ? Rand() % n : 0; }
  std::mt19937 &Get_mt19937() { return R; }
 private:
  std::mt19937 R;
};


int main() {
  Random R(0);
  uint8_t X[7] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66};
  printArray(X, 7);
  // Shuffle
  std::random_shuffle(X + 2, X + 2 + 4, R);
  // With libstdc++ : 0x0,0x11,0x44,0x55,0x33,0x22,0x66,
  // With libcxx    : 0x0,0x11,0x22,0x33,0x55,0x44,0x66,
  printArray(X, 7);
  return 0;
}

I'm not entirely sure the implementation I've picked to replace random_shuffle is the best, as it required more iterations in the unit test to hit all the mutations we want to see (previously 524288, now it's 566171).

aizatsky requested changes to this revision. Jun 10 2016, 2:22 PM

aizatsky edited edge metadata.

aizatsky added inline comments.

lib/Fuzzer/FuzzerInternal.h
145	Specify algorithm reference.
152	Let's see that this is literally Fisher-Yates shuffle: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle Even (N-2) is important to have it produce uniform shuffles.

This revision now requires changes to proceed. Jun 10 2016, 2:22 PM

delcypher added inline comments. Jun 10 2016, 6:36 PM

lib/Fuzzer/FuzzerInternal.h
145	This algorithm is based on what is in libc++ but the loop iteration goes forwards rather than backwards.
152	> Let's see that this is literally Fisher-Yates shuffle: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

I did not know this algorithm had a name. My version is basically the `Durstenfeld` implementation, shuffling from the lowest index to the highest. This has made me realize there is a mistake in my implementation: it loops one too many times, as it will try to swap the last element with itself.

> Even (N-2) is important to have it produce uniform shuffles.

I did not know that. I didn't see that explicitly stated in the Wikipedia article. Is that a solution to the modulo bias problem it mentions?
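The off-by-one mentioned above can be sketched as follows. This is not the code that was committed anywhere; it is a minimal sketch of a forward Durstenfeld (Fisher-Yates) shuffle with the loop stopping one element earlier, built on a cut-down stand-in for the `Random` class from the patch. The local name `Pick` and the use of `std::size_t` casts are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <utility>

// Stand-in for the Random class in the patch, reduced to what the shuffle needs.
class Random {
 public:
  explicit Random(unsigned int seed) : R(seed) {}
  std::size_t Rand() { return R(); }
  // Returns a value in [0, n), or 0 when n == 0. This uses `%` exactly as the
  // patch does, so any modulo bias of the original is retained.
  std::size_t operator()(std::size_t n) { return n ? Rand() % n : 0; }

  // Forward Durstenfeld (Fisher-Yates) shuffle of [First, End).
  // The loop stops before the last position, which could only be swapped
  // with itself.
  template <typename RndAccessIt>
  void Shuffle(RndAccessIt First, RndAccessIt End) {
    using Diff = typename std::iterator_traits<RndAccessIt>::difference_type;
    const Diff N = End - First;
    for (Diff Offset = 0; Offset + 1 < N; ++Offset) {
      // Pick an index uniformly (up to modulo bias) from [Offset, N).
      const Diff Pick =
          Offset + static_cast<Diff>((*this)(static_cast<std::size_t>(N - Offset)));
      std::swap(First[Offset], First[Pick]);
    }
  }

 private:
  std::mt19937 R;
};
```

In the version under review, the final iteration draws from a range of size 1, so it swaps the last element with itself while still consuming one PRNG value. Dropping that iteration does not change which permutations are reachable. Modulo bias is a separate matter: it comes from `Rand() % n`, not from the loop bound.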

What problem are you trying to fix and do you really need to fix it?

kcc (sent from phone)

Can you just change the number of iterations to e.g. 1<<20 or 1<<21?

In D21218#455412, @kcc wrote:

What problem are you trying to fix and do you really need to fix it?

The problem I'm trying to fix is inconsistent mutation behavior between OSX and Linux. Having inconsistent behavior is undesirable because

It makes debugging issues on OSX harder because it prevents me from using how LibFuzzer behaves on Linux as my point of reference.
It makes tests flakey.

In D21218#456101, @kcc wrote:

Can you just change the number of iterations to e.g. 1<<20 or 1<<21?

That is the very first thing I tried, and it does allow the test to pass on OSX, but it's a terrible fix that just hides the underlying issue. I think it is very undesirable for mutation behavior to differ between platforms. The fact that the unit tests use a fixed seed for the PRNG suggests to me that the author was trying to make the tests behave consistently. Having tests behave consistently is a good thing, and I don't understand why we would want only partially consistent behavior by not bothering to make the random shuffle behave consistently.

I am more than happy to debate what the algorithm should be but I very strongly believe that LibFuzzer's reliance on std::random_shuffle needs to be removed.

In D21218#456562, @delcypher wrote:

It makes tests flakey.

Tests that do random mutations will behave this way.
They are not flaky in the usual sense, i.e. running the same binary 10000000 times will give the same result.
But running on a different OS, or with a different STL, may produce a different result here, and that should be fine.

In D21218#456978, @kcc wrote:

Tests that do random mutations will behave this way.
They are not flaky in the usual sense, i.e. running the same binary 10000000 times will give the same result.
But running on a different OS, or with a different STL, may produce a different result here, and that should be fine.

I disagree with your choice here. I would prefer consistent behavior where it is possible, however:

I have run out of time to do any more major porting work for LibFuzzer.
LibFuzzer is your project, so you have the final say here.

So would you accept a patch that just increases the number of iterations we do of the mutator to fix the test? If yes, do you want it to be the minimum number of iterations required, or something more readable like (1 << 20)?

Minimal power of two would be the best.
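One way to read "minimal power of two" is as a search: double the iteration count until the test passes. The sketch below assumes exactly that; `MinimalPowerOfTwo`, `Passes`, and `MaxShift` are hypothetical names, and the predicate stands in for actually running the unit test with a given `NumIter`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper: returns the smallest power of two N (from 1 up to
// 1 << MaxShift) for which Passes(N) holds, or 0 if none does.
inline std::size_t MinimalPowerOfTwo(
    const std::function<bool(std::size_t)> &Passes, unsigned MaxShift = 30) {
  for (unsigned Shift = 0; Shift <= MaxShift; ++Shift) {
    const std::size_t N = std::size_t{1} << Shift;
    if (Passes(N)) return N;
  }
  return 0;  // no candidate up to 1 << MaxShift passed
}
```

For illustration only: if a test needed at least 566171 iterations (the figure quoted earlier for the custom shuffle, not a measurement for std::random_shuffle on OSX), `MinimalPowerOfTwo([](std::size_t N) { return N >= 566171; })` would return `1 << 20`, since `1 << 19` is 524288.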

Superseded by http://reviews.llvm.org/D21359

Revision Contents

Path                                  Size
lib/Fuzzer/FuzzerInternal.h           16 lines
lib/Fuzzer/FuzzerLoop.cpp             2 lines
lib/Fuzzer/FuzzerMutate.cpp           3 lines
lib/Fuzzer/test/FuzzerUnittest.cpp    2 lines

Diff 60310

lib/Fuzzer/FuzzerInternal.h

 class Random {
  public:
   Random(unsigned int seed) : R(seed) {}
   size_t Rand() { return R(); }
   size_t RandBool() { return Rand() % 2; }
   size_t operator()(size_t n) { return n ? Rand() % n : 0; }
   std::mt19937 &Get_mt19937() { return R; }
+  // Shuffle data in the range [First, End)
+  //
+  // We do not use std::random_shuffle() here because its
+  // behavior is not consistent across different platforms.
+  //
+  // The algorithm used here will pick a permutation at
+  // random where every permutation has equal probability
+  // (provided the random source is uniformly distributed).
+  template<typename RndAccessIt>
+  void Shuffle(RndAccessIt First, RndAccessIt End) {
+    typename std::iterator_traits<RndAccessIt>::difference_type Offset, N;
+    N = End - First;
+    for (Offset = 0; Offset < N; ++Offset) {
+      std::swap(First[Offset], First[Offset + (*this)(N - Offset)]);
+    }
+  }
  private:
   std::mt19937 R;
 };

 // Dictionary.

 // Parses one dictionary entry.
 // If successfull, write the enty to Unit and returns true,

lib/Fuzzer/FuzzerLoop.cpp

@@ if (UnitHashesAddedToCorpus.insert(Hash(X)).second) {
         UpdateCorpusDistribution();
         PrintStats("RELOAD");
       }
     }
   }
 }

 void Fuzzer::ShuffleCorpus(UnitVector *V) {
-  std::random_shuffle(V->begin(), V->end(), MD.GetRand());
+  MD.GetRand().Shuffle(V->begin(), V->end());
   if (Options.PreferSmall)
     std::stable_sort(V->begin(), V->end(), [](const Unit &A, const Unit &B) {
       return A.size() < B.size();
     });
 }

 // Tries random prefixes of corpus items.
 // Prefix length is chosen according to exponential distribution

lib/Fuzzer/FuzzerMutate.cpp

 size_t MutationDispatcher::Mutate_ShuffleBytes(uint8_t *Data, size_t Size,
                                                size_t MaxSize) {
   assert(Size);
   size_t ShuffleAmount =
       Rand(std::min(Size, (size_t)8)) + 1; // [1,8] and <= Size.
   size_t ShuffleStart = Rand(Size - ShuffleAmount);
   assert(ShuffleStart + ShuffleAmount <= Size);
-  std::random_shuffle(Data + ShuffleStart, Data + ShuffleStart + ShuffleAmount,
-                      Rand);
+  Rand.Shuffle(Data + ShuffleStart, Data + ShuffleStart + ShuffleAmount);
   return Size;
 }

 size_t MutationDispatcher::Mutate_EraseByte(uint8_t *Data, size_t Size,
                                             size_t MaxSize) {
   assert(Size);
   if (Size == 1) return 0;
   size_t Idx = Rand(Size);

lib/Fuzzer/test/FuzzerUnittest.cpp

@@ void TestShuffleBytes(Mutator M, int NumIter) {
   }
   EXPECT_EQ(FoundMask, 31);
 }

 TEST(FuzzerMutate, ShuffleBytes1) {
   TestShuffleBytes(&MutationDispatcher::Mutate_ShuffleBytes, 1 << 16);
 }
 TEST(FuzzerMutate, ShuffleBytes2) {
-  TestShuffleBytes(&MutationDispatcher::Mutate, 1 << 19);
+  TestShuffleBytes(&MutationDispatcher::Mutate, 566171);
 }

 void TestAddWordFromDictionary(Mutator M, int NumIter) {
   std::unique_ptr<ExternalFunctions> t(new ExternalFunctions());
   fuzzer::EF = t.get();
   Random Rand(0);
   MutationDispatcher MD(Rand);
   uint8_t Word1[4] = {0xAA, 0xBB, 0xCC, 0xDD};