Download Raw Diff

Details

Reviewers

Commits

rGf7e610eda1c3: [libFuzzer] Experimenting with dictionary minimization.
rL298031: [libFuzzer] Experimenting with dictionary minimization.

Summary

Tracking issue: https://github.com/google/oss-fuzz/issues/331

Diff Detail

Event Timeline

Dor1s created this revision.Mar 14 2017, 7:23 AM

Kostya, below is what I'm getting with this code:

Dictionary: 27 entries
INFO: Seed: 2599778320
INFO: Loaded 2 modules (22862 guards): [0x7fc343e70160, 0x7fc343e81294), [0x5f98a0, 0x5feca4), 
Loading corpus dir: ../libpng_read_fuzzer
Loaded 1024/2611 files from ../libpng_read_fuzzer
Loaded 2048/2611 files from ../libpng_read_fuzzer
Started dictionary minimization. Need to perform 140940 tests.
Initial corpus unit: \x89PNG\x0d\x0a\x1a\x0a\x00\x00\x00\x0dIHDR\x00\x00\x00%\x00\x00\x00\x01\x01\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x000IDAThCcB\x00\x14zTXt\x1e\x8c\x1e\x1e\x1e\x1e\x1a\x1e\x1e\x1e\x1e\x1e\xe1\x90`NG\x89cP\x1e`NG\x89cP\x1eG\x89cP\x1e\x1eia\x00\xffA\xc0
Initial Result: 2, Features: 11200 11208, Coverage: 753

Dictionary unit: \x89PNG\x0d\x0a\x1a\x0a
Modified corpus unit: v\xaf\xb1\xb8\xf2\xf5\xe5\xf5\x00\x00\x00\x0dIHDR\x00\x00\x00%\x00\x00\x00\x01\x01\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x000IDAThCcB\x00\x14zTXt\x1e\x8c\x1e\x1e\x1e\x1e\x1a\x1e\x1e\x1e\x1e\x1e\xe1\x90`NG\x89cP\x1e`NG\x89cP\x1eG\x89cP\x1e\x1eia\x00\xffA\xc0
Modified Result: 2, Features: 11200 11208, Coverage: 754

Dictionary unit: IDAT
Modified corpus unit: \x89PNG\x0d\x0a\x1a\x0a\x00\x00\x00\x0dIHDR\x00\x00\x00%\x00\x00\x00\x01\x01\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x000\xb6\xbb\xbe\xabhCcB\x00\x14zTXt\x1e\x8c\x1e\x1e\x1e\x1e\x1a\x1e\x1e\x1e\x1e\x1e\xe1\x90`NG\x89cP\x1e`NG\x89cP\x1eG\x89cP\x1e\x1eia\x00\xffA\xc0
Modified Result: 0, Features:, Coverage: 759

Dictionary unit: IHDR
Modified corpus unit: \x89PNG\x0d\x0a\x1a\x0a\x00\x00\x00\x0d\xb6\xb7\xbb\xad\x00\x00\x00%\x00\x00\x00\x01\x01\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x000IDAThCcB\x00\x14zTXt\x1e\x8c\x1e\x1e\x1e\x1e\x1a\x1e\x1e\x1e\x1e\x1e\xe1\x90`NG\x89cP\x1e`NG\x89cP\x1eG\x89cP\x1e\x1eia\x00\xffA\xc0
Modified Result: 2, Features: 11200 11208, Coverage: 759

Dictionary unit: zTXt
Modified corpus unit: \x89PNG\x0d\x0a\x1a\x0a\x00\x00\x00\x0dIHDR\x00\x00\x00%\x00\x00\x00\x01\x01\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x000IDAThCcB\x00\x14\x85\xab\xa7\x8b\x1e\x8c\x1e\x1e\x1e\x1e\x1a\x1e\x1e\x1e\x1e\x1e\xe1\x90`NG\x89cP\x1e`NG\x89cP\x1eG\x89cP\x1e\x1eia\x00\xffA\xc0
Modified Result: 0, Features:, Coverage: 760

###### Useless dictionary elements. ######
"\x89PNG\x0d\x0a\x1a\x0a" # Score: -1
"IDAT" # Score: 1
"IHDR" # Score: -1
"zTXt" # Score: 1
###### End of useless dictionary elements. ######
Dictionary minimization suceeded

As you can see, initial corpus unit gives the following stats:

Initial Result: 2, Features: 11200 11208, Coverage: 753

If I replace PNG signature in the beginning with a trash, it still gives:

Modified Result: 2, Features: 11200 11208, Coverage: 754

Which is totally wrong. I do not obtain coverage info correctly. What could be a good way to distinguish a difference between two corpus units?

Dor1s updated this revision to Diff 91721.Mar 14 2017, 7:28 AM

Dor1s updated this revision to Diff 91723.Mar 14 2017, 7:33 AM

ah...
TPC.CollectFeatures is destructive...
For performance reasons I had to do Counters[i] = 0; inside it, so you can only call it once after each execution and Fuzzer::RunOne already does that.
So, one way to "fix" your code is to call ExecuteCallback and then call TPC.CollectFeatures.
Another way, (maybe) is to fix TPC.CollectFeatures so that it does not reset Counters. (I'll look into this too)

in t297783 I removed Counters[i] = 0; -- try again.
(also removed some other stale code)

Yeah, this is much better now! Thank you!!! Look, really important elements for PNG dictionary get a pretty high score now:

"\x89PNG\x0d\x0a\x1a\x0a" # Score: 2589, Used: 2589
"IDAT" # Score: 1323, Used: 1649
"IEND" # Score: -297, Used: 319
"IHDR" # Score: 2527, Used: 2545
"PLTE" # Score: 132, Used: 266
"bKGD" # Score: 18, Used: 82
"cHRM" # Score: 139, Used: 189
"fRAc" # Score: -17, Used: 47
"gAMA" # Score: 232, Used: 368
"gIFg" # Score: -1, Used: 43
"gIFt" # Score: -21, Used: 49
"gIFx" # Score: 0, Used: 10
"hIST" # Score: -9, Used: 47
"iCCP" # Score: 182, Used: 450
"iTXt" # Score: -30, Used: 64
"oFFs" # Score: -74, Used: 122
"pCAL" # Score: -12, Used: 124
"pHYs" # Score: 2, Used: 66
"sBIT" # Score: 39, Used: 97
"sCAL" # Score: -4, Used: 42
"sPLT" # Score: -16, Used: 62
"sRGB" # Score: 64, Used: 118
"sTER" # Score: 4, Used: 26
"tEXt" # Score: 83, Used: 121
"tIME" # Score: -6, Used: 42
"tRNS" # Score: 73, Used: 173
"zTXt" # Score: 458, Used: 612

or some entries from net_dns_record_fuzzer dictionary:

"\x00\x00" # Score: 316, Used: 182
"\x00\x01" # Score: 139, Used: 122
"\x00\x02" # Score: 0, Used: 9
"\x00\x03" # Score: 136, Used: 89
"\x00\x04" # Score: -37, Used: 37
"\x00\x05" # Score: 3, Used: 3
"\x00\x08" # Score: 2, Used: 1
"\x00\x0c" # Score: 4, Used: 2
"\x00\x10" # Score: -11, Used: 20
"\x00\x1c" # Score: 22, Used: 68
"\x00 " # Score: 99, Used: 63
"\x000" # Score: 2, Used: 1
"\x03foo\x00" # Score: 2, Used: 1
"\x00\x01\x00\x01" # Score: 2, Used: 1
"\x00\x05\x00\x01" # Score: 1, Used: 2
"\x03foo\x00\x00\x01\x00\x01" # Score: 2, Used: 1
"\x00\x05\x00\x01\x00\x00\x00\xff" # Score: 2, Used: 1
"A.B.C.D" # Score: -6, Used: 6
"0" # Score: -119, Used: 125
"M." # Score: -11, Used: 11
"IN," # Score: -1, Used: 1
"TTL" # Score: -15, Used: 15
"[" # Score: 59, Used: 46
"MB" # Score: -1, Used: 1
"MX" # Score: -1, Used: 1
"use" # Score: -1, Used: 1
"to" # Score: -1, Used: 1
"TELNET," # Score: 2, Used: 1
"F." # Score: -55, Used: 64
"A" # Score: -126, Used: 135
"\"A" # Score: 115, Used: 80
"a" # Score: -75, Used: 81
"RFC-953." # Score: -1, Used: 1

rename MinimizeDictionary -> AnalyzeDictionary
replace all occurrences of a dictionary unit inside corpus testcase
add two points to score if dictionary unit seems to be useful

Small fixes.

Kostya, what do you think about this approach in general? I guess that you don't want to add new flags and new auxiliary features to libFuzzer, but I don't see a way to minimize dictionaries without runtime analysis and coverage comparison.

I like how aggressive this approach is against "useless" units in dictionary. I even had to use Scores[i] += 2 instead of ++Scores[i] for cases when dictionary unit looks "useful". Otherwise it's too aggressive, e.g.:

PNG fuzzer:

###### Useless dictionary elements. ######
"IEND" # Score: -297, Used: 319
"fRAc" # Score: -1, Used: 47
"gIFt" # Score: -9, Used: 49
"iTXt" # Score: -20, Used: 64
"oFFs" # Score: -68, Used: 122
###### End of useless dictionary elements. ######

or prtime_fuzzer:

###### Useless dictionary elements. ######
"." # Score: -33, Used: 129
"Z" # Score: -70, Used: 90
"Mar" # Score: -2, Used: 10
"Apr" # Score: -6, Used: 12
"Jun" # Score: -1, Used: 9
"Nov" # Score: -3, Used: 7
###### End of useless dictionary elements. ######

With += 2 it looks better, e.g. PNG:

###### Useless dictionary elements. ######
"IEND" # Score: -286, Used: 319
"oFFs" # Score: -41, Used: 122
###### End of useless dictionary elements. ######

prtime:

###### Useless dictionary elements. ######
"Z" # Score: -60, Used: 90
"Apr" # Score: -3, Used: 12
"Nov" # Score: -1, Used: 7
###### End of useless dictionary elements. ######

I'm sure this approach does not love IEND tag because existing libpng fuzzer almost never reaches end of the input where IEND chunk could make sense. Anyway, those units are from manually compiled dictionaries, and we do not want to minimize them. We need to minimize auto-collected dictionaries. And for them this approach suggests to remove lots of trashy units.

Dor1s updated this revision to Diff 91879.Mar 15 2017, 8:40 AM

I don't know if this approach is useful or not. The only way to know is to try it more :)

Adding experimental flags like this is fine, as long as the implementation does not make things more complex.
This one doesn't.

Since this is an highly experimental feature I don't insist on adding tests at this moment,
but unless there are tests the feature may break at any time.

lib/Fuzzer/FuzzerDriver.cpp
363	Do you need this here? Fuzzer::ExecuteCallback does this already.
372	not sure if we care about efficiency here. If we do, std::vector (declared outside of the loop, with .clear() call here) might be a better choice
377	ditto
379	just size_t, and use < instead of !=

Address review comments

lib/Fuzzer/FuzzerDriver.cpp
372	Thanks! Did the same for ModifiedFeatures as well.
379	Did the same in the loop below.

I do not have commit access. May I ask you please to land the CL if you won't request more changes?

LGTM, I'll commit

lib/Fuzzer/FuzzerDriver.cpp
394	{some perfectionism} I would try to write this code with just one call std::search? But for an experimental code this is good enough.

This revision is now accepted and ready to land.Mar 16 2017, 6:51 PM

kcc closed this revision.Mar 16 2017, 6:52 PM

Thank you very much!

lib/Fuzzer/FuzzerDriver.cpp
394	Hm, but how? I could write a separate function that loops through the Data and modify it when substring is found, but I'm not sure that it would be faster, since here I also iterate through the Data only once.

Diff 91723

lib/Fuzzer/FuzzerDriver.cpp

Show First 20 Lines • Show All 349 Lines • ▼ Show 20 Lines	int MinimizeCrashInputInternalStep(Fuzzer F, InputCorpus Corpus) {
F->SetMaxInputLen(U.size());		F->SetMaxInputLen(U.size());
F->SetMaxMutationLen(U.size() - 1);		F->SetMaxMutationLen(U.size() - 1);
F->MinimizeCrashLoop(U);		F->MinimizeCrashLoop(U);
Printf("INFO: Done MinimizeCrashInputInternalStep, no crashes found\n");		Printf("INFO: Done MinimizeCrashInputInternalStep, no crashes found\n");
exit(0);		exit(0);
return 0;		return 0;
}		}

		int MinimizeDictionary(Fuzzer *F, const std::vector<Unit>& Dict,
		UnitVector& Corpus) {
		std::size_t NumberOfTests = Dict.size() * Corpus.size() * 2;
		Printf("Started dictionary minimization.");
		Printf(" Need to perform %zd tests.\n", NumberOfTests);

		kccUnsubmitted Done Reply Inline Actions Do you need this here? Fuzzer::ExecuteCallback does this already. kcc: Do you need this here? Fuzzer::ExecuteCallback does this already.
		// F->Corpus.ResetFeatureSet(); // private method :(
		F->ResetCoverage();
		TPC.ResetMaps();

		// Scores for every dictionary unit.
		std::vector<int> Scores(Dict.size());
		for (auto &Data : Corpus) {
		// Get coverage for Corpus unit without modifications.
		F->RunOne(Data.data(), Data.size());
		kccUnsubmitted Done Reply Inline Actions not sure if we care about efficiency here. If we do, std::vector (declared outside of the loop, with .clear() call here) might be a better choice kcc: not sure if we care about efficiency here. If we do, std::vector (declared outside of the loop…
		Dor1sAuthorUnsubmitted Not Done Reply Inline Actions Thanks! Did the same for ModifiedFeatures as well. Dor1s: Thanks! Did the same for ModifiedFeatures as well.
		auto InitialCoverage = TPC.GetTotalPCCoverage();
		std::set<size_t> InitialFeatures;
		auto InitialResult = TPC.CollectFeatures([&](size_t Feature) -> bool {
		InitialFeatures.insert(Feature);
		return true;
		kccUnsubmitted Done Reply Inline Actions ditto kcc: ditto
		});

		kccUnsubmitted Done Reply Inline Actions just size_t, and use < instead of != kcc: just size_t, and use < instead of !=
		Dor1sAuthorUnsubmitted Not Done Reply Inline Actions Did the same in the loop below. Dor1s: Did the same in the loop below.
		// DEBUG OUTPUT
		Printf("Initial corpus unit: ");
		PrintASCII(Data.data(), Data.size(), "\n");
		Printf("Initial Result: ");
		Printf("%zd, Features:", InitialResult);
		for (size_t F : InitialFeatures)
		Printf(" %zd", F);
		Printf(", Coverage: %zd\n", InitialCoverage);

		TPC.ResetMaps();
		F->ResetCoverage();

		for (std::size_t i = 0; i != Dict.size(); ++i) {
		auto Pos = std::search(Data.begin(), Data.end(),
		Dict[i].begin(), Dict[i].end());
		kccUnsubmitted Not Done Reply Inline Actions {some perfectionism} I would try to write this code with just one call std::search? But for an experimental code this is good enough. kcc: {some perfectionism} I would try to write this code with just one call std::search? But for an…
		Dor1sAuthorUnsubmitted Not Done Reply Inline Actions Hm, but how? I could write a separate function that loops through the Data and modify it when substring is found, but I'm not sure that it would be faster, since here I also iterate through the Data only once. Dor1s: Hm, but how? I could write a separate function that loops through the Data and modify it when…
		if (Pos == Data.end())
		continue;

		// DEBUG OUTPUT
		Printf("\nDictionary unit: ");
		PrintASCII(Dict[i].data(), Dict[i].size(), "\n");

		// Replace dictionary unit in the data by its masked value.
		for (auto It = Pos; It != Pos + Dict[i].size(); ++It)
		*It ^= 0xFF;

		F->RunOne(Data.data(), Data.size());
		auto ModifiedCoverage = TPC.GetTotalPCCoverage();
		std::set<size_t> ModifiedFeatures;
		auto ModifiedResult = TPC.CollectFeatures([&](size_t Feature) -> bool {
		ModifiedFeatures.insert(Feature);
		return true;
		});

		// DEBUG OUTPUT
		Printf("Modified corpus unit: ");
		PrintASCII(Data.data(), Data.size(), "\n");
		Printf("Modified Result: ");
		Printf("%zd, Features:", ModifiedResult);
		for (size_t F : ModifiedFeatures)
		Printf(" %zd", F);
		Printf(", Coverage: %zd\n", ModifiedCoverage);

		TPC.ResetMaps();
		F->ResetCoverage();

		// Restore initial contents of the data.
		for (auto It = Pos; It != Pos + Dict[i].size(); ++It)
		*It ^= 0xFF;

		// if (InitialCoverage == ModifiedCoverage) {
		if (InitialFeatures == ModifiedFeatures) {
		--Scores[i];
		} else {
		++Scores[i];
		}

		if (i > 20) // DEBUG for early bailout.
		break;
		}
		Printf("\n"); // DEBUG
		break; // DEBUG for bailout after the first input.
		}

		Printf("###### Useless dictionary elements. ######\n");
		for (std::size_t i = 0; i != Dict.size(); ++i) {
		// if (Scores[i] > 0)
		if (Scores[i] == 0)
		continue;

		// PRINT ALL ELEMENST WITH THEIR SCORES FOR DEBUGING
		Printf("\"");
		PrintASCII(Dict[i].data(), Dict[i].size(), "\"");
		Printf(" # Score: %d\n", Scores[i]);
		}
		Printf("###### End of useless dictionary elements. ######\n");
		return 0;
		}

int FuzzerDriver(int argc, char **argv, UserCallback Callback) {		int FuzzerDriver(int argc, char **argv, UserCallback Callback) {
using namespace fuzzer;		using namespace fuzzer;
assert(argc && argv && "Argument pointers cannot be nullptr");		assert(argc && argv && "Argument pointers cannot be nullptr");
std::string Argv0((*argv)[0]);		std::string Argv0((*argv)[0]);
EF = new ExternalFunctions();		EF = new ExternalFunctions();
if (EF->LLVMFuzzerInitialize)		if (EF->LLVMFuzzerInitialize)
EF->LLVMFuzzerInitialize(argc, argv);		EF->LLVMFuzzerInitialize(argc, argv);
const std::vector<std::string> Args(argv, argv + *argc);		const std::vector<std::string> Args(argv, argv + *argc);
▲ Show 20 Lines • Show All 179 Lines • ▼ Show 20 Lines	int FuzzerDriver(int argc, char **argv, UserCallback Callback) {

UnitVector InitialCorpus;		UnitVector InitialCorpus;
for (auto &Inp : *Inputs) {		for (auto &Inp : *Inputs) {
Printf("Loading corpus dir: %s\n", Inp.c_str());		Printf("Loading corpus dir: %s\n", Inp.c_str());
ReadDirToVectorOfUnits(Inp.c_str(), &InitialCorpus, nullptr,		ReadDirToVectorOfUnits(Inp.c_str(), &InitialCorpus, nullptr,
TemporaryMaxLen, /ExitOnError=/false);		TemporaryMaxLen, /ExitOnError=/false);
}		}

		if (Flags.minimize_dict) {
		if (Dictionary.empty() \|\| Inputs->empty()) {
		Printf("ERROR: can't minimize dict without dict and corpus provided\n");
		return 1;
		}
		if (MinimizeDictionary(F, Dictionary, InitialCorpus)) {
		Printf("Dictionary minimization failed\n");
		exit(1);
		}
		Printf("Dictionary minimization suceeded\n");
		exit(0);
		}

if (Options.MaxLen == 0) {		if (Options.MaxLen == 0) {
size_t MaxLen = 0;		size_t MaxLen = 0;
for (auto &U : InitialCorpus)		for (auto &U : InitialCorpus)
MaxLen = std::max(U.size(), MaxLen);		MaxLen = std::max(U.size(), MaxLen);
F->SetMaxInputLen(std::min(std::max(kMinDefaultLen, MaxLen), kMaxSaneLen));		F->SetMaxInputLen(std::min(std::max(kMinDefaultLen, MaxLen), kMaxSaneLen));
}		}

if (InitialCorpus.empty()) {		if (InitialCorpus.empty()) {
Show All 20 Lines

lib/Fuzzer/FuzzerFlags.def

Show All 33 Lines	FUZZER_FLAG_INT(timeout_exitcode, 77, "When libFuzzer reports a timeout "
"this exit code will be used.")		"this exit code will be used.")
FUZZER_FLAG_INT(max_total_time, 0, "If positive, indicates the maximal total "		FUZZER_FLAG_INT(max_total_time, 0, "If positive, indicates the maximal total "
"time in seconds to run the fuzzer.")		"time in seconds to run the fuzzer.")
FUZZER_FLAG_INT(help, 0, "Print help.")		FUZZER_FLAG_INT(help, 0, "Print help.")
FUZZER_FLAG_INT(merge, 0, "If 1, the 2-nd, 3-rd, etc corpora will be "		FUZZER_FLAG_INT(merge, 0, "If 1, the 2-nd, 3-rd, etc corpora will be "
"merged into the 1-st corpus. Only interesting units will be taken. "		"merged into the 1-st corpus. Only interesting units will be taken. "
"This flag can be used to minimize a corpus.")		"This flag can be used to minimize a corpus.")
FUZZER_FLAG_STRING(merge_control_file, "internal flag")		FUZZER_FLAG_STRING(merge_control_file, "internal flag")
		FUZZER_FLAG_INT(minimize_dict, 0, "internal flag")
FUZZER_FLAG_INT(minimize_crash, 0, "If 1, minimizes the provided"		FUZZER_FLAG_INT(minimize_crash, 0, "If 1, minimizes the provided"
" crash input. Use with -runs=N or -max_total_time=N to limit "		" crash input. Use with -runs=N or -max_total_time=N to limit "
"the number attempts")		"the number attempts")
FUZZER_FLAG_INT(minimize_crash_internal_step, 0, "internal flag")		FUZZER_FLAG_INT(minimize_crash_internal_step, 0, "internal flag")
FUZZER_FLAG_INT(use_counters, 1, "Use coverage counters")		FUZZER_FLAG_INT(use_counters, 1, "Use coverage counters")
FUZZER_FLAG_INT(use_indir_calls, 1, "Use indirect caller-callee counters")		FUZZER_FLAG_INT(use_indir_calls, 1, "Use indirect caller-callee counters")
FUZZER_FLAG_INT(use_memmem, 1,		FUZZER_FLAG_INT(use_memmem, 1,
"Use hints from intercepting memmem, strstr, etc")		"Use hints from intercepting memmem, strstr, etc")
▲ Show 20 Lines • Show All 69 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[libFuzzer] Experimenting with dictionary minimization.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 91723

lib/Fuzzer/FuzzerDriver.cpp

lib/Fuzzer/FuzzerFlags.def

This is an archive of the discontinued LLVM Phabricator instance.

[libFuzzer] Experimenting with dictionary minimization.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 91723

lib/Fuzzer/FuzzerDriver.cpp

lib/Fuzzer/FuzzerFlags.def

[libFuzzer] Experimenting with dictionary minimization.
ClosedPublic