This is an archive of the discontinued LLVM Phabricator instance.

[llvm-opt-fuzzer] Avoid adding incorrect inputs to the fuzzer corpus
Closed, Public

Authored by igor-laevsky on Jan 23 2018, 3:48 AM.

Details

Reviewers

kcc
bogner

Commits

rG14c979da329d: [llvm-opt-fuzzer] Avoid adding incorrect inputs to the fuzzer corpus
rL324225: [llvm-opt-fuzzer] Avoid adding incorrect inputs to the fuzzer corpus

Summary

In general, during continuous fuzzing we want to avoid adding invalid inputs to the corpus because they test the wrong thing (error handling in the bitcode reader). It is better to concentrate on the actual fuzzing target by producing only valid inputs.

In a perfect world the LLVM mutator would always produce correct LLVM IR and everything would work flawlessly. However, there are a number of cases where the mutator fails to guarantee that. We catch them by running module verification after mutation. In theory this should be sufficient to keep incorrect inputs away from libFuzzer. However, I noticed that incorrect inputs still occasionally flow into the fuzzer corpus.

The problem lies with some rare invariants that are checked only by the LLVM reader, so verification after mutation does not catch them. The ideal solution would be to first fix all those issues in the verifier and then fix the mutator so that it does not produce such mutations. However, it's unclear how many of them there are, and debugging each one might prove complicated.

So until those are fixed, I think it's reasonable to add an explicit save/reload step as part of the after-mutation verification. This should produce cleaner continuous runs with clear indications of mutator problems. On my machine this decreases exec/s by 10-30%, but that seems like a reasonable cost to pay for correct runs.
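
For readers unfamiliar with the pattern, the save/reload step described in the summary can be sketched without any LLVM dependency. Everything below is a hypothetical toy: ToyModule, toyVerify, toyWrite, toyParse and commitIfRoundTrips are illustrative names, not LLVM or libFuzzer APIs, and the reserved 0xFF op is an invented invariant standing in for "something only the reader checks". Returning 0 mirrors what the patch does when it drops a mutation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Toy stand-in for an IR module (hypothetical, not an LLVM type).
struct ToyModule {
  std::vector<uint8_t> Ops;
};

// Toy "verifier": deliberately incomplete, it only rejects empty modules.
bool toyVerify(const ToyModule &M) { return !M.Ops.empty(); }

// Toy "writer": one length byte followed by the ops.
std::string toyWrite(const ToyModule &M) {
  std::string Buf;
  Buf.push_back(static_cast<char>(M.Ops.size()));
  for (uint8_t Op : M.Ops)
    Buf.push_back(static_cast<char>(Op));
  return Buf;
}

// Toy "reader": stricter than the verifier. It also rejects a length
// mismatch and the (invented) reserved op 0xFF.
bool toyParse(const uint8_t *Data, size_t Size, ToyModule &Out) {
  if (Size == 0 || Data[0] != Size - 1)
    return false;
  Out.Ops.clear();
  for (size_t I = 1; I < Size; ++I) {
    if (Data[I] == 0xFF)
      return false;
    Out.Ops.push_back(Data[I]);
  }
  return true;
}

// Returns the number of bytes written to Data, or 0 to drop the mutation.
size_t commitIfRoundTrips(const ToyModule &M, uint8_t *Data, size_t MaxSize) {
  assert(Data && "output buffer must not be null");
  if (!toyVerify(M)) // cheap in-memory check first
    return 0;
  std::string Buf = toyWrite(M);
  if (Buf.size() > MaxSize)
    return 0;
  // Reload the serialized form: this is where reader-only invariants surface.
  ToyModule NewM;
  if (!toyParse(reinterpret_cast<const uint8_t *>(Buf.data()), Buf.size(),
                NewM) ||
      !toyVerify(NewM))
    return 0;
  std::memcpy(Data, Buf.data(), Buf.size());
  return Buf.size();
}
```

In this toy, a module with ops {1, 2, 3} is committed as 4 bytes, while {1, 0xFF} passes the toy verifier but is dropped because the toy reader refuses it.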

Event Timeline

igor-laevsky created this revision. Jan 23 2018, 3:48 AM

Hi. Any comments on this?

I have mixed feelings about this, but I guess it's probably better than the status quo. My two main concerns are the time cost and whether we'll stop noticing the issues instead of fixing them.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
68–73	Can we drop this part and only verify after the reload?
89–95	Worth breaking out the parseAndVerify bit into its own function? We do it a lot now.
92–93	Where does this output go when running the fuzzer? Will we see / be able to act on this information?

Thanks for the comments! Added a parseAndVerify function.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
68–73	I'm not sure how the bitcode writer will behave on an invalid module. I would suspect that this verification is fast compared to the save/reload part.
92–93	This goes to stderr, and I assume it will be visible in the ClusterFuzz logs.

I think this is okay to go in, with the caveat that we need a plan for fixing up the verifier and removing this.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
68–73	Are you set up to measure this? It'd be nice to qualify that claim.
83–85	Interestingly, it would be trivial to write a fuzzer that finds these verifier bugs at this point. Maybe we should do that as a way to move towards a state where we can remove the redundant checks?
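
One way to put numbers on the "verification is fast compared to save/reload" claim would be a small timing harness. This is a minimal sketch, assuming the two steps can be wrapped as callables. meanNanos is a hypothetical helper, not part of LLVM or libFuzzer, and wall-clock means from a loop like this are only a rough guide.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Runs F Iterations times and returns the mean wall-clock time per call,
// in nanoseconds. Hypothetical helper for illustration only.
template <typename Fn>
double meanNanos(Fn F, std::size_t Iterations) {
  assert(Iterations > 0 && "need at least one iteration");
  auto Start = std::chrono::steady_clock::now();
  for (std::size_t I = 0; I < Iterations; ++I)
    F();
  auto End = std::chrono::steady_clock::now();
  std::chrono::duration<double, std::nano> Elapsed = End - Start;
  return Elapsed.count() / static_cast<double>(Iterations);
}
```

A caller could pass one lambda that runs only the verifier and another that runs write, parse, and verify on the same module, then compare the two means.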

This revision is now accepted and ready to land. Feb 2 2018, 10:00 AM

Closed by commit rL324225: [llvm-opt-fuzzer] Avoid adding incorrect inputs to the fuzzer corpus (authored by igor.laevsky). Feb 5 2018, 3:09 AM

This revision was automatically updated to reflect the committed changes.

With regard to the performance concerns: I will keep an eye on the fuzzer statistics and revert the change if the slowdown becomes too much.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
68–73	I based it mainly on the fact that verification is faster than save/reload + verification. In any case, I don't think there is a way to avoid the first verification, since the bitcode writer may be unfriendly to invalid modules.
83–85	Yes, that's an interesting idea. Actually, we could even write a more general fuzzer for FuzzMutate itself, which would cover these kinds of issues and all other possible bugs in FuzzMutate.
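
The idea in this exchange amounts to a differential check: any module the verifier accepts but the reader refuses after a save/reload points at a verifier gap. Below is a minimal sketch of that classification, assuming the verifier and the round trip are available as predicates. Outcome, classify, ToyModule, and the toy predicates are hypothetical and do not use the real LLVM APIs.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

enum class Outcome { Rejected, Consistent, VerifierGap };

// Classifies one input: a gap is "verifier says yes, round trip says no".
template <typename Module, typename VerifyFn, typename RoundTripFn>
Outcome classify(const Module &M, VerifyFn Verify, RoundTripFn RoundTrips) {
  if (!Verify(M))
    return Outcome::Rejected;
  return RoundTrips(M) ? Outcome::Consistent : Outcome::VerifierGap;
}

// Toy stand-ins (invented for this sketch).
using ToyModule = std::vector<int>;

// Incomplete toy verifier: only rejects empty modules.
bool toyVerify(const ToyModule &M) { return !M.empty(); }

// Toy round trip: fails on the invented reserved value 255.
bool toyRoundTrips(const ToyModule &M) {
  for (int Op : M)
    if (Op == 255)
      return false;
  return true;
}

// Shape of a fuzz-target body: crash on a gap so the fuzzer keeps the
// offending input as a reproducer.
void crashOnVerifierGap(const ToyModule &M) {
  if (classify(M, toyVerify, toyRoundTrips) == Outcome::VerifierGap)
    std::abort();
}
```

Crashing rather than returning is the design choice here: a fuzzer driving such a target would then save the input, which gives a concrete case to fix in the verifier.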

Revision Contents

Path	Size
tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp	30 lines

Diff 131028

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp

@@ extern "C" LLVM_ATTRIBUTE_USED size_t LLVMFuzzerCustomMutator(
   auto M = parseModule(Data, Size, Context);
   if (!M || verifyModule(*M, &errs())) {
     errs() << "error: mutator input module is broken!\n";
     return 0;
   }

   Mutator->mutateModule(*M, Seed, Size, MaxSize);

-#ifndef NDEBUG
   if (verifyModule(*M, &errs())) {
     errs() << "mutation result doesn't pass verification\n";
     M->dump();
-    abort();
+    // Avoid adding incorrect test cases to the corpus.
+    return 0;
+  }
>>> bogner (Not Done): Can we drop this part and only verify after the reload?
>>> igor-laevsky (Author, Not Done): I'm not sure how the bitcode writer will behave on an invalid module. I would suspect that this verification is fast compared to the save/reload part.
>>> bogner (Not Done): Are you set up to measure this? It'd be nice to qualify that claim.
>>> igor-laevsky (Author, Not Done): I based it mainly on the fact that verification is faster than save/reload + verification. In any case, I don't think there is a way to avoid the first verification, since the bitcode writer may be unfriendly to invalid modules.
+
+  std::string Buf;
+  {
+    raw_string_ostream OS(Buf);
+    WriteBitcodeToFile(M.get(), OS);
+  }
+  if (Buf.size() > MaxSize)
+    return 0;
+
+  // There are some invariants which are not checked by the verifier in favor
+  // of having them checked by the parser. They may be considered as bugs in the
+  // verifier and should be fixed there. However until all of those are covered
>>> bogner (Not Done): Interestingly, it would be trivial to write a fuzzer that finds these verifier bugs at this point. Maybe we should do that as a way to move towards a state where we can remove the redundant checks?
>>> igor-laevsky (Author, Not Done): Yes, that's an interesting idea. Actually, we could even write a more general fuzzer for FuzzMutate itself, which would cover these kinds of issues and all other possible bugs in FuzzMutate.
+  // we want to check for them explicitly. Otherwise we will add incorrect input
+  // to the corpus and this is going to confuse the fuzzer which will start
+  // exploration of the bitcode reader error handling code.
+  auto NewM = parseModule(
+      reinterpret_cast<const uint8_t*>(Buf.data()), Buf.size(), Context);
+  if (!NewM || verifyModule(*NewM, &errs())) {
+    errs() << "mutator failed to re-read the module\n";
+    M->dump();
>>> bogner (Not Done): Where does this output go when running the fuzzer? Will we see / be able to act on this information?
>>> igor-laevsky (Author, Not Done): This goes to stderr, and I assume it will be visible in the ClusterFuzz logs.
+    return 0;
   }
>>> bogner (Done): Worth breaking out the parseAndVerify bit into its own function? We do it a lot now.
-#endif

-  return writeModule(*M, Data, MaxSize);
+  memcpy(Data, Buf.data(), Buf.size());
+  return Buf.size();
 }

 extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
   assert(TM && "Should have been created during fuzzer initialization");

   if (Size <= 1)
     // We get bogus data given an empty corpus - ignore it.
     return 0;