This is an archive of the discontinued LLVM Phabricator instance.

[llvm-opt-fuzzer] Avoid adding incorrect inputs to the fuzzer corpus
ClosedPublic

Authored by igor-laevsky on Jan 23 2018, 3:48 AM.

Details

Reviewers

kcc
bogner

Commits

rG14c979da329d: [llvm-opt-fuzzer] Avoid adding incorrect inputs to the fuzzer corpus
rL324225: [llvm-opt-fuzzer] Avoid adding incorrect inputs to the fuzzer corpus

Summary

In general, during continuous fuzzing we want to avoid adding invalid inputs to the corpus, because they test the wrong thing (error handling in the bitcode reader). It is better to concentrate on the actual fuzzing target by producing only valid inputs.

In a perfect world the LLVM mutator would always produce correct LLVM IR and everything would work flawlessly. However, there are a number of cases where the mutator fails to guarantee that. We catch them by running module verification after mutation. In theory this should be sufficient to prevent exposing incorrect inputs to libFuzzer. However, I noticed that incorrect inputs still occasionally flow into the fuzzer corpus.

The problem lies in some rare invariants that are checked only by the LLVM reader, so verification after mutation does not catch them. The ideal solution would be to first fix all of those issues in the verifier and then fix the mutator so that it does not produce such mutations. However, it is unclear how many of them there are, and debugging each one might prove complicated.

So until those are fixed, I think it is reasonable to add an explicit save/reload step as part of the after-mutation verification. This should produce cleaner continuous runs with clear indications of mutator problems. On my machine this decreases exec/s by 10-30%, but that seems like a reasonable cost to pay for correct runs.
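
The save/reload check described in the summary can be illustrated with a small self-contained model. This is only a sketch and not the LLVM code: `ToyModule`, `toyVerify`, `toyWrite`, `toyParse`, and `acceptMutation` are hypothetical stand-ins for the module, the verifier, the bitcode writer, the bitcode reader, and the post-mutation check. The toy verifier deliberately misses one invariant (unique names) that only the toy reader enforces, mirroring the gap between the verifier and the reader.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR module: just a list of function names.
struct ToyModule {
  std::vector<std::string> Names;
};

// Toy "verifier": rejects empty names but, by design, misses duplicates.
bool toyVerify(const ToyModule &M) {
  for (const std::string &N : M.Names)
    if (N.empty())
      return false;
  return true;
}

// Toy "writer": serializes one name per line.
std::string toyWrite(const ToyModule &M) {
  std::string Out;
  for (const std::string &N : M.Names)
    Out += N + "\n";
  return Out;
}

// Toy "reader": enforces an invariant the verifier does not (unique names).
// Returns false when the buffer cannot be read back as a module.
bool toyParse(const std::string &Buf, ToyModule &Out) {
  std::set<std::string> Seen;
  std::istringstream In(Buf);
  std::string Line;
  while (std::getline(In, Line)) {
    if (!Seen.insert(Line).second)
      return false; // Duplicate name: only the reader notices this.
    Out.Names.push_back(Line);
  }
  return true;
}

// Post-mutation check: verify, then save, reload, and verify again.
// Only modules that pass every step may enter the corpus.
bool acceptMutation(const ToyModule &M) {
  if (!toyVerify(M))
    return false;
  std::string Buf = toyWrite(M);
  ToyModule Reloaded;
  return toyParse(Buf, Reloaded) && toyVerify(Reloaded);
}
```

In this model a module with two identical names passes `toyVerify` on its own but is rejected by `acceptMutation`, which is the class of input the extra save/reload step is meant to keep out of the corpus.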

Diff Detail

Repository: rL LLVM

Event Timeline

igor-laevsky created this revision. Jan 23 2018, 3:48 AM

Hi. Any comments on this?

I have mixed feelings about this, but I guess it's probably better than the status quo. My two main concerns are the time cost and whether we'll stop noticing the issues instead of fixing them.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
68–73 ↗	(On Diff #131028)	Can we drop this part and only verify after the reload?
89–95 ↗	(On Diff #131028)	Worth breaking out the parseAndVerify bit into its own function? We do it a lot now.
92–93 ↗	(On Diff #131028)	Where does this output go when running the fuzzer? Will we see / be able to act on this information?

Thanks for the comments! Added parseAndVerify function.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
68–73 ↗	(On Diff #131028)	I'm not sure how the bitcode writer will behave on an invalid module. I suspect that this verification is fast compared to the save/reload part.
92–93 ↗	(On Diff #131028)	This goes to stderr, and I assume it will be visible in the ClusterFuzz logs.

I think this is okay to go in, with the caveat that we need a plan for fixing up the verifier and removing this.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
83–85 ↗	(On Diff #132605)	Interestingly, it would be trivial to write a fuzzer that finds these verifier bugs at this point. Maybe we should do that as a way to move towards a state where we can remove the redundant checks?
68–73 ↗	(On Diff #131028)	Are you set up to measure this? It'd be nice to qualify that claim.

This revision is now accepted and ready to land. Feb 2 2018, 10:00 AM

Closed by commit rL324225: [llvm-opt-fuzzer] Avoid adding incorrect inputs to the fuzzer corpus (authored by igor.laevsky). Feb 5 2018, 3:09 AM

This revision was automatically updated to reflect the committed changes.

With regard to the performance concerns: I will keep an eye on the fuzzer statistics and revert the change if the slowdown becomes too much.

tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp
83–85 ↗	(On Diff #132605)	Yes, that's an interesting idea. Actually, we could even write a more general fuzzer for FuzzMutate itself, which would cover these kinds of issues and all other possible bugs in FuzzMutate.
68–73 ↗	(On Diff #131028)	I based it mainly on the fact that verification is faster than save/reload + verification. In any case, I don't think there is a way to avoid the first verification, since the bitcode writer may be unfriendly to invalid modules.
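
The idea raised in the inline comments, a fuzzer that looks for inputs the verifier accepts but the reader rejects, can be sketched generically. The helper below is hypothetical and uses no LLVM API: `firstVerifierGap` and its three callables are illustrative names, and the seed loop stands in for whatever actually drives the mutator.

```cpp
#include <cstdint>

// Hypothetical helper: scans seeds [0, NumSeeds) for a candidate that the
// "verifier" accepts but that does not survive a save/reload round trip.
// Returns the first such seed, or -1 when no gap is found.
template <typename GenerateFn, typename VerifyFn, typename ReloadFn>
int64_t firstVerifierGap(GenerateFn Generate, VerifyFn Verify,
                         ReloadFn SurvivesReload, uint32_t NumSeeds) {
  for (uint32_t Seed = 0; Seed < NumSeeds; ++Seed) {
    auto Candidate = Generate(Seed);
    if (!Verify(Candidate))
      continue; // Rejected by the verifier: not a verifier gap.
    if (!SurvivesReload(Candidate))
      return Seed; // Accepted by the verifier, rejected on reload.
  }
  return -1;
}
```

In the setting of this review, one would presumably plug in the IR mutator as `Generate`, `verifyModule` as `Verify`, and a `writeModule` plus `parseAndVerify` round trip as `SurvivesReload`; that wiring is an assumption and is not part of the committed change.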

Revision Contents

Path	Size
llvm/trunk/include/llvm/FuzzMutate/FuzzerCLI.h	6 lines
llvm/trunk/lib/FuzzMutate/FuzzerCLI.cpp	10 lines
llvm/trunk/tools/llvm-isel-fuzzer/llvm-isel-fuzzer.cpp	4 lines
llvm/trunk/tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp	38 lines

Diff 132799

llvm/trunk/include/llvm/FuzzMutate/FuzzerCLI.h

 ///
 /// \param M Module to print
 /// \param Dest Location to store serialized module
 /// \param MaxSize Size of the destination buffer
 /// \return Number of bytes that were written. When module size exceeds MaxSize
 /// returns 0 and leaves Dest unchanged.
 size_t writeModule(const Module &M, uint8_t *Dest, size_t MaxSize);

+/// Try to parse module and verify it. May output verification errors to the
+/// errs().
+/// \return New module or nullptr in case of error.
+std::unique_ptr<Module> parseAndVerify(const uint8_t *Data, size_t Size,
+                                       LLVMContext &Context);
+
 } // end llvm namespace

 #endif // LLVM_FUZZMUTATE_FUZZER_CLI_H

llvm/trunk/lib/FuzzMutate/FuzzerCLI.cpp

 #include "llvm/Bitcode/BitcodeWriter.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/Error.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/SourceMgr.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/IR/Verifier.h"

 using namespace llvm;

 void llvm::parseFuzzerCLOpts(int ArgC, char *ArgV[]) {
   std::vector<const char *> CLArgs;
   CLArgs.push_back(ArgV[0]);

   int I = 1;
 [...]
   std::string Buf;
   {
     raw_string_ostream OS(Buf);
     WriteBitcodeToFile(&M, OS);
   }
   if (Buf.size() > MaxSize)
     return 0;
   memcpy(Dest, Buf.data(), Buf.size());
   return Buf.size();
 }

+std::unique_ptr<Module> llvm::parseAndVerify(const uint8_t *Data, size_t Size,
+                                             LLVMContext &Context) {
+  auto M = parseModule(Data, Size, Context);
+  if (!M || verifyModule(*M, &errs()))
+    return nullptr;
+
+  return M;
+}

llvm/trunk/tools/llvm-isel-fuzzer/llvm-isel-fuzzer.cpp

 }

 extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
   if (Size <= 1)
     // We get bogus data given an empty corpus - ignore it.
     return 0;

   LLVMContext Context;
-  auto M = parseModule(Data, Size, Context);
-  if (!M || verifyModule(*M, &errs())) {
+  auto M = parseAndVerify(Data, Size, Context);
+  if (!M) {
     errs() << "error: input module is broken!\n";
     return 0;
   }

   // Set up the module to build for our target.
   M->setTargetTriple(TM->getTargetTriple().normalize());
   M->setDataLayout(TM->createDataLayout());

llvm/trunk/tools/llvm-opt-fuzzer/llvm-opt-fuzzer.cpp

 extern "C" LLVM_ATTRIBUTE_USED size_t LLVMFuzzerCustomMutator(
     uint8_t *Data, size_t Size, size_t MaxSize, unsigned int Seed) {

   assert(Mutator &&
       "IR mutator should have been created during fuzzer initialization");

   LLVMContext Context;
-  auto M = parseModule(Data, Size, Context);
-  if (!M || verifyModule(*M, &errs())) {
+  auto M = parseAndVerify(Data, Size, Context);
+  if (!M) {
     errs() << "error: mutator input module is broken!\n";
     return 0;
   }

   Mutator->mutateModule(*M, Seed, Size, MaxSize);

-#ifndef NDEBUG
   if (verifyModule(*M, &errs())) {
     errs() << "mutation result doesn't pass verification\n";
     M->dump();
-    abort();
+    // Avoid adding incorrect test cases to the corpus.
+    return 0;
+  }
+
+  std::string Buf;
+  {
+    raw_string_ostream OS(Buf);
+    WriteBitcodeToFile(M.get(), OS);
+  }
+  if (Buf.size() > MaxSize)
+    return 0;
+
+  // There are some invariants which are not checked by the verifier in favor
+  // of having them checked by the parser. They may be considered as bugs in the
+  // verifier and should be fixed there. However until all of those are covered
+  // we want to check for them explicitly. Otherwise we will add incorrect input
+  // to the corpus and this is going to confuse the fuzzer which will start
+  // exploration of the bitcode reader error handling code.
+  auto NewM = parseAndVerify(
+      reinterpret_cast<const uint8_t*>(Buf.data()), Buf.size(), Context);
+  if (!NewM) {
+    errs() << "mutator failed to re-read the module\n";
+    M->dump();
+    return 0;
   }
-#endif

-  return writeModule(*M, Data, MaxSize);
+  memcpy(Data, Buf.data(), Buf.size());
+  return Buf.size();
 }

 extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
   assert(TM && "Should have been created during fuzzer initialization");

   if (Size <= 1)
     // We get bogus data given an empty corpus - ignore it.
     return 0;

   // Parse module
   //

   LLVMContext Context;
-  auto M = parseModule(Data, Size, Context);
-  if (!M || verifyModule(*M, &errs())) {
+  auto M = parseAndVerify(Data, Size, Context);
+  if (!M) {
     errs() << "error: input module is broken!\n";
     return 0;
   }

   // Set up target dependant options
   //

   M->setTargetTriple(TM->getTargetTriple().normalize());