This is an archive of the discontinued LLVM Phabricator instance.

llvm-mc-fuzzer: add support for assembly
Needs Revision · Public

Authored by bcain on Feb 19 2017, 9:09 AM.

Details

Reviewers

kcc
dsanders

Summary

This enables the "-assemble" feature for llvm-mc-fuzzer.

Currently we just attempt assembly and ignore the result.

Diff Detail

Repository: rL LLVM

Event Timeline

bcain created this revision. · Feb 19 2017, 9:09 AM

Herald added a subscriber: mgorny. · Feb 19 2017, 9:09 AM

The code mostly looks good, but I am not an expert in the used APIs, hopefully someone else chimes in.

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0	I strongly suggest to make this a separate fuzz target instead of using flags. Otherwise it'll be harder to automate running this target.
11	Why C-style comments?
18	LLVM coding style wants 'Data' and 'Size'
40	hm? need a return?
45	merge with the return?
91	coding style...

Currently we just attempt assembly and ignore the result.

Ignoring the result is the right thing to do since failure to assemble is an expected response to some inputs. Whether it's a correct response to a particular input can be found by separately running the corpus through the assembler and comparing against a reference (most likely, another assembler).

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0	I'm not sure what you mean here. What difficulties are you thinking of? FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.
5	reinterpret_cast tends to cause portability problems but this one is ok for the targets LLVM supports in-tree as far as I know.
52–54	These should be 'AsmVerbose' and 'UseDwarfDirectory'. Also, why static? (and similarly for the other trivial constants below)
56–58	If we're going to do this for the assembler, we should do it for the disassembler too. It would be strange to be inconsistent between the two.
73	Indentation
98	Indentation
122–126	You'll also catch more bugs by running it through the assembly streamer since some things can only be detected during emission. Optional: Could you make it an option to run it through the object streamer? Ideally, we'd have the -filetype option from llvm-mc. Mips in particular has some things that will only be detected when macro/directive expansion occurs in the object streamer.
128	Indentation

Suggested changes:

formatting: indentation
formatting: identifier names
feature: support for -filetype arg a la llvm-mc

I think I've satisfied all of the review concerns, save the one about reinterpret_cast. Daniel, please let me know if the comment was just informative or if you prefer a change there.

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0	> I strongly suggest to make this a separate fuzz target instead of using flags.
	I've preserved the original design for llvm-mc-fuzzer, apparently to imitate llvm-mc. Pros/cons of the current design:
	pro: matches llvm-mc
	pro: changing focus to probe different paths only requires different command line args
	con: reproducing fuzzer configuration more difficult because it depends on those args
	con: libFuzzer might see the uncovered feature set as a goal for coverage (that we already know statically it cannot cover).
	For that last one, it's speculation on my part.
	Kostya, would you be satisfied with this as-is or should I decompose it into two fuzzers? "Harder to automate" consists of "I must make sure that I can deliver the right command line args to the automation feature"? Or "won't fit well in oss-fuzz" or something else?

In D30156#681438, @dsanders wrote:

Currently we just attempt assembly and ignore the result.

Ignoring the result is the right thing to do since failure to assemble is an expected response to some inputs. Whether it's a correct response to a particular input can be found by separately running the corpus through the assembler and comparing against a reference (most likely, another assembler).

Just to make sure we're all on the same page, I think anything leveraging a reference assembler is out of scope (for now anyways).

In D30156#681790, @bcain wrote:

In D30156#681438, @dsanders wrote:

Currently we just attempt assembly and ignore the result.

Ignoring the result is the right thing to do since failure to assemble is an expected response to some inputs. Whether it's a correct response to a particular input can be found by separately running the corpus through the assembler and comparing against a reference (most likely, another assembler).

Just to make sure we're all on the same page, I think anything leveraging a reference assembler is out of scope (for now anyways).

That's right. Crashes aside, it's not llvm-mc-fuzzer's job to check the correctness of any particular output. The comment about a reference assembler was meant to indicate how someone would check correctness externally using the corpus emitted by llvm-mc-fuzzer.
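
The external check described here could look roughly like the differential harness below. This is only a sketch under stated assumptions: `AsmResult`, `AssembleFn`, `sameOutcome`, and `findDisagreements` are hypothetical names, not part of llvm-mc-fuzzer or any LLVM API, and each assembler is modelled as a plain function from source text to an accept/reject result plus encoded bytes.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one assembler run.
struct AsmResult {
  bool Accepted;     // false when the assembler rejected the input
  std::string Bytes; // encoded output; only meaningful when Accepted is true
};

// Hypothetical assembler interface: source text in, result out.
using AssembleFn = std::function<AsmResult(const std::string &)>;

// Two runs agree when both reject, or both accept with identical bytes.
bool sameOutcome(const AsmResult &L, const AsmResult &R) {
  if (L.Accepted != R.Accepted)
    return false;
  return !L.Accepted || L.Bytes == R.Bytes;
}

// Returns the indices of corpus entries on which the two assemblers disagree.
std::vector<std::size_t>
findDisagreements(const std::vector<std::string> &Corpus,
                  const AssembleFn &UnderTest, const AssembleFn &Reference) {
  std::vector<std::size_t> Mismatches;
  for (std::size_t I = 0; I < Corpus.size(); ++I) {
    if (!sameOutcome(UnderTest(Corpus[I]), Reference(Corpus[I])))
      Mismatches.push_back(I);
  }
  return Mismatches;
}
```

The fuzzer itself stays out of this loop: it only produces the corpus, and a separate tool of this shape would consume it.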

kcc added inline comments. · Feb 20 2017, 8:29 PM

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0	> I'm not sure what you mean here. What difficulties are you thinking of?
	Imagine an automated system that runs continuous fuzzing (e.g. https://github.com/google/oss-fuzz). How are you going to tell it to run the same binary with two different flags and to treat those as two independent entities? Of course, it's possible to implement support for something like this, but OSS-Fuzz does not and will not support it. (because of KISS: https://en.wikipedia.org/wiki/KISS_principle)
	When analyzing the code coverage (manually, or automatically) there will be a huge lump of code that is never reached in one mode, i.e. this 2-in-1 bundle will confuse the analysis.
	Finally, at least in libFuzzer, part of the algorithm is linear by the size of the binary (more precisely: number of instrumented blocks) and so this bundled fuzzer will just be burning CPUs with no reason.
	> FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.
	Yes, and I objected back then :)

dsanders added inline comments. · Feb 21 2017, 2:52 AM

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0	>> I'm not sure what you mean here. What difficulties are you thinking of?
	> Imagine an automated system that runs continuous fuzzing (e.g. https://github.com/google/oss-fuzz). How are you going to tell it to run the same binary with two different flags and to treat those as two independent entities?
	I'm not familiar with oss-fuzz but based on an initial glance through I'm not sure how this is different from oss-fuzz/projects/curl/. That project is using pre-processor macros to select between different fuzzers.
	To answer the question though, if I wanted to fuzz everything (assembler/disassembler, all arches, subarches, and feature combinations) in this kind of system and the curl/llvm-mc-fuzzer way had been ruled out, I'd probably use the first few bytes of the data as the configuration and do a full setup/teardown in LLVMFuzzerTestOneInput(). That said, I think that's a different kind of fuzzer to llvm-mc-fuzzer. It would aim to improve the quality of the LLVM project as a whole whereas llvm-mc-fuzzer was meant to help backend developers improve the quality of their particular targets and subtargets.
	> Of course, it's possible to implement support for something like this, but OSS-Fuzz does not and will not support it. (because of KISS: https://en.wikipedia.org/wiki/KISS_principle)
	This principle is the reason this tool uses command line arguments for the action/triple/arch/subarch/features. Command line arguments were the simplest way to configure a particular target without having to re-compile for each combination. I included support for other archs/subarches/features because it made the original goal easier and also made the tool more useful to others.
	> When analyzing the code coverage (manually, or automatically) there will be a huge lump of code that is never reached in one mode, i.e. this 2-in-1 bundle will confuse the analysis.
	FWIW, this is also the case between arches/subarches/features. For example, on an X86 host using default options, the AArch64/ARM/Mips/etc. disassemblers are not tested.
	> Finally, at least in libFuzzer, part of the algorithm is linear by the size of the binary (more precisely: number of instrumented blocks) and so this bundled fuzzer will just be burning CPUs with no reason.
	That's a fair point.
	>> FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.
	> Yes, and I objected back then :)
	I remember you objected to having a custom main function that mangled the arguments before passing them on to libFuzzer and I fixed that. I didn't think there was an objection to command line arguments in general though. If the objection was to command line arguments in general, is there a way to test an architecture in isolation from the others that's more in keeping with libFuzzer's style?

kcc added inline comments. · Feb 21 2017, 12:26 PM

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0	> using pre-processor macros to select between different fuzzers.
	That's fine, since it creates different binaries, where unused code has a good chance to not even be linked in.
	> I'd probably use the first few bytes of the data as the configuration
	That's an option but it has 2 problems:
	now the inputs are from some new artificial data format
	fuzzing is less efficient due to larger binary
	> FWIW, this is also the case between arches/subarches/features.
	Yes.
	> you objected to having a custom main function that mangled the arguments
	Yes, and that is currently not supported at all (was a mistake).
	> Is there a way to test an architecture in isolation from the others that's more in keeping with libFuzzer's style?
	Have separate binary (build target) for every distinct configuration of code we have in mind. How many are we talking about here? tens or hundreds? Having 20-30 binaries like this is totally fine imho, and works great on oss-fuzz (see e.g. the way we've done it for ffmpeg, where we have ~40 binaries). https://github.com/google/oss-fuzz/blob/master/projects/ffmpeg/build.sh
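
For readers unfamiliar with the "first few bytes as configuration" idea debated in this thread, a minimal sketch follows. Everything in it is an assumption made for illustration: the two-byte prefix layout and the names `FuzzConfig`, `Payload`, `ConfigPrefixSize`, and `splitConfig` are hypothetical and do not appear in the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical configuration decoded from the front of a fuzzer input.
struct FuzzConfig {
  bool Disassemble;  // false selects the assembler path
  uint8_t ArchIndex; // index into some table of targets (table not shown)
};

// The remainder of the input after the configuration prefix.
struct Payload {
  const uint8_t *Data;
  size_t Size;
};

// Assumed prefix length: one mode byte followed by one target byte.
constexpr size_t ConfigPrefixSize = 2;

// Splits an input into configuration and payload. Returns false when the
// input is too short to hold the prefix; Config and Rest are then untouched.
bool splitConfig(const uint8_t *Data, size_t Size, FuzzConfig &Config,
                 Payload &Rest) {
  if (Size < ConfigPrefixSize)
    return false;
  Config.Disassemble = (Data[0] & 1) != 0;
  Config.ArchIndex = Data[1];
  Rest.Data = Data + ConfigPrefixSize;
  Rest.Size = Size - ConfigPrefixSize;
  return true;
}
```

The sketch also makes both objections raised above concrete: corpus files now carry a prefix that is not valid assembly or machine code, and a single binary has to link every target that `ArchIndex` can select.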

bcain added inline comments. · Feb 21 2017, 12:54 PM

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0	> Have separate binary (build target) for every distinct configuration of code we have in mind.
	I'd be willing to modify my submission to at least decompose llvm-mc-fuzzer into llvm-mc-assemble-fuzzer and llvm-mc-disassemble-fuzzer. That would solve the problem of having primarily disjoint content under a single binary. Is that a reasonable compromise, or is it necessary to have zero command line args?

It's a good compromise to start with.

dsanders added inline comments. · Feb 21 2017, 1:21 PM

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0	> Have separate binary (build target) for every distinct configuration of code we have in mind. How many are we talking about here? tens or hundreds? Having 20-30 binaries like this is totally fine imho, and works great on oss-fuzz (see e.g. the way we've done it for ffmpeg, where we have ~40 binaries). https://github.com/google/oss-fuzz/blob/master/projects/ffmpeg/build.sh
	It depends how far you go with splitting it up. If you split binaries at the arch level then there's 15 in-tree targets (I didn't check how many have an MC layer). Including subarches it's probably hundreds (Mips has ~20 I can think of off-hand), and including features it's likely to push towards thousands. You only save binary size by splitting at the arch level though since LLVM doesn't have a means to limit support to subarches and smaller.
	> I'd be willing to modify my submission to at least decompose llvm-mc-fuzzer into llvm-mc-assemble-fuzzer and llvm-mc-disassemble-fuzzer. That would solve the problem of having primarily disjoint content under a single binary. Is that a reasonable compromise, or is it necessary to have zero command line args?
	That sounds reasonable to me.

Now decomposed into separate -assemble, -disassemble executables.

Looks good. We can iterate after this version is submitted.
I've left two comments, but feel free to address them in future commits.

tools/llvm-mc-disassemble-fuzzer/llvm-mc-disassemble-fuzzer.cpp
79	why do you limit the size this way? Isn't it useful to run tiny inputs?
97	What will be the behavior if no flags are supplied? Can we set the default values so that the fuzzer will do something meaningful w/o any flags? Also, if we have the default values as a macro that we can re-define from a cmake flag, this will solve the problem of building multiple binaries.

This revision is now accepted and ready to land. · Feb 23 2017, 11:34 AM

dsanders added inline comments. · Feb 24 2017, 2:06 AM

tools/llvm-mc-disassemble-fuzzer/llvm-mc-disassemble-fuzzer.cpp
79	I don't think we should have this limit. When I was testing the Mips disassembler, I found it very useful to limit the fuzzer to 4-bytes of data so that the buffer was always the opcode of the unsupported/broken instruction. I also found a bug in 0-3 byte buffers where it assumed it was safe to read the first instruction and would overflow the buffer.
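
The 0-3 byte overflow mentioned in this comment is the kind of bug a length check before the first read prevents. The sketch below is illustrative only: `readInstructionWord` is a hypothetical helper, and the fixed-width, big-endian 4-byte encoding is an assumption chosen to match the Mips example, not code taken from the Mips disassembler.

```cpp
#include <cstddef>
#include <cstdint>

// Reads one 32-bit big-endian instruction word from Data.
// Returns the number of bytes consumed: 4 on success, or 0 when fewer than
// 4 bytes remain, in which case Word is left unchanged and nothing is read.
size_t readInstructionWord(const uint8_t *Data, size_t Size, uint32_t &Word) {
  if (Size < 4)
    return 0;
  Word = (uint32_t(Data[0]) << 24) | (uint32_t(Data[1]) << 16) |
         (uint32_t(Data[2]) << 8) | uint32_t(Data[3]);
  return 4;
}
```

Tiny inputs are exactly what exercises the early-return branch, which is one reason not to filter them out in the fuzz target.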
97	> what will be the behavior if no flags are supplied? Can we set the default values so that the fuzzer will do something meaningful w/o any flags?
	It will test the default triple from sys::getDefaultTargetTriple(). This is usually the host but it can be set in CMake.
	> Also, if we have the default values as a macro that we can re-define from a cmake flag, this will solve the problem of building multiple binaries.
	This is partially available through CMake's LLVM_DEFAULT_TARGET_TRIPLE variable. The triple influences the default -mcpu and -mattrs but not all subtargets can be described with just a triple.

bcain added inline comments. · Feb 24 2017, 6:48 AM

tools/llvm-mc-disassemble-fuzzer/llvm-mc-disassemble-fuzzer.cpp
79	Agreed: this was an error, I was experimenting and I will remove it.
97	>> Also, if we have the default values as a macro that we can re-define from a cmake flag, this will solve the problem of building multiple binaries.
	> This is partially available through CMake's LLVM_DEFAULT_TARGET_TRIPLE variable. The triple influences the default -mcpu and -mattrs but not all subtargets can be described with just a triple.
	I believe Kostya was referring to building the set of all dis/assemblers. I think archs are available in CMake -- we could use that to iterate over, but I think what we really need are the set of all triples. And I suspect that there is no such facility.
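
One possible shape for the "default values as a macro" suggestion is sketched below. It is not taken from the patch: the macro names `FUZZER_DEFAULT_TRIPLE` and `FUZZER_DEFAULT_MCPU` and the helper `chooseSetting` are hypothetical, and the build-system side (passing a different `-D` definition for each binary) is assumed rather than shown.

```cpp
#include <string>

// Hypothetical compile-time defaults. A build could override them per binary,
// for example with -DFUZZER_DEFAULT_TRIPLE="\"mips-linux-gnu\"".
#ifndef FUZZER_DEFAULT_TRIPLE
#define FUZZER_DEFAULT_TRIPLE ""
#endif
#ifndef FUZZER_DEFAULT_MCPU
#define FUZZER_DEFAULT_MCPU ""
#endif

// Picks the value to use: an explicit command-line value wins, then the
// compiled-in default, then a caller-supplied fallback such as the host
// triple.
std::string chooseSetting(const std::string &FromCommandLine,
                          const std::string &CompiledDefault,
                          const std::string &Fallback) {
  if (!FromCommandLine.empty())
    return FromCommandLine;
  if (!CompiledDefault.empty())
    return CompiledDefault;
  return Fallback;
}
```

With defaults resolved this way, each binary would run meaningfully without flags, while the existing -triple and -mcpu options could still override the compiled-in values.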

Delete experimental "(Size < 1024)" debug code -- was not intended for submission.

LGTM from fuzzing POV. Let's commit and experiment further.

This revision now requires changes to proceed. · Feb 27 2017, 5:19 PM

Revision Contents

Path	Size
tools/llvm-mc-assemble-fuzzer/CMakeLists.txt	19 lines
tools/llvm-mc-assemble-fuzzer/llvm-mc-assemble-fuzzer.cpp	313 lines
tools/llvm-mc-disassemble-fuzzer/CMakeLists.txt	21 lines
tools/llvm-mc-disassemble-fuzzer/llvm-mc-disassemble-fuzzer.cpp	143 lines
tools/llvm-mc-fuzzer/CMakeLists.txt
tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp

Diff 89793

tools/llvm-mc-assemble-fuzzer/CMakeLists.txt

This file was added.

				if( LLVM_USE_SANITIZE_COVERAGE )
				include_directories(BEFORE
				${CMAKE_CURRENT_SOURCE_DIR}/../../lib/Fuzzer)

				set(LLVM_LINK_COMPONENTS
				AllTargetsAsmPrinters
				AllTargetsAsmParsers
				AllTargetsDescs
				AllTargetsInfos
				MC
				MCParser
				Support
				)
				add_llvm_tool(llvm-mc-assemble-fuzzer
				llvm-mc-assemble-fuzzer.cpp)
				target_link_libraries(llvm-mc-assemble-fuzzer
				LLVMFuzzer
				)
				endif()

tools/llvm-mc-assemble-fuzzer/llvm-mc-assemble-fuzzer.cpp

This file was added.

				//===--- llvm-mc-fuzzer.cpp - Fuzzer for the MC layer ---------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				//===----------------------------------------------------------------------===//

				#include "FuzzerInterface.h"
				#include "llvm-c/Target.h"
				#include "llvm/MC/SubtargetFeature.h"
				#include "llvm/MC/MCAsmBackend.h"
				#include "llvm/MC/MCAsmInfo.h"
				#include "llvm/MC/MCContext.h"
				#include "llvm/MC/MCInstPrinter.h"
				#include "llvm/MC/MCInstrInfo.h"
				#include "llvm/MC/MCObjectFileInfo.h"
				#include "llvm/MC/MCParser/AsmLexer.h"
				#include "llvm/MC/MCParser/MCTargetAsmParser.h"
				#include "llvm/MC/MCRegisterInfo.h"
				#include "llvm/MC/MCSectionMachO.h"
				#include "llvm/MC/MCStreamer.h"
				#include "llvm/MC/MCSubtargetInfo.h"
				#include "llvm/MC/MCTargetOptionsCommandFlags.h"
				#include "llvm/Support/MemoryBuffer.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/FileUtilities.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Support/SourceMgr.h"
				#include "llvm/Support/TargetSelect.h"
				#include "llvm/Support/TargetRegistry.h"
				#include "llvm/Support/ToolOutputFile.h"

				using namespace llvm;

				static cl::opt<std::string>
				TripleName("triple", cl::desc("Target triple to assemble for, "
				"see -version for available targets"));

				static cl::opt<std::string>
				MCPU("mcpu",
				cl::desc("Target a specific cpu type (-mcpu=help for details)"),
				cl::value_desc("cpu-name"), cl::init(""));

				// This is useful for variable-length instruction sets.
				static cl::opt<unsigned> InsnLimit(
				"insn-limit",
				cl::desc("Limit the number of instructions to process (0 for no limit)"),
				cl::value_desc("count"), cl::init(0));

				static cl::list<std::string>
				MAttrs("mattr", cl::CommaSeparated,
				cl::desc("Target specific attributes (-mattr=help for details)"),
				cl::value_desc("a1,+a2,-a3,..."));
				// The feature string derived from -mattr's values.
				std::string FeaturesStr;

				static cl::list<std::string>
				FuzzerArgs("fuzzer-args", cl::Positional,
				cl::desc("Options to pass to the fuzzer"), cl::ZeroOrMore,
				cl::PositionalEatsArgs);
				static std::vector<char *> ModifiedArgv;

				enum OutputFileType {
				OFT_Null,
				OFT_AssemblyFile,
				OFT_ObjectFile
				};
				static cl::opt<OutputFileType>
				FileType("filetype", cl::init(OFT_AssemblyFile),
				cl::desc("Choose an output file type:"),
				cl::values(
				clEnumValN(OFT_AssemblyFile, "asm",
				"Emit an assembly ('.s') file"),
				clEnumValN(OFT_Null, "null",
				"Don't emit anything (for timing purposes)"),
				clEnumValN(OFT_ObjectFile, "obj",
				"Emit a native object ('.o') file")));


				class LLVMFuzzerInputBuffer : public MemoryBuffer
				{
				public:
				LLVMFuzzerInputBuffer(const uint8_t *data_, size_t size_)
				: Data(reinterpret_cast<const char *>(data_)),
				Size(size_) {
				init(Data, Data+Size, false);
				}


				virtual BufferKind getBufferKind() const {
				return MemoryBuffer_Malloc; // it's not disk-backed so I think that's
				// the intent ... though AFAIK it
				// probably came from an mmap or sbrk
				}

				private:
				const char *Data;
				size_t Size;
				};

				static int AssembleInput(const char *ProgName, const Target *TheTarget,
				SourceMgr &SrcMgr, MCContext &Ctx, MCStreamer &Str,
				MCAsmInfo &MAI, MCSubtargetInfo &STI,
				MCInstrInfo &MCII, MCTargetOptions &MCOptions) {
				static const bool NoInitialTextSection = false;

				std::unique_ptr<MCAsmParser> Parser(
				createMCAsmParser(SrcMgr, Ctx, Str, MAI));

				std::unique_ptr<MCTargetAsmParser> TAP(
				TheTarget->createMCAsmParser(STI, *Parser, MCII, MCOptions));

				if (!TAP) {
				errs() << ProgName
				<< ": error: this target '" << TripleName
				<< "', does not support assembly parsing.\n";
				abort();
				}

				Parser->setTargetParser(*TAP);

				return Parser->Run(NoInitialTextSection);
				}


				int AssembleOneInput(const uint8_t *Data, size_t Size) {
				const bool ShowInst = false;
				const bool AsmVerbose = false;
				const bool UseDwarfDirectory = true;

				Triple TheTriple(Triple::normalize(TripleName));

				SourceMgr SrcMgr;

				std::unique_ptr<MemoryBuffer> BufferPtr(new LLVMFuzzerInputBuffer(Data, Size));

				// Tell SrcMgr about this buffer, which is what the parser will pick up.
				SrcMgr.AddNewSourceBuffer(std::move(BufferPtr), SMLoc());

				static const std::vector<std::string> NoIncludeDirs;
				SrcMgr.setIncludeDirs(NoIncludeDirs);

				static std::string ArchName;
				std::string Error;
				const Target *TheTarget = TargetRegistry::lookupTarget(ArchName, TheTriple,
				Error);
				if (!TheTarget) {
				errs() << "error: this target '" << TheTriple.normalize()
				<< "/" << ArchName << "', was not found: '" << Error << "'\n";

				abort();
				}

				std::unique_ptr<MCRegisterInfo> MRI(TheTarget->createMCRegInfo(TripleName));
				if (!MRI) {
				errs() << "Unable to create target register info!";
				abort();
				}

				std::unique_ptr<MCAsmInfo> MAI(TheTarget->createMCAsmInfo(*MRI, TripleName));
				if (!MAI) {
				errs() << "Unable to create target asm info!";
				abort();
				}


				MCObjectFileInfo MOFI;
				MCContext Ctx(MAI.get(), MRI.get(), &MOFI, &SrcMgr);

				static const bool UsePIC = false;
				static const CodeModel::Model CMModel = CodeModel::Default;
				MOFI.InitMCObjectFileInfo(TheTriple, UsePIC, CMModel, Ctx);

				const unsigned OutputAsmVariant = 0;
				std::unique_ptr<MCInstrInfo> MCII(TheTarget->createMCInstrInfo());
				MCInstPrinter *IP = TheTarget->createMCInstPrinter(Triple(TripleName), OutputAsmVariant,
				*MAI, *MCII, *MRI);
				if (!IP) {
				errs()
				<< "error: unable to create instruction printer for target triple '"
				<< TheTriple.normalize() << "' with assembly variant "
				<< OutputAsmVariant << ".\n";

				abort();
				}

				const char *ProgName = "llvm-mc-fuzzer";
				std::unique_ptr<MCSubtargetInfo> STI(
				TheTarget->createMCSubtargetInfo(TripleName, MCPU, FeaturesStr));
				MCCodeEmitter *CE = nullptr;
				MCAsmBackend *MAB = nullptr;

				MCTargetOptions MCOptions = InitMCTargetOptionsFromFlags();

				std::string OutputString;
				raw_string_ostream Out(OutputString);
				auto FOut = llvm::make_unique<formatted_raw_ostream>(Out);

				std::unique_ptr<MCStreamer> Str;

				if (FileType == OFT_AssemblyFile) {
				Str.reset(TheTarget->createAsmStreamer(
				Ctx, std::move(FOut), AsmVerbose,
				UseDwarfDirectory, IP, CE, MAB, ShowInst));
				} else {
				assert(FileType == OFT_ObjectFile && "Invalid file type!");

				std::error_code EC;
				const std::string OutputFilename = "-";
				auto Out = llvm::make_unique<tool_output_file>(OutputFilename, EC,
				sys::fs::F_None);
				if (EC) {
				errs() << EC.message() << '\n';
				abort();
				}

				// Don't waste memory on names of temp labels.
				Ctx.setUseNamesOnTempLabels(false);

				std::unique_ptr<buffer_ostream> BOS;
				raw_pwrite_stream *OS = &Out->os();
				if (!Out->os().supportsSeeking()) {
				BOS = make_unique<buffer_ostream>(Out->os());
				OS = BOS.get();
				}

				MCCodeEmitter *CE = TheTarget->createMCCodeEmitter(*MCII, *MRI, Ctx);
				MCAsmBackend *MAB = TheTarget->createMCAsmBackend(*MRI, TripleName, MCPU,
				MCOptions);
				Str.reset(TheTarget->createMCObjectStreamer(
				TheTriple, Ctx, *MAB, *OS, CE, *STI, MCOptions.MCRelaxAll,
				MCOptions.MCIncrementalLinkerCompatible,
				/*DWARFMustBeAtTheEnd*/ false));
				}
				const int Res = AssembleInput(ProgName, TheTarget, SrcMgr, Ctx, *Str, *MAI, *STI,
				*MCII, MCOptions);

				(void) Res;

				return 0;
				}

				int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
				return AssembleOneInput(Data, Size);
				}

				int LLVMFuzzerInitialize(int *argc, char ***argv) {
				// The command line is unusual compared to other fuzzers due to the need to
				// specify the target. Options like -triple, -mcpu, and -mattr work like
				// their counterparts in llvm-mc, while -fuzzer-args collects options for the
				// fuzzer itself.
				//
				// Examples:
				//
				// Fuzz the big-endian MIPS32R6 disassembler using 100,000 inputs of up to
				// 4-bytes each and use the contents of ./corpus as the test corpus:
				// llvm-mc-fuzzer -triple mips-linux-gnu -mcpu=mips32r6 -disassemble \
				// -fuzzer-args -max_len=4 -runs=100000 ./corpus
				//
				// Infinitely fuzz the little-endian MIPS64R2 disassembler with the MSA
				// feature enabled using up to 64-byte inputs:
				// llvm-mc-fuzzer -triple mipsel-linux-gnu -mcpu=mips64r2 -mattr=msa \
				// -disassemble -fuzzer-args ./corpus
				//
				// If your aim is to find instructions that are not tested, then it is
				// advisable to constrain the maximum input size to a single instruction
				// using -max_len as in the first example. This results in a test corpus of
				// individual instructions that test unique paths. Without this constraint,
				// there will be considerable redundancy in the corpus.

				char **OriginalArgv = *argv;

				LLVMInitializeAllTargetInfos();
				LLVMInitializeAllTargetMCs();
				LLVMInitializeAllAsmParsers();

				cl::ParseCommandLineOptions(*argc, OriginalArgv);

				// Rebuild the argv without the arguments llvm-mc-fuzzer consumed so that
				// the driver can parse its arguments.
				//
				// FuzzerArgs cannot provide the non-const pointer that OriginalArgv needs.
				// Re-use the strings from OriginalArgv instead of copying FuzzerArg to a
				// non-const buffer to avoid the need to clean up when the fuzzer terminates.
				ModifiedArgv.push_back(OriginalArgv[0]);
				for (const auto &FuzzerArg : FuzzerArgs) {
				for (int i = 1; i < *argc; ++i) {
				if (FuzzerArg == OriginalArgv[i])
				ModifiedArgv.push_back(OriginalArgv[i]);
				}
				}
				*argc = ModifiedArgv.size();
				*argv = ModifiedArgv.data();

				// Package up features to be passed to target/subtarget
				// We have to pass it via a global since the callback doesn't
				// permit any user data.
				if (MAttrs.size()) {
				SubtargetFeatures Features;
				for (unsigned i = 0; i != MAttrs.size(); ++i)
				Features.AddFeature(MAttrs[i]);
				FeaturesStr = Features.getString();
				}

				if (TripleName.empty())
				TripleName = sys::getDefaultTargetTriple();

				return 0;
				}

tools/llvm-mc-disassemble-fuzzer/CMakeLists.txt

This file was added.

				if( LLVM_USE_SANITIZE_COVERAGE )
				include_directories(BEFORE
				${CMAKE_CURRENT_SOURCE_DIR}/../../lib/Fuzzer)

				set(LLVM_LINK_COMPONENTS
				AllTargetsAsmPrinters
				AllTargetsDescs
				AllTargetsDisassemblers
				AllTargetsInfos
				MC
				MCDisassembler
				MCParser
				Support
				)
				add_llvm_tool(llvm-mc-disassemble-fuzzer
				llvm-mc-disassemble-fuzzer.cpp)

				target_link_libraries(llvm-mc-disassemble-fuzzer
				LLVMFuzzer
				)
				endif()

tools/llvm-mc-disassemble-fuzzer/llvm-mc-disassemble-fuzzer.cpp

This file was added.

				//===--- llvm-mc-fuzzer.cpp - Fuzzer for the MC layer ---------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				//===----------------------------------------------------------------------===//

				#include "FuzzerInterface.h"
				#include "llvm-c/Disassembler.h"
				#include "llvm-c/Target.h"
				#include "llvm/MC/SubtargetFeature.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/raw_ostream.h"

				using namespace llvm;

				const unsigned AssemblyTextBufSize = 80;

				static cl::opt<std::string>
				TripleName("triple", cl::desc("Target triple to assemble for, "
				"see -version for available targets"));

				static cl::opt<std::string>
				MCPU("mcpu",
				cl::desc("Target a specific cpu type (-mcpu=help for details)"),
				cl::value_desc("cpu-name"), cl::init(""));

				// This is useful for variable-length instruction sets.
				static cl::opt<unsigned> InsnLimit(
				"insn-limit",
				cl::desc("Limit the number of instructions to process (0 for no limit)"),
				cl::value_desc("count"), cl::init(0));

				static cl::list<std::string>
				MAttrs("mattr", cl::CommaSeparated,
				cl::desc("Target specific attributes (-mattr=help for details)"),
				cl::value_desc("a1,+a2,-a3,..."));
				// The feature string derived from -mattr's values.
				std::string FeaturesStr;

				static cl::list<std::string>
				FuzzerArgs("fuzzer-args", cl::Positional,
				cl::desc("Options to pass to the fuzzer"), cl::ZeroOrMore,
				cl::PositionalEatsArgs);
				static std::vector<char *> ModifiedArgv;

				int DisassembleOneInput(const uint8_t *Data, size_t Size) {
				char AssemblyText[AssemblyTextBufSize];

				std::vector<uint8_t> DataCopy(Data, Data + Size);

				LLVMDisasmContextRef Ctx = LLVMCreateDisasmCPUFeatures(
				TripleName.c_str(), MCPU.c_str(), FeaturesStr.c_str(), nullptr, 0,
				nullptr, nullptr);
				assert(Ctx);
				uint8_t *p = DataCopy.data();
				unsigned Consumed;
				unsigned InstructionsProcessed = 0;
				do {
				Consumed = LLVMDisasmInstruction(Ctx, p, Size, 0, AssemblyText,
				AssemblyTextBufSize);
				Size -= Consumed;
				p += Consumed;

				InstructionsProcessed ++;
				if (InsnLimit != 0 && InstructionsProcessed < InsnLimit)
				break;
				} while (Consumed != 0);
				LLVMDisasmDispose(Ctx);
				return 0;
				}

				int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
				return DisassembleOneInput(Data, Size);
				}

				int LLVMFuzzerInitialize(int *argc, char ***argv) {
				// The command line is unusual compared to other fuzzers due to the need to
				// specify the target. Options like -triple, -mcpu, and -mattr work like
				// their counterparts in llvm-mc, while -fuzzer-args collects options for the
				// fuzzer itself.
				//
				// Examples:
				//
				// Fuzz the big-endian MIPS32R6 disassembler using 100,000 inputs of up to
				// 4-bytes each and use the contents of ./corpus as the test corpus:
				// llvm-mc-fuzzer -triple mips-linux-gnu -mcpu=mips32r6 -disassemble \
				// -fuzzer-args -max_len=4 -runs=100000 ./corpus
				//
				// Infinitely fuzz the little-endian MIPS64R2 disassembler with the MSA
				// feature enabled using up to 64-byte inputs:
				// llvm-mc-fuzzer -triple mipsel-linux-gnu -mcpu=mips64r2 -mattr=msa \
				// -disassemble -fuzzer-args ./corpus
				kcc:
				What will be the behavior if no flags are supplied? Can we set the default values so that the fuzzer will do something meaningful w/o any flags? Also, if we have the default values as a macro that we can re-define from a cmake flag, this will solve the problem of building multiple binaries.

				dsanders:
				> what will be the behavior if no flags are supplied? Can we set the default values so that the fuzzer will do something meaningful w/o any flags?
				It will test the default triple from sys::getDefaultTargetTriple(). This is usually the host but it can be set in CMake.
				> Also, if we have the default values as a macro that we can re-define from a cmake flag, this will solve the problem of building multiple binaries.
				This is partially available through CMake's LLVM_DEFAULT_TARGET_TRIPLE variable. The triple influences the default -mcpu and -mattrs but not all subtargets can be described with just a triple.

				bcain (author):
				>> Also, if we have the default values as a macro that we can re-define from a cmake flag, this will solve the problem of building multiple binaries.
				> This is partially available through CMake's LLVM_DEFAULT_TARGET_TRIPLE variable. The triple influences the default -mcpu and -mattrs but not all subtargets can be described with just a triple.
				I believe Kostya was referring to building the set of all dis/assemblers. I think archs are available in CMake -- we could use that to iterate over, but I think what we really need are the set of all triples. And I suspect that there is no such facility.
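
The macro-based default suggested in this thread could look roughly like the following. This is only a sketch: the macro name FUZZER_DEFAULT_TRIPLE, the placeholder triple, and the helper effectiveTriple are hypothetical and do not appear in the patch.

```cpp
#include <string>

// Hypothetical macro, not part of the patch. A build could override it with
// a compiler definition such as -DFUZZER_DEFAULT_TRIPLE="\"mips-linux-gnu\"".
#ifndef FUZZER_DEFAULT_TRIPLE
#define FUZZER_DEFAULT_TRIPLE "x86_64-unknown-linux-gnu" // placeholder value
#endif

// Use the triple given on the command line if there is one, otherwise fall
// back to the compiled-in default.
std::string effectiveTriple(const std::string &FromCommandLine) {
  if (FromCommandLine.empty())
    return FUZZER_DEFAULT_TRIPLE;
  return FromCommandLine;
}
```

With a fallback of this kind each binary does something meaningful without flags, though as noted above a triple alone does not describe every subtarget.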
				  //
				  // If your aim is to find instructions that are not tested, then it is
				  // advisable to constrain the maximum input size to a single instruction
				  // using -max_len as in the first example. This results in a test corpus of
				  // individual instructions that test unique paths. Without this constraint,
				  // there will be considerable redundancy in the corpus.

				  char **OriginalArgv = *argv;

				  LLVMInitializeAllTargetInfos();
				  LLVMInitializeAllTargetMCs();
				  LLVMInitializeAllDisassemblers();

				  cl::ParseCommandLineOptions(*argc, OriginalArgv);

				  // Rebuild the argv without the arguments llvm-mc-fuzzer consumed so that
				  // the driver can parse its arguments.
				  //
				  // FuzzerArgs cannot provide the non-const pointer that OriginalArgv needs.
				  // Re-use the strings from OriginalArgv instead of copying FuzzerArg to a
				  // non-const buffer to avoid the need to clean up when the fuzzer terminates.
				  ModifiedArgv.push_back(OriginalArgv[0]);
				  for (const auto &FuzzerArg : FuzzerArgs) {
				    for (int i = 1; i < *argc; ++i) {
				      if (FuzzerArg == OriginalArgv[i])
				        ModifiedArgv.push_back(OriginalArgv[i]);
				    }
				  }
				  *argc = ModifiedArgv.size();
				  *argv = ModifiedArgv.data();

				  // Package up features to be passed to target/subtarget
				  // We have to pass it via a global since the callback doesn't
				  // permit any user data.
				  if (MAttrs.size()) {
				    SubtargetFeatures Features;
				    for (unsigned i = 0; i != MAttrs.size(); ++i)
				      Features.AddFeature(MAttrs[i]);
				    FeaturesStr = Features.getString();
				  }

				  if (TripleName.empty())
				    TripleName = sys::getDefaultTargetTriple();

				  return 0;
				}
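
The argv rebuilding step can be tried in isolation. Below is a minimal standalone sketch of the same matching loop, using only the standard library; the function name rebuildArgv and its parameters are made up for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Keep argv[0] plus every original argument whose text equals one of the
// strings collected for the fuzzer. The returned pointers are borrowed from
// OriginalArgv, so nothing has to be freed when the fuzzer terminates.
std::vector<char *> rebuildArgv(int Argc, char **OriginalArgv,
                                const std::vector<std::string> &FuzzerArgs) {
  assert(Argc >= 1 && "argv[0] must be present");
  std::vector<char *> Modified;
  Modified.push_back(OriginalArgv[0]);
  for (const std::string &FuzzerArg : FuzzerArgs)
    for (int I = 1; I < Argc; ++I)
      if (FuzzerArg == OriginalArgv[I])
        Modified.push_back(OriginalArgv[I]);
  return Modified;
}
```

Because the match is by string value, an argument consumed by the tool itself that happens to equal a fuzzer argument would also be forwarded. That seems acceptable for this use, but it is worth keeping in mind.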

tools/llvm-mc-fuzzer/CMakeLists.txt

This file was deleted.

	if( LLVM_USE_SANITIZE_COVERAGE )
	  include_directories(BEFORE
	    ${CMAKE_CURRENT_SOURCE_DIR}/../../lib/Fuzzer)

	  set(LLVM_LINK_COMPONENTS
	      AllTargetsDescs
	      AllTargetsDisassemblers
	      AllTargetsInfos
	      MC
	      MCDisassembler
	      Support
	      )
	  add_llvm_tool(llvm-mc-fuzzer
	                llvm-mc-fuzzer.cpp)
	  target_link_libraries(llvm-mc-fuzzer
	                        LLVMFuzzer
	                        )
	endif()

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp

This file was deleted.

	//===--- llvm-mc-fuzzer.cpp - Fuzzer for the MC layer ---------------------===//
	//
	// The LLVM Compiler Infrastructure
	//
	// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.
	//
	//===----------------------------------------------------------------------===//
	//
	//===----------------------------------------------------------------------===//

	#include "FuzzerInterface.h"
	#include "llvm-c/Disassembler.h"
	#include "llvm-c/Target.h"
	#include "llvm/MC/SubtargetFeature.h"
	#include "llvm/Support/CommandLine.h"
	#include "llvm/Support/raw_ostream.h"

	using namespace llvm;

	const unsigned AssemblyTextBufSize = 80;

	enum ActionType {
	  AC_Assemble,
	  AC_Disassemble
	};

	static cl::opt<ActionType>
	    Action(cl::desc("Action to perform:"),
	           cl::init(AC_Assemble),
	           cl::values(clEnumValN(AC_Assemble, "assemble",
	                                 "Assemble a .s file (default)"),
	                      clEnumValN(AC_Disassemble, "disassemble",
	                                 "Disassemble strings of hex bytes")));

	static cl::opt<std::string>
	    TripleName("triple", cl::desc("Target triple to assemble for, "
	                                  "see -version for available targets"));

	static cl::opt<std::string>
	    MCPU("mcpu",
	         cl::desc("Target a specific cpu type (-mcpu=help for details)"),
	         cl::value_desc("cpu-name"), cl::init(""));

	// This is useful for variable-length instruction sets.
	static cl::opt<unsigned> InsnLimit(
	    "insn-limit",
	    cl::desc("Limit the number of instructions to process (0 for no limit)"),
	    cl::value_desc("count"), cl::init(0));

	static cl::list<std::string>
	    MAttrs("mattr", cl::CommaSeparated,
	           cl::desc("Target specific attributes (-mattr=help for details)"),
	           cl::value_desc("a1,+a2,-a3,..."));
	// The feature string derived from -mattr's values.
	std::string FeaturesStr;

	static cl::list<std::string>
	    FuzzerArgs("fuzzer-args", cl::Positional,
	               cl::desc("Options to pass to the fuzzer"), cl::ZeroOrMore,
	               cl::PositionalEatsArgs);
	static std::vector<char *> ModifiedArgv;

	int DisassembleOneInput(const uint8_t *Data, size_t Size) {
	  char AssemblyText[AssemblyTextBufSize];

	  std::vector<uint8_t> DataCopy(Data, Data + Size);

	  LLVMDisasmContextRef Ctx = LLVMCreateDisasmCPUFeatures(
	      TripleName.c_str(), MCPU.c_str(), FeaturesStr.c_str(), nullptr, 0,
	      nullptr, nullptr);
	  assert(Ctx);
	  uint8_t *p = DataCopy.data();
	  unsigned Consumed;
	  unsigned InstructionsProcessed = 0;
	  do {
	    Consumed = LLVMDisasmInstruction(Ctx, p, Size, 0, AssemblyText,
	                                     AssemblyTextBufSize);
	    Size -= Consumed;
	    p += Consumed;

	    // Stop once the -insn-limit count has been reached (0 means no limit).
	    ++InstructionsProcessed;
	    if (InsnLimit != 0 && InstructionsProcessed >= InsnLimit)
	      break;
	  } while (Consumed != 0);
	  LLVMDisasmDispose(Ctx);
	  return 0;
	}

	int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
	  if (Action == AC_Assemble) {
	    errs() << "error: -assemble is not implemented\n";
	    // Return here so that the unimplemented action does not fall through to
	    // llvm_unreachable below.
	    return 0;
	  }
	  if (Action == AC_Disassemble)
	    return DisassembleOneInput(Data, Size);

	  llvm_unreachable("Unknown action");
	  return 0;
	}

	int LLVMFuzzerInitialize(int *argc, char ***argv) {
	  // The command line is unusual compared to other fuzzers due to the need to
	  // specify the target. Options like -triple, -mcpu, and -mattr work like
	  // their counterparts in llvm-mc, while -fuzzer-args collects options for the
	  // fuzzer itself.
	  //
	  // Examples:
	  //
	  // Fuzz the big-endian MIPS32R6 disassembler using 100,000 inputs of up to
	  // 4 bytes each and use the contents of ./corpus as the test corpus:
	  //   llvm-mc-fuzzer -triple mips-linux-gnu -mcpu=mips32r6 -disassemble \
	  //       -fuzzer-args -max_len=4 -runs=100000 ./corpus
	  //
	  // Infinitely fuzz the little-endian MIPS64R2 disassembler with the MSA
	  // feature enabled using up to 64-byte inputs:
	  //   llvm-mc-fuzzer -triple mipsel-linux-gnu -mcpu=mips64r2 -mattr=msa \
	  //       -disassemble -fuzzer-args ./corpus
	  //
	  // If your aim is to find instructions that are not tested, then it is
	  // advisable to constrain the maximum input size to a single instruction
	  // using -max_len as in the first example. This results in a test corpus of
	  // individual instructions that test unique paths. Without this constraint,
	  // there will be considerable redundancy in the corpus.

	  char **OriginalArgv = *argv;

	  LLVMInitializeAllTargetInfos();
	  LLVMInitializeAllTargetMCs();
	  LLVMInitializeAllDisassemblers();

	  cl::ParseCommandLineOptions(*argc, OriginalArgv);

	  // Rebuild the argv without the arguments llvm-mc-fuzzer consumed so that
	  // the driver can parse its arguments.
	  //
	  // FuzzerArgs cannot provide the non-const pointer that OriginalArgv needs.
	  // Re-use the strings from OriginalArgv instead of copying FuzzerArg to a
	  // non-const buffer to avoid the need to clean up when the fuzzer terminates.
	  ModifiedArgv.push_back(OriginalArgv[0]);
	  for (const auto &FuzzerArg : FuzzerArgs) {
	    for (int i = 1; i < *argc; ++i) {
	      if (FuzzerArg == OriginalArgv[i])
	        ModifiedArgv.push_back(OriginalArgv[i]);
	    }
	  }
	  *argc = ModifiedArgv.size();
	  *argv = ModifiedArgv.data();

	  // Package up features to be passed to target/subtarget
	  // We have to pass it via a global since the callback doesn't
	  // permit any user data.
	  if (MAttrs.size()) {
	    SubtargetFeatures Features;
	    for (unsigned i = 0; i != MAttrs.size(); ++i)
	      Features.AddFeature(MAttrs[i]);
	    FeaturesStr = Features.getString();
	  }

	  return 0;
	}
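
The -insn-limit option is described as limiting the number of instructions to process, with 0 meaning no limit. Here is a minimal standalone sketch of a loop with that behavior. It assumes a hypothetical fixed-width decoder, decodeOne, standing in for LLVMDisasmInstruction, and unlike the listing above it does not count a failed decode as a processed instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a real decoder: every instruction is 4 bytes, and
// 0 is returned when fewer than 4 bytes remain.
static size_t decodeOne(const uint8_t *Bytes, size_t Size) {
  (void)Bytes;
  return Size >= 4 ? 4 : 0;
}

// Decode until the input is exhausted, a decode fails, or InsnLimit
// instructions have been processed (InsnLimit == 0 means no limit).
// Returns the number of instructions decoded.
unsigned decodeAll(const uint8_t *Data, size_t Size, unsigned InsnLimit) {
  assert(Data && "input buffer must not be null");
  unsigned Processed = 0;
  while (true) {
    size_t Consumed = decodeOne(Data, Size);
    if (Consumed == 0)
      break;
    Data += Consumed;
    Size -= Consumed;
    ++Processed;
    if (InsnLimit != 0 && Processed >= InsnLimit)
      break;
  }
  return Processed;
}
```

For a 10-byte buffer this decodes two instructions with no limit and one instruction with a limit of 1.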

Revision Contents

Diff 89793

tools/llvm-mc-assemble-fuzzer/CMakeLists.txt

tools/llvm-mc-assemble-fuzzer/llvm-mc-assemble-fuzzer.cpp

tools/llvm-mc-disassemble-fuzzer/CMakeLists.txt

tools/llvm-mc-disassemble-fuzzer/llvm-mc-disassemble-fuzzer.cpp

tools/llvm-mc-fuzzer/CMakeLists.txt

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
