This is an archive of the discontinued LLVM Phabricator instance.

llvm-mc-fuzzer: add support for assembly
Needs RevisionPublic

Authored by bcain on Feb 19 2017, 9:09 AM.

Details

Reviewers
kcc
dsanders
Summary

This enables the "-assemble" feature for llvm-mc-fuzzer.

Currently we just attempt assembly and ignore the result.

Diff Detail

Repository
rL LLVM

Event Timeline

bcain created this revision.Feb 19 2017, 9:09 AM
kcc edited edge metadata.Feb 19 2017, 8:25 PM

The code mostly looks good, but I am not an expert in the used APIs, hopefully someone else chimes in.

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0

I strongly suggest to make this a separate fuzz target instead of using flags.
Otherwise it'll be harder to automate running this target.

11

Why C-style comments?

18

LLVM coding style wants 'Data' and 'Size'

40

hm? need a return?

45

merge with the return?

91

coding style...

dsanders edited edge metadata.Feb 20 2017, 3:02 AM

Currently we just attempt assembly and ignore the result.

Ignoring the result is the right thing to do since failure to assemble is an expected response to some inputs. Whether it's a correct response to a particular input can be found by separately running the corpus through the assembler and comparing against a reference (most likely, another assembler).

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0

I'm not sure what you mean here. What difficulties are you thinking of?

FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.

5

reinterpret_cast tends to cause portability problems but this one is ok for the targets LLVM supports in-tree as far as I know.

52–54

These should be 'AsmVerbose' and 'UseDwarfDirectory'

Also, why static? (and similarly for the other trivial constants below)

56–58

If we're going to do this for the assembler, we should do it for the disassembler too. It would be strange to be inconsistent between the two

73

Indentation

98

Indentation

122–126

You'll also catch more bugs by running it through the assembly streamer since some things can only be detected during emission.

Optional: Could you make it an option to run it through the object streamer? Ideally, we'd have the -filetype option from llvm-mc. Mips in particular has some things that will only be detected when macro/directive expansion occurs in the object streamer.

128

Indentation

bcain updated this revision to Diff 89142.Feb 20 2017, 12:41 PM

Suggested changes:

  • formatting: indentation
  • formatting: identifier names
  • feature: support for -filetype arg a la llvm-mc
bcain marked 11 inline comments as done.Feb 20 2017, 12:56 PM

I think I've satisfied all of the review concerns, save the one about reinterpret_cast. Daniel, please let me know if the comment was just informative or if you prefer a change there.

tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0

I strongly suggest to make this a separate fuzz target instead of using flags.

I've preserved the original design for llvm-mc-fuzzer, apparently to imitate llvm-mc.

Pros/cons of the current design:

  • pro: matches llvm-mc
  • pro: changing focus to probe different paths only requires different command line args
  • con: reproducing fuzzer configuration more difficult because it depends on those args
  • con: libFuzzer might see the uncovered feature set as a goal for coverage (that we already know statically it cannot cover).

For that last one, it's speculation on my part.

Kostya, would you be satisfied with this as-is or should I decompose it into two fuzzers? "Harder to automate" consists of "I must make sure that I can deliver the right command line args to the automation feature"? Or "won't fit well in oss-fuzz" or something else?

bcain added a comment.Feb 20 2017, 1:01 PM

Currently we just attempt assembly and ignore the result.

Ignoring the result is the right thing to do since failure to assemble is an expected response to some inputs. Whether it's a correct response to a particular input can be found by separately running the corpus through the assembler and comparing against a reference (most likely, another assembler).

Just to make sure we're all on the same page, I think anything leveraging a reference assembler is out of scope (for now anyways).

Currently we just attempt assembly and ignore the result.

Ignoring the result is the right thing to do since failure to assemble is an expected response to some inputs. Whether it's a correct response to a particular input can be found by separately running the corpus through the assembler and comparing against a reference (most likely, another assembler).

Just to make sure we're all on the same page, I think anything leveraging a reference assembler is out of scope (for now anyways).

That's right. Crashes aside, it's not llvm-mc-fuzzers job to check the correctness of any particular output. The comment about a reference assembler was meant to indicate how someone would check correctness externally using the corpus emitted by llvm-mc-fuzzer.

kcc added inline comments.Feb 20 2017, 8:29 PM
tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0

I'm not sure what you mean here. What difficulties are you thinking of?

Imagine an automated system that runs continuous fuzzing (e.g. https://github.com/google/oss-fuzz).
How are you going to tell it to run the same binary with two different flags and to treat those
as two independent entities?
Of course, it's possible to implement support for something like this, but OSS-Fuzz does not and will not support it.
(because of KISS: https://en.wikipedia.org/wiki/KISS_principle)

When analyzing the code coverage (manually, or automatically) there will be a huge lump of code that is never reached in one mode, i.e. this 2-in-1 bundle will confuse the analysis.

Finally, at least in libFuzzer, part of the algorithm is linear by the size of the binary (more precisely: number of instrumented blocks) and so this bundled fuzzer will just be burning CPUs with no reason.

FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.

Yes, and I objected back then :)

dsanders added inline comments.Feb 21 2017, 2:52 AM
tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0

I'm not sure what you mean here. What difficulties are you thinking of?

Imagine an automated system that runs continuous fuzzing (e.g.
https://github.com/google/oss-fuzz).
How are you going to tell it to run the same binary with two different flags and to treat those
as two independent entities?

I'm not familiar with oss-fuzz but based on an initial glance through I'm not sure how this is different from oss-fuzz/projects/curl/. That project is using pre-processor macros to select between different fuzzers.

To answer the question though, if I wanted to fuzz everything (assembler/disassembler, all arches, subarches, and feature combinations) in this kind of system and the curl/llvm-mc-fuzzer way had been ruled out. I'd probably use the first few bytes of the data as the configuration and do a full setup/teardown in LLVMFuzzerTestOneInput().

That said, I think that's a different kind of fuzzer to llvm-mc-fuzzer. It would aim to improve the quality of the LLVM project as a whole whereas llvm-mc-fuzzer was meant to help backend developers improve the quality of their particular targets and subtargets.

Of course, it's possible to implement support for something like this, but OSS-Fuzz does not and
will not support it.
(because of KISS: https://en.wikipedia.org/wiki/KISS_principle)

This principle is the reason this tool uses command line arguments for the action/triple/arch/subarch/features. Command line arguments were the simplest way to configure a particular target without having to re-compile for each combination. I included support for other archs/subarches/features because it made the original goal easier and also made the tool more useful to others.

When analyzing the code coverage (manually, or automatically) there will be a huge lump of code
that is never reached in one mode, i.e. this 2-in-1 bundle will confuse the analysis.

FWIW, this is also the case between arches/subarches/features. For example, on an X86 host using default options, the AArch64/ARM/Mips/etc. disassemblers are not tested.

Finally, at least in libFuzzer, part of the algorithm is linear by the size of the binary (more precisely:
number of instrumented blocks) and so this bundled fuzzer will just be burning CPUs with no
reason.

That's a fair point.

FWIW, this is in line with my original intent which was to mimic llvm-mc's interface.

Yes, and I objected back then :)

I remember you objected to having a custom main function that mangled the arguments before passing them on to libFuzzer and I fixed that. I didn't think there was an objection to command line arguments in general though.

If the objection was to command line arguments in general, Is there a way to test an architecture in isolation from the others that's more in keeping with libFuzzer's style?

kcc added inline comments.Feb 21 2017, 12:26 PM
tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0

using pre-processor macros to select between different fuzzers.

That's fine, since it creates different binaries, where unused code has a good chance to not even be linked in.

I'd probably use the first few bytes of the data as the configuration

That's an option but it has 2 problems:

  1. now the inputs are from some new artificial data format
  2. fuzzing is less efficient due to larger binary

FWIW, this is also the case between arches/subarches/features.

Yes.

you objected to having a custom main function that mangled the arguments

Yes, and that is currently not supported at all (was a mistake).

Is there a way to test an architecture in isolation from the others that's more in keeping with libFuzzer's style?

Have separate binary (build target) for every distinct configuration of code we have in mind.
How many are we talking about here?
tens or hundreds?
Having 20-30 binaries like this is totally fine imho, and works great on oss-fuzz (see e.g. the way we've done it for ffmpeg, where we have ~40 binaries).
https://github.com/google/oss-fuzz/blob/master/projects/ffmpeg/build.sh

bcain added inline comments.Feb 21 2017, 12:54 PM
tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0

Have separate binary (build target) for every distinct configuration of code we have in mind.

I'd be willing to modify my submission to at least decompose llvm-mc-fuzzer into llvm-mc-assemble-fuzzer and llvm-mc-disassemble-fuzzer. That would solve the problem of having primarily disjoint content under a single binary. Is that a reasonable compromise, or is it necessary to have zero command line args?

kcc added a comment.Feb 21 2017, 12:56 PM

It's a good compromise to start with.

dsanders added inline comments.Feb 21 2017, 1:21 PM
tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp
0

Have separate binary (build target) for every distinct configuration of code we have in mind.
How many are we talking about here? tens or hundreds?
Having 20-30 binaries like this is totally fine imho, and works great on oss-fuzz (see e.g. the way we've done it for ffmpeg, where we have ~40 binaries).
https://github.com/google/oss-fuzz/blob/master/projects/ffmpeg/build.sh

It depends how far you go with splitting it up. If you split binaries at the arch level then there's 15 in-tree targets (I didn't check how many have an MC layer). Including subarches it's probably hundreds (Mips has ~20 I can think of off-hand), and including features it's likely to push towards thousands.

You only save binary size by splitting at the arch level though since LLVM doesn't have a means to limit support to subarches and smaller.

I'd be willing to modify my submission to at least decompose llvm-mc-fuzzer into
llvm-mc-assemble-fuzzer and llvm-mc-disassemble-fuzzer. That would solve the
problem of having primarily disjoint content under a single binary. Is that a reasonable
compromise, or is it necessary to have zero command line args?

That sounds reasonable to me.

bcain updated this revision to Diff 89465.Feb 22 2017, 8:22 PM

Now decomposed into separate -assemble, -disassemble executables.

kcc accepted this revision.Feb 23 2017, 11:34 AM

Looks good. We can iterate after this version is submitted.
I've left two comments, but feel free to address them in future commits.

tools/llvm-mc-disassemble-fuzzer/llvm-mc-disassemble-fuzzer.cpp
79

why do you limit the size this way?
Isn't it useful to run tiny inputs?

97

what will be the behavior if no flags are supplied?
Can we set the default values so that the fuzzer will do something meaningful w/o any flags?

Also, if we have the default values as a macro that we can re-define from a cmake flag,
this will solve the problem of building multiple binaries .

This revision is now accepted and ready to land.Feb 23 2017, 11:34 AM
dsanders added inline comments.Feb 24 2017, 2:06 AM
tools/llvm-mc-disassemble-fuzzer/llvm-mc-disassemble-fuzzer.cpp
79

I don't think we should have this limit. When I was testing the Mips disassembler, I found it very useful to limit the fuzzer to 4-bytes of data so that the buffer was always the opcode of the unsupported/broken instruction. I also found a bug in 0-3 byte buffers where it assumed it was safe to read the first instruction and would overflow the buffer.

97

what will be the behavior if no flags are supplied?
Can we set the default values so that the fuzzer will do something meaningful w/o any flags?

It will test the default triple from sys::getDefaultTargetTriple(). This is usually the host but it can be set in CMake.

Also, if we have the default values as a macro that we can re-define from a cmake flag,
this will solve the problem of building multiple binaries.

This is partially available through CMake's LLVM_DEFAULT_TARGET_TRIPLE variable. The triple influences the default -mcpu and -mattrs but not all subtargets can be described with just a triple.

bcain added inline comments.Feb 24 2017, 6:48 AM
tools/llvm-mc-disassemble-fuzzer/llvm-mc-disassemble-fuzzer.cpp
79

Agreed: this was an error, I was experimenting and I will remove it.

97

Also, if we have the default values as a macro that we can re-define from a cmake flag, this will solve the problem of building multiple binaries.

This is partially available through CMake's LLVM_DEFAULT_TARGET_TRIPLE variable. The triple influences the default -mcpu and -mattrs but not all subtargets can be described with just a triple.

I believe Kostya was referring to building the set of all dis/assemblers. I think archs are available in CMake -- we could use that to iterate over, but I think what we really need are the set of all triples. And I suspect that there is no such facility.

bcain updated this revision to Diff 89793.Feb 25 2017, 9:10 AM

Delete experimental "(Size < 1024)" debug code -- was not intended for submission.

kcc requested changes to this revision.Feb 27 2017, 5:19 PM

LGTM from fuzzing POV. Let's commit and experiment further.

This revision now requires changes to proceed.Feb 27 2017, 5:19 PM
tools/llvm-mc-disassemble-fuzzer/llvm-mc-disassemble-fuzzer.cpp