Only the disassembler is supported in this patch, but the fuzzer has already
found a few issues in the Mips disassembler (mostly invalid instructions being
successfully disassembled).
The recent threads on libFuzzer inspired me to try it out and llvm-mc-fuzzer is
the result. It seems to be useful and has already found a few issues with the
Mips disassembler, so I thought I'd post the patch to see if there's interest
in bringing this upstream.
I forgot to mention: I also have some Python scripts to convert our disassembler test inputs to the raw binary format used by libFuzzer and back. I haven't included those in this patch since they are just quick hacks. They would need to be implemented properly before they are suitable for upstreaming.
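As an illustration of the conversion described above (not the scripts mentioned in the comment, which are not part of the patch), here is a minimal C++ sketch. It assumes the test inputs are lines of hex byte tokens such as `0x00 0x40 0x08 0x21`, optionally followed by a `#` comment; that layout and the helper names `hexLineToBytes` and `bytesToHexLine` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn one line of hex byte tokens into raw bytes.
// Everything from the first '#' onwards is treated as a comment (assumption).
std::vector<uint8_t> hexLineToBytes(const std::string &Line) {
  std::vector<uint8_t> Bytes;
  // find() returns npos when there is no '#', and substr(0, npos) keeps
  // the whole line in that case.
  std::istringstream Tokens(Line.substr(0, Line.find('#')));
  std::string Tok;
  while (Tokens >> Tok) {
    // Base 16 accepts an optional "0x" prefix; throws on a non-hex token.
    unsigned long Value = std::stoul(Tok, nullptr, 16);
    assert(Value <= 0xff && "each token must encode a single byte");
    Bytes.push_back(static_cast<uint8_t>(Value));
  }
  return Bytes;
}

// Hypothetical inverse: format raw bytes as space-separated "0xNN" tokens.
std::string bytesToHexLine(const std::vector<uint8_t> &Bytes) {
  std::string Line;
  char Buf[8];
  for (std::size_t I = 0; I < Bytes.size(); ++I) {
    std::snprintf(Buf, sizeof(Buf), "0x%02x", static_cast<unsigned>(Bytes[I]));
    if (I != 0)
      Line += ' ';
    Line += Buf;
  }
  return Line;
}
```

Writing the returned bytes to a file, one file per input, would give something a fuzzer corpus directory could hold; the reverse helper turns an interesting input back into a line that can be pasted into a test.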
Nice!
Could you please also add a line about your findings to http://llvm.org/docs/LibFuzzer.html#trophies (or tell me what to add there)?
Once committed, I'll add it to the fuzzer bot:
lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer
tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp

Line | Comment
---|---
22 | please use a C++ constant
62 | why not vector?
81 | Since the command line is unusual (compared to other uses of libFuzzer)
103 | Do you really need to copy these here?
109 | Do you need it to be this complex?
tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp

Line | Comment
---|---
62 | I didn't give it much thought. I'll switch it to a vector.
103 | Unfortunately, yes, but there is an alternative solution. The problem is that c_str() returns a const char * but fuzzer::FuzzerDriver() expects an array of char *. It's unsafe to just drop the const-ness, so I make a non-const copy. If the second parameter of fuzzer::FuzzerDriver() were const char **, I could avoid the copy.
109 | My thinking was that it would be nice if llvm-mc-fuzzer had a similar command line to llvm-mc. If it's preferred to use a separate binary then I don't mind doing that.
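The copy discussed in the reply for line 103 can be sketched as follows. This is not the code from the patch: `MutableArgv` is a hypothetical helper invented for this example, and the sketch only shows one way to hand writable `char *` strings to an interface that expects `char **` without casting away the const-ness of `c_str()`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: owns writable, NUL-terminated copies of the
// argument strings and exposes them as a char ** array.
class MutableArgv {
public:
  explicit MutableArgv(const std::vector<std::string> &Args) {
    Storage.reserve(Args.size());
    for (const std::string &Arg : Args) {
      Storage.emplace_back(Arg.begin(), Arg.end());
      Storage.back().push_back('\0');
    }
    // Take the pointers only after all buffers exist.
    for (std::vector<char> &Buf : Storage)
      Pointers.push_back(Buf.data());
    Pointers.push_back(nullptr); // argv[argc] is conventionally null
  }

  // Copying would leave Pointers aimed at the other object's buffers.
  MutableArgv(const MutableArgv &) = delete;
  MutableArgv &operator=(const MutableArgv &) = delete;

  int argc() const { return static_cast<int>(Storage.size()); }
  char **argv() { return Pointers.data(); }

private:
  std::vector<std::vector<char>> Storage;
  std::vector<char *> Pointers;
};
```

The callee may then modify the strings freely, and the original `std::string` objects stay untouched. The object has to outlive every use of the pointers it hands out.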
Replaced macro with C++ constant.
Added trophy to libFuzzer documentation.
Changed DataCopy to a std::vector.
Added usage examples.
Nice! I know Russell had been looking at using fuzz-testing to test
round-tripping through assembly, which seems like a perfect fit for a
libFuzzer-based tool. Russell, is this something that you are still working
on? Maybe llvm-mc-fuzzer will grow that functionality some day.
- Sean Silva
I found a number of bugs by comparing direct object emission against emission via assembly with the check_cfc tool. Sean and I spoke about using fuzzing in that area. I expect it would find issues, but I haven't done any work on it, so I have no objections if someone wants to add this to llvm-mc-fuzzer.
Added links to libFuzzer.rst and better explained it.
Fixed contrain -> constrain.
Switched to the new FuzzerDriver() interface.
I'm keen to get that functionality too. Mips's move instructions will be a bit troublesome here since many distinct opcodes disassemble to 'move $1, $2' but that string only assembles to a single opcode.
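To make the many-to-one problem concrete, here is a toy C++ sketch; it does not use the LLVM MC API. The two encodings are the MIPS32 `addu $1, $2, $zero` and `or $1, $2, $zero` forms, both printed here as `move $1, $2`. Which encoding an assembler emits for `move` is an assumption of this example (the `addu` form), and all function names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Toy disassembler: two distinct encodings map to the same text.
bool toyDisassemble(uint32_t Word, std::string &Text) {
  switch (Word) {
  case 0x00400821u: // addu $1, $2, $zero
  case 0x00400825u: // or   $1, $2, $zero
    Text = "move $1, $2";
    return true;
  default:
    return false; // everything else is "invalid" in this toy model
  }
}

// Toy assembler: the text maps back to a single encoding.
// Assumption for illustration: the addu form is the one chosen.
bool toyAssemble(const std::string &Text, uint32_t &Word) {
  if (Text != "move $1, $2")
    return false;
  Word = 0x00400821u;
  return true;
}

// Strict check: the re-assembled bits must equal the input bits.
bool roundTripsExactly(uint32_t Word) {
  std::string Text;
  uint32_t Reencoded = 0;
  return toyDisassemble(Word, Text) && toyAssemble(Text, Reencoded) &&
         Reencoded == Word;
}

// Relaxed check: the re-assembled bits must disassemble to the same text.
bool roundTripsTextually(uint32_t Word) {
  std::string Text, TextAgain;
  uint32_t Reencoded = 0;
  return toyDisassemble(Word, Text) && toyAssemble(Text, Reencoded) &&
         toyDisassemble(Reencoded, TextAgain) && TextAgain == Text;
}
```

In this model the `or` encoding fails the strict check but passes the relaxed one, so a round-trip fuzzer would need something like the relaxed comparison (or a list of known aliases) to avoid reporting such inputs as bugs.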
One feature that would be helpful from the Fuzzer is the ability for the callback to be able to classify inputs into various bins. For example, "this input is invalid", "this input disassembled but failed to complete the round trip", "this input completed a round trip but the encodings don't match", etc. At the moment, we need to determine this when converting inputs into test cases which seems redundant when the callback already knew what happened.
tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp

Line | Comment
---|---
106 | That worked nicely. Thanks
> One feature that would be helpful from the Fuzzer is the ability for the callback to be able to classify inputs into various bins. For example, "this input is invalid", "this input disassembled but failed to complete the round trip", "this input completed a round trip but the encodings don't match", etc. At the moment, we need to determine this when converting inputs into test cases which seems redundant when the callback already knew what happened.
Yes, I've seen similar requests already.
go-fuzz does it this way:
> The function must return 1 if the input is interesting in some way (for example, it was parsed successfully, that is, it is lexically correct, go-fuzz will give more priority to such inputs); -1 if the input must not be added to corpus even if gives new coverage; and 0 otherwise; other values are reserved for future use.
So, I'll probably add some similar functionality (probably not in the nearest two weeks though).
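A rough sketch of what such a classifying callback could look like, assuming a go-fuzz-style integer return value. Nothing here is an existing libFuzzer interface: the `Outcome` bins follow the examples given in the request, the one-byte "ISA" is a toy stand-in for a real disassembler and assembler, and the mapping to return codes is an assumption of this example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Bins taken from the examples in the request above.
enum class Outcome { Invalid, RoundTripFailed, EncodingMismatch, RoundTripOk };

// Toy one-byte "ISA" (hypothetical): 0x00 and 0x01 both print as "nop",
// 0x02 prints as "trap", and every other input is invalid.
bool toyDisassemble(const std::vector<uint8_t> &Bytes, std::string &Text) {
  if (Bytes.size() != 1)
    return false;
  switch (Bytes[0]) {
  case 0x00:
  case 0x01:
    Text = "nop";
    return true;
  case 0x02:
    Text = "trap";
    return true;
  default:
    return false;
  }
}

// Toy assembler: only "nop" is accepted, and it always encodes as 0x00.
bool toyAssemble(const std::string &Text, std::vector<uint8_t> &Bytes) {
  if (Text != "nop")
    return false;
  Bytes.clear();
  Bytes.push_back(0x00);
  return true;
}

// Classify one input by how far it gets through the round trip.
Outcome classify(const std::vector<uint8_t> &Input) {
  std::string Text;
  if (!toyDisassemble(Input, Text))
    return Outcome::Invalid;
  std::vector<uint8_t> Reencoded;
  if (!toyAssemble(Text, Reencoded))
    return Outcome::RoundTripFailed;
  return Reencoded == Input ? Outcome::RoundTripOk : Outcome::EncodingMismatch;
}

// Assumed mapping to go-fuzz-style codes: 1 for anything that disassembled,
// 0 otherwise. The -1 code is left unused in this sketch.
int toGoFuzzStyleCode(Outcome O) { return O == Outcome::Invalid ? 0 : 1; }
```

A driver could tally the `Outcome` values per input while fuzzing, which would avoid re-deriving the same classification later when inputs are converted into test cases.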
LGTM, thanks for doing this!
tools/llvm-mc-fuzzer/llvm-mc-fuzzer.cpp

Line | Comment
---|---
128 | I would just call fuzzer::FuzzerDriver and not create the "TestOneInput" temporary.