Provide a new tool for LLVM bitcode files that makes them easier to test
and fuzz. The tool defines a simple textual form that captures the
contents of binary bitcode files.
The textual form removes all references to abbreviations in the binary
bitcode file. This is done to simplify its form.
A textual bitcode record is a sequence of (unsigned) integers,
separated by commas, and terminated with a semicolon. Each bitcode
record appears on a separate line.
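As an illustration of the record syntax described above, here is a
minimal Python sketch that reads and writes such records. It is not the
tool's implementation: the function names are hypothetical, and the
handling of surrounding whitespace and blank lines is an assumption. The
integer values in the usage note are arbitrary and do not stand for any
real bitcode record.

```python
def parse_record(line):
    """Parse one textual record, e.g. '1,2,3;', into a list of ints."""
    text = line.strip()  # assumption: surrounding whitespace is ignored
    if not text.endswith(";"):
        raise ValueError(f"record is not terminated with ';': {line!r}")
    values = []
    for field in text[:-1].split(","):
        field = field.strip()
        # Only unsigned decimal integers are accepted.
        if not (field.isascii() and field.isdigit()):
            raise ValueError(f"not an unsigned integer: {field!r}")
        values.append(int(field))
    return values


def format_record(values):
    """Write a list of unsigned ints back as one textual record."""
    return ",".join(str(v) for v in values) + ";"


def parse_records(text):
    """Parse a whole file's text, one record per non-blank line."""
    return [parse_record(line) for line in text.splitlines() if line.strip()]
```

For example, parse_record("1,2,3;") yields [1, 2, 3], and
format_record([1, 2, 3]) gives back "1,2,3;".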
The tool llvm-bcanalyzer also provides a textual version of bitcode
files. However, it doesn't provide any way to convert the output back
to a binary bitcode file. It is also much more complex.
Besides providing a command line tool, it is set up for use with lit
(LLVM Integrated Tester). Invalid bitcode tests that don't correspond
to abbreviation and bitstream issues can be written as a text file.
This text file can run tests and check the results using FileCheck.
Unlike the current invalid bitcode tests, a test based on a textual
bitcode file can be described in a single file. It also has the
advantage of being easier to port if the format of bitcode records
changes. In such cases, you only need to edit the textual bitcode
records whose format has changed since the test was added. Current
tests are in binary form, and it is very difficult to figure out how
to change them when the bitcode format changes.
For fuzzing, the text form is easy to model using tokens (the digits,
separator ',', and terminator ';\n'). This allows (local) mutations to
easily be formed by existing fuzzers (such as afl-fuzz) without
modification. The textual-to-binary conversion methods can be used as
a post-processor of the fuzzer, or as a preprocessor to the llvm tool
being tested.
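The token model above can be sketched in Python as follows. This only
illustrates what a local, token-level mutation looks like; it is not how
afl-fuzz or the tool works internally. The function names and the
choice of mutation (replacing a single digit) are assumptions made for
the example.

```python
import random

DIGITS = set("0123456789")


def tokenize(text):
    """Split textual bitcode into digit, ',' and ';\\n' tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if text.startswith(";\n", i):
            tokens.append(";\n")  # record terminator is a single token
            i += 2
        elif text[i] in DIGITS or text[i] == ",":
            tokens.append(text[i])
            i += 1
        else:
            raise ValueError(f"unexpected character at offset {i}: {text[i]!r}")
    return tokens


def mutate_one_digit(tokens, rng):
    """Return a copy of tokens with one digit token replaced by another digit."""
    positions = [i for i, tok in enumerate(tokens) if tok in DIGITS]
    mutated = list(tokens)
    if not positions:
        return mutated  # nothing to mutate
    pick = rng.choice(positions)
    # Choose a replacement that differs from the current digit.
    mutated[pick] = rng.choice(sorted(DIGITS - {tokens[pick]}))
    return mutated
```

Joining the tokens with "".join(tokens) reproduces the original text, so
a mutated token list can be written straight back out as a textual
bitcode file.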
The motivation for this tool has been the success PNaCl
(www.gonacl.com) has had in testing its tools, as well as the ease of
generating invalid test cases.
Short description next to the file name?