
[mctoll] Initial changes for an MC-to-LL raiser that takes a binary and raises it back to LLVM bitcode
Abandoned · Public

Authored by asmith on Sep 20 2018, 10:33 PM.

Details

Summary

This is the initial set of changes for a new tool, llvm-mctoll, that raises binaries back to LLVM bitcode. It currently supports raising Arm32 and x64 Linux ELF shared libraries and simple executables such as Dhrystone.

Here is a summary of features in varying states of completion:

  1. Function boundary identification. Analyzes the .text section of an ELF input binary (executable or shared library) to identify function boundaries.
  2. CFG construction. Builds the CFG for a function and the corresponding MachineFunction representation, along with its constituent MachineBasicBlocks. The MachineFunction object is used to materialize a Function object by raising the instructions of each MachineBasicBlock into BasicBlocks of the Function.
  3. Instruction raising. Stack accesses are abstracted to alloca instructions. Various abstract instruction classes are defined, such as memory-referencing instructions, floating-point instructions, register move instructions, and binary operator instructions.
  4. Function prototype discovery. A MachineFunction is analyzed to create an abstract function prototype. The current implementation assumes that the binaries are 64-bit and compiled from C sources. The prototype discovery algorithm assumes C calling conventions and is limited to arguments passed in registers (functions with more than six arguments, whose extra arguments are passed on the stack, are not supported yet). Calls to variadic functions are discovered by analyzing the instructions. Linkage to external functions (such as those in glibc) is handled by maintaining a table of known function signatures.
  5. Information from various sections of the ELF binary, such as the GOT, PLT, data sections, and symbol table, is used to materialize machine-independent abstractions such as string constants and external call linkage.
  6. The tests aim to cover much of the major functionality for both Arm32 and x64.
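
To make the function boundary identification step more concrete, here is a minimal sketch of one common heuristic; it is not the algorithm llvm-mctoll uses. It assumes the `STT_FUNC` entries of the ELF symbol table have already been decoded into hypothetical `(name, address, size)` tuples, and when a symbol records a size of zero it assumes the function extends to the next symbol or to the end of `.text`.

```python
from typing import List, Tuple

# Hypothetical input shape: (name, start address, st_size) for each STT_FUNC symbol.
FuncSym = Tuple[str, int, int]


def function_ranges(syms: List[FuncSym], text_end: int) -> List[Tuple[str, int, int]]:
    """Return half-open (name, start, end) ranges for each function symbol.

    Sketch only: uses st_size when present, otherwise assumes the function
    runs up to the next symbol's start (or to text_end for the last symbol).
    """
    ordered = sorted(syms, key=lambda s: s[1])
    ranges = []
    for i, (name, start, size) in enumerate(ordered):
        if size > 0:
            end = start + size
        elif i + 1 < len(ordered):
            end = ordered[i + 1][1]  # no size recorded: stop at the next symbol
        else:
            end = text_end           # last symbol: stop at the end of .text
        ranges.append((name, start, end))
    return ranges
```

For example, `function_ranges([("main", 0x1130, 0), ("foo", 0x1100, 0x30)], 0x1200)` yields `foo` at `[0x1100, 0x1130)` and `main` at `[0x1130, 0x1200)`. Stripped binaries have no such symbols, so a complete tool would likely also need other evidence such as call targets; this sketch ignores that case.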
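
The CFG construction step can be illustrated with the textbook leader-based block-splitting algorithm. The sketch below works on a hypothetical, already-decoded instruction list rather than on LLVM MC objects, and it returns plain dictionaries instead of building a MachineFunction; the tuple layout and the `kind` strings are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical decoded instruction: (address, length, kind, branch target or None).
# kind is one of "other", "jmp" (unconditional), "jcc" (conditional), "ret".
Insn = Tuple[int, int, str, Optional[int]]


def build_cfg(insns: List[Insn]) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Split a function's instructions (sorted, contiguous) into basic blocks.

    Returns (blocks: leader -> instruction addresses,
             succs: leader -> successor block leaders).
    """
    addrs = {addr for addr, _, _, _ in insns}
    leaders = {insns[0][0]}  # the function entry always starts a block
    for addr, length, kind, target in insns:
        if kind in ("jmp", "jcc", "ret"):
            if target in addrs:
                leaders.add(target)          # a branch target starts a block
            if addr + length in addrs:
                leaders.add(addr + length)   # so does the instruction after a branch

    blocks: Dict[int, List[int]] = {}
    succs: Dict[int, List[int]] = {}
    current = insns[0][0]
    for i, (addr, length, kind, target) in enumerate(insns):
        if addr in leaders:
            current = addr
            blocks[current] = []
            succs[current] = []
        blocks[current].append(addr)
        ends_block = i + 1 == len(insns) or insns[i + 1][0] in leaders
        if not ends_block:
            continue
        # Last instruction of the block: record its outgoing edges.
        if kind in ("jmp", "jcc") and target in addrs:
            succs[current].append(target)            # taken edge
        if kind in ("jcc", "other") and addr + length in addrs:
            succs[current].append(addr + length)     # fall-through edge
    return blocks, succs
```

For an if/else shape (`cmp; je 0x8; mov; jmp 0xa; mov; ret` at addresses 0x0 to 0xa), this produces four blocks: the entry block branches to 0x8 and falls through to 0x4, and both arms flow into the `ret` block at 0xa. Jumps whose targets lie outside the function (for example tail calls) simply get no successor edge here.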
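
A simplified illustration of how stack accesses might be abstracted to `alloca` instructions: each distinct frame-pointer-relative offset becomes one stack slot, typed by the widest access seen at that offset. The `(offset, width)` input and the `%stack.N` naming are hypothetical, and the sketch assumes slots do not overlap, which compiler-generated code does not guarantee.

```python
from typing import Dict, List, Tuple

# LLVM integer type for each access width in bytes.
INT_TYPES = {1: "i8", 2: "i16", 4: "i32", 8: "i64"}


def stack_allocas(accesses: List[Tuple[int, int]]) -> Tuple[Dict[int, str], List[str]]:
    """Map frame-pointer-relative offsets to named allocas.

    accesses: hypothetical (offset, width) pairs for memory operands, e.g.
    'mov dword ptr [rbp - 4], edi' -> (-4, 4).
    Returns (offset -> slot name, textual alloca instructions).
    """
    widths: Dict[int, int] = {}
    for offset, width in accesses:
        # Keep the widest access seen at each offset.
        widths[offset] = max(widths.get(offset, 0), width)

    names: Dict[int, str] = {}
    allocas: List[str] = []
    # Deterministic order: deepest slot (most negative offset) first.
    for i, offset in enumerate(sorted(widths)):
        name = f"%stack.{i}"
        names[offset] = name
        allocas.append(f"{name} = alloca {INT_TYPES[widths[offset]]}, align {widths[offset]}")
    return names, allocas
```

With accesses `[(-4, 4), (-16, 8), (-4, 4)]`, the sketch emits `%stack.0 = alloca i64, align 8` for offset -16 and `%stack.1 = alloca i32, align 4` for offset -4; every later load or store at those offsets would then be rewritten to use the corresponding slot.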
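
For the prototype discovery step on x64 Linux, the System V AMD64 calling convention passes the first six integer or pointer arguments in `rdi`, `rsi`, `rdx`, `rcx`, `r8`, and `r9`. One simple heuristic, sketched below under that assumption, treats any argument register that is read before being written as a live-in argument. The `(uses, defs)` input format and the alias table are hypothetical, only straight-line entry code is considered, and floating-point arguments (passed in `xmm` registers) are ignored.

```python
from typing import Iterable, Set, Tuple

# System V AMD64 ABI: the first six integer/pointer arguments arrive in these
# registers, in this order; any further arguments are passed on the stack.
ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")

# Hypothetical, incomplete sub-register alias table (8-bit names omitted),
# so that a read of 'edi' counts as a read of 'rdi'.
ALIASES = {
    "edi": "rdi", "di": "rdi", "esi": "rsi", "si": "rsi",
    "edx": "rdx", "dx": "rdx", "ecx": "rcx", "cx": "rcx",
    "r8d": "r8", "r9d": "r9",
}


def count_int_args(insns: Iterable[Tuple[Set[str], Set[str]]]) -> int:
    """Estimate how many integer arguments a function takes.

    insns: straight-line entry code as hypothetical (uses, defs) register sets.
    An argument register that is read before it is written is treated as live-in.
    """
    written: Set[str] = set()
    highest = -1
    for uses, defs in insns:
        for reg in uses:
            reg = ALIASES.get(reg, reg)
            if reg in ARG_REGS and reg not in written:
                highest = max(highest, ARG_REGS.index(reg))
        written |= {ALIASES.get(r, r) for r in defs}
    # Registers are assigned in order, so a live-in rdx implies rdi and rsi too.
    return highest + 1
```

For the unoptimized entry code of a function like `int add(int a, int b)`, which spills `edi` and `esi` to the stack, this yields 2. Detecting more than six arguments would additionally require recognizing reads of the caller's stack area above the return address, which this sketch does not attempt.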
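
Linkage to external functions through the PLT and a table of known signatures might look roughly like the following. This assumes the classic lazy-binding x86-64 `.plt` layout (a 16-byte `PLT0` header followed by 16-byte stubs in the same order as the `R_X86_64_JUMP_SLOT` relocations in `.rela.plt`) and that the relocation symbol names have already been extracted. `KNOWN_SIGNATURES`, `plt_target_name`, and `declare` are hypothetical names, and the declarations use the opaque `ptr` type of current LLVM IR.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table of known libc signatures: name -> (return type, params, variadic).
KNOWN_SIGNATURES: Dict[str, Tuple[str, List[str], bool]] = {
    "printf": ("i32", ["ptr"], True),
    "puts": ("i32", ["ptr"], False),
    "strlen": ("i64", ["ptr"], False),
}

PLT_ENTRY_SIZE = 16  # assumption: classic lazy-binding x86-64 .plt layout


def plt_target_name(call_target: int, plt_start: int,
                    jump_slot_syms: List[str]) -> Optional[str]:
    """Map a call into .plt to the imported symbol it reaches.

    Assumes PLT0 occupies the first entry and stub i (i >= 1) corresponds to
    the (i - 1)-th JUMP_SLOT relocation, whose symbol is jump_slot_syms[i - 1].
    """
    index = (call_target - plt_start) // PLT_ENTRY_SIZE - 1
    if index < 0 or index >= len(jump_slot_syms):
        return None  # PLT0 itself, or outside the stub area
    return jump_slot_syms[index]


def declare(name: str) -> Optional[str]:
    """Emit an LLVM IR declaration for a known external function, if any."""
    if name not in KNOWN_SIGNATURES:
        return None
    ret, params, variadic = KNOWN_SIGNATURES[name]
    args = ", ".join(params + (["..."] if variadic else []))
    return f"declare {ret} @{name}({args})"
```

With `.plt` at 0x1020 and JUMP_SLOT symbols `["puts", "printf"]`, a call to 0x1040 resolves to `printf`, and `declare("printf")` returns `declare i32 @printf(ptr, ...)`. Recording variadic-ness in the table is what lets the raised call carry the trailing `...`.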

Diff Detail

Repository
rL LLVM

Event Timeline

asmith created this revision. Sep 20 2018, 10:33 PM

Hi Aaron,

Thanks for sharing - this looks like an interesting tool.
Having scrolled very quickly through the code, I'm assuming that, as is, this is the version of the tool you created more or less out-of-tree?
In other words, a bit of polish is probably needed to integrate it better, e.g.:

  • Is the LICENSE.TXT file necessary?
  • Some of the tests seem to require quite a lot of components to be enabled in the build; maybe that needs refinement?
  • Do we really want to build and run (pristine?) dhrystone as part of testing?

However, I'm assuming those are just details at the moment.
I think it may be good to kick off a thread on llvm-dev about introducing this tool. I'm assuming that'll increase visibility, reaching more people who have built similar tools in the past.
Not being an expert in this area, I wonder about:

  1. Intended use cases for this tool.
  2. What are the expected limitations, both now (because of current implementation status) and in the future (theoretical limitations based on design)?

Thanks!

Kristof

Hi Kristof,

The tool is set up to build in-tree, similar to the other tools in the tools directory. llvm-mctoll/CMakeLists.txt still contains remnants of earlier support for building out-of-tree; I can clean that up.

Thanks,

Bharadwaj

Matt added a subscriber: Matt. Oct 11 2018, 8:50 PM
asmith abandoned this revision. Oct 16 2018, 9:33 PM

We've set up a GitHub repo and can continue the discussion there.

https://github.com/Microsoft/llvm-mctoll