Page MenuHomePhabricator

[AMDGPU] llvm-objdump: disassembling amdgcn object file
ClosedPublic

Authored by vpykhtin on Feb 8 2016, 10:47 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

vpykhtin updated this revision to Diff 47224.Feb 8 2016, 10:47 AM
vpykhtin retitled this revision from to [AMDGPU] llvm-objdump, MCDisassembler: disassembling .hsatext section of HSA Code Object (draft).
vpykhtin updated this object.
vpykhtin added a reviewer: tstellarAMD.
vpykhtin set the repository for this revision to rL LLVM.
vpykhtin added subscribers: nhaustov, SamWot.

Is there some reason why you can't parse the code object in from the AMDGPU implementation of getInstruction()?

Is there some reason why you can't parse the code object in from the AMDGPU implementation of getInstruction()?

MCDisassembler::getInstruction is declared as follows:

/// Returns the disassembly of a single instruction.
///
/// \param Instr    - An MCInst to populate with the contents of the
///                   instruction.
/// \param Size     - A value to populate with the size of the instruction, or
///                   the number of bytes consumed while attempting to decode
///                   an invalid instruction.
/// ...
virtual DecodeStatus getInstruction(MCInst &Instr, uint64_t &Size,
                                    ArrayRef<uint8_t> Bytes, uint64_t Address,
                                    raw_ostream &VStream,
                                    raw_ostream &CStream) const = 0;

Its supposed to return filled MCInst as a result and return consumed bytes num.

Current DisassemblyObject just passes what it consider as 'code' to the disassembler and therefore inside getInstruction we facing the following:

  1. We need to know whether we're actually inside an amd_kernel_code_t. Yes we can analize Address but llvm-mc calls dissassembler for a raw stream of bytes and we need to care of it too.
  2. amd_kernel_code_t cannot be stored to MCInst but its desirable to dump it too as part of disassembly.
  3. it would be a hack to try reposition reader specifying negative size in the case when amd_kernel_code_t is actually before amd_kernel_code_t
  4. we need to process all code symbols in .hsasection to find all kernels ends as kernel size is not specified by design.
  5. DisassemblyObject knows nothing about ELF::STT_AMDGPU_HSA_KERNEL

Hi,

In the loop that iterates over symbols and emits them, there is already some target specific code for AARCH64.
I would recommend adding a callback to MCDisassembler called something like disassembleSymbolStart() and call that function for the AMDGPU target, and then move the disassembly code for the amd_kernel_code_t into the AMDGPU backend.

Hi,

In the loop that iterates over symbols and emits them, there is already some target specific code for AARCH64.
I would recommend adding a callback to MCDisassembler called something like disassembleSymbolStart() and call that function for the AMDGPU target, and then move the disassembly code for the amd_kernel_code_t into the AMDGPU backend.

For now I just moved all my code to AMDGPUDisassembler library (links with llvm-objdump already) and call DisassembleHSACodeObject in the beginning of llvm-objdump's DisassembleObject if Obj->getArch() == Triple::amdgcn. It still lefts llvm-objdump slightly AMDGPU target dependent but now I think it's not worth to touch MCDisassembler interface until we know all the requirements. It's likely that we might want to disassemble HSA code object completely with AMDGPU code not depending on llvm-objdump.

vpykhtin updated this revision to Diff 47672.Feb 11 2016, 9:49 AM
vpykhtin edited edge metadata.
  • AMDGPU dependent code moved to AMDGPUDisassembler library.
  • Dissassembly code slightly enhanced to be ready assembled again. A lot of TODO though.
  • Skipping unussembled code bytes nicely
arsenm added inline comments.Feb 11 2016, 11:47 AM
lib/Target/AMDGPU/Disassembler/HSAObjCodeDisassembler.cpp
113 ↗(On Diff #47672)

Would this be useful in the MCContext header?

150–152 ↗(On Diff #47672)

This is dead code

158 ↗(On Diff #47672)

The triple should be amdgcn-unknown-amdhsa. Why do you need to hardcode this here?

166–167 ↗(On Diff #47672)

I don't think this should be able to happen. These checks should maybe all be asserts

208 ↗(On Diff #47672)

Put declarations on separate lines

225 ↗(On Diff #47672)

Single quotes

228 ↗(On Diff #47672)

Single quotes

tools/llvm-objdump/llvm-objdump.cpp
834 ↗(On Diff #47672)

Function should start with lowercase letter

vpykhtin added inline comments.Feb 11 2016, 12:23 PM
lib/Target/AMDGPU/Disassembler/HSAObjCodeDisassembler.cpp
113 ↗(On Diff #47672)

Yep, I would move it there.

150–152 ↗(On Diff #47672)

Well, when NDEBUG there is no DebugFlag, but it is tested below

if (!DisAsm->getInstruction(Inst, EatenBytesNum,

Code.slice(Index), Index,
DebugFlag ? dbgs() : nulls(),
nulls())) {

I just dont like interleave code with macros a lot.

158 ↗(On Diff #47672)

probably not need to hardcode, but I haven't found yet how to construct it properly.

166–167 ↗(On Diff #47672)

ok.

vpykhtin added a project: Restricted Project.Feb 13 2016, 3:41 AM
tools/llvm-objdump/llvm-objdump.cpp
840 ↗(On Diff #47672)

You aren't going to be able to call target specific functions from a common tool like this. The only way this is going to work is with some kind of generic callback.

nhaustov added inline comments.Mar 9 2016, 5:36 AM
tools/llvm-objdump/llvm-objdump.cpp
840 ↗(On Diff #47672)

llvm-objdump actually has custom code for AArch64 that tries to determine whether bytes being disassembled is code or data:

// AArch64 ELF binaries can interleave data and text in the
// same section. We rely on the markers introduced to
// understand what we need to dump.

...
To me, proposed code in separate file looks cleaner and does not clutter llvm-objdump.cpp.

Anyway, another question is whether it is really only AMDGPU that needs custom handling of parts of .text segment? How do other platforms handle metadata needed for code startup? Any pointers?

One reason to keep amd_kernel_code_t (metadata needed for kernel startup) close to actual ISA is effiiency and cache concerns. We can still probably change it in new format.

vpykhtin updated this revision to Diff 50161.Mar 9 2016, 11:14 AM
vpykhtin retitled this revision from [AMDGPU] llvm-objdump, MCDisassembler: disassembling .hsatext section of HSA Code Object (draft) to [AMDGPU] llvm-objdump: disassembling .hsatext section of HSA Code Object v1.0.
vpykhtin updated this object.
vpykhtin removed rL LLVM as the repository for this revision.
vpykhtin marked 4 inline comments as done.
vpykhtin added a subscriber: llvm-commits.
  • HSACodeObjectDisassembler class introduced
  • ELFNote support added
  • Target ISA determined based on amdgpu_hsa_note_isa_t ELF note (spec)
  • printing for assembler directives
vpykhtin updated this revision to Diff 50280.Mar 10 2016, 8:14 AM
vpykhtin updated this object.

Added binary test

How has the test created. What is a ".co"?

How has the test created. What is a ".co"?

".co" means CodeObject. It was created using out of llvm tree utility. It is of HSA Code Object v1.0 format which is slightly different from what AMDGPU assembler currently emitting, the main difference is - code symbol has zero size and amd_kernel_code_t (runtime control structure) and ISA could be non-adjacent. The dissasembler is supposed to accept both versions.

lib/Target/AMDGPU/Disassembler/HSAObjCodeDisassembler.cpp
113 ↗(On Diff #47672)

Let's do it in separate change later.

Adding quote from summary so it can be seen in notification emails:

Problem: this change contains direct call for disassembleHSACodeObject from llvm-objdump which is actually located in the AMDGPU library. This is incorrect as require llvm-objdump always link with AMDGPU target. There should be a target callback somewhere that allows custom object file disassembly. One thought was to add such callback (something like disassembleObjectFile(ObjFile, OutStream)) to the MCDisassembler interface but it is a bit late because MCDisassembler is created when the target ISA specification is already known (it depends on MCSubtargetInfo) but the target ISA spec is yet to be read from the object file really.

In other words, this is a Elf_Rel file? Why do you need a new extension?

Right Elf_Rel. Disassembling of this format requires different features that is currently suggested by llvm-objdump and these are really target specific. In particular we would like to print all custom directives and structures so it can be fed directly to assembler input, for example amd_kernel_code_t runtime control structure definitions. Dumping this structure without llvm-objdump->AMDGPU link dependency would require copy-paste of the target code. I'm not sure it's possible to create sufficient number of target callbacks to accomplish this.

Another issue is with the HSA Code Object 1.0 format itself - zero sized kernel code symbol require to preprocess the whole .hsatext section to find start markers of every item in this section to calculate the kernel code size. This particular issue is fixed with what AMDGPU assembler currently emits but we would like disassembler could accept old files too.


llvm-commits mailing list
llvm-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

vpykhtin updated this revision to Diff 50625.Mar 14 2016, 12:15 PM
vpykhtin retitled this revision from [AMDGPU] llvm-objdump: disassembling .hsatext section of HSA Code Object v1.0 to [AMDGPU] llvm-objdump: disassembling amdgcn object file.
vpykhtin updated this object.

Made minimal possible change.

Limitations:

  • accepts object file produced by current AMDGPU assembler (no older formats support)
  • only ISA dumping (no directives (notes), amd_kernel_code_t etc.)
  • mcpu should be explicitly specified upon llvm-objdump run (no autodetection)

Added test using llvm-mc | llvm-objdump check.

Bigcheese requested changes to this revision.Mar 17 2016, 1:36 PM
Bigcheese added a reviewer: Bigcheese.
Bigcheese added a subscriber: Bigcheese.
Bigcheese added inline comments.
tools/llvm-objdump/llvm-objdump.cpp
395 ↗(On Diff #50625)

This should be using a endian specific type. Either llvm::support::ulittle32_t or ubig32_t.

1077 ↗(On Diff #50625)

Same with the endianness.

This revision now requires changes to proceed.Mar 17 2016, 1:36 PM
vpykhtin updated this revision to Diff 51034.Mar 18 2016, 9:42 AM
vpykhtin edited edge metadata.

Thanks for the review!

Updated the diff with issues fixed.

Bigcheese accepted this revision.Apr 5 2016, 4:19 PM
Bigcheese edited edge metadata.

lgtm

This revision is now accepted and ready to land.Apr 5 2016, 4:19 PM
This revision was automatically updated to reflect the committed changes.

I reverted it for a while, instruction bytes are printed in bigendian order on ppc despite on support::ulittle32_t usage. Need to fix.