Page MenuHomePhabricator

[AMDGPU] Support disassembly for AMDGPU kernel descriptors
ClosedPublic

Authored by rochauha on May 28 2020, 5:20 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
scott.linder added inline comments.Jul 6 2020, 1:06 PM
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1349

Can you expand this comment a little and move it to a Doxygen comment for the function?

1624

Need to handle the "default" case here:

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1510:1: warning: control may reach end of non-void function [-Wreturn-type]
rochauha updated this revision to Diff 276059.Jul 7 2020, 7:28 AM
  • Handled default statement to silence the warning.
  • Expanded comments for decodeCOMPUTE_PGM_RSRC1 and decodeCOMPUTE_PGM_RSRC2.
  • Removed extra comment at the end of functions.
  • Changed SoftFail to Success for code object v2.
  • Replaced the old test case with a small assembly file.
rochauha updated this revision to Diff 276063.Jul 7 2020, 7:42 AM
rochauha marked 5 inline comments as done.
  • Return MCDisassembler::Fail for code object v2.
  • Add missing full stops in doxygen comments.
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1345

Done.

1349

Done.

1624

Done.

1665

Done.

jhenderson added inline comments.Jul 8 2020, 1:19 AM
llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s
1 ↗(On Diff #276063)

No need for the <. llvm-mc is quite capable of taking inputs on the command-line as positional arguments.

2 ↗(On Diff #276063)

This line is too long. Please break it up into individual lines:
; RUN: llvm-objdump ... | \
; RUN: tail -n +8 | llvm-mc ...

5–8 ↗(On Diff #276063)

This test is also quite small. Does it actually cover every code path?

10 ↗(On Diff #276063)

Is this a FIXME/TODO? If so, please add "FIXME" or "TODO".

rochauha updated this revision to Diff 277360.Jul 13 2020, 2:55 AM

Changes as per review by @jhenderson.

rochauha marked 7 inline comments as done.Jul 13 2020, 3:03 AM
rochauha added inline comments.
llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s
1 ↗(On Diff #276063)

Done.

2 ↗(On Diff #276063)

Done.

5–8 ↗(On Diff #276063)

These values must be always specified by the user. Values of some bytes are computed using the values passed here. Consequently there are some cases where we need to test for getting the exact same bytes in the re-assembled binary, even if the disassembled values slightly deviate from the original values.

Other bytes/bits hold the exact values specified using the assembler directives. Also they take default values if nothing is specified by the user. So I guess we don't need to test that for re-assembly.

10 ↗(On Diff #276063)

Done.

rochauha updated this revision to Diff 279490.Jul 21 2020, 5:15 AM
rochauha marked 3 inline comments as done.

Switched to disassembling as numerical value rather than .amdgcn.next_free_sgpr.

scott.linder requested changes to this revision.Jul 21 2020, 3:28 PM
scott.linder added inline comments.
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1243–1250

I think this can just be replaced with:

NextFreeVGPR = (GranulatedWorkitemVGPRCount + 1) * getVGPRAllocGranule(STI);

Or we could add another function called getNumVGPRs(const MCSubtargetInfo *STI, unsigned NumVGPRBlocks, Optional<bool> EnableWavefrontSize32) and put the definition directly next to getNumVGPRBlocks(const MCSubtargetInfo *STI, unsigned NumVGPRs, Optional<bool> EnableWavefrontSize32) so any future changes affecting one also affect the other. I would lean towards this, and documenting that they are the inverse of one another.

1278–1285

Same as above, I think a new getNumSGPRs to complement getNumSGPRBlocks would make this easier to read.

For the GFX10 case we could either leave the check here, or have the new function return Optional to indicate when there is an error.

1290

What is "GS" and why is this commented out?

1484

I would still like to see this done, the magic numbers here could lead to a few problems down the line, and presently they just make the code harder to read.

1572–1576

Can this just be return decodeKernelDescriptor(...);?

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s
6 ↗(On Diff #279490)

Can you implement more tests? I don't know if it is feasible to include them all in the same .s file, as you have to work around the output not being re-assembleable in general, but even just copy-pasting and editing is fine with me. Just having some more test cases to cover at least a reasonable sample of the different branches, failure modes, etc.

This revision now requires changes to proceed.Jul 21 2020, 3:28 PM
rochauha updated this revision to Diff 281180.Jul 28 2020, 3:58 AM
rochauha marked an inline comment as done.
  • Got rid of raw number and added an enum for offsets.
  • Added new tests.
  • Updated the nop-data.ll test so that it passes with this patch.
rochauha added inline comments.Jul 28 2020, 3:59 AM
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1243–1250

I think this route is good for now :
NextFreeVGPR = (GranulatedWorkitemVGPRCount + 1) * getVGPRAllocGranule(STI);

Similarly for SGPRs.

I'd like to add the new functions via a separate patch. This patch is already quite big in terms of size.

1278–1285

As I mentioned in my other reply, this patch is already quite big. So it'd be better to have a separate patch for the new functions.

1290

This was was for printing Granulated wave front SGPR count.

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s
6 ↗(On Diff #279490)

I have added new tests in separate files.

scott.linder accepted this revision.Jul 30 2020, 9:40 AM

LGTM, thank you!

I'm probably not in a position to review the majority of this further. However, I do have big reservations about the testing - there are a high number of possible code paths, but I only see 5 tests, which clearly can't cover all these code paths.

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor-1.s
1 ↗(On Diff #281180)

It would be helpful to add a small comment at the start of each of these tests explaining what the are specifically testing.

I'm probably not in a position to review the majority of this further. However, I do have big reservations about the testing - there are a high number of possible code paths, but I only see 5 tests, which clearly can't cover all these code paths.

I think some of the difficulty in writing tests for the failure cases is not having a tool to produce them. I think one would have to generate the code object, and then hand-edit it in a hex-editor, copy it into the source tree (i.e. under Inputs/) maybe with some accompanying comment about the nature of the original source and the edits made?

It would be nice to at least cover some of the failure cases, even if they are more awkward to tests. There could also be some similar tests for e.g. GFX7,GFX10 and the edge cases around the {S,V}GPR calculation for each.

I'm probably not in a position to review the majority of this further. However, I do have big reservations about the testing - there are a high number of possible code paths, but I only see 5 tests, which clearly can't cover all these code paths.

I think some of the difficulty in writing tests for the failure cases is not having a tool to produce them. I think one would have to generate the code object, and then hand-edit it in a hex-editor, copy it into the source tree (i.e. under Inputs/) maybe with some accompanying comment about the nature of the original source and the edits made?

It would be nice to at least cover some of the failure cases, even if they are more awkward to tests. There could also be some similar tests for e.g. GFX7,GFX10 and the edge cases around the {S,V}GPR calculation for each.

It seems to me that the format of the data structure is well-understood (otherwise you wouldn't be able to write code to disassemble it). In similar situations, e.g. the DWARF .debug_line parsing, we didn't rely on the built-in .file/.loc directives to generate our line table, and instead wrote it out by hand using .byte/.half/.long/.quad. It's not the prettiest of things of course, but it's better than code paths that never get exercised in tests. You could also use yaml2obj to do a generate the section with a raw hex blob, and/or write an array of bytes in a gtest unit test. The latter situation might be particularly useful because it would allow you to use the same "base" array, and modify individual bytes in individual test cases to check the behaviour for each.

rochauha updated this revision to Diff 285815.Aug 14 2020, 11:04 PM
  • Updated tests and added a failure test.
  • Checking error when cursor holds Error::success() while handling failure cases.
  • Removed Size from subroutines' signatures as we aren't tracking the size.

Can I consider this patch to be at NFC stage?

rochauha edited the summary of this revision. (Show Details)Aug 17 2020, 10:50 PM
This revision was not accepted when it landed; it landed in state Needs Review.Aug 18 2020, 8:19 PM
This revision was automatically updated to reflect the committed changes.
jhenderson reopened this revision.Aug 19 2020, 12:34 AM

Hi @rochauha,

I'm not sure this should have been pushed - nobody has reviewed your latest update, and I've had concerns about the testing prior to that, as stated. There is also a "Needs Revision" marker still outstanding, and I usually take that to mean that this shouldn't be pushed until the relevant reviewer is satisfied, regardless of what others have said (note that Phabricator highlighted that this patch landded in a "Needs Review" state).

Please could you revert, as there are potential assertion failures in my reading of this code, at least in the event you hit malformed input, in addition to the other issues raised.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1220–1225

This is not how to handle malformed input. This will result in an assertion in debug builds, which is equivalent to a crash, without any useful context to draw on, because you've not reported the error. More below.

1408

Rather than all these checkError calls, I'd expect to see a check of the Cursor followed by a return of MCDisassembler::Fail to indicate there was a problem.

1542–1548

If this loop terminates due to C being invalid, you don't want to fall out the bottom and return Success, I think. You'd want to check C after loop termination and return Fail. Alternatively, if you return Fail from the decodeKernelDescriptorDirective you can do cantFail(C.takeError()); after the loop.

Ideally, we'd actually report the error from C in the event of failure, but currently there's no way of communicating that back up to the caller.

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-failure.s
2

Please add a comment to the top of this test explaining what this test is actually testing.

4

Rather than writing text to another input file at run time in this way, you can use the new split-file tool to have all the input inline below, and split it up using the tool into multiple files.

12

Please don't mix comment markers within the same file. You use '//' here, but ';' everywhere else.

Additionally, new LLVM binutils tests tend to use double comment markers to indicate true comments as opposed to RUN/CHECK lines (i.e. ';;' in this context).

35–36

Please remove the additional blank lines at the end of file.

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s
2

Please follow the comments from kd-failure.s in all the tests (where applicable) too.

rochauha updated this revision to Diff 290241.Mon, Sep 7, 4:39 AM

Changes based on comments by @jhenderson

  • Used llvm::cantFail(C.takeError()) to handle error.
  • Removed blank lines at the end of test files.
  • Streamlined beginning of comments.
  • Used split-file tool to separate SGPR and VGPR test cases.
rochauha marked 2 inline comments as done.Mon, Sep 7, 4:41 AM
rochauha added inline comments.
llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-failure.s
4

In case of kd-failure.s, concatenation is necessary because the disassembled output will be .bytes, and the symbol information is needed to be get the same binary again. I feel that "printf'ing" to a file and concatenating with disassembled text is a simpler compared to split file and concatenating.

However, I have used the split-file tool to separate the S/VGPR test cases.

rochauha marked an inline comment as done.Mon, Sep 7, 4:48 AM
rochauha added inline comments.
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1542–1548

Thanks for the pointer! I think cantFail seems to be all that is needed, because the cursor was holding Error::success in some cases, which needs to be 'checked' before moving further.

Since the kernel descriptor is well defined, all cases where failure needs to be handled are handled using MCDisassembler::Fail.

@kzhuravl, do you have any additional comments to add at all, since you rejected the original patch?

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1394

I think for self-documentation purposes, it would be helpful to assert that Bytes.size() == 64 here. I see that it is verified in the calling function but a) that's not obvious when looking at this function in isolation, and b) in the future, we don't want other places calling this code without that check, so the assert provides a backstop of sorts.

1542–1548

Ah, I missed that the input can't be truncated, so C can't get into a failure state itself. Thanks! (See also my comment about an assert above).

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-failure.s
3
llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s
23

Nit: missing full stop.

rochauha updated this revision to Diff 290443.Tue, Sep 8, 2:55 AM
rochauha marked 3 inline comments as done.

Changes based on comments by @jhenderson.

jhenderson accepted this revision.Tue, Sep 8, 4:45 AM

No more comments from me, but please give @kzhuravl a chance to respond before pushing this patch.

kzhuravl accepted this revision.Tue, Sep 8, 8:44 AM

LGTM, thank you!

This revision is now accepted and ready to land.Tue, Sep 8, 8:44 AM
rochauha added a comment.EditedTue, Sep 8, 11:22 AM

3 of the test cases - kd-sgpr.s, kd-vgpr.s, kd-zeroed-gfx10.s fail with the PowerPC buildbot (http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/53608). From what I understand, these tests should only run if AMDGPU target is built. The lit.local.cfg file specifies that.

3 of the test cases - kd-sgpr.s, kd-vgpr.s, kd-zeroed-gfx10.s fail with the PowerPC buildbot (http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/53608). From what I understand, these tests should only run if AMDGPU target is built. The lit.local.cfg file specifies that.

The PowerPC build bot just means a build bot that is running on a PowerPC host. It actually targets most targets. See the following excerpt from the CMake.

-- Targeting AArch64
-- Targeting AMDGPU
-- Targeting ARM
-- Targeting AVR
-- Targeting BPF
-- Targeting Hexagon
-- Targeting Lanai
-- Targeting Mips
-- Targeting MSP430
-- Targeting NVPTX
-- Targeting PowerPC
-- Targeting RISCV
-- Targeting Sparc
-- Targeting SystemZ
-- Targeting WebAssembly
-- Targeting X86
-- Targeting XCore

I've run into this sort of problem before. The issue is almost certainly either a) incorrect assumption about host system endianness, meaning that you've incorrectly/inadvertently assumed the host is little endian, or b) assumed a 64-bit system somewhere. a) is almost certainly the issue here, based on both the test output and build bot name (ppc64*be*). I've skimmed the patch, but can't obviously see where the code is going wrong, but the test output for kd-zeroed-gfx10.s suggests it's around bytes 48-52. There may be other issues elsewhere though, since 0 renders the same regardless of size and endianness.

...

I've run into this sort of problem before. The issue is almost certainly either a) incorrect assumption about host system endianness, meaning that you've incorrectly/inadvertently assumed the host is little endian, or b) assumed a 64-bit system somewhere. a) is almost certainly the issue here, based on both the test output and build bot name (ppc64*be*). I've skimmed the patch, but can't obviously see where the code is going wrong, but the test output for kd-zeroed-gfx10.s suggests it's around bytes 48-52. There may be other issues elsewhere though, since 0 renders the same regardless of size and endianness.

Yes, considering the failures appear to be exactly the same on PowerPC Big Endian and SystemZ (which is also Big Endian), I would assume that there is a host endianness assumption here (I haven't looked at the code). Please pull this patch until a fix is ready. There are at least the 4 bots that are red because of this for quite a few consecutive builds. Pull the patch so we can get the bots back to green and we can work on a fix afterwards.