This is an archive of the discontinued LLVM Phabricator instance.

In D80713#2108650, @rochauha wrote:

In D80713#2106033, @jhenderson wrote:

No idea about the functional logic of this code, but I do wonder whether you'd be better off dividing it into smaller patches for testability.

This patch is for entirely disassembling the kernel descriptor symbol. It happens to be so that many directives affect the kernel descriptor. I will also be adding a new test in this patch.

The problem is that you're trying to do it all at once. There's no need to implement the full disassembly at once. You can do it in a series of patches, one after the other that build towards the primary goal. For example, there are many if (<X is some value>) type cases, which you could omit - just assume those are all false, in earlier versions, and add them in (with corresponding testing) in a later patch. Similarly, you could assume that all results of x & y return some specific value, e.g. 0, and just print that for now. Yes, that means you won't support everything from the point at which this patch lands, but it will make each individual patch easier to reason with. This fits much better with LLVM's preferred approach - please see https://llvm.org/docs/DeveloperPolicy.html#incremental-development.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1274–1277	I don't see a response to this suggestion.
1335	next -> Next
1481	`/*IsLittleEndian =*/true` -> `/*IsLittleEndian=*/true` (and same for `AddressSize`) as already asked for once...
1671	Nit: missing full stop.
1673–1674	Put this comment above the `if` it is referring to, like the existing bit of comment.

Oh, you could also put the additional behaviour behind a switch in llvm-objdump. That'll mean you won't break all the existing tests too. Once you've finished the development, you could either remove the switch entirely or just change its default (which would allow users to disable the behaviour if they want the less verbose version of the output).

Updated to use Cursor. Now there isn't any need to maintain CurrentIndex.
Other small changes related to coding conventions.

Thanks for the feedback! I'll keep these points in mind :)

Regarding this patch - I think that because it has been reviewed to quite some extent, it doesn't need to be broken into smaller patches now?

Following tests fail:

LLVM :: CodeGen/AMDGPU/call-encoding.ll
LLVM :: CodeGen/AMDGPU/s_code_end.ll
LLVM :: MC/AMDGPU/branch-comment.s
LLVM :: MC/AMDGPU/data.s
LLVM :: MC/AMDGPU/labels-branch-gfx9.s
LLVM :: MC/AMDGPU/labels-branch.s
LLVM :: MC/AMDGPU/offsetbug_once.s
LLVM :: MC/AMDGPU/offsetbug_one_and_one.s
LLVM :: MC/AMDGPU/offsetbug_twice.s
LLVM :: MC/AMDGPU/s_endpgm.s
LLVM :: Object/AMDGPU/objdump.s
LLVM :: tools/llvm-cov/ignore-filename-regex.test
LLVM :: tools/llvm-objdump/ELF/AMDGPU/source-lines.ll

These tests no longer fail.

Harbormaster failed remote builds in B61912: Diff 273682! Jun 26 2020, 7:37 AM

In D80713#2116706, @rochauha wrote:

Thanks for the feedback! I'll keep these points in mind :)

Regarding this patch - I think that because it has been reviewed to quite some extent, it doesn't need to be broken into smaller patches now?

Maybe I missed something from some of the other reviewers, but I haven't seen much in the way of commentary on the core functionality of the new code, so I wouldn't say that it has been reviewed to quite some extent. I've personally only skirted around style aspects. I also don't see any evidence of any testing of the new code.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1486	Bits or bytes?

Added new test case.

Herald added a reviewer: espindola. Jun 29 2020, 2:11 AM

Herald added a subscriber: emaste.

Harbormaster failed remote builds in B62104: Diff 274016! Jun 29 2020, 3:44 AM

rochauha marked 2 inline comments as done. Jun 29 2020, 5:52 AM

rochauha added inline comments.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1486	Bits. If there is a bit that is wrong in a particular chunk of bytes, we consider that the entire chunk of bytes is invalid. We then update the `Size` value. Further, we say that the first `Size` bytes in a symbol are invalid. Error handling in llvm-objdump will print these bytes using `.byte` directive. And then we fall back to decoding the remaining bytes in the symbol as instructions.

rochauha marked 2 inline comments as done. Jun 29 2020, 7:01 AM

rochauha added inline comments.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1486	We do this because most directives in the kernel descriptor affect a single bit or only a few bits.

In D80713#2110898, @jhenderson wrote:

The problem is that you're trying to do it all at once. There's no need to implement the full disassembly at once. You can do it in a series of patches, one after the other that build towards the primary goal. For example, there are many if (<X is some value>) type cases, which you could omit - just assume those are all false, in earlier versions, and add them in (with corresponding testing) in a later patch. Similarly, you could assume that all results of x & y return some specific value, e.g. 0, and just print that for now. Yes, that means you won't support everything from the point at which this patch lands, but it will make each individual patch easier to reason with. This fits much better with LLVM's preferred approach - please see https://llvm.org/docs/DeveloperPolicy.html#incremental-development.

I'm curious if this is the intent of the document you linked, though? It says "The remaining inter-related work should be decomposed into unrelated sets of changes if possible.", but the disassembly of this directive is not decomposable into unrelated changes; if only part of the descriptor is disassembled then the whole .amdhsa_kernel block is invalid. I suppose in the interest of review one could break down the logical/atomic series of commits even further into inter-dependent patches that must be applied together? I think that causes confusion when doing git-bisect etc. so doesn't seem ideal. Or maybe the patches should be squashed back together after review but before they are committed?

I do think there needs to be more direct testing of the branches taken in the code. If decomposing the patch beyond the logical/atomic decomposition makes it easier to do that as an author and/or a reviewer then I am OK with it. In this case, though, a lot of the length in terms of SLOC is just repetition and a little redundancy, so breaking it up further would only hide that and make it harder to see where things are best removed or factored out.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1237	This doesn't seem right. This symbol is not accurate without careful attention from an assembly author, so it will surely be incorrect when used by a disassembler. We can accurately compute a VGPR count from the `GRANULATED_WORKITEM_VGPR_COUNT`, we just calculate e.g. `(GRANULATED_WORKITEM_VGPR_COUNT + 1) * granularity` where the `+ 1` is needed to account for the minimum allocation (i.e. a granulated count encoded as "0" actually indicates a 1 granule allocation) and where the scaling by granularity is device-specific. This will calculate the greatest value possible for `.amdhsa_next_free_vgpr` which will produce the same granulated count, but equally valid you could choose to calculate the minimum, or any value in between. All we care about is producing disassembly which results in the same descriptor. (Caveat: I would prove this to yourself by referencing https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-compute-pgm-rsrc1-gfx6-gfx10-table as I may have the actual calculation wrong; we just want to calculate the inverse of what the assembler does) Your point that we can't recover the exact value the original author intended to place in the assembly text is true, but that is OK as long as we can always give them a valid input to the assembler which gives them the same output.
1242	This can be a char literal, i.e. '\n', same elsewhere.
1255	For this and the above case we should have tests to prove this out. I.e. assemble sources to a binary, disassemble and reassemble it, and then compare the two binaries. Ideally we would do this for some edge cases around VGPR/SGPR allocation granularity. There may need to be some fixup between disassembly and reassembly to account for the remaining non-reassembleable bits produced by llvm-objdump, but they should be pretty minor for a trivial kernel, and I would expect you could handle them with just `sed` which seems to be available to LIT tests.
1264	Just as with the VGPR count, we cannot use the symbol to define this, and we can compute a (non-unique) input which produces the same output.
1274	This shift isn't necessary, you just need to check for the presence of any set bits. I also noticed when checking the types here that for some reason we declare the enum for these masks/shifts as `int32_t`, which makes one have to think about both integer promotion rules and then possibly which bitwise-operations are valid for signed integers. I think here the signed mask is promoted to unsigned, and you get what you want, but it may be good to go fix the definition of the masks separately.
1279	I think a short-lived well-defined preprocessor macro could make the patch shorter and easier to read. For example: #define PRINT_TRIVIAL_FIELD(DIRECTIVE, MASK) do { \ KdStream << Indent << DIRECTIVE " " << ((FourByteBuffer & amdhsa::MASK) >> amdhsa::MASK##_SHIFT) << '\n'; } while (0) PRINT_TRIVIAL_FIELD(".amdhsa_float_round_mode_32", COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_32); PRINT_TRIVIAL_FIELD(".amdhsa_float_round_mode_16_64", COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_16_64); // ... #undef PRINT_TRIVIAL_FIELD I think in part this is because the definition of the masks themselves is in terms of preprocessor macros.
1300	Unnecessary shift
1311	Unnecessary shift, same for all other cases below when checking that the bits under a mask are 0.
1355	I don't know the general conventions here, but I don't think I have seen a comment for the end of a function elsewhere in LLVM. I do know that it is required for namespaces, so maybe it is permitted for long functions?
1397	All of these comments seem redundant to me, especially when the condition is simplified to just: if (Buffer & MASK) return Fail; At the very least, repeating the actual bit indices here when they are a part of the mask definition seems verbose.
1482	I don't understand the intent with carefully maintaining `Size` for the failure case. Aren't we certain by this point that this should be a kernel descriptor, and so the correct thing to do when we fail is to disassemble everything as `.byte` directives (i.e. set `Size` to the max value)? Why would we prefer to return a partial failure and have the disassembler start working on the remaining bytes as if they were instructions? That would also shorten the patch and make it obvious that the `Size` tracking is correct.
1494	Rather than have these comments, which are still just filled with magic numbers, could we define these offsets more explicitly somewhere, as e.g. `amdhsa::GROUP_SEGMENT_FIXED_SIZE_OFFSET`? For example in `llvm/include/llvm/Support/AMDHSAKernelDescriptor.h` alongside the other definitions needed by the compiler? We could then also update the `static_assert`s there to use those definitions, so we aren't relying on inspection to know the same offsets are used everywhere. I.e. the following: static_assert(sizeof(kernel_descriptor_t) == 64, "invalid size for kernel_descriptor_t"); static_assert(offsetof(kernel_descriptor_t, group_segment_fixed_size) == 0, "invalid offset for group_segment_fixed_size"); static_assert(offsetof(kernel_descriptor_t, private_segment_fixed_size) == 4, "invalid offset for private_segment_fixed_size"); static_assert(offsetof(kernel_descriptor_t, reserved0) == 8, "invalid offset for reserved0"); ... Becomes: static_assert(sizeof(kernel_descriptor_t) == 64, "invalid size for kernel_descriptor_t"); static_assert(offsetof(kernel_descriptor_t, group_segment_fixed_size) == GROUP_SEGMENT_FIXED_SIZE_OFFSET, "invalid offset for group_segment_fixed_size"); static_assert(offsetof(kernel_descriptor_t, private_segment_fixed_size) == PRIVATE_SEGMENT_FIXED_SIZE_OFFSET, "invalid offset for private_segment_fixed_size"); static_assert(offsetof(kernel_descriptor_t, reserved0) == RESERVED0_OFFSET, "invalid offset for reserved0"); ...
1640	The `!= 0` here is redundant.
1645	Could you move the call to `.drop_back(3)` out into the calling function, so it appears next to the check for `.endswith(StringRef(".kd"))`?
1675	I'm still not sure what we landed on for the semantics of `SoftFail` here?
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
83	Could you either rename this to satisfy the linter, or else explicitly suppress the lint with a comment like: // NOLINTNEXTLINE(readability-identifier-naming)
llvm/test/tools/llvm-objdump/ELF/AMDGPU/code-object-v3.ll
48 ↗	(On Diff #274016)	I think we need more tests to ensure: (1) the edge cases of the granularity calculation are correct; (2) our promise of round-trip is fulfilled for at least some representative cases; (3) some of the failure modes are handled how we expect.
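To illustrate the "unnecessary shift" comments above, here is a minimal standalone sketch. The mask and shift values are hypothetical illustration values, not the real `amdhsa::` definitions, and they are declared unsigned here to sidestep the signed-promotion question raised for line 1274.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 2-bit reserved field in bits 4..5 of a 32-bit word. These are
// illustration values only, not the real amdhsa:: masks.
constexpr uint32_t RESERVED_SHIFT = 4;
constexpr uint32_t RESERVED_MASK = 0x3u << RESERVED_SHIFT;

// Shifted form: extract the field value, then compare it against zero.
inline bool reservedSetShifted(uint32_t Word) {
  return ((Word & RESERVED_MASK) >> RESERVED_SHIFT) != 0;
}

// Shift-free form: any set bit under the mask makes the AND non-zero.
inline bool reservedSet(uint32_t Word) {
  return (Word & RESERVED_MASK) != 0;
}

// Checks that both forms agree for every value of the low 8 bits.
inline void checkFormsAgree() {
  for (uint32_t Word = 0; Word < 256; ++Word)
    assert(reservedSet(Word) == reservedSetShifted(Word));
}
```

Since the two forms agree whenever the question is only "are any of these bits set?", the shift only matters when the field value itself is printed.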

Updating the code bit by bit based on comments by @scott.linder.

Used macro to shorten printing directives
Replaced "\n" with '\n'
Removed unnecessary bit shifts for bits that must be 0
Removed extra comments for bits that must be 0
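For readers following along, the macro approach mentioned in this update could look roughly like the sketch below. It is standalone and does not reproduce the patch: the `kd` namespace, the `printRoundModes` helper and the use of `std::ostringstream` are stand-ins invented for illustration, and the bit positions are assumed from the COMPUTE_PGM_RSRC1 layout rather than taken from the headers.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

namespace kd { // hypothetical stand-in for the amdhsa namespace
// Assumed layout: each round-mode field is 2 bits wide.
constexpr uint32_t FLOAT_ROUND_MODE_32_SHIFT = 12;
constexpr uint32_t FLOAT_ROUND_MODE_32 = 0x3u << FLOAT_ROUND_MODE_32_SHIFT;
constexpr uint32_t FLOAT_ROUND_MODE_16_64_SHIFT = 14;
constexpr uint32_t FLOAT_ROUND_MODE_16_64 = 0x3u << FLOAT_ROUND_MODE_16_64_SHIFT;
} // namespace kd

// Prints one directive per "trivial" field, i.e. a field whose value is
// simply the masked and shifted bits of the 32-bit word.
inline std::string printRoundModes(uint32_t FourByteBuffer) {
  std::ostringstream KdStream;
  const char *Indent = "\t";

// MASK names both the mask constant and, with _SHIFT pasted on, its shift.
#define PRINT_TRIVIAL_FIELD(DIRECTIVE, MASK)                                   \
  do {                                                                         \
    KdStream << Indent << DIRECTIVE " "                                        \
             << ((FourByteBuffer & kd::MASK) >> kd::MASK##_SHIFT) << '\n';     \
  } while (0)

  PRINT_TRIVIAL_FIELD(".amdhsa_float_round_mode_32", FLOAT_ROUND_MODE_32);
  PRINT_TRIVIAL_FIELD(".amdhsa_float_round_mode_16_64", FLOAT_ROUND_MODE_16_64);

#undef PRINT_TRIVIAL_FIELD
  return KdStream.str();
}
```

Defining and undefining the macro inside the function keeps it short-lived, which is the property asked for in the review.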

rochauha added inline comments. Jul 2 2020, 4:53 AM

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1355	I'm not sure. I added those comments because these functions were getting quite long.
1640	I know, but I thought that it is more readable this way.

rochauha marked an inline comment as done. Jul 2 2020, 4:56 AM

rochauha added inline comments.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1675	It should be Success / Fail based on what the bytes are for code object v2. But since there's nothing we are 'doing' at the moment for v2, I returned SoftFail.

rochauha marked an inline comment as done. Jul 2 2020, 5:02 AM

rochauha added inline comments.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1255	Right now we can't really re-assemble in the lit-test. This needs to be tested 'informally' by: (1) manually writing a small test case and making a copy of it; (2) assembling it into a binary (Binary-1); (3) disassembling it; (4) replacing the original kernel descriptor with the disassembled kernel descriptor in the copy; (5) assembling the copy (Binary-2); (6) comparing Binary-1 and Binary-2.

Harbormaster failed remote builds in B62661: Diff 275067! Jul 2 2020, 6:25 AM

Fixed code-object-v3 lit test failure introduced in the previous change to this patch.
More changes as per inline comments.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1482	My initial take was to decode as byte directives to the point of failure indicated by Size. Then going back to the normal flow of disassembling as instructions. But I get what you want to say in this comment. Updated the code based on this comment.
1645	Done.
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
83	Done.

Size = 64 regardless of success or failure.

Harbormaster completed remote builds in B62807: Diff 275310. Jul 3 2020, 2:40 AM

Harbormaster completed remote builds in B62816: Diff 275321. Jul 3 2020, 3:12 AM

Compute .amdhsa_next_free_vgpr based on inverse of what the assembler does to compute GRANULATED_WORKITEM_VGPR_COUNT.
Some changes to accommodate differences between GFX9 and GFX10
Updated test case for GFX10 as well
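As a worked illustration of "the inverse of what the assembler does", the sketch below pairs an assumed encoding step with the `(granulated + 1) * granule` decoding suggested in the review. The function names are hypothetical, the encoding formula is an assumption about the assembler's behaviour, and the granule values in the usage note are arbitrary; the real granule is device-specific.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed assembler direction: round the count up to whole granules (at least
// one granule), then encode "number of granules - 1".
inline uint32_t encodeGranulated(uint32_t NumRegs, uint32_t Granule) {
  assert(Granule > 0 && "granule must be non-zero");
  uint32_t Granules = (std::max(NumRegs, 1u) + Granule - 1) / Granule;
  return Granules - 1;
}

// Disassembler direction from the review: the largest register count that
// still encodes to the same granulated value.
inline uint32_t decodeNextFree(uint32_t Granulated, uint32_t Granule) {
  return (Granulated + 1) * Granule;
}

// True if re-encoding the decoded value reproduces the original field.
inline bool roundTrips(uint32_t Granulated, uint32_t Granule) {
  return encodeGranulated(decodeNextFree(Granulated, Granule), Granule) ==
         Granulated;
}
```

With a granule of 4, register counts 0 through 4 all encode to 0 under this assumption, and decoding 0 gives 4. The decoded value is therefore not the author's original number, but it re-encodes to the same field, which is the property the review asks for.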

rochauha marked an inline comment as done. Jul 6 2020, 5:07 AM

rochauha added inline comments.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1255	Went this route to check whether re-assembled binaries match or not. Turns out that both binaries match, in size (overall size as well as size of sections) and also in terms of all the disassembled content. But a `diff object1 object2` says that binary files differ.

Harbormaster failed remote builds in B63006: Diff 275674! Jul 6 2020, 6:14 AM

scott.linder added inline comments. Jul 6 2020, 1:06 PM

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1255	I'm not sure I follow what you are describing; my thought was to start with just an asm source file containing only the kernel descriptor directive in the default section, and compare the output of the following (with, e.g. diff, as you mention): (1) assemble it to an object file with llvm-mc; (2) assemble it to an object file with llvm-mc | disassemble the kernel descriptor symbol | trim any human-readable prologue | assemble it to an object file with llvm-mc. As a trivial example, diff doesn't find any difference for the following example: $ printf '.amdhsa_kernel my_kernel\n.amdhsa_next_free_vgpr 0\n.amdhsa_next_free_sgpr 0\n.end_amdhsa_kernel' >a.s $ release/bin/llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj a.s >a.o $ diff a.o \ <(release/bin/llvm-objdump --triple=amdgcn-amd-amdhsa --mcpu=gfx908 --disassemble-symbols=my_kernel.kd a.o \ | tail -n +8 \ | release/bin/llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj) I don't think you need to use `FileCheck` for these tests at all, you can just rely on ending the RUN pipeline with `diff`, which seems to be supported by lit. You can then just copy-paste the test and edit fields in the input to validate edge cases for things like the SGPR/VGPR allocation directives. I think more comprehensive testing, including for other sections and executables/DSOs, would be good eventually but for now we should at least have some tests that explicitly confirm the KD disassembly round-trips.
1355	I would lean towards omitting these, especially with the functions becoming shorter. For example, `decodeCOMPUTE_PGM_RSRC2()` is now <50 lines long and the entire definition now fits on one screen for me. It seems like there are other examples of this in the codebase, though, so I'm OK with it for the longer functions.
1359	Can you expand this comment a little and move it to a Doxygen comment for the function?
1634	Need to handle the "default" case here: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1510:1: warning: control may reach end of non-void function [-Wreturn-type]
1640	Fair enough, in a type-safe language it would be required anyway, so it seems reasonable.
1675	If `SoftFail` isn't applicable I don't think we should return it, even if it is just because we haven't implemented something yet. It existing doesn't mean it needs to be used, I think it has a very narrow definition that doesn't apply here. Maybe just emit a diagnostic and return `Fail` so we get the "decode as .byte" behavior? What exactly happens now with the current patch as-is?

Handled default statement to silence the warning.
Expanded comments for decodeCOMPUTE_PGM_RSRC1 and decodeCOMPUTE_PGM_RSRC2.
Removed extra comment at the end of functions.
Changed SoftFail to Success for code object v2.
Replaced the old test case with a small assembly file.

Return MCDisassembler::Fail for code object v2.
Add missing full stops in doxygen comments.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1355	Done.
1359	Done.
1634	Done.
1675	Done.

jhenderson added inline comments. Jul 8 2020, 1:19 AM

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s
1 ↗	(On Diff #276063)	No need for the `<`. llvm-mc is quite capable of taking inputs on the command-line as positional arguments.
2 ↗	(On Diff #276063)	This line is too long. Please break it up into individual lines: ; RUN: llvm-objdump ... | \ ; RUN: tail -n +8 | llvm-mc ...
5–8 ↗	(On Diff #276063)	This test is also quite small. Does it actually cover every code path?
10 ↗	(On Diff #276063)	Is this a FIXME/TODO? If so, please add "FIXME" or "TODO".

Changes as per review by @jhenderson.

rochauha marked 7 inline comments as done. Jul 13 2020, 3:03 AM

rochauha added inline comments.

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s
2 ↗	(On Diff #276063)	Done.
5–8 ↗	(On Diff #276063)	These values must be always specified by the user. Values of some bytes are computed using the values passed here. Consequently there are some cases where we need to test for getting the exact same bytes in the re-assembled binary, even if the disassembled values slightly deviate from the original values. Other bytes/bits hold the exact values specified using the assembler directives. Also they take default values if nothing is specified by the user. So I guess we don't need to test that for re-assembly.
10 ↗	(On Diff #276063)	Done.

rochauha added inline comments. Jul 13 2020, 3:03 AM

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s
1 ↗	(On Diff #276063)	Done.

Harbormaster failed remote builds in B63928: Diff 277360! Jul 13 2020, 3:52 AM

Switched to disassembling as numerical value rather than .amdgcn.next_free_sgpr.

Harbormaster failed remote builds in B65064: Diff 279490! Jul 21 2020, 5:44 AM

rochauha mentioned this in D84194: [AMDGPU] Correct the number of SGPR blocks used for GFX9. Jul 21 2020, 8:40 AM

scott.linder requested changes to this revision. Jul 21 2020, 3:28 PM

scott.linder added inline comments.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1253–1260	I think this can just be replaced with: NextFreeVGPR = (GranulatedWorkitemVGPRCount + 1) * getVGPRAllocGranule(STI); Or we could add another function called `getNumVGPRs(const MCSubtargetInfo *STI, unsigned NumVGPRBlocks, Optional<bool> EnableWavefrontSize32)` and put the definition directly next to `getNumVGPRBlocks(const MCSubtargetInfo *STI, unsigned NumVGPRs, Optional<bool> EnableWavefrontSize32)` so any future changes affecting one also affect the other. I would lean towards this, and documenting that they are the inverse of one another.
1288–1295	Same as above, I think a new `getNumSGPRs` to complement `getNumSGPRBlocks` would make this easier to read. For the GFX10 case we could either leave the check here, or have the new function return `Optional` to indicate when there is an error.
1300	What is "GS" and why is this commented out?
1494	I would still like to see this done, the magic numbers here could lead to a few problems down the line, and presently they just make the code harder to read.
1582–1586	Can this just be `return decodeKernelDescriptor(...);`?
llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s
6 ↗	(On Diff #279490)	Can you implement more tests? I don't know if it is feasible to include them all in the same .s file, as you have to work around the output not being re-assembleable in general, but even just copy-pasting and editing is fine with me. Just having some more test cases to cover at least a reasonable sample of the different branches, failure modes, etc.

This revision now requires changes to proceed. Jul 21 2020, 3:28 PM

Got rid of raw number and added an enum for offsets.
Added new tests.
Updated the nop-data.ll test so that it passes with this patch.
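The idea of replacing raw numbers with an enum of offsets, and tying that enum to the struct layout with `static_assert`, can be sketched as follows. This is a standalone illustration: `DescriptorPrefix` is a hypothetical, truncated stand-in for `kernel_descriptor_t` that models only the first three fields, the length of `reserved0` is assumed, and `readU32LE` and `SampleBytes` are invented for the example. Only the offsets 0, 4 and 8 come from the review comment.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, truncated stand-in for kernel_descriptor_t.
struct DescriptorPrefix {
  uint32_t group_segment_fixed_size;
  uint32_t private_segment_fixed_size;
  uint8_t reserved0[8]; // length assumed for this sketch
};

// Named offsets shared by the layout checks and by any byte-level decoder.
enum : uint32_t {
  GROUP_SEGMENT_FIXED_SIZE_OFFSET = 0,
  PRIVATE_SEGMENT_FIXED_SIZE_OFFSET = 4,
  RESERVED0_OFFSET = 8,
};

static_assert(offsetof(DescriptorPrefix, group_segment_fixed_size) ==
                  GROUP_SEGMENT_FIXED_SIZE_OFFSET,
              "invalid offset for group_segment_fixed_size");
static_assert(offsetof(DescriptorPrefix, private_segment_fixed_size) ==
                  PRIVATE_SEGMENT_FIXED_SIZE_OFFSET,
              "invalid offset for private_segment_fixed_size");
static_assert(offsetof(DescriptorPrefix, reserved0) == RESERVED0_OFFSET,
              "invalid offset for reserved0");

// Reads a little-endian 32-bit value starting at a named offset.
inline uint32_t readU32LE(const uint8_t *Bytes, uint32_t Offset) {
  return uint32_t(Bytes[Offset]) | (uint32_t(Bytes[Offset + 1]) << 8) |
         (uint32_t(Bytes[Offset + 2]) << 16) |
         (uint32_t(Bytes[Offset + 3]) << 24);
}

// Sample bytes: group size 0, private size 0x0110, reserved0 all zero.
const uint8_t SampleBytes[16] = {0, 0, 0, 0, 0x10, 0x01, 0, 0};
```

Because the decoder and the layout checks use the same names, a change to the struct that moves a field fails at compile time instead of silently desynchronising the two.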

rochauha added inline comments. Jul 28 2020, 3:59 AM

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1253–1260	I think this route is good for now : NextFreeVGPR = (GranulatedWorkitemVGPRCount + 1) * getVGPRAllocGranule(STI); Similarly for SGPRs. I'd like to add the new functions via a separate patch. This patch is already quite big in terms of size.
1288–1295	As I mentioned in my other reply, this patch is already quite big. So it'd be better to have a separate patch for the new functions.
1300	This was for printing the granulated wavefront SGPR count.
llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor.s
6 ↗	(On Diff #279490)	I have added new tests in separate files.

Harbormaster failed remote builds in B65991: Diff 281180! Jul 28 2020, 4:28 AM

LGTM, thank you!

I'm probably not in a position to review the majority of this further. However, I do have big reservations about the testing - there are a high number of possible code paths, but I only see 5 tests, which clearly can't cover all these code paths.

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kernel-descriptor-1.s
1 ↗	(On Diff #281180)	It would be helpful to add a small comment at the start of each of these tests explaining what they are specifically testing.

In D80713#2186791, @jhenderson wrote:

I'm probably not in a position to review the majority of this further. However, I do have big reservations about the testing - there are a high number of possible code paths, but I only see 5 tests, which clearly can't cover all these code paths.

I think some of the difficulty in writing tests for the failure cases is not having a tool to produce them. I think one would have to generate the code object, and then hand-edit it in a hex-editor, copy it into the source tree (i.e. under Inputs/) maybe with some accompanying comment about the nature of the original source and the edits made?

It would be nice to at least cover some of the failure cases, even if they are more awkward to test. There could also be some similar tests for e.g. GFX7, GFX10 and the edge cases around the {S,V}GPR calculation for each.

It seems to me that the format of the data structure is well-understood (otherwise you wouldn't be able to write code to disassemble it). In similar situations, e.g. the DWARF .debug_line parsing, we didn't rely on the built-in .file/.loc directives to generate our line table, and instead wrote it out by hand using .byte/.half/.long/.quad. It's not the prettiest of things of course, but it's better than code paths that never get exercised in tests. You could also use yaml2obj to generate the section with a raw hex blob, and/or write an array of bytes in a gtest unit test. The latter situation might be particularly useful because it would allow you to use the same "base" array, and modify individual bytes in individual test cases to check the behaviour for each.
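The "same base array, modify individual bytes" idea can be sketched without gtest as below. Everything here is hypothetical: `toyDecode` is a toy stand-in rather than the AMDGPU decoder, the rule it applies (bytes 8 to 15 must be zero) is an assumption made up for the example, and `makeBase` and `withByte` are illustrative helpers. Only the 64-byte descriptor size is taken from the discussion above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Descriptor = std::array<uint8_t, 64>;

// Base input shared by all cases: an all-zero 64-byte descriptor.
inline Descriptor makeBase() { return Descriptor{}; }

// Returns a copy of Base with exactly one byte replaced, so each test case
// states only how it differs from the base.
inline Descriptor withByte(Descriptor Base, size_t Index, uint8_t Value) {
  Base.at(Index) = Value;
  return Base;
}

// Hypothetical toy decoder: rejects input when any byte in the assumed
// reserved range [8, 16) is non-zero. Not the real AMDGPU logic.
inline bool toyDecode(const Descriptor &Bytes) {
  for (size_t I = 8; I < 16; ++I)
    if (Bytes[I] != 0)
      return false;
  return true;
}
```

Each failure case then becomes a one-line check such as `!toyDecode(withByte(makeBase(), 8, 0x01))`, which makes it cheap to cover one code path per test.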

Updated tests and added a failure test.
Checking error when cursor holds Error::success() while handling failure cases.
Removed Size from subroutines' signatures as we aren't tracking the size.

Harbormaster completed remote builds in B68505: Diff 285815. Aug 14 2020, 11:44 PM

Can I consider this patch to be at NFC stage?

rochauha edited the summary of this revision. Aug 17 2020, 10:50 PM

This revision was not accepted when it landed; it landed in state Needs Review. Aug 18 2020, 8:19 PM

Closed by commit rGcacfb02d28a3: [AMDGPU] Support disassembly for AMDGPU kernel descriptors (authored by rochauha).

This revision was automatically updated to reflect the committed changes.

rochauha added a commit: rGcacfb02d28a3: [AMDGPU] Support disassembly for AMDGPU kernel descriptors.

Hi @rochauha,

I'm not sure this should have been pushed - nobody has reviewed your latest update, and I've had concerns about the testing prior to that, as stated. There is also a "Needs Revision" marker still outstanding, and I usually take that to mean that this shouldn't be pushed until the relevant reviewer is satisfied, regardless of what others have said (note that Phabricator highlighted that this patch landed in a "Needs Review" state).

Please could you revert, as there are potential assertion failures in my reading of this code, at least in the event you hit malformed input, in addition to the other issues raised.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1230–1235	This is not how to handle malformed input. This will result in an assertion in debug builds, which is equivalent to a crash, without any useful context to draw on, because you've not reported the error. More below.
1418	Rather than all these `checkError` calls, I'd expect to see a check of the `Cursor` followed by a return of `MCDisassembler::Fail` to indicate there was a problem.
1552–1558	If this loop terminates due to `C` being invalid, you don't want to fall out the bottom and return `Success`, I think. You'd want to check `C` after loop termination and return `Fail`. Alternatively, if you return `Fail` from the `decodeKernelDescriptorDirective` you can do `cantFail(C.takeError());` after the loop. Ideally, we'd actually report the error from `C` in the event of failure, but currently there's no way of communicating that back up to the caller.
llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-failure.s
2	Please add a comment to the top of this test explaining what this test is actually testing.
4	Rather than writing text to another input file at run time in this way, you can use the new split-file tool to have all the input inline below, and split it up using the tool into multiple files.
12	Please don't mix comment markers within the same file. You use '//' here, but ';' everywhere else. Additionally, new LLVM binutils tests tend to use double comment markers to indicate true comments as opposed to RUN/CHECK lines (i.e. ';;' in this context).
35–36	Please remove the additional blank lines at the end of file.
llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s
2	Please follow the comments from `kd-failure.s` in all the tests (where applicable) too.
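The control flow suggested for lines 1552–1558 can be sketched without LLVM. The following is a minimal, self-contained illustration and not the patch's code: `ToyCursor`, `getU32` and `decodeAll` are hypothetical stand-ins for `DataExtractor::Cursor`, `DataExtractor::getU32` and the decoding loop. It only shows why the cursor should be checked after the loop before success is reported.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MCDisassembler::DecodeStatus.
enum class DecodeStatus { Fail, Success };

// Hypothetical stand-in for DataExtractor::Cursor: an offset plus a sticky
// failure flag. It converts to true while no read has failed.
struct ToyCursor {
  std::size_t Offset = 0;
  bool Failed = false;
  explicit operator bool() const { return !Failed; }
};

// Reads a little-endian 32-bit value at the cursor. On truncated input the
// cursor is marked as failed, the offset is left alone and 0 is returned.
uint32_t getU32(const std::vector<uint8_t> &Bytes, ToyCursor &C) {
  assert(C.Offset <= Bytes.size() && "cursor moved past the end of the data");
  if (!C)
    return 0;
  if (Bytes.size() - C.Offset < 4) {
    C.Failed = true;
    return 0;
  }
  uint32_t Value = 0;
  for (std::size_t I = 0; I < 4; ++I)
    Value |= static_cast<uint32_t>(Bytes[C.Offset + I]) << (8 * I);
  C.Offset += 4;
  return Value;
}

// Decodes consecutive 32-bit fields until the data is consumed or a read fails.
DecodeStatus decodeAll(const std::vector<uint8_t> &Bytes,
                       std::vector<uint32_t> &Fields) {
  ToyCursor C;
  while (C && C.Offset < Bytes.size()) {
    uint32_t Field = getU32(Bytes, C);
    if (C)
      Fields.push_back(Field);
  }
  // The loop may have stopped because the cursor failed rather than because
  // every byte was consumed, so check it before reporting success.
  if (!C)
    return DecodeStatus::Fail;
  return DecodeStatus::Success;
}
```

With this shape, truncated input yields `Fail` instead of falling out of the loop and returning `Success`.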

rochauha added a reverting change: rGfdf71d486c0f: Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors".Aug 19 2020, 12:43 AM

Changes based on comments by @jhenderson

Used llvm::cantFail(C.takeError()) to handle error.
Removed blank lines at the end of test files.
Streamlined beginning of comments.
Used split-file tool to separate SGPR and VGPR test cases.

rochauha marked 2 inline comments as done.Sep 7 2020, 4:41 AM

rochauha added inline comments.

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-failure.s
4	In the case of kd-failure.s, concatenation is necessary because the disassembled output will be `.byte`s, and the symbol information is needed to get the same binary again. I feel that "printf'ing" to a file and concatenating with the disassembled text is simpler than using split-file and concatenating. However, I have used the `split-file` tool to separate the S/VGPR test cases.

rochauha marked an inline comment as done.Sep 7 2020, 4:48 AM

rochauha added inline comments.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1552–1558	Thanks for the pointer! I think `cantFail` seems to be all that is needed, because the cursor was holding Error::success in some cases, which needs to be 'checked' before moving further. Since the kernel descriptor is well defined, all cases where failure needs to be handled are handled using `MCDisassembler::Fail`.

Harbormaster completed remote builds in B70815: Diff 290241.Sep 7 2020, 5:56 AM

@kzhuravl, do you have any additional comments to add at all, since you rejected the original patch?

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1404	I think for self-documentation purposes, it would be helpful to assert that `Bytes.size() == 64` here. I see that it is verified in the calling function but a) that's not obvious when looking at this function in isolation, and b) in the future, we don't want other places calling this code without that check, so the assert provides a backstop of sorts.
1552–1558	Ah, I missed that the input can't be truncated, so `C` can't get into a failure state itself. Thanks! (See also my comment about an assert above).
llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-failure.s
3
llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s
23	Nit: missing full stop.
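The backstop proposed for line 1404 separates two kinds of check: the caller validates untrusted input and reports failure, while the callee asserts the precondition it relies on. A minimal sketch of that split follows, assuming hypothetical names (`KernelDescriptorSize`, `firstDescriptorByte`, `decodeDescriptor`) that are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MCDisassembler::DecodeStatus.
enum class DecodeStatus { Fail, Success };

// Size of a kernel descriptor in bytes, as stated in the review.
constexpr std::size_t KernelDescriptorSize = 64;

// Callee: documents and enforces its precondition. The assert is a backstop
// for programmer error, not a way to handle malformed input.
uint8_t firstDescriptorByte(const std::vector<uint8_t> &Bytes) {
  assert(Bytes.size() == KernelDescriptorSize &&
         "caller must pass exactly one kernel descriptor");
  return Bytes[0];
}

// Caller: malformed input is rejected with a status, so the assert in the
// callee is never reached by bad data.
DecodeStatus decodeDescriptor(const std::vector<uint8_t> &Bytes,
                              uint8_t &First) {
  if (Bytes.size() != KernelDescriptorSize)
    return DecodeStatus::Fail;
  First = firstDescriptorByte(Bytes);
  return DecodeStatus::Success;
}
```

A future caller that forgets the size check would then trip the assert in debug builds instead of reading out of bounds.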

Changes based on comments by @jhenderson.

Harbormaster completed remote builds in B70918: Diff 290443.Sep 8 2020, 3:24 AM

No more comments from me, but please give @kzhuravl a chance to respond before pushing this patch.

LGTM, thank you!

This revision is now accepted and ready to land.Sep 8 2020, 8:44 AM

Closed by commit rG487a80531006: [AMDGPU] Support disassembly for AMDGPU kernel descriptors (authored by rochauha). · Explain WhySep 8 2020, 8:57 AM

This revision was automatically updated to reflect the committed changes.

rochauha added a commit: rG487a80531006: [AMDGPU] Support disassembly for AMDGPU kernel descriptors.

3 of the test cases - kd-sgpr.s, kd-vgpr.s, kd-zeroed-gfx10.s fail with the PowerPC buildbot (http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/53608). From what I understand, these tests should only run if AMDGPU target is built. The lit.local.cfg file specifies that.

In D80713#2261550, @rochauha wrote:

3 of the test cases - kd-sgpr.s, kd-vgpr.s, kd-zeroed-gfx10.s fail with the PowerPC buildbot (http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/53608). From what I understand, these tests should only run if AMDGPU target is built. The lit.local.cfg file specifies that.

The PowerPC build bot just means a build bot that is running on a PowerPC host. It actually targets most targets. See the following excerpt from the CMake.

-- Targeting AArch64
-- Targeting AMDGPU
-- Targeting ARM
-- Targeting AVR
-- Targeting BPF
-- Targeting Hexagon
-- Targeting Lanai
-- Targeting Mips
-- Targeting MSP430
-- Targeting NVPTX
-- Targeting PowerPC
-- Targeting RISCV
-- Targeting Sparc
-- Targeting SystemZ
-- Targeting WebAssembly
-- Targeting X86
-- Targeting XCore

I've run into this sort of problem before. The issue is almost certainly either a) incorrect assumption about host system endianness, meaning that you've incorrectly/inadvertently assumed the host is little endian, or b) assumed a 64-bit system somewhere. a) is almost certainly the issue here, based on both the test output and build bot name (ppc64*be*). I've skimmed the patch, but can't obviously see where the code is going wrong, but the test output for kd-zeroed-gfx10.s suggests it's around bytes 48-52. There may be other issues elsewhere though, since 0 renders the same regardless of size and endianness.

In D80713#2262751, @jhenderson wrote:

...

I've run into this sort of problem before. The issue is almost certainly either a) incorrect assumption about host system endianness, meaning that you've incorrectly/inadvertently assumed the host is little endian, or b) assumed a 64-bit system somewhere. a) is almost certainly the issue here, based on both the test output and build bot name (ppc64*be*). I've skimmed the patch, but can't obviously see where the code is going wrong, but the test output for kd-zeroed-gfx10.s suggests it's around bytes 48-52. There may be other issues elsewhere though, since 0 renders the same regardless of size and endianness.

Yes, considering the failures appear to be exactly the same on PowerPC Big Endian and SystemZ (which is also Big Endian), I would assume that there is a host endianness assumption here (I haven't looked at the code). Please pull this patch until a fix is ready. There are at least the 4 bots that are red because of this for quite a few consecutive builds. Pull the patch so we can get the bots back to green and we can work on a fix afterwards.

rochauha added a reverting change: rGf078577f31cc: Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors".Sep 9 2020, 5:34 AM

rochauha added inline comments.Oct 5 2020, 4:08 AM

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1405	I think this is where little endian is being 'hardcoded'. However, since AMDGPU relocatable objects are meant to be little endian, I don't understand why they are big endian on ppc64.

scott.linder added inline comments.Oct 5 2020, 12:04 PM

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1405	I think this is actually a bug with the encoding code. Like you say, we should be host-endianness agnostic when encoding the kernel descriptor, but it seems we aren't. I think I vaguely remember this coming up when we implemented it, but I don't remember why we didn't do this to start. I think it is just an oversight.

scott.linder added inline comments.Oct 5 2020, 3:55 PM

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1405	I think https://reviews.llvm.org/D88858 should be the fix, need to confirm if big-endian testers will run it automatically.

scott.linder added inline comments.Oct 6 2020, 10:46 AM

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1405	I tried to determine if the pre-checkin builders include any big-endian archs, but gave up and just committed it. I'll keep an eye on the builders and see if it needs to be reverted. After that you can proceed with this patch again.

rochauha added inline comments.Oct 6 2020, 10:52 AM

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1405	Thanks!

scott.linder added inline comments.Oct 6 2020, 1:42 PM

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1405	An update, I missed one test in the initial commit, but I followed up in bf5c1d92d92ef8cee2adbfa17ecca20a8f65dc0e and now the big-endian testers seem to be happy. I think you can try reapplying this.

Previous reversions were due to test cases failing on big endian hosts.
This was due to multibyte values being laid out as-is from host memory.
Now https://reviews.llvm.org/D88858 addresses the above, and has landed.
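The failure mode described above, multibyte values laid out as-is from host memory, can be illustrated in isolation. The sketch below is not the code from D88858; `storeHostOrder`, `storeLittleEndian`, `loadLittleEndian` and `hostIsLittleEndian` are hypothetical helpers written only to contrast a host-dependent layout with an explicit little-endian one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-dependent: copies the in-memory representation, so the byte order
// follows the host. On a big-endian host the bytes come out reversed.
std::array<uint8_t, 4> storeHostOrder(uint32_t Value) {
  std::array<uint8_t, 4> Out{};
  std::memcpy(Out.data(), &Value, sizeof(Value));
  return Out;
}

// Host-independent: always emits the least significant byte first.
std::array<uint8_t, 4> storeLittleEndian(uint32_t Value) {
  std::array<uint8_t, 4> Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = static_cast<uint8_t>(Value >> (8 * I));
  return Out;
}

// Host-independent read of a little-endian 32-bit value at Offset.
uint32_t loadLittleEndian(const uint8_t *Bytes, std::size_t Size,
                          std::size_t Offset) {
  assert(Offset <= Size && Size - Offset >= 4 &&
         "read past the end of the buffer");
  uint32_t Value = 0;
  for (std::size_t I = 0; I < 4; ++I)
    Value |= static_cast<uint32_t>(Bytes[Offset + I]) << (8 * I);
  return Value;
}

// Reports the byte order of the machine running this code.
bool hostIsLittleEndian() {
  const uint16_t One = 1;
  uint8_t First = 0;
  std::memcpy(&First, &One, 1);
  return First == 1;
}
```

The two store functions agree only on little-endian hosts, and they always agree for the value 0, which is consistent with the earlier observation that all-zero test data cannot expose this kind of bug.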

This revision is now accepted and ready to land.Oct 6 2020, 8:44 PM

Re-applied patch.

Harbormaster completed remote builds in B74219: Diff 296590.Oct 6 2020, 9:08 PM

scott.linder accepted this revision.Oct 7 2020, 8:08 AM

This revision was landed with ongoing or failed builds.Oct 7 2020, 8:11 AM

Closed by commit rG528057c19755: [AMDGPU] Support disassembly for AMDGPU kernel descriptors (authored by rochauha). · Explain Why

This revision was automatically updated to reflect the committed changes.

rochauha added a commit: rG528057c19755: [AMDGPU] Support disassembly for AMDGPU kernel descriptors.

rochauha mentioned this in rGa85e43e99676: Remove D80713.diff added in 528057c19755ad842052fba3a42dcbf7deafc6de.Oct 7 2020, 12:32 PM

Revision Contents

Path	Size
D80713.diff	848 lines
llvm/
  include/llvm/Support/
    AMDHSAKernelDescriptor.h	70 lines
  lib/Target/AMDGPU/Disassembler/
    AMDGPUDisassembler.h	30 lines
    AMDGPUDisassembler.cpp	345 lines
  test/CodeGen/AMDGPU/
    nop-data.ll	4 lines
  test/tools/llvm-objdump/ELF/AMDGPU/
    (six files, names not shown)	37, 49, 36, 58, 53 and 41 lines
  tools/llvm-objdump/
    llvm-objdump.cpp	17 lines

Diff 296683

D80713.diff

This file was added.

				diff --git a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
				--- a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
				+++ b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
				@@ -162,39 +162,49 @@
				uint8_t reserved2[6];
				};

				+enum : uint32_t {
				+ GROUP_SEGMENT_FIXED_SIZE_OFFSET = 0,
				+ PRIVATE_SEGMENT_FIXED_SIZE_OFFSET = 4,
				+ RESERVED0_OFFSET = 8,
				+ KERNEL_CODE_ENTRY_BYTE_OFFSET_OFFSET = 16,
				+ RESERVED1_OFFSET = 24,
				+ COMPUTE_PGM_RSRC3_OFFSET = 44,
				+ COMPUTE_PGM_RSRC1_OFFSET = 48,
				+ COMPUTE_PGM_RSRC2_OFFSET = 52,
				+ KERNEL_CODE_PROPERTIES_OFFSET = 56,
				+ RESERVED2_OFFSET = 58,
				+};
				+
				static_assert(
				sizeof(kernel_descriptor_t) == 64,
				"invalid size for kernel_descriptor_t");
				-static_assert(
				- offsetof(kernel_descriptor_t, group_segment_fixed_size) == 0,
				- "invalid offset for group_segment_fixed_size");
				-static_assert(
				- offsetof(kernel_descriptor_t, private_segment_fixed_size) == 4,
				- "invalid offset for private_segment_fixed_size");
				-static_assert(
				- offsetof(kernel_descriptor_t, reserved0) == 8,
				- "invalid offset for reserved0");
				-static_assert(
				- offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset) == 16,
				- "invalid offset for kernel_code_entry_byte_offset");
				-static_assert(
				- offsetof(kernel_descriptor_t, reserved1) == 24,
				- "invalid offset for reserved1");
				-static_assert(
				- offsetof(kernel_descriptor_t, compute_pgm_rsrc3) == 44,
				- "invalid offset for compute_pgm_rsrc3");
				-static_assert(
				- offsetof(kernel_descriptor_t, compute_pgm_rsrc1) == 48,
				- "invalid offset for compute_pgm_rsrc1");
				-static_assert(
				- offsetof(kernel_descriptor_t, compute_pgm_rsrc2) == 52,
				- "invalid offset for compute_pgm_rsrc2");
				-static_assert(
				- offsetof(kernel_descriptor_t, kernel_code_properties) == 56,
				- "invalid offset for kernel_code_properties");
				-static_assert(
				- offsetof(kernel_descriptor_t, reserved2) == 58,
				- "invalid offset for reserved2");
				+static_assert(offsetof(kernel_descriptor_t, group_segment_fixed_size) ==
				+ GROUP_SEGMENT_FIXED_SIZE_OFFSET,
				+ "invalid offset for group_segment_fixed_size");
				+static_assert(offsetof(kernel_descriptor_t, private_segment_fixed_size) ==
				+ PRIVATE_SEGMENT_FIXED_SIZE_OFFSET,
				+ "invalid offset for private_segment_fixed_size");
				+static_assert(offsetof(kernel_descriptor_t, reserved0) == RESERVED0_OFFSET,
				+ "invalid offset for reserved0");
				+static_assert(offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset) ==
				+ KERNEL_CODE_ENTRY_BYTE_OFFSET_OFFSET,
				+ "invalid offset for kernel_code_entry_byte_offset");
				+static_assert(offsetof(kernel_descriptor_t, reserved1) == RESERVED1_OFFSET,
				+ "invalid offset for reserved1");
				+static_assert(offsetof(kernel_descriptor_t, compute_pgm_rsrc3) ==
				+ COMPUTE_PGM_RSRC3_OFFSET,
				+ "invalid offset for compute_pgm_rsrc3");
				+static_assert(offsetof(kernel_descriptor_t, compute_pgm_rsrc1) ==
				+ COMPUTE_PGM_RSRC1_OFFSET,
				+ "invalid offset for compute_pgm_rsrc1");
				+static_assert(offsetof(kernel_descriptor_t, compute_pgm_rsrc2) ==
				+ COMPUTE_PGM_RSRC2_OFFSET,
				+ "invalid offset for compute_pgm_rsrc2");
				+static_assert(offsetof(kernel_descriptor_t, kernel_code_properties) ==
				+ KERNEL_CODE_PROPERTIES_OFFSET,
				+ "invalid offset for kernel_code_properties");
				+static_assert(offsetof(kernel_descriptor_t, reserved2) == RESERVED2_OFFSET,
				+ "invalid offset for reserved2");

				} // end namespace amdhsa
				} // end namespace llvm
				diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
				--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
				+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
				@@ -17,10 +17,11 @@

				#include "llvm/ADT/ArrayRef.h"
				#include "llvm/MC/MCContext.h"
				-#include "llvm/MC/MCInstrInfo.h"
				#include "llvm/MC/MCDisassembler/MCDisassembler.h"
				#include "llvm/MC/MCDisassembler/MCRelocationInfo.h"
				#include "llvm/MC/MCDisassembler/MCSymbolizer.h"
				+#include "llvm/MC/MCInstrInfo.h"
				+#include "llvm/Support/DataExtractor.h"

				#include <algorithm>
				#include <cstdint>
				@@ -66,6 +67,33 @@
				DecodeStatus tryDecodeInst(const uint8_t* Table, MCInst &MI, uint64_t Inst,
				uint64_t Address) const;

				+ Optional<DecodeStatus> onSymbolStart(SymbolInfoTy &Symbol, uint64_t &Size,
				+ ArrayRef<uint8_t> Bytes,
				+ uint64_t Address,
				+ raw_ostream &CStream) const override;
				+
				+ DecodeStatus decodeKernelDescriptor(StringRef KdName, ArrayRef<uint8_t> Bytes,
				+ uint64_t KdAddress) const;
				+
				+ DecodeStatus
				+ decodeKernelDescriptorDirective(DataExtractor::Cursor &Cursor,
				+ ArrayRef<uint8_t> Bytes,
				+ raw_string_ostream &KdStream) const;
				+
				+ /// Decode as directives that handle COMPUTE_PGM_RSRC1.
				+ /// \param FourByteBuffer - Bytes holding contents of COMPUTE_PGM_RSRC1.
				+ /// \param KdStream - Stream to write the disassembled directives to.
				+ // NOLINTNEXTLINE(readability-identifier-naming)
				+ DecodeStatus decodeCOMPUTE_PGM_RSRC1(uint32_t FourByteBuffer,
				+ raw_string_ostream &KdStream) const;
				+
				+ /// Decode as directives that handle COMPUTE_PGM_RSRC2.
				+ /// \param FourByteBuffer - Bytes holding contents of COMPUTE_PGM_RSRC2.
				+ /// \param KdStream - Stream to write the disassembled directives to.
				+ // NOLINTNEXTLINE(readability-identifier-naming)
				+ DecodeStatus decodeCOMPUTE_PGM_RSRC2(uint32_t FourByteBuffer,
				+ raw_string_ostream &KdStream) const;
				+
				DecodeStatus convertSDWAInst(MCInst &MI) const;
				DecodeStatus convertDPP8Inst(MCInst &MI) const;
				DecodeStatus convertMIMGInst(MCInst &MI) const;
				diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
				--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
				+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
				@@ -34,6 +34,7 @@
				#include "llvm/MC/MCFixedLenDisassembler.h"
				#include "llvm/MC/MCInst.h"
				#include "llvm/MC/MCSubtargetInfo.h"
				+#include "llvm/Support/AMDHSAKernelDescriptor.h"
				#include "llvm/Support/Endian.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/MathExtras.h"
				@@ -1215,6 +1216,350 @@
				return STI.getFeatureBits()[AMDGPU::FeatureGFX10];
				}

				+//===----------------------------------------------------------------------===//
				+// AMDGPU specific symbol handling
				+//===----------------------------------------------------------------------===//
				+#define PRINT_DIRECTIVE(DIRECTIVE, MASK) \
				+ do { \
				+ KdStream << Indent << DIRECTIVE " " \
				+ << ((FourByteBuffer & MASK) >> (MASK##_SHIFT)) << '\n'; \
				+ } while (0)
				+
				+// NOLINTNEXTLINE(readability-identifier-naming)
				+MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC1(
				+ uint32_t FourByteBuffer, raw_string_ostream &KdStream) const {
				+ using namespace amdhsa;
				+ StringRef Indent = "\t";
				+
				+ // We cannot accurately backward compute #VGPRs used from
				+ // GRANULATED_WORKITEM_VGPR_COUNT. But we are concerned with getting the same
				+ // value of GRANULATED_WORKITEM_VGPR_COUNT in the reassembled binary. So we
				+ // simply calculate the inverse of what the assembler does.
				+
				+ uint32_t GranulatedWorkitemVGPRCount =
				+ (FourByteBuffer & COMPUTE_PGM_RSRC1_GRANULATED_WORKITEM_VGPR_COUNT) >>
				+ COMPUTE_PGM_RSRC1_GRANULATED_WORKITEM_VGPR_COUNT_SHIFT;
				+
				+ uint32_t NextFreeVGPR = (GranulatedWorkitemVGPRCount + 1) *
				+ AMDGPU::IsaInfo::getVGPREncodingGranule(&STI);
				+
				+ KdStream << Indent << ".amdhsa_next_free_vgpr " << NextFreeVGPR << '\n';
				+
				+ // We cannot backward compute values used to calculate
				+ // GRANULATED_WAVEFRONT_SGPR_COUNT. Hence the original values for following
				+ // directives can't be computed:
				+ // .amdhsa_reserve_vcc
				+ // .amdhsa_reserve_flat_scratch
				+ // .amdhsa_reserve_xnack_mask
				+ // They take their respective default values if not specified in the assembly.
				+ //
				+ // GRANULATED_WAVEFRONT_SGPR_COUNT
				+ // = f(NEXT_FREE_SGPR + VCC + FLAT_SCRATCH + XNACK_MASK)
				+ //
				+ // We compute the inverse as though all directives apart from NEXT_FREE_SGPR
				+ // are set to 0. So while disassembling we consider that:
				+ //
				+ // GRANULATED_WAVEFRONT_SGPR_COUNT
				+ // = f(NEXT_FREE_SGPR + 0 + 0 + 0)
				+ //
				+ // The disassembler cannot recover the original values of those 3 directives.
				+
				+ uint32_t GranulatedWavefrontSGPRCount =
				+ (FourByteBuffer & COMPUTE_PGM_RSRC1_GRANULATED_WAVEFRONT_SGPR_COUNT) >>
				+ COMPUTE_PGM_RSRC1_GRANULATED_WAVEFRONT_SGPR_COUNT_SHIFT;
				+
				+ if (isGFX10() && GranulatedWavefrontSGPRCount)
				+ return MCDisassembler::Fail;
				+
				+ uint32_t NextFreeSGPR = (GranulatedWavefrontSGPRCount + 1) *
				+ AMDGPU::IsaInfo::getSGPREncodingGranule(&STI);
				+
				+ KdStream << Indent << ".amdhsa_reserve_vcc " << 0 << '\n';
				+ KdStream << Indent << ".amdhsa_reserve_flat_scratch " << 0 << '\n';
				+ KdStream << Indent << ".amdhsa_reserve_xnack_mask " << 0 << '\n';
				+ KdStream << Indent << ".amdhsa_next_free_sgpr " << NextFreeSGPR << "\n";
				+
				+ if (FourByteBuffer & COMPUTE_PGM_RSRC1_PRIORITY)
				+ return MCDisassembler::Fail;
				+
				+ PRINT_DIRECTIVE(".amdhsa_float_round_mode_32",
				+ COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_32);
				+ PRINT_DIRECTIVE(".amdhsa_float_round_mode_16_64",
				+ COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_16_64);
				+ PRINT_DIRECTIVE(".amdhsa_float_denorm_mode_32",
				+ COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_32);
				+ PRINT_DIRECTIVE(".amdhsa_float_denorm_mode_16_64",
				+ COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_16_64);
				+
				+ if (FourByteBuffer & COMPUTE_PGM_RSRC1_PRIV)
				+ return MCDisassembler::Fail;
				+
				+ PRINT_DIRECTIVE(".amdhsa_dx10_clamp", COMPUTE_PGM_RSRC1_ENABLE_DX10_CLAMP);
				+
				+ if (FourByteBuffer & COMPUTE_PGM_RSRC1_DEBUG_MODE)
				+ return MCDisassembler::Fail;
				+
				+ PRINT_DIRECTIVE(".amdhsa_ieee_mode", COMPUTE_PGM_RSRC1_ENABLE_IEEE_MODE);
				+
				+ if (FourByteBuffer & COMPUTE_PGM_RSRC1_BULKY)
				+ return MCDisassembler::Fail;
				+
				+ if (FourByteBuffer & COMPUTE_PGM_RSRC1_CDBG_USER)
				+ return MCDisassembler::Fail;
				+
				+ PRINT_DIRECTIVE(".amdhsa_fp16_overflow", COMPUTE_PGM_RSRC1_FP16_OVFL);
				+
				+ if (FourByteBuffer & COMPUTE_PGM_RSRC1_RESERVED0)
				+ return MCDisassembler::Fail;
				+
				+ if (isGFX10()) {
				+ PRINT_DIRECTIVE(".amdhsa_workgroup_processor_mode",
				+ COMPUTE_PGM_RSRC1_WGP_MODE);
				+ PRINT_DIRECTIVE(".amdhsa_memory_ordered", COMPUTE_PGM_RSRC1_MEM_ORDERED);
				+ PRINT_DIRECTIVE(".amdhsa_forward_progress", COMPUTE_PGM_RSRC1_FWD_PROGRESS);
				+ }
				+ return MCDisassembler::Success;
				+}
				+
				+// NOLINTNEXTLINE(readability-identifier-naming)
				+MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC2(
				+ uint32_t FourByteBuffer, raw_string_ostream &KdStream) const {
				+ using namespace amdhsa;
				+ StringRef Indent = "\t";
				+ PRINT_DIRECTIVE(
				+ ".amdhsa_system_sgpr_private_segment_wavefront_offset",
				+ COMPUTE_PGM_RSRC2_ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET);
				+ PRINT_DIRECTIVE(".amdhsa_system_sgpr_workgroup_id_x",
				+ COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_X);
				+ PRINT_DIRECTIVE(".amdhsa_system_sgpr_workgroup_id_y",
				+ COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_Y);
				+ PRINT_DIRECTIVE(".amdhsa_system_sgpr_workgroup_id_z",
				+ COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_Z);
				+ PRINT_DIRECTIVE(".amdhsa_system_sgpr_workgroup_info",
				+ COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_INFO);
				+ PRINT_DIRECTIVE(".amdhsa_system_vgpr_workitem_id",
				+ COMPUTE_PGM_RSRC2_ENABLE_VGPR_WORKITEM_ID);
				+
				+ if (FourByteBuffer & COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_ADDRESS_WATCH)
				+ return MCDisassembler::Fail;
				+
				+ if (FourByteBuffer & COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_MEMORY)
				+ return MCDisassembler::Fail;
				+
				+ if (FourByteBuffer & COMPUTE_PGM_RSRC2_GRANULATED_LDS_SIZE)
				+ return MCDisassembler::Fail;
				+
				+ PRINT_DIRECTIVE(
				+ ".amdhsa_exception_fp_ieee_invalid_op",
				+ COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION);
				+ PRINT_DIRECTIVE(".amdhsa_exception_fp_denorm_src",
				+ COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_FP_DENORMAL_SOURCE);
				+ PRINT_DIRECTIVE(
				+ ".amdhsa_exception_fp_ieee_div_zero",
				+ COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO);
				+ PRINT_DIRECTIVE(".amdhsa_exception_fp_ieee_overflow",
				+ COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW);
				+ PRINT_DIRECTIVE(".amdhsa_exception_fp_ieee_underflow",
				+ COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW);
				+ PRINT_DIRECTIVE(".amdhsa_exception_fp_ieee_inexact",
				+ COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_INEXACT);
				+ PRINT_DIRECTIVE(".amdhsa_exception_int_div_zero",
				+ COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO);
				+
				+ if (FourByteBuffer & COMPUTE_PGM_RSRC2_RESERVED0)
				+ return MCDisassembler::Fail;
				+
				+ return MCDisassembler::Success;
				+}
				+
				+#undef PRINT_DIRECTIVE
				+
				+MCDisassembler::DecodeStatus
				+AMDGPUDisassembler::decodeKernelDescriptorDirective(
				+ DataExtractor::Cursor &Cursor, ArrayRef<uint8_t> Bytes,
				+ raw_string_ostream &KdStream) const {
				+#define PRINT_DIRECTIVE(DIRECTIVE, MASK) \
				+ do { \
				+ KdStream << Indent << DIRECTIVE " " \
				+ << ((TwoByteBuffer & MASK) >> (MASK##_SHIFT)) << '\n'; \
				+ } while (0)
				+
				+ uint16_t TwoByteBuffer = 0;
				+ uint32_t FourByteBuffer = 0;
				+ uint64_t EightByteBuffer = 0;
				+
				+ StringRef ReservedBytes;
				+ StringRef Indent = "\t";
				+
				+ assert(Bytes.size() == 64);
				+ DataExtractor DE(Bytes, /*IsLittleEndian=*/true, /*AddressSize=*/8);
				+
				+ switch (Cursor.tell()) {
				+ case amdhsa::GROUP_SEGMENT_FIXED_SIZE_OFFSET:
				+ FourByteBuffer = DE.getU32(Cursor);
				+ KdStream << Indent << ".amdhsa_group_segment_fixed_size " << FourByteBuffer
				+ << '\n';
				+ return MCDisassembler::Success;
				+
				+ case amdhsa::PRIVATE_SEGMENT_FIXED_SIZE_OFFSET:
				+ FourByteBuffer = DE.getU32(Cursor);
				+ KdStream << Indent << ".amdhsa_private_segment_fixed_size "
				+ << FourByteBuffer << '\n';
				+ return MCDisassembler::Success;
				+
				+ case amdhsa::RESERVED0_OFFSET:
				+ // 8 reserved bytes, must be 0.
				+ EightByteBuffer = DE.getU64(Cursor);
				+ if (EightByteBuffer) {
				+ return MCDisassembler::Fail;
				+ }
				+ return MCDisassembler::Success;
				+
				+ case amdhsa::KERNEL_CODE_ENTRY_BYTE_OFFSET_OFFSET:
				+ // KERNEL_CODE_ENTRY_BYTE_OFFSET
				+ // So far no directive controls this for Code Object V3, so simply skip for
				+ // disassembly.
				+ DE.skip(Cursor, 8);
				+ return MCDisassembler::Success;
				+
				+ case amdhsa::RESERVED1_OFFSET:
				+ // 20 reserved bytes, must be 0.
				+ ReservedBytes = DE.getBytes(Cursor, 20);
				+ for (int I = 0; I < 20; ++I) {
				+ if (ReservedBytes[I] != 0) {
				+ return MCDisassembler::Fail;
				+ }
				+ }
				+ return MCDisassembler::Success;
				+
				+ case amdhsa::COMPUTE_PGM_RSRC3_OFFSET:
				+ // COMPUTE_PGM_RSRC3
				+ // - Only set for GFX10, GFX6-9 have this to be 0.
				+ // - Currently no directives directly control this.
				+ FourByteBuffer = DE.getU32(Cursor);
				+ if (!isGFX10() && FourByteBuffer) {
				+ return MCDisassembler::Fail;
				+ }
				+ return MCDisassembler::Success;
				+
				+ case amdhsa::COMPUTE_PGM_RSRC1_OFFSET:
				+ FourByteBuffer = DE.getU32(Cursor);
				+ if (decodeCOMPUTE_PGM_RSRC1(FourByteBuffer, KdStream) ==
				+ MCDisassembler::Fail) {
				+ return MCDisassembler::Fail;
				+ }
				+ return MCDisassembler::Success;
				+
				+ case amdhsa::COMPUTE_PGM_RSRC2_OFFSET:
				+ FourByteBuffer = DE.getU32(Cursor);
				+ if (decodeCOMPUTE_PGM_RSRC2(FourByteBuffer, KdStream) ==
				+ MCDisassembler::Fail) {
				+ return MCDisassembler::Fail;
				+ }
				+ return MCDisassembler::Success;
				+
				+ case amdhsa::KERNEL_CODE_PROPERTIES_OFFSET:
				+ using namespace amdhsa;
				+ TwoByteBuffer = DE.getU16(Cursor);
				+
				+ PRINT_DIRECTIVE(".amdhsa_user_sgpr_private_segment_buffer",
				+ KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER);
				+ PRINT_DIRECTIVE(".amdhsa_user_sgpr_dispatch_ptr",
				+ KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR);
				+ PRINT_DIRECTIVE(".amdhsa_user_sgpr_queue_ptr",
				+ KERNEL_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR);
				+ PRINT_DIRECTIVE(".amdhsa_user_sgpr_kernarg_segment_ptr",
				+ KERNEL_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR);
				+ PRINT_DIRECTIVE(".amdhsa_user_sgpr_dispatch_id",
				+ KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID);
				+ PRINT_DIRECTIVE(".amdhsa_user_sgpr_flat_scratch_init",
				+ KERNEL_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT);
				+ PRINT_DIRECTIVE(".amdhsa_user_sgpr_private_segment_size",
				+ KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE);
				+
				+ if (TwoByteBuffer & KERNEL_CODE_PROPERTY_RESERVED0)
				+ return MCDisassembler::Fail;
				+
				+ // Reserved for GFX9
				+ if (isGFX9() &&
				+ (TwoByteBuffer & KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32)) {
				+ return MCDisassembler::Fail;
				+ } else if (isGFX10()) {
				+ PRINT_DIRECTIVE(".amdhsa_wavefront_size32",
				+ KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32);
				+ }
				+
				+ if (TwoByteBuffer & KERNEL_CODE_PROPERTY_RESERVED1)
				+ return MCDisassembler::Fail;
				+
				+ return MCDisassembler::Success;
				+
				+ case amdhsa::RESERVED2_OFFSET:
				+ // 6 bytes from here are reserved, must be 0.
				+ ReservedBytes = DE.getBytes(Cursor, 6);
				+ for (int I = 0; I < 6; ++I) {
				+ if (ReservedBytes[I] != 0)
				+ return MCDisassembler::Fail;
				+ }
				+ return MCDisassembler::Success;
				+
				+ default:
				+ llvm_unreachable("Unhandled index. Case statements cover everything.");
				+ return MCDisassembler::Fail;
				+ }
				+#undef PRINT_DIRECTIVE
				+}
				+
				+MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeKernelDescriptor(
				+ StringRef KdName, ArrayRef<uint8_t> Bytes, uint64_t KdAddress) const {
				+ // CP microcode requires the kernel descriptor to be 64 aligned.
				+ if (Bytes.size() != 64 || KdAddress % 64 != 0)
				+ return MCDisassembler::Fail;
				+
				+ std::string Kd;
				+ raw_string_ostream KdStream(Kd);
				+ KdStream << ".amdhsa_kernel " << KdName << '\n';
				+
				+ DataExtractor::Cursor C(0);
				+ while (C && C.tell() < Bytes.size()) {
				+ MCDisassembler::DecodeStatus Status =
				+ decodeKernelDescriptorDirective(C, Bytes, KdStream);
				+
				+ cantFail(C.takeError());
				+
				+ if (Status == MCDisassembler::Fail)
				+ return MCDisassembler::Fail;
				+ }
				+ KdStream << ".end_amdhsa_kernel\n";
				+ outs() << KdStream.str();
				+ return MCDisassembler::Success;
				+}
				+
				+Optional<MCDisassembler::DecodeStatus>
				+AMDGPUDisassembler::onSymbolStart(SymbolInfoTy &Symbol, uint64_t &Size,
				+ ArrayRef<uint8_t> Bytes, uint64_t Address,
				+ raw_ostream &CStream) const {
				+ // Right now only kernel descriptor needs to be handled.
				+ // We ignore all other symbols for target specific handling.
				+ // TODO:
				+ // Fix the spurious symbol issue for AMDGPU kernels. Exists for both Code
				+ // Object V2 and V3 when symbols are marked protected.
				+
				+ // amd_kernel_code_t for Code Object V2.
				+ if (Symbol.Type == ELF::STT_AMDGPU_HSA_KERNEL) {
				+ Size = 256;
				+ return MCDisassembler::Fail;
				+ }
				+
				+ // Code Object V3 kernel descriptors.
				+ StringRef Name = Symbol.Name;
				+ if (Symbol.Type == ELF::STT_OBJECT && Name.endswith(StringRef(".kd"))) {
				+ Size = 64; // Size = 64 regardless of success or failure.
				+ return decodeKernelDescriptor(Name.drop_back(3), Bytes, Address);
				+ }
				+ return None;
				+}
				+
				//===----------------------------------------------------------------------===//
				// AMDGPUSymbolizer
				//===----------------------------------------------------------------------===//
				diff --git a/llvm/test/CodeGen/AMDGPU/nop-data.ll b/llvm/test/CodeGen/AMDGPU/nop-data.ll
				--- a/llvm/test/CodeGen/AMDGPU/nop-data.ll
				+++ b/llvm/test/CodeGen/AMDGPU/nop-data.ll
				@@ -1,7 +1,7 @@
				; RUN: llc -mtriple=amdgcn--amdhsa -mattr=-code-object-v3 -mcpu=fiji -filetype=obj < %s | llvm-objdump -d - --mcpu=fiji | FileCheck %s

				; CHECK: <kernel0>:
				-; CHECK-NEXT: s_endpgm
				+; CHECK: s_endpgm
				define amdgpu_kernel void @kernel0() align 256 {
				entry:
				ret void
				@@ -80,7 +80,7 @@

				; CHECK-EMPTY:
				; CHECK-NEXT: <kernel1>:
				-; CHECK-NEXT: s_endpgm
				+; CHECK: s_endpgm
				define amdgpu_kernel void @kernel1(i32 addrspace(1)* addrspace(4)* %ptr.out) align 256 {
				entry:
				ret void
				diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-failure.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-failure.s
				new file mode 100644
				--- /dev/null
				+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-failure.s
				@@ -0,0 +1,37 @@
				+;; Failure test. We create a malformed kernel descriptor (KD) by manually
				+;; setting the bytes, because one can't create a malformed KD using the
				+;; assembler directives.
				+
				+; RUN: llvm-mc %s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t.o
				+
				+; RUN: printf ".type my_kernel.kd, @object \nmy_kernel.kd:\n.size my_kernel.kd, 64\n" > %t1.sym_info
				+; RUN: llvm-objdump --disassemble-symbols=my_kernel.kd %t.o \
				+; RUN: | tail -n +9 > %t1.sym_content
				+; RUN: cat %t1.sym_info %t1.sym_content > %t1.s
				+
				+; RUN: llvm-mc %t1.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t-re-assemble.o
				+; RUN: diff %t.o %t-re-assemble.o
				+
				+;; Test failure by setting one of the reserved bytes to non-zero value.
				+
				+.type my_kernel.kd, @object
				+.size my_kernel.kd, 64
				+my_kernel.kd:
				+ .long 0x00000000 ;; group_segment_fixed_size
				+ .long 0x00000000 ;; private_segment_fixed_size
				+ .quad 0x00FF000000000000 ;; reserved bytes.
				+ .quad 0x0000000000000000 ;; kernel_code_entry_byte_offset, any value works.
				+
				+ ;; 20 reserved bytes.
				+ .quad 0x0000000000000000
				+ .quad 0x0000000000000000
				+ .long 0x00000000
				+
				+ .long 0x00000000 ;; compute_PGM_RSRC3
				+ .long 0x00000000 ;; compute_PGM_RSRC1
				+ .long 0x00000000 ;; compute_PGM_RSRC2
				+ .short 0x0000 ;; additional fields.
				+
				+ ;; 6 reserved bytes.
				+ .long 0x0000000
				+ .short 0x0000
				diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s
				new file mode 100644
				--- /dev/null
				+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s
				@@ -0,0 +1,49 @@
				+;; Test disassembly for GRANULATED_WAVEFRONT_SGPR_COUNT in the kernel descriptor.
				+
				+; RUN: split-file %s %t.dir
				+
				+; RUN: llvm-mc %t.dir/1.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1
				+; RUN: llvm-objdump --disassemble-symbols=my_kernel_1.kd %t1 | tail -n +8 \
				+; RUN: | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1-re-assemble
				+; RUN: diff %t1 %t1-re-assemble
				+
				+; RUN: llvm-mc %t.dir/2.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2
				+; RUN: llvm-objdump --disassemble-symbols=my_kernel_2.kd %t2 | tail -n +8 \
				+; RUN: | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2-re-assemble
				+; RUN: diff %t2 %t2-re-assemble
				+
				+; RUN: llvm-mc %t.dir/3.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t3
				+; RUN: llvm-objdump --disassemble-symbols=my_kernel_3.kd %t3 | tail -n +8 \
				+; RUN: | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t3-re-assemble
				+; RUN: diff %t3 %t3-re-assemble
				+
				+
				+;--- 1.s
				+;; Only set next_free_sgpr.
				+.amdhsa_kernel my_kernel_1
				+ .amdhsa_next_free_vgpr 0
				+ .amdhsa_next_free_sgpr 42
				+ .amdhsa_reserve_flat_scratch 0
				+ .amdhsa_reserve_xnack_mask 0
				+ .amdhsa_reserve_vcc 0
				+.end_amdhsa_kernel
				+
				+;--- 2.s
				+;; Only set other directives.
				+.amdhsa_kernel my_kernel_2
				+ .amdhsa_next_free_vgpr 0
				+ .amdhsa_next_free_sgpr 0
				+ .amdhsa_reserve_flat_scratch 1
				+ .amdhsa_reserve_xnack_mask 1
				+ .amdhsa_reserve_vcc 1
				+.end_amdhsa_kernel
				+
				+;--- 3.s
				+;; Set all affecting directives.
				+.amdhsa_kernel my_kernel_3
				+ .amdhsa_next_free_vgpr 0
				+ .amdhsa_next_free_sgpr 35
				+ .amdhsa_reserve_flat_scratch 1
				+ .amdhsa_reserve_xnack_mask 1
				+ .amdhsa_reserve_vcc 1
				+.end_amdhsa_kernel
				diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s
				new file mode 100644
				--- /dev/null
				+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s
				@@ -0,0 +1,36 @@
				+;; Test disassembly for GRANULATED_WORKITEM_VGPR_COUNT in the kernel descriptor.
				+
				+; RUN: split-file %s %t.dir
				+
				+; RUN: llvm-mc %t.dir/1.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1
				+; RUN: llvm-objdump --disassemble-symbols=my_kernel_1.kd %t1 | tail -n +8 \
				+; RUN: | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1-re-assemble
				+; RUN: diff %t1 %t1-re-assemble
				+
				+; RUN: llvm-mc %t.dir/2.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2
				+; RUN: llvm-objdump --disassemble-symbols=my_kernel_2.kd %t2 | tail -n +8 \
				+; RUN: | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2-re-assemble
				+; RUN: diff %t2 %t2-re-assemble
				+
				+; RUN: llvm-mc %t.dir/3.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t3
				+; RUN: llvm-objdump --disassemble-symbols=my_kernel_3.kd %t3 | tail -n +8 \
				+; RUN: | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t3-re-assemble
				+; RUN: diff %t3 %t3-re-assemble
				+
				+;--- 1.s
				+.amdhsa_kernel my_kernel_1
				+ .amdhsa_next_free_vgpr 23
				+ .amdhsa_next_free_sgpr 0
				+.end_amdhsa_kernel
				+
				+;--- 2.s
				+.amdhsa_kernel my_kernel_2
				+ .amdhsa_next_free_vgpr 14
				+ .amdhsa_next_free_sgpr 0
				+.end_amdhsa_kernel
				+
				+;--- 3.s
				+.amdhsa_kernel my_kernel_3
				+ .amdhsa_next_free_vgpr 32
				+ .amdhsa_next_free_sgpr 0
				+.end_amdhsa_kernel
				diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s
				new file mode 100644
				--- /dev/null
				+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s
				@@ -0,0 +1,58 @@
				+;; Entirely zeroed kernel descriptor (for GFX10).
				+
				+; RUN: llvm-mc %s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx1010 -filetype=obj -o %t
				+; RUN: llvm-objdump -s -j .text %t | FileCheck --check-prefix=OBJDUMP %s
				+
				+;; TODO:
				+;; This file and kd-zeroed-raw.s should produce the same output for the kernel
				+;; descriptor - a block of 64 zeroed bytes. But looks like the assembler sets
				+;; the FWD_PROGRESS bit in COMPUTE_PGM_RSRC1 to 1 even when the directive
				+;; mentions 0 (see line 36).
				+
				+;; Check the raw bytes right now.
				+
				+; OBJDUMP: 0000 00000000 00000000 00000000 00000000
				+; OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
				+; OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
				+; OBJDUMP-NEXT: 0030 01000000 00000000 00000000 00000000
				+
				+.amdhsa_kernel my_kernel
				+ .amdhsa_group_segment_fixed_size 0
				+ .amdhsa_private_segment_fixed_size 0
				+ .amdhsa_next_free_vgpr 8
				+ .amdhsa_reserve_vcc 0
				+ .amdhsa_reserve_flat_scratch 0
				+ .amdhsa_reserve_xnack_mask 0
				+ .amdhsa_next_free_sgpr 8
				+ .amdhsa_float_round_mode_32 0
				+ .amdhsa_float_round_mode_16_64 0
				+ .amdhsa_float_denorm_mode_32 0
				+ .amdhsa_float_denorm_mode_16_64 0
				+ .amdhsa_dx10_clamp 0
				+ .amdhsa_ieee_mode 0
				+ .amdhsa_fp16_overflow 0
				+ .amdhsa_workgroup_processor_mode 0
				+ .amdhsa_memory_ordered 0
				+ .amdhsa_forward_progress 0
				+ .amdhsa_system_sgpr_private_segment_wavefront_offset 0
				+ .amdhsa_system_sgpr_workgroup_id_x 0
				+ .amdhsa_system_sgpr_workgroup_id_y 0
				+ .amdhsa_system_sgpr_workgroup_id_z 0
				+ .amdhsa_system_sgpr_workgroup_info 0
				+ .amdhsa_system_vgpr_workitem_id 0
				+ .amdhsa_exception_fp_ieee_invalid_op 0
				+ .amdhsa_exception_fp_denorm_src 0
				+ .amdhsa_exception_fp_ieee_div_zero 0
				+ .amdhsa_exception_fp_ieee_overflow 0
				+ .amdhsa_exception_fp_ieee_underflow 0
				+ .amdhsa_exception_fp_ieee_inexact 0
				+ .amdhsa_exception_int_div_zero 0
				+ .amdhsa_user_sgpr_private_segment_buffer 0
				+ .amdhsa_user_sgpr_dispatch_ptr 0
				+ .amdhsa_user_sgpr_queue_ptr 0
				+ .amdhsa_user_sgpr_kernarg_segment_ptr 0
				+ .amdhsa_user_sgpr_dispatch_id 0
				+ .amdhsa_user_sgpr_flat_scratch_init 0
				+ .amdhsa_user_sgpr_private_segment_size 0
				+ .amdhsa_wavefront_size32 0
				+.end_amdhsa_kernel
				diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx9.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx9.s
				new file mode 100644
				--- /dev/null
				+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx9.s
				@@ -0,0 +1,53 @@
				+;; Entirely zeroed kernel descriptor (for GFX9).
				+
				+; RUN: llvm-mc %s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1
				+; RUN: llvm-objdump --disassemble-symbols=my_kernel.kd %t1 \
				+; RUN: | tail -n +8 | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2
				+; RUN: diff %t1 %t2
				+
				+; RUN: llvm-objdump -s -j .text %t1 | FileCheck --check-prefix=OBJDUMP %s
				+
				+; OBJDUMP: 0000 00000000 00000000 00000000 00000000
				+; OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
				+; OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
				+; OBJDUMP-NEXT: 0030 00000000 00000000 00000000 00000000
				+
				+;; This file and kd-zeroed-raw.s produce the same output for the kernel
				+;; descriptor - a block of 64 zeroed bytes.
				+
				+.amdhsa_kernel my_kernel
				+ .amdhsa_group_segment_fixed_size 0
				+ .amdhsa_private_segment_fixed_size 0
				+ .amdhsa_next_free_vgpr 0
				+ .amdhsa_reserve_vcc 0
				+ .amdhsa_reserve_flat_scratch 0
				+ .amdhsa_reserve_xnack_mask 0
				+ .amdhsa_next_free_sgpr 0
				+ .amdhsa_float_round_mode_32 0
				+ .amdhsa_float_round_mode_16_64 0
				+ .amdhsa_float_denorm_mode_32 0
				+ .amdhsa_float_denorm_mode_16_64 0
				+ .amdhsa_dx10_clamp 0
				+ .amdhsa_ieee_mode 0
				+ .amdhsa_fp16_overflow 0
				+ .amdhsa_system_sgpr_private_segment_wavefront_offset 0
				+ .amdhsa_system_sgpr_workgroup_id_x 0
				+ .amdhsa_system_sgpr_workgroup_id_y 0
				+ .amdhsa_system_sgpr_workgroup_id_z 0
				+ .amdhsa_system_sgpr_workgroup_info 0
				+ .amdhsa_system_vgpr_workitem_id 0
				+ .amdhsa_exception_fp_ieee_invalid_op 0
				+ .amdhsa_exception_fp_denorm_src 0
				+ .amdhsa_exception_fp_ieee_div_zero 0
				+ .amdhsa_exception_fp_ieee_overflow 0
				+ .amdhsa_exception_fp_ieee_underflow 0
				+ .amdhsa_exception_fp_ieee_inexact 0
				+ .amdhsa_exception_int_div_zero 0
				+ .amdhsa_user_sgpr_private_segment_buffer 0
				+ .amdhsa_user_sgpr_dispatch_ptr 0
				+ .amdhsa_user_sgpr_queue_ptr 0
				+ .amdhsa_user_sgpr_kernarg_segment_ptr 0
				+ .amdhsa_user_sgpr_dispatch_id 0
				+ .amdhsa_user_sgpr_flat_scratch_init 0
				+ .amdhsa_user_sgpr_private_segment_size 0
				+.end_amdhsa_kernel
				diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-raw.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-raw.s
				new file mode 100644
				--- /dev/null
				+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-raw.s
				@@ -0,0 +1,41 @@
				+; RUN: llvm-mc %s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1
				+; RUN: llvm-objdump --disassemble-symbols=my_kernel.kd %t1 \
				+; RUN: | tail -n +8 | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2
				+; RUN: llvm-objdump -s -j .text %t2 | FileCheck --check-prefix=OBJDUMP %s
				+
				+;; Not running lit-test over gfx10 (see kd-zeroed-gfx10.s for details).
				+;; kd-zeroed-raw.s and kd-zeroed-*.s should produce the same output for the
				+;; kernel descriptor - a block of 64 zeroed bytes.
				+
				+;; The disassembly will produce the contents of kd-zeroed-*.s which on being
				+;; assembled contains additional relocation info. A diff over the entire object
				+;; will fail in this case. So we check by looking the bytes in .text.
				+
				+; OBJDUMP: 0000 00000000 00000000 00000000 00000000
				+; OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
				+; OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
				+; OBJDUMP-NEXT: 0030 00000000 00000000 00000000 00000000
				+
				+;; The entire object is zeroed out.
				+
				+.type my_kernel.kd, @object
				+.size my_kernel.kd, 64
				+my_kernel.kd:
				+ .long 0x00000000 ;; group_segment_fixed_size
				+ .long 0x00000000 ;; private_segment_fixed_size
				+ .quad 0x0000000000000000 ;; reserved bytes.
				+ .quad 0x0000000000000000 ;; kernel_code_entry_byte_offset, any value works.
				+
				+ ;; 20 reserved bytes.
				+ .quad 0x0000000000000000
				+ .quad 0x0000000000000000
				+ .long 0x00000000
				+
				+ .long 0x00000000 ;; compute_PGM_RSRC3
				+ .long 0x00000000 ;; compute_PGM_RSRC1
				+ .long 0x00000000 ;; compute_PGM_RSRC2
				+ .short 0x0000 ;; additional fields.
				+
				+ ;; 6 reserved bytes.
				+ .long 0x0000000
				+ .short 0x0000
				diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
				--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
				+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
				@@ -1854,23 +1854,6 @@
				outs() << SectionName << ":\n";
				}

				- if (Obj->isELF() && Obj->getArch() == Triple::amdgcn) {
				- if (Symbols[SI].Type == ELF::STT_AMDGPU_HSA_KERNEL) {
				- // skip amd_kernel_code_t at the begining of kernel symbol (256 bytes)
				- Start += 256;
				- }
				- if (SI == SE - 1 ||
				- Symbols[SI + 1].Type == ELF::STT_AMDGPU_HSA_KERNEL) {
				- // cut trailing zeroes at the end of kernel
				- // cut up to 256 bytes
				- const uint64_t EndAlign = 256;
				- const auto Limit = End - (std::min)(EndAlign, End - Start);
				- while (End > Limit &&
				- *reinterpret_cast<const support::ulittle32_t*>(&Bytes[End - 4]) == 0)
				- End -= 4;
				- }
				- }
				-
				outs() << '\n';
				if (!NoLeadingAddr)
				outs() << format(Is64Bits ? "%016" PRIx64 " " : "%08" PRIx64 " ",

llvm/include/llvm/Support/AMDHSAKernelDescriptor.h

		struct kernel_descriptor_t {
uint8_t reserved1[20];		uint8_t reserved1[20];
uint32_t compute_pgm_rsrc3; // GFX10+		uint32_t compute_pgm_rsrc3; // GFX10+
uint32_t compute_pgm_rsrc1;		uint32_t compute_pgm_rsrc1;
uint32_t compute_pgm_rsrc2;		uint32_t compute_pgm_rsrc2;
uint16_t kernel_code_properties;		uint16_t kernel_code_properties;
uint8_t reserved2[6];		uint8_t reserved2[6];
};		};

		enum : uint32_t {
		GROUP_SEGMENT_FIXED_SIZE_OFFSET = 0,
		PRIVATE_SEGMENT_FIXED_SIZE_OFFSET = 4,
		RESERVED0_OFFSET = 8,
		KERNEL_CODE_ENTRY_BYTE_OFFSET_OFFSET = 16,
		RESERVED1_OFFSET = 24,
		COMPUTE_PGM_RSRC3_OFFSET = 44,
		COMPUTE_PGM_RSRC1_OFFSET = 48,
		COMPUTE_PGM_RSRC2_OFFSET = 52,
		KERNEL_CODE_PROPERTIES_OFFSET = 56,
		RESERVED2_OFFSET = 58,
		};

static_assert(		static_assert(
sizeof(kernel_descriptor_t) == 64,		sizeof(kernel_descriptor_t) == 64,
"invalid size for kernel_descriptor_t");		"invalid size for kernel_descriptor_t");
static_assert(		static_assert(offsetof(kernel_descriptor_t, group_segment_fixed_size) ==
offsetof(kernel_descriptor_t, group_segment_fixed_size) == 0,		GROUP_SEGMENT_FIXED_SIZE_OFFSET,
"invalid offset for group_segment_fixed_size");		"invalid offset for group_segment_fixed_size");
static_assert(		static_assert(offsetof(kernel_descriptor_t, private_segment_fixed_size) ==
offsetof(kernel_descriptor_t, private_segment_fixed_size) == 4,		PRIVATE_SEGMENT_FIXED_SIZE_OFFSET,
"invalid offset for private_segment_fixed_size");		"invalid offset for private_segment_fixed_size");
static_assert(		static_assert(offsetof(kernel_descriptor_t, reserved0) == RESERVED0_OFFSET,
offsetof(kernel_descriptor_t, reserved0) == 8,
"invalid offset for reserved0");		"invalid offset for reserved0");
static_assert(		static_assert(offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset) ==
offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset) == 16,		KERNEL_CODE_ENTRY_BYTE_OFFSET_OFFSET,
"invalid offset for kernel_code_entry_byte_offset");		"invalid offset for kernel_code_entry_byte_offset");
static_assert(		static_assert(offsetof(kernel_descriptor_t, reserved1) == RESERVED1_OFFSET,
offsetof(kernel_descriptor_t, reserved1) == 24,
"invalid offset for reserved1");		"invalid offset for reserved1");
static_assert(		static_assert(offsetof(kernel_descriptor_t, compute_pgm_rsrc3) ==
offsetof(kernel_descriptor_t, compute_pgm_rsrc3) == 44,		COMPUTE_PGM_RSRC3_OFFSET,
"invalid offset for compute_pgm_rsrc3");		"invalid offset for compute_pgm_rsrc3");
static_assert(		static_assert(offsetof(kernel_descriptor_t, compute_pgm_rsrc1) ==
offsetof(kernel_descriptor_t, compute_pgm_rsrc1) == 48,		COMPUTE_PGM_RSRC1_OFFSET,
"invalid offset for compute_pgm_rsrc1");		"invalid offset for compute_pgm_rsrc1");
static_assert(		static_assert(offsetof(kernel_descriptor_t, compute_pgm_rsrc2) ==
offsetof(kernel_descriptor_t, compute_pgm_rsrc2) == 52,		COMPUTE_PGM_RSRC2_OFFSET,
"invalid offset for compute_pgm_rsrc2");		"invalid offset for compute_pgm_rsrc2");
static_assert(		static_assert(offsetof(kernel_descriptor_t, kernel_code_properties) ==
offsetof(kernel_descriptor_t, kernel_code_properties) == 56,		KERNEL_CODE_PROPERTIES_OFFSET,
"invalid offset for kernel_code_properties");		"invalid offset for kernel_code_properties");
static_assert(		static_assert(offsetof(kernel_descriptor_t, reserved2) == RESERVED2_OFFSET,
offsetof(kernel_descriptor_t, reserved2) == 58,
"invalid offset for reserved2");		"invalid offset for reserved2");

} // end namespace amdhsa		} // end namespace amdhsa
} // end namespace llvm		} // end namespace llvm

#endif // LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H		#endif // LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H
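
The named offsets added to this header make it possible to walk the 64-byte descriptor field by field. The following is a minimal standalone sketch of that idea, not the patch's code: `KdBytes`, `readLE32`, `allZero` and `reservedBytesAreZero` are hypothetical names introduced for illustration, the offsets are copied from the enum above, and the rule that reserved bytes must be zero is inferred from what kd-failure.s exercises.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offsets copied from the enum added to AMDHSAKernelDescriptor.h.
enum : uint32_t {
  RESERVED0_OFFSET = 8,
  KERNEL_CODE_ENTRY_BYTE_OFFSET_OFFSET = 16,
  RESERVED1_OFFSET = 24,
  COMPUTE_PGM_RSRC3_OFFSET = 44,
  COMPUTE_PGM_RSRC1_OFFSET = 48,
  RESERVED2_OFFSET = 58,
  KD_SIZE = 64,
};

// Hypothetical alias: the raw bytes of one kernel descriptor.
using KdBytes = std::array<uint8_t, KD_SIZE>;

// Hypothetical helper: read a little-endian 32-bit value at Offset.
uint32_t readLE32(const KdBytes &Kd, size_t Offset) {
  uint32_t V = 0;
  for (size_t I = 0; I < 4; ++I)
    V |= static_cast<uint32_t>(Kd[Offset + I]) << (8 * I);
  return V;
}

// Hypothetical helper: true if every byte in [Begin, End) is zero.
bool allZero(const KdBytes &Kd, size_t Begin, size_t End) {
  for (size_t I = Begin; I < End; ++I)
    if (Kd[I] != 0)
      return false;
  return true;
}

// Hypothetical check: all three reserved ranges must be zero.
bool reservedBytesAreZero(const KdBytes &Kd) {
  return allZero(Kd, RESERVED0_OFFSET, KERNEL_CODE_ENTRY_BYTE_OFFSET_OFFSET) &&
         allZero(Kd, RESERVED1_OFFSET, COMPUTE_PGM_RSRC3_OFFSET) &&
         allZero(Kd, RESERVED2_OFFSET, KD_SIZE);
}
```

With these helpers, a zeroed descriptor passes the check, while the `.quad 0x00FF000000000000` in kd-failure.s (a non-zero byte at offset 14, inside `reserved0`) would be rejected.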

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h

//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_AMDGPU_DISASSEMBLER_AMDGPUDISASSEMBLER_H		#ifndef LLVM_LIB_TARGET_AMDGPU_DISASSEMBLER_AMDGPUDISASSEMBLER_H
#define LLVM_LIB_TARGET_AMDGPU_DISASSEMBLER_AMDGPUDISASSEMBLER_H		#define LLVM_LIB_TARGET_AMDGPU_DISASSEMBLER_AMDGPUDISASSEMBLER_H

#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCDisassembler/MCDisassembler.h"		#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/MC/MCDisassembler/MCRelocationInfo.h"		#include "llvm/MC/MCDisassembler/MCRelocationInfo.h"
#include "llvm/MC/MCDisassembler/MCSymbolizer.h"		#include "llvm/MC/MCDisassembler/MCSymbolizer.h"
		#include "llvm/MC/MCInstrInfo.h"
		#include "llvm/Support/DataExtractor.h"

#include <algorithm>		#include <algorithm>
#include <cstdint>		#include <cstdint>
#include <memory>		#include <memory>

namespace llvm {		namespace llvm {

class MCInst;		class MCInst;
		public:
MCOperand createRegOperand(unsigned RegClassID, unsigned Val) const;		MCOperand createRegOperand(unsigned RegClassID, unsigned Val) const;
MCOperand createSRegOperand(unsigned SRegClassID, unsigned Val) const;		MCOperand createSRegOperand(unsigned SRegClassID, unsigned Val) const;

MCOperand errOperand(unsigned V, const Twine& ErrMsg) const;		MCOperand errOperand(unsigned V, const Twine& ErrMsg) const;

DecodeStatus tryDecodeInst(const uint8_t* Table, MCInst &MI, uint64_t Inst,		DecodeStatus tryDecodeInst(const uint8_t* Table, MCInst &MI, uint64_t Inst,
uint64_t Address) const;		uint64_t Address) const;

		Optional<DecodeStatus> onSymbolStart(SymbolInfoTy &Symbol, uint64_t &Size,
		ArrayRef<uint8_t> Bytes,
		uint64_t Address,
		raw_ostream &CStream) const override;

		DecodeStatus decodeKernelDescriptor(StringRef KdName, ArrayRef<uint8_t> Bytes,
		uint64_t KdAddress) const;

		DecodeStatus
		decodeKernelDescriptorDirective(DataExtractor::Cursor &Cursor,
		ArrayRef<uint8_t> Bytes,
		raw_string_ostream &KdStream) const;
		jhenderson [Done]: Use `raw_string_ostream` rather than `std::stringstream`.

		/// Decode as directives that handle COMPUTE_PGM_RSRC1.
		scott.linder [Done]: Could you either rename this to satisfy the linter, or else explicitly suppress the lint with a comment like: `// NOLINTNEXTLINE(readability-identifier-naming)`
		rochauha (author) [Done]: Done.
		/// \param FourByteBuffer - Bytes holding contents of COMPUTE_PGM_RSRC1.
		/// \param KdStream - Stream to write the disassembled directives to.
		// NOLINTNEXTLINE(readability-identifier-naming)
		DecodeStatus decodeCOMPUTE_PGM_RSRC1(uint32_t FourByteBuffer,
		raw_string_ostream &KdStream) const;

		/// Decode as directives that handle COMPUTE_PGM_RSRC2.
		/// \param FourByteBuffer - Bytes holding contents of COMPUTE_PGM_RSRC2.
		/// \param KdStream - Stream to write the disassembled directives to.
		// NOLINTNEXTLINE(readability-identifier-naming)
		DecodeStatus decodeCOMPUTE_PGM_RSRC2(uint32_t FourByteBuffer,
		raw_string_ostream &KdStream) const;

DecodeStatus convertSDWAInst(MCInst &MI) const;		DecodeStatus convertSDWAInst(MCInst &MI) const;
DecodeStatus convertDPP8Inst(MCInst &MI) const;		DecodeStatus convertDPP8Inst(MCInst &MI) const;
DecodeStatus convertMIMGInst(MCInst &MI) const;		DecodeStatus convertMIMGInst(MCInst &MI) const;

MCOperand decodeOperand_VGPR_32(unsigned Val) const;		MCOperand decodeOperand_VGPR_32(unsigned Val) const;
MCOperand decodeOperand_VRegOrLds_32(unsigned Val) const;		MCOperand decodeOperand_VRegOrLds_32(unsigned Val) const;

MCOperand decodeOperand_VS_32(unsigned Val) const;		MCOperand decodeOperand_VS_32(unsigned Val) const;

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp

#include "llvm/BinaryFormat/ELF.h"		#include "llvm/BinaryFormat/ELF.h"
#include "llvm/MC/MCAsmInfo.h"		#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDisassembler/MCDisassembler.h"		#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/MC/MCExpr.h"		#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCFixedLenDisassembler.h"		#include "llvm/MC/MCFixedLenDisassembler.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
		#include "llvm/Support/AMDHSAKernelDescriptor.h"
#include "llvm/Support/Endian.h"		#include "llvm/Support/Endian.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstddef>		#include <cstddef>
		bool AMDGPUDisassembler::isGFX9() const {
return STI.getFeatureBits()[AMDGPU::FeatureGFX9];		return STI.getFeatureBits()[AMDGPU::FeatureGFX9];
}		}

bool AMDGPUDisassembler::isGFX10() const {		bool AMDGPUDisassembler::isGFX10() const {
return STI.getFeatureBits()[AMDGPU::FeatureGFX10];		return STI.getFeatureBits()[AMDGPU::FeatureGFX10];
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// AMDGPU specific symbol handling
		//===----------------------------------------------------------------------===//
		#define PRINT_DIRECTIVE(DIRECTIVE, MASK) \
		do { \
		KdStream << Indent << DIRECTIVE " " \
		<< ((FourByteBuffer & MASK) >> (MASK##_SHIFT)) << '\n'; \
		jhenderson [Not Done]: This is not how to handle malformed input. This will result in an assertion in debug builds, which is equivalent to a crash, without any useful context to draw on, because you've not reported the error. More below.
		} while (0)

		scott.linder [Not Done]: This doesn't seem right. This symbol is not accurate without careful attention from an assembly author, so it will surely be incorrect when used by a disassembler.
		We can accurately compute a VGPR count from the `GRANULATED_WORKITEM_VGPR_COUNT`: we just calculate e.g. `(GRANULATED_WORKITEM_VGPR_COUNT + 1) * granularity`, where the `+ 1` is needed to account for the minimum allocation (i.e. a granulated count encoded as "0" actually indicates a 1 granule allocation) and where the scaling by granularity is device-specific. This will calculate the greatest value possible for `.amdhsa_next_free_vgpr` which will produce the same granulated count, but it would be equally valid to calculate the minimum, or any value in between. All we care about is producing disassembly which results in the same descriptor. (Caveat: I would prove this to yourself by referencing https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-compute-pgm-rsrc1-gfx6-gfx10-table as I may have the actual calculation wrong; we just want to calculate the inverse of what the assembler does.)
		Your point that we can't recover the exact value the original author intended to place in the assembly text is true, but that is OK as long as we can always give them a valid input to the assembler which gives them the same output.
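
The round-trip property described in this comment can be sketched in a few lines. This is an illustration under stated assumptions, not the assembler's actual code: `granulatedCount` and `maxNextFreeVGPR` are hypothetical names, the forward formula (round up to whole granules, then subtract one) is assumed from the comment's description, and the granule is passed in as a parameter because it is device-specific.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Forward direction (assumed shape of the assembler's computation):
// round the VGPR count up to a whole number of granules, with a minimum
// of one granule, then encode "number of granules - 1".
uint32_t granulatedCount(uint32_t NextFreeVGPR, uint32_t Granule) {
  uint32_t Used = std::max<uint32_t>(1, NextFreeVGPR);
  uint32_t Granules = (Used + Granule - 1) / Granule;
  return Granules - 1;
}

// Inverse suggested in the review: the largest next_free_vgpr value that
// encodes to the same granulated count.
uint32_t maxNextFreeVGPR(uint32_t Granulated, uint32_t Granule) {
  return (Granulated + 1) * Granule;
}
```

The recovered value is generally not the number the author wrote, but feeding it back through the forward function gives the same encoded field, which is all the reassembled descriptor needs.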
		// NOLINTNEXTLINE(readability-identifier-naming)
		MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC1(
		uint32_t FourByteBuffer, raw_string_ostream &KdStream) const {
		using namespace amdhsa;
		StringRef Indent = "\t";
		jhenderson [Done]: Nit, here and throughout: missing full stop at end of many comments.
		scott.linder [Done]: This can be a char literal, i.e. '\n', same elsewhere.

		// We cannot accurately backward compute #VGPRs used from
		// GRANULATED_WORKITEM_VGPR_COUNT. But we are concerned with getting the same
		// value of GRANULATED_WORKITEM_VGPR_COUNT in the reassembled binary. So we
		// simply calculate the inverse of what the assembler does.

		uint32_t GranulatedWorkitemVGPRCount =
		(FourByteBuffer & COMPUTE_PGM_RSRC1_GRANULATED_WORKITEM_VGPR_COUNT) >>
		COMPUTE_PGM_RSRC1_GRANULATED_WORKITEM_VGPR_COUNT_SHIFT;

		uint32_t NextFreeVGPR = (GranulatedWorkitemVGPRCount + 1) *
		AMDGPU::IsaInfo::getVGPREncodingGranule(&STI);

		scott.linder [Not Done]: For this and the above case we should have tests to prove this out. I.e. assemble sources to a binary, disassemble and reassemble it, and then compare the two binaries. Ideally we would do this for some edge cases around VGPR/SGPR allocation granularity. There may need to be some fixup between disassembly and reassembly to account for the remaining non-reassembleable bits produced by llvm-objdump, but they should be pretty minor for a trivial kernel, and I would expect you could handle them with just `sed` which seems to be available to LIT tests.
		rochauha (author) [Done]: Right now we can't really re-assemble in the lit-test. This needs to be tested 'informally' by:
		1. Manually writing a small test case, and making a copy of it.
		2. Assembling it into a binary: Binary-1.
		3. Disassembling it.
		4. Replacing the original kernel descriptor with the disassembled kernel descriptor in the copy.
		5. Assembling the copy: Binary-2.
		6. Comparing Binary-1 and Binary-2.
		rochauha (author) [Done]: Went this route to check whether re-assembled binaries match or not. Turns out that both binaries match, in size (overall size as well as size of sections) and also in terms of all the disassembled content. But a `diff object1 object2` says that binary files differ.
		scott.linder [Not Done]: I'm not sure I follow what you are describing; my thought was to start with just an asm source file containing only the kernel descriptor directive in the default section, and compare the output of the following (with, e.g. diff, as you mention):
		- Assemble it to an object file with llvm-mc
		- Assemble it to an object file with llvm-mc | disassemble the kernel descriptor symbol | trim any human-readable prologue | assemble it to an object file with llvm-mc
		As a trivial example, diff doesn't find any difference for the following example:
		$ printf '.amdhsa_kernel my_kernel\n.amdhsa_next_free_vgpr 0\n.amdhsa_next_free_sgpr 0\n.end_amdhsa_kernel' >a.s
		$ release/bin/llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj a.s >a.o
		$ diff a.o \
		    <(release/bin/llvm-objdump --triple=amdgcn-amd-amdhsa --mcpu=gfx908 --disassemble-symbols=my_kernel.kd a.o \
		    | tail -n +8 \
		    | release/bin/llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj)
		I don't think you need to use `FileCheck` for these tests at all, you can just rely on ending the RUN pipeline with `diff`, which seems to be supported by lit. You can then just copy-paste the test and edit fields in the input to validate edge cases for things like the SGPR/VGPR allocation directives.
		I think more comprehensive testing, including for other sections and executables/DSOs, would be good eventually but for now we should at least have some tests that explicitly confirm the KD disassembly round-trips.
		KdStream << Indent << ".amdhsa_next_free_vgpr " << NextFreeVGPR << '\n';

		// We cannot backward compute values used to calculate
		// GRANULATED_WAVEFRONT_SGPR_COUNT. Hence the original values for following
		// directives can't be computed:
		scott.linder [Not Done]: I think this can just be replaced with: `NextFreeVGPR = (GranulatedWorkitemVGPRCount + 1) * getVGPRAllocGranule(STI);` Or we could add another function called `getNumVGPRs(const MCSubtargetInfo *STI, unsigned NumVGPRBlocks, Optional<bool> EnableWavefrontSize32)` and put the definition directly next to `getNumVGPRBlocks(const MCSubtargetInfo *STI, unsigned NumVGPRs, Optional<bool> EnableWavefrontSize32)` so any future changes affecting one also affect the other. I would lean towards this, and documenting that they are the inverse of one another.
		rochauha (author) [Done]: I think this route is good for now: `NextFreeVGPR = (GranulatedWorkitemVGPRCount + 1) * getVGPRAllocGranule(STI);` Similarly for SGPRs. I'd like to add the new functions via a separate patch. This patch is already quite big in terms of size.
		// .amdhsa_reserve_vcc
		madhur13490 [Done]: You can return std::nullopt as the return type here is std::optional.
		rochauha (author) [Done]: llvm::Optional is used here instead of std::optional.
		// .amdhsa_reserve_flat_scratch
		// .amdhsa_reserve_xnack_mask
		// They take their respective default values if not specified in the assembly.
		madhur13490 [Done]: Can you please have the functions in reverse order? E.g. "onSymbolStart" calls "decodeKernelDescriptor", so the latter should be above the former. This way it's readable and at a predictable location. Ideally, callees should be above callers.
		rochauha (author) [Done]: I was under the impression that having the current order of functions helps to 'incrementally zoom' into the details while reading the code.
		madhur13490 [Done]: No, you can see other files, e.g. runOn* functions are always at the bottom. It is also important for the compiler, as it would see declarations/definitions before callers.
		scott.linder [Not Done]: Just as with the VGPR count, we cannot use the symbol to define this, and we can compute a (non-unique) input which produces the same output.
		//
		// GRANULATED_WAVEFRONT_SGPR_COUNT
		// = f(NEXT_FREE_SGPR + VCC + FLAT_SCRATCH + XNACK_MASK)
		//
		// We compute the inverse as though all directives apart from NEXT_FREE_SGPR
		// are set to 0. So while disassembling we consider that:
		//
		// GRANULATED_WAVEFRONT_SGPR_COUNT
		// = f(NEXT_FREE_SGPR + 0 + 0 + 0)
		//
		scott.linder: This shift isn't necessary; you just need to check for the presence of any set bits. I also noticed when checking the types here that for some reason we declare the enum for these masks/shifts as `int32_t`, which makes one have to think about both integer promotion rules and then possibly which bitwise operations are valid for signed integers. I think here the signed mask is promoted to unsigned, and you get what you want, but it may be good to go fix the definition of the masks separately.
		// The disassembler cannot recover the original values of those 3 directives.

		uint32_t GranulatedWavefrontSGPRCount =
		jhenderson: Have you considered using the `DataExtractor::Cursor` approach (see for example the DWARFDebugLine code)? This will make the offset tracking and error handling cleaner, I think.
		jhenderson: I don't see a response to this suggestion.
		(FourByteBuffer & COMPUTE_PGM_RSRC1_GRANULATED_WAVEFRONT_SGPR_COUNT) >>
		COMPUTE_PGM_RSRC1_GRANULATED_WAVEFRONT_SGPR_COUNT_SHIFT;
		scott.linder: I think a short-lived well-defined preprocessor macro could make the patch shorter and easier to read. For example: `#define PRINT_TRIVIAL_FIELD(DIRECTIVE, MASK) do { \ KdStream << Indent << DIRECTIVE " " << ((FourByteBuffer & amdhsa:: ## MASK) >> amdhsa:: ## MASK ## _SHIFT) << '\n'; } while (0);` followed by `PRINT_TRIVIAL_FIELD(".amdhsa_float_round_mode_32", COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_32)`, `PRINT_TRIVIAL_FIELD(".amdhsa_float_round_mode_16_64", COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_16_64)`, and so on, ending with `#undef PRINT_TRIVIAL_FIELD`. I think in part this is because the definition of the masks themselves is in terms of preprocessor macros.

		if (isGFX10() && GranulatedWavefrontSGPRCount)
		return MCDisassembler::Fail;

		uint32_t NextFreeSGPR = (GranulatedWavefrontSGPRCount + 1) *
		AMDGPU::IsaInfo::getSGPREncodingGranule(&STI);

		KdStream << Indent << ".amdhsa_reserve_vcc " << 0 << '\n';
		KdStream << Indent << ".amdhsa_reserve_flat_scratch " << 0 << '\n';
		KdStream << Indent << ".amdhsa_reserve_xnack_mask " << 0 << '\n';
		KdStream << Indent << ".amdhsa_next_free_sgpr " << NextFreeSGPR << "\n";

		if (FourByteBuffer & COMPUTE_PGM_RSRC1_PRIORITY)
		return MCDisassembler::Fail;

		PRINT_DIRECTIVE(".amdhsa_float_round_mode_32",
		scott.linder: Same as above, I think a new `getNumSGPRs` to complement `getNumSGPRBlocks` would make this easier to read. For the GFX10 case we could either leave the check here, or have the new function return `Optional` to indicate when there is an error.
		rochauha: As I mentioned in my other reply, this patch is already quite big. So it'd be better to have a separate patch for the new functions.
		COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_32);
		jhenderson: Use `StringRef` here.
		PRINT_DIRECTIVE(".amdhsa_float_round_mode_16_64",
		COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_16_64);
		jhenderson: clang-format plays better with these sorts of comments when written as below: `DataExtractor DE(Bytes, /*IsLittleEndian=*/true, /*AddressSize=*/64);` Also, you probably want an address size of 8, not 64 (address size is in bytes).
		PRINT_DIRECTIVE(".amdhsa_float_denorm_mode_32",
		jhenderson: I think it's better to do `Error Err = Error::success(); ... DE.getU32(&CurrentIndex, &Err);` and then check the error. The DataExtractor methods handle the `Err` so that it doesn't matter that it's not been checked yet. You'd have to handle the error yourself though. See also my comment about `Cursor`. If you really think there's no need for the Error, there's no need to specify it at all: in all the `DataExtractor` functions the `Error` parameter is an optional parameter, with nullptr as the default value.
		COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_32);
		scott.linder: Unnecessary shift.
		scott.linder: What is "GS" and why is this commented out?
		rochauha: This was for printing the granulated wavefront SGPR count.
		PRINT_DIRECTIVE(".amdhsa_float_denorm_mode_16_64",
		COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_16_64);

		if (FourByteBuffer & COMPUTE_PGM_RSRC1_PRIV)
		return MCDisassembler::Fail;

		PRINT_DIRECTIVE(".amdhsa_dx10_clamp", COMPUTE_PGM_RSRC1_ENABLE_DX10_CLAMP);

		if (FourByteBuffer & COMPUTE_PGM_RSRC1_DEBUG_MODE)
		return MCDisassembler::Fail;

		scott.linder: Unnecessary shift, same for all other cases below when checking that the bits under a mask are 0.
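The point about the redundant shift can be checked in isolation. This is a small standalone sketch with made-up values: `ExampleMask` and `ExampleShift` are hypothetical and do not correspond to a real kernel descriptor field. It assumes, as with the mask/shift pairs discussed here, that the shift equals the position of the lowest bit of the mask, so shifting right cannot discard any masked bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical field occupying bits 6..9; not a real descriptor field.
constexpr uint32_t ExampleMask = 0xFu << 6;
constexpr uint32_t ExampleShift = 6;

// Zero test written with the redundant shift.
bool anyBitsSetShifted(uint32_t Buffer) {
  return ((Buffer & ExampleMask) >> ExampleShift) != 0;
}

// Equivalent zero test without the shift.
bool anyBitsSet(uint32_t Buffer) {
  return (Buffer & ExampleMask) != 0;
}
```

The shift is still needed when the field value itself is printed; it is only the "must be zero" checks where it adds nothing. Using unsigned constants here also sidesteps the signed-promotion question raised earlier in the review.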
		PRINT_DIRECTIVE(".amdhsa_ieee_mode", COMPUTE_PGM_RSRC1_ENABLE_IEEE_MODE);
		jhenderson: I don't understand what this comment is trying to tell me, especially since the `Size` computation below is `4 + 4`...
		rochauha: When we fail, we set: Size = CurrentIndex (i.e. starting point of the chunk of bytes) + length of the chunk. The failed region is from 0 to this new value of Size. We do this because most directives in the kernel descriptor affect single or very few bits.

		if (FourByteBuffer & COMPUTE_PGM_RSRC1_BULKY)
		return MCDisassembler::Fail;

		if (FourByteBuffer & COMPUTE_PGM_RSRC1_CDBG_USER)
		return MCDisassembler::Fail;

		PRINT_DIRECTIVE(".amdhsa_fp16_overflow", COMPUTE_PGM_RSRC1_FP16_OVFL);

		if (FourByteBuffer & COMPUTE_PGM_RSRC1_RESERVED0)
		return MCDisassembler::Fail;

		if (isGFX10()) {
		PRINT_DIRECTIVE(".amdhsa_workgroup_processor_mode",
		COMPUTE_PGM_RSRC1_WGP_MODE);
		PRINT_DIRECTIVE(".amdhsa_memory_ordered", COMPUTE_PGM_RSRC1_MEM_ORDERED);
		PRINT_DIRECTIVE(".amdhsa_forward_progress", COMPUTE_PGM_RSRC1_FWD_PROGRESS);
		}
		return MCDisassembler::Success;
		}

		// NOLINTNEXTLINE(readability-identifier-naming)
		MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC2(
		jhenderson: next -> Next
		uint32_t FourByteBuffer, raw_string_ostream &KdStream) const {
		using namespace amdhsa;
		StringRef Indent = "\t";
		PRINT_DIRECTIVE(
		".amdhsa_system_sgpr_private_segment_wavefront_offset",
		COMPUTE_PGM_RSRC2_ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET);
		PRINT_DIRECTIVE(".amdhsa_system_sgpr_workgroup_id_x",
		COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_X);
		PRINT_DIRECTIVE(".amdhsa_system_sgpr_workgroup_id_y",
		COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_Y);
		jhenderson: i -> I
		PRINT_DIRECTIVE(".amdhsa_system_sgpr_workgroup_id_z",
		COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_Z);
		PRINT_DIRECTIVE(".amdhsa_system_sgpr_workgroup_info",
		COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_INFO);
		PRINT_DIRECTIVE(".amdhsa_system_vgpr_workitem_id",
		COMPUTE_PGM_RSRC2_ENABLE_VGPR_WORKITEM_ID);

		if (FourByteBuffer & COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_ADDRESS_WATCH)
		return MCDisassembler::Fail;

		scott.linder: I don't know the general conventions here, but I don't think I have seen a comment for the end of a function elsewhere in LLVM. I do know that it is required for namespaces, so maybe it is permitted for long functions?
		rochauha: I'm not sure. I added those comments because these functions were getting quite long.
		scott.linder: I would lean towards omitting these, especially with the functions becoming shorter. For example, `decodeCOMPUTE_PGM_RSRC2()` is now <50 lines long and the entire definition now fits on one screen for me. It seems like there are other examples of this in the codebase, though, so I'm OK with it for the longer functions.
		rochauha: Done.
		if (FourByteBuffer & COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_MEMORY)
		return MCDisassembler::Fail;

		if (FourByteBuffer & COMPUTE_PGM_RSRC2_GRANULATED_LDS_SIZE)
		scott.linder: Can you expand this comment a little and move it to a Doxygen comment for the function?
		rochauha: Done.
		return MCDisassembler::Fail;

		PRINT_DIRECTIVE(
		".amdhsa_exception_fp_ieee_invalid_op",
		COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION);
		PRINT_DIRECTIVE(".amdhsa_exception_fp_denorm_src",
		COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_FP_DENORMAL_SOURCE);
		PRINT_DIRECTIVE(
		".amdhsa_exception_fp_ieee_div_zero",
		COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO);
		PRINT_DIRECTIVE(".amdhsa_exception_fp_ieee_overflow",
		COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW);
		PRINT_DIRECTIVE(".amdhsa_exception_fp_ieee_underflow",
		COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW);
		PRINT_DIRECTIVE(".amdhsa_exception_fp_ieee_inexact",
		COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_INEXACT);
		PRINT_DIRECTIVE(".amdhsa_exception_int_div_zero",
		COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO);

		if (FourByteBuffer & COMPUTE_PGM_RSRC2_RESERVED0)
		return MCDisassembler::Fail;

		return MCDisassembler::Success;
		}

		#undef PRINT_DIRECTIVE

		MCDisassembler::DecodeStatus
		AMDGPUDisassembler::decodeKernelDescriptorDirective(
		DataExtractor::Cursor &Cursor, ArrayRef<uint8_t> Bytes,
		raw_string_ostream &KdStream) const {
		#define PRINT_DIRECTIVE(DIRECTIVE, MASK) \
		do { \
		KdStream << Indent << DIRECTIVE " " \
		<< ((TwoByteBuffer & MASK) >> (MASK##_SHIFT)) << '\n'; \
		} while (0)

		uint16_t TwoByteBuffer = 0;
		scott.linder: All of these comments seem redundant to me, especially when the condition is simplified to just `if (Buffer & MASK) return Fail;`. At the very least, repeating the actual bit indices here when they are a part of the mask definition seems verbose.
		uint32_t FourByteBuffer = 0;
		uint64_t EightByteBuffer = 0;

		StringRef ReservedBytes;
		StringRef Indent = "\t";

		assert(Bytes.size() == 64);
		jhenderson: I think for self-documentation purposes, it would be helpful to assert that `Bytes.size() == 64` here. I see that it is verified in the calling function but a) that's not obvious when looking at this function in isolation, and b) in the future, we don't want other places calling this code without that check, so the assert provides a backstop of sorts.
		DataExtractor DE(Bytes, /*IsLittleEndian=*/true, /*AddressSize=*/8);
		rochauha: I think this is where little endian is being 'hardcoded'. However, since AMDGPU relocatable objects are meant to be little endian, I don't understand why they are big endian on ppc64.
		scott.linder: I think this is actually a bug with the encoding code. Like you say, we should be host-endianness agnostic when encoding the kernel descriptor, but it seems we aren't. I think I vaguely remember this coming up when we implemented it, but I don't remember why we didn't do this to start. I think it is just an oversight.
		scott.linder: I think https://reviews.llvm.org/D88858 should be the fix; I need to confirm whether the big-endian testers will run it automatically.
		scott.linder: I tried to determine if the pre-checkin builders include any big-endian archs, but gave up and just committed it. I'll keep an eye on the builders and see if it needs to be reverted. After that you can proceed with this patch again.
		rochauha: Thanks!
		scott.linder: An update: I missed one test in the initial commit, but I followed up in bf5c1d92d92ef8cee2adbfa17ecca20a8f65dc0e and now the big-endian testers seem to be happy. I think you can try reapplying this.

		switch (Cursor.tell()) {
		case amdhsa::GROUP_SEGMENT_FIXED_SIZE_OFFSET:
		FourByteBuffer = DE.getU32(Cursor);
		KdStream << Indent << ".amdhsa_group_segment_fixed_size " << FourByteBuffer
		<< '\n';
		return MCDisassembler::Success;

		case amdhsa::PRIVATE_SEGMENT_FIXED_SIZE_OFFSET:
		FourByteBuffer = DE.getU32(Cursor);
		KdStream << Indent << ".amdhsa_private_segment_fixed_size "
		<< FourByteBuffer << '\n';
		return MCDisassembler::Success;
		jhenderson: Rather than all these `checkError` calls, I'd expect to see a check of the `Cursor` followed by a return of `MCDisassembler::Fail` to indicate there was a problem.
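The "read everything, then check the cursor once" pattern suggested here can be sketched without any LLVM dependency. Everything below is a hypothetical stand-in: `ByteCursor`, `readU32`, `DecodeStatus`, and `decodeTwoFields` are illustrative names, and the sticky failure flag only imitates the general idea of a cursor that remembers the first failed read. It is not the `DataExtractor::Cursor` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cursor: an offset plus a sticky failure flag.
struct ByteCursor {
  size_t Offset = 0;
  bool Failed = false;
  explicit operator bool() const { return !Failed; }
};

// Reads a little-endian 32-bit value. On a short read (or if the cursor has
// already failed) it marks the cursor as failed, leaves the offset
// untouched, and returns 0.
uint32_t readU32(const std::vector<uint8_t> &Bytes, ByteCursor &C) {
  if (C.Failed || C.Offset > Bytes.size() || Bytes.size() - C.Offset < 4) {
    C.Failed = true;
    return 0;
  }
  uint32_t V = 0;
  for (int I = 3; I >= 0; --I)
    V = (V << 8) | Bytes[C.Offset + I];
  C.Offset += 4;
  return V;
}

enum class DecodeStatus { Fail, Success };

// Decodes two consecutive fields and checks the cursor a single time,
// instead of checking an error after every read.
DecodeStatus decodeTwoFields(const std::vector<uint8_t> &Bytes, uint32_t &A,
                             uint32_t &B) {
  ByteCursor C;
  A = readU32(Bytes, C);
  B = readU32(Bytes, C);
  if (!C)
    return DecodeStatus::Fail;
  return DecodeStatus::Success;
}
```

Because a failed cursor turns every later read into a no-op, the caller needs only one check per group of reads, which is what keeps the decoding code short.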

		case amdhsa::RESERVED0_OFFSET:
		// 8 reserved bytes, must be 0.
		EightByteBuffer = DE.getU64(Cursor);
		if (EightByteBuffer) {
		return MCDisassembler::Fail;
		}
		return MCDisassembler::Success;

		case amdhsa::KERNEL_CODE_ENTRY_BYTE_OFFSET_OFFSET:
		// KERNEL_CODE_ENTRY_BYTE_OFFSET
		// So far no directive controls this for Code Object V3, so simply skip for
		// disassembly.
		DE.skip(Cursor, 8);
		return MCDisassembler::Success;

		case amdhsa::RESERVED1_OFFSET:
		// 20 reserved bytes, must be 0.
		ReservedBytes = DE.getBytes(Cursor, 20);
		for (int I = 0; I < 20; ++I) {
		if (ReservedBytes[I] != 0) {
		return MCDisassembler::Fail;
		}
		}
		return MCDisassembler::Success;

		case amdhsa::COMPUTE_PGM_RSRC3_OFFSET:
		// COMPUTE_PGM_RSRC3
		// - Only set for GFX10, GFX6-9 have this to be 0.
		// - Currently no directives directly control this.
		FourByteBuffer = DE.getU32(Cursor);
		if (!isGFX10() && FourByteBuffer) {
		return MCDisassembler::Fail;
		}
		return MCDisassembler::Success;

		case amdhsa::COMPUTE_PGM_RSRC1_OFFSET:
		FourByteBuffer = DE.getU32(Cursor);
		if (decodeCOMPUTE_PGM_RSRC1(FourByteBuffer, KdStream) ==
		MCDisassembler::Fail) {
		return MCDisassembler::Fail;
		}
		return MCDisassembler::Success;

		case amdhsa::COMPUTE_PGM_RSRC2_OFFSET:
		FourByteBuffer = DE.getU32(Cursor);
		if (decodeCOMPUTE_PGM_RSRC2(FourByteBuffer, KdStream) ==
		MCDisassembler::Fail) {
		return MCDisassembler::Fail;
		}
		return MCDisassembler::Success;

		case amdhsa::KERNEL_CODE_PROPERTIES_OFFSET:
		using namespace amdhsa;
		TwoByteBuffer = DE.getU16(Cursor);
		jhenderson: `StringRef`

		PRINT_DIRECTIVE(".amdhsa_user_sgpr_private_segment_buffer",
		jhenderson: I think "cannot" is better than "can not".
		KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER);
		PRINT_DIRECTIVE(".amdhsa_user_sgpr_dispatch_ptr",
		KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR);
		PRINT_DIRECTIVE(".amdhsa_user_sgpr_queue_ptr",
		KERNEL_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR);
		PRINT_DIRECTIVE(".amdhsa_user_sgpr_kernarg_segment_ptr",
		jhenderson: `/*IsLittleEndian =*/true` -> `/*IsLittleEndian=*/true` (and same for `AddressSize`) as already asked for once...
		KERNEL_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR);
		scott.linder: I don't understand the intent with carefully maintaining `Size` for the failure case. Aren't we certain by this point that this should be a kernel descriptor, and so the correct thing to do when we fail is to disassemble everything as `.byte` directives (i.e. set `Size` to the max value)? Why would we prefer to return a partial failure and have the disassembler start working on the remaining bytes as if they were instructions? That would also shorten the patch and make it obvious that the `Size` tracking is correct.
		rochauha: My initial take was to decode as byte directives up to the point of failure indicated by Size, then go back to the normal flow of disassembling as instructions. But I get what you want to say in this comment. Updated the code based on this comment.
		PRINT_DIRECTIVE(".amdhsa_user_sgpr_dispatch_id",
		KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID);
		PRINT_DIRECTIVE(".amdhsa_user_sgpr_flat_scratch_init",
		KERNEL_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT);
		jhenderson: Bits or bytes?
		rochauha: Bits. If there is a bit that is wrong in a particular chunk of bytes, we consider that the entire chunk of bytes is invalid. We then update the `Size` value. Further, we say that the first `Size` bytes in a symbol are invalid. Error handling in llvm-objdump will print these bytes using the `.byte` directive. And then we fall back to decoding the remaining bytes in the symbol as instructions.
		rochauha: We do this because most directives in the kernel descriptor affect a single bit or very few bits.
		PRINT_DIRECTIVE(".amdhsa_user_sgpr_private_segment_size",
		KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE);

		if (TwoByteBuffer & KERNEL_CODE_PROPERTY_RESERVED0)
		return MCDisassembler::Fail;

		// Reserved for GFX9
		if (isGFX9() &&
		scott.linder: Rather than have these comments, which are still just filled with magic numbers, could we define these offsets more explicitly somewhere, as e.g. `amdhsa::GROUP_SEGMENT_FIXED_SIZE_OFFSET`? For example in `llvm/include/llvm/Support/AMDHSAKernelDescriptor.h` alongside the other definitions needed by the compiler? We could then also update the `static_assert`s there to use those definitions, so we aren't relying on inspection to know the same offsets are used everywhere. I.e. the following: `static_assert(sizeof(kernel_descriptor_t) == 64, "invalid size for kernel_descriptor_t"); static_assert(offsetof(kernel_descriptor_t, group_segment_fixed_size) == 0, "invalid offset for group_segment_fixed_size"); static_assert(offsetof(kernel_descriptor_t, private_segment_fixed_size) == 4, "invalid offset for private_segment_fixed_size"); static_assert(offsetof(kernel_descriptor_t, reserved0) == 8, "invalid offset for reserved0"); ...` Becomes: `static_assert(sizeof(kernel_descriptor_t) == 64, "invalid size for kernel_descriptor_t"); static_assert(offsetof(kernel_descriptor_t, group_segment_fixed_size) == GROUP_SEGMENT_FIXED_SIZE_OFFSET, "invalid offset for group_segment_fixed_size"); static_assert(offsetof(kernel_descriptor_t, private_segment_fixed_size) == PRIVATE_SEGMENT_FIXED_SIZE_OFFSET, "invalid offset for private_segment_fixed_size"); static_assert(offsetof(kernel_descriptor_t, reserved0) == RESERVED0_OFFSET, "invalid offset for reserved0"); ...`
		scott.linder: I would still like to see this done; the magic numbers here could lead to a few problems down the line, and presently they just make the code harder to read.
		(TwoByteBuffer & KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32)) {
		return MCDisassembler::Fail;
		} else if (isGFX10()) {
		PRINT_DIRECTIVE(".amdhsa_wavefront_size32",
		KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32);
		}

		if (TwoByteBuffer & KERNEL_CODE_PROPERTY_RESERVED1)
		return MCDisassembler::Fail;

		return MCDisassembler::Success;

		case amdhsa::RESERVED2_OFFSET:
		// 6 bytes from here are reserved, must be 0.
		ReservedBytes = DE.getBytes(Cursor, 6);
		for (int I = 0; I < 6; ++I) {
		if (ReservedBytes[I] != 0)
		return MCDisassembler::Fail;
		}
		return MCDisassembler::Success;

		default:
		llvm_unreachable("Unhandled index. Case statements cover everything.");
		return MCDisassembler::Fail;
		}
		#undef PRINT_DIRECTIVE
		}

		MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeKernelDescriptor(
		StringRef KdName, ArrayRef<uint8_t> Bytes, uint64_t KdAddress) const {
		// CP microcode requires the kernel descriptor to be 64 aligned.
		if (Bytes.size() != 64 || KdAddress % 64 != 0)
		return MCDisassembler::Fail;

		std::string Kd;
		raw_string_ostream KdStream(Kd);
		KdStream << ".amdhsa_kernel " << KdName << '\n';

		DataExtractor::Cursor C(0);
		while (C && C.tell() < Bytes.size()) {
		MCDisassembler::DecodeStatus Status =
		decodeKernelDescriptorDirective(C, Bytes, KdStream);

		cantFail(C.takeError());

		if (Status == MCDisassembler::Fail)
		return MCDisassembler::Fail;
		}
		KdStream << ".end_amdhsa_kernel\n";
		outs() << KdStream.str();
		return MCDisassembler::Success;
		}

		Optional<MCDisassembler::DecodeStatus>
		AMDGPUDisassembler::onSymbolStart(SymbolInfoTy &Symbol, uint64_t &Size,
		ArrayRef<uint8_t> Bytes, uint64_t Address,
		raw_ostream &CStream) const {
		// Right now only kernel descriptor needs to be handled.
		// We ignore all other symbols for target specific handling.
		// TODO:
		// Fix the spurious symbol issue for AMDGPU kernels. Exists for both Code
		// Object V2 and V3 when symbols are marked protected.

		// amd_kernel_code_t for Code Object V2.
		jhenderson: If this loop terminates due to `C` being invalid, you don't want to fall out the bottom and return `Success`, I think. You'd want to check `C` after loop termination and return `Fail`. Alternatively, if you return `Fail` from the `decodeKernelDescriptorDirective` you can do `cantFail(C.takeError());` after the loop. Ideally, we'd actually report the error from `C` in the event of failure, but currently there's no way of communicating that back up to the caller.
		rochauha: Thanks for the pointer! I think `cantFail` seems to be all that is needed, because the cursor was holding Error::success in some cases, which needs to be 'checked' before moving further. Since the kernel descriptor is well defined, all cases where failure needs to be handled are handled using `MCDisassembler::Fail`.
		jhenderson: Ah, I missed that the input can't be truncated, so `C` can't get into a failure state itself. Thanks! (See also my comment about an assert above).
		if (Symbol.Type == ELF::STT_AMDGPU_HSA_KERNEL) {
		Size = 256;
		return MCDisassembler::Fail;
		}

		// Code Object V3 kernel descriptors.
		StringRef Name = Symbol.Name;
		if (Symbol.Type == ELF::STT_OBJECT && Name.endswith(StringRef(".kd"))) {
		Size = 64; // Size = 64 regardless of success or failure.
		return decodeKernelDescriptor(Name.drop_back(3), Bytes, Address);
		}
		return None;
		}

		//===----------------------------------------------------------------------===//
		// AMDGPUSymbolizer
		//===----------------------------------------------------------------------===//

		// Try to find symbol name for specified label
		bool AMDGPUSymbolizer::tryAddingSymbolicOperand(MCInst &Inst,
		raw_ostream &/*cStream*/, int64_t Value,
		uint64_t /*Address*/, bool IsBranch,
		uint64_t /*Offset*/, uint64_t /*InstSize*/) {

		if (!IsBranch) {
		return false;
		}

		scott.linder: Can this just be `return decodeKernelDescriptor(...);`?
		auto *Symbols = static_cast<SectionSymbolsTy *>(DisInfo);
		if (!Symbols)
		return false;

		auto Result = std::find_if(Symbols->begin(), Symbols->end(),
		[Value](const SymbolInfoTy& Val) {
		return Val.Addr == static_cast<uint64_t>(Value)
		&& Val.Type == ELF::STT_NOTYPE;
		});
		if (Result != Symbols->end()) {
		auto *Sym = Ctx.getOrCreateSymbol(Result->Name);
		const auto *Add = MCSymbolRefExpr::create(Sym, Ctx);
		Inst.addOperand(MCOperand::createExpr(Add));
		jhenderson: `StringRef`
		return true;
		}
		return false;
		}

		void AMDGPUSymbolizer::tryAddingPcLoadReferenceComment(raw_ostream &cStream,
		int64_t Value,
		uint64_t Address) {
		llvm_unreachable("unimplemented");
		}

		kzhuravl: Don't use bare numbers. Use enums defined in https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
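As an illustration of replacing bare numbers with named offsets, here is a minimal standalone sketch. It is not the contents of `AMDHSAKernelDescriptor.h`: `ExampleDescriptor` is a hypothetical, truncated struct that mirrors only the first three fields quoted earlier in this review (offsets 0, 4, and 8), and `fieldNameAt` is a made-up helper showing how a decoder can switch on the named constants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, truncated mirror of the first three descriptor fields.
struct ExampleDescriptor {
  uint32_t group_segment_fixed_size;
  uint32_t private_segment_fixed_size;
  uint8_t reserved0[8];
};

// Named byte offsets, used by both the layout checks and the decoder.
enum : uint32_t {
  GROUP_SEGMENT_FIXED_SIZE_OFFSET = 0,
  PRIVATE_SEGMENT_FIXED_SIZE_OFFSET = 4,
  RESERVED0_OFFSET = 8,
};

// The layout checks refer to the same constants, so the struct and the
// decoder cannot silently drift apart.
static_assert(offsetof(ExampleDescriptor, group_segment_fixed_size) ==
                  GROUP_SEGMENT_FIXED_SIZE_OFFSET,
              "invalid offset for group_segment_fixed_size");
static_assert(offsetof(ExampleDescriptor, private_segment_fixed_size) ==
                  PRIVATE_SEGMENT_FIXED_SIZE_OFFSET,
              "invalid offset for private_segment_fixed_size");
static_assert(offsetof(ExampleDescriptor, reserved0) == RESERVED0_OFFSET,
              "invalid offset for reserved0");

// Made-up helper: maps a byte offset to a field name, or nullptr if the
// offset does not start a known field.
const char *fieldNameAt(uint32_t Offset) {
  switch (Offset) {
  case GROUP_SEGMENT_FIXED_SIZE_OFFSET:
    return "group_segment_fixed_size";
  case PRIVATE_SEGMENT_FIXED_SIZE_OFFSET:
    return "private_segment_fixed_size";
  case RESERVED0_OFFSET:
    return "reserved0";
  default:
    return nullptr;
  }
}
```

With this arrangement a change to the struct layout fails at compile time rather than producing a decoder that reads the wrong bytes.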
		//===----------------------------------------------------------------------===//
		// Initialization
		//===----------------------------------------------------------------------===//

		static MCSymbolizer *createAMDGPUSymbolizer(const Triple &/*TT*/,
		LLVMOpInfoCallback /*GetOpInfo*/,
		LLVMSymbolLookupCallback /*SymbolLookUp*/,
		void *DisInfo,
		MCContext *Ctx,
		std::unique_ptr<MCRelocationInfo> &&RelInfo) {
		return new AMDGPUSymbolizer(*Ctx, std::move(RelInfo), DisInfo);
		}

		static MCDisassembler *createAMDGPUDisassembler(const Target &T,
		const MCSubtargetInfo &STI,
		MCContext &Ctx) {
		return new AMDGPUDisassembler(STI, Ctx, T.createMCInstrInfo());
		}

		extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUDisassembler() {
		TargetRegistry::RegisterMCDisassembler(getTheGCNTarget(),
		createAMDGPUDisassembler);
		TargetRegistry::RegisterMCSymbolizer(getTheGCNTarget(),
		createAMDGPUSymbolizer);
		scott.linder: Need to handle the "default" case here: `llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:1510:1: warning: control may reach end of non-void function [-Wreturn-type]`
		rochauha: Done.
		}
jhenderson (Done): Put this comment above the `if` it is referring to, like the existing bit of comment.
jhenderson (Done): Nit: missing full stop.
scott.linder (Not Done): The `!= 0` here is redundant.
rochauha (author, Done): I know, but I thought that it is more readable this way.
scott.linder (Not Done): Fair enough, in a type-safe language it would be required anyway, so it seems reasonable.
scott.linder (Done): Could you move the call to `.drop_back(3)` out into the calling function, so it appears next to the check for `.endswith(StringRef(".kd"))`?
rochauha (author, Done): Done.
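For readers following the `.drop_back(3)` request, here is a minimal sketch of keeping the `.kd` suffix check and the suffix removal side by side in the caller. It uses `std::string_view` instead of LLVM's `StringRef` so that it stands alone; the function name `kernelNameFromDescriptorSymbol` is hypothetical and does not appear in the patch.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper: returns the kernel name if Symbol ends with ".kd",
// otherwise std::nullopt. The suffix check and the suffix removal sit next
// to each other, which is the shape the review comment asks for.
std::optional<std::string_view>
kernelNameFromDescriptorSymbol(std::string_view Symbol) {
  constexpr std::string_view Suffix = ".kd";
  if (Symbol.size() < Suffix.size() ||
      Symbol.substr(Symbol.size() - Suffix.size()) != Suffix)
    return std::nullopt;
  // Equivalent in spirit to StringRef::drop_back(3).
  Symbol.remove_suffix(Suffix.size());
  return Symbol;
}
```

With this shape, the decoding routine only ever sees the bare kernel name, and the magic number 3 is tied visibly to the suffix it removes.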
scott.linder (Not Done): I'm still not sure what we landed on for the semantics of `SoftFail` here?
rochauha (author, Done): It should be Success / Fail based on what the bytes are for code object v2. But since there's nothing we are 'doing' at the moment for v2, I returned SoftFail.
scott.linder (Done): If `SoftFail` isn't applicable I don't think we should return it, even if it is just because we haven't implemented something yet. It existing doesn't mean it needs to be used, I think it has a very narrow definition that doesn't apply here. Maybe just emit a diagnostic and return `Fail` so we get the "decode as .byte" behavior? What exactly happens now with the current patch as-is?
rochauha (author, Done): Done.
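The outcome of the `SoftFail` thread can be sketched roughly as follows: report a diagnostic and return `Fail` for the unimplemented case, so that the caller falls back to printing raw bytes. This is only an illustration under stated assumptions. The enum below is a local stand-in for the three states named in the thread, and `decodeKernelDescriptor` and `renderAsBytes` are hypothetical names, not the functions in the patch.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Local stand-in for the three decode states discussed in the thread.
enum class DecodeStatus { Fail, SoftFail, Success };

// Hypothetical decoder entry point: code object v2 is not handled, so it
// sets a diagnostic and returns Fail rather than SoftFail.
DecodeStatus decodeKernelDescriptor(unsigned CodeObjectVersion,
                                    std::string &Diagnostic) {
  if (CodeObjectVersion < 3) {
    Diagnostic = "kernel descriptor decoding is not implemented for v2";
    return DecodeStatus::Fail;
  }
  return DecodeStatus::Success;
}

// Hypothetical fallback used by a caller when decoding fails: one ".byte"
// line per input byte.
std::string renderAsBytes(const std::vector<std::uint8_t> &Bytes) {
  std::string Out;
  char Line[16];
  for (std::uint8_t B : Bytes) {
    std::snprintf(Line, sizeof(Line), ".byte 0x%02x\n",
                  static_cast<unsigned>(B));
    Out += Line;
  }
  return Out;
}
```

The design point from the review is that `SoftFail` has a narrow meaning and should not double as "not implemented yet"; a plain `Fail` plus a diagnostic keeps the output honest.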

llvm/test/CodeGen/AMDGPU/nop-data.ll

; RUN: llc -mtriple=amdgcn--amdhsa -mattr=-code-object-v3 -mcpu=fiji -filetype=obj < %s | llvm-objdump -d - --mcpu=fiji | FileCheck %s

; CHECK: <kernel0>:
- ; CHECK-NEXT: s_endpgm
+ ; CHECK: s_endpgm
define amdgpu_kernel void @kernel0() align 256 {
entry:
ret void
}

; CHECK-NEXT: s_nop 0
; CHECK-NEXT: s_nop 0
; CHECK-NEXT: s_nop 0
(62 lines not shown)
; CHECK-NEXT: s_nop 0
; CHECK-NEXT: s_nop 0
; CHECK-NEXT: s_nop 0
; CHECK-NEXT: s_nop 0
; CHECK-NEXT: s_nop 0 // 0000000001FC: BF800000

; CHECK-EMPTY:
; CHECK-NEXT: <kernel1>:
- ; CHECK-NEXT: s_endpgm
+ ; CHECK: s_endpgm
define amdgpu_kernel void @kernel1(i32 addrspace(1)* addrspace(4)* %ptr.out) align 256 {
entry:
ret void
}

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-failure.s

This file was added.

;; Failure test. We create a malformed kernel descriptor (KD) by manually

;; setting the bytes, because one can't create a malformed KD using the

jhenderson (Not Done): Please add a comment to the top of this test explaining what this test is actually testing.

;; assembler directives.

jhenderson (Not Done):

;; Failure test. We create a malformed kernel descriptor (KD) by manually
- ;; setting the bytes. Because in actuality, one can't create a malformed KD
+ ;; setting the bytes, because in actuality, one can't create a malformed KD
;; using the assembler directives.

jhenderson (Not Done): Rather than writing text to another input file at run time in this way, you can use the new split-file tool to have all the input inline below, and split it up using the tool into multiple files.

rochauha (author, Not Done): In the case of kd-failure.s, concatenation is necessary because the disassembled output will be `.byte`s, and the symbol information is needed to get the same binary again. I feel that "printf'ing" to a file and concatenating with the disassembled text is simpler than using split-file and concatenating.

However, I have used the split-file tool to separate the S/VGPR test cases.

; RUN: llvm-mc %s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t.o

; RUN: printf ".type my_kernel.kd, @object \nmy_kernel.kd:\n.size my_kernel.kd, 64\n" > %t1.sym_info

; RUN: llvm-objdump --disassemble-symbols=my_kernel.kd %t.o \

; RUN: | tail -n +9 > %t1.sym_content

; RUN: cat %t1.sym_info %t1.sym_content > %t1.s

; RUN: llvm-mc %t1.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t-re-assemble.o

jhenderson (Done): Please don't mix comment markers within the same file. You use '//' here, but ';' everywhere else.

Additionally, new LLVM binutils tests tend to use double comment markers to indicate true comments as opposed to RUN/CHECK lines (i.e. ';;' in this context).

; RUN: diff %t.o %t-re-assemble.o

;; Test failure by setting one of the reserved bytes to non-zero value.

.type my_kernel.kd, @object

.size my_kernel.kd, 64

my_kernel.kd:

.long 0x00000000 ;; group_segment_fixed_size

.long 0x00000000 ;; private_segment_fixed_size

.quad 0x00FF000000000000 ;; reserved bytes.

.quad 0x0000000000000000 ;; kernel_code_entry_byte_offset, any value works.

;; 20 reserved bytes.

.quad 0x0000000000000000

.long 0x00000000

.long 0x00000000 ;; compute_PGM_RSRC3

.long 0x00000000 ;; compute_PGM_RSRC1

.long 0x00000000 ;; compute_PGM_RSRC2

.short 0x0000 ;; additional fields.

;; 6 reserved bytes.

.long 0x0000000

jhenderson (Not Done): Please remove the additional blank lines at the end of file.

.short 0x0000
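The failure test relies on the disassembler rejecting a descriptor whose reserved bytes are nonzero. A minimal sketch of such a check is below. The byte offsets are inferred from the field order and sizes in the comments of these tests (4 + 4 + 8 + 8 + 20 + 4 + 4 + 4 + 2 + 6 = 64 bytes); they are an assumption for illustration, and `reservedBytesAreZero` is a hypothetical name, not the patch's implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed layout, derived from the test comments: reserved ranges at
// offsets 8..15, 24..43 and 58..63 of the 64-byte kernel descriptor.
struct ByteRange {
  std::size_t Offset;
  std::size_t Size;
};
constexpr ByteRange ReservedRanges[] = {{8, 8}, {24, 20}, {58, 6}};

// Returns true only if every byte in every reserved range is zero.
bool reservedBytesAreZero(const std::array<std::uint8_t, 64> &KD) {
  for (const ByteRange &R : ReservedRanges)
    for (std::size_t I = R.Offset; I < R.Offset + R.Size; ++I)
      if (KD[I] != 0)
        return false;
  return true;
}
```

Under this layout, the `.quad 0x00FF000000000000` in the test stores `0xFF` at offset 14 (little-endian byte 6 of the quad that starts at offset 8), which falls inside the first reserved range.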

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s

This file was added.

				;; Test disassembly for GRANULATED_WAVEFRONT_SGPR_COUNT in the kernel descriptor.

				jhenderson (Done): Please follow the comments from `kd-failure.s` in all the tests (where applicable) too.
				; RUN: split-file %s %t.dir

				; RUN: llvm-mc %t.dir/1.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1
				; RUN: llvm-objdump --disassemble-symbols=my_kernel_1.kd %t1 | tail -n +8 \
				; RUN: | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1-re-assemble
				; RUN: diff %t1 %t1-re-assemble

				; RUN: llvm-mc %t.dir/2.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2
				; RUN: llvm-objdump --disassemble-symbols=my_kernel_2.kd %t2 | tail -n +8 \
				; RUN: | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2-re-assemble
				; RUN: diff %t2 %t2-re-assemble

				; RUN: llvm-mc %t.dir/3.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t3
				; RUN: llvm-objdump --disassemble-symbols=my_kernel_3.kd %t3 | tail -n +8 \
				; RUN: | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t3-re-assemble
				; RUN: diff %t3 %t3-re-assemble


				;--- 1.s
				;; Only set next_free_sgpr.
				.amdhsa_kernel my_kernel_1
				jhenderson (Done): Nit: missing full stop.
				.amdhsa_next_free_vgpr 0
				.amdhsa_next_free_sgpr 42
				.amdhsa_reserve_flat_scratch 0
				.amdhsa_reserve_xnack_mask 0
				.amdhsa_reserve_vcc 0
				.end_amdhsa_kernel

				;--- 2.s
				;; Only set other directives.
				.amdhsa_kernel my_kernel_2
				.amdhsa_next_free_vgpr 0
				.amdhsa_next_free_sgpr 0
				.amdhsa_reserve_flat_scratch 1
				.amdhsa_reserve_xnack_mask 1
				.amdhsa_reserve_vcc 1
				.end_amdhsa_kernel

				;--- 3.s
				;; Set all affecting directives.
				.amdhsa_kernel my_kernel_3
				.amdhsa_next_free_vgpr 0
				.amdhsa_next_free_sgpr 35
				.amdhsa_reserve_flat_scratch 1
				.amdhsa_reserve_xnack_mask 1
				.amdhsa_reserve_vcc 1
				.end_amdhsa_kernel

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s

This file was added.

				;; Test disassembly for GRANULATED_WORKITEM_VGPR_COUNT in the kernel descriptor.

				; RUN: split-file %s %t.dir

				; RUN: llvm-mc %t.dir/1.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1
				; RUN: llvm-objdump --disassemble-symbols=my_kernel_1.kd %t1 | tail -n +8 \
				; RUN: | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1-re-assemble
				; RUN: diff %t1 %t1-re-assemble

				; RUN: llvm-mc %t.dir/2.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2
				; RUN: llvm-objdump --disassemble-symbols=my_kernel_2.kd %t2 | tail -n +8 \
				; RUN: | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2-re-assemble
				; RUN: diff %t2 %t2-re-assemble

				; RUN: llvm-mc %t.dir/3.s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t3
				; RUN: llvm-objdump --disassemble-symbols=my_kernel_3.kd %t3 | tail -n +8 \
				; RUN: | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t3-re-assemble
				; RUN: diff %t3 %t3-re-assemble

				;--- 1.s
				.amdhsa_kernel my_kernel_1
				.amdhsa_next_free_vgpr 23
				.amdhsa_next_free_sgpr 0
				.end_amdhsa_kernel

				;--- 2.s
				.amdhsa_kernel my_kernel_2
				.amdhsa_next_free_vgpr 14
				.amdhsa_next_free_sgpr 0
				.end_amdhsa_kernel

				;--- 3.s
				.amdhsa_kernel my_kernel_3
				.amdhsa_next_free_vgpr 32
				.amdhsa_next_free_sgpr 0
				.end_amdhsa_kernel
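The three inputs in kd-vgpr.s (23, 14, 32) exercise the rounding that turns `.amdhsa_next_free_vgpr` into the GRANULATED_WORKITEM_VGPR_COUNT field. A minimal sketch of that rounding follows, assuming an encoding granule of 4 VGPRs for gfx908 and a round-up-then-subtract-one encoding. The granule value and both function names are assumptions for illustration and are not taken from this patch.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed encoding: round the register count up to a multiple of Granule
// (treating 0 as 1), divide by Granule, and subtract one.
constexpr std::uint32_t granulatedCount(std::uint32_t NextFree,
                                        std::uint32_t Granule) {
  const std::uint32_t Used = std::max<std::uint32_t>(1, NextFree);
  const std::uint32_t Rounded = (Used + Granule - 1) / Granule * Granule;
  return Rounded / Granule - 1;
}

// Inverse direction, as a disassembler could use it: the smallest
// register count that re-encodes to the same field value is not
// recoverable, so this returns the largest one in the granule.
constexpr std::uint32_t nextFreeFromGranulated(std::uint32_t Count,
                                               std::uint32_t Granule) {
  return (Count + 1) * Granule;
}
```

Under the assumed granule of 4, the three test inputs map to field values 5, 3 and 7. The original directive value is lost (23 comes back as 24), but re-encoding the recovered value yields the same field, which is what a byte-level `diff` of the re-assembled object needs.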

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s

This file was added.

				;; Entirely zeroed kernel descriptor (for GFX10).

				; RUN: llvm-mc %s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx1010 -filetype=obj -o %t
				; RUN: llvm-objdump -s -j .text %t | FileCheck --check-prefix=OBJDUMP %s

				;; TODO:
				;; This file and kd-zeroed-raw.s should produce the same output for the kernel
				;; descriptor - a block of 64 zeroed bytes. But looks like the assembler sets
				;; the FWD_PROGRESS bit in COMPUTE_PGM_RSRC1 to 1 even when the directive
				;; mentions 0 (see line 36).

				;; Check the raw bytes right now.

				; OBJDUMP: 0000 00000000 00000000 00000000 00000000
				; OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
				; OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
				; OBJDUMP-NEXT: 0030 01000000 00000000 00000000 00000000

				.amdhsa_kernel my_kernel
				.amdhsa_group_segment_fixed_size 0
				.amdhsa_private_segment_fixed_size 0
				.amdhsa_next_free_vgpr 8
				.amdhsa_reserve_vcc 0
				.amdhsa_reserve_flat_scratch 0
				.amdhsa_reserve_xnack_mask 0
				.amdhsa_next_free_sgpr 8
				.amdhsa_float_round_mode_32 0
				.amdhsa_float_round_mode_16_64 0
				.amdhsa_float_denorm_mode_32 0
				.amdhsa_float_denorm_mode_16_64 0
				.amdhsa_dx10_clamp 0
				.amdhsa_ieee_mode 0
				.amdhsa_fp16_overflow 0
				.amdhsa_workgroup_processor_mode 0
				.amdhsa_memory_ordered 0
				.amdhsa_forward_progress 0
				.amdhsa_system_sgpr_private_segment_wavefront_offset 0
				.amdhsa_system_sgpr_workgroup_id_x 0
				.amdhsa_system_sgpr_workgroup_id_y 0
				.amdhsa_system_sgpr_workgroup_id_z 0
				.amdhsa_system_sgpr_workgroup_info 0
				.amdhsa_system_vgpr_workitem_id 0
				.amdhsa_exception_fp_ieee_invalid_op 0
				.amdhsa_exception_fp_denorm_src 0
				.amdhsa_exception_fp_ieee_div_zero 0
				.amdhsa_exception_fp_ieee_overflow 0
				.amdhsa_exception_fp_ieee_underflow 0
				.amdhsa_exception_fp_ieee_inexact 0
				.amdhsa_exception_int_div_zero 0
				.amdhsa_user_sgpr_private_segment_buffer 0
				.amdhsa_user_sgpr_dispatch_ptr 0
				.amdhsa_user_sgpr_queue_ptr 0
				.amdhsa_user_sgpr_kernarg_segment_ptr 0
				.amdhsa_user_sgpr_dispatch_id 0
				.amdhsa_user_sgpr_flat_scratch_init 0
				.amdhsa_user_sgpr_private_segment_size 0
				.amdhsa_wavefront_size32 0
				.end_amdhsa_kernel
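The only nonzero byte in the OBJDUMP rows of kd-zeroed-gfx10.s is the first byte at offset 0x30. By the field order shown in kd-zeroed-raw.s (4 + 4 + 8 + 8 + 20 + 4 = 48 bytes before it), that offset is where compute_PGM_RSRC1 starts. A minimal sketch for turning one dump group into a 32-bit word, to see which bit is actually set, is below. The helper name is hypothetical, and which field a given bit belongs to depends on the bit layout in AMDHSAKernelDescriptor.h, which is not reproduced here.

```cpp
#include <cstdint>
#include <string>

// Converts one 8-hex-digit group from `llvm-objdump -s` output (bytes
// printed in memory order) into the little-endian 32-bit word it encodes.
std::uint32_t littleEndianWord(const std::string &Hex8) {
  std::uint32_t Word = 0;
  for (int I = 0; I < 4; ++I) {
    const std::uint32_t Byte = static_cast<std::uint32_t>(
        std::stoul(Hex8.substr(2 * I, 2), nullptr, 16));
    Word |= Byte << (8 * I);
  }
  return Word;
}
```

Applied to the group `01000000`, this gives the value 1, so the set bit is bit 0 of the word. Checking that against the header's bit assignments would confirm or refute the TODO's guess about which directive is responsible.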

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx9.s

This file was added.

				;; Entirely zeroed kernel descriptor (for GFX9).

				; RUN: llvm-mc %s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1
				; RUN: llvm-objdump --disassemble-symbols=my_kernel.kd %t1 \
				; RUN: | tail -n +8 | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2
				; RUN: diff %t1 %t2

				; RUN: llvm-objdump -s -j .text %t1 | FileCheck --check-prefix=OBJDUMP %s

				; OBJDUMP: 0000 00000000 00000000 00000000 00000000
				; OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
				; OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
				; OBJDUMP-NEXT: 0030 00000000 00000000 00000000 00000000

				;; This file and kd-zeroed-raw.s produce the same output for the kernel
				;; descriptor - a block of 64 zeroed bytes.

				.amdhsa_kernel my_kernel
				.amdhsa_group_segment_fixed_size 0
				.amdhsa_private_segment_fixed_size 0
				.amdhsa_next_free_vgpr 0
				.amdhsa_reserve_vcc 0
				.amdhsa_reserve_flat_scratch 0
				.amdhsa_reserve_xnack_mask 0
				.amdhsa_next_free_sgpr 0
				.amdhsa_float_round_mode_32 0
				.amdhsa_float_round_mode_16_64 0
				.amdhsa_float_denorm_mode_32 0
				.amdhsa_float_denorm_mode_16_64 0
				.amdhsa_dx10_clamp 0
				.amdhsa_ieee_mode 0
				.amdhsa_fp16_overflow 0
				.amdhsa_system_sgpr_private_segment_wavefront_offset 0
				.amdhsa_system_sgpr_workgroup_id_x 0
				.amdhsa_system_sgpr_workgroup_id_y 0
				.amdhsa_system_sgpr_workgroup_id_z 0
				.amdhsa_system_sgpr_workgroup_info 0
				.amdhsa_system_vgpr_workitem_id 0
				.amdhsa_exception_fp_ieee_invalid_op 0
				.amdhsa_exception_fp_denorm_src 0
				.amdhsa_exception_fp_ieee_div_zero 0
				.amdhsa_exception_fp_ieee_overflow 0
				.amdhsa_exception_fp_ieee_underflow 0
				.amdhsa_exception_fp_ieee_inexact 0
				.amdhsa_exception_int_div_zero 0
				.amdhsa_user_sgpr_private_segment_buffer 0
				.amdhsa_user_sgpr_dispatch_ptr 0
				.amdhsa_user_sgpr_queue_ptr 0
				.amdhsa_user_sgpr_kernarg_segment_ptr 0
				.amdhsa_user_sgpr_dispatch_id 0
				.amdhsa_user_sgpr_flat_scratch_init 0
				.amdhsa_user_sgpr_private_segment_size 0
				.end_amdhsa_kernel

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-raw.s

This file was added.

				; RUN: llvm-mc %s -mattr=+code-object-v3 --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t1
				; RUN: llvm-objdump --disassemble-symbols=my_kernel.kd %t1 \
				; RUN: | tail -n +8 | llvm-mc --triple=amdgcn-amd-amdhsa -mcpu=gfx908 -filetype=obj -o %t2
				; RUN: llvm-objdump -s -j .text %t2 | FileCheck --check-prefix=OBJDUMP %s

				;; Not running lit-test over gfx10 (see kd-zeroed-gfx10.s for details).
				;; kd-zeroed-raw.s and kd-zeroed-*.s should produce the same output for the
				;; kernel descriptor - a block of 64 zeroed bytes.

				;; The disassembly will produce the contents of kd-zeroed-*.s which on being
				;; assembled contains additional relocation info. A diff over the entire object
				;; will fail in this case. So we check by looking the bytes in .text.

				; OBJDUMP: 0000 00000000 00000000 00000000 00000000
				; OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
				; OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
				; OBJDUMP-NEXT: 0030 00000000 00000000 00000000 00000000

				;; The entire object is zeroed out.

				.type my_kernel.kd, @object
				.size my_kernel.kd, 64
				my_kernel.kd:
				.long 0x00000000 ;; group_segment_fixed_size
				.long 0x00000000 ;; private_segment_fixed_size
				.quad 0x0000000000000000 ;; reserved bytes.
				.quad 0x0000000000000000 ;; kernel_code_entry_byte_offset, any value works.

				;; 20 reserved bytes.
				.quad 0x0000000000000000
				.quad 0x0000000000000000
				.long 0x00000000

				.long 0x00000000 ;; compute_PGM_RSRC3
				.long 0x00000000 ;; compute_PGM_RSRC1
				.long 0x00000000 ;; compute_PGM_RSRC2
				.short 0x0000 ;; additional fields.

				;; 6 reserved bytes.
				.long 0x0000000
				.short 0x0000

llvm/tools/llvm-objdump/llvm-objdump.cpp

(1,847 lines not shown)
for (unsigned SI = 0, SE = Symbols.size(); SI != SE; ++SI) {

if (!PrintedSection) {
  PrintedSection = true;
  outs() << "\nDisassembly of section ";
  if (!SegmentName.empty())
    outs() << SegmentName << ",";
  outs() << SectionName << ":\n";
}

- if (Obj->isELF() && Obj->getArch() == Triple::amdgcn) {
-   if (Symbols[SI].Type == ELF::STT_AMDGPU_HSA_KERNEL) {
-     // skip amd_kernel_code_t at the begining of kernel symbol (256 bytes)
-     Start += 256;
-   }
-   if (SI == SE - 1 ||
-       Symbols[SI + 1].Type == ELF::STT_AMDGPU_HSA_KERNEL) {
-     // cut trailing zeroes at the end of kernel
-     // cut up to 256 bytes
-     const uint64_t EndAlign = 256;
-     const auto Limit = End - (std::min)(EndAlign, End - Start);
-     while (End > Limit &&
-            *reinterpret_cast<const support::ulittle32_t*>(&Bytes[End - 4]) == 0)
-       End -= 4;
-   }
- }

outs() << '\n';
kzhuravl (Done): Why is this commented out?
rochauha (author, Done): `if (Symbols[SI].Type == ELF::STT_AMDGPU_HSA_KERNEL)` always evaluates to false. This is because there are two symbols for AMDGPU kernels:
`<kernel>` => the proper symbol (`STT_AMDGPU_HSA_KERNEL`)
`<kernel>$local` => the additional symbol of `STT_NOTYPE`
Both these symbols point to the same location. Now `<kernel>` is skipped early on in this loop because it is at the beginning of the next symbol in the list, `<kernel>$local`. Additionally, this is symbol-specific behavior and is being handled in onSymbolStart in the disassembler implementation.
`if (SI == SE - 1 || Symbols[SI + 1].Type == ELF::STT_AMDGPU_HSA_KERNEL)`
The first part of this condition evaluates to true for the last symbol in a section. Kernel descriptors usually appear in the .rodata section of the binary. The last 6 bytes of the kernel descriptor are reserved and must be zero. Due to this condition, we end up trimming the last 4 bytes of the kernel descriptor, making it 'invalid'.
madhur13490 (Done): Can we not remove it? Why keep it commented out in the code base?
if (!NoLeadingAddr)
  outs() << format(Is64Bits ? "%016" PRIx64 " " : "%08" PRIx64 " ",
                   SectionAddr + Start + VMAAdjustment);
if (Obj->isXCOFF() && SymbolDescription) {
  outs() << getXCOFFSymbolDescription(Symbols[SI], SymbolName) << ":\n";
} else
  outs() << '<' << SymbolName << ">:\n";

(1,127 lines not shown)
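The effect rochauha describes for the removed trimming loop can be reproduced in isolation. The sketch below is a stand-alone rewrite of that loop over a plain byte vector; `trimTrailingZeroWords` is a hypothetical name, `std::memcpy` replaces the `ulittle32_t` cast, and the extra `End - Start >= 4` guard is an addition of this sketch to keep the read in bounds.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Drops trailing all-zero 32-bit words from [Start, End), removing at
// most 256 bytes, and returns the new End. Mirrors the removed loop.
std::uint64_t trimTrailingZeroWords(const std::vector<std::uint8_t> &Bytes,
                                    std::uint64_t Start, std::uint64_t End) {
  const std::uint64_t EndAlign = 256;
  const std::uint64_t Limit = End - std::min(EndAlign, End - Start);
  while (End > Limit && End - Start >= 4) {
    std::uint32_t Word;
    std::memcpy(&Word, &Bytes[static_cast<std::size_t>(End - 4)],
                sizeof(Word));
    if (Word != 0)
      break;
    End -= 4;
  }
  return End;
}
```

For a 64-byte kernel descriptor whose last nonzero byte is at offset 56, the loop stops at `End == 60`, so the final 4 bytes are cut and the symbol no longer covers a full descriptor. For an entirely zeroed descriptor, as in kd-zeroed-raw.s, the same loop would trim the symbol down to nothing.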

[AMDGPU] Support disassembly for AMDGPU kernel descriptors (Closed, Public)

Diff 296683

D80713.diff

llvm/include/llvm/Support/AMDHSAKernelDescriptor.h

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp

llvm/test/CodeGen/AMDGPU/nop-data.ll

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-failure.s

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx9.s

llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-raw.s

llvm/tools/llvm-objdump/llvm-objdump.cpp