Jul 15 2022
Jul 14 2022
Apr 25 2021
Apr 23 2021
The test in this patch runs with --symbolize-operands.
Oct 7 2020
Oct 6 2020
Previous reversions were due to test cases failing on big endian hosts.
This was due to multibyte values being laid out as-is from host memory.
Now https://reviews.llvm.org/D88858 addresses the above, and has landed.
Oct 5 2020
Sep 9 2020
Sep 8 2020
3 of the test cases - kd-sgpr.s, kd-vgpr.s, kd-zeroed-gfx10.s fail with the PowerPC buildbot (http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/53608). From what I understand, these tests should only run if AMDGPU target is built. The lit.local.cfg file specifies that.
Changes based on comments by @jhenderson.
Sep 7 2020
Changes based on comments by @jhenderson
- Used llvm::cantFail(C.takeError()) to handle error.
- Removed blank lines at the end of test files.
- Streamlined beginning of comments.
- Used split-file tool to separate SGPR and VGPR test cases.
Aug 19 2020
Aug 18 2020
I have manually aligned the comment to the case statement before landing the patch. I think some investigation for clang-format may be necessary.
Aug 17 2020
Can I consider this patch to be at NFC stage?
LGTM, and can you please open a follow-up review once this is committed to hide all of the Object::get<Target><Stuff> methods?
Aug 15 2020
- Removed comment and braces.
Aug 14 2020
- Changes based on comments by @scott.linder.
- Updated tests and added a failure test.
- Checking error when cursor holds Error::success() while handling failure cases.
- Removed Size from subroutines' signatures as we aren't tracking the size.
Aug 8 2020
Based on comments and discussion, the difference for GFX9 is being handled using allocation granule sizes and no change is required.
- Changes based on review by @scott.linder
Aug 6 2020
- Pulled CPU string detection logic into ELFObjectFileBase.
- Added details in commit message.
To me, it doesn't look like this is 'real' target-specific work. For example the target triple is detected within llvm-objdump itself using the getTarget() function. I'd like to understand - what is different in case CPU string detection that it needs to reside somewhere else?
Aug 5 2020
I am a bit worrisome that this patch is adding more target specific stuff to llvm-objdump.
Agreed, I have looked at making the majority of llvm-objdump (and essentially everything under tools/) into a library at one point, but the biggest initial hurdle is the CommandLine parsing library. I think ideally all of the target-relevant parts would be in target-overridden implementations in some library, all of the llvm-objdump behavior parts (like the core symbol-to-symbol loop) would be in an llvm-objdump library which depends on the libraries with target-specific behaviors, and the tools/ directory would contain ~10 line main functions.
Can we step back and think whether precise CPU selection is really necessary? For example, PowerPC has the needs for different ISA levels but it simply uses -mcpu=future.
I am a bit lost on what you mean; is there a significant cost to providing precise CPU selection? It doesn't make sense to require the user to e.g. run the dumper to inspect the binary in order to come up with the command-line to run the dumper in order to inspect the binary.
If different AMDGPU CPUs are incompatible, should such detection be moved to somewhere in lib/Target/AMDGPU/?
Yes, I think it should be done in a generic way. All other targets which don't need/want it just getting the default (existing) behavior.
Jul 28 2020
- Got rid of raw number and added an enum for offsets.
- Added new tests.
- Updated the nop-data.ll test so that it passes with this patch.
Jul 25 2020
Some changes based on review by @scott.linder.
I discussed with Tony today, and I was thinking about this the wrong way.
SPI does not require the granule count to be even, it just rounds up the granule count before actually performing the allocation. This means, from the compiler's perspective, when it is calculating things like the AMDGPU::IsaInfo::getMaxNumSGPRs it must consider the "allocation" granule size (IsaInfo::getSGPRAllocGranule). Conversely, from the assembler/diassembler perspective, it must consider the "encoding" granule size (IsaInfo::getSGPREncodingGranule). It is perfectly OK to have a GFX9 code object with a granulated SGPR count of 1, and we should allow emitting that in the assembler so that the disassembler can accurately reproduce those code objects.
I don't think there is any fix needed here, we already separate these two concepts and correctly apply them elsewhere. I think I just led you astray in the disassembly patch; you should only be using the encoding granule size, and shouldn't need any special handling for e.g. GFX9 to handle the fact that the allocation and encoding granule sizes are not equal.
Jul 24 2020
Jul 23 2020
Added missing std::max.
Jul 22 2020
Updated patch based on comments.
Updated old tests.
Added new test.
Jul 21 2020
Switched to disassembling as numerical value rather than .amdgcn.next_free_sgpr.
Jul 20 2020
Jul 15 2020
I'm not sure I follow. Are you saying you need information from the note section to disassemble other sections? Or are you saying you need it to disassemble the note section? If the former, there's already code in the ELFDumper of llvm-readobj for parsing AMDGPU note sections, which you could reuse (it would need moving somewhere outside the llvm-readobj tool into one of the libraries, maybe in the ELFObjectFile stuff, I'm not sure). If the latter, there's still opposition to adding the functionality to the disassembler (the previous concerns haven't been addressed), given that llvm-readobj already provides this functionality.
Jul 14 2020
Jul 13 2020
Changes as per review by @jhenderson.
Jul 7 2020
- Return MCDisassembler::Fail for code object v2.
- Add missing full stops in doxygen comments.
- Handled default statement to silence the warning.
- Expanded comments for decodeCOMPUTE_PGM_RSRC1 and decodeCOMPUTE_PGM_RSRC2.
- Removed extra comment at the end of functions.
- Changed SoftFail to Success for code object v2.
- Replaced the old test case with a small assembly file.
Jul 6 2020
- Compute .amdhsa_next_free_vgpr based on inverse of what the assembler does to compute GRANULATED_WORKITEM_VGPR_COUNT.
- Some changes to accomodate differences between GFX9 and GFX10
- Updated test case for GFX10 as well
Jul 3 2020
Size = 64 regardless of success or failure.
- Fixed code-object-v3 lit test failure introduced in the previous change to this patch.
- More changes as per inline comments.
Jul 2 2020
- Used macro to shorten printing directives
- Replaced "\n" with '\n'
- Removed unnecessary bit shifts for bits that must be 0
- Removed extra comments for bits that must be 0
Updating the code bit by bit based on comments by @scott.linder.
Jun 30 2020
I'd also like to point out another fact. There is already some feature overlap between llvm-readobj and llvm-objdump - like printing symbol information using --symbols and --syms respectively. Another such example is using the flag --section-headers in both tools to print section headers.
- Having multiple tools to do the same job is not a good idea - each requires its own maintenance, the behaviour can diverge, bugs might require fixing in two places/support for new things etc etc etc. In an ideal world, we'd merge all the binary tools (GNU and LLVM) into a single tool, or redistribute functionality somehow, so that we don't have duplicate functionality like we already do. This takes us further away from that ideal.
I agree that having a single tool is the direction we must aim for. But to do so, one tool needs to be improved to the point that it is 'feature complete'. llvm-objdump already disassembles all contents of the binary. It's just that everything is disassembled as instructions. Even notes are disassembled as instructions today. I am not 'adding' anything new; just trying to 'correct' the existing output. Targets will still need to do implement things from their side(if needed) to take advantages of the infrastructure changes.
The initial plan would be to have note record handling in the MC layer. llvm-objdump will just iterate over the notes section. For each note record it will query the registered targets. The owning target will appropriately disassemble the bytes. A note record must be disassembled using the .byte directive if no target owns the note / printing for a particular kind of note is not implemented.
Could you more clearly clarify what the motivation here is? Is this working towards some missing GNU compatibility or similar? If so, could you paste an example of what GNU objdump produces, please.
Note that .note sections can be dumped using llvm-readelf in GNU style, just not as part of the disassembly.
Jun 29 2020
Added new test case.
Jun 26 2020
Following tests fail:
LLVM :: CodeGen/AMDGPU/call-encoding.ll
LLVM :: CodeGen/AMDGPU/s_code_end.ll
LLVM :: MC/AMDGPU/branch-comment.s
LLVM :: MC/AMDGPU/data.s
LLVM :: MC/AMDGPU/labels-branch-gfx9.s
LLVM :: MC/AMDGPU/labels-branch.s
LLVM :: MC/AMDGPU/offsetbug_once.s
LLVM :: MC/AMDGPU/offsetbug_one_and_one.s
LLVM :: MC/AMDGPU/offsetbug_twice.s
LLVM :: MC/AMDGPU/s_endpgm.s
LLVM :: Object/AMDGPU/objdump.s
LLVM :: tools/llvm-cov/ignore-filename-regex.test
LLVM :: tools/llvm-objdump/ELF/AMDGPU/source-lines.ll
The problem is that you're trying to do it all at once. There's no need to implement the full disassembly at once. You can do it in a series of patches, one after the other that build towards the primary goal. For example, there are many if (<X is some value>) type cases, which you could omit - just assume those are all false, in earlier versions, and add them in (with corresponding testing) in a later patch. Similarly, you could assume that all results of x & y return some specific value, e.g. 0, and just print that for now. Yes, that means you won't support everything from the point at which this patch lands, but it will make each individual patch easier to reason with. This fits much better with LLVM's preferred approach - please see https://llvm.org/docs/DeveloperPolicy.html#incremental-development.
- Updated to use Cursor. Now there isn't any neet to maintain CurrentIndex.
- Other small changes related to coding conventions.
Jun 23 2020
Minor change to the comment about Size.
No idea about the functional logic of this code, but I do wonder whether you'd be better off dividing it into smaller patches for testability.
- Reverse the order of function definitions.
- std::string => StringRef.
- std::stringstream => raw_string_ostream.
- Full stops for comments.
- Set AddressSize = 8.
- Remove error handling when reading bytes with DataExtractor since we already check for Bytes.size() == 64 beforehand.