Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | ||
|---|---|---|
| 40 | I'm not perfectly sure we do the right here. One concern is that m_ICst() seems to match signed values and another is that we ignore nuw/nsw flags. I tried to add a check for nuw, and that led to numerous test failures, which coupled with that we seem to use this code for non-scalar buffer loads suggests that it's probably a task of its own. If anyone can confirm that these concerns are valid, I'd like to add a TODO explaining them to this patch. | |
This affects 53 of our test shaders, reducing code size of each by ~100 bytes in average. No shaders with increased code size.
| llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | ||
|---|---|---|
| 40 | m_ICst should just match constants, the sign is in the interpretation. nuw/nsw only potentially matters in cases where we're extending a 32-bit offset to 64-bits in one of the offsets that don't handle wrapping. I expect to match the DAG behavior for all of these addressing modes I don't think you should be trying to match constant offsets in either operand here. This is showing we're missing the standard canonicalization to move constants to the RHS for gMIR | |
| llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | ||
|---|---|---|
| 40 | It seems the current approach is to rely on matchers that for commutative operations try both the cases. Updated to use them. | |
| llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
|---|---|---|
| 4886 | Does anything go wrong if you remove the Offset == 0 check? | |
| llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | ||
|---|---|---|
| 40 | It's not a current approach, it's a missing feature. I'd rather not handle the commuted case and separately introduce a new combine to canonicalize constants to the RHS | |
| llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | ||
|---|---|---|
| 40 | I don't mind to look into why we currently do it the way we do, but now that this patch doesn't do anything special to match both the cases, it seems it can be addressed separately and after this is landed? (And as I side note, I admit I struggle to see how this is not a current approach, considering that this is literally what's being done whenever we aim to match commutative nodes.) | |
| llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
| 4886 | The tests still pass (and that is what I would expect from reading relevant code), but we don't have many of them and GISel doesn't seem to be able to cope with the most of our test corpus shaders. Is there any benefit in removing the check? | |
| llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
|---|---|---|
| 4886 | 
 I have not fully understood the patch but it seems like it is more complicated than necessary because you insist that this offset has to be non-zero. For example I assume this is why you had to move the Offset && !SOffset case out of SelectSMRDBaseOffset and into SelectSMRD(?). So I am trying to understand where the requirement comes from. Is it because you want to be sure that if the offset is 0 then we select _SGPR instead of _SGPR_IMM forms of the instruction? If so, then I think this should be handled by ensuring that the patterns for the _SGPR forms have higher priority. (Perhaps this already happens, just because of the order the patterns appear in the .td file?) | |
| llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
|---|---|---|
| 4886 | I see. Yes, this makes sense. I've removed the check for the sake of having a chance to catch it if and when we messed up with the matching order. As of right now the ordering looks correct to me as the SGPR_IMM pattern is already the most complex one and thus I understand doesn't need playing with AddedComplexity. Regarding the Offset && !SOffset case, since we now use SelectSMRDBaseOffset() to match the SGPR_IMM form and for that form we don't want the invented zero offset to match the IMM part, that code has to be somewhere outside of this function. | |
| llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
|---|---|---|
| 4886 | 
 I still don't understand. If the _SGPR_IMM form exists then it is in no way inferior to the _SGPR form, so why not use it? (To put it another way, the _SGPR form is redundant on subtargets that have the _SGPR_IMM form - perhaps we should even use predicates to disable it on those subtargets.) | |
| llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
|---|---|---|
| 4886 | OK, that's another matter. One sort of reasons I can think of is to make the resulting code look more natural and to make life a bit easier reading and preparing tests, etc? So no obvious reasons not to use SGPR either, it seems? Trying to see a bigger picture, looks like in a better world we would have the same instruction with the non-zero offset part being syntactically optional and only available on certain subtargets. But where we have different instructions doing the same thing we normally use the one that looks least tricky and unexpected in the context? | |
| llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | ||
|---|---|---|
| 40 | I don't know why I thought this was checking both operands; I see now this is looking through a copy for a constant. This is still also a pattern that should not reach the selector, and if it does it's demonstrating a combiner failure. The common case where we end up with a copy of a constant is for a VGPR use, which you don't have here so I don't think you need to worry about the copy here | |
| llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | ||
|---|---|---|
| 40 | This patch doesn't add the copy matching either; just rewrites it to use mi_match(). Whether it's a combiner failure or a matcher issue as your FIXME above says, it doesn't seem to have anything to do with this patch? Removing the copy-matching bit leads to some test failures, so looks like it's there for a reason. Failed Tests (5): LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ds.gws.barrier.ll LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ds.gws.init.ll LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.s.buffer.load.ll LLVM :: CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.s.buffer.load.ll LLVM :: CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.s.buffer.load.mir | |
| llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | ||
|---|---|---|
| 40 | Ping. | |
| llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | ||
|---|---|---|
| 40 | There is some philosophical discussion about this here: https://discourse.llvm.org/t/globalisel-and-commutative-binops-in-pat/58115 | |
| llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | ||
|---|---|---|
| 1886 | Would you mind updating this comment since there is no Imm argument. | |
| 1973 | Would you mind updating this comment since there is no Imm argument. | |
| 1989 | Committing most of this hunk as an NFC refactoring would have made the current patch easier to read. | |
| llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
| 4886 | It feels like the getType check could be an assertion? I can't see how it would fail, if the tablegen patterns are written correctly. | |
| llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp | ||
|---|---|---|
| 40 | What a useful finding is that discussion. Thanks for pointing out. Updated as suggested. | |
I'm going to commit this next Monday, if no objections. Should there be any further feedback on this as people get back to office, hopefully we'll be able to address it post-commit. Thanks for reviewing.
I'm not perfectly sure we do the right here. One concern is that m_ICst() seems to match signed values and another is that we ignore nuw/nsw flags. I tried to add a check for nuw, and that led to numerous test failures, which coupled with that we seem to use this code for non-scalar buffer loads suggests that it's probably a task of its own. If anyone can confirm that these concerns are valid, I'd like to add a TODO explaining them to this patch.