Details
- Reviewers
arsenm
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
@arsenm I added the combine as described but codegen still doesn't look as good as what the original issue describes. I probably missed something but not sure what. Can you please advise?
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
3222 | This is not correct because it will lose the overflow bit. You should probably only do this if the ADD has a single use. |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
3222 | Doing it if the add has a single use negates the purpose of the combine as it'll always have 2 uses in the cases we're interested in, but the second use is a trunc to i32. I've adapted the combine so it only does it when users are all truncs to i32, or the srl. |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
3212 | Could also accept 64-bit constants whose upper 32 bits are 0. |
Should also do the globalisel version, if we don't do new optimizations at the same time it will never catch up
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
3205 | Should also move to generic code |
Seems to be correct https://alive2.llvm.org/ce/z/VN9-vU
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
3205 | This is missing the extends in the input and output | |
3221 | Looking at uses is unusual and I'm not sure why you're doing it | |
llvm/test/CodeGen/AMDGPU/add_shr_carry.ll | ||
6 | Should precommit this test to show the diff |
llvm/test/CodeGen/AMDGPU/add_shr_carry.ll | ||
---|---|---|
50 | This second add is superfluous to the basic pattern https://alive2.llvm.org/ce/z/aLg_Ki |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
3221 | As mentioned below, the thinking is that this transform is not profitable unless every use either only wants the overflow bit, or only wants the low 32 bits of the 64 bit result. Otherwise you might as well keep the full 64 bit add. |
Rebase on D138104
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
3205 | What do you mean "generic"? Not checking the types and instead check that the shift amount is 1/2 of the type's size in bits? | |
3221 | Indeed as Jay said, it's because the transformation is only profitable when the users only care about the lower 32 bits and the carry bit. |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
3205 | Yes, and move to DAGCombiner. You then just need to check that the target UADDO is legal or it's pre-legalize |
llvm/test/CodeGen/AMDGPU/add_shr_carry.ll | ||
---|---|---|
238–239 | Testcase with multiple uses? |
Right, I think the reviewers there asked to move it to InstCombine, but I think we also want it in the backend, right?
Perhaps it should be kept as a target-specific combine for now? It can always be moved later if needed
Combines in the backend are primarily for patterns that arise as the result of legalization. Looking at this again, I'm inclined to have it in instcombine primarily.
Perhaps it should be kept as a target-specific combine for now? It can always be moved later if needed
It definitely should be done generically
So this stack of diff should go and an instcombine implementation done instead?
D107552 was also in-flight for this. Should I take it over?
There was some discussion about it being a less understandable canonical form though
Should also move to generic code