This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add DAG Combine for right-shift carry add to uaddo
AbandonedPublic

Authored by Pierre-vh on Nov 9 2022, 2:47 AM.

Diff Detail

Event Timeline

Pierre-vh created this revision.Nov 9 2022, 2:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 9 2022, 2:47 AM
Pierre-vh requested review of this revision.Nov 9 2022, 2:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 9 2022, 2:47 AM

@arsenm I added the combine as described but codegen still doesn't look as good as what the original issue describes. I probably missed something but not sure what. Can you please advise?

foad added inline comments.Nov 9 2022, 3:15 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3222

This is not correct because it will lose the overflow bit. You should probably only do this if the ADD has a single use.

Pierre-vh updated this revision to Diff 474217.Nov 9 2022, 3:38 AM

Fix combine

Pierre-vh marked an inline comment as done.Nov 9 2022, 3:39 AM
Pierre-vh added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3222

Doing it if the add has a single use negates the purpose of the combine as it'll always have 2 uses in the cases we're interested in, but the second use is a trunc to i32. I've adapted the combine so it only does it when users are all truncs to i32, or the srl.

foad added inline comments.Nov 9 2022, 3:48 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3212

Could also accept 64-bit constants whose upper 32 bits are 0.

Pierre-vh updated this revision to Diff 475410.Nov 15 2022, 3:16 AM
Pierre-vh marked 2 inline comments as done.

Comments

Should also do the globalisel version, if we don't do new optimizations at the same time it will never catch up

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3205

Should also move to generic code

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3205

This is missing the extends in the input and output

3221

Looking at uses is unusual and I'm not sure why you're doing it

llvm/test/CodeGen/AMDGPU/add_shr_carry.ll
6

Should precommit this test to show the diff

arsenm added inline comments.Nov 15 2022, 3:41 PM
llvm/test/CodeGen/AMDGPU/add_shr_carry.ll
50

This second add is superfluous to the basic pattern https://alive2.llvm.org/ce/z/aLg_Ki

foad added inline comments.Nov 15 2022, 10:55 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3221

As mentioned below, the thinking is that this transform is not profitable unless every use either only wants the overflow bit, or only wants the low 32 bits of the 64 bit result. Otherwise you might as well keep the full 64 bit add.

Pierre-vh updated this revision to Diff 475723.Nov 16 2022, 1:00 AM
Pierre-vh marked 5 inline comments as done.

Rebase on D138104

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3205

What do you mean "generic"? Not checking the types and instead check that the shift amount is 1/2 of the type's size in bits?

3221

Indeed as Jay said, it's because the transformation is only profitable when the users only care about the lower 32 bits and the carry bit.

arsenm added inline comments.Nov 16 2022, 9:22 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3205

Yes, and move to DAGCombiner. You then just need to check that the target UADDO is legal or it's pre-legalize

arsenm added inline comments.Nov 16 2022, 9:23 AM
llvm/test/CodeGen/AMDGPU/add_shr_carry.ll
238–239

Testcase with multiple uses?

arsenm added inline comments.Nov 16 2022, 9:25 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3214

Should try to short circuit the second known bits call if the first one fails the countMinLeadingZeros check

3222

There's no point in looking for multiple TRUNCATE users. Those would have been automagically CSEd

Apparently there is already an in flight version of this at D106139

Apparently there is already an in flight version of this at D106139

Right, I think the reviewers there asked to move it to InstCombine, but I think we also want it in the backend, right?
Perhaps it should be kept as a target-specific combine for now? It can always be moved later if needed

Apparently there is already an in flight version of this at D106139

Right, I think the reviewers there asked to move it to InstCombine, but I think we also want it in the backend, right?

Combines in the backend are primarily for patterns that arise as the result of legalization. Looking at this again, I'm inclined to have it in instcombine primarily.

Perhaps it should be kept as a target-specific combine for now? It can always be moved later if needed

It definitely should be done generically

Apparently there is already an in flight version of this at D106139

Right, I think the reviewers there asked to move it to InstCombine, but I think we also want it in the backend, right?

Combines in the backend are primarily for patterns that arise as the result of legalization. Looking at this again, I'm inclined to have it in instcombine primarily.

Perhaps it should be kept as a target-specific combine for now? It can always be moved later if needed

It definitely should be done generically

So this stack of diff should go and an instcombine implementation done instead?
D107552 was also in-flight for this. Should I take it over?
There was some discussion about it being a less understandable canonical form though