This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Improve abs modifier usage
ClosedPublic

Authored by tsymalla on May 11 2023, 1:12 AM.

Details

Summary

If a call to the llvm.fabs intrinsic has users in another reachable
BB, SelectionDAG will not apply the abs modifier to these users and
instead generate a v_and ..., 0x7fffffff instruction.
For fneg instructions, the issue is similar.
This patch implements AMDGPUIselLowering::shouldSinkOperands,
which allows CodegenPrepare to call tryToSinkFreeOperands.

Diff Detail

Event Timeline

tsymalla created this revision.May 11 2023, 1:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2023, 1:12 AM
tsymalla requested review of this revision.May 11 2023, 1:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2023, 1:12 AM
foad added a comment.May 12 2023, 3:33 AM

Whatever we end up doing for fabs, we should do the same for fneg and fneg-fabs. (But that doesn't have to be part of this patch.)

llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
223–225 ↗(On Diff #521223)

Why do you need this special case at all? And why only for intrinsics?

232–234 ↗(On Diff #521223)

All users of Instructions are Instructions, so this cast will always succeed. (And anyway if you want a cast that can fail, you need to use dyn_cast.)

247 ↗(On Diff #521223)

Surely the "dominates" test can never fail, since IR uses SSA form?

253 ↗(On Diff #521223)

I don't think you need a special case for this. Any fabs(fabs(x)) - even in different basic blocks - should already have been simplified by generic IR optimizations.

tsymalla added inline comments.May 12 2023, 5:36 AM
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
223–225 ↗(On Diff #521223)

We don't really need to handle it, but it is useless to clone (for instance) an fabs call with an undef operand.

232–234 ↗(On Diff #521223)

Thanks. I have overlooked that one.

253 ↗(On Diff #521223)

Makes sense. Thanks.

foad added inline comments.May 12 2023, 6:24 AM
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
223–225 ↗(On Diff #521223)

Any useless case should have been optimized away already, so it is much cleaner if you do not add code to handle it specially here.

tsymalla updated this revision to Diff 522070.May 15 2023, 12:25 AM
tsymalla marked 4 inline comments as done.

Simplify checks a bit.

Same should apply for fneg. This is also a candidate for generic CodeGenPrepare checked with isFAbsFree/isFNegFree

Generic CGP already has tryToSinkFreeOperands, don't know why this wouldn't already work for fabs/fneg

Generic CGP already has tryToSinkFreeOperands, don't know why this wouldn't already work for fabs/fneg

When I tested the implementation in AMD CGP, it did not work because the CSE pass did eliminate some of them again. I will double-check if I can make tryToSinkFreeOperands work similarly.

Generic CGP already has tryToSinkFreeOperands, don't know why this wouldn't already work for fabs/fneg

When I tested the implementation in AMD CGP, it did not work because the CSE pass did eliminate some of them again. I will double-check if I can make tryToSinkFreeOperands work similarly.

Probably doesn't work because we don't override shouldSinkOperands

tsymalla updated this revision to Diff 522203.May 15 2023, 8:21 AM

Implement shouldSinkOperands, remove old implementation from AMDGPULateCodegenPrepare.

foad added inline comments.May 15 2023, 8:28 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
5147

What does this do? Did you mean to check basic blocks here? In any case I am not sure this is required.

tsymalla added inline comments.May 15 2023, 8:40 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
5147

It checks whether one of the operands is registered twice, which appears like common practice amongst other implementations as well

foad added inline comments.May 15 2023, 8:45 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
5147

Oh I see. Thanks.

tsymalla edited the summary of this revision. (Show Details)May 16 2023, 4:47 AM
foad added a comment.May 16 2023, 6:02 AM

Looks good overall.

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
5147

Nit: I find this confusing because U and Op are both Uses but you only call get on one of them. I guess it relies on an implicit conversion for the other one. I think U->get() == Op.get() would be clearer.

llvm/test/CodeGen/AMDGPU/andorbitset.ll
56

I guess we no longer form s_bitset because the s_and uses two different registers? That means that this is no longer testing the original bug from D85307. Maybe change the test to use and i32 undef, 0xBFFFFFFF (clear bit 30) instead of fabs, so that your new optimization won't spoil it?

tsymalla marked an inline comment as done.May 16 2023, 6:35 AM
tsymalla added inline comments.
llvm/test/CodeGen/AMDGPU/andorbitset.ll
56

Yes, this is, because two different SGPRs are being used. Thanks for the suggestion. I think it makes sense, but unfortunately, it doesn't work. In the original test, the fabs calls are kept until we reach isel - which is a bit strange, because the fabs calls are trivially redundant semantically.

%i = call float @llvm.fabs.f32(float undef) #6
%i1 = bitcast float %i to i32
store i32 %i1, ptr addrspace(1) @gv, align 4
%0 = call float @llvm.fabs.f32(float undef) #6

However, when using and instead, the expressions are eliminated early.

tsymalla updated this revision to Diff 522594.May 16 2023, 6:55 AM

Fix nit
Change test order a bit to prevent BB renaming change

Regarding the lit test from andorbitset.ll, I don't think there's a simple way to preserve the bitset instruction with this change.

foad added a comment.May 18 2023, 1:56 AM

LGTM, thanks, but please try to simplify the code - I've added some ideas inline.

llvm/lib/CodeGen/CodeGenPrepare.cpp
7122 ↗(On Diff #522594)

Please commit this separately (no review required!)

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
5150

Nit: if you're using PatternMatch already, you can use m_FAbs here.

5165

Nit: might be simpler to return !Ops.empty()?

tsymalla updated this revision to Diff 523700.May 19 2023, 2:18 AM

Simplified code

foad accepted this revision.May 19 2023, 2:21 AM

Thanks!

This revision is now accepted and ready to land.May 19 2023, 2:21 AM
This revision was landed with ongoing or failed builds.May 19 2023, 3:02 AM
This revision was automatically updated to reflect the committed changes.