If a call to the llvm.fabs intrinsic has users in another reachable
BB, SelectionDAG will not apply the abs modifier to these users and
instead generate a v_and ..., 0x7fffffff instruction.
For fneg instructions, the issue is similar.
This patch implements AMDGPUIselLowering::shouldSinkOperands,
which allows CodegenPrepare to call tryToSinkFreeOperands.
Details
- Reviewers
nhaehnle foad - Commits
- rG91a7aa4c9b29: [AMDGPU] Improve abs modifier usage
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Whatever we end up doing for fabs, we should do the same for fneg and fneg-fabs. (But that doesn't have to be part of this patch.)
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp | ||
---|---|---|
223–225 | Why do you need this special case at all? And why only for intrinsics? | |
232–234 | All users of Instructions are Instructions, so this cast will always succeed. (And anyway if you want a cast that can fail, you need to use dyn_cast.) | |
247 | Surely the "dominates" test can never fail, since IR uses SSA form? | |
253 | I don't think you need a special case for this. Any fabs(fabs(x)) - even in different basic blocks - should already have been simplified by generic IR optimizations. |
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp | ||
---|---|---|
223–225 | Any useless case should have been optimized away already, so it is much cleaner if you do not add code to handle it specially here. |
Same should apply for fneg. This is also a candidate for generic CodeGenPrepare checked with isFAbsFree/isFNegFree
Generic CGP already has tryToSinkFreeOperands, don't know why this wouldn't already work for fabs/fneg
When I tested the implementation in AMD CGP, it did not work because the CSE pass did eliminate some of them again. I will double-check if I can make tryToSinkFreeOperands work similarly.
Implement shouldSinkOperands, remove old implementation from AMDGPULateCodegenPrepare.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
5148 ↗ | (On Diff #522203) | What does this do? Did you mean to check basic blocks here? In any case I am not sure this is required. |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
5148 ↗ | (On Diff #522203) | It checks whether one of the operands is registered twice, which appears like common practice amongst other implementations as well |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
5148 ↗ | (On Diff #522203) | Oh I see. Thanks. |
Looks good overall.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
5148 ↗ | (On Diff #522203) | Nit: I find this confusing because U and Op are both Uses but you only call get on one of them. I guess it relies on an implicit conversion for the other one. I think U->get() == Op.get() would be clearer. |
llvm/test/CodeGen/AMDGPU/andorbitset.ll | ||
56 ↗ | (On Diff #522203) | I guess we no longer form s_bitset because the s_and uses two different registers? That means that this is no longer testing the original bug from D85307. Maybe change the test to use and i32 undef, 0xBFFFFFFF (clear bit 30) instead of fabs, so that your new optimization won't spoil it? |
llvm/test/CodeGen/AMDGPU/andorbitset.ll | ||
---|---|---|
56 ↗ | (On Diff #522203) | Yes, this is, because two different SGPRs are being used. Thanks for the suggestion. I think it makes sense, but unfortunately, it doesn't work. In the original test, the fabs calls are kept until we reach isel - which is a bit strange, because the fabs calls are trivially redundant semantically. %i = call float @llvm.fabs.f32(float undef) #6 %i1 = bitcast float %i to i32 store i32 %i1, ptr addrspace(1) @gv, align 4 %0 = call float @llvm.fabs.f32(float undef) #6 However, when using and instead, the expressions are eliminated early. |
Regarding the lit test from andorbitset.ll, I don't think there's a simple way to preserve the bitset instruction with this change.
LGTM, thanks, but please try to simplify the code - I've added some ideas inline.
llvm/lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
7122 ↗ | (On Diff #522594) | Please commit this separately (no review required!) |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
5151 ↗ | (On Diff #522594) | Nit: if you're using PatternMatch already, you can use m_FAbs here. |
5166 ↗ | (On Diff #522594) | Nit: might be simpler to return !Ops.empty()? |
Why do you need this special case at all? And why only for intrinsics?