Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11
as for gfx90a. Add missing globalisel legalizer support for flat
atomicrmw fadd f32 on gfx940 and gfx11.
Isel support for gfx11 will be added in D130579.
Details
Diff Detail
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
1346 | Using this predicate function instead of another ST method is inconsistent. Also can merge these two ifs into an or | |
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.global.atomic.fadd-with-ret.ll | ||
7 | Degrading the error is somewhat unfortunate, but we don't consistently do this anyway |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
12783–12786 | The flow of these options is getting too hard to follow. You're basically repeating all the same checks as above, so why not treat this as an independent case before the outer if? |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
12787 | I think both are still unsafe even when supported, on both targets. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
12787 | I agree, these are unsafe. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
12787 | In order to simplify all these conditions:
Second bullet is already implemented for gfx90a (MI200). summary of logic // mi300 if `subtarget == gfx940` and `subtarget has instruction` don't expand and return // mi100, mi200, gfx11+ if 'amdgpu-unsafe-fp-atomics=true' and 'scope is non-system' and `subtarget has instruction` don't expand and return expand |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
12787 | There will be variants of gfx940 where atomics are still unsafe, so probably better to use the same approach as for gfx90a. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
12787 | I am not completely sure about system scope though, is that OK for gfx908 and gfx11? It is certainly unsafe in terms of denorm handling. |
Treat fadd rmw atomics on gfx908, gfx940 and gfx11+ the same way as for gfx90a.
Regression for gfx940 which now expands rmw in some cases that it did not used to.
Added tests to transforms for interesting cases that don't expand rmw.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
12786 | The logic below seems correct to me, but why dropping hasAtomicFaddNoRtnInsts check? It is a fast way to skip the whole block for the targets which do not have it at all. |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
1344 | Probably should factor this into a feature test |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
12804 | If I read it correctly, Subtarget->hasAtomicFaddNoRtnInsts() is always true here right? Because it is in the block: if ((AS == AMDGPUAS::GLOBAL_ADDRESS || AS == AMDGPUAS::FLAT_ADDRESS) && Subtarget->hasAtomicFaddNoRtnInsts()) { |
Probably should factor this into a feature test