This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Improve atomicrmw fadd selection
ClosedPublic

Authored by Petar.Avramovic on Aug 10 2022, 5:42 AM.

Details

Summary

Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11
as for gfx90a. Add missing globalisel legalizer support for flat
atomicrmw fadd f32 on gfx940 and gfx11.
Isel support for gfx11 will be added in D130579.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2022, 5:42 AM
Petar.Avramovic requested review of this revision.Aug 10 2022, 5:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2022, 5:42 AM
arsenm added reviewers: rampitec, Restricted Project.Sep 15 2022, 3:45 PM
arsenm added inline comments.Sep 15 2022, 3:48 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1346

Using this predicate function instead of another ST method is inconsistent. Also can merge these two ifs into an or

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.global.atomic.fadd-with-ret.ll
7

Degrading the error is somewhat unfortunate, but we don't consistently do this anyway

arsenm added inline comments.Sep 15 2022, 3:53 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12783–12788

The flow of these options is getting too hard to follow. You're basically repeating all the same checks as above, so why not treat this as an independent case before the outer if?

rampitec added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12789

I think both are still unsafe even when supported, on both targets.

b-sumner added inline comments.Sep 15 2022, 6:17 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12789

I agree, these are unsafe.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12789

In order to simplify all these conditions:
Obvious first requirement is that subtarget needs to have instruction.

  • for gfx940 (MI300) having instruction is enough
  • for all other subtargets fadd is unsafe and can be selected only when
    1. function has amdgpu-unsafe-fp-atomics=true
    2. and atomic rmw needs to be non-system scope and not "one-as"

Second bullet is already implemented for gfx90a (MI200).
If I understand this correctly, fadd atomic rmw for gfx908(MI100) and gfx11 is unsafe like for MI200 and can be selected under same conditions.

summary of logic

// mi300
if `subtarget == gfx940` and `subtarget has instruction`
  don't expand and return

// mi100, mi200, gfx11+
if 'amdgpu-unsafe-fp-atomics=true' and 'scope is non-system' and `subtarget has instruction`
  don't expand and return

expand
b-sumner added inline comments.Sep 16 2022, 7:26 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12789

There will be variants of gfx940 where atomics are still unsafe, so probably better to use the same approach as for gfx90a.

rampitec added inline comments.Sep 16 2022, 11:13 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12789

I am not completely sure about system scope though, is that OK for gfx908 and gfx11? It is certainly unsafe in terms of denorm handling.

Treat fadd rmw atomics on gfx908, gfx940 and gfx11+ the same way as for gfx90a.
Regression for gfx940 which now expands rmw in some cases that it did not used to.
Added tests to transforms for interesting cases that don't expand rmw.

rampitec added inline comments.Sep 21 2022, 3:56 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12787–12788

The logic below seems correct to me, but why dropping hasAtomicFaddNoRtnInsts check? It is a fast way to skip the whole block for the targets which do not have it at all.

Petar.Avramovic edited the summary of this revision. (Show Details)

LGTM for the expansion part.

arsenm accepted this revision.Sep 22 2022, 9:35 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1344

Probably should factor this into a feature test

This revision is now accepted and ready to land.Sep 22 2022, 9:35 AM

Use hasFlatAtomicFaddF32Inst feature check.

arsenm accepted this revision.Sep 23 2022, 8:45 AM
This revision was automatically updated to reflect the committed changes.
tianshilei1992 added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12805

If I read it correctly, Subtarget->hasAtomicFaddNoRtnInsts() is always true here right? Because it is in the block:

if ((AS == AMDGPUAS::GLOBAL_ADDRESS || AS == AMDGPUAS::FLAT_ADDRESS) &&
    Subtarget->hasAtomicFaddNoRtnInsts()) {
nhaehnle removed a subscriber: nhaehnle.Oct 6 2022, 2:47 AM