Page MenuHomePhabricator

AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd
ClosedPublic

Authored by Petar.Avramovic on Jul 26 2022, 8:07 AM.

Details

Summary

Remove manual selection for atomic fadd from global-isel.
Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD
which corresponds to llvm-ir's atomicrmw fadd instruction.

global and flat atomic fadd patterns changes:
Split rtn/no-rtn patterns
Add missing patterns or fix predicates
remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors)
Patterns now check addrspace of pointer, added patterns for flat intrinsic
with global addrspace pointer that selects into global atomic instruction.

buffer atomic fadd patterns changes:
edit patterns to import into global-isel
remove gfx6/gfx7 _addr64 and _offset patterns
remove patterns that can't be reached (same pattern but different feature)

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2022, 8:07 AM
Petar.Avramovic requested review of this revision.Jul 26 2022, 8:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2022, 8:07 AM
llvm/lib/Target/AMDGPU/FLATInstructions.td
1462–1463

Predicates are edited to match predicates for atomicrmw fadd (_NO_RTN)
see llvm/test/CodeGen/AMDGPU/llvm.amdgcn.atomic.fadd.ll global_atomic_fadd_f32_wrong_subtarget

I left v2f16_rtn, f64_no_rtn, f64_rtn patterns in isGFX90APlus, should each of them get feature for itself?

Here is the list of predicates and atomic global fadd instructions they have
HasAtomicFaddRtnInsts: f32_rtn
HasAtomicFaddNoRtnInsts: f32_no_rtn
HasAtomicPkFaddNoRtnInsts: v2f16_no_rtn
isGFX90APlus: v2f16_rtn, f64_no_rtn, f64_rtn

GFX90A has all of the above

Petar.Avramovic edited the summary of this revision. (Show Details)Jul 26 2022, 8:10 AM
foad added a comment.Jul 26 2022, 8:26 AM

Please add some codegen tests for flat_atomic_add_f32. Also I can't see any buffer_atomic_add_f32 tests that run on GFX11.

About flat_atomic_add, there are some tests for gfx9, gfx908 and gfx90a. I will test them for never targets also.

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.atomic.fadd.rtn_no-rtn.ll
3 ↗(On Diff #447714)

this tests all buffer_atomic_add_f32 patterns for gfx11

abinavpp added inline comments.Jul 27 2022, 1:26 AM
llvm/lib/Target/AMDGPU/BUFInstructions.td
1588

Can we remove the PredicateCode and GISelPredicateCode above?

llvm/lib/Target/AMDGPU/FLATInstructions.td
1035

Since you're splitting the ret and noret patterns to their own multiclasses, I think you can remove the complexity argument here and from FlatSignedAtomicPatImplRtn and use let AddedComplexity = ... in in the defms.

arsenm added inline comments.Jul 27 2022, 5:44 AM
llvm/lib/Target/AMDGPU/FLATInstructions.td
1035

Complexity argument is weird, should put in a let block around the instances

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.atomic.fadd.rtn_no-rtn.ll
6 ↗(On Diff #447714)

Why are these checks so sparse? Why isn't this test generated like the others?

added detailed mir tests for global/flat atomic fadd f32 on gfx11. Covers all patterns from td files. MI subtargets are already covered.

llvm/lib/Target/AMDGPU/FLATInstructions.td
1265–1266

these added complexity numbers are calculated by refactoring. Although I think it would be enough to have AddedComplexity = 1 on nortn patterns since they no longer have a way to get higher complexity then ret pattern (we no longer use complex patterns for no ret checks).
Is there some desired precedence for regular vs saddr pattern?

1469

Strange name: atomic_load_fadd_v2f16_global_noret_32.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12763–12764

I wanted to avoid lowering for gfx11, I am not sure if this is the correct place.

foad added a reviewer: Restricted Project.Aug 2 2022, 5:44 AM

Overall I think this looks good. Can you precommit all the changes in test/ (consider it pre approved) and then rebase this patch?

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1344

Apparently gfx90a has flat_atomic_add for f64 (but not f32 or v2f16). Are there any tests for that?

I will to put tests in precommit but some will have to be excluded since they fail to select

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1344

yes, there are a few but atomic gets lowered in ir. I will try to fix that.

arsenm added inline comments.Aug 2 2022, 7:41 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12763–12764

Should be based on a subtarget feature check, or at least the getGeneration() query on the subtarget. This also wouldn't belong in a change that's just supposed to stop using manual selection

Petar.Avramovic retitled this revision from AMDGPU: Remove manual selection for atomic fadd to AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd.
Petar.Avramovic edited the summary of this revision. (Show Details)

Checked inc files, removed more patterns that can't be used
Tests for all patterns are in precommit
rmw lowering is moved to another patch

arsenm accepted this revision.Sep 15 2022, 9:51 AM

LGTM

llvm/lib/Target/AMDGPU/FLATInstructions.td
1436

Can drop the Intr from the multiclass name for consistency

llvm/test/CodeGen/AMDGPU/GlobalISel/flat-atomic-fadd-f32.ll
5 ↗(On Diff #451426)

I'm assuming this really does exist on gfx11

This revision is now accepted and ready to land.Sep 15 2022, 9:51 AM
foad added inline comments.Sep 16 2022, 1:37 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/flat-atomic-fadd-f32.ll
5 ↗(On Diff #451426)

Yes GFX11 has FLAT_ATOMIC_ADD_F32.

Petar.Avramovic planned changes to this revision.Sep 16 2022, 6:22 AM

There are some conflicts with https://reviews.llvm.org/D130729, looks like an error to me.
Intrinsic patterns don't check address space (they will take any pointer)

define amdgpu_ps float @flat_atomic_fadd_f32_rtn_intrinsic(float addrspace(1)* %ptr, float %data) {
  %ret = call float @llvm.amdgcn.flat.atomic.fadd(float addrspace(1)* %ptr, float %data)
  ret float %ret
}

declare float @llvm.amdgcn.flat.atomic.fadd(float addrspace(1)*, float)


define amdgpu_ps float @global_atomic_fadd_f32_rtn_intrinsic(float* %ptr, float %data) {
  %ret = call float @llvm.amdgcn.global.atomic.fadd(float* %ptr, float %data)
  ret float %ret
}

declare float @llvm.amdgcn.global.atomic.fadd(float*, float)

tests like this will select %ptr ignoring it is from wrong address space. I would expect tests like this to fail to select.
Comments and behavior from D130729 are based on translation of intrinsic to atomic rmw which are selected based on address space of %ptr (removed in this patch, intrinsics have separate patterns).

Planed fix:
intrinsic patterns should check address space, failing to select when pointer argument has wrong address space (I assume this is possible to do in tablegen).
D130729 will also change intrinsic id, when it changes pointer.

Any comments?

There are some conflicts with https://reviews.llvm.org/D130729, looks like an error to me.
Intrinsic patterns don't check address space (they will take any pointer)

define amdgpu_ps float @flat_atomic_fadd_f32_rtn_intrinsic(float addrspace(1)* %ptr, float %data) {
  %ret = call float @llvm.amdgcn.flat.atomic.fadd(float addrspace(1)* %ptr, float %data)
  ret float %ret
}

declare float @llvm.amdgcn.flat.atomic.fadd(float addrspace(1)*, float)


define amdgpu_ps float @global_atomic_fadd_f32_rtn_intrinsic(float* %ptr, float %data) {
  %ret = call float @llvm.amdgcn.global.atomic.fadd(float* %ptr, float %data)
  ret float %ret
}

declare float @llvm.amdgcn.global.atomic.fadd(float*, float)

tests like this will select %ptr ignoring it is from wrong address space. I would expect tests like this to fail to select.
Comments and behavior from D130729 are based on translation of intrinsic to atomic rmw which are selected based on address space of %ptr (removed in this patch, intrinsics have separate patterns).

Planed fix:
intrinsic patterns should check address space, failing to select when pointer argument has wrong address space (I assume this is possible to do in tablegen).
D130729 will also change intrinsic id, when it changes pointer.

Any comments?

The flat intrinsic with a global pointer is perfectly fine (and we recently started optimizing the address space for these in 20cf170e68def39dc50b59847afb8d9ab445703d). The global intrinsic with a flat pointer is more of a grey area and probably shouldn't select

foad added a comment.Sep 16 2022, 6:33 AM

There are some conflicts with https://reviews.llvm.org/D130729, looks like an error to me.
Intrinsic patterns don't check address space (they will take any pointer)

define amdgpu_ps float @flat_atomic_fadd_f32_rtn_intrinsic(float addrspace(1)* %ptr, float %data) {
  %ret = call float @llvm.amdgcn.flat.atomic.fadd(float addrspace(1)* %ptr, float %data)
  ret float %ret
}

declare float @llvm.amdgcn.flat.atomic.fadd(float addrspace(1)*, float)


define amdgpu_ps float @global_atomic_fadd_f32_rtn_intrinsic(float* %ptr, float %data) {
  %ret = call float @llvm.amdgcn.global.atomic.fadd(float* %ptr, float %data)
  ret float %ret
}

declare float @llvm.amdgcn.global.atomic.fadd(float*, float)

tests like this will select %ptr ignoring it is from wrong address space. I would expect tests like this to fail to select.
Comments and behavior from D130729 are based on translation of intrinsic to atomic rmw which are selected based on address space of %ptr (removed in this patch, intrinsics have separate patterns).

Planed fix:
intrinsic patterns should check address space, failing to select when pointer argument has wrong address space (I assume this is possible to do in tablegen).
D130729 will also change intrinsic id, when it changes pointer.

Any comments?

The flat intrinsic with a global pointer is perfectly fine (and we recently started optimizing the address space for these in 20cf170e68def39dc50b59847afb8d9ab445703d). The global intrinsic with a flat pointer is more of a grey area and probably shouldn't select

Petar, just to explain, this is because the global addr space is a subset of the flat addr space, so any global pointer is also a valid flat pointer (but not vice versa). Given a global pointer it is OK to select a FLAT_ instruction, but it would still be better to select a GLOBAL_ instruction if one is available.

then do I need to add additional sets of patterns for flat_intrinsic with global addr assuming we add checks for address space in each intrinsic pattern
global_intrinsic + addr_space1 -> global_atomic
flat_intrinsic + addr_space0 -> flat_atomic
flat_intrinsic + addr_space1 -> global_atomic

then do I need to add additional sets of patterns for flat_intrinsic with global addr assuming we add checks for address space in each intrinsic pattern
global_intrinsic + addr_space1 -> global_atomic
flat_intrinsic + addr_space0 -> flat_atomic
flat_intrinsic + addr_space1 -> global_atomic

Yes, this should be the current behavior. Arguably flat intrinsic + addrspace1 should select to flat_atomic but I don't think it matters much. We should probably be swapping the intrinsic instead in the address space optimization

Added type checks to address spaces using same method as rmw atomics (made new PatFrags with let IsAtomic = 1; and let AddressSpaces = ... ).
Intrinsic patterns use these (I think they are called PatFrags), there are some additional tests for 'flat intrinsic' + 'global address' -> global atomic instruction

This revision is now accepted and ready to land.Sep 21 2022, 3:30 PM
Petar.Avramovic edited the summary of this revision. (Show Details)
Petar.Avramovic requested review of this revision.Sep 22 2022, 5:12 AM
arsenm accepted this revision.Sep 22 2022, 5:52 AM
This revision is now accepted and ready to land.Sep 22 2022, 5:52 AM
thakis added a subscriber: thakis.Sep 23 2022, 12:06 PM

This broke check-clang everywhere, see e.g. https://lab.llvm.org/buildbot/#/builders/109/builds/47270 (or most other bots on https://lab.llvm.org/buildbot/#/console) or http://45.33.8.238/macm1/45085/step_7.txt (or all other bots on http://45.33.8.238/).

Please take a look and revert for now if it takes a while to fix.

I am aware, looking into it