This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add llvm.amdgcn.wave.reduce.umin/umax Intrinsic.
ClosedPublic

Authored by pravinjagtap on Jul 10 2023, 9:05 AM.

Details

Summary

When input to intrinsic is uniform value, reduced value is
same as input whereas if input value is divergent we need
to iterate over all the active lane to perform the reduction.

The control flow for a loop has been set up, which
iterates over only active lanes to perform reduction.

Introduced WAVE_REDUCE_UMIN_PSEUDO_U32 and
WAVE_REDUCE_UMAX_PSEUDO_U32 Pseudos which
are lowered Post-ISel (in EmitInstrWithCustomInserter ).

Diff Detail

Event Timeline

pravinjagtap created this revision.Jul 10 2023, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 10 2023, 9:05 AM
pravinjagtap requested review of this revision.Jul 10 2023, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 10 2023, 9:05 AM

I was thinking an IR expansion would be easier, but it's good to have a machine one (at least for umin)

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1928

Should have a mangled type. I also think it should have an immarg operand for the preferred lowering strategy to use. Also, wave_reduce?

Also umin would be a better choice for a first one, given that we want it for dynamic alloca handling

llvm/lib/Target/AMDGPU/SILowerReduceAndScanPseudo.cpp
1 ↗(On Diff #538679)

Missing header

87 ↗(On Diff #538679)

There's supposed to be a getWaveRegClass to go through

172 ↗(On Diff #538679)

This doesn't need to be a separate pass, can be a post isel hook

llvm/test/CodeGen/AMDGPU/llvm.amdgpu.reduce.ll
2 ↗(On Diff #538679)

test with -global-isel=1/0

19 ↗(On Diff #538679)

Also add poison and constant tests

25 ↗(On Diff #538679)

Also add a test where this is under divergent control flow

52 ↗(On Diff #538679)

Should strip out most of this test

73 ↗(On Diff #538679)

Drop this, it's redundant with the run line target and breaks adding multiple run targets

Can add MIR tests.

llvm/lib/Target/AMDGPU/SILowerReduceAndScanPseudo.cpp
8 ↗(On Diff #538679)

Some description about what the pass will do? Or function comment if this is not implemented as a pass.

45–48 ↗(On Diff #538679)
INITIALIZE_PASS(SIExpandReduceAndScanPseudo, DEBUG_TYPE,
                      "Expand Reduction and Scan Pseudos", false, false)
pravinjagtap added inline comments.Jul 10 2023, 11:44 PM
llvm/lib/Target/AMDGPU/SILowerReduceAndScanPseudo.cpp
172 ↗(On Diff #538679)

Are you referring to EmitInstrWithCustomInserter API where other PSEUDOs are expanded ?

arsenm added inline comments.Jul 11 2023, 5:58 PM
llvm/lib/Target/AMDGPU/SILowerReduceAndScanPseudo.cpp
172 ↗(On Diff #538679)

Yes, that's generally where the pseudos to hack around the DAG not handling control flow go

pravinjagtap retitled this revision from [WIP] [AMDGPU] Add llvm.amdgcn.reduce.add Intrinsic. to [WIP] [AMDGPU] Add llvm.amdgcn.wave.reduce.umin Intrinsic..

Addressed review comments @arsenm.

Implemented umin using post isel hook

pravinjagtap added inline comments.Jul 12 2023, 4:07 AM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1928

I also think it should have an immarg operand for the preferred lowering strategy to use

In that case, we need to create two different intrinsics and two pesudo operations, one for immediate operand and other for non-immediate operand. Also, reduction of scalar value of immediate value is that value itself, so do we really need lowering for this ?

foad added inline comments.Jul 12 2023, 4:45 AM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1928

I think @arsenm meant that the intrinsic should take an extra immarg i32 %strategy argument.

Addressed review comments @arsenm.

Implemented umin using post isel hook

Sorry, I meant umax. We need umax for alloca, not umin

arsenm added inline comments.Jul 12 2023, 4:47 AM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1928

Yes, so you have a way of requesting the DPP or WWM lowering etc. It doesn't change the main operand

pravinjagtap retitled this revision from [WIP] [AMDGPU] Add llvm.amdgcn.wave.reduce.umin Intrinsic. to [WIP] [AMDGPU] Add llvm.amdgcn.wave.reduce.umin/umax Intrinsic..

Added support for umax

arsenm added inline comments.Jul 12 2023, 7:51 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4091

I was envisioning this as just a hint, and if unimplemented (or the target doesn't support the version), it would just fallback to one that works.

Should also add some intrinsic documentation to AMDGPUUsage with the values for this

llvm/lib/Target/AMDGPU/SIInstructions.td
267

These need _U32/_B32 suffixes

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll
3

Should test with both wave sizes, and test for every generation, with global-isel=0 and 1

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umin.ll
5

Put the immarg on the declarations

127

Use named values

arsenm added inline comments.Jul 12 2023, 7:58 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4091

Also split the argument to strategy decision to a separate function

Addressed review comments of @arsenm

  • Extended support global isel
  • Updated test
arsenm added inline comments.Jul 13 2023, 6:19 AM
llvm/docs/AMDGPUUsage.rst
998

Elaborate that it should work if the target doesn't support the mode (e.g. gfx6/7 have no DPP)

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1932–1936

Define an intrinsic class for these to avoid repeating the signautre each time. Also you still should use a type mangled argument instead of hardcoded i32.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4070

static, start with lowercase

4081

static, start with lowercase

arsenm requested changes to this revision.Jul 13 2023, 6:38 AM
This revision now requires changes to proceed.Jul 13 2023, 6:38 AM
pravinjagtap retitled this revision from [WIP] [AMDGPU] Add llvm.amdgcn.wave.reduce.umin/umax Intrinsic. to [AMDGPU] Add llvm.amdgcn.wave.reduce.umin/umax Intrinsic..Jul 13 2023, 6:38 AM
pravinjagtap edited the summary of this revision. (Show Details)
pravinjagtap added a reviewer: Restricted Project.
arsenm added inline comments.Jul 13 2023, 12:20 PM
llvm/docs/AMDGPUUsage.rst
998

The default 0 should mean target default preference. The higher values should request a specific strategy

Need MIR tests for pseudo expansion

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4521–4528
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4115

typo iterative

4117

same

4152

ExecReg

Addressed review commnets.

Added MIR tests

arsenm added inline comments.Jul 17 2023, 4:39 PM
llvm/docs/AMDGPUUsage.rst
996

Missing wave from the name. Also, probably should spell out each one individually rather than putting a / in the names

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4083

See my above comment, 0 should be auto

Addressed reveiw comments

arsenm added inline comments.Jul 20 2023, 4:25 PM
llvm/docs/AMDGPUUsage.rst
996

unsigned minimum

1005

unsigned maximum

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1935

Comment doesn't match the description now

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4071

No point in this wrapper, whenever the new implementation arrives it will add the check

4089

Just return true/false?

4094

Just use the default 0

Addressed review commnets.

For now, for all the cases (default, Iterative and DPP) we use
iterative approach by default. When DPP arrives, strategy
switch needs to be added to decide which implemenation to use.

Mostly lgtm with a few more cleanups

llvm/docs/AMDGPUUsage.rst
1014

Probably should mention it's currently only implemented for i32

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1931

llvm_anyint_ty

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4108

ST.getWaveMaskRegClass

4131

uint32_t? std::numeric_limits<uint32_t>::max()?

4132

No & on the result of any BuildMI

4151

No &

4153

No &

4165

No &

4184

Could have just use the original register to begin with?

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll
8

don't specify the wavefrontsize features twice, just use the wave64 override and assume wave32 by default

314

In a follow up commit, AMDGPUInstCombineIntrinsic should also fold these constant cases out

Addressed review comments. Mostly, Code cleanup.

arsenm accepted this revision.Jul 21 2023, 11:54 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4070

lowerWaveReduce?

4100–4105

Can use C++17 binding

4121

Reuse the same getRegClass call

This revision is now accepted and ready to land.Jul 21 2023, 11:54 AM

As a follow up please do the constant folds in AMDGPUInstCombine. Also can you prepare another to introduce this in the lowering of divergent dynamic alloca?

Addressed comments

This revision was landed with ongoing or failed builds.Jul 23 2023, 9:11 PM
This revision was automatically updated to reflect the committed changes.

As a follow up please do the constant folds in AMDGPUInstCombine. Also can you prepare another to introduce this in the lowering of divergent dynamic alloca?

Constant folds: D156077. Will start looking into the lowering of divergent dynamic alloca.

I think this breaks the expensive-checks CI:
https://lab.llvm.org/buildbot/#/builders/16/builds/51955

Hello @steakhal, I am looking into it.

I think this breaks the expensive-checks CI:
https://lab.llvm.org/buildbot/#/builders/16/builds/51955

Hello @steakhal, I am looking into it.

Unless you think you've almost got it solved, can you revert the changes so the bots go back to green?

I think this breaks the expensive-checks CI:
https://lab.llvm.org/buildbot/#/builders/16/builds/51955

Hello @steakhal, I am looking into it.

Unless you think you've almost got it solved, can you revert the changes so the bots go back to green?

Fix : https://reviews.llvm.org/rGd163b76ce348516db7abe3a462ae4cb78f922c75

CC: @steakhal, @aaron.ballman

I think this breaks the expensive-checks CI:
https://lab.llvm.org/buildbot/#/builders/16/builds/51955

Hello @steakhal, I am looking into it.

Unless you think you've almost got it solved, can you revert the changes so the bots go back to green?

Fix : https://reviews.llvm.org/rGd163b76ce348516db7abe3a462ae4cb78f922c75

CC: @steakhal, @aaron.ballman

Thank you! I can confirm this resolved the issues I was seeing.