This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Enable Atomic Optimizer and Default to Iterative Scan Strategy.
ClosedPublic

Authored by pravinjagtap on Jun 11 2023, 7:47 AM.

Details

Summary

D147408 implemented a new iterative approach for scan computations
and added a new flag, amdgpu-atomic-optimizer-strategy, which
defaults to DPP.

The changeset https://github.com/GPUOpen-Drivers/llpc/pull/2506
adapts LLPC to these changes.

This patch enables the atomic optimizer pass and selects the iterative
approach for scan computations by default for the compute pipeline.

Diff Detail

Event Timeline

pravinjagtap created this revision.Jun 11 2023, 7:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 11 2023, 7:47 AM
pravinjagtap requested review of this revision.Jun 11 2023, 7:47 AM
pravinjagtap edited the summary of this revision. (Show Details)Jun 11 2023, 7:49 AM
pravinjagtap added reviewers: cdevadas, ruiling.
arsenm added inline comments.Jun 12 2023, 4:10 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
278–281

The old option can be dropped, none should be a value for the strategy

pravinjagtap added inline comments.Jun 12 2023, 8:51 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
278–281

> The old option can be dropped, none should be a value for the strategy

I think dropping the old option (amdgpu-atomic-optimizations) will break LLPC at the moment.

I think the right way to do this is to transfer the responsibility of enabling the pass to amdgpu-atomic-optimizer-strategy and keep the old option amdgpu-atomic-optimizations until LLPC removes all usage of it.

@foad

foad added a comment.Jun 14 2023, 2:34 AM

Code LGTM. I'm not sure how far we want to go with disabling this optimization in lit tests that are not specifically testing the optimization itself.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
278–281

Right, please keep it until LLPC has been updated again, then you can remove it.

llvm/test/CodeGen/AMDGPU/GlobalISel/mubuf-global.ll
1–2

_Maybe_ disable atomic optimization in this test, since it obscures what we are testing for?

llvm/test/CodeGen/AMDGPU/dag-divergence-atomic.ll
2

_Maybe_ disable atomic optimization in this test, since it obscures what we are testing for?

llvm/test/CodeGen/AMDGPU/gds-allocation.ll
2

_Maybe_ disable atomic optimization in this test, since it obscures what we are testing for?

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
260

This is a bit unfortunate. Is there anything we can do about it? Does cycle info even have an API for updating it?

llvm/test/CodeGen/AMDGPU/should-not-hoist-set-inactive.ll
2

_Maybe_ disable atomic optimization in this test, since it obscures what we are testing for?

pravinjagtap added inline comments.Jun 14 2023, 4:27 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
278–281

> Right, please keep it until LLPC has been updated again, then you can remove it.

So we will have three options for the strategy: { DPP, Iterative, None }. Here None will disable the pass (to be used in lit tests), and DPP/Iterative will enable the pass with the given strategy.

The amdgpu-atomic-optimizer-strategy option will default to Iterative (not None, as suggested by @arsenm), and then we can drop the old option amdgpu-atomic-optimizations.

Is this fine, or is there another way of handling this situation?
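The option interaction being proposed can be sketched as a minimal model. All names here are illustrative, not the actual AMDGPUTargetMachine.cpp code, which uses LLVM's cl::opt machinery:

```cpp
#include <cassert>

// Hypothetical sketch of the scheme discussed above: the scan-strategy
// option gains a None value that disables the pass, replacing the old
// boolean amdgpu-atomic-optimizations switch.
enum class ScanStrategy { DPP, Iterative, None };

// The strategy option alone decides whether the pass runs.
inline bool shouldRunAtomicOptimizer(ScanStrategy S) {
  return S != ScanStrategy::None;
}

// Proposed default: Iterative (not None), so the pass is on by default.
inline ScanStrategy defaultStrategy() { return ScanStrategy::Iterative; }
```

Under this model, lit tests that want the old codegen would simply pass the None strategy instead of a separate disable flag.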

foad added inline comments.Jun 14 2023, 6:54 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
278–281

Sounds fine but in this patch please just change the default for amdgpu-atomic-optimizations from false to true. The other option changes can be done in a separate patch.

Addressed review comments

pravinjagtap retitled this revision from [AMDGPU] Enable Atomic Optimizer and Default to Iterative Scan Strategy. to [AMDGPU] Enable Atomic Optimizer by default..Jun 14 2023, 7:53 AM
pravinjagtap edited the summary of this revision. (Show Details)
pravinjagtap marked 7 inline comments as done.
foad accepted this revision.Jun 14 2023, 7:59 AM

LGTM, with or without changing the default strategy.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
278–281

Sorry I was not clear: It is also fine to change the default strategy in this patch, if you want to. But please do not remove the amdgpu-atomic-optimizations option in this patch.

This revision is now accepted and ready to land.Jun 14 2023, 7:59 AM

Switching back to iterative strategy

pravinjagtap retitled this revision from [AMDGPU] Enable Atomic Optimizer by default. to [AMDGPU] Enable Atomic Optimizer and Default to Iterative Scan Strategy..Jun 14 2023, 9:15 AM
pravinjagtap edited the summary of this revision. (Show Details)

@jplehr @ronlieb

This patch seems to have had catastrophic results on the build times in the libc project. See the buildbot where build times have gone from ~15 minutes to over an hour: https://lab.llvm.org/staging/#/builders/247. Reverting this patch causes the time for me to build a test to go from 55 seconds to 5 seconds. I can try to get pass timing later.

> @jplehr @ronlieb
>
> This patch seems to have had catastrophic results on the build times in the libc project. See the buildbot where build times have gone from ~15 minutes to over an hour: https://lab.llvm.org/staging/#/builders/247. Reverting this patch causes the time for me to build a test to go from 55 seconds to 5 seconds. I can try to get pass timing later.

If this is only affecting AMDGPU and libc, is it a problem? This optimization is very important. Of course, it would be great if it can be sped up.

jhuber6 added a subscriber: ye-luo.Jun 18 2023, 3:38 PM

>> @jplehr @ronlieb
>>
>> This patch seems to have had catastrophic results on the build times in the libc project. See the buildbot where build times have gone from ~15 minutes to over an hour: https://lab.llvm.org/staging/#/builders/247. Reverting this patch causes the time for me to build a test to go from 55 seconds to 5 seconds. I can try to get pass timing later.
>
> If this is only affecting AMDGPU and libc, is it a problem? This optimization is very important. Of course, it would be great if it can be sped up.

I don't know if it's just a libc problem; it probably just highlights the problem, since it's a considerably large application with a good number of atomic operations thanks to RPC calls. Maybe @ye-luo can let me know if he's had any compile-time regressions recently. In any case, a ~10x increase in link time is a little steep for a single optimization. I can always disable this pass in the libc build, but I think other large applications are going to see similar changes, so we need to think of a way to do this that isn't so slow.

pravinjagtap added a comment.EditedJun 18 2023, 11:16 PM

>>> @jplehr @ronlieb
>>>
>>> This patch seems to have had catastrophic results on the build times in the libc project. See the buildbot where build times have gone from ~15 minutes to over an hour: https://lab.llvm.org/staging/#/builders/247. Reverting this patch causes the time for me to build a test to go from 55 seconds to 5 seconds. I can try to get pass timing later.
>>
>> If this is only affecting AMDGPU and libc, is it a problem? This optimization is very important. Of course, it would be great if it can be sped up.
>
> I don't know if it's just a libc problem; it probably just highlights the problem, since it's a considerably large application with a good number of atomic operations thanks to RPC calls. Maybe @ye-luo can let me know if he's had any compile-time regressions recently. In any case, a ~10x increase in link time is a little steep for a single optimization. I can always disable this pass in the libc build, but I think other large applications are going to see similar changes, so we need to think of a way to do this that isn't so slow.

The assertions newly added in D147408 at https://github.com/llvm/llvm-project/blob/1ebbbf1614cfdbf6d78f4f2a665cdea9cbb2beb8/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp#L774 and https://github.com/llvm/llvm-project/blob/1ebbbf1614cfdbf6d78f4f2a665cdea9cbb2beb8/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp#L802 seem to be causing this problem. Verifying the dominator tree with assertions like this is very expensive. The alternative to this check is -verify-dom-info, which seems to be broken at the moment. Let me investigate.

The time for running libc tests is recovered without these assertions.
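A common way to handle this kind of cost, sketched here under assumed names (this is not the actual pass code; LLVM gates such checks behind its EXPENSIVE_CHECKS build setting), is to keep the full verification out of plain assert() on every transformation:

```cpp
#include <cassert>

// Minimal stand-in for a dominator tree whose full verification is costly.
struct FakeDomTree {
  bool Valid = true;
  bool verify() const { return Valid; } // stands in for the expensive walk
};

// Gate the costly check so ordinary asserts builds skip it; only builds
// that opt in to expensive checks pay for full verification.
#ifdef EXPENSIVE_CHECKS
#define EXPENSIVE_ASSERT(X) assert(X)
#else
#define EXPENSIVE_ASSERT(X) ((void)0)
#endif

inline bool transformAndCheck(FakeDomTree &DT) {
  // ...a pass would mutate the CFG and incrementally update DT here...
  EXPENSIVE_ASSERT(DT.verify()); // compiled out in normal asserts builds
  return true;
}
```

With this pattern the per-transformation verification cost disappears from regular +Asserts builds while remaining available for deep-checking configurations.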

From the pass,

/// This pass optimizes atomic operations by using a single lane of a wavefront
/// to perform the atomic operation, thus reducing contention on that memory
/// location.

Could someone expand on this? The pass seems to take an atomic operation that lowers to a single instruction and replace it with a loop over active lanes, each of which calls that same instruction. How/when is that better?

foad added a comment.Jun 19 2023, 4:01 AM

> The pass seems to take an atomic operation that lowers to a single instruction and replace it with a loop over active lanes, each of which calls that same instruction.

No - it takes an atomic operation that is executed by (we assume) many lanes, and replaces it with an atomic that is executed by only a single lane, because it is inside some kind of "if (laneid==0)" check.

To make this work you might have to fettle the inputs or outputs of the atomic op, to make it work "as if" it was executed many times by many lanes. E.g. for an atomic add you have to do a plus-reduction of the inputs to the many-lane atomic adds, to get the value to pass into the single-lane atomic add. That's where the loop comes in: it is one way of calculating the plus-reduction. But since it is only doing ALU work, it is still supposed to be better than running a whole bunch of serialised atomic memory operations.
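The shape of the rewrite can be illustrated with a small single-threaded model. The lane count, function names, and use of std::exclusive_scan here are illustrative; the real pass performs the reduction on wavefront lanes in IR:

```cpp
#include <atomic>
#include <numeric>
#include <vector>

// Naive lowering: one serialized atomic RMW per active lane.
std::vector<int> naiveOldValues(const std::vector<int> &Lanes, int Init) {
  std::atomic<int> Mem{Init};
  std::vector<int> Old;
  for (int V : Lanes)
    Old.push_back(Mem.fetch_add(V)); // N memory atomics
  return Old;
}

// Optimized lowering: an ALU-only exclusive prefix sum across lanes, then a
// single atomic add of the wavefront total (the "if (laneid == 0)" lane).
std::vector<int> optimizedOldValues(const std::vector<int> &Lanes, int Init) {
  std::vector<int> Prefix(Lanes.size());
  std::exclusive_scan(Lanes.begin(), Lanes.end(), Prefix.begin(), 0);
  int Total = Prefix.back() + Lanes.back();

  std::atomic<int> Mem{Init};
  int Base = Mem.fetch_add(Total); // one memory atomic for the whole wave

  std::vector<int> Old;
  for (int P : Prefix)
    Old.push_back(Base + P); // reconstruct each lane's "old value"
  return Old;
}
```

Both lowerings leave the memory location with the same final value, and each lane observes a consistent "old value", but the optimized form issues one memory atomic instead of one per lane.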

I only have logs for the Kokkos build handy: this optimization was the second most expensive in total time, with a ~17% share.

> @jplehr @ronlieb
>
> This patch seems to have had catastrophic results on the build times in the libc project. See the buildbot where build times have gone from ~15 minutes to over an hour: https://lab.llvm.org/staging/#/builders/247. Reverting this patch causes the time for me to build a test to go from 55 seconds to 5 seconds. I can try to get pass timing later.

Hello @jhuber6, did D153261 recover the regression?

>> @jplehr @ronlieb
>>
>> This patch seems to have had catastrophic results on the build times in the libc project. See the buildbot where build times have gone from ~15 minutes to over an hour: https://lab.llvm.org/staging/#/builders/247. Reverting this patch causes the time for me to build a test to go from 55 seconds to 5 seconds. I can try to get pass timing later.
>
> Hello @jhuber6, did D153261 recover the regression?

Yes, I reverted the disabling on Tuesday.