This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Enable Atomic Optimizer and Default to Iterative Scan Strategy.
ClosedPublic

Authored by pravinjagtap on Jun 11 2023, 7:47 AM.

Details

Summary

D147408 implemented a new iterative approach for scan computations
and added a new flag, amdgpu-atomic-optimizer-strategy, which
defaults to DPP.

The changeset https://github.com/GPUOpen-Drivers/llpc/pull/2506
adapts LLPC to these changes.

This patch enables the atomic optimizer pass and selects the iterative
approach for scan computations by default for the compute pipeline.

Diff Detail

Event Timeline

pravinjagtap created this revision.Jun 11 2023, 7:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 11 2023, 7:47 AM
pravinjagtap requested review of this revision.Jun 11 2023, 7:47 AM
pravinjagtap edited the summary of this revision. (Show Details)Jun 11 2023, 7:49 AM
pravinjagtap added reviewers: cdevadas, ruiling.
arsenm added inline comments.Jun 12 2023, 4:10 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
278–281

The old option can be dropped, none should be a value for the strategy

pravinjagtap added inline comments.Jun 12 2023, 8:51 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
278–281

> The old option can be dropped, none should be a value for the strategy

I think dropping the old option (amdgpu-atomic-optimizations) will break LLPC at the moment.

I think the right way to do this is to transfer the responsibility of enabling the pass to amdgpu-atomic-optimizer-strategy and keep the old option amdgpu-atomic-optimizations until LLPC removes all usage of it.

@foad

foad added a comment.Jun 14 2023, 2:34 AM

Code LGTM. I'm not sure how far we want to go with disabling this optimization in lit tests that are not specifically testing the optimization itself.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
278–281

Right, please keep it until LLPC has been updated again, then you can remove it.

llvm/test/CodeGen/AMDGPU/GlobalISel/mubuf-global.ll
1–2

_Maybe_ disable atomic optimization in this test, since it obscures what we are testing for?

llvm/test/CodeGen/AMDGPU/dag-divergence-atomic.ll
2

_Maybe_ disable atomic optimization in this test, since it obscures what we are testing for?

llvm/test/CodeGen/AMDGPU/gds-allocation.ll
2

_Maybe_ disable atomic optimization in this test, since it obscures what we are testing for?

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
260

This is a bit unfortunate. Is there anything we can do about it? Does cycle info even have an API for updating it?

llvm/test/CodeGen/AMDGPU/should-not-hoist-set-inactive.ll
2

_Maybe_ disable atomic optimization in this test, since it obscures what we are testing for?

pravinjagtap added inline comments.Jun 14 2023, 4:27 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
278–281

> Right, please keep it until LLPC has been updated again, then you can remove it.

So we will have three options for the strategy: { DPP, Iterative, None }. Here None will disable the pass (to be used in lit tests), and DPP/Iterative will enable the pass with the given strategy.

The amdgpu-atomic-optimizer-strategy option will default to Iterative (not None, as suggested by @arsenm), and then we can drop the old option amdgpu-atomic-optimizations.

Is this fine, or is there another way of handling this situation?
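The option interaction being proposed can be sketched as a minimal model. All names here are illustrative, not the actual AMDGPUTargetMachine.cpp code, which uses LLVM's cl::opt machinery:

```cpp
#include <cassert>

// Hypothetical sketch of the scheme discussed above: the scan-strategy
// option gains a None value that disables the pass, replacing the old
// boolean amdgpu-atomic-optimizations switch.
enum class ScanStrategy { DPP, Iterative, None };

// The strategy option alone decides whether the pass runs.
inline bool shouldRunAtomicOptimizer(ScanStrategy S) {
  return S != ScanStrategy::None;
}

// Proposed default: Iterative (not None), so the pass is on by default.
inline ScanStrategy defaultStrategy() { return ScanStrategy::Iterative; }
```

Under this model, lit tests that want the old codegen would simply pass the None strategy instead of a separate disable flag.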

foad added inline comments.Jun 14 2023, 6:54 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
278–281

Sounds fine but in this patch please just change the default for amdgpu-atomic-optimizations from false to true. The other option changes can be done in a separate patch.

Addressed review comments

pravinjagtap retitled this revision from [AMDGPU] Enable Atomic Optimizer and Default to Iterative Scan Strategy. to [AMDGPU] Enable Atomic Optimizer by default..Jun 14 2023, 7:53 AM
pravinjagtap edited the summary of this revision. (Show Details)
pravinjagtap marked 7 inline comments as done.
foad accepted this revision.Jun 14 2023, 7:59 AM

LGTM, with or without changing the default strategy.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
278–281

Sorry I was not clear: It is also fine to change the default strategy in this patch, if you want to. But please do not remove the amdgpu-atomic-optimizations option in this patch.

This revision is now accepted and ready to land.Jun 14 2023, 7:59 AM

Switching back to iterative strategy

pravinjagtap retitled this revision from [AMDGPU] Enable Atomic Optimizer by default. to [AMDGPU] Enable Atomic Optimizer and Default to Iterative Scan Strategy..Jun 14 2023, 9:15 AM
pravinjagtap edited the summary of this revision. (Show Details)

@jplehr @ronlieb

This patch seems to have had catastrophic results on the build times in the libc project. See the buildbot where build times have gone from ~15 minutes to over an hour: https://lab.llvm.org/staging/#/builders/247. Reverting this patch causes the time for me to build a test to go from 55 seconds to 5 seconds. I can try to get pass timing later.

> @jplehr @ronlieb
>
> This patch seems to have had catastrophic results on the build times in the libc project. See the buildbot where build times have gone from ~15 minutes to over an hour: https://lab.llvm.org/staging/#/builders/247. Reverting this patch causes the time for me to build a test to go from 55 seconds to 5 seconds. I can try to get pass timing later.

If this is only affecting AMDGPU and libc, is it a problem? This optimization is very important. Of course, it would be great if it can be sped up.

jhuber6 added a subscriber: ye-luo.Jun 18 2023, 3:38 PM

>> @jplehr @ronlieb
>>
>> This patch seems to have had catastrophic results on the build times in the libc project. See the buildbot where build times have gone from ~15 minutes to over an hour: https://lab.llvm.org/staging/#/builders/247. Reverting this patch causes the time for me to build a test to go from 55 seconds to 5 seconds. I can try to get pass timing later.
>
> If this is only affecting AMDGPU and libc, is it a problem? This optimization is very important. Of course, it would be great if it can be sped up.

I don't know if it's just a libc problem; it probably just highlights the problem, since it's a considerably large application with a good number of atomic operations thanks to RPC calls. Maybe @ye-luo can let me know if he's had any compile-time regressions recently. In any case, a ~10x increase in link time is a little steep for a single optimization. I can always disable this pass in the libc build, but I think other large applications are going to see similar changes, so we need to think of a way to do this that isn't so slow.

pravinjagtap added a comment.EditedJun 18 2023, 11:16 PM

>>> @jplehr @ronlieb
>>>
>>> This patch seems to have had catastrophic results on the build times in the libc project. See the buildbot where build times have gone from ~15 minutes to over an hour: https://lab.llvm.org/staging/#/builders/247. Reverting this patch causes the time for me to build a test to go from 55 seconds to 5 seconds. I can try to get pass timing later.
>>
>> If this is only affecting AMDGPU and libc, is it a problem? This optimization is very important. Of course, it would be great if it can be sped up.
>
> I don't know if it's just a libc problem; it probably just highlights the problem, since it's a considerably large application with a good number of atomic operations thanks to RPC calls. Maybe @ye-luo can let me know if he's had any compile-time regressions recently. In any case, a ~10x increase in link time is a little steep for a single optimization. I can always disable this pass in the libc build, but I think other large applications are going to see similar changes, so we need to think of a way to do this that isn't so slow.

The assertions newly added in D147408 at https://github.com/llvm/llvm-project/blob/1ebbbf1614cfdbf6d78f4f2a665cdea9cbb2beb8/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp#L774 and https://github.com/llvm/llvm-project/blob/1ebbbf1614cfdbf6d78f4f2a665cdea9cbb2beb8/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp#L802 seem to be causing this problem. Verifying the dominator tree with assertions like this is very expensive. The alternative to this check is -verify-dom-info, which seems to be broken at the moment. Let me investigate.

The time for running libc tests is recovered without these assertions.
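A common way to handle this kind of cost, sketched here under assumed names (this is not the actual pass code; LLVM gates such checks behind its EXPENSIVE_CHECKS build setting), is to keep the full verification out of plain assert() on every transformation:

```cpp
#include <cassert>

// Minimal stand-in for a dominator tree whose full verification is costly.
struct FakeDomTree {
  bool Valid = true;
  bool verify() const { return Valid; } // stands in for the expensive walk
};

// Gate the costly check so ordinary asserts builds skip it; only builds
// that opt in to expensive checks pay for full verification.
#ifdef EXPENSIVE_CHECKS
#define EXPENSIVE_ASSERT(X) assert(X)
#else
#define EXPENSIVE_ASSERT(X) ((void)0)
#endif

inline bool transformAndCheck(FakeDomTree &DT) {
  // ...a pass would mutate the CFG and incrementally update DT here...
  EXPENSIVE_ASSERT(DT.verify()); // compiled out in normal asserts builds
  return true;
}
```

With this pattern the per-transformation verification cost disappears from regular +Asserts builds while remaining available for deep-checking configurations.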

From the pass,

/// This pass optimizes atomic operations by using a single lane of a wavefront
/// to perform the atomic operation, thus reducing contention on that memory
/// location.

Could someone expand on this? The pass seems to take an atomic operation that lowers to a single instruction and replace it with a loop over active lanes, each of which calls that same instruction. How/when is that better?

foad added a comment.Jun 19 2023, 4:01 AM

> The pass seems to take an atomic operation that lowers to a single instruction and replace it with a loop over active lanes, each of which calls that same instruction.

No - it takes an atomic operation that is executed by (we assume) many lanes, and replaces it with an atomic that is executed by only a single lane, because it is inside some kind of "if (laneid==0)" check.

To make this work you might have to fettle the inputs or outputs of the atomic op, to make it work "as if" it was executed many times by many lanes. E.g. for an atomic add you have to do a plus-reduction of the inputs to the many-lane atomic adds, to get the value to pass into the single-lane atomic add. That's where the loop comes in: it is one way of calculating the plus-reduction. But since it is only doing ALU work, it is still supposed to be better than running a whole bunch of serialised atomic memory operations.
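The shape of the rewrite can be illustrated with a small single-threaded model. The lane count, function names, and use of std::exclusive_scan here are illustrative; the real pass performs the reduction on wavefront lanes in IR:

```cpp
#include <atomic>
#include <numeric>
#include <vector>

// Naive lowering: one serialized atomic RMW per active lane.
std::vector<int> naiveOldValues(const std::vector<int> &Lanes, int Init) {
  std::atomic<int> Mem{Init};
  std::vector<int> Old;
  for (int V : Lanes)
    Old.push_back(Mem.fetch_add(V)); // N memory atomics
  return Old;
}

// Optimized lowering: an ALU-only exclusive prefix sum across lanes, then a
// single atomic add of the wavefront total (the "if (laneid == 0)" lane).
std::vector<int> optimizedOldValues(const std::vector<int> &Lanes, int Init) {
  std::vector<int> Prefix(Lanes.size());
  std::exclusive_scan(Lanes.begin(), Lanes.end(), Prefix.begin(), 0);
  int Total = Prefix.back() + Lanes.back();

  std::atomic<int> Mem{Init};
  int Base = Mem.fetch_add(Total); // one memory atomic for the whole wave

  std::vector<int> Old;
  for (int P : Prefix)
    Old.push_back(Base + P); // reconstruct each lane's "old value"
  return Old;
}
```

Both lowerings leave the memory location with the same final value, and each lane observes a consistent "old value", but the optimized form issues one memory atomic instead of one per lane.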

I only have logs for the Kokkos build handy: this optimization was the second most expensive in total time, with a ~17% share.

> @jplehr @ronlieb
>
> This patch seems to have had catastrophic results on the build times in the libc project. See the buildbot where build times have gone from ~15 minutes to over an hour: https://lab.llvm.org/staging/#/builders/247. Reverting this patch causes the time for me to build a test to go from 55 seconds to 5 seconds. I can try to get pass timing later.

Hello @jhuber6, did D153261 recover the regression?

>> @jplehr @ronlieb
>>
>> This patch seems to have had catastrophic results on the build times in the libc project. See the buildbot where build times have gone from ~15 minutes to over an hour: https://lab.llvm.org/staging/#/builders/247. Reverting this patch causes the time for me to build a test to go from 55 seconds to 5 seconds. I can try to get pass timing later.
>
> Hello @jhuber6, did D153261 recover the regression?

Yes, I reverted the disabling on Tuesday.