This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Move code sinking before structurizer
Closed, Public

Authored by piotr on Apr 22 2021, 3:23 PM.


Details

Reviewers

critson
arsenm
mareko

Commits

rG09fe84abb4ee: [AMDGPU] Move code sinking before structurizer

Summary

Moving the code sinking pass before the structurizer creates more sinking
opportunities.

The extra flow edges introduced by the structurizer can have adverse
effects on sinking, because the sinking pass prefers moving instructions
to blocks with unique predecessors and the structurizer destroys that
property in some cases.

A notable example is moving high-latency image instructions across kills.
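The unique-predecessor argument can be illustrated with a small C++17 toy model. This is not LLVM code: CFG, predecessors, canSinkMemoryRead and the two diamond helpers are hypothetical names. The legality rule is an assumed simplification of the legacy Sink pass. That pass is taken here to refuse to move an instruction that may read memory (an image sample, for instance) into a block whose unique predecessor is not the instruction's own block. The real pass also checks dominance, loop membership and exceptional terminators. The post-structurizer CFG, with an inserted block named Flow, is one plausible shape rather than actual StructurizeCFG output.

```cpp
// Hypothetical toy model, not LLVM code: a CFG maps each block name to its
// successors, and sinking legality is reduced to one simplified rule.
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using CFG = std::map<std::string, std::vector<std::string>>;

// All blocks that have an edge into Target.
std::set<std::string> predecessors(const CFG &G, const std::string &Target) {
  std::set<std::string> Preds;
  for (const auto &[Block, Succs] : G)
    for (const std::string &S : Succs)
      if (S == Target)
        Preds.insert(Block);
  return Preds;
}

// Simplified rule (an assumption modeled on the legacy Sink pass): an
// instruction that may read memory can only sink from Def into Target when
// Def is Target's unique predecessor.
bool canSinkMemoryRead(const CFG &G, const std::string &Def,
                       const std::string &Target) {
  assert(G.count(Target) == 1 && "unknown target block");
  std::set<std::string> Preds = predecessors(G, Target);
  return Preds.size() == 1 && *Preds.begin() == Def;
}

// The diamond from sink-image-sample.ll before structurization.
CFG diamondBeforeStructurizer() {
  return {{"main_body", {"endif1", "if1"}},
          {"if1", {"exit"}},
          {"endif1", {"exit"}},
          {"exit", {}}};
}

// One plausible structurized shape (hypothetical): a "Flow" block now sits
// between main_body and endif1, so endif1 loses main_body as its unique
// predecessor. Real StructurizeCFG output may differ.
CFG diamondAfterStructurizer() {
  return {{"main_body", {"if1", "Flow"}},
          {"if1", {"Flow"}},
          {"Flow", {"endif1", "exit"}},
          {"endif1", {"exit"}},
          {"exit", {}}};
}
```

For the diamond in sink-image-sample.ll, endif1 qualifies as a sink target before structurization. After the hypothetical Flow block is inserted, neither endif1 nor Flow qualifies, so the sample would stay in main_body.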

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

piotr created this revision. Apr 22 2021, 3:23 PM

Herald added subscribers: kerbowa, hiraditya, t-tye and 7 others. Apr 22 2021, 3:23 PM

piotr requested review of this revision. Apr 22 2021, 3:23 PM

Herald added a project: Restricted Project. Apr 22 2021, 3:23 PM

Herald added subscribers: llvm-commits, wdng.

Added test.

Herald added a subscriber: wenlei. Apr 22 2021, 3:26 PM

piotr added reviewers: critson, arsenm, mareko. Apr 22 2021, 3:28 PM

Harbormaster completed remote builds in B100391: Diff 339794. Apr 22 2021, 5:01 PM

Harbormaster completed remote builds in B100394: Diff 339799. Apr 22 2021, 5:16 PM

I don't remember why we even have this extra sink run. The regular middle-end optimizer runs it. I think this helped one edge case at one point. Do any tests regress if you just remove it entirely?

foad added a subscriber: foad. Apr 22 2021, 10:51 PM

I know it is a legacy pass, but I am convinced of its usefulness in our flow - both in real-world content and lit testing (e.g., no_skip_no_successors in skip-if-dead.ll).

In D101115#2711462, @piotr wrote:

I know it is a legacy pass, but I am convinced of its usefulness in our flow - both in real-world content and lit testing (e.g., no_skip_no_successors in skip-if-dead.ll).

I'm not worried about legacy, but whether it's redundant since it should have already run at this point

llvm/test/CodeGen/AMDGPU/multilevel-break.ll
Line 198: Did this drop the memory operand or something? This looks like a regression.
llvm/test/CodeGen/AMDGPU/sink-image-sample.ll
Line 4: Maybe add a comment explaining what this is for?

arsenm added inline comments. Apr 23 2021, 5:58 AM

llvm/test/CodeGen/AMDGPU/multilevel-break.ll
Line 198: Oh, these loads are already volatile and should have had glc set. Maybe this test was last regenerated before glc was emitted for volatile loads?

In D101115#2711964, @arsenm wrote:

I'm not worried about legacy, but whether it's redundant since it should have already run at this point

Some sinking is done as part of other passes, but I do not think this pass is set up to be run at any other point in our pass list.

piotr added inline comments. Apr 23 2021, 7:15 AM

llvm/test/CodeGen/AMDGPU/multilevel-break.ll
Line 198: Good spot - I get the glc generated even without my patch. I will pre-commit those glc changes separately so they do not pop up here.

Rebased, added comment in test.

In D101115#2711972, @piotr wrote:

Some sinking is done as part of other passes, but I do not think this pass is set up to be run at any other point in our pass list.

@foad helped me understand what you mean (thanks). Yes, this pass should be part of the opt pipeline, but currently we rely on it being run in the codegen pipeline. I do not know the history of that, but Mesa also needs it here.

Harbormaster completed remote builds in B100584: Diff 340052. Apr 23 2021, 10:24 AM

In D101115#2712423, @piotr wrote:

@foad helped me understand what you mean (thanks). Yes, this pass should be part of the opt pipeline, but currently we rely on it being run in the codegen pipeline. I do not know the history of that, but Mesa also needs it here.

The backend isn't responsible for adding all the general optimization passes, only cases where we might want them to clean up other lowering or late optimizations. I don't remember why this ended up here, but moving it is fine.

This revision is now accepted and ready to land. May 3 2021, 2:27 PM

This revision was landed with ongoing or failed builds. May 11 2021, 5:34 AM

Closed by commit rG09fe84abb4ee: [AMDGPU] Move code sinking before structurizer (authored by piotr).

This revision was automatically updated to reflect the committed changes.

piotr added a commit: rG09fe84abb4ee: [AMDGPU] Move code sinking before structurizer.

Herald added a subscriber: nikic. May 11 2021, 5:34 AM

foad mentioned this in D130170: [AMDGPU] Stop running IR code sinking pass. Jul 20 2022, 6:57 AM

Revision Contents

Path                                              Size
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp    2 lines
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll          50 lines
llvm/test/CodeGen/AMDGPU/loop_exit_with_xor.ll    8 lines
llvm/test/CodeGen/AMDGPU/multilevel-break.ll      7 lines
llvm/test/CodeGen/AMDGPU/sink-image-sample.ll     42 lines

Diff 344373

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

... (hunk in bool GCNPassConfig::addPreISel())
   addPass(createAMDGPULateCodeGenPreparePass());
   if (EnableAtomicOptimizations) {
     addPass(createAMDGPUAtomicOptimizerPass());
   }
 
   // FIXME: We need to run a pass to propagate the attributes when calls are
   // supported.
 
+  addPass(createSinkingPass());
   // Merge divergent exit nodes. StructurizeCFG won't recognize the multi-exit
   // regions formed by them.
   addPass(&AMDGPUUnifyDivergentExitNodesID);
   if (!LateCFGStructurize) {
     if (EnableStructurizerWorkarounds) {
       addPass(createFixIrreduciblePass());
       addPass(createUnifyLoopExitsPass());
     }
     addPass(createStructurizeCFGPass(false)); // true -> SkipUniformRegions
   }
-  addPass(createSinkingPass());
   addPass(createAMDGPUAnnotateUniformValues());
   if (!LateCFGStructurize) {
     addPass(createSIAnnotateControlFlowPass());
   }
   addPass(createLCSSAPass());
 
   return false;
 }
...

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

...
 ; GCN-O0-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O0-NEXT: Function Alias Analysis Results
 ; GCN-O0-NEXT: Flatten the CFG
 ; GCN-O0-NEXT: Dominator Tree Construction
 ; GCN-O0-NEXT: Post-Dominator Tree Construction
 ; GCN-O0-NEXT: Natural Loop Information
 ; GCN-O0-NEXT: Legacy Divergence Analysis
 ; GCN-O0-NEXT: AMDGPU IR late optimizations
+; GCN-O0-NEXT: Basic Alias Analysis (stateless AA impl)
+; GCN-O0-NEXT: Function Alias Analysis Results
+; GCN-O0-NEXT: Code sinking
+; GCN-O0-NEXT: Legacy Divergence Analysis
 ; GCN-O0-NEXT: Unify divergent function exit nodes
 ; GCN-O0-NEXT: Lazy Value Information Analysis
 ; GCN-O0-NEXT: Lower SwitchInst's to branches
 ; GCN-O0-NEXT: Dominator Tree Construction
 ; GCN-O0-NEXT: Natural Loop Information
 ; GCN-O0-NEXT: Convert irreducible control-flow into natural loops
 ; GCN-O0-NEXT: Fixup each natural loop to have a single exit block
 ; GCN-O0-NEXT: Post-Dominator Tree Construction
 ; GCN-O0-NEXT: Dominance Frontier Construction
 ; GCN-O0-NEXT: Detect single entry single exit regions
 ; GCN-O0-NEXT: Region Pass Manager
 ; GCN-O0-NEXT: Structurize control flow
-; GCN-O0-NEXT: Basic Alias Analysis (stateless AA impl)
-; GCN-O0-NEXT: Function Alias Analysis Results
-; GCN-O0-NEXT: Natural Loop Information
-; GCN-O0-NEXT: Code sinking
 ; GCN-O0-NEXT: Post-Dominator Tree Construction
+; GCN-O0-NEXT: Natural Loop Information
 ; GCN-O0-NEXT: Legacy Divergence Analysis
+; GCN-O0-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O0-NEXT: Function Alias Analysis Results
 ; GCN-O0-NEXT: Memory SSA
 ; GCN-O0-NEXT: AMDGPU Annotate Uniform Values
 ; GCN-O0-NEXT: SI annotate control flow
 ; GCN-O0-NEXT: Natural Loop Information
 ; GCN-O0-NEXT: LCSSA Verifier
 ; GCN-O0-NEXT: Loop-Closed SSA Form Pass
 ; GCN-O0-NEXT: CallGraph Construction
...
 ; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-NEXT: Function Alias Analysis Results
 ; GCN-O1-NEXT: Flatten the CFG
 ; GCN-O1-NEXT: Dominator Tree Construction
 ; GCN-O1-NEXT: Post-Dominator Tree Construction
 ; GCN-O1-NEXT: Natural Loop Information
 ; GCN-O1-NEXT: Legacy Divergence Analysis
 ; GCN-O1-NEXT: AMDGPU IR late optimizations
+; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl)
+; GCN-O1-NEXT: Function Alias Analysis Results
+; GCN-O1-NEXT: Code sinking
+; GCN-O1-NEXT: Legacy Divergence Analysis
 ; GCN-O1-NEXT: Unify divergent function exit nodes
 ; GCN-O1-NEXT: Lazy Value Information Analysis
 ; GCN-O1-NEXT: Lower SwitchInst's to branches
 ; GCN-O1-NEXT: Dominator Tree Construction
 ; GCN-O1-NEXT: Natural Loop Information
 ; GCN-O1-NEXT: Convert irreducible control-flow into natural loops
 ; GCN-O1-NEXT: Fixup each natural loop to have a single exit block
 ; GCN-O1-NEXT: Post-Dominator Tree Construction
 ; GCN-O1-NEXT: Dominance Frontier Construction
 ; GCN-O1-NEXT: Detect single entry single exit regions
 ; GCN-O1-NEXT: Region Pass Manager
 ; GCN-O1-NEXT: Structurize control flow
-; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl)
-; GCN-O1-NEXT: Function Alias Analysis Results
-; GCN-O1-NEXT: Natural Loop Information
-; GCN-O1-NEXT: Code sinking
 ; GCN-O1-NEXT: Post-Dominator Tree Construction
+; GCN-O1-NEXT: Natural Loop Information
 ; GCN-O1-NEXT: Legacy Divergence Analysis
+; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-NEXT: Function Alias Analysis Results
 ; GCN-O1-NEXT: Memory SSA
 ; GCN-O1-NEXT: AMDGPU Annotate Uniform Values
 ; GCN-O1-NEXT: SI annotate control flow
 ; GCN-O1-NEXT: Natural Loop Information
 ; GCN-O1-NEXT: LCSSA Verifier
 ; GCN-O1-NEXT: Loop-Closed SSA Form Pass
 ; GCN-O1-NEXT: CallGraph Construction
...
 ; GCN-O1-OPTS-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-OPTS-NEXT: Function Alias Analysis Results
 ; GCN-O1-OPTS-NEXT: Flatten the CFG
 ; GCN-O1-OPTS-NEXT: Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT: Post-Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT: Natural Loop Information
 ; GCN-O1-OPTS-NEXT: Legacy Divergence Analysis
 ; GCN-O1-OPTS-NEXT: AMDGPU IR late optimizations
+; GCN-O1-OPTS-NEXT: Basic Alias Analysis (stateless AA impl)
+; GCN-O1-OPTS-NEXT: Function Alias Analysis Results
+; GCN-O1-OPTS-NEXT: Code sinking
+; GCN-O1-OPTS-NEXT: Legacy Divergence Analysis
 ; GCN-O1-OPTS-NEXT: Unify divergent function exit nodes
 ; GCN-O1-OPTS-NEXT: Lazy Value Information Analysis
 ; GCN-O1-OPTS-NEXT: Lower SwitchInst's to branches
 ; GCN-O1-OPTS-NEXT: Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT: Natural Loop Information
 ; GCN-O1-OPTS-NEXT: Convert irreducible control-flow into natural loops
 ; GCN-O1-OPTS-NEXT: Fixup each natural loop to have a single exit block
 ; GCN-O1-OPTS-NEXT: Post-Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT: Dominance Frontier Construction
 ; GCN-O1-OPTS-NEXT: Detect single entry single exit regions
 ; GCN-O1-OPTS-NEXT: Region Pass Manager
 ; GCN-O1-OPTS-NEXT: Structurize control flow
-; GCN-O1-OPTS-NEXT: Basic Alias Analysis (stateless AA impl)
-; GCN-O1-OPTS-NEXT: Function Alias Analysis Results
-; GCN-O1-OPTS-NEXT: Natural Loop Information
-; GCN-O1-OPTS-NEXT: Code sinking
 ; GCN-O1-OPTS-NEXT: Post-Dominator Tree Construction
+; GCN-O1-OPTS-NEXT: Natural Loop Information
 ; GCN-O1-OPTS-NEXT: Legacy Divergence Analysis
+; GCN-O1-OPTS-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-OPTS-NEXT: Function Alias Analysis Results
 ; GCN-O1-OPTS-NEXT: Memory SSA
 ; GCN-O1-OPTS-NEXT: AMDGPU Annotate Uniform Values
 ; GCN-O1-OPTS-NEXT: SI annotate control flow
 ; GCN-O1-OPTS-NEXT: Natural Loop Information
 ; GCN-O1-OPTS-NEXT: LCSSA Verifier
 ; GCN-O1-OPTS-NEXT: Loop-Closed SSA Form Pass
 ; GCN-O1-OPTS-NEXT: CallGraph Construction
...
 ; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O2-NEXT: Function Alias Analysis Results
 ; GCN-O2-NEXT: Flatten the CFG
 ; GCN-O2-NEXT: Dominator Tree Construction
 ; GCN-O2-NEXT: Post-Dominator Tree Construction
 ; GCN-O2-NEXT: Natural Loop Information
 ; GCN-O2-NEXT: Legacy Divergence Analysis
 ; GCN-O2-NEXT: AMDGPU IR late optimizations
+; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
+; GCN-O2-NEXT: Function Alias Analysis Results
+; GCN-O2-NEXT: Code sinking
+; GCN-O2-NEXT: Legacy Divergence Analysis
 ; GCN-O2-NEXT: Unify divergent function exit nodes
 ; GCN-O2-NEXT: Lazy Value Information Analysis
 ; GCN-O2-NEXT: Lower SwitchInst's to branches
 ; GCN-O2-NEXT: Dominator Tree Construction
 ; GCN-O2-NEXT: Natural Loop Information
 ; GCN-O2-NEXT: Convert irreducible control-flow into natural loops
 ; GCN-O2-NEXT: Fixup each natural loop to have a single exit block
 ; GCN-O2-NEXT: Post-Dominator Tree Construction
 ; GCN-O2-NEXT: Dominance Frontier Construction
 ; GCN-O2-NEXT: Detect single entry single exit regions
 ; GCN-O2-NEXT: Region Pass Manager
 ; GCN-O2-NEXT: Structurize control flow
-; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
-; GCN-O2-NEXT: Function Alias Analysis Results
-; GCN-O2-NEXT: Natural Loop Information
-; GCN-O2-NEXT: Code sinking
 ; GCN-O2-NEXT: Post-Dominator Tree Construction
+; GCN-O2-NEXT: Natural Loop Information
 ; GCN-O2-NEXT: Legacy Divergence Analysis
+; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O2-NEXT: Function Alias Analysis Results
 ; GCN-O2-NEXT: Memory SSA
 ; GCN-O2-NEXT: AMDGPU Annotate Uniform Values
 ; GCN-O2-NEXT: SI annotate control flow
 ; GCN-O2-NEXT: Natural Loop Information
 ; GCN-O2-NEXT: LCSSA Verifier
 ; GCN-O2-NEXT: Loop-Closed SSA Form Pass
 ; GCN-O2-NEXT: CallGraph Construction
...
 ; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O3-NEXT: Function Alias Analysis Results
 ; GCN-O3-NEXT: Flatten the CFG
 ; GCN-O3-NEXT: Dominator Tree Construction
 ; GCN-O3-NEXT: Post-Dominator Tree Construction
 ; GCN-O3-NEXT: Natural Loop Information
 ; GCN-O3-NEXT: Legacy Divergence Analysis
 ; GCN-O3-NEXT: AMDGPU IR late optimizations
+; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
+; GCN-O3-NEXT: Function Alias Analysis Results
+; GCN-O3-NEXT: Code sinking
+; GCN-O3-NEXT: Legacy Divergence Analysis
 ; GCN-O3-NEXT: Unify divergent function exit nodes
 ; GCN-O3-NEXT: Lazy Value Information Analysis
 ; GCN-O3-NEXT: Lower SwitchInst's to branches
 ; GCN-O3-NEXT: Dominator Tree Construction
 ; GCN-O3-NEXT: Natural Loop Information
 ; GCN-O3-NEXT: Convert irreducible control-flow into natural loops
 ; GCN-O3-NEXT: Fixup each natural loop to have a single exit block
 ; GCN-O3-NEXT: Post-Dominator Tree Construction
 ; GCN-O3-NEXT: Dominance Frontier Construction
 ; GCN-O3-NEXT: Detect single entry single exit regions
 ; GCN-O3-NEXT: Region Pass Manager
 ; GCN-O3-NEXT: Structurize control flow
-; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
-; GCN-O3-NEXT: Function Alias Analysis Results
-; GCN-O3-NEXT: Natural Loop Information
-; GCN-O3-NEXT: Code sinking
 ; GCN-O3-NEXT: Post-Dominator Tree Construction
+; GCN-O3-NEXT: Natural Loop Information
 ; GCN-O3-NEXT: Legacy Divergence Analysis
+; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O3-NEXT: Function Alias Analysis Results
 ; GCN-O3-NEXT: Memory SSA
 ; GCN-O3-NEXT: AMDGPU Annotate Uniform Values
 ; GCN-O3-NEXT: SI annotate control flow
 ; GCN-O3-NEXT: Natural Loop Information
 ; GCN-O3-NEXT: LCSSA Verifier
 ; GCN-O3-NEXT: Loop-Closed SSA Form Pass
 ; GCN-O3-NEXT: CallGraph Construction
...

llvm/test/CodeGen/AMDGPU/loop_exit_with_xor.ll

 ; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
 
 ; Where the mask of lanes wanting to exit the loop on this iteration is not
 ; obviously already masked by exec (in this case, the xor with -1 inserted by
 ; control flow annotation), then lower control flow must insert an S_AND_B64
 ; with exec.
 
 ; GCN-LABEL: {{^}}needs_and:
-; GCN: s_xor_b64 [[REG1:[^ ,]*]], {{[^ ,]*, -1$}}
-; GCN: s_and_b64 [[REG2:[^ ,]*]], exec, [[REG1]]
-; GCN: s_or_b64 [[REG3:[^ ,]*]], [[REG2]],
-; GCN: s_andn2_b64 exec, exec, [[REG3]]
 
+; GCN: s_or_b64 exec, exec, [[REG1:[^ ,]*]]
+; GCN: s_andn2_b64 exec, exec, [[REG2:[^ ,]*]]
+; GCN: s_or_b64 [[REG2:[^ ,]*]], [[REG1:[^ ,]*]], [[REG2:[^ ,]*]]
+; GCN: s_or_b64 exec, exec, [[REG2:[^ ,]*]]
 define void @needs_and(i32 %arg) {
 entry:
   br label %loop
 
 loop:
   %tmp23phi = phi i32 [ %tmp23, %endif ], [ 0, %entry ]
   %tmp23 = add nuw i32 %tmp23phi, 1
   %tmp27 = icmp ult i32 %arg, %tmp23
...

llvm/test/CodeGen/AMDGPU/multilevel-break.ll

...
 ; GCN-NEXT: s_or_b64 exec, exec, s[8:9]
 ; GCN-NEXT: s_and_b64 s[8:9], exec, s[6:7]
 ; GCN-NEXT: s_or_b64 s[4:5], s[8:9], s[4:5]
 ; GCN-NEXT: s_andn2_b64 exec, exec, s[4:5]
 ; GCN-NEXT: s_cbranch_execz BB0_1
 ; GCN-NEXT: BB0_4: ; %LOOP
 ; GCN-NEXT: ; Parent Loop BB0_2 Depth=1
 ; GCN-NEXT: ; => This Inner Loop Header: Depth=2
-; GCN-NEXT: v_mov_b32_e32 v1, v0
-; GCN-NEXT: v_add_i32_e32 v0, vcc, 1, v1
-; GCN-NEXT: v_cmp_lt_i32_e32 vcc, v1, v4
+; GCN-NEXT: v_cmp_lt_i32_e32 vcc, v0, v4
 ; GCN-NEXT: s_or_b64 s[2:3], s[2:3], exec
 ; GCN-NEXT: s_or_b64 s[6:7], s[6:7], exec
 ; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc
 ; GCN-NEXT: s_cbranch_execz BB0_3
 ; GCN-NEXT: ; %bb.5: ; %ENDIF
 ; GCN-NEXT: ; in Loop: Header=BB0_4 Depth=2
-; GCN-NEXT: v_cmp_ne_u32_e32 vcc, v5, v0
+; GCN-NEXT: v_add_i32_e32 v0, vcc, 1, v0
 ; GCN-NEXT: s_andn2_b64 s[2:3], s[2:3], exec
+; GCN-NEXT: v_cmp_ne_u32_e32 vcc, v5, v0
 ; GCN-NEXT: s_andn2_b64 s[6:7], s[6:7], exec
 ; GCN-NEXT: s_and_b64 s[10:11], vcc, exec
 ; GCN-NEXT: s_or_b64 s[6:7], s[6:7], s[10:11]
 ; GCN-NEXT: s_branch BB0_3
 ; GCN-NEXT: BB0_6: ; %IF
 ; GCN-NEXT: s_endpgm
 main_body:
   br label %LOOP.outer
...
 ; GCN-NEXT: s_or_b64 s[0:1], s[6:7], s[0:1]
 ; GCN-NEXT: s_andn2_b64 s[4:5], s[4:5], exec
 ; GCN-NEXT: s_and_b64 s[6:7], s[8:9], exec
 ; GCN-NEXT: s_or_b64 s[4:5], s[4:5], s[6:7]
 ; GCN-NEXT: s_andn2_b64 exec, exec, s[0:1]
 ; GCN-NEXT: s_cbranch_execz BB1_9
 ; GCN-NEXT: BB1_2: ; %bb1
 ; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
 ; GCN-NEXT: buffer_load_dword v1, off, s[0:3], 0 glc
 ; GCN-NEXT: s_waitcnt vmcnt(0)
 ; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 1, v1
 ; GCN-NEXT: s_mov_b64 s[6:7], -1
 ; GCN-NEXT: s_and_b64 vcc, exec, vcc
 ; GCN-NEXT: ; implicit-def: $sgpr8_sgpr9
 ; GCN-NEXT: s_mov_b64 s[10:11], -1
 ; GCN-NEXT: s_cbranch_vccnz BB1_6
 ; GCN-NEXT: ; %bb.3: ; %LeafBlock1
...

llvm/test/CodeGen/AMDGPU/sink-image-sample.ll

This file was added.

; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s
; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s

; Test that image.sample instruction is sunk across the branch and not left in the first block. Since the kill may terminate the shader there might be no need for sampling the image.

; GCN-LABEL: {{^}}sinking_img_sample:
; GCN-NOT: image_sample
; GCN: branch
; GCN: image_sample
; GCN: exp null

define amdgpu_ps float @sinking_img_sample() {
main_body:
  %i = call <3 x float> @llvm.amdgcn.image.sample.2d.v3f32.f32(i32 7, float undef, float undef, <8 x i32> undef, <4 x i32> undef, i1 false, i32 0, i32 0)
  br i1 undef, label %endif1, label %if1

if1: ; preds = %main_body
  call void @llvm.amdgcn.kill(i1 false) #4
  br label %exit

endif1: ; preds = %main_body
  %i22 = extractelement <3 x float> %i, i32 2
  %i23 = call nsz arcp contract float @llvm.fma.f32(float %i22, float 0.000000e+00, float 0.000000e+00) #1
  br label %exit

exit: ; preds = %endif1, %if1
  %i24 = phi float [ undef, %if1 ], [ %i23, %endif1 ]
  ret float %i24
}
; Function Attrs: nounwind readonly willreturn
declare <3 x float> @llvm.amdgcn.image.sample.2d.v3f32.f32(i32 immarg, float, float, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #3

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare float @llvm.fma.f32(float, float, float) #2

; Function Attrs: nounwind
declare void @llvm.amdgcn.kill(i1) #4

attributes #1 = { nounwind readnone }
attributes #2 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #3 = { nounwind readonly willreturn }
attributes #4 = { nounwind }
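The check lines in this test rely on FileCheck's ordering rules. Because GCN-NOT: image_sample sits between the label and GCN: branch, no image_sample may appear before the branch. A rough C++17 model of that behavior follows. It makes several simplifications: plain substring matching instead of regexes, the {{^}} anchor dropped, and LABEL handled like an ordinary CHECK (real FileCheck uses labels to partition the input). Directive, fileCheckLite and sinkImageSampleChecks are hypothetical names, not FileCheck internals.

```cpp
// Hypothetical, simplified model of FileCheck ordering (not the real tool):
// plain substring matching, no regexes, and LABEL handled like CHECK.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Directive {
  enum Kind { Label, Check, Not } K;
  std::string Pattern;
};

// The directives from sink-image-sample.ll, with the {{^}} anchor dropped.
std::vector<Directive> sinkImageSampleChecks() {
  return {{Directive::Label, "sinking_img_sample:"},
          {Directive::Not, "image_sample"},
          {Directive::Check, "branch"},
          {Directive::Check, "image_sample"},
          {Directive::Check, "exp null"}};
}

// Positive directives must match in order. Each pending NOT pattern must not
// occur entirely within the gap between the previous match and the next one.
bool fileCheckLite(const std::string &Output,
                   const std::vector<Directive> &Dirs) {
  std::size_t Pos = 0; // end of the previous positive match
  std::vector<std::string> PendingNots;
  for (const Directive &D : Dirs) {
    assert(!D.Pattern.empty() && "empty pattern");
    if (D.K == Directive::Not) {
      PendingNots.push_back(D.Pattern);
      continue;
    }
    std::size_t Found = Output.find(D.Pattern, Pos);
    if (Found == std::string::npos)
      return false;
    for (const std::string &N : PendingNots) {
      std::size_t NotPos = Output.find(N, Pos);
      if (NotPos != std::string::npos && NotPos + N.size() <= Found)
        return false;
    }
    PendingNots.clear();
    Pos = Found + D.Pattern.size();
  }
  // NOT directives after the last positive match apply to the rest of Output.
  for (const std::string &N : PendingNots)
    if (Output.find(N, Pos) != std::string::npos)
      return false;
  return true;
}
```

Consider a made-up listing in which image_sample follows s_cbranch_execz: fileCheckLite returns true for it. A listing that samples before the branch returns false. Neither listing is real llc output.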