This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Introduce a flag to control enable/disable instruction sink pass
Needs Review (Public)

Authored by cfang on Oct 8 2020, 10:03 PM.


Details

Reviewers

kerbowa
rampitec
sameerds
msearles
arsenm

Summary

The instruction sink pass tries to sink instructions to the lowest possible point in order
to bring them closer to their users. However, it has no heuristic for the actual
benefit of doing so. For example, sinking an instruction may lengthen the live ranges
of the values it uses.
We have observed a greater than 10% performance benefit on some applications when this pass is
turned off completely. A flag to control this pass will allow us to tune performance. It will also help us
isolate potential correctness issues that may be introduced by changes from the LLVM community.
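
The live-range effect described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `Inst`, `maxLive`, `beforeSinking`, and `afterSinking` are hypothetical names invented for illustration, the instruction sequence is made up, and nothing here is taken from the LLVM Sink pass itself. A value is treated as live from its definition to its last use in a straight-line sequence.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy instruction: defines at most one value ("" means none)
// and reads zero or more previously defined values.
struct Inst {
  std::string def;
  std::vector<std::string> uses;
};

// Maximum number of values live at the same time in a straight-line sequence.
// A value is live across the gap after instruction i if it was defined at or
// before i and is still read by some later instruction.
std::size_t maxLive(const std::vector<Inst> &seq) {
  std::map<std::string, std::size_t> defAt, lastUse;
  for (std::size_t i = 0; i < seq.size(); ++i) {
    if (!seq[i].def.empty())
      defAt[seq[i].def] = i;
    for (const std::string &u : seq[i].uses)
      lastUse[u] = i;
  }
  std::size_t best = 0;
  for (std::size_t i = 0; i + 1 < seq.size(); ++i) {
    std::size_t live = 0;
    for (const auto &entry : defAt) {
      auto it = lastUse.find(entry.first);
      if (it != lastUse.end() && entry.second <= i && it->second > i)
        ++live;
    }
    best = std::max(best, live);
  }
  return best;
}

// c = a + b is computed early; a and b die immediately, only c stays live.
std::vector<Inst> beforeSinking() {
  return {{"a", {}}, {"b", {}}, {"c", {"a", "b"}},
          {"x", {}}, {"", {"x"}}, {"", {"c"}}};
}

// c = a + b is sunk next to its user; a and b now stay live across the
// unrelated work on x.
std::vector<Inst> afterSinking() {
  return {{"a", {}}, {"b", {}}, {"x", {}},
          {"", {"x"}}, {"c", {"a", "b"}}, {"", {"c"}}};
}
```

In this toy sequence, `maxLive(beforeSinking())` is 2 and `maxLive(afterSinking())` is 3: sinking trades one live result for two live operands. Whether that trade hurts in practice depends on the target and the surrounding code, which is the missing heuristic the summary refers to.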

Diff Detail

Event Timeline

cfang created this revision. Oct 8 2020, 10:03 PM

Herald added a project: Restricted Project. Oct 8 2020, 10:03 PM

Herald added subscribers: hiraditya, t-tye, tpr and 5 others.

cfang requested review of this revision. Oct 8 2020, 10:03 PM

Herald added a subscriber: wdng. Oct 8 2020, 10:03 PM

cfang added reviewers: sameerds, msearles. Oct 8 2020, 10:33 PM

Just adding another flag isn't really fixing anything

This revision now requires changes to proceed. Oct 9 2020, 12:27 PM

In D89095#2322548, @arsenm wrote:

Just adding another flag isn't really fixing anything

Right, but we are providing a way to do performance tuning as well as bug triage.

I agree with Matt here. You should be able to do experiments locally. Perhaps sinking should be disabled entirely, or perhaps sinking should be improved to take register liveness into account.

In D89095#2325401, @nhaehnle wrote:

I agree with Matt here. You should be able to do experiments locally. Perhaps sinking should be disabled entirely, or perhaps sinking should be improved to take register liveness into account.

Right, my point of view is that the Sink pass should be re-evaluated. Since it was introduced into the AMDGPU pipeline many years ago, there must by now be unexpected dependencies on this pass.

Local experiments are absolutely necessary, but not sufficient. I propose introducing a flag that would enable a broad range of experiments (from CQE, for example). Then we can decide:

if it is mostly negative, we can disable it completely;
if it is mostly positive, we can keep it as it is;
if it is in the middle, we can make an effort to enhance it.

Why does this require a commit upstream?

In D89095#2325539, @nhaehnle wrote:

Why does this require a commit upstream?

So everyone can use it. Also, if my understanding is correct, every optimization pass should have an associated flag to turn it on or off.

Assuming this is irrelevant now

This revision now requires review to proceed. Sep 28 2022, 2:29 PM

Herald added a project: Restricted Project. Sep 28 2022, 2:29 PM

Herald added subscribers: kosarev, foad, arsenm.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	8 lines

Diff 297120

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

  (12 lines not shown)
      cl::init(true),
      cl::Hidden);

  static cl::opt<bool> EnableStructurizerWorkarounds(
      "amdgpu-enable-structurizer-workarounds",
      cl::desc("Enable workarounds for the StructurizeCFG pass"), cl::init(true),
      cl::Hidden);

+ static cl::opt<bool> EnableInstructionSink(
+     "amdgpu-enable-instruction-sink",
+     cl::desc("Enable instruction sink pass"), cl::init(true),
+     cl::Hidden);
+
  extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
    // Register the target
    RegisterTargetMachine<R600TargetMachine> X(getTheAMDGPUTarget());
    RegisterTargetMachine<GCNTargetMachine> Y(getTheGCNTarget());

    PassRegistry *PR = PassRegistry::getPassRegistry();
    initializeR600ClauseMergePassPass(*PR);
    initializeR600ControlFlowFinalizerPass(*PR);
  (24 lines not shown)
    addPass(&AMDGPUUnifyDivergentExitNodesID);
    if (!LateCFGStructurize) {
      if (EnableStructurizerWorkarounds) {
        addPass(createFixIrreduciblePass());
        addPass(createUnifyLoopExitsPass());
      }
      addPass(createStructurizeCFGPass(false)); // true -> SkipUniformRegions
    }
-   addPass(createSinkingPass());
+   if (EnableInstructionSink)
+     addPass(createSinkingPass());
    addPass(createAMDGPUAnnotateUniformValues());
    if (!LateCFGStructurize) {
      addPass(createSIAnnotateControlFlowPass());
    }
    addPass(createLCSSAPass());

    return false;
  }
  (12 lines not shown)
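
For readers without an LLVM checkout, the effect of the change on the pass order can be sketched without any LLVM dependency. This is an illustrative model only: `buildPipelineSketch` and `hasPass` are hypothetical helpers, the boolean parameter stands in for the `cl::opt<bool>` added in the diff, and the strings are informal labels rather than registered pass names. The surrounding conditions on `LateCFGStructurize` and `EnableStructurizerWorkarounds` are left out for brevity.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of the tail of the pass list touched by the diff.
// enableInstructionSink plays the role of the EnableInstructionSink option.
std::vector<std::string> buildPipelineSketch(bool enableInstructionSink) {
  std::vector<std::string> passes;
  passes.push_back("structurize-cfg");
  if (enableInstructionSink) // mirrors: if (EnableInstructionSink)
    passes.push_back("sink");
  passes.push_back("annotate-uniform-values");
  passes.push_back("annotate-control-flow");
  passes.push_back("lcssa");
  return passes;
}

// Returns true if the named label appears in the pass list.
bool hasPass(const std::vector<std::string> &passes, const std::string &name) {
  return std::find(passes.begin(), passes.end(), name) != passes.end();
}
```

With the default of `true` the list is unchanged from before the patch; with `false` only the sinking step is dropped and the other passes keep their relative order. Assuming a build with this patch applied, the option would be set like any other hidden `cl::opt`, for example `-amdgpu-enable-instruction-sink=false` on the `llc` command line or via `-mllvm` from clang.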