This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Lower regbanks reassign threshold to 15000
ClosedPublic

Authored by rampitec on Apr 20 2021, 4:00 PM.

Details

Reviewers

msearles
foad

Commits

rGf9d0d0d7e01f: [AMDGPU] Lower regbanks reassign threshold to 15000

Summary

Let it work on very small kernels only. Measurements showed that
the performance benefit is not worth the compile time.
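
As a rough illustration of what the threshold controls, here is a minimal sketch of such a gate. It assumes the pass compares the function's virtual register count against the value of amdgpu-regbanks-reassign-threshold and skips the function when the count is larger. The function name, the constant name, and the exact boundary condition are hypothetical and are not taken from the LLVM source.

```cpp
// Hypothetical stand-in for the pass's early-exit check; not the LLVM code.
// The default mirrors the new value of amdgpu-regbanks-reassign-threshold.
constexpr unsigned kDefaultVRegThreshold = 15000;

// Returns true when the function is small enough for the pass to run.
// Assumption: a count equal to the threshold is still accepted.
bool shouldRunRegBankReassign(unsigned NumVirtRegs,
                              unsigned Threshold = kDefaultVRegThreshold) {
  return NumVirtRegs <= Threshold;
}
```

Under these assumptions, a kernel with 50000 virtual registers was processed with the old default of 100000 and is skipped with the new default of 15000.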

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rampitec created this revision. Apr 20 2021, 4:00 PM

Herald added subscribers: kerbowa, hiraditya, t-tye and 7 others. Apr 20 2021, 4:00 PM

rampitec requested review of this revision. Apr 20 2021, 4:00 PM

Herald added a project: Restricted Project. Apr 20 2021, 4:00 PM

Herald added a subscriber: wdng.

LGTM

This revision is now accepted and ready to land. Apr 20 2021, 4:49 PM

Have you looked into changing this to operate on regunits?

Harbormaster completed remote builds in B99830: Diff 339023. Apr 20 2021, 5:49 PM

In D100904#2703654, @arsenm wrote:

Have you looked into changing this to operate on regunits?

I still don't understand how it would work with reg units, but even then it would not change the complexity of the algorithm, only the cost of a single pass.

Is there a measurable benefit on any kernels?

In D100904#2704460, @foad wrote:

Is there a measurable benefit on any kernels?

Not that I know.

This revision was landed with ongoing or failed builds. Apr 21 2021, 8:34 AM

Closed by commit rGf9d0d0d7e01f: [AMDGPU] Lower regbanks reassign threshold to 15000 (authored by rampitec).

This revision was automatically updated to reflect the committed changes.

rampitec added a commit: rGf9d0d0d7e01f: [AMDGPU] Lower regbanks reassign threshold to 15000.

In D100904#2705174, @rampitec wrote:

In D100904#2704460, @foad wrote:

Is there a measurable benefit on any kernels?

Not that I know.

Then maybe delete the pass, or at least disable it by default?

In D100904#2707688, @foad wrote:

In D100904#2705174, @rampitec wrote:

In D100904#2704460, @foad wrote:

Is there a measurable benefit on any kernels?

Not that I know.

Then maybe delete the pass, or at least disable it by default?

Both are options, as is enabling it only at -O3. However, I still think we do not have enough data. JBTW, what does the gfx benchmark suite tell us?

I feel we should probably discuss the optimal solution more than the current one. I have a gut feeling that the problem the pass is trying to solve is a flavor of the graph coloring problem, where we have to color the interference graph with K colors, K being the number of register banks. We may need to rebuild the interference graph in this pass and then attempt the coloring.
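
To make the K-coloring formulation in the comment above concrete, here is a minimal sketch of a greedy coloring over an adjacency-list graph. It is not the algorithm used by the existing pass and was not proposed in this exact form in the review. The function name, the graph representation, and the fixed node order are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Greedy K-coloring of an undirected graph given as adjacency lists.
// Nodes stand for registers, edges for pairs that must not share a bank,
// and colors for banks. Returns one color in [0, K) per node, or
// std::nullopt if the greedy order gets stuck. Getting stuck does not
// prove that no K-coloring exists. Assumes K > 0.
std::optional<std::vector<int>>
greedyColor(const std::vector<std::vector<int>> &Adj, int K) {
  std::vector<int> Color(Adj.size(), -1);
  for (std::size_t Node = 0; Node < Adj.size(); ++Node) {
    // Mark the colors already taken by colored neighbors.
    std::vector<bool> Used(K, false);
    for (int Neighbor : Adj[Node])
      if (Color[Neighbor] >= 0)
        Used[Color[Neighbor]] = true;
    // Pick the lowest free color, if any.
    int C = 0;
    while (C < K && Used[C])
      ++C;
    if (C == K)
      return std::nullopt;
    Color[Node] = C;
  }
  return Color;
}
```

For a triangle, this sketch succeeds with K = 3 and fails with K = 2, which is the expected behavior for a complete graph on three nodes.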

In D100904#2707767, @madhur13490 wrote:

I feel we should probably discuss the optimal solution more than the current one. I have a gut feeling that the problem the pass is trying to solve is a flavor of the graph coloring problem, where we have to color the interference graph with K colors, K being the number of register banks. We may need to rebuild the interference graph in this pass and then attempt the coloring.

It would be much better to implement it inside the RA itself, but only if it is a real thing. The current pass can eliminate most of the conflicts. I still want to see confirmation that it improves performance anywhere, and to quantify it.

JBTW, what does the gfx benchmark suite tell us?

We don't have an easy way to run lots of benchmarks. I tried running GFXBench 5.0.0 on my own machine with GFX10 hardware, and I couldn't detect any difference from enabling/disabling this pass. The results seemed to be stable within about +/- 0.1%.

In D100904#2712202, @foad wrote:

JBTW, what does the gfx benchmark suite tell us?

We don't have an easy way to run lots of benchmarks. I tried running GFXBench 5.0.0 on my own machine with GFX10 hardware, and I couldn't detect any difference from enabling/disabling this pass. The results seemed to be stable within about +/- 0.1%.

Maybe we really should just drop it then.

Revision Contents

Path: llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp

Size: 2 lines

Diff 339250

llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp

[46 preceding lines not shown]
 static cl::opt<unsigned> VerifyStallCycles("amdgpu-verify-regbanks-reassign",
   cl::desc("Verify stall cycles in the regbanks reassign pass"),
   cl::value_desc("0|1|2"),
   cl::init(0), cl::Hidden);

 // Threshold to keep compile time reasonable.
 static cl::opt<unsigned> VRegThresh("amdgpu-regbanks-reassign-threshold",
   cl::desc("Max number of vregs to run the regbanks reassign pass"),
-  cl::init(100000), cl::Hidden);
+  cl::init(15000), cl::Hidden);

 #define DEBUG_TYPE "amdgpu-regbanks-reassign"

 #define NUM_VGPR_BANKS 4
 #define NUM_SGPR_BANKS 8
 #define NUM_BANKS (NUM_VGPR_BANKS + NUM_SGPR_BANKS)
 #define SGPR_BANK_OFFSET NUM_VGPR_BANKS
 #define VGPR_BANK_MASK 0xf
[837 following lines not shown]
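
The bank constants in the diff context can be read as a simple numbering scheme. The sketch below is one plausible reading, not the pass's actual code: it assumes a VGPR's bank is its index modulo the VGPR bank count, that SGPRs are banked in pairs and numbered after the VGPR banks, and that a bank set is a bitmask with one bit per bank. All function names are hypothetical; only the constant values come from the diff.

```cpp
// Constant values copied from the diff context; names restyled.
constexpr unsigned NumVGPRBanks = 4;
constexpr unsigned NumSGPRBanks = 8;
constexpr unsigned SGPRBankOffset = NumVGPRBanks;
constexpr unsigned VGPRBankMask = 0xf;

// Assumption: VGPR number N lives in bank N % 4.
constexpr unsigned vgprBank(unsigned N) { return N % NumVGPRBanks; }

// Assumption: SGPRs are banked in pairs, with bank numbers placed
// after the VGPR banks.
constexpr unsigned sgprBank(unsigned N) {
  return (N / 2) % NumSGPRBanks + SGPRBankOffset;
}

// One bit per bank, so the four VGPR banks occupy exactly the low
// four bits, which matches VGPRBankMask == 0xf.
constexpr unsigned bankBit(unsigned Bank) { return 1u << Bank; }
```

Under these assumptions, v1 and v5 share a bank and can conflict when read by the same instruction, while v1 and v2 cannot.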