This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] New method to estimate register pressure
ClosedPublic

Authored by rampitec on Feb 27 2017, 7:07 PM.

Details

Summary

This change introduces new method to estimate register pressure in
GCNScheduler. Standard RPTracker gives huge error due to the following
reasons:

  1. It does not account for live-ins or live-outs if value is not used

in the region itself. That creates a huge error in a very common case
if there are a lot of live-thu registers.

  1. It does not properly count subregs.
  2. It assumes a register used as an input operand can be reused as an

output. This is not always possible by itself, this is not what RA
will finally do in many cases for various reasons not limited to RA's
inability to do so, and this is not so if the value is actually a
live-thu.

In addition we can now see clear separation between live-in pressure
which we cannot change with the scheduling and tentative pressure
which we can change.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.Feb 27 2017, 7:07 PM
vpykhtin added inline comments.Feb 28 2017, 5:55 AM
lib/Target/AMDGPU/GCNSchedStrategy.cpp
396 ↗(On Diff #89962)

Should this specify slot type explicitly? Something like LIS->getInstructionIndex(*begin()).getBaseIndex() ? I just don't know which slot will be returned by default. For example if it return dead slot you will get ranges that actually live after the instruction.

415 ↗(On Diff #89962)

When a map is indexed for non-existent key the value constructed with default constructor is inserted for this key. LaneBitmask::getNone ()is just a syntax sugar for LaneBitmask(). So you can just read LiveIns[Reg] here without initializing it.

472 ↗(On Diff #89962)

Again this is redundant. If there is no such key in the map the LiveRegs[Reg] will return LaneBitmask() on read.

473 ↗(On Diff #89962)

Please don't copy/paste map indexing. You can't assume compiler will always optimize this and the operation involves hash calculation on the key and array indexing (in the best case). I don't encorage such copy/paste indexing even for ordinary arrays.

rampitec added inline comments.Feb 28 2017, 8:18 AM
lib/Target/AMDGPU/GCNSchedStrategy.cpp
415 ↗(On Diff #89962)

LaneBitmask does not have default initializer value. I guess it worth nothing that LaneBitmask() should result in zero initialization.

473 ↗(On Diff #89962)

OK, I can get references. That will be more verbose but less work for compiler to optimize.

vpykhtin added inline comments.Feb 28 2017, 8:24 AM
lib/Target/AMDGPU/GCNSchedStrategy.cpp
415 ↗(On Diff #89962)

It has, though using C++11 syntax:

LineBitmask.h: line 77

private:
  Type Mask = 0;
};
rampitec added inline comments.Feb 28 2017, 8:31 AM
lib/Target/AMDGPU/GCNSchedStrategy.cpp
415 ↗(On Diff #89962)

Oh, thanks!

rampitec added inline comments.Feb 28 2017, 8:37 AM
lib/Target/AMDGPU/GCNSchedStrategy.cpp
396 ↗(On Diff #89962)

It actually returns first MI in a bundle if any:

Mi2IndexMap::const_iterator itr = mi2iMap.find(getBundleStart(MI));

I guess that is what I'm looking here for.

If you aren't going to use any of the RegisterPressure information, then you should disable register pressure tracking to improve compile time performance. Also, I think it would be nice if the queries could be abstracted so it was easy to switch between the standard ReigsterPressureTracker and the improved version you have implemented.

If you aren't going to use any of the RegisterPressure information, then you should disable register pressure tracking to improve compile time performance. Also, I think it would be nice if the queries could be abstracted so it was easy to switch between the standard ReigsterPressureTracker and the improved version you have implemented.

It is not a tracker, just the estimation. It least not just yet. During scheduling standard RPTracker still used for pressure diffs.

vpykhtin added inline comments.Feb 28 2017, 8:46 AM
lib/Target/AMDGPU/GCNSchedStrategy.cpp
396 ↗(On Diff #89962)

Right, but what is a slot type for this first bundle instruction?

rampitec updated this revision to Diff 90047.Feb 28 2017, 9:05 AM
rampitec marked 10 inline comments as done.

Removed redundant map references.
Added .getBaseIndex() for slot query.

rampitec updated this revision to Diff 90048.Feb 28 2017, 9:08 AM

Added full context diff.

vpykhtin accepted this revision.Feb 28 2017, 9:15 AM

LGTM.

This revision is now accepted and ready to land.Feb 28 2017, 9:15 AM
This revision was automatically updated to reflect the committed changes.