This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Turn register pressure estimation into forward tracker
ClosedPublic

Authored by rampitec on May 11 2017, 10:22 AM.

Details

Summary

This factors register pressure estimation mechanism from the
GCNSchedStrategy into the forward tracker to unify interface
with other strategies and expose it to other interested phases.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.May 11 2017, 10:22 AM
rampitec updated this revision to Diff 98721.May 11 2017, 9:51 PM

Added LastTrackedMI assignment.

rampitec updated this revision to Diff 98723.May 11 2017, 11:29 PM

Split advance() into advanceBefore(MI) and advanceTo(MI).
The tracker does two things: tracks pressure and returns live reg set. A proper live-in set when advancing to an instruction is after we have committed all dead registers before moving to the new instruction, but before we have added new defs from that new instruction.
If we just call advance(), which sequentially calls advanceBefore() and advanceTo(), we would always get a correct live reg set at the MI argument, but that is not the same as live-in set before this instruction. Calling just advanceBefore() leaves it in a live-in state.
In addition added helper to clear maximum register pressure, which is also needed to be reset in between of these two steps if we are tracking the block and want to split it into regions.

vpykhtin added inline comments.May 12 2017, 7:18 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
305 ↗(On Diff #98723)

I think this should be the responsibility of the caller to account valid instructions, it would look unexpected that reset moves somewhere else.

lib/Target/AMDGPU/GCNRegPressure.h
150 ↗(On Diff #98723)

we have some inconsistency here: upward tracker's "reset" moves to the point after the MI, "recede" moves to the point before the MI, but downward reset and advance moves at.

rampitec added inline comments.May 12 2017, 8:02 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
305 ↗(On Diff #98723)

Caller will always need to do it, otherwise reset will just crash. Then advance itself skips debug instructions, so they are consistent. If a caller resets tracker to the debug instruction and then just calls advance increasing the iterator it will not notice any difference. There is nothing good in letting caller to easily crash, there is no additional benefit of it besides the crash.

lib/Target/AMDGPU/GCNRegPressure.h
150 ↗(On Diff #98723)

I do not think this is an inconsistency if we think about advance/recede as a two step move. We really want tracker to do two things, count pressure and get live-ins/live-outs. So reset just does the first step allowing us to collect required live set. If not that advance could just call advance one time to move at the argument instruction.

rampitec added inline comments.May 12 2017, 8:36 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
305 ↗(On Diff #98723)

BTW, reset does not record any state besides live set, and live set is the same if you skip debug or not.

arsenm added inline comments.May 12 2017, 10:43 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
313–314 ↗(On Diff #98723)

Should this be using something like isTransient (as it is now looks inappropriate) to avoid IMPLICIT_DEF from impacting the pressure estimate?

317 ↗(On Diff #98723)

extra space

359 ↗(On Diff #98723)

std::max, not sure how this builds for you

vpykhtin added inline comments.May 12 2017, 10:44 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
359 ↗(On Diff #98723)

this is overloaded max for GCNRegPressure

arsenm added inline comments.May 12 2017, 10:48 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
359 ↗(On Diff #98723)

Probably should name it something else

rampitec added inline comments.May 12 2017, 10:49 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
313–314 ↗(On Diff #98723)

I believe these defs should be accounted towards pressure. RA will finally allocate something for it, and in our case it is better to overestimate pressure rather than underestimate it. Anyway, this is not a functional change, it basically factors out an existing algorithm into a common interface. I do not want to change its behavior with this change.

rampitec updated this revision to Diff 98803.May 12 2017, 10:52 AM
rampitec marked an inline comment as done.

Removed extra space.

vpykhtin added inline comments.May 12 2017, 10:52 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
359 ↗(On Diff #98723)

I thought to choose better name for it as it doesn't compare two pressures and returning larger but instead constructs new pressure using maximum values from both, what would be better name here?

arsenm added inline comments.May 12 2017, 10:54 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
313–314 ↗(On Diff #98723)

implicit_def will use whatever register is available, I think usually (always?) VGPR0, so it shouldn't count. Copy/phi is more questionable.

rampitec updated this revision to Diff 98804.May 12 2017, 10:54 AM

Uploaded correct patch.

rampitec added inline comments.May 12 2017, 10:58 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
313–314 ↗(On Diff #98723)

It may use available registers, but it will use them. If there were no registers available before, then at implicit def we will have an increase. It probably deserves some experiments, but shall be a separate change from the interface.

vpykhtin edited edge metadata.May 15 2017, 7:12 AM

If I don't mistake downward tracker cannot work on arbitrary instruction order, so let's change it's interface so that it tracks current instruction and move it, likewise like llvm standard tracker, that is reset(MI), advance()

If I don't mistake downward tracker cannot work on arbitrary instruction order, so let's change it's interface so that it tracks current instruction and move it, likewise like llvm standard tracker, that is reset(MI), advance()

In general it cannot work on an arbitrary order, but it can be used as a probe to schedule next arbitrary instruction and then reset to the previous state.
Then in D33117 I'm using advanceBefore to cross basic block boundary and I do not really want to slow down advance method by checking for the iterator end condition which is needed relatively seldom.

rampitec added inline comments.May 15 2017, 9:21 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
359 ↗(On Diff #98723)

buildMax() maybe?

If I don't mistake downward tracker cannot work on arbitrary instruction order, so let's change it's interface so that it tracks current instruction and move it, likewise like llvm standard tracker, that is reset(MI), advance()

In general it cannot work on an arbitrary order, but it can be used as a probe to schedule next arbitrary instruction and then reset to the previous state.
Then in D33117 I'm using advanceBefore to cross basic block boundary and I do not really want to slow down advance method by checking for the iterator end condition which is needed relatively seldom.

Why are you sure it would work on arbitrary out-of-original-LIS-constructed-order instruction?

If I don't mistake downward tracker cannot work on arbitrary instruction order, so let's change it's interface so that it tracks current instruction and move it, likewise like llvm standard tracker, that is reset(MI), advance()

In general it cannot work on an arbitrary order, but it can be used as a probe to schedule next arbitrary instruction and then reset to the previous state.
Then in D33117 I'm using advanceBefore to cross basic block boundary and I do not really want to slow down advance method by checking for the iterator end condition which is needed relatively seldom.

Why are you sure it would work on arbitrary out-of-original-LIS-constructed-order instruction?

If I advance just a single out of order instruction it will give you not the correct RP at it, but a correct RP diff as a result of such single instruction scheduling. Honestly you cannot expect more from a probe without giving a whole order of intermediate instructions.

This looks a bit as using carkeys to open a bottle. Should we have single instruction RP diff returning function without changing any state?

This looks a bit as using carkeys to open a bottle. Should we have single instruction RP diff returning function without changing any state?

We probably should, but it probably does not belong to this change.

This looks a bit as using carkeys to open a bottle. Should we have single instruction RP diff returning function without changing any state?

We probably should, but it probably does not belong to this change.

I would not introduce interface in a hope of such usage. This is really counterintuitive. Lets do the RP diff function instead when we need it.

This looks a bit as using carkeys to open a bottle. Should we have single instruction RP diff returning function without changing any state?

We probably should, but it probably does not belong to this change.

I would not introduce interface in a hope of such usage. This is really counterintuitive. Lets do the RP diff function instead when we need it.

OK, think about D33117, GCNSchedStrategy.cpp near line 491. To implement this in the proposed interface I will need to check for the end of the basic block in the advanceBefore(), get the successor, check that is the only successor... I'm really trying to make it faster now ;)

I would not introduce interface in a hope of such usage. This is really counterintuitive. Lets do the RP diff function instead when we need it.

OK, think about D33117, GCNSchedStrategy.cpp near line 491. To implement this in the proposed interface I will need to check for the end of the basic block in the advanceBefore(), get the successor, check that is the only successor... I'm really trying to make it faster now ;)

... and then if there is no single successor I would need to return a failure, turning advance to bool and checking it everywhere... Plus pass it a flag if it needs to cross BB boundary or not. Err, that will be a bad interface.

I really don't understand it all. Why not just have reset(MachineInstr *, LiveRegSet ) and do a reset on the next BB with liveset from previous BB? It looks like your code does it. Another question: your reset does skip debugs, but the caller doesn't know about it and can supply unskipped iterator to the next advance.

I really don't understand it all. Why not just have reset(MachineInstr *, LiveRegSet ) and do a reset on the next BB with liveset from previous BB? It looks like your code does it. Another question: your reset does skip debugs, but the caller doesn't know about it and can supply unskipped iterator to the next advance.

Reset with liveset will mean copying of the map. This copy is inevitable in other places because that is really a backup copy, but here we do not need anything like that.
The unskipped iterator is not an issue, advance() will immediately return on a debug value. Caller will increment iterator and call advance again(). No problem.

You can do the reset with move - this would not require copy.

If you make advance moving its own iterator then it can skip debugs itself.

Making advance checking end pointer wouldn't increase the number of end checks, as you check it on every iteration of advance loop. You can return bool from advance and break the loop on it. Anyway such checks are cheap.

rampitec updated this revision to Diff 99031.May 15 2017, 11:26 AM

Speedup the tracker:

  1. Replace MRI.reg_nodbg_empty() with LIS.hasInterval().
  2. Delete empty regs from live set when tracking.
rampitec updated this revision to Diff 99095.May 15 2017, 7:42 PM

Changed forward RP tracker from stateless to statefull as requested by reviewer.
Improved speed by replacing MRI.reg_nodbg_empty() to LIS.hasInterval().

rampitec marked 3 inline comments as done.May 15 2017, 7:43 PM

I'm a bit confused with all of these advance... but lets submit and improve this later.

vpykhtin accepted this revision.May 16 2017, 8:00 AM
This revision is now accepted and ready to land.May 16 2017, 8:00 AM
This revision was automatically updated to reflect the committed changes.