This factors register pressure estimation mechanism from the
GCNSchedStrategy into the forward tracker to unify interface
with other strategies and expose it to other interested phases.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Split advance() into advanceBefore(MI) and advanceTo(MI).
The tracker does two things: tracks pressure and returns live reg set. A proper live-in set when advancing to an instruction is after we have committed all dead registers before moving to the new instruction, but before we have added new defs from that new instruction.
If we just call advance(), which sequentially calls advanceBefore() and advanceTo(), we would always get a correct live reg set at the MI argument, but that is not the same as live-in set before this instruction. Calling just advanceBefore() leaves it in a live-in state.
In addition added helper to clear maximum register pressure, which is also needed to be reset in between of these two steps if we are tracking the block and want to split it into regions.
lib/Target/AMDGPU/GCNRegPressure.cpp | ||
---|---|---|
305 ↗ | (On Diff #98723) | I think this should be the responsibility of the caller to account valid instructions, it would look unexpected that reset moves somewhere else. |
lib/Target/AMDGPU/GCNRegPressure.h | ||
150 ↗ | (On Diff #98723) | we have some inconsistency here: upward tracker's "reset" moves to the point after the MI, "recede" moves to the point before the MI, but downward reset and advance moves at. |
lib/Target/AMDGPU/GCNRegPressure.cpp | ||
---|---|---|
305 ↗ | (On Diff #98723) | Caller will always need to do it, otherwise reset will just crash. Then advance itself skips debug instructions, so they are consistent. If a caller resets tracker to the debug instruction and then just calls advance increasing the iterator it will not notice any difference. There is nothing good in letting caller to easily crash, there is no additional benefit of it besides the crash. |
lib/Target/AMDGPU/GCNRegPressure.h | ||
150 ↗ | (On Diff #98723) | I do not think this is an inconsistency if we think about advance/recede as a two step move. We really want tracker to do two things, count pressure and get live-ins/live-outs. So reset just does the first step allowing us to collect required live set. If not that advance could just call advance one time to move at the argument instruction. |
lib/Target/AMDGPU/GCNRegPressure.cpp | ||
---|---|---|
305 ↗ | (On Diff #98723) | BTW, reset does not record any state besides live set, and live set is the same if you skip debug or not. |
lib/Target/AMDGPU/GCNRegPressure.cpp | ||
---|---|---|
313–314 ↗ | (On Diff #98723) | Should this be using something like isTransient (as it is now looks inappropriate) to avoid IMPLICIT_DEF from impacting the pressure estimate? |
317 ↗ | (On Diff #98723) | extra space |
359 ↗ | (On Diff #98723) | std::max, not sure how this builds for you |
lib/Target/AMDGPU/GCNRegPressure.cpp | ||
---|---|---|
359 ↗ | (On Diff #98723) | this is overloaded max for GCNRegPressure |
lib/Target/AMDGPU/GCNRegPressure.cpp | ||
---|---|---|
359 ↗ | (On Diff #98723) | Probably should name it something else |
lib/Target/AMDGPU/GCNRegPressure.cpp | ||
---|---|---|
313–314 ↗ | (On Diff #98723) | I believe these defs should be accounted towards pressure. RA will finally allocate something for it, and in our case it is better to overestimate pressure rather than underestimate it. Anyway, this is not a functional change, it basically factors out an existing algorithm into a common interface. I do not want to change its behavior with this change. |
lib/Target/AMDGPU/GCNRegPressure.cpp | ||
---|---|---|
359 ↗ | (On Diff #98723) | I thought to choose better name for it as it doesn't compare two pressures and returning larger but instead constructs new pressure using maximum values from both, what would be better name here? |
lib/Target/AMDGPU/GCNRegPressure.cpp | ||
---|---|---|
313–314 ↗ | (On Diff #98723) | implicit_def will use whatever register is available, I think usually (always?) VGPR0, so it shouldn't count. Copy/phi is more questionable. |
lib/Target/AMDGPU/GCNRegPressure.cpp | ||
---|---|---|
313–314 ↗ | (On Diff #98723) | It may use available registers, but it will use them. If there were no registers available before, then at implicit def we will have an increase. It probably deserves some experiments, but shall be a separate change from the interface. |
If I don't mistake downward tracker cannot work on arbitrary instruction order, so let's change it's interface so that it tracks current instruction and move it, likewise like llvm standard tracker, that is reset(MI), advance()
In general it cannot work on an arbitrary order, but it can be used as a probe to schedule next arbitrary instruction and then reset to the previous state.
Then in D33117 I'm using advanceBefore to cross basic block boundary and I do not really want to slow down advance method by checking for the iterator end condition which is needed relatively seldom.
lib/Target/AMDGPU/GCNRegPressure.cpp | ||
---|---|---|
359 ↗ | (On Diff #98723) | buildMax() maybe? |
Why are you sure it would work on arbitrary out-of-original-LIS-constructed-order instruction?
If I advance just a single out of order instruction it will give you not the correct RP at it, but a correct RP diff as a result of such single instruction scheduling. Honestly you cannot expect more from a probe without giving a whole order of intermediate instructions.
This looks a bit as using carkeys to open a bottle. Should we have single instruction RP diff returning function without changing any state?
I would not introduce interface in a hope of such usage. This is really counterintuitive. Lets do the RP diff function instead when we need it.
OK, think about D33117, GCNSchedStrategy.cpp near line 491. To implement this in the proposed interface I will need to check for the end of the basic block in the advanceBefore(), get the successor, check that is the only successor... I'm really trying to make it faster now ;)
... and then if there is no single successor I would need to return a failure, turning advance to bool and checking it everywhere... Plus pass it a flag if it needs to cross BB boundary or not. Err, that will be a bad interface.
I really don't understand it all. Why not just have reset(MachineInstr *, LiveRegSet ) and do a reset on the next BB with liveset from previous BB? It looks like your code does it. Another question: your reset does skip debugs, but the caller doesn't know about it and can supply unskipped iterator to the next advance.
Reset with liveset will mean copying of the map. This copy is inevitable in other places because that is really a backup copy, but here we do not need anything like that.
The unskipped iterator is not an issue, advance() will immediately return on a debug value. Caller will increment iterator and call advance again(). No problem.
You can do the reset with move - this would not require copy.
If you make advance moving its own iterator then it can skip debugs itself.
Making advance checking end pointer wouldn't increase the number of end checks, as you check it on every iteration of advance loop. You can return bool from advance and break the loop on it. Anyway such checks are cheap.
Speedup the tracker:
- Replace MRI.reg_nodbg_empty() with LIS.hasInterval().
- Delete empty regs from live set when tracking.
Changed forward RP tracker from stateless to statefull as requested by reviewer.
Improved speed by replacing MRI.reg_nodbg_empty() to LIS.hasInterval().
I'm a bit confused with all of these advance... but lets submit and improve this later.