This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix incorrect register usage tracking in GCNUpwardTracker
ClosedPublic

Authored by vpykhtin on May 17 2017, 10:25 AM.

Details

Summary

This change fixes incorrect maximum register pressure calculation in GCNUpwardRPTracker: it reduced pressure of defs before incrementing pressure on uses losing the possible maximum pressure of defs + uses at the machine instruction.

After several attempts to fix it with fewer lines of code I decided it would be easier to introduce MachineInstrRegs class which collects register def/uses for a machine instruction with theirs linemasks. This class does job similar to standard llvm's RegisterOperands class but much smaller.

Trying to figure out a test for this.

Event Timeline

vpykhtin created this revision.May 17 2017, 10:25 AM
rampitec edited edge metadata.May 17 2017, 2:42 PM

Where is GCNUpwardRPTracker::reset() now? What is the base revision this diff is taken against? The GCNUpwardRPTracker::reset() is right before recede() in the current revision, but I do not see it in the left pane as well. Same with some other functions like getDefRegMask.

lib/Target/AMDGPU/GCNRegPressure.cpp
295

You have GCNRPTracker::getDefRegMask() for this.

300

And there is getUsedRegMask() for this too.

311

Why not to do it right above, where you assign getAll()? It seems to be less work.

346

Do not you want to also erase it from LiveRegs if LaneMask.none()?

rampitec added inline comments.May 17 2017, 2:51 PM
lib/Target/AMDGPU/GCNRegPressure.cpp
346

You are now updating MaxPressure correctly, but your CurPressure is incorrect. At this point defs still contribute to the current pressure, they will be out only with the next recede. That may lead scheduler to wrong decisions about a current instruction. You can also run into a paradoxical situation that no single instruction has a pressure equal to max.

I still believe that in the situation where we want to account for both defs and uses contributing to the RP of an instruction we naturally have a two step advance/recede process - step before/past the instruction and actually step to it.

vpykhtin added inline comments.May 18 2017, 3:15 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
295

This is another class, though getDefRegMask can be made nonmember

300

I'm avoiding getLiveLaneMask call here before all registers collected

311

This avoids getLiveLaneMask call more than once for a register

346

Why to erase?

Actually recede moves from the point after the instruction to the point before the instruction in top-down order accounting max pressure interim. recede never stops at the instruction.

I inserted MachineInstrRegs between reset and recede, all functions are in old places. I should move MachineInstrRegs higher

I inserted MachineInstrRegs between reset and recede, all functions are in old places. I should move MachineInstrRegs higher

I see it now, thanks. Can you move it higher please?

lib/Target/AMDGPU/GCNRegPressure.cpp
295

Probably static inline is just right for it. Even for getUsedRegMask if you pass LIS to it.

311

Do you see a common situation where the same register is used in the same instruction more than once? This sounds quite exotic to me, provided we are speaking about uses only, not defs.

346

If you erase you will not iterate over it getRegPressure().

I understand your point that recede does not stop in steps, but I'm still concerned that you will not get a correct CurPressure, or even will not get CurPressure equal to max pressure anywhere in the region. How about that?

I inserted MachineInstrRegs between reset and recede, all functions are in old places. I should move MachineInstrRegs higher

I see it now, thanks. Can you move it higher please?

Yes I'll fix that.

lib/Target/AMDGPU/GCNRegPressure.cpp
311

I agree its going to be very rare ocasion, but the main purpose of this class is to deduplicate register def/uses so it wouldn't be counted twice when calculating pressure. So if I deduplicated registers already then I can save some time on calculating mask.

346

Ok, now that live regs can be reused it may have the point to clear a register immediately. Previously I used stripEmpty on the set but only for debug printing purposes.

CurPressure isn't calculated for the at-the-instruction level, its calculated for the after recede point. I put an assert that CurPressure calculated correctly in the end of recede. CurPressure can never become MaxPressure, but I don't see a problem here. There is no at-the-instruction position in the tracker - it is always in between.

rampitec added inline comments.May 18 2017, 10:30 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
295

Actually getUsedRegMask is unused now, so you can just delete it.

311

I mean, if that is extremely rare you may lose more in the inexpensive but way more often called second loop.

vpykhtin added inline comments.May 18 2017, 10:37 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
311

Ok, I'll make getDefRegMask and getUsedRegMask static and reuse it here removing bottom loop, thanks.

rampitec added inline comments.May 19 2017, 2:21 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
269

It looks like defs are not really needed in this class. Uses needed because you walk them twice, but defs can be just directly processed. I.e. the code can be simplified and overhead somewhat reduced.

346

I see it as a sort of quantum tracker. It hides the intermediate step where pressure actually peaks from an observer. As long as we agree on that understanding I have no objection on submitting such a tracker where actual pressure "tunnels" through recede method as we have no actual interested observers currently. We might want to split the method in the future if we have them.

vpykhtin updated this revision to Diff 99544.May 19 2017, 5:08 AM

fixed as per comments

rampitec accepted this revision.May 19 2017, 9:21 AM

LGTM. Thanks!

This revision is now accepted and ready to land.May 19 2017, 9:21 AM
rampitec added inline comments.May 19 2017, 9:43 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
343

Are you sure it should always present? What if we have a dead def? I.e. an instruction defines a register which is never used. I guess it will not be reported by LIS.

If so this should be if (I != LiveRegs.end()) continue;

rampitec added inline comments.May 19 2017, 9:44 AM
lib/Target/AMDGPU/GCNRegPressure.cpp
343

Sorry, if (I == LiveRegs.end()) continue;

This revision was automatically updated to reflect the committed changes.
vpykhtin marked an inline comment as done.May 22 2017, 6:11 AM
vpykhtin added inline comments.
lib/Target/AMDGPU/GCNRegPressure.cpp
343

Done, thanks!

This change fixes incorrect maximum register pressure calculation in GCNUpwardRPTracker: it reduced pressure of defs before incrementing pressure on uses losing the possible maximum pressure of defs + uses at the machine instruction.

Hi @vpykhtin! I don't understand the reason for this change. Why should max pressure include both the uses and the defs of one instruction? The uses and defs are not live at the same time and can be allocated to overlapping physical registers (assuming the uses are killed by the instruction). There should be an exception for early-clobber def operands but they are not very common.

+ @piotr @critson

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 16 2023, 6:46 AM
piotr added a comment.Oct 16 2023, 7:29 AM

This change fixes incorrect maximum register pressure calculation in GCNUpwardRPTracker: it reduced pressure of defs before incrementing pressure on uses losing the possible maximum pressure of defs + uses at the machine instruction.

Hi @vpykhtin! I don't understand the reason for this change. Why should max pressure include both the uses and the defs of one instruction? The uses and defs are not live at the same time and can be allocated to overlapping physical registers (assuming the uses are killed by the instruction). There should be an exception for early-clobber def operands but they are not very common.

+ @piotr @critson

Would it make sense to only do the AtMIPressure part for instructions with early clobber?

vpykhtin marked an inline comment as done.Oct 18 2023, 3:22 AM

This change fixes incorrect maximum register pressure calculation in GCNUpwardRPTracker: it reduced pressure of defs before incrementing pressure on uses losing the possible maximum pressure of defs + uses at the machine instruction.

Hi @vpykhtin! I don't understand the reason for this change. Why should max pressure include both the uses and the defs of one instruction? The uses and defs are not live at the same time and can be allocated to overlapping physical registers (assuming the uses are killed by the instruction). There should be an exception for early-clobber def operands but they are not very common.

+ @piotr @critson

Would it make sense to only do the AtMIPressure part for instructions with early clobber?

You're right, we should not increment pressure for early-clobbers twice in AtMIPressure

You're right, we should not increment pressure for early-clobbers twice in AtMIPressure

Sorry, disregard this comment. We should only account for early-clobbers this way.

Is anyone working on this at the moment?

foad added a comment.Oct 18 2023, 5:11 AM

Is anyone working on this at the moment?

I'm not working on it but I have been thinking about it. The first problem is how to write regression tests for GCNRegPressure.

Is anyone working on this at the moment?

I'm not working on it but I have been thinking about it. The first problem is how to write regression tests for GCNRegPressure.

I can probably come up with a unit test in the way similar to how we test LiveIntervals and LiveVariables.

foad added a comment.Oct 18 2023, 5:53 AM

Is anyone working on this at the moment?

I'm not working on it but I have been thinking about it. The first problem is how to write regression tests for GCNRegPressure.

I can probably come up with a unit test in the way similar to how we test LiveIntervals and LiveVariables.

I wonder if we can test it more directly by adding an analysis pass like this: https://github.com/GPUOpen-Drivers/llvm-project/commit/042be23e3d98963fb02833511a86f4e26378a04d
and then using something like opt -passes='print<amdgpu-reg-press>'.

I wonder if we can test it more directly by adding an analysis pass like this: https://github.com/GPUOpen-Drivers/llvm-project/commit/042be23e3d98963fb02833511a86f4e26378a04d
and then using something like opt -passes='print<amdgpu-reg-press>'.

We can try to print reg pressure at every instruction

Something like this? (don't forget to expand *.mir file diff, it's not shown by default)

https://github.com/llvm/llvm-project/compare/main...vpykhtin:llvm-project:rp_printer