Page MenuHomePhabricator

[AMDGPU] Add pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size.
Needs ReviewPublic

Authored by vpykhtin on Dec 9 2022, 11:12 AM.

Details

Reviewers
qcolombet
MatzeB
Group Reviewers
Restricted Project
Summary

The main purpose of this is to simplify register pressure tracking as after the pass there is no need
to track subreg liveness anymore.

On the other hand this pass creates more possibilites for the subreg unaware code, as many of the subregs
becomes ordinary registers.

Intersting sideeffect: spill-vgpr.ll has lost a lot of spills.

Diff Detail

Event Timeline

vpykhtin created this revision.Dec 9 2022, 11:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2022, 11:12 AM
vpykhtin requested review of this revision.Dec 9 2022, 11:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2022, 11:12 AM
vpykhtin added a reviewer: Restricted Project.Dec 9 2022, 11:13 AM
vpykhtin edited the summary of this revision. (Show Details)Dec 9 2022, 11:18 AM
arsenm added inline comments.Dec 9 2022, 2:08 PM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
120–121

getAllocatableClass

arsenm added a comment.Dec 9 2022, 2:10 PM

Can you add some dedicated MIR and IR tests that this improves? I've thought about hacking around some allocation failures by doing something similar. My assumption has been unnecessarily wide register uses should have been broken down before this point

Can you add some dedicated MIR and IR tests that this improves? I've thought about hacking around some allocation failures by doing something similar. My assumption has been unnecessarily wide register uses should have been broken down before this point

I don't know if this improves something at this point apart from that I see less spills at spill-vgpr.ll. I would like to turn off subreg liveness tracking for register pressure calculation because LiveIntervals has bug related to subreg liveness that isn't easy to fix. I'm afraid though I would need more allocatable register classes for this purpose.

vpykhtin updated this revision to Diff 482107.Dec 12 2022, 6:47 AM
  • Rebased.
  • Added test for VGPR.

While working on SGPR test I understood that my implementation is pretty naive because I was lucky to start with VGPRs where every register has every subreg. Now I understand that selection of replacement register is more complicated due to alignment of SGPR tuples on 2,3,4.

While working on SGPR test I understood that my implementation is pretty naive because I was lucky to start with VGPRs where every register has every subreg. Now I understand that selection of replacement register is more complicated due to alignment of SGPR tuples on 2,3,4.

there are also VGPRs with alignment requirements for mi-200

vpykhtin updated this revision to Diff 482873.Dec 14 2022, 8:40 AM
  • Fully reworked code, it looks target independent now.
  • Added tests for SGPRs.
  • Instructions requiring special alignment of register are still to be addressed.
  • Rebased.
arsenm added inline comments.Dec 14 2022, 8:47 AM
llvm/include/llvm/MC/LaneBitmask.h
85–88 ↗(On Diff #482873)

having "getLane" and "getLanes" is going to be terribly confusing. Needs a different name

llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
109

I think {Offset, Size} should work now

156

Comment what true is?

163–165
if (!SubRegRC)
  SubRegRC = ...

if (!SubRegRC)
  return nullptr
303–306

Can you do better than delete and recreate the liveness? Can we do this before liveness is computed?

vpykhtin marked an inline comment as done.Dec 14 2022, 9:12 AM
vpykhtin added inline comments.
llvm/include/llvm/MC/LaneBitmask.h
85–88 ↗(On Diff #482873)

Sorry this is leftover from the previous version, its not used anymore.

vpykhtin updated this revision to Diff 482899.Dec 14 2022, 9:35 AM
vpykhtin marked an inline comment as done.
  • Per review fixes.
vpykhtin marked 3 inline comments as done.Dec 14 2022, 9:40 AM
vpykhtin added inline comments.
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
303–306

Liveness is required by the previous pass but this pass doesn't use it so I decided not to require it.

Liveness subranges aren't touched by this pass but some of them become main range instead of subrange. Subreg indexes and lanemasks need to be updated.

arsenm added inline comments.Dec 14 2022, 9:42 AM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
303–306

Do you only need to do splitSeparateComponents for each changed register?

vpykhtin updated this revision to Diff 483558.Dec 16 2022, 8:54 AM
  • Added support for aligned register classes, for example: %0.sub0:vreg_64_align2 won't be rewritten because resulting VGPR32 class isn't aligned. Please take a look at rewrite-partial-reg-uses-gen.mir, align2 tests after line 2743.
  • Added support function for register alignment, better namings would be appreciated.
  • Enabled some tests that were incorrecty skipped.
  • Added comments.
  • Better liveness update code yet to be done.
  • Rebased.

passed internal testing,

vpykhtin updated this revision to Diff 485848.Jan 2 2023, 2:16 AM
  • Implemented better liveness info update code.
  • Rebased.
vpykhtin updated this revision to Diff 485849.Jan 2 2023, 2:20 AM

Forgot to remove disabled code, done.

vpykhtin updated this revision to Diff 489905.Jan 17 2023, 11:51 AM
  • Rebased.
  • Passed internal testing.

I would like to finish this patch and submit.

vpykhtin retitled this revision from [AMDGPU] Add pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size (WIP) to [AMDGPU] Add pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size..Jan 17 2023, 11:51 AM
arsenm added inline comments.Jan 17 2023, 1:54 PM
llvm/include/llvm/CodeGen/LiveInterval.h
713–717 ↗(On Diff #489905)

These generic header changes should be split into a separate patch, at first glance I don't understand why they are necessary

arsenm added inline comments.Jan 17 2023, 2:00 PM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
207

Seems like a missing newline?

303–306

Ping on this, can you just do splitSeparateComponents? I don't see why you have to mutate the existing interval

vpykhtin added inline comments.Jan 17 2023, 9:19 PM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
303–306

I'm not sure what splitSeparateComponents does, but my code doesn't create separate intervals. It "shifts" subregs to the right and uses smallest possible superreg. I only need to update subreg indexes and remove covering subreg.

qcolombet added inline comments.Jan 19 2023, 1:42 AM
llvm/include/llvm/CodeGen/LiveInterval.h
713 ↗(On Diff #489905)

That change looks very wrong to me.

If we want to "update" a live interval so that it points to a different register, the proper way to do this is to create a new one and shrink/remove the old one.

We have plenty of helper functions to do that.
Look at LiveIntervals::createEmptyInterval, LiveIntervals::extendSegmentsToUses, LiveIntervals::shrinkToUses, etc.

vpykhtin edited the summary of this revision. (Show Details)Jan 19 2023, 6:29 AM
vpykhtin added inline comments.
llvm/include/llvm/CodeGen/LiveInterval.h
713 ↗(On Diff #489905)

Honestly I did this with a heavy heart but in my case I only need to copy existing interval excluding "covering" subreg subrange, which should match main range, and fix lanemasks for the shifted subreg subranges. Maybe I can copy main range and then move subranges from old interval to the new one, do you think it would be better?

vpykhtin updated this revision to Diff 492022.Jan 25 2023, 1:25 AM
  • Rebased.
  • Removed ad-hoc in-place interval modification, replaced with copy.
vpykhtin updated this revision to Diff 499851.Thu, Feb 23, 7:25 AM
  • Rebased.
  • Improved interval updating code.
  • Passed internal testing.

Kindly pinging, this is ready to submit.

qcolombet added inline comments.Wed, Mar 15, 1:41 AM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
74

We need comments on all these fields and methods.

  • What they hold
  • What they do

etc.

For instance I have no idea what this mapping hold.
I am guessing it is mapping a pair (Reg, Subreg) to a new vreg, but having a comment for that would be great.

Similarly I don't know what rewriteReg does and why it returns a bool.

vpykhtin updated this revision to Diff 505863.Thu, Mar 16, 10:15 AM
  • Improved comments.
  • Rebased, updated tests.
vpykhtin added inline comments.Thu, Mar 16, 10:17 AM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
74

RIght, sorry about that, added comments.

arsenm added inline comments.Fri, Mar 17, 1:48 PM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
76

missing 'if'. Also should /// for this and the other comments

155–156

Feels like tablegen should generate something like this which does a binary search of subregister indices

216–218

can you do auto &[OldSubReg, NewSubReg] : SubRegs)?

llvm/lib/Target/AMDGPU/SIRegisterInfo.h
423–426

This should be implied by subregister support. Can you check getSubClassWithSubReg/getMatchingSuperRegClass?

vpykhtin updated this revision to Diff 506550.Mon, Mar 20, 5:25 AM

Per review issues:

  • Improved comments.
  • More C++ structure bingind syntax used.
vpykhtin marked 2 inline comments as done.Mon, Mar 20, 5:33 AM
vpykhtin added inline comments.
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
155–156

Agree, but there's not much of them used and all of them get cached, so I think we can leave this improvement for later.

llvm/lib/Target/AMDGPU/SIRegisterInfo.h
423–426

I'm not sure: I need alignment information to prevent rewriting cases like:

undef %0.sub0:vreg_64_align2 = V_MOV_B32_e32 00, implicit $exec
S_NOP 0, implicit %0.sub0

%0.sub0 cannot be rewritten with ordinary VGPR32 as it loses alignment.