This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size.
ClosedPublic

Authored by vpykhtin on Dec 9 2022, 11:12 AM.

Details

Summary

The main purpose of this is to simplify register pressure tracking as after the pass there is no need
to track subreg liveness anymore.

On the other hand this pass creates more possibilites for the subreg unaware code, as many of the subregs
becomes ordinary registers.

Intersting sideeffect: spill-vgpr.ll has lost a lot of spills.

Diff Detail

Event Timeline

vpykhtin created this revision.Dec 9 2022, 11:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2022, 11:12 AM
vpykhtin requested review of this revision.Dec 9 2022, 11:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2022, 11:12 AM
vpykhtin added a reviewer: Restricted Project.Dec 9 2022, 11:13 AM
vpykhtin edited the summary of this revision. (Show Details)Dec 9 2022, 11:18 AM
arsenm added inline comments.Dec 9 2022, 2:08 PM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
120–121

getAllocatableClass

arsenm added a comment.Dec 9 2022, 2:10 PM

Can you add some dedicated MIR and IR tests that this improves? I've thought about hacking around some allocation failures by doing something similar. My assumption has been unnecessarily wide register uses should have been broken down before this point

Can you add some dedicated MIR and IR tests that this improves? I've thought about hacking around some allocation failures by doing something similar. My assumption has been unnecessarily wide register uses should have been broken down before this point

I don't know if this improves something at this point apart from that I see less spills at spill-vgpr.ll. I would like to turn off subreg liveness tracking for register pressure calculation because LiveIntervals has bug related to subreg liveness that isn't easy to fix. I'm afraid though I would need more allocatable register classes for this purpose.

vpykhtin updated this revision to Diff 482107.Dec 12 2022, 6:47 AM
  • Rebased.
  • Added test for VGPR.

While working on SGPR test I understood that my implementation is pretty naive because I was lucky to start with VGPRs where every register has every subreg. Now I understand that selection of replacement register is more complicated due to alignment of SGPR tuples on 2,3,4.

While working on SGPR test I understood that my implementation is pretty naive because I was lucky to start with VGPRs where every register has every subreg. Now I understand that selection of replacement register is more complicated due to alignment of SGPR tuples on 2,3,4.

there are also VGPRs with alignment requirements for mi-200

vpykhtin updated this revision to Diff 482873.Dec 14 2022, 8:40 AM
  • Fully reworked code, it looks target independent now.
  • Added tests for SGPRs.
  • Instructions requiring special alignment of register are still to be addressed.
  • Rebased.
arsenm added inline comments.Dec 14 2022, 8:47 AM
llvm/include/llvm/MC/LaneBitmask.h
85–88

having "getLane" and "getLanes" is going to be terribly confusing. Needs a different name

llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
108

I think {Offset, Size} should work now

155

Comment what true is?

162–164
if (!SubRegRC)
  SubRegRC = ...

if (!SubRegRC)
  return nullptr
302–305

Can you do better than delete and recreate the liveness? Can we do this before liveness is computed?

vpykhtin marked an inline comment as done.Dec 14 2022, 9:12 AM
vpykhtin added inline comments.
llvm/include/llvm/MC/LaneBitmask.h
85–88

Sorry this is leftover from the previous version, its not used anymore.

vpykhtin updated this revision to Diff 482899.Dec 14 2022, 9:35 AM
vpykhtin marked an inline comment as done.
  • Per review fixes.
vpykhtin marked 3 inline comments as done.Dec 14 2022, 9:40 AM
vpykhtin added inline comments.
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
302–305

Liveness is required by the previous pass but this pass doesn't use it so I decided not to require it.

Liveness subranges aren't touched by this pass but some of them become main range instead of subrange. Subreg indexes and lanemasks need to be updated.

arsenm added inline comments.Dec 14 2022, 9:42 AM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
302–305

Do you only need to do splitSeparateComponents for each changed register?

vpykhtin updated this revision to Diff 483558.Dec 16 2022, 8:54 AM
  • Added support for aligned register classes, for example: %0.sub0:vreg_64_align2 won't be rewritten because resulting VGPR32 class isn't aligned. Please take a look at rewrite-partial-reg-uses-gen.mir, align2 tests after line 2743.
  • Added support function for register alignment, better namings would be appreciated.
  • Enabled some tests that were incorrecty skipped.
  • Added comments.
  • Better liveness update code yet to be done.
  • Rebased.

passed internal testing,

vpykhtin updated this revision to Diff 485848.Jan 2 2023, 2:16 AM
  • Implemented better liveness info update code.
  • Rebased.
vpykhtin updated this revision to Diff 485849.Jan 2 2023, 2:20 AM

Forgot to remove disabled code, done.

vpykhtin updated this revision to Diff 489905.Jan 17 2023, 11:51 AM
  • Rebased.
  • Passed internal testing.

I would like to finish this patch and submit.

vpykhtin retitled this revision from [AMDGPU] Add pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size (WIP) to [AMDGPU] Add pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size..Jan 17 2023, 11:51 AM
arsenm added inline comments.Jan 17 2023, 1:54 PM
llvm/include/llvm/CodeGen/LiveInterval.h
713–717 ↗(On Diff #489905)

These generic header changes should be split into a separate patch, at first glance I don't understand why they are necessary

arsenm added inline comments.Jan 17 2023, 2:00 PM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
207

Seems like a missing newline?

302–305

Ping on this, can you just do splitSeparateComponents? I don't see why you have to mutate the existing interval

vpykhtin added inline comments.Jan 17 2023, 9:19 PM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
302–305

I'm not sure what splitSeparateComponents does, but my code doesn't create separate intervals. It "shifts" subregs to the right and uses smallest possible superreg. I only need to update subreg indexes and remove covering subreg.

qcolombet added inline comments.Jan 19 2023, 1:42 AM
llvm/include/llvm/CodeGen/LiveInterval.h
713 ↗(On Diff #489905)

That change looks very wrong to me.

If we want to "update" a live interval so that it points to a different register, the proper way to do this is to create a new one and shrink/remove the old one.

We have plenty of helper functions to do that.
Look at LiveIntervals::createEmptyInterval, LiveIntervals::extendSegmentsToUses, LiveIntervals::shrinkToUses, etc.

vpykhtin edited the summary of this revision. (Show Details)Jan 19 2023, 6:29 AM
vpykhtin added inline comments.
llvm/include/llvm/CodeGen/LiveInterval.h
713 ↗(On Diff #489905)

Honestly I did this with a heavy heart but in my case I only need to copy existing interval excluding "covering" subreg subrange, which should match main range, and fix lanemasks for the shifted subreg subranges. Maybe I can copy main range and then move subranges from old interval to the new one, do you think it would be better?

vpykhtin updated this revision to Diff 492022.Jan 25 2023, 1:25 AM
  • Rebased.
  • Removed ad-hoc in-place interval modification, replaced with copy.
vpykhtin updated this revision to Diff 499851.Feb 23 2023, 7:25 AM
  • Rebased.
  • Improved interval updating code.
  • Passed internal testing.

Kindly pinging, this is ready to submit.

qcolombet added inline comments.Mar 15 2023, 1:41 AM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
74

We need comments on all these fields and methods.

  • What they hold
  • What they do

etc.

For instance I have no idea what this mapping hold.
I am guessing it is mapping a pair (Reg, Subreg) to a new vreg, but having a comment for that would be great.

Similarly I don't know what rewriteReg does and why it returns a bool.

vpykhtin updated this revision to Diff 505863.Mar 16 2023, 10:15 AM
  • Improved comments.
  • Rebased, updated tests.
vpykhtin added inline comments.Mar 16 2023, 10:17 AM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
74

RIght, sorry about that, added comments.

arsenm added inline comments.Mar 17 2023, 1:48 PM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
76

missing 'if'. Also should /// for this and the other comments

155–156

Feels like tablegen should generate something like this which does a binary search of subregister indices

216–218

can you do auto &[OldSubReg, NewSubReg] : SubRegs)?

llvm/lib/Target/AMDGPU/SIRegisterInfo.h
423–426 ↗(On Diff #505863)

This should be implied by subregister support. Can you check getSubClassWithSubReg/getMatchingSuperRegClass?

vpykhtin updated this revision to Diff 506550.Mar 20 2023, 5:25 AM

Per review issues:

  • Improved comments.
  • More C++ structure bingind syntax used.
vpykhtin marked 2 inline comments as done.Mar 20 2023, 5:33 AM
vpykhtin added inline comments.
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
155–156

Agree, but there's not much of them used and all of them get cached, so I think we can leave this improvement for later.

llvm/lib/Target/AMDGPU/SIRegisterInfo.h
423–426 ↗(On Diff #505863)

I'm not sure: I need alignment information to prevent rewriting cases like:

undef %0.sub0:vreg_64_align2 = V_MOV_B32_e32 00, implicit $exec
S_NOP 0, implicit %0.sub0

%0.sub0 cannot be rewritten with ordinary VGPR32 as it loses alignment.

I think we need to have LiveRangeEdit start doing the same thing. I guess we can start with this in a separate pass for now, and then I can look into merge it into LiveRangeEdit. One of the problems I'm trying to solve is tuple spills increasing liveness for dead lanes

llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
82

Should use a const reference, this should never need to be null

92

Shouldn't this be a static table?

155

Add TODO to move to tablegen

221

We have helpers to query the class from the instruction, this shouldn't be reachable?

265

For all practical purposes this is dead code, either just enable with with all debug printing or remove it

arsenm added inline comments.Mar 30 2023, 10:36 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1251

To be safe should add a flag to disable this like the other RA passes

arsenm added inline comments.Mar 30 2023, 10:39 AM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
197

I have no idea what V is, use an explicit type?

258

I thought the classes were already supposed to be sorted with largest first such that you only need to look at the first set bit?

289

AMDGPU::NoSubRegister?

392

I don't think verify is defined in a release build

arsenm added inline comments.Mar 30 2023, 10:43 AM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
441

Debug operands need to be updated too. Can you add some tests with debug uses?

arsenm added inline comments.Mar 30 2023, 10:45 AM
llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses.mir
54 ↗(On Diff #506550)

Some tests with phi defs?

vpykhtin updated this revision to Diff 510007.Mar 31 2023, 6:06 AM
  • Rebased, updated lit tests.
  • Per review issues partially done.

Debug info and Phi tests are to be added.

vpykhtin marked 7 inline comments as done.Mar 31 2023, 6:20 AM
vpykhtin added inline comments.
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
82

RC is gradually refined with getCommonSubClass, please see line 418.

92

I agree, but sometime ago I was warned about avoiding static data-members to reduce library memory footprint.

221

Would be appreciated for the function name hint.

258

They're sorted by the number of registers in a class but we need to find the largest class that has registers of minimally suitable size.

392

It is defined with an empty body.

arsenm added inline comments.Mar 31 2023, 7:54 AM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
92

This wouldn't even be that big of a table. Tablegen could emit it. The concern would be static constructors, a simple static constexpr table wouldn't have that issue.

vpykhtin marked an inline comment as done.Mar 31 2023, 8:41 AM
vpykhtin added inline comments.
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
92

Sorry this isn't constant map, I was thinking about cashed tables. This map is only valid for a specific combination of subregs used: for example, if we use only sub3, we can shift it to sub0, but if we use sub2_sub3, sub2 and sub3 - sub3 can only be sub1.

vpykhtin updated this revision to Diff 515038.Apr 19 2023, 12:01 PM
  • Rebased, updated tests.
  • Added debug operand processing with a TODO, added test.
vpykhtin marked an inline comment as done.Apr 19 2023, 12:02 PM
vpykhtin added inline comments.
llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses.mir
54 ↗(On Diff #506550)

There is no PHIs on this stage.

vpykhtin marked an inline comment as done.May 3 2023, 3:30 AM

ping.

Given the patch does not modify generic parts anymore, I'm happy with it.
I leave the final decision to @arsenm since AMDGPU is outside my domain of expertise.

arsenm accepted this revision.May 23 2023, 2:56 AM

LGTM with some nits (plus I would still prefer if we generated static tables for the compacted subreg mappings)

llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
75

s/to the right/to the left/

84

Left

92

But this set of constraints could be represented in a static table

416

No &

440

s/left/leave/

441

I don't understand why debug info needs special consideration here. If it was referring to a virtual register before, it can refer to the adjusted virtual register?

This revision is now accepted and ready to land.May 23 2023, 2:56 AM
vpykhtin marked an inline comment as done.May 24 2023, 4:32 AM
vpykhtin added inline comments.
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
75

I'm not sure why left? We use lower bits of a superreg so this is like shifting from high to low bits and in my opinion this is called right shift.

92

Ok, but I'm not sure about its size.

441

Yes, I was thinking about situation when debug info actually refer to the whole superreg instead of subregs - in this case we can try to make expression that refers to the shifted content. I'm not sure however, would be nice to talk to debug info experts.

vpykhtin updated this revision to Diff 525127.May 24 2023, 5:45 AM
vpykhtin marked an inline comment as done.
  • Rebased.
  • Minor issues fixed.
arsenm accepted this revision.May 24 2023, 8:10 AM
llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll