This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix register class for a subreg in GCNRewritePartialRegUses.
ClosedPublic

Authored by vpykhtin on Jun 13 2023, 9:49 AM.

Details

Summary
  1. Improved code that deduces register class from instruction definitions. Previously if some instruction didn't contain a reg class for an operand it was considered as no information on register class even if other instructions specified the class.
  1. Added check on required size of resulting register because in some cases classes with smaller registers had been selected (for example VReg_1).

Diff Detail

Event Timeline

vpykhtin created this revision.Jun 13 2023, 9:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2023, 9:49 AM
vpykhtin requested review of this revision.Jun 13 2023, 9:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2023, 9:49 AM
vpykhtin added a reviewer: Restricted Project.Jun 13 2023, 9:50 AM
vpykhtin edited the summary of this revision. (Show Details)
arsenm added inline comments.Jun 13 2023, 5:47 PM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
100–116

This comment is a run on sentence I'm having trouble following

327–328

Why do you need the size? In the unknown case you still have a class from the vreg

vpykhtin updated this revision to Diff 531238.Jun 14 2023, 2:32 AM

Improved function description.

vpykhtin marked an inline comment as done.Jun 14 2023, 2:54 AM
vpykhtin added inline comments.
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
327–328

I had the following problem with that, given a test:

name: test_subregs_unknown_regclass_from_instructions_sgpr_1024_to_sgpr_64
tracksRegLiveness: true
registers:
  - { id: 0, class: sgpr_1024 }
body:             |
  bb.0:
    %1:vreg_64 = COPY undef %0.sub4_sub5

%0.sub4_sub5 doesn't have known regclass and getSubRegisterClass returns

sgpr_1024 : sub4_sub5 -> SReg_1_with_sub0_and_SReg_1_with_lo16_in_SGPR_LO16

SReg_1_with_sub0_and_SReg_1_with_lo16_in_SGPR_LO16 has the following subclasses (in tablegen generated order):

VReg_1
SGPR_64
CCR_SGPR_64
Gfx_CCR_SGPR_64

If I don't check the size it would select VReg_1 which is invalid.

I haven't followed all the consequences why tablegen generated this artificial subclass, but I know that if a class contains registers of different size tablegen puts subclass with smaller registers first, that is why VReg_1 precedes SGPR_64.

arsenm added inline comments.Jun 22 2023, 2:06 PM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
241

If you do TRI->getMatchingSuperRegClass(RC, TRI->getSubRegisterClass(RC, OldSubReg), OldSubReg) does it work?

I never got around to the patch to use getSubRegisterClass so it makes sense that it's buggy

arsenm added inline comments.Jun 22 2023, 3:11 PM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
241

Alternatively, getLargestLegalSuperClass

vpykhtin added inline comments.Jul 4 2023, 10:31 AM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
241

getMatchingSuperRegClass(RC, TRI->getSubRegisterClass(RC, OldSubReg), OldSubReg) does different thing:

  • getSubRegisterClass(RC, OldSubReg) is supposed to return the register class that can be used to allocate the register for OldSubReg subreg of a super register from class RC
  • the former version is going to find a largest subclass of RC where each register has subreg OldSubReg and this subreg would have TRI->getSubRegisterClass(RC, OldSubReg) class on allocation.

getLargestLegalSuperClass doesn't seem to do the job too.

barannikov88 added inline comments.
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
327–328

FWIW all registers in a register class must have the same (spill) size.
But VReg_1 shouldn't be a sublass of any class at all. This looks like a peculiar effect of a class containing no registers on tablegen.

vpykhtin added inline comments.Jul 4 2023, 3:09 PM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
327–328

This sounds like tablegen should be fixed instead.

arsenm added inline comments.Jul 5 2023, 10:16 AM
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
327–328

Yes, tablegen should be fixed. Ideally VReg_1 wouldn't appear anywhere, it's a hack for the DAG only

arsenm accepted this revision.Jul 5 2023, 10:18 AM

LGTM to move forward with this. I do think we should not be depending on register class size queries and I would much prefer getSubRegisterClass produce a useful result

This revision is now accepted and ready to land.Jul 5 2023, 10:18 AM
vpykhtin updated this revision to Diff 537590.Jul 5 2023, 10:49 PM

Rebase before landing.

This revision was landed with ongoing or failed builds.Jul 5 2023, 11:48 PM
This revision was automatically updated to reflect the committed changes.