This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Use TargetRegisterClass::hasSubClassEq in tryToFindRegisterToRename
ClosedPublic

Authored by c-rhodes on Oct 1 2020, 8:22 AM.

Details

Summary

When renaming store operands for pairing in the load/store optimizer it
tries to find an available register from the minimal physical register
class of the original register. For each register it compares the
equality of minimal physical register class of all sub/super registers
with the minimal physical register class of the original register.

Simply checking for register class equality can break once additional
register classes are added, as was the case when adding:

def foo : RegisterClass<"AArch64", [i32], 32, (sequence "W%u", 12, 15)>

which broke:

CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir
CodeGen/AArch64/stp-opt-with-renaming.mir

Since the introduction of the register class above, the rename register
in test1 of the reserved regs test changed from x12 to x18. The reason
for this is the minimal physical register class of x12 (as well as
x13-x15) and its sub/super registers no longer matches that of x9
(GPR64noip_and_tcGPR64).

Rather than selecting a matching register based on a comparison of the minimal
physical register classes of the original and rename registers, this patch
selects based on MachineInstr::getRegClassConstraint for the original
register.

It's worth mentioning the parameter passing registers (r0-r7) could be now be
used as rename registers since the GPR32arg and GPR64arg register classes are
subclasses of the minimal physical register class for x8 for example. I'm not
entirely sure if we want to exclude those registers, if so maybe we could
explicitly exclude those register classes.

Diff Detail

Event Timeline

c-rhodes created this revision.Oct 1 2020, 8:22 AM
c-rhodes requested review of this revision.Oct 1 2020, 8:22 AM
efriedma added inline comments.Oct 1 2020, 9:47 AM
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
885

I don't think this does what you want. Suppose we had a separate register class for every single AArch64 register. Then this would fail unless OriginalReg and SubOrSuper are equal, I think.

Maybe instead of TRI->getMinimalPhysRegClass(OriginalReg), we can use MachineInstr::getRegClassConstraint?

1534

C->hasSubClassEq(TRI->getMinimalPhysRegClass(SubOrSuper)); seems like a really awkward way to write C->contains(SubOrSuper)

c-rhodes updated this revision to Diff 295836.Oct 2 2020, 8:38 AM

Address Eli's comments

c-rhodes marked an inline comment as done.Oct 2 2020, 8:57 AM
c-rhodes added inline comments.
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
885

Maybe instead of TRI->getMinimalPhysRegClass(OriginalReg), we can use MachineInstr::getRegClassConstraint?

Took me a while to figure out how to use it but I've updated it. An issue I found was for some machine instrs the number of operands was different to the MCInstrDesc so it couldn't find a regclass when OpIdx >= MCID.getNumOperands(). This caused a difference in test8 of llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir since the implicit-def at the end of the ORR no longer uses the rename register.

efriedma added inline comments.Oct 2 2020, 11:25 AM
llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
260

This seems wrong: we shouldn't skip renaming implicit defs. Probably we should bail out of the transform if we see an implicit def we can't understand.

The test input is a little weird, though; I'm not sure it's legal to mark the result of the ORRWrs renamable given the relationship between the result and the implicit def.

c-rhodes added inline comments.Oct 5 2020, 7:51 AM
llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
260

This seems wrong: we shouldn't skip renaming implicit defs. Probably we should bail out of the transform if we see an implicit def we can't understand.

The test input is a little weird, though; I'm not sure it's legal to mark the result of the ORRWrs renamable given the relationship between the result and the implicit def.

I'm not very familiar with this so I'm not sure what the right thing to do is to be honest. The optimization could handle these implicit-defs before using MachineInstr::getRegClassConstraint, so I guess it's a question of whether the input was right in the first place or if it can be extended to not bail out for these operands.

I grepped on ORRWrs and found a few uses similar to this, although that doesn't necessarily means it valid I suppose.

efriedma added inline comments.Oct 9 2020, 2:45 PM
llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
260

Thinking about it a bit more, I guess you could argue the renamable marking is correct. Just say that that if there's an overlapping implicit def, we have to follow some some target-specific rule to perform the rewrite.

The problem with rewriting implicit defs, in general, is that the target-independent instruction definition doesn't naturally encode any information about them. If we're looking at ORRWrs specifically, sure, we can say that the implicit def corresponds to the result register. But in general, it's an arbitrary register with unknown register allocation constraints. Some can't be rewritten at all; for example, something like TLSDESC_CALLSEQ. (Not sure we have those specifically this late, but there are a bunch of similar sorts of instructions.)

I think what should be happening here is that canRenameMOP() shouldn't accept arbitrary implciit defs. If it's going to accept any, it needs to be ones where we know the specific rule required to rewrite them.

@fhahn Any thoughts?

fhahn added inline comments.Nov 2 2020, 1:43 PM
llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
260

@fhahn Any thoughts?

Just coming back to this now, sorry!

I think renaming implicit defs like in the ORRWrs case is quite valuable, because those are quite common.

Not sure how to best encode it, but teaching canRenameMOP about them might be the best way forward for now. I don't think there's any existing info that we can use.

c-rhodes added inline comments.Nov 20 2020, 8:07 AM
llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
260

Thinking about it a bit more, I guess you could argue the renamable marking is correct. Just say that that if there's an overlapping implicit def, we have to follow some some target-specific rule to perform the rewrite.

The problem with rewriting implicit defs, in general, is that the target-independent instruction definition doesn't naturally encode any information about them. If we're looking at ORRWrs specifically, sure, we can say that the implicit def corresponds to the result register. But in general, it's an arbitrary register with unknown register allocation constraints. Some can't be rewritten at all; for example, something like TLSDESC_CALLSEQ. (Not sure we have those specifically this late, but there are a bunch of similar sorts of instructions.)

I think what should be happening here is that canRenameMOP() shouldn't accept arbitrary implciit defs. If it's going to accept any, it needs to be ones where we know the specific rule required to rewrite them.

Sorry just coming back to this myself. Restricting the opcode to ORRWrs for implicit defs in canRenameMOP should be straightforward enough, but supporting the rename in UpdateMIs will still need fixing. I feel like most of the complexity there now stems from the use of MachineInstr::getRegClassConstraint, how necessary is it to use this and would it be possible to revert that change to getMinimalPhysRegClass?

c-rhodes updated this revision to Diff 322107.Feb 8 2021, 8:05 AM
c-rhodes edited the summary of this revision. (Show Details)

Fix renaming of implicit-defs but only allow it for certain opcodes.

c-rhodes added inline comments.Feb 8 2021, 8:12 AM
llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
260

Thinking about it a bit more, I guess you could argue the renamable marking is correct. Just say that that if there's an overlapping implicit def, we have to follow some some target-specific rule to perform the rewrite.

The problem with rewriting implicit defs, in general, is that the target-independent instruction definition doesn't naturally encode any information about them. If we're looking at ORRWrs specifically, sure, we can say that the implicit def corresponds to the result register. But in general, it's an arbitrary register with unknown register allocation constraints. Some can't be rewritten at all; for example, something like TLSDESC_CALLSEQ. (Not sure we have those specifically this late, but there are a bunch of similar sorts of instructions.)

I think what should be happening here is that canRenameMOP() shouldn't accept arbitrary implciit defs. If it's going to accept any, it needs to be ones where we know the specific rule required to rewrite them.

Sorry just coming back to this myself. Restricting the opcode to ORRWrs for implicit defs in canRenameMOP should be straightforward enough, but supporting the rename in UpdateMIs will still need fixing. I feel like most of the complexity there now stems from the use of MachineInstr::getRegClassConstraint, how necessary is it to use this and would it be possible to revert that change to getMinimalPhysRegClass?

I've implemented an opcode whitelist for renaming implicit-defs and fixed the issue I mentioned in test8 by choosing the destination register for ORRWrs. Let us know if this is along the right lines and apologies for the delay in updating this patch.

Matt added a subscriber: Matt.Jul 7 2021, 3:42 PM

I know this patch has stalled but it would be good to revive the discussion in the context of D105572, which adds MC layer support for load/store instructions in the Arm Scalable Matrix Extension (SME). These instructions (e.g. [1]) have a 32-bit slice index register in the range W12-W15 that's defined as:

def MatrixIndexGPR32_12_15 : RegisterClass<"AArch64", [i32], 32, (sequence "W%u", 12, 15)>;
def MatrixIndexGPR32Op12_15 : RegisterOperand<MatrixIndexGPR32_12_15> {
  let EncoderMethod = "EncodeMatrixIndexGPR32";
}

Adding this class is affecting the ld/st optimizer and causing differences in the tests listed here. For now I've just re-generated the tests in D105572 but it would be good if this issue can be resolved.

[1] https://developer.arm.com/documentation/ddi0602/2021-06/SME-Instructions/LD1B--Contiguous-load-of-bytes-to-8-bit-element-ZA-tile-slice-

efriedma added inline comments.Jul 8 2021, 11:06 AM
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
914

We don't want to repeat this list of opcodes in multiple places.

llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
258

Somehow the implicit-def is getting rewritten from an x register to a w register?

If the implicit-def is dead, we can just drop it from the operand list. If it's not, rewriting like this is going to mess with liveness computation.

c-rhodes added inline comments.Jul 12 2021, 1:19 PM
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
914

Makes sense, I'll create a function.

llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
258

Somehow the implicit-def is getting rewritten from an x register to a w register?

Ah, that'll be because in UpdateMIs when getRegClassConstraint returns NULL for the implicit-def register it uses the result register, assuming the opcode is in the whitelist. I don't think getRegClassConstraint will work for this implicit-def operand, I suppose getMinimalPhysRegClass could be called instead if register class can't be determined by getRegClassConstraint?

c-rhodes updated this revision to Diff 358294.Jul 13 2021, 9:16 AM
c-rhodes edited the summary of this revision. (Show Details)
  • Moved opcode whitelist to function.
  • Use getMinimalPhysRegClass for implicit-def operand when getRegClassConstraint returns NULL.
c-rhodes added inline comments.Jul 13 2021, 9:19 AM
llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
258

Somehow the implicit-def is getting rewritten from an x register to a w register?

Ah, that'll be because in UpdateMIs when getRegClassConstraint returns NULL for the implicit-def register it uses the result register, assuming the opcode is in the whitelist. I don't think getRegClassConstraint will work for this implicit-def operand, I suppose getMinimalPhysRegClass could be called instead if register class can't be determined by getRegClassConstraint?

I've updated the patch to use getMinimalPhysRegClass for the implicit-def operand RC when getRegClassConstraint returns NULL, rather than copying the result register. The implicit-def is no longer getting rewritten from an x register to a w register, hopefully the fix is ok?

@efriedma Could you take another look at this?

Another ping.

Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2022, 11:01 AM
efriedma accepted this revision.Oct 19 2022, 2:05 PM

LGTM

Sorry about the delay, I completely forgot about this.

Renamable implicit defs still feel little weird, but what this patch is doing seems self-consistent, at least. Maybe we should consider factoring out helper functions for some of register handling stuff.

This revision is now accepted and ready to land.Oct 19 2022, 2:05 PM

LGTM

Sorry about the delay, I completely forgot about this.

Renamable implicit defs still feel little weird, but what this patch is doing seems self-consistent, at least. Maybe we should consider factoring out helper functions for some of register handling stuff.

thanks for accepting. It's been a long time since I've looked at this and I'm not familiar with it anymore but I'll rebase and see if it still works.

c-rhodes updated this revision to Diff 557585.Oct 4 2023, 6:37 AM

Rebase. Reviving this as it's apparently come up again when adding a new register class.

paulwalker-arm accepted this revision.Oct 20 2023, 6:04 AM
paulwalker-arm added a subscriber: paulwalker-arm.

That's for resurrecting this fix @c-rhodes.