This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Remove isDef32
ClosedPublic

Authored by dmgreen on Jun 6 2022, 2:27 PM.

Details

Summary

isDef32 would attempt to make a guess at which SelectionDag nodes were 32bit sources, and use the nature of 32bit AArch64 instructions implicitly zeroing the upper register half to not emit zext that were expected to already be zero. This was a bit fragile though, needing to guess at the correct opcodes that do not become 32bit defs.

This patch removed isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes. A part of SelectArithExtendedRegister was left with the same logic as a heuristic to prevent some regressions from it picking less optimal sequences, and the AArch64MIPeepholeOpt needs to be taught that COPY from a FPR will become a FMOVSWr, doing the same implicit zeroing.

Fixes #55833

Diff Detail

Event Timeline

dmgreen created this revision.Jun 6 2022, 2:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2022, 2:27 PM
dmgreen requested review of this revision.Jun 6 2022, 2:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2022, 2:27 PM
efriedma added inline comments.Jun 6 2022, 2:47 PM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
827–828

Maybe explicitly note this is a heuristic, so it doesn't matter if it's wrong in edge cases.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
209

It's not clear to me we'll always end up with an FMOVSWr, given copy coalescing.

You could explicitly select an FMOVSWr here, but I guess that has other consequences for register allocation...

218

Should we check for pseudo-instructions here? Are there any pseudo-instructions which might produce a register with undefined high bits?

dmgreen updated this revision to Diff 434747.Jun 7 2022, 2:40 AM

Lower COPY to FMOVSWr and add a comment about the heuristic

dmgreen added inline comments.Jun 7 2022, 2:40 AM
llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
209

We can try it and see how it goes. I'm hoping it is fairly rare that it both is used by a redundant ORR and has negative consequences for register allocation.

218

Do you mean the AArch64 pseudos? Do you have any instructions in mind?

This is a list of the pseudos, minus the SVE instructions and G_ pseudos:

ADDSWrr     = 249,
ADDSXrr     = 250,
ADDWrr      = 251,
ADDXrr      = 252,
ADDlowTLS   = 257,
ADJCALLSTACKDOWN    = 258,
ADJCALLSTACKUP      = 259,
AESIMCrrTied        = 260,
AESMCrrTied = 261,
ANDSWrr     = 262,
ANDSXrr     = 263,
ANDWrr      = 264,
ANDXrr      = 265,
BICSWrr     = 286,
BICSXrr     = 287,
BICWrr      = 288,
BICXrr      = 289,
BLRNoIP     = 294,
BLR_BTI     = 295,
BLR_RVMARKER        = 296,
BSPv16i8    = 297,
BSPv8i8     = 298,
CATCHRET    = 299,
CLEANUPRET  = 300,
CMP_SWAP_128        = 309,
CMP_SWAP_128_ACQUIRE        = 310,
CMP_SWAP_128_MONOTONIC      = 311,
CMP_SWAP_128_RELEASE        = 312,
CMP_SWAP_16 = 313,
CMP_SWAP_32 = 314,
CMP_SWAP_64 = 315,
CMP_SWAP_8  = 316,
CompilerBarrier     = 325,
EMITBKEY    = 326,
EONWrr      = 327,
EONXrr      = 328,
EORWrr      = 329,
EORXrr      = 330,
F128CSEL    = 335,
FMOVD0      = 440,
FMOVH0      = 441,
FMOVS0      = 442,
HOM_Epilog  = 671,
HOM_Prolog  = 672,
HWASAN_CHECK_MEMACCESS      = 673,
HWASAN_CHECK_MEMACCESS_SHORTGRANULES
IRGstack    = 675,
JumpTableDest16     = 676,
JumpTableDest32     = 677,
JumpTableDest8      = 678,
LOADgot     = 730,
MOPSMemoryCopyPseudo        = 755,
MOPSMemoryMovePseudo        = 756,
MOPSMemorySetPseudo = 757,
MOPSMemorySetTaggingPseudo  = 758,
MOVMCSym    = 759,
MOVaddr     = 760,
MOVaddrBA   = 761,
MOVaddrCP   = 762,
MOVaddrEXT  = 763,
MOVaddrJT   = 764,
MOVaddrTLS  = 765,
MOVbaseTLS  = 766,
MOVi32imm   = 767,
MOVi64imm   = 768,
ORNWrr      = 781,
ORNXrr      = 782,
ORRWrr      = 783,
ORNXrr      = 782,
ORRWrr      = 783,
ORRXrr      = 784,
RET_ReallyLR        = 791,
SEH_AddFP   = 805,
SEH_EpilogEnd       = 806,
SEH_EpilogStart     = 807,
SEH_Nop     = 808,
SEH_PrologEnd       = 809,
SEH_SaveFPLR        = 810,
SEH_SaveFPLR_X      = 811,
SEH_SaveFReg        = 812,
SEH_SaveFRegP       = 813,
SEH_SaveFRegP_X     = 814,
SEH_SaveFReg_X      = 815,
SEH_SaveReg = 816,
SEH_SaveRegP        = 817,
SEH_SaveRegP_X      = 818,
SEH_SaveReg_X       = 819,
SEH_SetFP   = 820,
SEH_StackAlloc      = 821,
SPACE       = 834,
TGloop
STGloop_wback       = 868,
STZGloop    = 872,
STZGloop_wback      = 873,
SUBSWrr     = 878,
SUBSXrr     = 879,
SUBWrr      = 880,
SUBXrr      = 881,
SpeculationBarrierISBDSBEndBB       = 892,
SpeculationBarrierSBEndBB   = 893,
SpeculationSafeValueW       = 894,
SpeculationSafeValueX       = 895,
StoreSwiftAsyncContext      = 896,
TAGPstack   = 897,
TCRETURNdi  = 898,
TCRETURNri  = 899,
TCRETURNriALL       = 900,
TCRETURNriBTI       = 901,
TLSDESCCALL = 902,
TLSDESC_CALLSEQ     = 903,

Most don't def a GPR32, and the ones that do (ADDWrr and friends, MOVi32imm) should be OK as far as I understand, as they always become W instructions.

efriedma accepted this revision.Jun 7 2022, 9:56 AM

LGTM

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
218

Going through the list, I agree none of those are likely to be an issue. I'm a bit concerned about instructions that don't exist yet, but I guess we don't introduce pseudo-instructions very frequently, so maybe it isn't an issue.

This revision is now accepted and ready to land.Jun 7 2022, 9:56 AM
This revision was landed with ongoing or failed builds.Jun 7 2022, 10:58 AM
This revision was automatically updated to reflect the committed changes.

This triggers failed asserts for me:

$ cat repro.c
typedef int a;
typedef unsigned b;
int c, k, n;
b l;
void o();
typedef struct {
  a d, e;
  b f;
  b g;
  b h;
  b i
} j;
j m;
void p() {
  b q, r, s;
  for (;;) {
    if (c) {
      r = k - m.d;
      s = -m.e;
      q = -r;
    } else {
      r = m.h - m.f;
      s = m.i - m.g;
    }
    for (; l < s;) {
      o(r * sizeof(a));
      n = r + q;
    }
  }
}
$ clang -target aarch64-linux-gnu -c repro.c -O2
clang: ../lib/CodeGen/MachineRegisterInfo.cpp:399: llvm::MachineInstr* llvm::MachineRegisterInfo::getVRegDef(llvm::Register) const: Assertion `(I.atEnd() || std::next(I) == def_instr_end()) && "getVRegDef assumes a single definition or no definition"' failed.

This triggers failed asserts for me:
...

Thanks for the reproducer - we are not preserving SSA in the intermediate steps, which is causing problems for subsequent combines. I feel like we should try and remove ToBeRemoved set, but that might require more testing to make sure it doesn't cause other problems.