This is an archive of the discontinued LLVM Phabricator instance.

[fastregalloc] Fix bug when undef value is tied to def.
ClosedPublic

Authored by LuoYuanke on May 3 2022, 2:36 AM.

Details

Summary

If the tied use is undef value, fastregalloc should free the def
register. There is no reload needed for the undef value.

Diff Detail

Event Timeline

LuoYuanke created this revision.May 3 2022, 2:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2022, 2:36 AM
LuoYuanke requested review of this revision.May 3 2022, 2:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2022, 2:36 AM
LuoYuanke updated this revision to Diff 426623.May 3 2022, 2:46 AM

Remove unused commends.

arsenm added inline comments.May 3 2022, 11:00 AM
llvm/lib/CodeGen/RegAllocFast.cpp
1115

Might as well pass in MachineOperand since that's what you already have at the use points

llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir
6–27

Can drop the IR section if you drop the block names and IR value references

42

Generate full checks?

MatzeB added inline comments.May 3 2022, 1:31 PM
llvm/lib/CodeGen/RegAllocFast.cpp
1115–1122

I think this would be easier to understand for readers if you go for a different name and semantic. Despite the use being "undef" it's still a tied operand after all. You just want to change the algorithm to ignore those, but for that I think it's clearer to state that within the algorithm.

So how about:

1128–1141

In the above spirit change this to:

1241–1242
1281
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir
29–52

On top of dropping the IR you can probably simplify like this?

LuoYuanke updated this revision to Diff 426897.May 3 2022, 7:16 PM

Address Arsenault and Braun's comments.

LuoYuanke marked 7 inline comments as done.May 3 2022, 7:20 PM
MatzeB accepted this revision.May 3 2022, 7:45 PM

LGTM with nitpick addressed.

This revision is now accepted and ready to land.May 3 2022, 7:45 PM
MatzeB added inline comments.May 3 2022, 7:47 PM
llvm/lib/CodeGen/RegAllocFast.cpp
1118

Please avoid auto when the type isn't immediately visible within the same line, it makes the code easier to read. See https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

LuoYuanke updated this revision to Diff 426908.May 3 2022, 7:59 PM

Address Braun's comments.

LuoYuanke marked an inline comment as done.May 3 2022, 8:00 PM
craig.topper added inline comments.
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir
28

Was this a synthetic example? It shouldn't be possible to generate a PXORrr like this before register allocation.

This revision was landed with ongoing or failed builds.May 3 2022, 9:13 PM
This revision was automatically updated to reflect the committed changes.
LuoYuanke added inline comments.May 3 2022, 9:22 PM
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir
28

I use the following C code to generate the test case. However compiler would generate "%0:vr128 = V_SET0" instead of "%0:vr128 = PXORrr undef %0, undef %0". The real case that I encounter is to generate code to zero the stack slot for AMX configure register. I use "PXORrr" to zero the stack slot, I can change it to "V_SET0" to avoid the issue in fast regalloc. However I think it expose an issue in fast regalloc, so I create a patch for it.

#include <immintrin.h>

void foo() {
  __m128 vec[4];
  __m128 m = {0, 0};

  vec[0] = m;
  vec[1] = m;
  vec[2] = m;
  vec[3] = m;
}
craig.topper added inline comments.May 3 2022, 9:31 PM
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir
28

Thanks. I recommend using V_SET0. If you create the PXOR while in SSA form the source and dest will need different vregs. I think the two address instruction pass will change the tied one to match the dest, but it won't change the other one. Then the register allocator is not obligated to give the 2 sources the same register. The untied source will likely always end up with xmm0. If the other source isn't xmm0 it won't be recognized as a zero idiom by the hardware. V_SET0 exists to work around all of that.

craig.topper added inline comments.May 3 2022, 9:32 PM
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir
28

Actually its worse than that. Not only will it not be recognized by the hardware. It won't produce 0. It will produce a random value since the register contents don't match.

LuoYuanke added inline comments.May 3 2022, 9:35 PM
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir
28

Thanks, Craig :) I'll change the code.

MatzeB added inline comments.May 3 2022, 9:40 PM
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir
28

I think the two address instruction pass will change the tied one to match the dest, but it won't change the other one.

I don't think TwoAddressInstruction does that. I can't find the relevant code right now, but at least the explanation in MachineOperand.h for the undef flag says:

/// Note that an instruction may have multiple <undef> operands referring to
/// the same register.  In that case, the instruction may depend on those
/// operands reading the same dont-care value.  For example:
///
///   %1 = XOR undef %2, undef %2
craig.topper added inline comments.May 3 2022, 9:47 PM
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir
28

I could be wrong. I think the relevant code is in TwoAddressInstructionPass::collectTiedOperands. At first glance it doesn't look like it's trying to keep other operands the same when it rewrites undef tied operands.

craig.topper added inline comments.May 3 2022, 9:51 PM
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir
28

I hacked the test and ran the twoaddressinstruction pass

# *** IR Dump Before Two-Address instruction pass (twoaddressinstruction) ***:
# Machine code for function foo: IsSSA, NoPHIs, TracksLiveness
Frame Objects:
  fi#0: size=64, align=16, at location [SP+8]
  fi#1: size=16, align=16, at location [SP+8]

bb.0.entry:
  %1:vr128 = PXORrr undef %0:vr128(tied-def 0), undef %0:vr128
  MOVAPSmr %stack.1, 1, $noreg, 0, $noreg, %1:vr128
  MOVAPSmr %stack.0, 1, $noreg, 0, $noreg, %1:vr128
  MOVAPSmr %stack.0, 1, $noreg, 16, $noreg, %1:vr128
  MOVAPSmr %stack.0, 1, $noreg, 32, $noreg, %1:vr128
  MOVAPSmr %stack.0, 1, $noreg, 48, $noreg, killed %1:vr128
  RET 0

# End machine code for function foo.

********** REWRITING TWO-ADDR INSTRS **********
********** Function: foo
                rewrite undef:  %1:vr128 = PXORrr undef %1:vr128(tied-def 0), undef %0:vr128
        %1:vr128 = PXORrr undef %1:vr128(tied-def 0), undef %0:vr128

It creates a PXORrr with different source registers.

MatzeB added inline comments.May 3 2022, 9:53 PM
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir
28

Interesting! Guess twoaddressinstruction is buggy then given how we define the undef flag and people just got used to use the workaround with pseudo instructions instead...