If the tied use is undef value, fastregalloc should free the def
register. There is no reload needed for the undef value.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/lib/CodeGen/RegAllocFast.cpp | ||
---|---|---|
1115–1122 | I think this would be easier to understand for readers if you go for a different name and semantic. Despite the use being "undef" it's still a tied operand after all. You just want to change the algorithm to ignore those, but for that I think it's clearer to state that within the algorithm. So how about: | |
1130–1142 | In the above spirit change this to: | |
1242 | ||
1281 | ||
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir | ||
28–51 | On top of dropping the IR you can probably simplify like this? |
llvm/lib/CodeGen/RegAllocFast.cpp | ||
---|---|---|
1118 | Please avoid auto when the type isn't immediately visible within the same line, it makes the code easier to read. See https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable |
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir | ||
---|---|---|
28 | Was this a synthetic example? It shouldn't be possible to generate a PXORrr like this before register allocation. |
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir | ||
---|---|---|
28 | I use the following C code to generate the test case. However compiler would generate "%0:vr128 = V_SET0" instead of "%0:vr128 = PXORrr undef %0, undef %0". The real case that I encounter is to generate code to zero the stack slot for AMX configure register. I use "PXORrr" to zero the stack slot, I can change it to "V_SET0" to avoid the issue in fast regalloc. However I think it expose an issue in fast regalloc, so I create a patch for it. #include <immintrin.h> void foo() { __m128 vec[4]; __m128 m = {0, 0}; vec[0] = m; vec[1] = m; vec[2] = m; vec[3] = m; } |
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir | ||
---|---|---|
28 | Thanks. I recommend using V_SET0. If you create the PXOR while in SSA form the source and dest will need different vregs. I think the two address instruction pass will change the tied one to match the dest, but it won't change the other one. Then the register allocator is not obligated to give the 2 sources the same register. The untied source will likely always end up with xmm0. If the other source isn't xmm0 it won't be recognized as a zero idiom by the hardware. V_SET0 exists to work around all of that. |
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir | ||
---|---|---|
28 | Actually its worse than that. Not only will it not be recognized by the hardware. It won't produce 0. It will produce a random value since the register contents don't match. |
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir | ||
---|---|---|
28 | Thanks, Craig :) I'll change the code. |
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir | ||
---|---|---|
28 |
I don't think TwoAddressInstruction does that. I can't find the relevant code right now, but at least the explanation in MachineOperand.h for the undef flag says: /// Note that an instruction may have multiple <undef> operands referring to /// the same register. In that case, the instruction may depend on those /// operands reading the same dont-care value. For example: /// /// %1 = XOR undef %2, undef %2 |
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir | ||
---|---|---|
28 | I could be wrong. I think the relevant code is in TwoAddressInstructionPass::collectTiedOperands. At first glance it doesn't look like it's trying to keep other operands the same when it rewrites undef tied operands. |
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir | ||
---|---|---|
28 | I hacked the test and ran the twoaddressinstruction pass # *** IR Dump Before Two-Address instruction pass (twoaddressinstruction) ***: # Machine code for function foo: IsSSA, NoPHIs, TracksLiveness Frame Objects: fi#0: size=64, align=16, at location [SP+8] fi#1: size=16, align=16, at location [SP+8] bb.0.entry: %1:vr128 = PXORrr undef %0:vr128(tied-def 0), undef %0:vr128 MOVAPSmr %stack.1, 1, $noreg, 0, $noreg, %1:vr128 MOVAPSmr %stack.0, 1, $noreg, 0, $noreg, %1:vr128 MOVAPSmr %stack.0, 1, $noreg, 16, $noreg, %1:vr128 MOVAPSmr %stack.0, 1, $noreg, 32, $noreg, %1:vr128 MOVAPSmr %stack.0, 1, $noreg, 48, $noreg, killed %1:vr128 RET 0 # End machine code for function foo. ********** REWRITING TWO-ADDR INSTRS ********** ********** Function: foo rewrite undef: %1:vr128 = PXORrr undef %1:vr128(tied-def 0), undef %0:vr128 %1:vr128 = PXORrr undef %1:vr128(tied-def 0), undef %0:vr128 It creates a PXORrr with different source registers. |
llvm/test/CodeGen/X86/fastregalloc-tied-undef.mir | ||
---|---|---|
28 | Interesting! Guess twoaddressinstruction is buggy then given how we define the undef flag and people just got used to use the workaround with pseudo instructions instead... |
Might as well pass in MachineOperand since that's what you already have at the use points