MIR Statepoint refactoring. Part 2: Operand folding.
ClosedPublic

Authored by dantrushin on Jun 11 2020, 5:08 AM.

Details

Summary

Add support for Statepoint operand folding (replacing a VReg with a
memory operand). This is allowed for InlineSpiller only, because it
knows how to reload the tied def VReg after it has been turned into a
memory operand. Other callers (e.g. the peephole optimizer) cannot do
such a reload. We do this by untying the operand we want to fold in
InlineSpiller and allowing only untied operands to be folded.
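
The untie-then-fold scheme can be sketched as follows. This is a hypothetical, simplified model of the rule described above, not the real LLVM API; `Operand`, `tryFold`, and `spillerFold` are invented names standing in for the logic in TargetInstrInfo and InlineSpiller:

```cpp
#include <cassert>

// Hypothetical model of a machine operand; only the fields relevant to
// the folding rule above are represented.
struct Operand {
  int Reg;       // virtual register number
  bool IsTied;   // tied to a def (two-address form)
  bool IsFolded; // replaced with a stack-slot memory operand
};

// Generic callers (e.g. the peephole optimizer) may only fold untied
// operands, because they cannot reload the tied def afterwards.
bool tryFold(Operand &Op) {
  if (Op.IsTied)
    return false; // refuse: this caller cannot restore the tied def
  Op.IsFolded = true;
  return true;
}

// InlineSpiller knows how to reload the def itself, so it unties the
// operand first and then folds it.
bool spillerFold(Operand &Op) {
  Op.IsTied = false; // un-tie: the spiller will handle the reload
  return tryFold(Op);
}
```

So the same `tryFold` gate serves both kinds of callers; only the spiller earns the right to fold a (formerly) tied operand by untying it up front.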

Full change set is available at D81603.

Diff Detail

Event Timeline

dantrushin created this revision.Jun 11 2020, 5:08 AM
Herald added a project: Restricted Project.Jun 11 2020, 5:08 AM

I can't wrap my head around why untying operands would be correct here. It seems like we could end up with a use folded but a def not, leading to invalid codegen. Can you either expand on the justification or discuss offline?

I can't wrap my head around why untying operands would be correct here. It seems like we could end up with a use folded but a def not, leading to invalid codegen. Can you either expand on the justification or discuss offline?

The key point here is that we have two-address form at this point (not SSA), i.e. the register on both sides is the same: vreg1 = STATEPOINT vreg1<tied-def0>.
foldMemoryOperand is called from spillAroundUses in InlineSpiller. That method (besides a few other optimizations) tries, for every VReg use, to fold the register into a memory operand or, if unsuccessful, inserts a reload before the instruction.
In some sense it turns the VReg into a StackSlot. And since a def of a stack slot is invisible to LLVM (just like with the old statepoint), it "magically" works.
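
That per-use fold-or-reload decision can be sketched as below. This is a hypothetical model, not the real InlineSpiller::spillAroundUses; `Rewrite` and the `TargetCanFold` input are invented stand-ins (the latter models whether foldMemoryOperand succeeds on that use):

```cpp
#include <cassert>
#include <vector>

// Outcome for one use of the spilled vreg: either the operand was folded
// into a stack-slot memory operand, or a reload was inserted before it.
enum class Rewrite { FoldedToStackSlot, ReloadInserted };

// Simplified model of the loop described above: walk every use of the
// vreg being spilled and rewrite each one.
std::vector<Rewrite>
spillAroundUses(const std::vector<bool> &TargetCanFold) {
  std::vector<Rewrite> Result;
  for (bool Foldable : TargetCanFold) {
    if (Foldable)
      Result.push_back(Rewrite::FoldedToStackSlot); // vreg -> [FI]
    else
      Result.push_back(Rewrite::ReloadInserted);    // reload before use
  }
  return Result;
}
```

Either way every occurrence of the vreg is rewritten, which is why no use can be left referring to a def that has disappeared into the stack slot.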

But you're converting an instruction which defines a vreg to one which doesn't. If the original definition vreg had uses, isn't this a miscompile?

vreg1 = statepoint .... vreg1 (tied)
vreg2 = ADDri vreg1, 20

=>

statepoint ... [FI]
vreg2 = ADDri vreg1, 20 <-- What's the value of vreg1 here?

We could insert a fill after the folded statepoint, but I don't see code which does that?

InlineSpiller::spillAroundUses folds/spills ALL occurrences of vreg1, so it will turn

vreg1 = statepoint .... vreg1 (tied)
vreg2 = ADDri vreg1, 20

into

statepoint ... [FI]
vreg2 = ADDmi [FI], 20

In case folding fails, it will insert a reload before the use.

Add test demonstrating statepoint operand folding.

LGTM. I sat down today to write a cleaner patch, and in the process convinced myself this was in fact correct. I will post a separate cleanup once this lands as I think we can have the foldMemoryOperand impl structured much more cleanly for tied operands, but doing so involves a slightly deeper rewrite of some x86 specific code than is reasonable to ask for here.

Please address the inline comments in this review when landing.

I would also encourage - but not require - you to fix 46916 in a separate patch before landing this. I see that the tied argument propagation herein should fix that, but splitting the patch into "fold deopt w/vregs present" and "fold vreg gc operand" would make it much easier for anyone following along.

llvm/lib/CodeGen/InlineSpiller.cpp
813

Please add a comment explaining assumption here.

llvm/lib/CodeGen/TargetInstrInfo.cpp
502

Please add an assert that DefToFoldIdx == e. That is, there's at most one def op per fold operation. The code below would miscompile a multiple def fold.
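
A sketch of that invariant check follows. This is hypothetical and simplified; `Op` and `findDefToFold` are invented names, not the TargetInstrInfo API, but the shape mirrors the suggested "DefToFoldIdx == e" assert:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for an operand being considered for folding.
struct Op { bool IsDef; };

// Returns the index of the single def among the operands being folded,
// or Ops.size() if there is none. Asserts on a second def, since code
// that assumes at most one def would miscompile a multiple-def fold.
std::size_t findDefToFold(const std::vector<Op> &Ops) {
  std::size_t DefIdx = Ops.size();
  for (std::size_t I = 0, E = Ops.size(); I != E; ++I) {
    if (Ops[I].IsDef) {
      assert(DefIdx == Ops.size() && "multiple defs in one fold operation");
      DefIdx = I;
    }
  }
  return DefIdx;
}
```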

506

I don't believe this line or comment is necessary. Please remove it if make check passes; if not, I will handle it in a follow-up.

optimizeMemoryInst (the peephole entry point) rejects all defs, not just tied defs.

reames accepted this revision.Jul 30 2020, 7:16 PM
This revision is now accepted and ready to land.Jul 30 2020, 7:16 PM


The attached test will explode if the isTied check is removed as suggested.
Essential part is:

%10:gr64 = MOV64rm killed %9, 1, $noreg, 0, $noreg :: (load 8 from %ir.p0, addrspace 1)
%6:gr64, %7:gr64 = STATEPOINT 0, 0, 0, @foo, 2, 0, 2, 0, 2, 0, %8, %8(tied-def 0), killed %10, %0(tied-def 1), ...

It is synthetic (i.e. we do not generate such MIR from any IR), but it is still valid.

llvm/lib/CodeGen/TargetInstrInfo.cpp
506

I don't believe this line or comment is necessary. Please remove it if make check passes; if not, I will handle it in a follow-up.

optimizeMemoryInst (the peephole entry point) rejects all defs, not just tied defs.

The possible path is:

PeepholeOptimizer::runOnMachineFunction (!isLoadFoldable() branch) ->
X86InstrInfo::optimizeLoadInstr ->
TargetInstrInfo::foldMemoryOperand ->
TargetInstrInfo::foldPatchpoint (here)

The fact that this path is not exercised now is an implementation detail (it was exercised with earlier versions of the ISEL patch).
I think we should keep this check to stay on the safe side.