This is an archive of the discontinued LLVM Phabricator instance.

[TwoAddressInstructionPass] Special processing of STATEPOINT instruction.
ClosedPublic

Authored by dantrushin on Apr 28 2022, 11:44 AM.

Details

Summary

STATEPOINT is a special pseudo instruction which represent Moving GC semantic to LLVM.
Every tied def/use VReg pair in STATEPOINT represent same physical register which can 'magically' change during call wrapped by statepoint.
(By construction, tied use operand is not live across STATEPOINT).

This means that when converting into two-address form, there is not need to insert COPY instruction before stateppoint, what TwoAddressInstruction pass
does for 'regular' instructions.

Diff Detail

Event Timeline

dantrushin created this revision.Apr 28 2022, 11:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 11:44 AM
dantrushin requested review of this revision.Apr 28 2022, 11:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 11:44 AM

Sorry I am not up-to-speed with how STATEPOINTs work in detail. But this change is confusing me and probably needs the description reworked:

  • The summary reads like somehow the lifetime for any STATEPOINT operand must end at the STATEPOINT.
  • The comment above TwoAddressInstructionPass::processStatepoint shows the exact opposite and lengthens the lifetime of the operand register.

Also given that this is a targetted change to the twoaddressinstruction pass it would be good to have at least 1 test that just checks for this particular change in isolation! (the statepoint-invoke-ra.mir is running multiple passes in succession and doesn't make it clear what exactly changed).

I would also recommend to read https://llvm.org/docs/MIRLangRef.html#simplifying-mir-files on how to shrink .mir tests.

Sorry I am not up-to-speed with how STATEPOINTs work in detail. But this change is confusing me and probably needs the description reworked:

  • The summary reads like somehow the lifetime for any STATEPOINT operand must end at the STATEPOINT.
  • The comment above TwoAddressInstructionPass::processStatepoint shows the exact opposite and lengthens the lifetime of the operand register.

STATEPOINT is a special pseudo instruction which wraps call and does not exists in hardware. Its only purpose is to represent possible register value change during a call it wraps. (Which we cannot express by other means)
It uses tied registers in conventional way - use and def must be allocated to the same physical register.
Its important property is that at SSA level, USE register is never used after STATEPOINT, only corresponding DEF is. So, as I tried to explain in description, they are adjacent segments of the same Live Range.
This allows us to not insert COPY instruction before STATEPOINT, but simply replace DEF register with USE one. This is what RegisterCoalescer will do anyway, but using quadratic algorithm (extremely expensive).

Also given that this is a targetted change to the twoaddressinstruction pass it would be good to have at least 1 test that just checks for this particular change in isolation! (the statepoint-invoke-ra.mir is running multiple passes in succession and doesn't make it clear what exactly changed).

In theory, this should have been NFC patch, but in practice it affects how Register Allocator works (because there are no SlotIndexes holes anymore and LiveRanges are shorter their weights change.

The only purpose of this patch is to eliminate COPY instruction which TwoAddress pass insert before instructions which it does not optimize. I'll add simple test if it would be useful.

Its important property is that at SSA level

I assume what you are explaining here is that at a STATEPOINT a moving garbage collector can potentially move objects around and needs to update pointer values. That likely needs additional correctness properties on top of ending the lifetimes (like requiring that any life GC controlled pointer must be an argument to the STATEPOINT). Anyway as far as I understand it this discussion seems irrelevant to this patch.

This means that when transforming it into two-address form, no extra COPY instruction is necessary.

I thought you only get a COPY when a value actually is alive across the instruction, which would be an invalid input anyway as far as I understand things. So I still don't see what we change here.

In theory, this should have been NFC patch

Unless I miss something this is producing different machine-IR with and without your patch. If so this can and should be tested in isolation (if it is not changing the IR, then why bother with this patch in the first place?)

Its important property is that at SSA level

I assume what you are explaining here is that at a STATEPOINT a moving garbage collector can potentially move objects around and needs to update pointer values. That likely needs additional correctness properties on top of ending the lifetimes (like requiring that any life GC controlled pointer must be an argument to the STATEPOINT). Anyway as far as I understand it this discussion seems irrelevant to this patch.

This means that when transforming it into two-address form, no extra COPY instruction is necessary.

I thought you only get a COPY when a value actually is alive across the instruction, which would be an invalid input anyway as far as I understand things. So I still don't see what we change here.

No. TwoaddressPass always inserts COPY (unless it can untie operands by commuting or converting to 3-address form).
And all my above explanations were in fact clumsy attempt to explain that tied-use operand is killed by statepoint, so tied def/use pair can be assigned to the same register without inserting COPY.
I think that this is true in general, but for 'regular' instructions nobody cares, as they usually have at most single tied def/use pair. STATEPOINTs can have tens of such pairs, so avoiding redundant COPY insertions becomes really important for them.
Also please note that 'killed' property for tied operands is not easy. Different parts of LLVM have different opinions whether tied-use operand may be marked as 'Kill' (even if it is not live across instruction).

dantrushin edited the summary of this revision. (Show Details)Apr 29 2022, 7:54 AM
dantrushin edited the summary of this revision. (Show Details)

Added test to show changes in TwoAddress pass.

@MatzeB Matthias, is test statepoint-vreg-twoaddr.mir looks meaningful? If yes, I can precommit it to make changes more obvious. Also, any suggestion on improving description?
(Yes, I know that IR part in .mir test is redundant, but I find it handy when I occasionally have to modify tests)

MatzeB accepted this revision.May 25 2022, 4:07 PM

Well I don't have too much experience with STATEPOINTs, but the change makes sense to me so far.

Please take another look if you can simplify/shrink the statepoint-vreg-twoaddr.mir test so future readers have an easier time.

Either way LGTM.

llvm/test/CodeGen/X86/statepoint-vreg-twoaddr.mir
6–35 ↗(On Diff #426070)

So you need the LLVM-IR here? It feels to me like we can drop it without affecting the test.

60–70 ↗(On Diff #426070)

Is it possible to further simplify this to focus on the thing being tested? Does the following change work:

This revision is now accepted and ready to land.May 25 2022, 4:07 PM
This revision was landed with ongoing or failed builds.May 30 2022, 9:08 AM
This revision was automatically updated to reflect the committed changes.