This is an archive of the discontinued LLVM Phabricator instance.

[Live Intervals] Teach Greedy RA to recognize special case live-through
ClosedPublic

Authored by skatkov on Apr 12 2021, 3:47 AM.

Details

Summary

Statepoint instruction has a deopt section which is actually live-through the call.
Currently this is handled by special post pass after RA - fixup-statepoint-caller-saved.

This change teaches Greedy RA that if segment of live interval is ended with statepoint
instruction and its reg is used in deopt bundle then this live interval interferes regmask of this statepoint
and as a result caller-saved register cannot be assigned to this live interval.

Diff Detail

Event Timeline

skatkov created this revision.Apr 12 2021, 3:47 AM
skatkov requested review of this revision.Apr 12 2021, 3:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2021, 3:47 AM
skatkov added inline comments.Apr 12 2021, 4:31 AM
llvm/lib/CodeGen/LiveIntervals.cpp
959–960

This needs to be fixed as well...
we can skip *SlotI if it is a statepoint and end of last segment in LI....

lkail added a subscriber: lkail.Apr 12 2021, 5:52 AM
skatkov updated this revision to Diff 336865.Apr 12 2021, 9:31 AM

It is probably not optimal but workable.

I really really like this idea, but the implementation, not so much. :)

I assume that you'd disable and eventually delete FixupCalleeSaved if this landed right? (Mostly checking my understanding of the patch intent.)

Code structure wise, I think this needs to be merged into the existing loop. If I'm reading the code correctly, you just need to add a special case when we'd otherwise exit the innermost loop. If *SlotI == LiveI->end and Live->end is a statepoint deopt use, then add the regmask. (I believe that SlotE is past the end of all possible instructions, and thus we can't have a statepoint there.)

You could also consider adjusting the existing code to use a lambda to do the lazy UsableRegs init. That would be a lot easier to read even for the existing code, and your approach becomes easier to read if you're not duplicating all of the Found logic.

If my earlier comment didn't parse, I'd suggest extracting the helper as suggested inline and rebasing over the lambda. I think once most of the extra complexity is out of the way, it'll be easier to see ways to fold the two loops together.

llvm/lib/CodeGen/LiveIntervals.cpp
937

I think you can make this code much more readable by adding a utility along the lines of the following:
bool hasLiveThroughUse(MachineInstr, Register);

945

Isn't this just a really weirdly spelled expression of getInstructionIndex on SlotIndexes?

I really really like this idea, but the implementation, not so much. :)

This is just WIP :) Let's make it better.

I assume that you'd disable and eventually delete FixupCalleeSaved if this landed right? (Mostly checking my understanding of the patch intent.)

Yep, the intent is as. However we should be careful with deleting the pass because it is used for all RAs while this fixes only Greedy RA.
In downstream I've checked that if I modify FixupCallerSaved and insert an assert there that only callee saved are expected then no assert triggered in my testing (not so big to be honest - I probably should check it more).

Code structure wise, I think this needs to be merged into the existing loop. If I'm reading the code correctly, you just need to add a special case when we'd otherwise exit the innermost loop. If *SlotI == LiveI->end and Live->end is a statepoint deopt use, then add the regmask. (I believe that SlotE is past the end of all possible instructions, and thus we can't have a statepoint there.)

This was my first implementation but unfortunately it is not true.
The critical pieces are the following:

  1. Indeed when we exit innermost loop, SlotI is a first slot after current segment and we can check deopt use.
  2. After that we advance to next segment which end is strongly bigger then SlotI. As a result if SlotI is an end of next segment we'll skip this segment and will not handle SlotI. The simplest example is LI with two segments and the second one ends with statepoint. We'll skip it.

You could also consider adjusting the existing code to use a lambda to do the lazy UsableRegs init. That would be a lot easier to read even for the existing code, and your approach becomes easier to read if you're not duplicating all of the Found logic.

Agreed.

If my earlier comment didn't parse, I'd suggest extracting the helper as suggested inline and rebasing over the lambda. I think once most of the extra complexity is out of the way, it'll be easier to see ways to fold the two loops together.

llvm/lib/CodeGen/LiveIntervals.cpp
937

Agreed.

945

The intent here is as follows:
we found that live segment is ended with statepoint instruction and there is a deopt use for this live interval in this statepoint instruction. Now we need to find regmask corresponding to this statepoint. We can do this using two approaches:

  1. Iterate over operands of statepoint and find an regmask operand
  2. There are two arrays in this method: Slots and Bits. The first array contains slot indexes for which regmask exists. The second array contains regmasks. if we know slot index we should find an index in Slots array and use found index in Bits to get regmask.

The second approach is implemented here.

skatkov updated this revision to Diff 337037.Apr 12 2021, 9:06 PM
skatkov updated this revision to Diff 337049.Apr 12 2021, 11:05 PM
skatkov edited the summary of this revision. (Show Details)

Is this better?

skatkov retitled this revision from [WIP] Teach Greedy RA to recognize special case live-through to [Live Intervals] Teach Greedy RA to recognize special case live-through.Apr 12 2021, 11:07 PM
reames accepted this revision.Apr 13 2021, 1:06 PM

LGTM w/minor comments.

llvm/lib/CodeGen/LiveIntervals.cpp
891

Add a definition of live-through to the comment. A couple of words is fine, you don't need anything exhaustive.

934

Rename this to unionBitMask, commit, and rebase please.

llvm/test/CodeGen/X86/statepoint-regs.ll
58

This test delta (and all the following analogous ones) is interesting. It's correct, but it's a case where we'd have better off pushing the deopt operands onto the stack. (Note: "push" here is key word. General spilling and leaving other values in registers would be smaller code wise, but probably *not* faster due to stack engine.)

Probably a good case to look into for regalloc quality around deopt.

This revision is now accepted and ready to land.Apr 13 2021, 1:06 PM
skatkov added inline comments.Apr 13 2021, 7:54 PM
llvm/test/CodeGen/X86/statepoint-regs.ll
58

I guess it is a case of better use spilling instead of first usage of callee saved register.

skatkov added inline comments.Apr 13 2021, 8:28 PM
llvm/lib/CodeGen/LiveIntervals.cpp
958

Some compilation time save.
if we processed SlotI we can move it forward.

skatkov updated this revision to Diff 337326.Apr 13 2021, 9:04 PM

Handled comments before landing,

This revision was landed with ongoing or failed builds.Apr 13 2021, 11:28 PM
This revision was automatically updated to reflect the committed changes.