This is an archive of the discontinued LLVM Phabricator instance.

[PPC64] Find valid scratch register for prologue/epilogue during shrink wrapping
ClosedPublic

Authored by kbarton on Oct 21 2015, 1:07 PM.

Details

Summary

There was an assumption in the emitPrologue and emitEpilogue codes that register R0 could be used as a scratch register when saving/restoring several registers (Link Register, Base Pointer, etc.). When shrink wrapping has moved the location of the prologue/epilogue, it is possible that R0 has already been assigned and is in use at the location where the prologue/epilogue are inserted, resulting in bad code generation. This is a problem that was exposed in the self-hosting build of Clang.

The solution is to use the canUseAsPrologue and canUseAsEpilogue methods to determine if a register is available to be used as the scratch register in the prologue/epilogue. If a register is available, it is saved in the PPCFrameLowering class for use by the emitProlog and emitEpilog methods. If no register can be found, the the blocks identified by shrink wrapping are not valid, and the optimization will not make any changes.

A reduced test case was created based on the failing file in the self-host build (BreakableToken.cpp)

Diff Detail

Event Timeline

kbarton updated this revision to Diff 38037.Oct 21 2015, 1:07 PM
kbarton retitled this revision from to [PPC64] Find valid scratch register for prologue/epilogue during shrink wrapping.
kbarton updated this object.
kbarton added reviewers: hfinkel, qcolombet.
kbarton added subscribers: wschmidt, nemanjai, seurer, tjablin.
qcolombet requested changes to this revision.Oct 21 2015, 2:47 PM
qcolombet edited edge metadata.

Hi Kit,

I don’t think the way you store the scratch register is correct. I.e., FrameLowering is supposed to be stateless and moreover, unless I remember wrong, the current implementation is potentially wrong. I suggest moving that information into PPCFunctionInfo. See my inline comment.

Finally, I think the calls to the register scavenger in canUseAsEpilogue are missing something.

Cheers,
-Quentin

lib/Target/PowerPC/PPCFrameLowering.cpp
92

Introduce a getInitialScratchRegister or something to avoid duplicating this code.

998

You need to move forward in the block just before the first terminator, right?

998

If you store your scratch information into the MachineFunctionInfo, you’ll have it here.

lib/Target/PowerPC/PPCFrameLowering.h
136

Unless I am mistaking, FrameLowering instances can be shared by several MachineFunctions.

Although I believe it can’t happen with the current pipeline of MachineInstr passes, you have no guarantee that you won’t have a call to canUseAsPrologue with a different MachineFunction/MachineBasicBlock before emitPrologue. I.e., the information in here may be invalid between when you compute it and when you use it.

If you want to keep the information here, you would need a map MBB -> scratch reg, but given our current scheme you may use a pair MMB, scratch reg, and assert that MBB is the same in emitXXX as the last one you saw from canUseAsXXX.

That being said, it may make more sense to keep this information into PPCFunctionInfo that is bound to the function.

This revision now requires changes to proceed.Oct 21 2015, 2:47 PM
nemanjai added inline comments.Oct 22 2015, 2:16 AM
lib/Target/PowerPC/PPCFrameLowering.cpp
983

Not that I really understand this code, but since other assignments to PrologueScratchRegister and EpilogueScratchRegister depend on isPPC64(), shouldn't this as well? Namely, won't this change the variable from PPC::R0 to some unused PPC::Xn if PPC::R0 is used?

test/CodeGen/PowerPC/BreakableToken-reduced.ll
191

This is minor but may cause a failure at some point if I understand this correctly:
If the scavenger finds registers 10, 20 or 30, I think this REGEX will not match. Perhaps {{[1-9]}} so that just the first digit is non-zero. Or (and I'm not sure if this works) {{[1-9][0-9]*}}.
Similarly with the restore.

wschmidt added inline comments.Oct 22 2015, 6:32 AM
lib/Target/PowerPC/PPCFrameLowering.cpp
983

Agreed, need GPRCRegClass when ! isPPC64(). Good catch.

1003

Also needs a PPC64 check.

lib/Target/PowerPC/PPCFrameLowering.h
136

Agreed. I got dinged once before by trying to put mutable state in PPCFrameLowering -- it's not permitted.

kbarton added inline comments.Oct 22 2015, 11:57 AM
lib/Target/PowerPC/PPCFrameLowering.cpp
983

We technically don't need this, because this code is only run on ppc64. However, I can add it in case we decide to enable this for 32-bit at some point in the future.

998

Yes, need to include anything that gets initialized in the basic block. Good catch!

lib/Target/PowerPC/PPCFrameLowering.h
136

OK, good, I'll make this change and post another review.

test/CodeGen/PowerPC/BreakableToken-reduced.ll
191

ok, that's a good point. I'll need to play with the regular expression and see how to write one that matches.

kbarton added inline comments.Oct 22 2015, 5:07 PM
test/CodeGen/PowerPC/BreakableToken-reduced.ll
191

It turns out that this expression will match 10, 20, and 30. So I will leave this as is.

kbarton updated this revision to Diff 38189.Oct 22 2015, 5:32 PM
kbarton edited edge metadata.

Moved the PrologueScratchRegister and EpilogueScratchRegister to PPCMachineFunctionInfo class and fixed up the locations where these values are used and set.
I also fixed the problem in the canUseAsEpilog method by moving the RegisterScavenger forward to the first terminator of the epilogue block (to ensure we do not identify any registers used in the epilogue block as available).

hfinkel added inline comments.Oct 27 2015, 1:21 PM
lib/Target/PowerPC/PPCFrameLowering.cpp
668

Doing things this way introduces a very-unfortunate ordering dependency. You're assuming that the very last successful call to canUseAsPrologue and canUseAsEpilogue will be on the blocks that are ultimately chosen. While this might be true, it obviously constrains other uses of these functions, and would require prominent comments both in TargetFrameLowering.h and also in ShrinkWrap.cpp.

But don't do that. Just refactor the logic so that you can rerun the RS here and pick a good register. Then there's no ordering dependency and we don't need to worry about breaking this in the future.

981

No space after !.

1006

No space after !.

1008

Same here; rerun the RS logic.

test/CodeGen/PowerPC/BreakableToken-reduced.ll
2

Can you use a MIR test instead of this? I suspect this will be quite fragile and we'll soon no longer be testing anything in particular.

We have a few of these already (see test/CodeGen/PowerPC/*.mir), and you can generate the inputs by running llc on this test with a appropriate -stop-after flag.

kbarton added inline comments.Oct 27 2015, 3:20 PM
lib/Target/PowerPC/PPCFrameLowering.cpp
668

OK, I see your point. My concern for running RS twice is compile time, but I haven't looked through that code to see how expensive it is. If we're not worried about the cost of the RS, then we don't need to cache the registers that it picks.
Running the logic twice is also consistent with how it is done on ARM (similar, not identical problem).

hfinkel added inline comments.Oct 27 2015, 3:25 PM
lib/Target/PowerPC/PPCFrameLowering.cpp
668

Agreed, but we're already running RS for each block queried, so what's one more time? ;)

kbarton updated this revision to Diff 39593.Nov 6 2015, 2:37 PM
kbarton edited edge metadata.

Modified the new findScratchRegister method to indicate whether there was an available scratch register to use in the given block, in addition to setting the Scratch Register.
This method is now used in canEmitPrologue/canEmitEpilogue during ShrinkWrapping as well as the emitPrologue and emitEpilogue methods when inserting the prologue/epilogue code.

Hi Kit,

FWIW, the new logic LGTM.
I leave the tests part and the final LGTM to PPC people.

Cheers,
-Quentin

lib/Target/PowerPC/PPCFrameLowering.cpp
602

NULL -> nullptr

608

Ditto.

hfinkel accepted this revision.Nov 11 2015, 8:14 AM
hfinkel edited edge metadata.

Aside from the NULL -> nullptr changes pointed out by Quentin, this LGTM.

Committed r253247

kbarton accepted this revision.Nov 16 2015, 12:26 PM
kbarton added a reviewer: kbarton.
qcolombet accepted this revision.Jan 27 2016, 10:23 AM
qcolombet edited edge metadata.

Landed in r253247.

This revision is now accepted and ready to land.Jan 27 2016, 10:23 AM
qcolombet closed this revision.Jan 27 2016, 10:24 AM