This is an archive of the discontinued LLVM Phabricator instance.

Fix for PR 26500
ClosedPublic

Authored by nemanjai on Feb 16 2016, 8:27 AM.

Details

Summary

The register used in emitPrologue/emitEpilogue to spill and reload the condition register is hard coded as R12. This isn't a problem in general, however when we shrink-wrap the function, the prologue/epilogue will end up in a block that may be using R12. In a case like that, we will clobber a value that is potentially needed.

This patch provides the support for acquiring an unused register for spilling the CR. In cases where there isn't an extra available register, the LR and CR will be spilled/reloaded using the same temporary register.

In the cases where we are not shrinkwrapping (i.e. the prologue/epilogue are staying put), we will not look for an available register since R12 is available.

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai updated this revision to Diff 48073.Feb 16 2016, 8:27 AM
nemanjai retitled this revision from to Fix for PR 26500.
nemanjai updated this object.
nemanjai added reviewers: hfinkel, kbarton, wschmidt, seurer.
nemanjai set the repository for this revision to rL LLVM.
nemanjai added a subscriber: llvm-commits.

I was thinking of adding a test case from the PR (reduced in a trivial way but still 915 lines of IR). However, I think the key things to test here are:

  1. No change in behaviour for non-shrink-rapped functions
  2. Shrink-wrapped functions only change by potentially using a register other than R12 to spill CR
  3. The selected register isn't live in the basic block

Number 1. is accomplished by the fact that other test cases did not have to change. Numbers 2. and 3. cannot really be tested reliably in the IR test case.
The only thing I see as a possibility is that I add a test case to projects/test-suite that ensures there's no crash with this fix (i.e. build the program Anton posted in the PR and run it).

hfinkel accepted this revision.Feb 18 2016, 5:35 AM
hfinkel edited edge metadata.

A few comments, but otherwise, LGTM. Thanks!

lib/Target/PowerPC/PPCFrameLowering.cpp
646

Variable names start with a capital letter. (I see it is like this below also, so please fix the others in a follow-up commit).

650

We need a comment somewhere explaining what's going on, and this might be a good place.

// We need a scratch register for spilling LR and for spilling CR. By default, we use two scratch registers to hide latency. However, if only one scratch register is available, we can adjust for that by not overlapping the spill code. However, if we need to realign the stack (i.e. have a base pointer) and the stack frame is large, we need two scratch registers.
884

Please add a comment here noting that this condition needs to be kept in sync with the check in canUseAsPrologue.

lib/Target/PowerPC/PPCFrameLowering.h
67

I think that the code would be easier to read if you made some of these have default parameters:

bool findScratchRegister(MachineBasicBlock *MBB,
                           bool UseAtEnd,
                           bool Needs2ScratchRegs = false,
                           unsigned *ScratchRegister = nullptr,
                           unsigned *ScratchRegisterCR = nullptr) const;

and then adjusted the call sites accordingly.

This revision is now accepted and ready to land.Feb 18 2016, 5:35 AM
kbarton added inline comments.Feb 18 2016, 9:30 AM
lib/Target/PowerPC/PPCFrameLowering.cpp
571–574

Need to give ScratchRegisterCR a default value

576

No, you shouldn't do this this. You should define what you want the ScratchRegisterCR to be here, don't assume it is defined elsewhere.

664

Remove /*Needs2ScratchRegs*/ in comments

lib/Target/PowerPC/PPCFrameLowering.h
62

Please updated the doxygen comments for this function.

qcolombet added inline comments.
lib/Target/PowerPC/PPCFrameLowering.cpp
609

Don’t know if you want to fix this while you are here, but you’ll need to keep only scratch registers in the returned set.
See Tim’s comment in http://reviews.llvm.org/D17350 for more details.

kbarton added inline comments.Feb 18 2016, 10:28 AM
lib/Target/PowerPC/PPCFrameLowering.cpp
610

Yes, good catch!
This should be easy to do by masking out all non-volatile registers from the bit vector, before traversing it.

nemanjai updated this revision to Diff 48381.Feb 18 2016, 11:57 AM
nemanjai edited edge metadata.
  • Fixed the names
  • Provided default arguments
  • Removed callee-saved registers from consideration
  • Provided a default register for CR spilling (rather than expecting the caller to have set it)
  • Added the requested comments
nemanjai updated this revision to Diff 48382.Feb 18 2016, 12:07 PM

I forgot one change:

  • If the second scratch register is not available, find_next will return -1

Of course, this should never happen if the caller provides the correct argument for Needs2ScratchRegs (we will have returned before getting here). However, assuming the caller incorrectly sets the argument, we don't want to return some huge register number.

kbarton added inline comments.Feb 18 2016, 12:32 PM
lib/Target/PowerPC/PPCFrameLowering.cpp
601

You don't need to do this now, since you've initialized ScratchRegisterCR at the beginning.

616

Probably should expand on this comment a bit...

655

You should refactor this logic into a separate method and use it when calling findScratchRegister in emitPrologue and emitEpilogue, as well as here.

661

Need to update this call to findScratchRegister as well, since the epilogue can use two scratch registers.

753–754

Need to use the same logic when calling findScratchRegister here, as used above in canUseAsPrologue.

883

This is currently not in sync, since you're only using MustSaveCRs to indicate whether you need two scratch registers.

1106–1107

We need to make a corresponding change to findScratchRegister in canUseAsEpilogue.

lib/Target/PowerPC/PPCFrameLowering.h
46

What happens if it cannot find two registers?

nemanjai added inline comments.Feb 18 2016, 2:24 PM
lib/Target/PowerPC/PPCFrameLowering.cpp
601

Well, there's no guarantee at this point that R12 is available. If I don't do this, we can easily end up in the same situation that prompted the bug. The idea is that I assume we don't have the second scratch register available until one is found. The code that reorders the CR/LR spills depends on the two registers actually being the same.

616

How about:
We shouldn't use callee-saved registers as scratch registers as they may be available when looking for a candidate block for shrink wrapping but not available when the actual prologue/epilogue is being emitted.

655

This is a good point. If I don't have to save the CR but I do have to realign a large stack frame and R0 is available as a scratch register, findScratchRegister will just set ScratchRegisterCR to R12 without checking if it's available.

661

The code in emitEpilogue only uses the second scratch register for spilling the CR so there's never a requirement for a second scratch register. When emitting the epilogue, we will ask nicely for a second scratch register if we're spilling the CR, but if we don't get the second one, we're still functionally correct (the CR spill sequence completes before the LR sequence starts).

753–754

Yes, I agree. Well almost. The condition in words should be "if we need to spill the CR OR we need to realign a large stack frame".

883

I see what you mean. It's in sync with canUseAsPrologue but not with the call to findScratchRegister.

1106–1107

I think that this would have the potential of causing some functions not to be shrink wrapped unnecessarily. At the very least, it will make blocks that don't have two available scratch registers invalid for epilogue insertion when we technically could use the block as epilogue. This is so because as I mentioned in the comment above, the emitEpilogue function never has a hard requirement for two scratch registers - if we don't have two, we just use the same one for spilling the CR and LR.

So now the question is, do we want that? Using a single register will cause the CR/LR spills to have increased overall latency. On the other hand, if no blocks are available for epilogue insertion, we don't shrink wrap. I can implement it either way.

lib/Target/PowerPC/PPCFrameLowering.h
46

I've described that in the return section. Do you want me to add something here as well?

kbarton added inline comments.Feb 18 2016, 3:05 PM
lib/Target/PowerPC/PPCFrameLowering.cpp
601

See comment below.

616

... because they have been added as live-in to the prologue block by the PrologueEpilogueInserter.

623

I don't think you want this BV.count() < 2 check here.

As it stands now, if there aren't at least 2 bits available, this will return false (indicating you can't find 2 different registers).
This ties in with the if condition above on line 601. It's possible that there is one register available, in which case both ScratchRegister and ScratchRegisterCR are set to R0 (line 604) but then this returns false here. The effect of that is shrink wrapping will be disabled, and the same temp register will be used for both the CR and LR (i.e., lost both opportunities).

I think this logic needs to be reworked here to account for the 3 different scenarios:

  1. No scratch registers available: set ScratchRegister to R0, ScratchRegisterCR to R12, return false;
  2. One scratch register available: set both ScratchRegisters to available reg; return true;
  3. Two scratch registers available: set ScratchRegister and ScratchRegisterCR using the bit vector iterator, return true.
629

Why would find_next return -1? You've already checked that there are at least 2 bits set.

1106–1107

I think we should use two scratch registers, if they are available, otherwise just use one. There should be other ways we can overlap the instructions to hide the latencies, if need be.
However, I would put a comment in both canUseAsEpilogue and emitEpilogue saying that this.

lib/Target/PowerPC/PPCFrameLowering.h
46

Sorry, I missed that.
But you should update to indicate R0 and R12 are used as the defaults.

nemanjai updated this revision to Diff 48478.Feb 19 2016, 5:24 AM

Cleaned up the code and added comments to make it a bit more clear as to what is happening. Made sure that we require two registers when we're emitting a prologue for a function that requires two registers.
I can add the following as a comment for findScratchRegister if the reviewers would like (since the semantics of that function are difficult to follow).

Just to enumerate all the possibilities for findScratchRegister:

  • When querying whether a block is a valid candidate for the prologue/epilogue
    • The block is in a function where large stack frame realignment is required: we return true only if there are two available registers
    • The block is in a function where large stack frame realignment is not required: we return true if at least one register is available
    • The output parameters are not set in either case
  • When requesting the actual scratch registers
    • We need to save the CR and realign a large stack frame:
      • If two registers are available, we'll set both
      • If only one register is available, we'll set both to that one (should be an impossible situation)*
      • If no registers are available, we'll set both to R0 (should be an impossible situation)*
    • We need to save the CR but do not need to realign a large stack frame:
      • If two registers are available, we'll set both
      • If only one is available, we'll set both to that one
      • If no registers are available, we'll set both to R0 (should be an impossible situation)*
    • We do not need to save the CR but need to realign a large stack frame:
      • If two registers are available, we'll set both
      • If only one register is available, we'll set both to that one (should be an impossible situation)*
      • If no registers are available, we'll set both to R0 (should be an impossible situation)*
    • We do not need to save the CR or realign a large stack frame:
      • If R0 is available, we'll set the first scratch register to R0 and the second one will default to R12
      • If two registers are available (other than R0), we'll set both
      • If one register is available, we'll set both to that one
      • If no register is available, we'll set both to R0 (should be an impossible situation)*

*The reason this situation should be impossible is if we have queried the basic block for the right number of available scratch registers, it would have returned true only if they are available. The only way we can get into this situation is if we are trying to insert the prologue/epilogue into a basic block that we have not queried with canUseAsPrologue/canUseAsEpilogue.

nemanjai updated this revision to Diff 48536.Feb 19 2016, 11:43 AM

Finally cleaned up the logic in findScratchRegister. Now it will try its best to find two unique registers. If it can, it sets both (if the pointers are passes) and returns true. If it can find only one, it will set both to that and return true if only one is required or false if two are required. Finally, if it can't find any free registers, it will return false and set both to R0 if the pointers are passed.

The condition that determines whether two registers are needed is now in a separate function which is queried both in canUseAsPrologue and emitPrologue.

kbarton added inline comments.Feb 19 2016, 12:38 PM
lib/Target/PowerPC/PPCFrameLowering.cpp
635

use unsigned and ++i, not i++.

640

Please use the term unique here. This isn't about getting two registers - it's about getting unique registers.

652

You need to move this check above line 618. If there are not two unique registers, you need to return false and set SR1 and SR2 to the default values. This will return false and set the scratch registers to be the same.

757

please add:
assert(!(SingleScratchReg && twoScratchRegsRequired) && "I asked for two unique registers!);

This should assure that if you asked for two unique registers, you really got two unique registers.

lib/Target/PowerPC/PPCFrameLowering.h
55

... require two unique scratch registers.

64

... were not available and the defaults are being used.

nemanjai added inline comments.Feb 19 2016, 3:42 PM
lib/Target/PowerPC/PPCFrameLowering.cpp
635

I don't see why this would make any difference for a primitive type, but I will certainly change it.

640

OK.

652

That seems like a very bad idea to me. What the result would be is that we set the SR1 and SR2 to R0 and R12 respectively knowing that these registers are not available. Don't forget that in emitPrologue/emitEpilogue we do not use the return value at all - we just use the function to get the registers that are available.
If we somehow end up in a situation where we are inserting the prologue/epilogue into a block that doesn't have the R0/R12 available, the function will happily go on it's merry way using the two registers.

757

OK, will do. I only didn't add it here because I have what is essentially an equivalent assert on line 905. Do you think this requires another review?

lib/Target/PowerPC/PPCFrameLowering.h
55

OK.

64

Yeah, makes sense. I'll change it.

Some additional comments below. As far as I'm concerned, you can commit and we can do additional fixups as followup.

lib/Target/PowerPC/PPCFrameLowering.cpp
597

It seems like, if we only need one scratch register, and R0 is available, then we can also return true here.

635

ints are preferred (because the compiler can know they can't wrap), so we tend to only use unsiged when necessary for some reason (although we're not entirely consistent about it). ++i is preferred because we only use post-increment if there's a reason (because it can be more expensive for non-primitive types, and no one should have to think about it otherwise).

641

Setting this to R0 seems confusing here. First, it is already set to R0 at the beginning of the function, and by definition, R0 is not actually available (right?). It seems better to set it to -1 in this case (or some other invalid value, like 0 (== PPC::NoRegister)) so that we'd catch an accidental use of it later.

nemanjai added inline comments.Feb 20 2016, 2:48 AM
lib/Target/PowerPC/PPCFrameLowering.cpp
597

The only reason I don't have that check is that we only know if we absolutely have to have two unique registers. However, when we are in a block that needs to save the CR, we don't absolutely have to have them, but if we do, we emit better code.

I guess the overall idea implemented with this function is this:

  • We'll try our best to give you two unique registers
  • If we can't, we'll give you one if we can
  • If you happen to be asking whether we have enough registers available (1 or 2), we'll let you know
641

I really like the idea of setting it to PPC::NoRegister. After all, if we don't have an available scratch register, it is better to fail than to clobber a live register and have a hard to track-down bug.
The only reason this was setting the register to R0 is to replicate existing semantics.

652

Actually, now that Hal suggested PPC::NoRegister, I actually like that idea best. Then, keeping this check here would have the following outcome:

  • If no registers are available, we set both to PPC::NoRegister and return false
  • If only one register is available, we set both to that and return true if we only need one, false if we need two
  • If two registers are available, we set both accordingly and return true

Finally, I recommend that the calls to this function in emitPrologue/emitEpilogue use the return value of this function in an assert (see below).

755

I think the source of a lot of confusion is that we only use the function here as a mutator and don't check its return value. We ultimately should never be in a situation where this function returns false here. So I recommend having it in an assert:

assert(findScratchRegister(&MBB, false, twoScratchRegsRequired(&MBB), &ScratchReg,
       &TempReg) && "Not enough available scratch registers in this block");
755

Actually, I've come to realize that this assert is redundant since findScratchRegister never currently sets the scratch register to zero.

nemanjai updated this revision to Diff 48588.Feb 20 2016, 4:00 AM

I think Kit mentioned that he still has some concerns about the semantics in the patch. In the meantime and in the interest of getting this patch finalized as soon as possible so that it stops blocking 3.8, I'm uploading an update that I feel addresses Hal's and Kit's comments.

The major change is that we don't set the output parameters to registers that aren't available. If any required register isn't available, we'll set the corresponding output parameter to PPC::NoRegister. This will ensure that we never try to use live registers as scratch registers. Also, to eliminate confusion about what the purpose of findScratchRegister is used for at any particular call, the return value is now always used. When we're emitting the prologue/epilogue, we use it to assert that we have enough registers.

kbarton added inline comments.Feb 20 2016, 5:00 AM
lib/Target/PowerPC/PPCFrameLowering.cpp
597

Yes, I agree.

641

R0 is always available because it is the default scratch register. What these checks are essentially doing is determining if we have enough available scratch registers or whether we need to fall back onto the default registers recommended by the ABI.

It's necessary to put the check from line 632 above these checks.
Once that is done, we can simplify this by just using the bit vector iterators to pull the first one or two free registers from the bit vector.

652

This scenario can never happen if this function is used properly.
As long as it is called exactly the same way (same MBB, same conditions for unique registers) it will return false the first time.
In canUseAsPrologue, if this returns false, then the specified block cannot be used for the prologue. If you still try to put the prologue there, then you will get incorrect results, but you have not used the functions properly. If you really want to test for this, check the return value for the call emitPrologue/emitEpilogue, and put an assert on that. It should *always* return true from those functions.

757

OK, yes, that's true. That assert will provide the same coverage.
Yes, this still requires another review because there are still problems in the findScratchRegisters method.

kbarton added inline comments.Feb 20 2016, 5:16 AM
lib/Target/PowerPC/PPCFrameLowering.cpp
641

I don't think we can use PPC::NoRegister here - it breaks the contract of the function.
Read the comments in the function carefully - it cleary states what is expected if all registers are not available. We return false and return the default registers (R0 and R12).

I think it's a much better idea to always check the return code, and assert in cases where it is expected to return true. That is, when we are asking whether there are registers available, the return value is used to make a decision (i.e., in canUseAsPrologue). Once the decision is made, we use the same function to get the actual registers available (i.e., emitPrologue). In that case, this function should always return true, or something went wrong.

kbarton added inline comments.Feb 20 2016, 6:38 AM
lib/Target/PowerPC/PPCFrameLowering.cpp
641

OK, I've thought about this some more and I agree we should be able to use PPC::NoRegister in cases where the function returns false. It's a good idea to ensure everything is being used correctly.

Please make sure to change the documentation of the function, as this is now different from what it originally states. I would also add this (or something similar) to the documentation, to indicate the high-level view of what the function does:

  1. If MBB is an entry or exit block, set SR1 and SR2 to R0 and R12 respectively (defaults recommended by the ABI) and return true
  2. If MBB is not an entry block, initialize the register scavenger and look for available registers. If TwoUniqueRegsRequired is set to true, it looks for two unique registers. Otherwise, look for a single available register. If the required registers are found, set SR1 and SR2 and return true. If the required registers are not found, set SR1 and SR2 to PPC::NoRegister and return false.

Note that if both SR1 and SR2 are valid parameters, and TwoUniqueRegsRequired is not set, this function will attempt to find two different registers, but still return true if only one register is available (and set SR1 == SR2).

I think we've converged; please update the comment as Kit has suggested, and commit this.

lib/Target/PowerPC/PPCFrameLowering.cpp
597

Okay, a comment here to that effect would be useful.

nemanjai closed this revision.Feb 23 2016, 12:48 AM

This was committed in a series of patches due to bugs I introduced when committing it. This should be the final commit (hopefully no more bugs):
Committed revision 261546.