Page MenuHomePhabricator

Do not mark CSRs as pristine in functions that have been shrink wrapped
AbandonedPublic

Authored by kbarton on Sep 21 2015, 3:13 PM.

Details

Reviewers
qcolombet
MatzeB
Summary

The current definition if a pristine register is:
Pristine registers hold a value that is useless to the current function, but that must be preserved - they are callee saved registers that are not saved.
Before the PrologueEpilogueInserter has placed the CSR spill code, this method always returns an empty set.

There is currently an assumption that once the PrologueEpilogueInserter has run, all CSRs are not pristine because they have been saved in the prologue and restored in the epilogue. However, when shrink wrapping has run and identified new save/restore points, the CSR are still pristine before the save block and after the epilogue block.

This is a problem exposed by the Fhourstones benchmark when shrink wrapping on PPC64 was enabled. A reduced test case that illustrates the problem has been created.

Diff Detail

Event Timeline

kbarton updated this revision to Diff 35312.Sep 21 2015, 3:13 PM
kbarton retitled this revision from to Do not mark CSRs as pristine in functions that have been shrink wrapped.
kbarton updated this object.
kbarton added reviewers: qcolombet, MatzeB.
kbarton added subscribers: seurer, hfinkel, wschmidt, llvm-commits.

Can you give some more context why this fails? I recently changed the PristineRegister definition to be independent of the place in the function so it wouldn't matter whether you are in the prologue or epilogue of the function. For the prologue/epilogue those saved CSRs should be marked as live-in values and therefore should be considered live by the RegisterScavenger/LivePhysRegs. Do you have an instance where this is not the case?

Can you give some more context why this fails? I recently changed the PristineRegister definition to be independent of the place in the function so it wouldn't matter whether you are in the prologue or epilogue of the function. For the prologue/epilogue those saved CSRs should be marked as live-in values and therefore should be considered live by the RegisterScavenger/LivePhysRegs. Do you have an instance where this is not the case?

The attached test case demonstrates how this fails.
Shrink wrapping identifies a new location for the save/restore blocks. Callee-saved register R30 is then moved to be saved/restored outside of the prologue. Later in the compilation, during Aggressive Anti-Dependence breaking, R30 is chosen to be used to break an anti-dependence in a block *after* the restore location identified by shrink wrapping. This clobbers the value in R30.

I just reread the comments above. You're saying that in the example I provided above, R30 should be marked as live-in to the epilogue block (indicating that it is live on exit of the function)? I did not check the live-in registers, so I cannot say for certain whether that is happening or not. I can spend some time looking at that, to determine whether that is the case or not.

Can you give some more context why this fails? I recently changed the PristineRegister definition to be independent of the place in the function so it wouldn't matter whether you are in the prologue or epilogue of the function. For the prologue/epilogue those saved CSRs should be marked as live-in values and therefore should be considered live by the RegisterScavenger/LivePhysRegs. Do you have an instance where this is not the case?

The attached test case demonstrates how this fails.
Shrink wrapping identifies a new location for the save/restore blocks. Callee-saved register R30 is then moved to be saved/restored outside of the prologue. Later in the compilation, during Aggressive Anti-Dependence breaking, R30 is chosen to be used to break an anti-dependence in a block *after* the restore location identified by shrink wrapping. This clobbers the value in R30.

The issue here maybe an assumption that the AggressiveAntiDepBreaker makes about how PristineRegisters work, specifically, in AggressiveAntiDepBreaker::StartBlock we have this code:

// Mark live-out callee-saved registers. In a return block this is
// all callee-saved registers. In non-return this is any
// callee-saved register that is not saved in the prolog.
const MachineFrameInfo *MFI = MF.getFrameInfo();
BitVector Pristine = MFI->getPristineRegs(MF);
for (const MCPhysReg *I = TRI->getCalleeSavedRegs(&MF); *I; ++I) {
  unsigned Reg = *I;
  if (!IsReturnBlock && !Pristine.test(Reg)) continue;
  for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI) {
    unsigned AliasReg = *AI;
    State->UnionGroups(AliasReg, 0);
    KillIndices[AliasReg] = BB->size();
    DefIndices[AliasReg] = ~0u;
  }
}

So, in a return block, it is marking all callee-saved registers as live-out, and in a non-return block, it is marking as live-out all callee-saved registers that are pristine (and, thus, should not be used). The assumption here, obviously, is that the restore point for non-pristine callee-saved registers will be in the return block. With shrink-wrapping enabled, this clearly won't be the case.

I'll also note that, prior to Matthias's change in r238524, we were collecting pristine registers per-MBB, not per function. Even then, however, it is not clear to me that the code would have been correct for shrink wrapping unless the set of pristine registers was maximal in blocks with the restore point.

Also worth noting is that in CriticalAntiDepBreaker::StartBlock we have very-similar logic:

// Mark live-out callee-saved registers. In a return block this is
// all callee-saved registers. In non-return this is any
// callee-saved register that is not saved in the prolog.
const MachineFrameInfo *MFI = MF.getFrameInfo();
BitVector Pristine = MFI->getPristineRegs(MF);
for (const MCPhysReg *I = TRI->getCalleeSavedRegs(&MF); *I; ++I) {
  if (!IsReturnBlock && !Pristine.test(*I)) continue;
  for (MCRegAliasIterator AI(*I, TRI, true); AI.isValid(); ++AI) {
    unsigned Reg = *AI;
    Classes[Reg] = reinterpret_cast<TargetRegisterClass *>(-1);
    KillIndices[Reg] = BBSize;
    DefIndices[Reg] = ~0u;
  }
}

these parts of the code are many years old, and perhaps date from a time when relevant parts of the infrastructure were different.

Looking at the function just prior to anti-dependency breaking (which is post-RA scheduling), %X30 is certainly marked as live-in to those blocks that are predecessors to the now-conditionally-executed prologue block (which is the same block containing the epilogue). However, it is not in the live-in set of any other block in the function, and thus the problem. In the non-return blocks, because %X30 is not in the set of pristine registers, and it is not marked live-in anywhere else, the anti-dependency breakers will assume it to be available for allocation. As far as I can tell, the register scavenger will suffer from a similar problem.

Kit, while your fix is conservatively correct, I'd like to find a better way. PrologEpilogInserter contains code in a function called updateLiveness to add Live-In registers to blocks based on the selected save/restore points. You could try updating it to mark pristine registers outside the save/restore region as live-in as well. Another option is the teach the anti-dependency breakers about save/restore points, and let them use dominance queries to determine if they need to consider all callee-saved registers as pristine (just as they currently do in return blocks). I imagine that checking whether a block is dominated by the save point and not dominated by the restore point would do the trick.

I looked at the issue an think the proper fix is this: http://reviews.llvm.org/D13176

So this was simply a bug about the live-in lists not getting set correctly after the restore point if the save and restore points happened to be in the same basic block (while still having successor blocks).
The code in AggressiveAntiDepBreaker should work fine with this fix as far as I can see.

qcolombet edited edge metadata.Sep 28 2015, 9:50 AM

Hi Kit,

Could you double check that Matthias’ fix invalidate the need for that?

Thanks,
-Quentin

kbarton abandoned this revision.Oct 14 2015, 10:03 AM

This is no longer needed. It was fixed by MatzeB in http://reviews.llvm.org/D13176.