This is an archive of the discontinued LLVM Phabricator instance.

Expand subregisters in MachineFrameInfo::getPristineRegs
ClosedPublic

Authored by kparzysz on Nov 16 2015, 11:15 AM.

Details

Reviewers
MatzeB
hfinkel
Summary

Solve the previously described problem by adding subregisters of pristine registers to the bit vector returned by getPristineRegs.

Diff Detail

Repository
rL LLVM

Event Timeline

kparzysz updated this revision to Diff 40312.Nov 16 2015, 11:15 AM
kparzysz retitled this revision from to Track pristine registers in terms of register units..
kparzysz updated this object.
kparzysz added reviewers: MatzeB, hfinkel.
kparzysz set the repository for this revision to rL LLVM.
kparzysz added a subscriber: llvm-commits.

Hi Krzysztof,

Unfortunately, I think this is not correct to use the RegUnit for now.
Matthias may remember the details, but right now, we have some weird situation where for instance XMM0 is pristine, but YMM0 is not, whereas they are actually the same regunit.

So, for the target, that means that it has to list all the registers that are actually preserved.

Note, my recollection may be wrong and in that case just ignore my comment, but please double check!

Thanks,
-Quentin

How do you prevent the scavenger from clobbering YMM0 then? Do you think that a similar problem could also happen on X86 in some circumstances?

kparzysz added a comment.EditedNov 16 2015, 12:31 PM

Listing both, 32-bit registers and their 64-bit super-registers as callee-saved does not really solve the problem. For example, if callee-saved = { R16, R17, D8 }, and actually-saved = { D8 }, then there are no pristine registers since R16 and R17 are both covered by D8. The scavenger will still calculate the set-difference and come up with { R16, R17 }, and consider those to be pristine. The "actually-saved" information is used in PEI to generate the save/restore instructions. If that set also contained sub-registers, we would have to add checks to our Hexagon-specific hook to avoid duplicate loads/stores. I'm not sure if all callers of "getCalleeSavedInfo" would be aware of such potential redundancy.

Edit: defeating remarkup failed

MatzeB edited edge metadata.Nov 16 2015, 5:06 PM

I doesn't look like this change actually simplifies the code: While RegisterScavenger.cpp gets simpler with this patch, AggressiveAntiDepBreaker.cpp, CriticalAntiDepBreaker.cpp, LivePhysRegs.cpp and MachineVerifier.cpp get more complicated.

I'd rather change getPristineRegs() to something like this to fix the bug:

BitVector MachineFrameInfo::getPristineRegs(const MachineFunction &MF) const {
  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
  BitVector BV(TRI->getNumRegs());

  // Before CSI is calculated, no registers are considered pristine. They can be
  // freely used and PEI will make sure they are saved.
  if (!isCalleeSavedInfoValid())
    return BV;

  for (const MCPhysReg *CSR = TRI->getCalleeSavedRegs(&MF); CSR && *CSR; ++CSR)
    BV.set(*CSR);

  // Saved registers and their subregisters are not pristine.
  const std::vector<CalleeSavedInfo> &CSI = getCalleeSavedInfo();
  for (const CalleeSavedInfo &Info : CSI) {
    for (MCSubRegIterator R(Info.getReg(), TRI, true); R.isValid(); ++R)
      BV.reset(*R);
  }

  return BV;
}

would that work for you?

I know that the code does not become simpler, I just wanted to use a common measure of register use. What you are proposing would work for our case, I'd only add the traversal of subregisters to the CSR loop as well (to cover the opposite case).

There is also one other place where the pristine registers are calculated: addPristines in LivePhysReg.cpp. Should that function call MachineFrameInfo::getPristineRegs?

For the record, I am kind of uncomfortable with Matthias’ suggestion because we slightly modify the output of the function. This is not wrong of course, but I believe that for most backend this is not needed because they already know what the output was and were doing the right thing. I.e., we add a compile time overhead for no benefit.

My concern is that this may have a significant impact on backend with a lot of registers (i.e., GPUs). I believe the effect would be mitigated by the fact that, those usually don’t have calls, i.e., they likely don’t have CSRs saved.

The bottom line is if we go with that change we should be aware of potentiel downside for everybody and we need to balance whether a target specific fix would be more appropriate.

That being said, Matthias seems fine with the suggested change, so we might just carry on.

Cheers,
-Quentin

[...] whether a target specific fix would be more appropriate.

What type of approach do you have in mind for that? Targets that call this function themselves will sure know what to expect, my concern is about target-independent code. It doesn't seem like that the function getPristineRegs has a precisely defined result: it seems that the current assumption is that it will be a set of registers that is a subset of what getCalleeSavedRegs returns. This, however, does not work for us. Should it be the all-inclusive set of registers, including all aliases that are fully covered by the pristine bits from the register file, or a minimal set of registers that cover these bits? Anything that is in between?

kparzysz updated this revision to Diff 40544.Nov 18 2015, 12:59 PM
kparzysz retitled this revision from Track pristine registers in terms of register units. to Expand subregisters in MachineFrameInfo::getPristineRegs.
kparzysz updated this object.
kparzysz edited edge metadata.

New version, as per the discussion

kparzysz updated this revision to Diff 40546.Nov 18 2015, 1:02 PM

Delete unused declaration.

I suppose we have agreed on this. Committed in r253600.

MatzeB accepted this revision.Nov 24 2015, 4:09 PM
MatzeB edited edge metadata.

Accept + Close review that is already committed.

This revision is now accepted and ready to land.Nov 24 2015, 4:09 PM
MatzeB closed this revision.Nov 24 2015, 4:09 PM

Accept + Close review that is already committed.