Page MenuHomePhabricator

[RegUsageInfoCollector] Fix handling of callee saved registers with CSR optimization.
ClosedPublic

Authored by jonpa on May 1 2018, 7:38 AM.

Details

Summary

I saw a test case on SystemZ, where Target needed to save and restore the return register (and therefore also the SP), but those registers were missing from the regmask on the call to that function (in other words they were "not preserved"). This is wrong.

Currently, the set of clobbered physregs are first built, and then the callee saved regs are added back as preserved if the CSROpt did not take effect. This misses the registers that the target still had to save.

This patch instead first checks which registers the target actually decided to save, and then checks against this set during iteration over the target registers.

Diff Detail

Event Timeline

jonpa created this revision.May 1 2018, 7:38 AM
vivekvpandya resigned from this revision.May 1 2018, 11:28 AM

I don't understand if this change address any correctness issue. If this in response to recent test failures you discovered for SystemZ then I think those tests are failing because SystemZ::determineCalleeSaves() did not do job as per expectation. Also your proposed change uses getCalleeSavedRegs() and it adds a loop however previous change is using getCallPreservedMask() which is faster.
It is better to get this reviewed from more experience person in register allocation.

Hi Jonas,

Target needed to save and restore the return register (and therefore also the SP), but those registers were missing from the regmask on the call to that function (in other words they were "not preserved")

Usually those are marked reserved and don't need to show up in the mask. Though having them here is probably a good thing :).

Anyhow, shouldn't getCallPreservedMask be a super set of determineCalleeSaves? (A quick look at the header of TFI confirms that.)
Therefore, the current implementation should be conservatively correct.

What am I missing here?

Cheers,
-Quentin

test/CodeGen/SystemZ/ipra-04.ll
18

Nit: use opt -instnamer

jonpa added a comment.May 4 2018, 1:24 AM

Hi Jonas,

Target needed to save and restore the return register (and therefore also the SP), but those registers were missing from the regmask on the call to that function (in other words they were "not preserved")

Usually those are marked reserved and don't need to show up in the mask. Though having them here is probably a good thing :).

Anyhow, shouldn't getCallPreservedMask be a super set of determineCalleeSaves? (A quick look at the header of TFI confirms that.)
Therefore, the current implementation should be conservatively correct.

What am I missing here?

When isSafeForNoCSROpt() returns true, the unpatched implementation does not add any callee saved registers as preserved. This seems to assume the default implementation of determineCalleeSaves() which then returns an empty set. However, SystemZ overrides this and saves the return register because it has to always do that in order to use it when returning from a function.

So my understanding is that since determineCalleeSaves() can be overridden (and depends on isSafeForNoCSROpt) , that is the function should be consulted for the actual dynamic set of saved registers.

qcolombet accepted this revision.May 4 2018, 8:55 AM

Hi Jonas,

Thanks for the explanation.
I missed that the call to getCallPreservedMask was guarded by isSafeForNoCSROpt.

Given the default implementation of determineCalleeSaves is already guarded by isSafeForNoCSROpt, I think the patch makes a lot of sense.

LGTM.

Cheers,
-Quentin

lib/CodeGen/RegUsageInfoCollector.cpp
142

This logging is not very useful now. I would get rid of it.

This revision is now accepted and ready to land.May 4 2018, 8:55 AM
jonpa added a comment.May 7 2018, 1:00 AM

While looking at this and the related machine verifier patch, I am now again somewhat unsure if this is doing the right thing for any given target.

The target returns a list of registers from determineCalleeSaves that does not include any aliases. It seems that we must therefore build this set so that it includes any fully saved/restored register. I have added the subregs already, but it seems we should also check which super-regs end up as fully included, right? Should we make a new function that does this work, since we need this also in the MachineVerifier, like determineCalleeSavesWithAliases? Maybe this could be a static method in RegUsageInfoCollector?

lib/CodeGen/RegUsageInfoCollector.cpp
142

Do you want to remove this entire clause of just the dbgs() output?

I might object a little to removing this, since I have made SystemZ tests that checks for this, for instance to check that the return register is saved even on such a function:

; DBG: fun3 function optimized for not having CSR
; CHECK-LABEL: fun3
​; CHECK: stmg    %r14, %r15, 112(%r15)
​; CHECK: lr      %r14
​; CHECK: a       %r14
​; CHECK: lmg     %r14, %r15, 112(%r15)
​; CHECK: br      %r14

It also seems useful to IPRA to have the statistic around...

qcolombet requested changes to this revision.May 7 2018, 12:40 PM

Thanks for double checking. Now, I don't think some part of the patch makes sense. See my inline comment.
Basically, we should have to augment whatever determineCalleeSaved gives us, otherwise that means this information is just flawed.

The target returns a list of registers from determineCalleeSaves that does not include any aliases.

The default implementation should include the aliases (it relies on isPhysRegModified).
If we don't have the aliases in here, a lot of passes would be broken, like shrink-wrapping or PEI.

It seems that we must therefore build this set so that it includes any fully saved/restored register.

Yes, that's what it should do.

I have added the subregs already, but it seems we should also check which super-regs end up as fully included, right?

This should already happen in the default implementation of determineCalleeSaves.

Should we make a new function that does this work, since we need this also in the MachineVerifier, like determineCalleeSavesWithAliases? Maybe this could be a static method in RegUsageInfoCollector?

No, we shouldn't need that.

lib/CodeGen/RegUsageInfoCollector.cpp
120

That loop shouldn't be required. determineCalleeSaves should already mark those.

142

To me the logging is not useful anymore because there is no change of behavior in the past itself based on isSafeForNoCSROpt.
It looks to me that we are printing this because we expect determineCalleeSaved to do the right thing with it, but that's a big assumption :).

This revision now requires changes to proceed.May 7 2018, 12:40 PM
jonpa added a comment.May 11 2018, 1:24 AM

The default implementation should include the aliases (it relies on isPhysRegModified).

I am confused now:

This loop (per what determineCalleeSaves does) will only add any of the registers that getCalleeSavedRegs() returns:

const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
for (unsigned i = 0; CSRegs[i]; ++i) {
  unsigned Reg = CSRegs[i];
  if (CallsUnwindInit || MRI.isPhysRegModified(Reg))
    SavedRegs.set(Reg);
}

MRI.isPhysRegModified(Reg) checks if any aliases of Reg is modified, but then this only *inserts* Reg.

SystemZRegisterInfo::getCalleeSavedRegs basically returns CSR_SystemZ_SaveList which is:

def CSR_SystemZ : CalleeSavedRegs<(add (sequence "R%dD", 6, 15),
                                       (sequence "F%dD", 8, 15))>;

and generated as:

static const MCPhysReg CSR_SystemZ_SaveList[] = { SystemZ::R6D, SystemZ::R7D, SystemZ::R8D, SystemZ::R9D, SystemZ::R10D, SystemZ::R11D, SystemZ::R12D, SystemZ::R13D, SystemZ::R14D, SystemZ::\
R15D, SystemZ::F8D, SystemZ::F9D, SystemZ::F10D, SystemZ::F11D, SystemZ::F12D, SystemZ::F13D, SystemZ::F14D, SystemZ::F15D, 0 };

So in fact, it seems that the aliases of a callee saved register is *not* added to SavedRegs on SystemZ by the default implementation.

This works during normal operation because we simply have to save/restore any of these registers if they are clobbered, and there isn't any need to check for any aliases while doing so.

So I am still not sure what to think: Should SystemZ start to add all the aliases into CSR_SystemZ (which seems like unnecessary extra work), or else IPRA would have to do it per my previous question.

IPRA needs the set of fully saved/restored registers while building the regmask of clobbered registers for the whole function. I thought this is the only user of determineCalleeSaves that needs this, and that's why I suggested that this would be done inside the IPRA pass.

qcolombet accepted this revision.May 11 2018, 12:31 PM

This loop (per what determineCalleeSaves does) will only add any of the registers that getCalleeSavedRegs() returns

You're right I thought it was doing it for all the registers. Sorry for bringing confusion here.

So I am still not sure what to think: Should SystemZ start to add all the aliases into CSR_SystemZ (which seems like unnecessary extra work), or else IPRA would have to do it per my previous question.

After actually checking the users (my recollections are not as good as I would like :P), no, we should definitely not start to put the aliases in that list.

IPRA needs the set of fully saved/restored registers while building the regmask of clobbered registers for the whole function. I thought this is the only user of determineCalleeSaves that needs this, and that's why I suggested that this would be done inside the IPRA pass.

I agree.

Going back to the previous questions now.

I have added the subregs already, but it seems we should also check which super-regs end up as fully included, right?

So yes, but I don't think we have an easy wait to do that automatically. Now, I understand we would need that to make the checks accurate in the verifier, but I am afraid that if that is the only user we should have a mechanism to opt-in the targets that supply the proper implementation.

Anyhow, the patch itself, sounds legit again :).

This revision is now accepted and ready to land.May 11 2018, 12:31 PM
I have added the subregs already, but it seems we should also check which super-regs end up as fully included, right?

So yes, but I don't think we have an easy wait to do that automatically. Now, I understand we would need that to make the checks accurate in the verifier, but I am afraid that if that is the only user we should have a mechanism to opt-in the targets that supply the proper implementation.

Anyhow, the patch itself, sounds legit again :).

I think also RegUsageInfoCollector would benefit from this - let's say that the MF uses %r10Q. SystemZ will never save %r10Q, but %r10D and %r11D, so I think we must simply do a second iteration over the registers and check for any reg that is not marked as saved if all its subregs are saved. Why is that not easy? Are there other cases (targets) where the subreg structure modeling would make this wrong?

I was thinking about this the other way around: if all the subreg of a register are saved that doesn’t imply the register itself is saved.
The way you describe with adding the sub regs works great. I thought you wanted the other way :P

Just reread what you said, you indeed want the other way.
The problem is that the big register is not necessarily covered by the sub regs. So you have to check for that.

E.g., on x86 saving xmm0 is not enough to say that ymm0 is preservered even if ymm0 doesn’t have any other sub reg.

The problem is that the big register is not necessarily covered by the sub regs. So you have to check for that.
E.g., on x86 saving xmm0 is not enough to say that ymm0 is preservered even if ymm0 doesn’t have any other sub reg.

Ah, I see. Would it work to instead use RegUnits, so that we first collect all saved/restored RegUnits, and then check which registers have all their RegUnits saved?

Would it work to instead use RegUnits, so that we first collect all saved/restored RegUnits, and then check which registers have all their RegUnits saved?

No, unless you can prove that the units cover the whole register. For instance, there is no unit covering the high part of ymm0. So we would think it is okay to make it preserved when only the known unit is preserved whereas it is not.

That said, Krzysztof added the support to describe such "anonymous" units in r328016, so if we can prove that all the registers are properly covered, we can do that. An alternative of proving that, would be to have the target explicitly say whether or not their register units covers the full registers.

An alternative of proving that, would be to have the target explicitly say whether or not their register units covers the full registers.

Doesn't the target already do this? At least there's a CoveredBySubRegs flag in the Register .td class ...

Doesn't the target already do this? At least there's a CoveredBySubRegs flag in the Register .td class ...

Indeed! Forgot about that one too!

jonpa updated this revision to Diff 148406.May 24 2018, 7:22 AM

Patch updated per previous discussion.

  • New static method computeCalleeSavedRegs() that does the work of adding subregs and fully saved superregs, that we can reuse in e.g. MachineVerifier.
  • When encountering a def of a reg, only call SetRegAsDefined after checking that *AI was not saved.
  • There is no CoveredBySubRegs flag on the register, but only on the register class as far as I can see. Since getMinimalPhysRegClass() iterates over all register classes, I thought this method might as well do the same.
jonpa requested review of this revision.May 24 2018, 7:22 AM
jonpa added inline comments.
lib/CodeGen/RegUsageInfoCollector.cpp
142

Unless you object, I'll keep it there, since I think this should be reported in the logs at least somewhere (and not by every backend).

LGTM.

Up to you for the logging. We can keep it.

There's one nit in the test case you didn't address about getting rid of the implicit variables + one blocking problem: the test requires asserts.

test/CodeGen/SystemZ/ipra-04.ll
8

You'll need an assert build if you want to look at the log.

I.e., REQUIRE: asserts (or something like that :P).

qcolombet accepted this revision.May 24 2018, 11:22 AM

With the test case fixed

This revision is now accepted and ready to land.May 24 2018, 11:22 AM
jonpa closed this revision.May 25 2018, 1:47 AM
jonpa marked 2 inline comments as done.

Thanks for review.
r333261.

MatzeB added inline comments.Jul 13 2018, 11:30 AM
lib/CodeGen/RegUsageInfoCollector.cpp
165

This needs to use MachineRegisterInfo::getCalleeSavedRegs(), not TargetRegisterInfo::getCalleeSavedRegs().