Page MenuHomePhabricator

[regalloc] Make RegMask clobbers prevent merging vreg's into PhysRegs when hoisting def's upwards.

Authored by dsanders on Jul 30 2015, 8:23 AM.



This prevents vreg260 and D7 from being merged in:

%vreg260<def> = LDC1 ...
JAL <ga:@sin>, <regmask ... list not containing D7 ...>
%D7<def> = COPY %vreg260; ...

This fixes the almabench regression in the LLVM 3.7.0 release branch.

Diff Detail

Event Timeline

dsanders updated this revision to Diff 31030.Jul 30 2015, 8:23 AM
dsanders retitled this revision from to [regalloc] Make RegMask clobbers prevent merging vreg's into PhysRegs when hoisting def's upwards..
dsanders updated this object.
dsanders added subscribers: llvm-commits, hans, qcolombet.

One worrying discovery I made while debugging this is that MachineInstr::definesRegister(Mips::D7) doesn't return true for the JAL instruction mentioned in the summary. It seems it doesn't consider regmasks to be defs of the clobbered registers. Is this intentional?

Hi Daniel,

Could you elaborate on what you are seeing?

In particular, is your problem the fact that the copy with vreg260 may not be coalesced or is it something else?



This check should have been handled by the loop starting line 1497, IIRC.
I.e., this shouldn’t be necessary.

Never mind, by regarding again the description I’ve got it.

So now, the question is why this case is not caught by the loop starting line 1497. vreg260 should have interfered with a reg unit of D7.

Could you double check?


MatzeB accepted this revision.Jul 30 2015, 11:05 AM
MatzeB added a reviewer: MatzeB.

The patch LGTM (maybe remove the complaints about definesRegister() from the comment).


From what I can see we do not create mini liverange segments for clobbers. This is both a bit scary but also understandable from a performance point of view as I imagine creating countless small segments would increase the LiveIntervals quite a bit.

The register allocators (the LiveRegMatrix class) has a special code paths to deal with regmasks.

That definesRegister() doesn't account for regmasks is understandable as the register isn't defined to something sensible, but there is certainly a case to make for MachineInstr::killsRegister() to check for register masks.

This revision is now accepted and ready to land.Jul 30 2015, 11:05 AM

That's right, the vreg260 and D7 can't be coalesced because the intervening JAL instruction clobbers D7 (it's not in the call-preserved regmask). Coalescing them would hoist the def of D7 (for the copy) above another def of D7 (for the calls clobber).

The relevant snippet of the original test is (almabench.c) is:

for (k = 8; k <= 9; ++k) {
    argl = kq[np][k] * dmu;
    dl   = dl + t * ( cl[np][k] * cos(argl) + sl[np][k] * sin(argl) ) * 0.0000001;

dl = fmod(dl,TWOPI);

Occasionally the TWOPI constant is zero instead of 2*PI. What's happening is the ldc1 instruction (load to FPU register) that loads the TWOPI constant into $f14 (known as D7 internally and is the second FPU argument register) is being moved by the coalescer from just before the fmod call, to a point just above three sin/cos calls from the unrolled loop. However, $f14 is call-clobbered and sometimes the sin/cos calls clobber the value.


I agree that that loop should have caught this but it doesn't. I haven't looked into why that loop doesn't catch it yet but if the information is based on MachineInstr::definesRegister() then it could be that that doesn't seem to account for regmasks.

I also tried:

if (MI->definesRegister(DstReg, TRI)) {
  DEBUG(dbgs() << "\t\tInterference (read): " << *MI);
  return false;

in this loop but that doesn't catch it either.

qcolombet added inline comments.Jul 30 2015, 1:05 PM

I guess Matthias is right about the small segments.

Then, I would suggest a slightly different API to use.
Use the analyzePhysReg on the MachineOperand iterator.
This will tell you if the physreg is read, write, or clobbered, and you can get rid of the previous loop to do the interference check.

dsanders closed this revision.Jul 31 2015, 5:59 AM

I've used the earlier LGTM to commit so that I have something to merge to the release branch. I will of course follow up on the comment I received after the LGTM.

I fixed a silly mistake before the commit. The inner if had two statements and no '{ }' so the return false was always executed. The mistake showed up as failures in four PowerPC tests.

Ok to merge to the branch?


I've looked at that function and I think it will do the right thing. I'll post another patch to make this change shortly.

dsanders added inline comments.Jul 31 2015, 6:24 AM

While making the change, I noticed that I can't remove the loop on line 1497. Isn't it needed for the if-statement on line 1510?

If I understand isFlipped() correctly, the true path is trying to account for:

%B = COPY %A
%A = COPY %B

We still need to check for defines of %A in the '...' region.

Also, suppose there's a JAL that clobbers A between them. In this case %A can't be coalesced with %A because that would move the second %A def above the clobber.

Shouldn't we be moving the check for regmask clobbers up into the loop on line 1497 somehow?

dsanders added inline comments.Jul 31 2015, 10:04 AM

Sorry, I was not referring to the loop on line 1497, I was referring to the part of the loop that
checks the uses, i.e., loop line 1527, if statement line 1530.

What I meant is this check can be replaced by looking at the read property for the
analyzePhysReg that you’re going to call for the clobber.

I realize I was not clear at all.

Apologize for the confusion.


Ah that makes sense, sorry for misunderstanding. I'll post a patch for that soon.


Because we're in rc2 now and the rules are stricter: Hans: Is this ok with you too?