This is an archive of the discontinued LLVM Phabricator instance.

[regalloc] Make RegMask clobbers prevent merging vreg's into PhysRegs when hoisting def's upwards.
ClosedPublic

Authored by dsanders on Jul 30 2015, 8:23 AM.

Download Raw Diff

Details

Reviewers

Commits

rG6bd026f977da: Merging r243745: --------------------------------------------------------------…
rGe82f2947fd45: [regalloc] Make RegMask clobbers prevent merging vreg's into PhysRegs when…
rL243745: [regalloc] Make RegMask clobbers prevent merging vreg's into PhysRegs when…

Summary

This prevents vreg260 and D7 from being merged in:

%vreg260<def> = LDC1 ...
JAL <ga:@sin>, <regmask ... list not containing D7 ...>
%D7<def> = COPY %vreg260; ...

This fixes the almabench regression in the LLVM 3.7.0 release branch.

Diff Detail

Event Timeline

dsanders updated this revision to Diff 31030.Jul 30 2015, 8:23 AM

dsanders retitled this revision from to [regalloc] Make RegMask clobbers prevent merging vreg's into PhysRegs when hoisting def's upwards..

dsanders updated this object.

dsanders added subscribers: llvm-commits, hans, qcolombet.

Herald added a subscriber: MatzeB. · View Herald TranscriptJul 30 2015, 8:23 AM

One worrying discovery I made while debugging this is that MachineInstr::definesRegister(Mips::D7) doesn't return true for the JAL instruction mentioned in the summary. It seems it doesn't consider regmasks to be defs of the clobbered registers. Is this intentional?

Hi Daniel,

Could you elaborate on what you are seeing?

In particular, is your problem the fact that the copy with vreg260 may not be coalesced or is it something else?

Thanks,
-Quentin

lib/CodeGen/RegisterCoalescer.cpp
1541	This check should have been handled by the loop starting line 1497, IIRC. I.e., this shouldn’t be necessary.

Never mind, by regarding again the description I’ve got it.

So now, the question is why this case is not caught by the loop starting line 1497. vreg260 should have interfered with a reg unit of D7.

Could you double check?

Thanks,
-Quentin

The patch LGTM (maybe remove the complaints about definesRegister() from the comment).

lib/CodeGen/RegisterCoalescer.cpp
1541	From what I can see we do not create mini liverange segments for clobbers. This is both a bit scary but also understandable from a performance point of view as I imagine creating countless small segments would increase the LiveIntervals quite a bit. The register allocators (the LiveRegMatrix class) has a special code paths to deal with regmasks. That definesRegister() doesn't account for regmasks is understandable as the register isn't defined to something sensible, but there is certainly a case to make for MachineInstr::killsRegister() to check for register masks.

This revision is now accepted and ready to land.Jul 30 2015, 11:05 AM

That's right, the vreg260 and D7 can't be coalesced because the intervening JAL instruction clobbers D7 (it's not in the call-preserved regmask). Coalescing them would hoist the def of D7 (for the copy) above another def of D7 (for the calls clobber).

The relevant snippet of the original test is (almabench.c) is:

for (k = 8; k <= 9; ++k) {
    argl = kq[np][k] * dmu;
    dl   = dl + t * ( cl[np][k] * cos(argl) + sl[np][k] * sin(argl) ) * 0.0000001;
}

dl = fmod(dl,TWOPI);

Occasionally the TWOPI constant is zero instead of 2*PI. What's happening is the ldc1 instruction (load to FPU register) that loads the TWOPI constant into $f14 (known as D7 internally and is the second FPU argument register) is being moved by the coalescer from just before the fmod call, to a point just above three sin/cos calls from the unrolled loop. However, $f14 is call-clobbered and sometimes the sin/cos calls clobber the value.

lib/CodeGen/RegisterCoalescer.cpp
1541	I agree that that loop should have caught this but it doesn't. I haven't looked into why that loop doesn't catch it yet but if the information is based on MachineInstr::definesRegister() then it could be that that doesn't seem to account for regmasks. I also tried: if (MI->definesRegister(DstReg, TRI)) { DEBUG(dbgs() << "\t\tInterference (read): " << *MI); return false; } in this loop but that doesn't catch it either.

qcolombet added inline comments.Jul 30 2015, 1:05 PM

lib/CodeGen/RegisterCoalescer.cpp
1541	I guess Matthias is right about the small segments. Then, I would suggest a slightly different API to use. Use the analyzePhysReg on the MachineOperand iterator. This will tell you if the physreg is read, write, or clobbered, and you can get rid of the previous loop to do the interference check.

dsanders closed this revision.Jul 31 2015, 5:59 AM

I've used the earlier LGTM to commit so that I have something to merge to the release branch. I will of course follow up on the comment I received after the LGTM.

I fixed a silly mistake before the commit. The inner if had two statements and no '{ }' so the return false was always executed. The mistake showed up as failures in four PowerPC tests.

Ok to merge to the branch?

lib/CodeGen/RegisterCoalescer.cpp
1541	I've looked at that function and I think it will do the right thing. I'll post another patch to make this change shortly.

dsanders added inline comments.Jul 31 2015, 6:24 AM

lib/CodeGen/RegisterCoalescer.cpp
1541	While making the change, I noticed that I can't remove the loop on line 1497. Isn't it needed for the if-statement on line 1510? If I understand isFlipped() correctly, the true path is trying to account for: %B = COPY %A ... %A = COPY %B We still need to check for defines of %A in the '...' region. Also, suppose there's a JAL that clobbers A between them. In this case %A can't be coalesced with %A because that would move the second %A def above the clobber. Shouldn't we be moving the check for regmask clobbers up into the loop on line 1497 somehow?

Hi Daniel,

dsanders added inline comments.Jul 31 2015, 10:04 AM

lib/CodeGen/RegisterCoalescer.cpp
1541	Sorry, I was not referring to the loop on line 1497, I was referring to the part of the loop that checks the uses, i.e., loop line 1527, if statement line 1530. What I meant is this check can be replaced by looking at the read property for the analyzePhysReg that you’re going to call for the clobber. I realize I was not clear at all. Apologize for the confusion. Thanks, Q. Ah that makes sense, sorry for misunderstanding. I'll post a patch for that soon.

Thanks.

Because we're in rc2 now and the rules are stricter: Hans: Is this ok with you too?

Revision Contents

Path

Size

lib/

CodeGen/

RegisterCoalescer.cpp

8 lines

Diff 31030

lib/CodeGen/RegisterCoalescer.cpp

Show First 20 Lines • Show All 1,525 Lines • ▼ Show 20 Lines	if (CP.isFlipped()) {
SlotIndexes *Indexes = LIS->getSlotIndexes();		SlotIndexes *Indexes = LIS->getSlotIndexes();
for (SlotIndex SI = Indexes->getNextNonNullIndex(DestRegIdx);		for (SlotIndex SI = Indexes->getNextNonNullIndex(DestRegIdx);
SI != CopyRegIdx; SI = Indexes->getNextNonNullIndex(SI)) {		SI != CopyRegIdx; SI = Indexes->getNextNonNullIndex(SI)) {
MachineInstr *MI = LIS->getInstructionFromIndex(SI);		MachineInstr *MI = LIS->getInstructionFromIndex(SI);
if (MI->readsRegister(DstReg, TRI)) {		if (MI->readsRegister(DstReg, TRI)) {
DEBUG(dbgs() << "\t\tInterference (read): " << *MI);		DEBUG(dbgs() << "\t\tInterference (read): " << *MI);
return false;		return false;
}		}

		// We must also check for clobbers caused by regmasks. We can't use
		// definesRegister() since it doesn't account for regmasks.
		for (const auto &MO : MI->operands()) {
		if (MO.isRegMask() && MO.clobbersPhysReg(DstReg))
		DEBUG(dbgs() << "\t\tInterference (regmask clobber): " << *MI);
		return false;
		}
		qcolombetUnsubmitted Not Done Reply Inline Actions This check should have been handled by the loop starting line 1497, IIRC. I.e., this shouldn’t be necessary. qcolombet: This check should have been handled by the loop starting line 1497, IIRC. I.e., this shouldn’t…
		dsandersAuthorUnsubmitted Not Done Reply Inline Actions I agree that that loop should have caught this but it doesn't. I haven't looked into why that loop doesn't catch it yet but if the information is based on MachineInstr::definesRegister() then it could be that that doesn't seem to account for regmasks. I also tried: if (MI->definesRegister(DstReg, TRI)) { DEBUG(dbgs() << "\t\tInterference (read): " << MI); return false; } in this loop but that doesn't catch it either. dsanders:* I agree that that loop should have caught this but it doesn't. I haven't looked into why that…
		MatzeBUnsubmitted Not Done Reply Inline Actions From what I can see we do not create mini liverange segments for clobbers. This is both a bit scary but also understandable from a performance point of view as I imagine creating countless small segments would increase the LiveIntervals quite a bit. The register allocators (the LiveRegMatrix class) has a special code paths to deal with regmasks. That definesRegister() doesn't account for regmasks is understandable as the register isn't defined to something sensible, but there is certainly a case to make for MachineInstr::killsRegister() to check for register masks. MatzeB: From what I can see we do not create mini liverange segments for clobbers. This is both a bit…
		qcolombetUnsubmitted Not Done Reply Inline Actions I guess Matthias is right about the small segments. Then, I would suggest a slightly different API to use. Use the analyzePhysReg on the MachineOperand iterator. This will tell you if the physreg is read, write, or clobbered, and you can get rid of the previous loop to do the interference check. qcolombet: I guess Matthias is right about the small segments. Then, I would suggest a slightly different…
		dsandersAuthorUnsubmitted Not Done Reply Inline Actions I've looked at that function and I think it will do the right thing. I'll post another patch to make this change shortly. dsanders: I've looked at that function and I think it will do the right thing. I'll post another patch to…
		dsandersAuthorUnsubmitted Not Done Reply Inline Actions While making the change, I noticed that I can't remove the loop on line 1497. Isn't it needed for the if-statement on line 1510? If I understand isFlipped() correctly, the true path is trying to account for: %B = COPY %A ... %A = COPY %B We still need to check for defines of %A in the '...' region. Also, suppose there's a JAL that clobbers A between them. In this case %A can't be coalesced with %A because that would move the second %A def above the clobber. Shouldn't we be moving the check for regmask clobbers up into the loop on line 1497 somehow? dsanders: While making the change, I noticed that I can't remove the loop on line 1497. Isn't it needed…
		dsandersAuthorUnsubmitted Not Done Reply Inline Actions Sorry, I was not referring to the loop on line 1497, I was referring to the part of the loop that checks the uses, i.e., loop line 1527, if statement line 1530. What I meant is this check can be replaced by looking at the read property for the analyzePhysReg that you’re going to call for the clobber. I realize I was not clear at all. Apologize for the confusion. Thanks, Q. Ah that makes sense, sorry for misunderstanding. I'll post a patch for that soon. dsanders: > Sorry, I was not referring to the loop on line 1497, I was referring to the part of the loop…
}		}

// We're going to remove the copy which defines a physical reserved		// We're going to remove the copy which defines a physical reserved
// register, so remove its valno, etc.		// register, so remove its valno, etc.
DEBUG(dbgs() << "\t\tRemoving phys reg def of " << DstReg << " at "		DEBUG(dbgs() << "\t\tRemoving phys reg def of " << DstReg << " at "
<< CopyRegIdx << "\n");		<< CopyRegIdx << "\n");

LIS->removePhysRegDefAt(DstReg, CopyRegIdx);		LIS->removePhysRegDefAt(DstReg, CopyRegIdx);
▲ Show 20 Lines • Show All 1,462 Lines • Show Last 20 Lines