This prevents vreg260 and D7 from being merged in:
%vreg260<def> = LDC1 ... JAL <ga:@sin>, <regmask ... list not containing D7 ...> %D7<def> = COPY %vreg260; ...
This fixes the almabench regression in the LLVM 3.7.0 release branch.
Differential D11649
[regalloc] Make RegMask clobbers prevent merging vreg's into PhysRegs when hoisting def's upwards. dsanders on Jul 30 2015, 8:23 AM. Authored by
Details This prevents vreg260 and D7 from being merged in: %vreg260<def> = LDC1 ... JAL <ga:@sin>, <regmask ... list not containing D7 ...> %D7<def> = COPY %vreg260; ... This fixes the almabench regression in the LLVM 3.7.0 release branch.
Diff Detail Event TimelineComment Actions One worrying discovery I made while debugging this is that MachineInstr::definesRegister(Mips::D7) doesn't return true for the JAL instruction mentioned in the summary. It seems it doesn't consider regmasks to be defs of the clobbered registers. Is this intentional? Comment Actions Hi Daniel, Could you elaborate on what you are seeing? In particular, is your problem the fact that the copy with vreg260 may not be coalesced or is it something else? Thanks,
Comment Actions Never mind, by regarding again the description I’ve got it. So now, the question is why this case is not caught by the loop starting line 1497. vreg260 should have interfered with a reg unit of D7. Could you double check? Thanks, Comment Actions The patch LGTM (maybe remove the complaints about definesRegister() from the comment).
Comment Actions That's right, the vreg260 and D7 can't be coalesced because the intervening JAL instruction clobbers D7 (it's not in the call-preserved regmask). Coalescing them would hoist the def of D7 (for the copy) above another def of D7 (for the calls clobber). The relevant snippet of the original test is (almabench.c) is: for (k = 8; k <= 9; ++k) { argl = kq[np][k] * dmu; dl = dl + t * ( cl[np][k] * cos(argl) + sl[np][k] * sin(argl) ) * 0.0000001; } dl = fmod(dl,TWOPI); Occasionally the TWOPI constant is zero instead of 2*PI. What's happening is the ldc1 instruction (load to FPU register) that loads the TWOPI constant into $f14 (known as D7 internally and is the second FPU argument register) is being moved by the coalescer from just before the fmod call, to a point just above three sin/cos calls from the unrolled loop. However, $f14 is call-clobbered and sometimes the sin/cos calls clobber the value.
Comment Actions I've used the earlier LGTM to commit so that I have something to merge to the release branch. I will of course follow up on the comment I received after the LGTM. I fixed a silly mistake before the commit. The inner if had two statements and no '{ }' so the return false was always executed. The mistake showed up as failures in four PowerPC tests. Ok to merge to the branch?
Comment Actions Thanks. Because we're in rc2 now and the rules are stricter: Hans: Is this ok with you too? |
This check should have been handled by the loop starting line 1497, IIRC.
I.e., this shouldn’t be necessary.