Page MenuHomePhabricator

Fix regalloc assignment of overlapping registers
ClosedPublic

Authored by rampitec on Jan 24 2017, 4:59 PM.

Details

Summary

SplitEditor::defFromParent() can create a register copy. If register is a tuple of other registers and not all lanes are used a copy will be done on a full tuple regardless. Later register unit for an unused lane will be considered free and another overlapping register tuple can be assigned to a different value even though first register is live at that point. That is because interference only look at liveness info, while full register copy clobbers all lanes, even unused.

This patch fixes copy to only cover used lanes.

This is how it happens in the app I was debugging:

Before Virtual Register Rewriter:

187344B         %vreg16749:sub0<def,read-undef> = V_ADD_I32_e32 %vreg12357, %vreg16754, %VCC<imp-def>, %EXEC<imp-use>; VReg_64:%vreg16749 SReg_32_XM0:%vreg12357 VGPR_32:%vreg16754
187360B         %vreg16749:sub1<def> = V_ADDC_U32_e32 %vreg16742, %vreg15898, %VCC<imp-def,dead>, %VCC<imp-use>, %EXEC<imp-use>; VReg_64:%vreg16749 VGPR_32:%vreg16742,%vreg15898
...
197648B         %vreg21888<def> = COPY %vreg21887; VReg_64:%vreg21888,%vreg21887
...
197988B         %vreg12551<def> = FLAT_LOAD_DWORDX2 %vreg16749, 0, 0, 0, %EXEC<imp-use>, %FLAT_SCR<imp-use>; mem:LD8[%arrayidx112.i18013006(addrspace=2)](tbaa=!11) VReg_64:%vr

After Virtual Register Rewriter:

187344B         %VGPR119<def> = V_ADD_I32_e32 %SGPR3, %VGPR1<kill>, %VCC<imp-def>, %EXEC<imp-use>
187360B         %VGPR120<def> = V_ADDC_U32_e32 %VGPR8<kill>, %VGPR19, %VCC<imp-def,dead>, %VCC<imp-use>, %EXEC<imp-use>
...
197648B         %VGPR118_VGPR119<def> = COPY %VGPR33_VGPR34
...
197988B         %VGPR52_VGPR53<def> = FLAT_LOAD_DWORDX2 %VGPR119_VGPR120, 0, 0, 0, %EXEC<imp-use>, %FLAT_SCR<imp-use>; mem:LD8[%arrayidx112.i18013006(addrspace=2)](tbaa=!11)

The RA debug log excerpt:

selectOrSplit VReg_64:%vreg21888 [197648r,200388r:0)  0@197648r L00000001 [197648r,200388r:0)  0@197648r w=4.824841e-04
assigning %vreg21888 to %VGPR118_VGPR119: VGPR118 [197648r,200388r:0)  0@197648r

selectOrSplit VReg_64:%vreg16749 [187344r,187360r:0)[187360r,197988r:1)  0@187344r 1@187360r L00000001 [187344r,197988r:0)  0@187344r L00000002 [187360r,197988r:0)  0@187360r
assigning %vreg16749 to %VGPR119_VGPR120: VGPR119 [187344r,197988r:0)  0@187344r VGPR120 [187360r,197988r:0)  0@187360r

[%vreg16749 -> %VGPR119_VGPR120] VReg_64
[%vreg21888 -> %VGPR118_VGPR119] VReg_64

One can see that live intervals for vreg21888 and vreg16749 do overlap, but only lane 0 of %vreg21888 is used, so VGPR119 considered free. This allows rewriter to assign pair of registers 119~120 to vreg16749. Then VGPR119 is clobbered by the %VGPR118_VGPR119<def> = COPY %VGPR33_VGPR34.

After the fix the copy will become %vreg21888:sub0<def, read-undef> = COPY %vreg21887:sub0.

I’m struggling to create a small and robust testcase so far. The original testcase is more than 12000 lines or IR and MIR does not give a reproducible result. If/when I have a smaller and better testcase I will add it.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.Jan 24 2017, 4:59 PM
alex-t added inline comments.Jan 25 2017, 7:11 AM
lib/CodeGen/SplitKit.cpp
535

What if in parent LiveInterval there are more then one subrange (i.e. several not adjacent lanes are in use) ? Let's say we copy sub0 and sub2.
LaneMask will be 1010 and the condition "TRI.getSubRegIndexLaneMask(I) == LM" will never met.
Given the assert below this, why this never expected to happen?

alex-t added inline comments.Jan 25 2017, 7:15 AM
lib/CodeGen/SplitKit.cpp
535

Oops... forget it ) I missed that getSubRegIndexLaneMask and setSubreg treat the parameter as a plane integer...

rampitec marked 2 inline comments as done.Jan 25 2017, 9:13 AM
rampitec added inline comments.
lib/CodeGen/SplitKit.cpp
535

It looks like it does not happen, but in case if lane combination is not supported by target here is the assert. To me it is better to see an assert rather than silently clobber register.

alex-t accepted this revision.Jan 26 2017, 11:18 AM
This revision is now accepted and ready to land.Jan 26 2017, 11:18 AM
MatzeB requested changes to this revision.Jan 26 2017, 11:23 AM
MatzeB added inline comments.
lib/CodeGen/SplitKit.cpp
535

This can definitely happen. And we probably need a better scheme to deal with that. At the very least you should make it a report_fatal_error() so even release compilers abort instead of producing invalid code!

This revision now requires changes to proceed.Jan 26 2017, 11:23 AM
MatzeB added inline comments.Jan 26 2017, 11:25 AM
lib/CodeGen/SplitKit.cpp
535

Doesn't this also fail if a full register is copied (the parts that do not have all lanes alive could be somewhere else in the program).

rampitec marked 3 inline comments as done.Jan 26 2017, 1:00 PM
rampitec added inline comments.
lib/CodeGen/SplitKit.cpp
535

If all lanes are used that is recorded in the liveness, then the full register is copied. That is correct. If some lanes are unused, then we are looking for a subreg to cover only used lanes, and if found that is correct again. The only problem is as discussed if such a subreg cannot be found. Too bad, I cannot insert more than copy here, neither extend liveness here to cover all lanes if we fail to find a subreg index.

I.e. from correctness point of view it seems to be fine, with exception of subreg index absence of course. I will insert fatal error here.

535

Agree, will insert fatal error and update the review.

rampitec updated this revision to Diff 85955.Jan 26 2017, 1:11 PM
rampitec edited edge metadata.
rampitec marked an inline comment as done.

Replaced assert with report_fatal_error.

rampitec updated this revision to Diff 85956.Jan 26 2017, 1:12 PM

Fixed typo in condition.

The conditions to come into this situation are quite complicated. Unfortunately I lost a hope to create a non fragile and non gigantic test for it, neither I found any existing lit tests touching this condition.

Given that AMDGPU backend is the only one tracking lane liveness, I'd like this change to be accepted now.
LGTM

MatzeB accepted this revision.Jan 31 2017, 4:46 PM

Given that AMDGPU backend is the only one tracking lane liveness, I'd like this change to be accepted now.
LGTM

Hexagon is using subreg liveness as well and there is at least 1 more out-of-tree target using it.

This patch feels incomplete, it should just be a question of time until you run into the report_fatal_error() condition, but it's better than the status quo, so go ahead and push it there is not better solution.

This revision is now accepted and ready to land.Jan 31 2017, 4:46 PM

Given that AMDGPU backend is the only one tracking lane liveness, I'd like this change to be accepted now.
LGTM

Hexagon is using subreg liveness as well and there is at least 1 more out-of-tree target using it.

This patch feels incomplete, it should just be a question of time until you run into the report_fatal_error() condition, but it's better than the status quo, so go ahead and push it there is not better solution.

I agree, it is incomplete but better than what we have now. At least we will see this fatal error. It took me 2 months to catch this error in a huge program without any diagnostics from the compiler. Next time we will see it right away.

This revision was automatically updated to reflect the committed changes.
MatzeB added a comment.Feb 6 2017, 6:25 PM

This commit broke an internal target for us as it can choose subregister indexes which are not compatible with a given register class. Maybe we should go for the simpler LI->clearSubRanges() instead of the whole subreg aware copying for now even though that may pessimize the code.

This commit broke an internal target for us as it can choose subregister indexes which are not compatible with a given register class. Maybe we should go for the simpler LI->clearSubRanges() instead of the whole subreg aware copying for now even though that may pessimize the code.

It's been broken before, but now we have indication, right?
I have tried to use LI->clearSubRanges() either as an alone solution or as a backup if subreg search fails. It does not work, later on we hit assertion that a new subrange (because by clearing ranges we are creating new in fact) was not found in parent LI...