This is an alternative to D48032.

# Details

- Reviewers
tpr MatzeB - Commits
- rL335472: Improve handling of COPY instructions with identical value numbers

rG36b816f81409: Improve handling of COPY instructions with identical value numbers

rL334594: Improve handling of COPY instructions with identical value numbers

rG4581f37e7c36: Improve handling of COPY instructions with identical value numbers

# Diff Detail

- Repository
- rL LLVM

### Event Timeline

Yes, that works for me thanks on my larger test case.

lib/CodeGen/RegisterCoalescer.cpp | ||
---|---|---|

2488 | Did you mean to leave this comment from my change in? |

This testcase is exposing some preexisting problem in the coalescer. I have reverted this patch until I figure out what's wrong.

With this, I'm getting a different assert "CopyMI input register not live" in one test case. I'll narrow it down and check I can reproduce it on upstream LLVM with your fix (as opposed to our local branch with your fix).

I can fix that assert on my testcase with this addition to your fix:

--- a/lib/CodeGen/RegisterCoalescer.cpp +++ b/lib/CodeGen/RegisterCoalescer.cpp @@ -1653,6 +1653,10 @@ bool RegisterCoalescer::joinCopy(MachineInstr *CopyMI, bool &Again) { deleteInstr(CopyMI); return false; // Not coalescable. } + if (CopyMI->getOpcode() == TargetOpcode::IMPLICIT_DEF) { + // Not coalesceable; eliminateUndefCopy turned it into implicit_def. + return false; + } // Coalesced copies are normally removed immediately, but transformations // like removeCopyByCommutingDef() can inadvertently create identity copies.

I will go and test that more thoroughly.

I'm getting another failure somewhere else in our test suite with that. I need to analyze it further.

lib/CodeGen/RegisterCoalescer.cpp | ||
---|---|---|

2869 | This is the cause of my second failure. In my test, because of some weird branching around in the control flow, there is another value in between Id and Q.endPoint(), and this code is illegally extending segment Id over the top of the other value. I guess what it really needs is to add new segments as necessary so Id->valno is live on all paths from OtherDef to Def. Is there some ready made code to do that somewhere? |

Could you add "LIS->print(dbgs())" at the beginning of joinCopy and post the output with -debug-only=regalloc? You can remove the unnecessary parts from the output. For the copy that triggers the problem, I'd like to see all instructions using the registers from the copy and the debugging output from the coalescer.

Sorry about the lack of testcase. I need to get clearance to post it as it is derived from a third party app.

I'll gather together the useful bits of the debug log.

********** INTERVALS ********** %67 [3184r,3216r:0) 0@3184r weight:0.000000e+00 %72 [384B,544r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0) 0@2864B-phi 1@3072r 2@2576r L00000004 [384B,544r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0) 0@2864B-phi 1@3072r 2@2576r L00000002 [384B,528r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0) 0@2864B-phi 1@3072r 2@2576r L00000001 [384B,528r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0) 0@2864B-phi 1@3072r 2@2576r L00000008 [2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3184r:0) 0@2864B-phi 1@3072r 2@2576r weight:0.000000e+00 %86 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L00000004 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L0000000B [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi weight:0.000000e+00 ********** MACHINEINSTRS ********** # Machine code for function _amdgpu_cs_main: NoPHIs, TracksLiveness 2864B bb.66.Flow: ; predecessors: %bb.63, %bb.67 successors: %bb.68(0x80000000); %bb.68(100.00%) 2896B S_BRANCH %bb.68 2912B bb.67 (%ir-block.152): ; predecessors: %bb.64, %bb.65 successors: %bb.66(0x80000000); %bb.66(100.00%) 2928B $exec = S_OR_B64 $exec, %5:sreg_64, implicit-def $scc 2944B %8:vreg_1 = COPY %89:vreg_1 2960B %40:sreg_64_xexec = V_CMP_NE_U32_e64 0, %8:vreg_1, implicit $exec 2976B %39:vgpr_32 = V_CNDMASK_B32_e64 0, 1, %40:sreg_64_xexec, implicit $exec 2992B %82:vgpr_32 = V_MOV_B32_e32 0, implicit $exec 3008B undef %81.sub0:vreg_128 = COPY %82:vgpr_32 3024B %81.sub1:vreg_128 = COPY %82:vgpr_32 3040B %81.sub2:vreg_128 = COPY %39:vgpr_32 3056B %73:vreg_128 = COPY %81:vreg_128 3072B %72:vreg_128 = COPY %73:vreg_128 3088B S_BRANCH %bb.66 3104B bb.68 (%ir-block.156): ; predecessors: %bb.66 successors: %bb.4(0x04000000), %bb.1(0x7c000000); %bb.4(3.12%), %bb.1(96.88%) 3120B %64:vgpr_32 = V_ADD_I32_e32 32, %85:vgpr_32, implicit-def dead $vcc, implicit $exec 3136B V_CMP_EQ_U32_e32 0, %64:vgpr_32, implicit-def $vcc, implicit $exec 3152B %63:vgpr_32 = COPY %64:vgpr_32 3168B $vcc = S_AND_B64 $exec, killed $vcc, implicit-def dead $scc 3184B %67:vreg_128 = COPY %72:vreg_128 3200B %85:vgpr_32 = COPY %63:vgpr_32 3216B %86:vreg_128 = COPY %67:vreg_128 3232B S_CBRANCH_VCCNZ %bb.4, implicit killed $vcc 3248B S_BRANCH %bb.1 # End machine code for function _amdgpu_cs_main. 2576B %72:vreg_128 = COPY %86:vreg_128 Considering merging to VReg_128 with %72 in %86 RHS = %72 [384B,544r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0) 0@2864B-phi 1@3072r 2@2576r L00000004 [384B,544r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0) 0@2864B-phi 1@3072r 2@2576r L00000002 [384B,528r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0) 0@2864B-phi 1@3072r 2@2576r L00000001 [384B,528r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0) 0@2864B-phi 1@3072r 2@2576r L00000008 [2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3184r:0) 0@2864B-phi 1@3072r 2@2576r weight:0.000000e+00 LHS = %86 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L00000004 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L0000000B [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi weight:0.000000e+00 merge %86:0@3216r into %72:0@2864B --> @2864B merge %72:2@2576r into %86:4@1920B --> @1920B LHST = %86 %86 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L00000004 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L0000000B [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi weight:0.000000e+00 merge %86:0@3216r into %72:0@2864B --> @2864B merge %72:2@2576r into %86:4@1920B --> @1920B joined lanes: [224r,240B:1)[240B,384B:2)[384B,544r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2624r:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0) 0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r merge %86:0@3216r into %72:0@2864B --> @2864B merge %72:2@2576r into %86:4@1920B --> @1920B joined lanes: [224r,240B:1)[240B,384B:2)[384B,528r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0) 0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r merge %86:0@3216r into %72:0@2864B --> @2864B merge %72:2@2576r into %86:4@1920B --> @1920B joined lanes: [224r,240B:1)[240B,384B:2)[384B,528r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0) 0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r merge %72:2@2576r into %86:4@1920B --> @1920B joined lanes: [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:5)[3072r,3104B:6)[3104B,3184r:5)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@2864B-phi 6@3072r Joined SubRanges %86 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L00000001 [224r,240B:1)[240B,384B:2)[384B,528r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0) 0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r L00000002 [224r,240B:1)[240B,384B:2)[384B,528r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0) 0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r L00000004 [224r,240B:1)[240B,384B:2)[384B,544r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2624r:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0) 0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r L00000008 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:5)[3072r,3104B:6)[3104B,3184r:5)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@2864B-phi 6@3072r weight:0.000000e+00 Expecting instruction removal at 3216r checking: L00000001 [224r,240B:1)[240B,384B:2)[384B,528r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0) 0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r checking: L00000002 [224r,240B:1)[240B,384B:2)[384B,528r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0) 0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r checking: L00000004 [224r,240B:1)[240B,384B:2)[384B,544r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2624r:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0) 0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r checking: L00000008 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:5)[3072r,3104B:6)[3104B,3184r:5)[3216r,3264B:0) 0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@2864B-phi 6@3072r Prune sublane 00000008 at 3216r Assertion failed: (I->end <= std::next(I)->start), function verify, file ../lib/CodeGen/LiveInterval.cpp, line 1022. (lldb) up 5 frame #5: 0x00000001012cf283 llc`(anonymous namespace)::JoinVals::pruneSubRegValues(this=0x00007fff5fbfa9e8, LI=0x0000000106b0a9e0, ShrinkMask=0x0000000106d00be0) + 1619 at RegisterCoalescer.cpp:2872 2869 } 2870 // Mark value number as unused. 2871 ValueOut->markUnused(); -> 2872 S.verify(); 2873 continue; 2874 } 2875 // If a subrange ends at the copy, then a value was copied but only (lldb) p S.dump() L00000008 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,3264B:5)[3072r,3104B:6)[3104B,3184r:5) 0@x 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@2864B-phi 6@3072r

It was trying to coalesce %72 and %86, and it did that thing where it notices that a value of %86 is copied (at 3216) from %67, which is itself copied (at 3184) from %72. %72's value at that point is 5@2864B-phi, which reaches 3184 by branching over [2912B,3104B), in which another value 6@3072r is defined.

In lane L00000008, it wants to remove the segment [3216r,3264B:0) and extend [2864B,2912B:5) to 3264B, but it ends up with [2864B,3264B:5), whereas it should have [2864B,2912B:5)[3104B,3264B:5).

I've just realized I am allowed to post the whole cutdown test case because it comes from a conformance test, not from someone else's app.

What's the best way to post it?

Thanks, but how do I get your email address from phabricator? It's a .mir file that I got from -stop-before=simple-register-coalescer on a .ll test that I had cut down with bugpoint.

I'm trying out fixing up the LR like this:

// If a subrange starts at the copy then an undefined value has been // copied and we must remove that subrange value as well. VNInfo *ValueOut = Q.valueOutOrDead(); if (ValueOut != nullptr && Q.valueIn() == nullptr) { DEBUG(dbgs() << "\t\tPrune sublane " << PrintLaneMask(S.LaneMask) << " at " << Def << "\n"); SmallVector<SlotIndex, 4> EndPoints; LIS->pruneValue(S, Def, &EndPoints); DidPrune = true; // Mark value number as unused. ValueOut->markUnused(); if (V.Identical) { // Ensure that liveness reaches the end points of the value that we // have just pruned. LIS->extendToIndices(S, EndPoints, ArrayRef<SlotIndex>()); } S.verify(); continue; }

It has pruned value 0 away completely, and it extends value 5 to reach the end point(s) of value 0. Isn't that what we want?

I guess there is the possibility that OtherVNI is not the closest dominating def, but surely it would not be coalescing in the first place if that was true.

You're right, `extendToIndices` should be ok. I wanted to be explicit about what gets extended (since it could catch other errors), while `extendToIndices` would just extend whatever is the reaching value.

I can reproduce the problem using your testcase (which now works), let me try to reduce it and add to the patch.

lib/CodeGen/RegisterCoalescer.cpp | ||
---|---|---|

2869 | Why don't we need to extend to all the uses of the value we've just pruned? |

I now have another test case where the extendToIndices falls over with "Use not dominated by all defs". I will investigate.

lib/CodeGen/RegisterCoalescer.cpp | ||
---|---|---|

2868 | My new failure is because it gets here with V.Identical true, but in this particular lane there is no def or liveness at OtherDef. So I added a check for that: if (V.Identical && S.Query(OtherDef).valueOut()) { // If V is identical to V.OtherVNI and is live out of OtherDef in // this lane, then we can't simply prune V from S. V needs to be // replaced with V.OtherVNI. But I am now getting a "Couldn't join subrange" in the code that originally prompted this whole fix (but in not the cutdown that is the first test in this change). So I need to investigate that. |

lib/CodeGen/RegisterCoalescer.cpp | ||
---|---|---|

2298 | This should obvoiusly be &&... |

Good spot on the typo, but unfortunately it does not fix my latest problem. I have been told that I am not allowed to send the test case outside of AMD, so I'll have to try and get by with just showing you part of a debug dump of what is going on.

It is trying to coalesce %17 and %25, and the main range and L00000008 say that 832r is a CR_Erase with Identical set. But then L00000001 says that 832r is CR_Unresolved.

Any ideas? I'll have to stop working on this in a minute until tomorrow.

********** INTERVALS ********** %1 [912r,928r:0) 0@912r weight:0.000000e+00 %16 [816r,832r:0) 0@816r weight:0.000000e+00 %17 [800r,848r:0) 0@800r weight:0.000000e+00 %18 [720r,720d:0) 0@720r weight:0.000000e+00 %25 [528r,592r:0)[592r,608r:3)[608r,640r:4)[640r,704B:5)[704B,752r:0)[752r,768r:1)[768r,816r:2)[832r,864B:6)[864B,912r:7) 0@528r 1@752r 2@768r 3@592r 4@608r 5@640r 6@832r 7@864B-phi L00000008 [640r,672r:0)[832r,832d:1) 0@640r 1@832r 2@x L00000001 [592r,672r:1)[752r,800r:0)[832r,832d:2) 0@752r 1@592r 2@832r 3@x L00000002 [608r,672r:1)[768r,800r:0)[832r,832d:2) 0@768r 1@608r 2@832r 3@x L00000004 [528r,816r:0)[832r,864B:1)[864B,912r:2) 0@528r 1@832r 2@864B-phi weight:0.000000e+00 %36 [976r,992r:0) 0@976r weight:0.000000e+00 %41 [672r,704B:1)[848r,864B:0)[864B,976r:2) 0@848r 1@672r 2@864B-phi weight:0.000000e+00 %42 [112r,144B:2)[992r,1008B:1)[1008B,1040r:3)[1184r,1216B:0) 0@1184r 1@992r 2@112r 3@1008B-phi L00000008 [112r,144B:2)[992r,1008B:1)[1008B,1040r:3)[1184r,1216B:0) 0@1184r 1@992r 2@112r 3@1008B-phi L00000007 [112r,112d:2)[992r,992d:1)[1184r,1184d:0) 0@1184r 1@992r 2@112r 3@x weight:0.000000e+00 RegMasks: ********** MACHINEINSTRS ********** ... 752B %25.sub0:sreg_128 = COPY %25.sub2:sreg_128 768B %25.sub1:sreg_128 = COPY %25.sub2:sreg_128 800B %17:sreg_128 = COPY %25:sreg_128 816B %16:sreg_128 = COPY %25:sreg_128 832B %25:sreg_128 = COPY %16:sreg_128 848B %41:sreg_128 = COPY %17:sreg_128 864B bb.17.bb22: ; predecessors: %bb.15, %bb.16 ... # End machine code for function _amdgpu_cs_main. 800B %17:sreg_128 = COPY %25:sreg_128 Considering merging to SReg_128 with %17 in %25 RHS = %17 [800r,848r:0) 0@800r weight:0.000000e+00 LHS = %25 [528r,592r:0)[592r,608r:3)[608r,640r:4)[640r,704B:5)[704B,752r:0)[752r,768r:1)[768r,816r:2)[832r,864B:6)[864B,912r:7) 0@528r 1@752r 2@768r 3@592r 4@608r 5@640r 6@832r 7@864B-phi L00000008 [640r,672r:0)[832r,832d:1) 0@640r 1@832r 2@x L00000001 [592r,672r:1)[752r,800r:0)[832r,832d:2) 0@752r 1@592r 2@832r 3@x L00000002 [608r,672r:1)[768r,800r:0)[832r,832d:2) 0@768r 1@608r 2@832r 3@x L00000004 [528r,816r:0)[832r,864B:1)[864B,912r:2) 0@528r 1@832r 2@864B-phi weight:0.000000e+00 analyzeValue(528r) gives CR_Keep analyzeValue(752r) gives CR_Keep analyzeValue(768r) gives CR_Keep analyzeValue(592r) gives CR_Keep analyzeValue(608r) gives CR_Keep analyzeValue(640r) gives CR_Keep analyzeValue(800r) gives CR_Erase merge %17:0@800r into %25:2@768r --> @768r Identical CR_Erase analyzeValue(832r) gives CR_Erase merge %25:6@832r into %17:0@800r --> @768r analyzeValue(864B) gives CR_Keep LHST = %25 %25 [528r,592r:0)[592r,608r:3)[608r,640r:4)[640r,704B:5)[704B,752r:0)[752r,768r:1)[768r,816r:2)[832r,864B:6)[864B,912r:7) 0@528r 1@752r 2@768r 3@592r 4@608r 5@640r 6@832r 7@864B-phi L00000008 [640r,672r:0)[832r,832d:1) 0@640r 1@832r 2@x L00000001 [592r,672r:1)[752r,800r:0)[832r,832d:2) 0@752r 1@592r 2@832r 3@x L00000002 [608r,672r:1)[768r,800r:0)[832r,832d:2) 0@768r 1@608r 2@832r 3@x L00000004 [528r,816r:0)[832r,864B:1)[864B,912r:2) 0@528r 1@832r 2@864B-phi weight:0.000000e+00 joinSubRegRanges L00000008 EMPTY LRange [640r,672r:0)[832r,832d:1) 0@640r 1@832r 2@x RRange [800r,848r:0) 0@800r analyzeValue(640r) gives CR_Keep analyzeValue(800r) gives CR_Keep Identical CR_Erase analyzeValue(832r) gives CR_Erase merge %25:1@832r into %17:0@800r --> @800r analyzeValue(invalid) gives CR_Keep joined lanes: [640r,672r:0)[800r,848r:1) 0@640r 1@800r 2@x joinSubRegRanges L00000001 EMPTY LRange [592r,672r:1)[752r,800r:0)[832r,832d:2) 0@752r 1@592r 2@832r 3@x RRange [800r,848r:0) 0@800r analyzeValue(752r) gives CR_Keep analyzeValue(592r) gives CR_Keep analyzeValue(800r) gives CR_Erase merge %17:0@800r into %25:0@752r --> @752r analyzeValue(832r) gives CR_Unresolved analyzeValue(invalid) gives CR_Keep conflict at %25:2@832r *** Couldn't join subrange! UNREACHABLE executed at ../lib/CodeGen/RegisterCoalescer.cpp:3060!

Actually, with this change on top of upstream llvm, L00000008 fails with CR_Unresolved too. But I also had D35073, which fixed the problem for L00000008 but not L00000001, in the dump I added above.

Fix the "cannot join subranges" issue caused by redefining an undef value (more details in comments), roughly something like this:

%0 <- undef %1 = COPY %0 %0 = COPY %1 <- now %0 is "def", even though it's identical to what it was before (i.e. "undef").

D'oh! I now think that last problem is because I made a mistake resolving conflicts when cherry-picking this change on top of D35073.

So I think this change is good, and I'm going to set off some more testing overnight to make sure.

Ah, I see you've done a fix for it anyway. It might fix the same as D35073. I'll check.

Your 'Fix the "cannot join subranges" issue caused by redefining an undef value' does not seem to fix all the cases covered by D35073, so I suggest removing that part of the fix again and moving the discussion on how to fix that problem onto D35073. (I realize you had to try and fix it blind as I was unable to give you my test case.)

lib/CodeGen/RegisterCoalescer.cpp | ||
---|---|---|

2875 | I now have a test case that hits this assert. I will investigate further. |

lib/CodeGen/RegisterCoalescer.cpp | ||
---|---|---|

2875 | It could be because of a basic block boundary between Def and OtherDef. You can probably delete this code under #ifndef NDEBUG. |

lib/CodeGen/RegisterCoalescer.cpp | ||
---|---|---|

2875 | We don't want to test this in the case that there were no end points to extend to. So I put this round the block of code inside the #ifndef: if (!EndPoints.empty()) { |

My fix for the redef/undef issue was trying to avoid cases where we treat "undef" values from different registers as identical. I guess this could be relaxed (i.e. we can treat all undef values as identical), but we'd have to make sure that this equality is only used to allow removing an undef value (i.e. to allow pruning it without following it with extendToIndices).

That's because this change isn't in yet:

--- a/lib/CodeGen/RegisterCoalescer.cpp +++ b/lib/CodeGen/RegisterCoalescer.cpp @@ -1653,6 +1653,10 @@ bool RegisterCoalescer::joinCopy(MachineInstr *CopyMI, bool &Again) { deleteInstr(CopyMI); return false; // Not coalescable. } + if (CopyMI->getOpcode() == TargetOpcode::IMPLICIT_DEF) { + // Not coalesceable; eliminateUndefCopy turned it into implicit_def. + return false; + } // Coalesced copies are normally removed immediately, but transformations // like removeCopyByCommutingDef() can inadvertently create identity copies.

This shouldn't be necessary. Try returning "true" from eliminateUndefCopy after generating an IMPLICIT_DEF.

But then it would erase CopyMI, which used to be the copy but is now the IMPLICIT_DEF.

We shouldn't really be calling joinCopy with something that is not a COPY, but I guess it's a minor issue. Please replace the getOpcode() == ... with isImplicitDef(), and it should be fine.

It was a copy at the point that joinCopy was called, but eliminateUndefCopy changed it to implicit_def.

Good idea re isImplicitDef().

lib/CodeGen/RegisterCoalescer.cpp | ||
---|---|---|

2875 | Even with that change, I've now got a different test case that hits this assert. I'll investigate and see what is going on. |

Here's what I've found so far.

Here is some of the code (two separate blocks, but the first one is the only predecessor of the second one):

656B %217.sub0:vreg_128 = COPY %46:sgpr_32 720B %188:vreg_128 = COPY %217:vreg_128 816B %172:vreg_128 = IMPLICIT_DEF 1568B %185:vreg_128 = COPY %188:vreg_128 1600B %172:vreg_128 = COPY %185:vreg_128

We're trying to coalesce this:

1200B %217:vreg_128 = COPY %172:vreg_128

At 1600, we have that situation where it sets CR_Erase and Identical. %172 is a copy of %185 is a copy of %188 is a copy of %217 defined at 656r, and it's the same value of %217 that is live and reaches 1600.

My current theory is that that condition is not strong enough: we have a different value of %172 at 816, and it is "in the way", in the sense that, once we have merged, the value defined at 656r no longer reaches 1600. Then the extendToIndices for L00000004 goes wrong because it attaches the use at 1648B to the implicit_def at 816r.

But I'm not sure what the stronger condition should be.

I can give you this test case, so I'll email it over.

This "in the way" def at 816r is an implicit_def that is also being erased. I think this has to be the case, otherwise the main range would not have set our def as CR_Erase Identical, and/or the main range would not have allowed coalescing at all.

Therefore I now think the fix for this problem is to do all the prunes first, remembering the end points, and then do the extend afterwards.

I have a local fix for that which I will now start testing.

Hi Krzysztof

With my latest fix for the "in the way value" problem, it all passes my local testing.

That latest fix is quite a big diff in pruneSubRegValues because it puts the loops the other way round (outer loop is subranges, inner loop is values). So I figured it would be easier to start a new review D48478 to supersede and subsume this one. Could you head over and review that one please?

Thanks for all your help on this issue.

Fixed handling of newly created implicit defs, relaxed handling of undef values in followCopyChain.