This is an archive of the discontinued LLVM Phabricator instance.

Improve handling of COPY instructions with identical value numbers
ClosedPublic

Authored by kparzysz on Jun 12 2018, 3:40 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

kparzysz created this revision.Jun 12 2018, 3:40 PM
tpr accepted this revision.Jun 13 2018, 2:24 AM
tpr added a subscriber: tpr.

Yes, that works for me thanks on my larger test case.

lib/CodeGen/RegisterCoalescer.cpp
2462

Did you mean to leave this comment from my change in?

This revision is now accepted and ready to land.Jun 13 2018, 2:24 AM
kparzysz marked an inline comment as done.Jun 13 2018, 5:48 AM
This revision was automatically updated to reflect the committed changes.

This testcase is exposing some preexisting problem in the coalescer. I have reverted this patch until I figure out what's wrong.

kparzysz reopened this revision.Jun 14 2018, 1:13 PM
This revision is now accepted and ready to land.Jun 14 2018, 1:13 PM
kparzysz updated this revision to Diff 151411.Jun 14 2018, 1:15 PM

Properly extend liveness of a merged value when a redundant copy was erased.

nhaehnle removed a subscriber: nhaehnle.Jun 15 2018, 3:43 AM
tpr added a comment.Jun 15 2018, 4:18 AM

With this, I'm getting a different assert "CopyMI input register not live" in one test case. I'll narrow it down and check I can reproduce it on upstream LLVM with your fix (as opposed to our local branch with your fix).

tpr added a subscriber: dstuttard.Jun 16 2018, 4:04 AM
tpr added a comment.Jun 16 2018, 10:46 AM

I can fix that assert on my testcase with this addition to your fix:

--- a/lib/CodeGen/RegisterCoalescer.cpp
+++ b/lib/CodeGen/RegisterCoalescer.cpp
@@ -1653,6 +1653,10 @@ bool RegisterCoalescer::joinCopy(MachineInstr *CopyMI, bool &Again) {
     deleteInstr(CopyMI);
     return false;  // Not coalescable.
   }
+  if (CopyMI->getOpcode() == TargetOpcode::IMPLICIT_DEF) {
+    // Not coalesceable; eliminateUndefCopy turned it into implicit_def.
+    return false;
+  }

   // Coalesced copies are normally removed immediately, but transformations
   // like removeCopyByCommutingDef() can inadvertently create identity copies.

I will go and test that more thoroughly.

tpr added a comment.Jun 17 2018, 9:41 AM

I'm getting another failure somewhere else in our test suite with that. I need to analyze it further.

tpr added inline comments.Jun 18 2018, 7:23 AM
lib/CodeGen/RegisterCoalescer.cpp
2842

This is the cause of my second failure. In my test, because of some weird branching around in the control flow, there is another value in between Id and Q.endPoint(), and this code is illegally extending segment Id over the top of the other value.

I guess what it really needs is to add new segments as necessary so Id->valno is live on all paths from OtherDef to Def.

Is there some ready made code to do that somewhere?

Please post a testcase. We need to understand how these things happen.

Could you add "LIS->print(dbgs())" at the beginning of joinCopy and post the output with -debug-only=regalloc? You can remove the unnecessary parts from the output. For the copy that triggers the problem, I'd like to see all instructions using the registers from the copy and the debugging output from the coalescer.

tpr added a comment.Jun 18 2018, 7:46 AM

Sorry about the lack of testcase. I need to get clearance to post it as it is derived from a third party app.

I'll gather together the useful bits of the debug log.

tpr added a comment.Jun 18 2018, 8:29 AM
********** INTERVALS **********
%67 [3184r,3216r:0)  0@3184r weight:0.000000e+00
%72 [384B,544r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0)  0@2864B-phi 1@3072r 2@2576r L00000004 [384B,544r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0)  0@2864B-phi 1@3072r 2@2576r L00000002 [384B,528r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0)  0@2864B-phi 1@3072r 2@2576r L00000001 [384B,528r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0)  0@2864B-phi 1@3072r 2@2576r L00000008 [2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3184r:0)  0@2864B-phi 1@3072r 2@2576r weight:0.000000e+00
%86 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L00000004 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L0000000B [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi weight:0.000000e+00

********** MACHINEINSTRS **********
# Machine code for function _amdgpu_cs_main: NoPHIs, TracksLiveness

2864B	bb.66.Flow:
	; predecessors: %bb.63, %bb.67
	  successors: %bb.68(0x80000000); %bb.68(100.00%)

2896B	  S_BRANCH %bb.68

2912B	bb.67 (%ir-block.152):
	; predecessors: %bb.64, %bb.65
	  successors: %bb.66(0x80000000); %bb.66(100.00%)

2928B	  $exec = S_OR_B64 $exec, %5:sreg_64, implicit-def $scc
2944B	  %8:vreg_1 = COPY %89:vreg_1
2960B	  %40:sreg_64_xexec = V_CMP_NE_U32_e64 0, %8:vreg_1, implicit $exec
2976B	  %39:vgpr_32 = V_CNDMASK_B32_e64 0, 1, %40:sreg_64_xexec, implicit $exec
2992B	  %82:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
3008B	  undef %81.sub0:vreg_128 = COPY %82:vgpr_32
3024B	  %81.sub1:vreg_128 = COPY %82:vgpr_32
3040B	  %81.sub2:vreg_128 = COPY %39:vgpr_32
3056B	  %73:vreg_128 = COPY %81:vreg_128
3072B	  %72:vreg_128 = COPY %73:vreg_128
3088B	  S_BRANCH %bb.66

3104B	bb.68 (%ir-block.156):
	; predecessors: %bb.66
	  successors: %bb.4(0x04000000), %bb.1(0x7c000000); %bb.4(3.12%), %bb.1(96.88%)

3120B	  %64:vgpr_32 = V_ADD_I32_e32 32, %85:vgpr_32, implicit-def dead $vcc, implicit $exec
3136B	  V_CMP_EQ_U32_e32 0, %64:vgpr_32, implicit-def $vcc, implicit $exec
3152B	  %63:vgpr_32 = COPY %64:vgpr_32
3168B	  $vcc = S_AND_B64 $exec, killed $vcc, implicit-def dead $scc
3184B	  %67:vreg_128 = COPY %72:vreg_128
3200B	  %85:vgpr_32 = COPY %63:vgpr_32
3216B	  %86:vreg_128 = COPY %67:vreg_128
3232B	  S_CBRANCH_VCCNZ %bb.4, implicit killed $vcc
3248B	  S_BRANCH %bb.1

# End machine code for function _amdgpu_cs_main.


2576B	%72:vreg_128 = COPY %86:vreg_128
	Considering merging to VReg_128 with %72 in %86
		RHS = %72 [384B,544r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0)  0@2864B-phi 1@3072r 2@2576r L00000004 [384B,544r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0)  0@2864B-phi 1@3072r 2@2576r L00000002 [384B,528r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0)  0@2864B-phi 1@3072r 2@2576r L00000001 [384B,528r:0)[2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3264B:0)  0@2864B-phi 1@3072r 2@2576r L00000008 [2576r,2608B:2)[2864B,2912B:0)[3072r,3104B:1)[3104B,3184r:0)  0@2864B-phi 1@3072r 2@2576r weight:0.000000e+00
		LHS = %86 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L00000004 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L0000000B [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi weight:0.000000e+00
		merge %86:0@3216r into %72:0@2864B --> @2864B
		merge %72:2@2576r into %86:4@1920B --> @1920B
		LHST = %86 %86 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L00000004 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L0000000B [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi weight:0.000000e+00
		merge %86:0@3216r into %72:0@2864B --> @2864B
		merge %72:2@2576r into %86:4@1920B --> @1920B
		joined lanes: [224r,240B:1)[240B,384B:2)[384B,544r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2624r:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0)  0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r
		merge %86:0@3216r into %72:0@2864B --> @2864B
		merge %72:2@2576r into %86:4@1920B --> @1920B
		joined lanes: [224r,240B:1)[240B,384B:2)[384B,528r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0)  0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r
		merge %86:0@3216r into %72:0@2864B --> @2864B
		merge %72:2@2576r into %86:4@1920B --> @1920B
		joined lanes: [224r,240B:1)[240B,384B:2)[384B,528r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0)  0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r
		merge %72:2@2576r into %86:4@1920B --> @1920B
		joined lanes: [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:5)[3072r,3104B:6)[3104B,3184r:5)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@2864B-phi 6@3072r
	Joined SubRanges %86 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2576r:4)[2608B,2624r:4)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi L00000001 [224r,240B:1)[240B,384B:2)[384B,528r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0)  0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r L00000002 [224r,240B:1)[240B,384B:2)[384B,528r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0)  0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r L00000004 [224r,240B:1)[240B,384B:2)[384B,544r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2624r:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0)  0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r L00000008 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:5)[3072r,3104B:6)[3104B,3184r:5)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@2864B-phi 6@3072r weight:0.000000e+00
		Expecting instruction removal at 3216r
		 checking:  L00000001 [224r,240B:1)[240B,384B:2)[384B,528r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0)  0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r
		 checking:  L00000002 [224r,240B:1)[240B,384B:2)[384B,528r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0)  0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r
		 checking:  L00000004 [224r,240B:1)[240B,384B:2)[384B,544r:0)[640B,1808B:2)[1904r,1920B:3)[1920B,2624r:4)[2864B,2912B:0)[3072r,3104B:5)[3104B,3264B:0)  0@2864B-phi 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@3072r
		 checking:  L00000008 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,2912B:5)[3072r,3104B:6)[3104B,3184r:5)[3216r,3264B:0)  0@3216r 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@2864B-phi 6@3072r
		Prune sublane 00000008 at 3216r
Assertion failed: (I->end <= std::next(I)->start), function verify, file ../lib/CodeGen/LiveInterval.cpp, line 1022.

(lldb) up 5
frame #5: 0x00000001012cf283 llc`(anonymous namespace)::JoinVals::pruneSubRegValues(this=0x00007fff5fbfa9e8, LI=0x0000000106b0a9e0, ShrinkMask=0x0000000106d00be0) + 1619 at RegisterCoalescer.cpp:2872
   2869	        }
   2870	        // Mark value number as unused.
   2871	        ValueOut->markUnused();
-> 2872	        S.verify();
   2873	        continue;
   2874	      }
   2875	      // If a subrange ends at the copy, then a value was copied but only
(lldb) p S.dump()
 L00000008 [224r,240B:1)[240B,384B:2)[640B,1808B:2)[1904r,1920B:3)[1920B,2608B:4)[2864B,3264B:5)[3072r,3104B:6)[3104B,3184r:5)  0@x 1@224r 2@240B-phi 3@1904r 4@1920B-phi 5@2864B-phi 6@3072r

It was trying to coalesce %72 and %86, and it did that thing where it notices that a value of %86 is copied (at 3216) from %67, which is itself copied (at 3184) from %72. %72's value at that point is 5@2864B-phi, which reaches 3184 by branching over [2912B,3104B), in which another value 6@3072r is defined.

In lane L00000008, it wants to remove the segment [3216r,3264B:0) and extend [2864B,2912B:5) to 3264B, but it ends up with [2864B,3264B:5), whereas it should have [2864B,2912B:5)[3104B,3264B:5).

tpr added a comment.Jun 18 2018, 8:30 AM

I've just realized I am allowed to post the whole cutdown test case because it comes from a conformance test, not from someone else's app.

What's the best way to post it?

Is it an .ll file? You can email it to me, I'll try to see if I can reduce it.

tpr added a comment.Jun 18 2018, 9:14 AM

Thanks, but how do I get your email address from phabricator? It's a .mir file that I got from -stop-before=simple-register-coalescer on a .ll test that I had cut down with bugpoint.

My username at codeaurora.org.

tpr added a comment.Jun 18 2018, 9:54 AM

I'm trying out fixing up the LR like this:

// If a subrange starts at the copy then an undefined value has been
// copied and we must remove that subrange value as well.
VNInfo *ValueOut = Q.valueOutOrDead();
if (ValueOut != nullptr && Q.valueIn() == nullptr) {
  DEBUG(dbgs() << "\t\tPrune sublane " << PrintLaneMask(S.LaneMask)
               << " at " << Def << "\n");
  SmallVector<SlotIndex, 4> EndPoints;
  LIS->pruneValue(S, Def, &EndPoints);
  DidPrune = true;

  // Mark value number as unused.
  ValueOut->markUnused();

  if (V.Identical) {
    // Ensure that liveness reaches the end points of the value that we
    // have just pruned.
    LIS->extendToIndices(S, EndPoints, ArrayRef<SlotIndex>());
  }
  S.verify();
  continue;
}

That simply undoes the pruning. It has to extend the specific value (OtherVNI).

tpr added a comment.Jun 18 2018, 10:21 AM

It has pruned value 0 away completely, and it extends value 5 to reach the end point(s) of value 0. Isn't that what we want?

tpr added a comment.Jun 18 2018, 10:22 AM

I guess there is the possibility that OtherVNI is not the closest dominating def, but surely it would not be coalescing in the first place if that was true.

You're right, extendToIndices should be ok. I wanted to be explicit about what gets extended (since it could catch other errors), while extendToIndices would just extend whatever is the reaching value.

kparzysz updated this revision to Diff 151754.Jun 18 2018, 11:09 AM

Use extendToIndices.

I can reproduce the problem using your testcase (which now works), let me try to reduce it and add to the patch.

tpr added inline comments.Jun 18 2018, 11:28 AM
lib/CodeGen/RegisterCoalescer.cpp
2842

Why don't we need to extend to all the uses of the value we've just pruned?

kparzysz updated this revision to Diff 151768.Jun 18 2018, 12:48 PM

Updated the extension, added second testcase.

kparzysz updated this revision to Diff 151775.Jun 18 2018, 12:58 PM

Remove unnecessary stuff from the other testcase.

tpr added a comment.Jun 19 2018, 1:29 AM

I now have another test case where the extendToIndices falls over with "Use not dominated by all defs". I will investigate.

tpr added inline comments.Jun 19 2018, 5:12 AM
lib/CodeGen/RegisterCoalescer.cpp
2841

My new failure is because it gets here with V.Identical true, but in this particular lane there is no def or liveness at OtherDef. So I added a check for that:

if (V.Identical && S.Query(OtherDef).valueOut()) {
  // If V is identical to V.OtherVNI and is live out of OtherDef in
  // this lane, then we can't simply prune V from S. V needs to be
  // replaced with V.OtherVNI.

But I am now getting a "Couldn't join subrange" in the code that originally prompted this whole fix (but in not the cutdown that is the first test in this change). So I need to investigate that.

kparzysz added inline comments.Jun 19 2018, 7:40 AM
lib/CodeGen/RegisterCoalescer.cpp
2277

This should obvoiusly be &&...

kparzysz updated this revision to Diff 151916.Jun 19 2018, 7:56 AM

Fixed a typo and added a check for subrange liveness at OtherDef.

tpr added a comment.Jun 19 2018, 9:47 AM

Good spot on the typo, but unfortunately it does not fix my latest problem. I have been told that I am not allowed to send the test case outside of AMD, so I'll have to try and get by with just showing you part of a debug dump of what is going on.

It is trying to coalesce %17 and %25, and the main range and L00000008 say that 832r is a CR_Erase with Identical set. But then L00000001 says that 832r is CR_Unresolved.

Any ideas? I'll have to stop working on this in a minute until tomorrow.

********** INTERVALS **********
%1 [912r,928r:0)  0@912r weight:0.000000e+00
%16 [816r,832r:0)  0@816r weight:0.000000e+00
%17 [800r,848r:0)  0@800r weight:0.000000e+00
%18 [720r,720d:0)  0@720r weight:0.000000e+00
%25 [528r,592r:0)[592r,608r:3)[608r,640r:4)[640r,704B:5)[704B,752r:0)[752r,768r:1)[768r,816r:2)[832r,864B:6)[864B,912r:7)  0@528r 1@752r 2@768r 3@592r 4@608r 5@640r 6@832r 7@864B-phi L00000008 [640r,672r:0)[832r,832d:1)  0@640r 1@832r 2@x L00000001 [592r,672r:1)[752r,800r:0)[832r,832d:2)  0@752r 1@592r 2@832r 3@x L00000002 [608r,672r:1)[768r,800r:0)[832r,832d:2)  0@768r 1@608r 2@832r 3@x L00000004 [528r,816r:0)[832r,864B:1)[864B,912r:2)  0@528r 1@832r 2@864B-phi weight:0.000000e+00
%36 [976r,992r:0)  0@976r weight:0.000000e+00
%41 [672r,704B:1)[848r,864B:0)[864B,976r:2)  0@848r 1@672r 2@864B-phi weight:0.000000e+00
%42 [112r,144B:2)[992r,1008B:1)[1008B,1040r:3)[1184r,1216B:0)  0@1184r 1@992r 2@112r 3@1008B-phi L00000008 [112r,144B:2)[992r,1008B:1)[1008B,1040r:3)[1184r,1216B:0)  0@1184r 1@992r 2@112r 3@1008B-phi L00000007 [112r,112d:2)[992r,992d:1)[1184r,1184d:0)  0@1184r 1@992r 2@112r 3@x weight:0.000000e+00
RegMasks:
********** MACHINEINSTRS **********
...
752B	  %25.sub0:sreg_128 = COPY %25.sub2:sreg_128
768B	  %25.sub1:sreg_128 = COPY %25.sub2:sreg_128
800B	  %17:sreg_128 = COPY %25:sreg_128
816B	  %16:sreg_128 = COPY %25:sreg_128
832B	  %25:sreg_128 = COPY %16:sreg_128
848B	  %41:sreg_128 = COPY %17:sreg_128

864B	bb.17.bb22:
	; predecessors: %bb.15, %bb.16
...

# End machine code for function _amdgpu_cs_main.

800B	%17:sreg_128 = COPY %25:sreg_128
	Considering merging to SReg_128 with %17 in %25
		RHS = %17 [800r,848r:0)  0@800r weight:0.000000e+00
		LHS = %25 [528r,592r:0)[592r,608r:3)[608r,640r:4)[640r,704B:5)[704B,752r:0)[752r,768r:1)[768r,816r:2)[832r,864B:6)[864B,912r:7)  0@528r 1@752r 2@768r 3@592r 4@608r 5@640r 6@832r 7@864B-phi L00000008 [640r,672r:0)[832r,832d:1)  0@640r 1@832r 2@x L00000001 [592r,672r:1)[752r,800r:0)[832r,832d:2)  0@752r 1@592r 2@832r 3@x L00000002 [608r,672r:1)[768r,800r:0)[832r,832d:2)  0@768r 1@608r 2@832r 3@x L00000004 [528r,816r:0)[832r,864B:1)[864B,912r:2)  0@528r 1@832r 2@864B-phi weight:0.000000e+00
		analyzeValue(528r) gives CR_Keep
		analyzeValue(752r) gives CR_Keep
		analyzeValue(768r) gives CR_Keep
		analyzeValue(592r) gives CR_Keep
		analyzeValue(608r) gives CR_Keep
		analyzeValue(640r) gives CR_Keep
		analyzeValue(800r) gives CR_Erase
		merge %17:0@800r into %25:2@768r --> @768r
		Identical CR_Erase
		analyzeValue(832r) gives CR_Erase
		merge %25:6@832r into %17:0@800r --> @768r
		analyzeValue(864B) gives CR_Keep
		LHST = %25 %25 [528r,592r:0)[592r,608r:3)[608r,640r:4)[640r,704B:5)[704B,752r:0)[752r,768r:1)[768r,816r:2)[832r,864B:6)[864B,912r:7)  0@528r 1@752r 2@768r 3@592r 4@608r 5@640r 6@832r 7@864B-phi L00000008 [640r,672r:0)[832r,832d:1)  0@640r 1@832r 2@x L00000001 [592r,672r:1)[752r,800r:0)[832r,832d:2)  0@752r 1@592r 2@832r 3@x L00000002 [608r,672r:1)[768r,800r:0)[832r,832d:2)  0@768r 1@608r 2@832r 3@x L00000004 [528r,816r:0)[832r,864B:1)[864B,912r:2)  0@528r 1@832r 2@864B-phi weight:0.000000e+00
	joinSubRegRanges  L00000008 EMPTY
		LRange [640r,672r:0)[832r,832d:1)  0@640r 1@832r 2@x
		RRange [800r,848r:0)  0@800r
		analyzeValue(640r) gives CR_Keep
		analyzeValue(800r) gives CR_Keep
		Identical CR_Erase
		analyzeValue(832r) gives CR_Erase
		merge %25:1@832r into %17:0@800r --> @800r
		analyzeValue(invalid) gives CR_Keep
		joined lanes: [640r,672r:0)[800r,848r:1)  0@640r 1@800r 2@x
	joinSubRegRanges  L00000001 EMPTY
		LRange [592r,672r:1)[752r,800r:0)[832r,832d:2)  0@752r 1@592r 2@832r 3@x
		RRange [800r,848r:0)  0@800r
		analyzeValue(752r) gives CR_Keep
		analyzeValue(592r) gives CR_Keep
		analyzeValue(800r) gives CR_Erase
		merge %17:0@800r into %25:0@752r --> @752r
		analyzeValue(832r) gives CR_Unresolved
		analyzeValue(invalid) gives CR_Keep
		conflict at %25:2@832r
*** Couldn't join subrange!

UNREACHABLE executed at ../lib/CodeGen/RegisterCoalescer.cpp:3060!
tpr added a comment.Jun 19 2018, 2:39 PM

Actually, with this change on top of upstream llvm, L00000008 fails with CR_Unresolved too. But I also had D35073, which fixed the problem for L00000008 but not L00000001, in the dump I added above.

kparzysz updated this revision to Diff 151982.Jun 19 2018, 3:06 PM

Fix the "cannot join subranges" issue caused by redefining an undef value (more details in comments), roughly something like this:

%0 <- undef
%1 = COPY %0
%0 = COPY %1  <- now %0 is "def", even though it's identical to what it was before (i.e. "undef").
tpr added a comment.Jun 19 2018, 3:19 PM

D'oh! I now think that last problem is because I made a mistake resolving conflicts when cherry-picking this change on top of D35073.

So I think this change is good, and I'm going to set off some more testing overnight to make sure.

tpr added a comment.Jun 19 2018, 3:22 PM

Ah, I see you've done a fix for it anyway. It might fix the same as D35073. I'll check.

tpr added a comment.Jun 20 2018, 5:42 AM

Your 'Fix the "cannot join subranges" issue caused by redefining an undef value' does not seem to fix all the cases covered by D35073, so I suggest removing that part of the fix again and moving the discussion on how to fix that problem onto D35073. (I realize you had to try and fix it blind as I was unable to give you my test case.)

lib/CodeGen/RegisterCoalescer.cpp
2848

I now have a test case that hits this assert. I will investigate further.

kparzysz added inline comments.Jun 20 2018, 5:49 AM
lib/CodeGen/RegisterCoalescer.cpp
2848

It could be because of a basic block boundary between Def and OtherDef. You can probably delete this code under #ifndef NDEBUG.

tpr added inline comments.Jun 20 2018, 6:05 AM
lib/CodeGen/RegisterCoalescer.cpp
2848

We don't want to test this in the case that there were no end points to extend to. So I put this round the block of code inside the #ifndef:

if (!EndPoints.empty()) {

My fix for the redef/undef issue was trying to avoid cases where we treat "undef" values from different registers as identical. I guess this could be relaxed (i.e. we can treat all undef values as identical), but we'd have to make sure that this equality is only used to allow removing an undef value (i.e. to allow pruning it without following it with extendToIndices).

tpr added a comment.Jun 20 2018, 6:33 AM

Oh dear, testing has shown up another "CopyMI input register not live".

tpr added a comment.Jun 20 2018, 8:51 AM

That's because this change isn't in yet:

--- a/lib/CodeGen/RegisterCoalescer.cpp
+++ b/lib/CodeGen/RegisterCoalescer.cpp
@@ -1653,6 +1653,10 @@ bool RegisterCoalescer::joinCopy(MachineInstr *CopyMI, bool &Again) {
     deleteInstr(CopyMI);
     return false;  // Not coalescable.
   }
+  if (CopyMI->getOpcode() == TargetOpcode::IMPLICIT_DEF) {
+    // Not coalesceable; eliminateUndefCopy turned it into implicit_def.
+    return false;
+  }

   // Coalesced copies are normally removed immediately, but transformations
   // like removeCopyByCommutingDef() can inadvertently create identity copies.

This shouldn't be necessary. Try returning "true" from eliminateUndefCopy after generating an IMPLICIT_DEF.

tpr added a comment.Jun 20 2018, 12:22 PM

But then it would erase CopyMI, which used to be the copy but is now the IMPLICIT_DEF.

We shouldn't really be calling joinCopy with something that is not a COPY, but I guess it's a minor issue. Please replace the getOpcode() == ... with isImplicitDef(), and it should be fine.

tpr added a comment.Jun 20 2018, 1:54 PM

It was a copy at the point that joinCopy was called, but eliminateUndefCopy changed it to implicit_def.

Good idea re isImplicitDef().

tpr added inline comments.Jun 21 2018, 7:35 AM
lib/CodeGen/RegisterCoalescer.cpp
2848

Even with that change, I've now got a different test case that hits this assert. I'll investigate and see what is going on.

tpr added a comment.Jun 21 2018, 10:15 AM

Here's what I've found so far.

Here is some of the code (two separate blocks, but the first one is the only predecessor of the second one):

656B	  %217.sub0:vreg_128 = COPY %46:sgpr_32
720B	  %188:vreg_128 = COPY %217:vreg_128
816B	  %172:vreg_128 = IMPLICIT_DEF

1568B	  %185:vreg_128 = COPY %188:vreg_128
1600B	  %172:vreg_128 = COPY %185:vreg_128

We're trying to coalesce this:

1200B	%217:vreg_128 = COPY %172:vreg_128

At 1600, we have that situation where it sets CR_Erase and Identical. %172 is a copy of %185 is a copy of %188 is a copy of %217 defined at 656r, and it's the same value of %217 that is live and reaches 1600.

My current theory is that that condition is not strong enough: we have a different value of %172 at 816, and it is "in the way", in the sense that, once we have merged, the value defined at 656r no longer reaches 1600. Then the extendToIndices for L00000004 goes wrong because it attaches the use at 1648B to the implicit_def at 816r.

But I'm not sure what the stronger condition should be.

I can give you this test case, so I'll email it over.

tpr added a comment.Jun 22 2018, 12:45 AM

This "in the way" def at 816r is an implicit_def that is also being erased. I think this has to be the case, otherwise the main range would not have set our def as CR_Erase Identical, and/or the main range would not have allowed coalescing at all.

Therefore I now think the fix for this problem is to do all the prunes first, remembering the end points, and then do the extend afterwards.

I have a local fix for that which I will now start testing.

tpr requested changes to this revision.Jun 22 2018, 5:06 AM

Hi Krzysztof

With my latest fix for the "in the way value" problem, it all passes my local testing.

That latest fix is quite a big diff in pruneSubRegValues because it puts the loops the other way round (outer loop is subranges, inner loop is values). So I figured it would be easier to start a new review D48478 to supersede and subsume this one. Could you head over and review that one please?

Thanks for all your help on this issue.

This revision now requires changes to proceed.Jun 22 2018, 5:06 AM
kparzysz updated this revision to Diff 152537.Jun 22 2018, 1:33 PM

Fixed handling of newly created implicit defs, relaxed handling of undef values in followCopyChain.

This includes all testcases from related reviews.

tpr accepted this revision.Jun 22 2018, 4:38 PM

Thanks Krzysztof. This all works for me now.

This revision is now accepted and ready to land.Jun 22 2018, 4:38 PM
This revision was automatically updated to reflect the committed changes.