This is an archive of the discontinued LLVM Phabricator instance.

[RegisterCoalescer] Fix for subrange join unreachable
AbandonedPublic

Authored by dstuttard on Jul 6 2017, 10:50 AM.

Details

Summary

During subrange join checking for a fullcopy that can be erased fails if
subranges don't match

When subrange join occurs and the following pattern is being checked (resulting
in an erase):

%other = COPY %ext
%this  = COPY %ext

It is sometimes the case that this is missed due to it being a FullCopy and none
of the subranges matching correctly.

See http://llvm.org/PR33152

Event Timeline

dstuttard created this revision.Jul 6 2017, 10:50 AM
dstuttard added a subscriber: llvm-commits.
arsenm added inline comments.Jul 6 2017, 1:29 PM
test/CodeGen/AMDGPU/regcoalesce-subregjoin-fullcopy.mir
1

Missing -verify-machineinstrs

dstuttard updated this revision to Diff 105623.Jul 7 2017, 5:35 AM

Adding verify-machineinstr flag to llc invocation in the test

dstuttard marked an inline comment as done.Jul 7 2017, 5:36 AM
MatzeB edited edge metadata.Jul 19 2017, 4:48 PM

I'm confused by this change (and the testcase doesn't really help understanding what is going on). You are probably onto a real bug here where we cannot assume the same lanemasks work for the input/output register class of a copy. However that should be independent of the fact that a full or partial copy is present.

I'm confused by this change (and the testcase doesn't really help understanding what is going on). You are probably onto a real bug here where we cannot assume the same lanemasks work for the input/output register class of a copy. However that should be independent of the fact that a full or partial copy is present.

Sorry about the test case. It isn't great, but that was the best I could do to provoke the error without spending a lot longer on it. The issue is provoked on the last 2 copies in the last BB (bb.25).

Having looked at what I've done again, I think you're correct that it isn't the right way to solve. Given the code before the change, it will never enter the subrange leg of the if as MI->isFullCopy() is always true at this point.

The issue that is showing up is that the LaneMask test for the subranges never results in a match in the loop over the subranges. This in turn leads to an assert shortly afterwards as the SubRange join never succeeds in any of the patterns being searched for in the JoinVals::computeAssignement method.

My initial attempt at a fix was the reasoning that since this if this is a fullcopy why do we need to match the subranges and pick the first matching one, they are all equally effective - now I realise that this is always the case, can't we always choose the same one as the non-subrange uses here?

Here's some info on the relevant registers and subranges:

%vreg122 [448r,480B:1)[560B,992B:0)[992B,1040r:4)[1104r,1168B:2)[1824B,1888r:0)[2016r,2064B:3)  0@560B-phi 1@448r 2@1104r 3@2016r 4@992B-phi L00000009 [448r,448d:0)[992B,1040r:5)[1104r,1104d:1)[2016r,2064B:4)  0@448r 1@1104r 2@x 3@x 4@2016r 5@992B-phi L00000002 [448r,480B:1)[560B,992B:0)[992B,1040r:4)[1104r,1168B:2)[1824B,1872r:0)[2016r,2064B:3)  0@560B-phi 1@448r 2@1104r 3@2016r 4@992B-phi L00000004 [448r,480B:1)[560B,992B:0)[992B,1040r:4)[1104r,1168B:2)[1824B,1888r:0)[2016r,2064B:3)  0@560B-phi 1@448r 2@1104r 3@2016r 4@992B-phi

%vreg126 [1872r,1888r:0)[1888r,2032r:1)  0@1872r 1@1888r L00000002 [1872r,2032r:0)  0@1872r L00000004 [1888r,2032r:0)  0@1888r

%vreg151 [880r,992B:0)[992B,1104r:2)[2032r,2064B:1)  0@880r 1@2032r 2@992B-phi

In this case it's using the def of vreg151 against vreg122 in these 2 lines:

2016B		%vreg122<def> = COPY %vreg126; VReg_128:%vreg122,%vreg126
2032B		%vreg151<def> = COPY %vreg126; VReg_128:%vreg151,%vreg126

and in the valuesIdentical method it iterates over the vreg126 subranges (L000002 and L0000004) against the L000009 from vreg122 and doesn't match.

Perhaps using valuesIdentical with subranges is the issue here and an alternative should be implemented?

dstuttard updated this revision to Diff 108275.Jul 26 2017, 7:24 AM

Implemented a slightly different approach to resolving the issue.

arsenm edited edge metadata.Oct 4 2017, 2:37 PM

ping, I think I ran into this same issue on another testcase

I'm going to take another look at this bug.
I've also uploaded a couple of other fixes for SubRange join failures. See D40300 and D40308. I've also uploaded a verifier method that is helpful in tracking these problems down. See D40297.

@arsenm you might find one of these changes also fixes your issue. Let me know if you try them and it does (or doesn't).

dstuttard updated this revision to Diff 129060.Jan 9 2018, 3:58 AM

A better fix for this problem.

The original fix chose an arbitrary other subreg to return in the case where the
subreg was actually not defined - this isn't ideal as this can sometimes lead
to matching something different and causing an assert.

Fix now updated to allow for undefined sub-registers (they are ignored if they
are undefined)

dstuttard updated this revision to Diff 129070.Jan 9 2018, 6:02 AM

Removing a debug llvm_unreachable

Any more comments on this change? If not I'll land this in the next few days.

hakzsam added a subscriber: hakzsam.Feb 6 2018, 1:47 AM

Any more comments on this change? If not I'll land this in the next few days.

Hi,

I hit a similar issue with AMDGPU backend but your patch doesn't seem to fix it.

The LLVM IR is https://hastebin.com/uxeqojudoy and the backtrace is https://hastebin.com/nitozuzeye.makefile

Not sure if that will help but the unreachable is triggered only if LLVMEarlyCSEMemPass is used.

Any ideas?

Thanks!

This patch fixes [0] on llvm-6 for me

[0] https://bugs.llvm.org/show_bug.cgi?id=36679

tpr added a subscriber: tpr.Jun 22 2018, 5:22 AM

Krzysztof's fix for another occurrence of what looks like the same problem was as follows. This cropped up in the middle of D48102.

I don't know if this fixes all our occurrences of the problem, but it's worth checking.

--- a/lib/CodeGen/RegisterCoalescer.cpp
+++ b/lib/CodeGen/RegisterCoalescer.cpp
@@ -2121,7 +2117,8 @@ class JoinVals {
   LaneBitmask computeWriteLanes(const MachineInstr *DefMI, bool &Redef) const;
 
   /// Find the ultimate value that VNI was copied from.
-  std::pair<const VNInfo*,unsigned> followCopyChain(const VNInfo *VNI) const;
+  std::pair<const VNInfo*,unsigned> followCopyChain(const VNInfo *VNI,
+                                                    unsigned OtherReg) const;
 
   bool valuesIdentical(VNInfo *Val0, VNInfo *Val1, const JoinVals &Other) const;
 
@@ -2240,18 +2237,18 @@ LaneBitmask JoinVals::computeWriteLanes(const MachineInstr *DefMI, bool &Redef)
 }
 
 std::pair<const VNInfo*, unsigned> JoinVals::followCopyChain(
-    const VNInfo *VNI) const {
-  unsigned Reg = this->Reg;
+    const VNInfo *VNI, unsigned OtherReg) const {
+  unsigned TrackReg = Reg;
 
   while (!VNI->isPHIDef()) {
     SlotIndex Def = VNI->def;
     MachineInstr *MI = Indexes->getInstructionFromIndex(Def);
     assert(MI && "No defining instruction");
     if (!MI->isFullCopy())
-      return std::make_pair(VNI, Reg);
+      return std::make_pair(VNI, TrackReg);
     unsigned SrcReg = MI->getOperand(1).getReg();
     if (!TargetRegisterInfo::isVirtualRegister(SrcReg))
-      return std::make_pair(VNI, Reg);
+      return std::make_pair(VNI, TrackReg);
 
     const LiveInterval &LI = LIS->getInterval(SrcReg);
     const VNInfo *ValueIn;
@@ -2272,25 +2269,41 @@ std::pair<const VNInfo*, unsigned> JoinVals::followCopyChain(
         break;
       }
     }
-    if (ValueIn == nullptr)
+    if (ValueIn == nullptr) {
+      // Reaching undefined value is legitimate in cases where it comes from
+      // one of the considered registers (Reg or OtherReg).
+      //
+      // 1   undef %0.sub1 = ...  ;; %0.sub0 == undef
+      // 2   %1 = COPY %0         ;; %1 is defined here.
+      // 3   %0 = COPY %1         ;; Now %0.sub0 has a definition!
+      // 4   ...                  ;; Here we should indicate that %0 at 3 is
+      //                          ;; the same as %0 at 2 (%0 at 3 == %1 at 2).
+      if (SrcReg == Reg || SrcReg == OtherReg)
+        return std::make_pair(nullptr, SrcReg);
       break;
+    }
     VNI = ValueIn;
-    Reg = SrcReg;
+    TrackReg = SrcReg;
   }
-  return std::make_pair(VNI, Reg);
+  return std::make_pair(VNI, TrackReg);
 }
 
 bool JoinVals::valuesIdentical(VNInfo *Value0, VNInfo *Value1,
                                const JoinVals &Other) const {
   const VNInfo *Orig0;
   unsigned Reg0;
-  std::tie(Orig0, Reg0) = followCopyChain(Value0);
+  std::tie(Orig0, Reg0) = followCopyChain(Value0, Other.Reg);
   if (Orig0 == Value1 && Reg0 == Other.Reg)
     return true;
 
   const VNInfo *Orig1;
   unsigned Reg1;
-  std::tie(Orig1, Reg1) = Other.followCopyChain(Value1);
+  std::tie(Orig1, Reg1) = Other.followCopyChain(Value1, Reg);
+  // If both values are undefined, and the source registers are the same
+  // register, the values are identical. Filter out cases where only one
+  // value is defined.
+  if (Orig0 == nullptr || Orig1 == nullptr)
+    return Orig0 == Orig1 && Reg0 == Reg1;
 
   // The values are equal if they are defined at the same place and use the
   // same register. Note that we cannot compare VNInfos directly as some of
tpr added a comment.Jun 22 2018, 9:07 AM

Hi Samuel

Your link to your test case does not work for me. Are you still getting a problem not addressed by this change?

tpr added a comment.Jun 22 2018, 9:17 AM

I have confirmed that Krzysztof's fix that I put in the comment above does not fix the test case here, and it does not fix 179 of the 209 shaders with the problem that I found in a couple of our big apps.

So I think we need to go with David's fix.

Is this patch still relevant? It doesn't apply to trunk and when I fixup the testcase, it now passes

dstuttard abandoned this revision.Nov 6 2018, 4:14 AM

This patch has been superseded by D49097 - so I'm abandoning this one.