Page MenuHomePhabricator

[RegisterCoalescer] Another fix for subrange join unreachable
Needs ReviewPublic

Authored by arsenm on Jul 9 2018, 12:12 PM.

Details

Summary

When checking for conflicts, two subreg values can be considered equal
if they are ultimately copies of the same register but one is undefined.

This is a simplified alternative to David's D35073 now that most of the
cases fixed by that are also fixed by Krzysztof's D48102.

Change-Id: I5c8c1a229237a46ae20d0dd18ee5121bfbfc5ad8

Diff Detail

Event Timeline

tpr created this revision.Jul 9 2018, 12:12 PM
tpr added a subscriber: dstuttard.

Here's how the problem happens:

Consider the following snippet, and vregs %19 and %34.

512B    bb.14.bb19:
        ; predecessors: %bb.9, %bb.12, %bb.13
          successors: %bb.15(0x40000000), %bb.16(0x40000000); %bb.15(50.00%), %bb.16(50.00%)
                
528B      undef %34.sub2:sreg_128 = S_MOV_B32 0
560B      S_CBRANCH_SCC0 %bb.16, implicit undef $scc
                
576B    bb.15:  
        ; predecessors: %bb.14
          successors: %bb.17(0x80000000); %bb.17(100.00%)
                
592B      undef %35.sub0:sreg_128 = COPY %34.sub2:sreg_128
656B      S_BRANCH %bb.17 
                
672B    bb.16.bb22:
        ; predecessors: %bb.14  
          successors: %bb.17(0x80000000); %bb.17(100.00%)
                
688B      undef %19.sub0:sreg_128 = COPY %34.sub2:sreg_128
704B      %19.sub1:sreg_128 = COPY %34.sub2:sreg_128
720B      %19.sub2:sreg_128 = COPY %34.sub2:sreg_128                                                            
736B      %17:sreg_128 = COPY %19:sreg_128
752B      %16:sreg_128 = COPY %34:sreg_128                                                                      
768B      %34:sreg_128 = COPY %16:sreg_128
784B      %35:sreg_128 = COPY %17:sreg_128

In %34 only sub2 (lane mask L00000004) is defined, and so the "full" use at 752 is not reflected in the subranges for other subregisters:

%34 [528r,752r:1)[768r,800B:0)[800B,848r:2)  0@768r 1@528r 2@800B-phi L00000004 [528r,752r:1)[768r,800B:0)[800B,848r:2)  0@768r 1@528r 2@800B-phi L0000000B [768r,768d:0)  0@768r 1@x 2@x

When the copy at 688 is coalesced into %19, subregisters other than sub2 are now live prior to 752, and the use of (now) %19 is using all defined subregisters:

688B      %19.sub0:sreg_128 = COPY %19.sub2:sreg_128
704B      %19.sub1:sreg_128 = COPY %19.sub2:sreg_128
736B      %17:sreg_128 = COPY %19:sreg_128
752B      %16:sreg_128 = COPY %19:sreg_128
768B      %19:sreg_128 = COPY %16:sreg_128
784B      %35:sreg_128 = COPY %17:sreg_128

This is, however, not reflected in the live interval for %19. Notice that subranges for sub0 (L0000001) and sub1 (L00000002) end before 752:

%19 [528r,688r:0)[688r,704r:1)[704r,752r:2)[768r,800B:3)[800B,848r:4)  0@528r 1@688r 2@704r 3@768r 4@800B-phi L00000008 [768r,768d:0)  0@768r 1@x 2@x L00000001 [688r,736r:0)[768r,768d:1)  0@688r 1@768r 2@x 3@x L00000002 [704r,736r:0)[768r,768d:1)  0@704r 1@768r 2@x 3@x L00000004 [528r,752r:0)[768r,800B:1)[800B,848r:2)  0@528r 1@768r 2@800B-phi weight:0.000000e+00

This leads to an assertion when the coalescer tries to coalesce %19 and %17 at 736.

tpr added a comment.Jul 12 2018, 4:55 AM

Are you saying there needs to be a fix to the way the coalescing happens between %19 and %34? Rather than relaxing the valuesIdentical criterion as I have?

tpr updated this revision to Diff 156150.Jul 18 2018, 2:25 PM

[RegisterCoalescer] Another fix for subrange join unreachable

Summary:
The test shows a case where a use is not in a subrange because the
subreg is undefined at the use point, but, after a coalesce, the
subrange incorrectly still does not include the use even though the
subreg is now defined at that point. The incorrect live range causes a
"subrange join unreachable" assert on a later coalesce.

This commit ensures that a subrange is extended to all uses that can be
reached from a definition.

V2: Completely different fix, instead of working round the problem in
the later coalesce that actually asserted.

tpr added a comment.Jul 18 2018, 2:27 PM

Krzysztof, my new fix is a bit of a sledgehammer approach. Is there a more refined way of spotting that we might have the error case and we need to do this?

tpr added a comment.Jul 18 2018, 2:53 PM

I have found another case that V2 of the fix does not fix, but V1 does. I'll narrow it down tomorrow.

We may need to guard this code with some extra checks (to verify that we have merged a non-undef value with an undef one), to avoid doing this unnecessarily, but I don't think there is a much simpler way to handle it.

tpr updated this revision to Diff 156265.Jul 19 2018, 7:13 AM

[RegisterCoalescer] Another fix for subrange join unreachable

Summary:
When checking for conflicts, two subreg values can be considered equal
if they are ultimately copies of the same register but one is undefined.

This is a simplified alternative to David's D35073 now that most of the
cases fixed by that are also fixed by Krzysztof's D48102.

The two tests show two different cases:

coalescing-subreg-was-undef-but-became-def.mir shows a case where a
whole reg use point of the whole register was undefined in one subreg,
but after coalescing it became defined but the subrange was not updated
to reflect that. It is arguable that this is correct, in that the subreg
is defined but the use does not care what the value is there, so the use
does not need to be in the subrange.

coalescing-subreg-removed-undef-copy.mir shows a case where a subreg was
defined by a copy from an undef register, and that copy got removed and
the subreg became undef at a later whole reg use.

You probably need D49535 first for the tests to make sense.

V3: Mostly back to the original fix, but only subregs are considered
identical if one is undef, not main regs.

tpr added a comment.Jul 26 2018, 8:28 AM

Ping (for the V3 change which mostly goes back to the V1 change but only allows a defined value to be equivalent to undef in a subrange, not a main range).

tpr added a comment.Aug 2 2018, 11:49 AM

Ping.

I have switched the fix mostly back to what it was doing in V1, and I hope that the two different cases shown by the two tests justify that.

tpr updated this revision to Diff 161253.Aug 17 2018, 8:01 AM

V4: Fixed test in light of "AMDGPU: Improve hack for packing conversion ops"

tpr updated this revision to Diff 162551.Aug 25 2018, 8:54 AM

With this version, I have gone back to the fix recommended by Krzysztof,
and updated the commit message:

[RegisterCoalescer] Extend subranges to uses after join

Summary:
The test shows a case where a full use is not in a subrange because the
subreg is undefined at the use point, but, after a coalesce, the
subrange incorrectly still does not include the use even though the
subreg is now defined at that point. The incorrect live range causes a
"subrange join unreachable" assert on a later coalesce.

This commit ensures that a subrange is extended to all uses that can be
reached from a definition.

V2: Completely different fix, instead of working round the problem in
the later coalesce that actually asserted.
V5: Fixed to not extend liveness to an undef use. Spun second test out
into its own D51257 as it is a completely different problem with the
same "subrange join unreachable" symptom.

tpr added a comment.Aug 25 2018, 9:00 AM

This fix adds code to scan all uses of a subrange and extend it, after every join. I can't think of a way of doing it in a lower impact way.

Which has got me thinking:

Am I right about all of the following?

  1. The decision on whether to join a copy is taken only with reference to each register's main range, not the subrange.
  2. The only reason it does similar stuff with conflict detection and resolution on the subranges is so it can modify them incrementally.
  3. This fix, scanning all uses and extending subranges after a join, pretty much nullifies the advantage of modifying subranges incrementally.

If so, why not remove all the code that incrementally modifies subranges, and instead remove the subranges of any register affected by a join, and recalculate them from scratch at the end of the whole pass? That would simplify the code, and we have found quite a few bugs in the way that subranges are incrementally modified.

In D49097#1213446, @tpr wrote:

This fix adds code to scan all uses of a subrange and extend it, after every join. I can't think of a way of doing it in a lower impact way.

Which has got me thinking:

Am I right about all of the following?

  1. The decision on whether to join a copy is taken only with reference to each register's main range, not the subrange.

Yes.

  1. The only reason it does similar stuff with conflict detection and resolution on the subranges is so it can modify them incrementally.

Yes, it's mostly about updating the subranges.

  1. This fix, scanning all uses and extending subranges after a join, pretty much nullifies the advantage of modifying subranges incrementally.

Two reasons:

  1. Updating the liveranges was considered better for compiletime (I mean you could remove even the non-subrange updating logic with the same reasoning to make the code even simpler...)
  2. Recalculating subregister liveness used to not work reliably after coalescing (There were corner cases with full register uses with only some lanes being defined, without having corresponding IMPLICIT_DEF instructions for the undefined lanes).

If so, why not remove all the code that incrementally modifies subranges, and instead remove the subranges of any register affected by a join, and recalculate them from scratch at the end of the whole pass? That would simplify the code, and we have found quite a few bugs in the way that subranges are incrementally modified.

I agree that the current code is highly complex and I don't like that either. I tried recalculating everything a couple years ago and could not get that working because of the recalculation problems, however it may be worth trying it again as the Undefs arrays passed around in LiveRangeCalc should/may fix all the problems in that area. Would be nice to see some experiments if we get the same liveness information as before when recalculating and how it would affect compile time...

tpr added a comment.Aug 28 2018, 12:21 AM

Surely that would be a problem if recalculating all the subranges, or the whole of live intervals, at the end of pass gives different results, because the idea of LLVM pass management is that semantically it doesn't matter if you recalculate stuff, and changing it incrementally is just a compile time optimization. Even if live intervals normally gets preserved through to register allocation, someone's backend might insert an extra pass that does not preserve live intervals, and that should not break everything.

So surely it ought to be possible to insert code to recalculate the live intervals at the end of RegisterCoalescing and assert if they end up (substantively) different, just to prove we've got the semantics right.

Anyway, notwithstanding all of the above, could someone review and hopefully approve the change as it please?

tpr updated this revision to Diff 163325.Aug 30 2018, 7:31 AM

V6: Ignore debug uses.

tpr updated this revision to Diff 163634.Sep 1 2018, 3:13 PM

V7: Set up undefs correctly, to avoid generating invalid subranges.

tpr added a comment.Sep 1 2018, 3:19 PM

This fix shows up the bug fixed in D51574.

arsenm added a subscriber: arsenm.Nov 1 2018, 7:11 PM

I've hit another instance of this error which this patch doesn't fix

I've reduced this testcase to this, although I think it can go a little further:

---
name: coalescing_makes_lane_defined
tracksRegLiveness: true
body:             |
  bb.0:
    successors: %bb.1, %bb.2

    %0:sreg_32_xm0 = S_MOV_B32 0
    undef %1.sub2:sreg_128 = COPY %0
    undef %2.sub0:sreg_128 = S_MOV_B32 0
    undef %3.sub2:sreg_128 = COPY %0
    S_CBRANCH_SCC0 %bb.2, implicit undef $scc

  bb.1:
    successors: %bb.2

    undef %4.sub0:sreg_128 = S_MOV_B32 -1
    %4.sub2:sreg_128 = COPY killed %0
    %5:sreg_128 = COPY killed %4
    %6:sreg_128 = COPY killed %1
    %3:sreg_128 = COPY killed %6
    %2:sreg_128 = COPY killed %5

  bb.2:
    S_NOP 0, implicit killed %3
    S_ENDPGM implicit %2

...
arsenm commandeered this revision.Nov 14 2018, 10:12 AM
arsenm added a reviewer: tpr.
arsenm updated this revision to Diff 174060.Nov 14 2018, 10:13 AM

Reduce testcase