This is an archive of the discontinued LLVM Phabricator instance.

RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG
AcceptedPublic

Authored by arsenm on Jul 26 2023, 9:38 AM.

Details

Summary

Currently coalescing with SUBREG_TO_REG introduces an invisible load
bearing undef. There is liveness for the super register not
represented in the MIR.

This is part 1 of a fix for regressions that appeared after
b7836d856206ec39509d42529f958c920368166b. The allocator started
recognizing undef-def subregister MOVs as copies. Since there was no
representation for the dependency on the high bits, different undef
segments of the super register ended up disconnected and downstream
users ended up observing different undefs than they did previously.

This does not yet fix the regression. The isCopyInstr handling needs
to start handling implicit-defs on any instruction.

I wanted to include an end to end IR test since the actual failure
only appeared with an interaction between the coalescer and the
allocator. It's a bit bigger than I'd like but I'm having a bit of
trouble reducing it to something which definitely shows a diff that's
meaningful.

The same problem likely exists everywhere trying to do anything with
SUBREG_TO_REG. I don't understand how this managed to be broken for so
long.

This needs to be applied to the release branch.

Diff Detail

Event Timeline

arsenm created this revision.Jul 26 2023, 9:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2023, 9:38 AM
arsenm requested review of this revision.Jul 26 2023, 9:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2023, 9:38 AM
Herald added a subscriber: wdng. · View Herald Transcript

The same problem likely exists everywhere trying to do anything with SUBREG_TO_REG. I don't understand how this managed to be broken for so long.

Wow, looking at the lowering of SUBREG_TO_REG, I agree with this assessment!
SUBREG_TO_REG is supposed to be an assert point and yet to we produce code that may break this assertion.

Like you said in a different PR, we likely want to use a tied-operand here. (Though we may have some infrastructure gap here to fix.)

Looking at the patch now.

This does not yet fix the regression. The isCopyInstr handling needs to start handling implicit-defs on any instruction.

What do you mean here?

In my mind, if we have implicit defs on a copy, then we should probably not consider the related instruction as a copy to begin with.

In my mind, if we have implicit defs on a copy, then we should probably not consider the related instruction as a copy to begin with.

Essentially that's the difference between isCopy and isCopyLike.
isCopyLike requires more careful handling.

Maybe that's the root cause of the problem. We're using isCopyInstr for isCopyLike instructions at places where isCopy (i.e., true copies, nothing else) was only supported. It's also possible that over the years we used isCopyLike in places where isCopy is the only thing supported. (I may be guilty of that :)).

Now, it shouldn't be that easy to shoot ourselves in the foot.

This does not yet fix the regression. The isCopyInstr handling needs to start handling implicit-defs on any instruction.

What do you mean here?

I mean this stack of patches is really 2 pieces, with supporting changes. This is part 1, D156346 is part 2 which stops considering MOV with implicit-def as a simple copy

In my mind, if we have implicit defs on a copy, then we should probably not consider the related instruction as a copy to begin with.

That's the net effect of D156346 (except for COPY proper)

Hi @arsenm ,

A maybe more robust temporary fix may be to apply the terminal rule (applyTerminalRule) only on isCopy instructions (right now it is on isCopyLike).

I could be wrong, but I wouldn't be surprised if this is the entry point of the problematic coalesced SUBREG_TO_REGs. And clearly without your fixes, the terminal rule is incorrect on SUBREG_TO_REG.

The reason I'm suggesting that is because I feel that properly fixing the liveness for SUBREG_TO_REG may take a while.
In particular the asserts you added are I believe possible to break.

Cheers,
-Quentin

llvm/lib/CodeGen/RegisterCoalescer.cpp
1889

It should be possible to produce a test case that hits this assertion today, no?
In other words, won't we just miscompile in release mode in these case.

2163

Maybe it can happen if you have something like:

%a = SUBREG_TO_REG ...
%b = IMPLICIT_DEF
%c = INSERT_SUBREG %b, %a, sub

>

%a = SUBREG_TO_REG ...
%c.sub = COPY %a

>

%c.sub = SUBREG_TO_REG ...
arsenm added inline comments.Jul 28 2023, 6:24 AM
llvm/lib/CodeGen/RegisterCoalescer.cpp
1889

In general, you would have to go out of your way to enable subregister liveness. Then I would expect a verifier error, and downstream uses to be about as broken as before.

Only AMDGPU, PowerPC, Hexagon RISCV enable it without a cl::opt, none of which use SUBREG_TO_REG.

PowerPC is potentially problematic, as it does use SUBREG_TO_REG and enables subregister liveness by default (TIL)

Hi @arsenm ,

A maybe more robust temporary fix may be to apply the terminal rule (applyTerminalRule) only on isCopy instructions (right now it is on isCopyLike).

I could be wrong, but I wouldn't be surprised if this is the entry point of the problematic coalesced SUBREG_TO_REGs. And clearly without your fixes, the terminal rule is incorrect on SUBREG_TO_REG.

"UseTerminalRule" is false by default. Swapping the isCopyLike in copyCoalesceInMBB for isCopy induces about 850 test failures

Hi @arsenm ,

A maybe more robust temporary fix may be to apply the terminal rule (applyTerminalRule) only on isCopy instructions (right now it is on isCopyLike).

I could be wrong, but I wouldn't be surprised if this is the entry point of the problematic coalesced SUBREG_TO_REGs. And clearly without your fixes, the terminal rule is incorrect on SUBREG_TO_REG.

"UseTerminalRule" is false by default. Swapping the isCopyLike in copyCoalesceInMBB for isCopy induces about 850 test failures

Ah right, that's the whole copyCoalesceInMBB method that uses isCopyLike not just the terminal rule.
Sounds like you have to push forward with your crusade!

arsenm added a comment.Aug 9 2023, 9:50 AM

ping, I don't really see a better option without ripping out SUBREG_TO_REG right now. We'd have to have some kind of way to go backwards from the defining instruction to an opcode that does define the whole register (which could have been done in tablegen during selection to begin with)

The code is reasonable to me, but I am worried by the asserts not being true in general.
I think we should indeed remove SUBREG_TO_REG in the long run and maybe short term, let's turn the asserts into fatal_errors.
That way if this breaks in the wild we'll get reports of the related errors instead of miscompiles.

What do you think?

llvm/lib/CodeGen/RegisterCoalescer.cpp
1889

PowerPC is potentially problematic, as it does use SUBREG_TO_REG and enables subregister liveness by default (TIL)

Did you get to the bottom of whether or not this is problematic?
(Meaning the assert may break?)

If we're unsure maybe it would be best to report a fatal_error at this point.

2163

Ditto on the can this happen.
Did you give a try to the example I gave?

arsenm added inline comments.Sep 12 2023, 5:24 AM
llvm/lib/CodeGen/RegisterCoalescer.cpp
1889

So if I force on coalescing and verify coalescing, only one PPC test fails. Interestingly, the failure isn't even a PPC failure. It for some reason has a run line using the default triple, such that it compiles for x86.

The PPC usage of SUBREG_TO_REG also looks different, it seems to be using this for vectors and for f64 values (using the hint value 1, not 0).

So my conclusion is that it's broken, but less likely to be observed. I think it's best to leave the subrange complexity for a separate patch, at least you hit the verifier error when it matters.

2163

If I force subregister liveness on ninja check does find the failures.

Also the baseline ninja check with verify-coalescing forced on isn't clean :(

arsenm added inline comments.Sep 12 2023, 7:30 AM
llvm/lib/CodeGen/RegisterCoalescer.cpp
2163

This does hit the assert / verifier error, so I'll push it somewhere

# RUN: llc -mtriple=x86_64-- -run-pass=register-coalescer -enable-subreg-liveness -o - %s
---
name:            test
tracksRegLiveness: true
body:             |
  bb.0:
    liveins: $eax
    %init_eax:gr32 = COPY $eax
    %a:gr64 = SUBREG_TO_REG 0, %init_eax, %subreg.sub_32bit
    %b:gr32 = IMPLICIT_DEF
    %c:gr64 = INSERT_SUBREG %a, %b, %subreg.sub_32bit
    JCC_1 %bb.2, 4, implicit undef $eflags

  bb.1:
    %imm0:gr32 = MOV32r0 implicit-def dead $eflags
    %a = SUBREG_TO_REG 0, %imm0, %subreg.sub_32bit
    %c.sub_32bit = COPY %a

  bb.2:
    %c.sub_32bit = SUBREG_TO_REG %a, %b, %subreg.sub_32bit
    RET 0, implicit %c

...
arsenm added inline comments.Sep 12 2023, 7:53 AM
llvm/lib/CodeGen/RegisterCoalescer.cpp
2163

For some reason this failure only manifests further up the stack of patches

arsenm added inline comments.Sep 12 2023, 7:57 AM
llvm/lib/CodeGen/RegisterCoalescer.cpp
2163

Oh, it's because this required hacking in enabling subreg liveness for x86. The flag apparently doesn't override the lack of implementation of the target hook

arsenm updated this revision to Diff 556581.Sep 12 2023, 8:23 AM

Add test that fails with subreg liveness enabled on x86. Currently the -enable-subreg-liveness flag is useless because it requires the target to enable it in addition to the flag (which already defaults to true)

qcolombet accepted this revision.Sep 18 2023, 5:52 AM
This revision is now accepted and ready to land.Sep 18 2023, 5:52 AM

we're seeing crashes around RegisterCoalescer. I haven't bisected exactly which patch yet but I figured it's one of these patches

$ cat /tmp/a.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i1 @_ZN4llvm8LLParser17parseDIEnumeratorERPNS_6MDNodeEb(i32 %0) {
entry:
  switch i32 %0, label %if.then.i.i [
    i32 1, label %"_ZN4llvm8LLParser17parseMDFieldsImplIZNS0_17parseDIEnumeratorERPNS_6MDNodeEbE3$_0EEbT_RNS_5SMLocE.exit"
    i32 0, label %if.end.i.i.preheader
  ]

if.end.i.i.preheader:                             ; preds = %entry
  br label %if.then.i.i

if.then.i.i:                                      ; preds = %if.end.i.i.preheader, %entry
  %1 = phi i64 [ 0, %entry ], [ 1, %if.end.i.i.preheader ]
  %2 = phi i8 [ 0, %entry ], [ 1, %if.end.i.i.preheader ]
  br label %"_ZN4llvm8LLParser17parseMDFieldsImplIZNS0_17parseDIEnumeratorERPNS_6MDNodeEbE3$_0EEbT_RNS_5SMLocE.exit"

"_ZN4llvm8LLParser17parseMDFieldsImplIZNS0_17parseDIEnumeratorERPNS_6MDNodeEbE3$_0EEbT_RNS_5SMLocE.exit": ; preds = %if.then.i.i, %entry
  %3 = phi i64 [ 0, %entry ], [ %1, %if.then.i.i ]
  %4 = phi i8 [ 0, %entry ], [ %2, %if.then.i.i ]
  %5 = phi i8 [ 0, %entry ], [ 1, %if.then.i.i ]
  %tobool7.not = icmp eq i8 %5, 0
  br i1 %tobool7.not, label %if.then8, label %do.end

if.then8:                                         ; preds = %"_ZN4llvm8LLParser17parseMDFieldsImplIZNS0_17parseDIEnumeratorERPNS_6MDNodeEbE3$_0EEbT_RNS_5SMLocE.exit"
  ret i1 false

do.end:                                           ; preds = %"_ZN4llvm8LLParser17parseMDFieldsImplIZNS0_17parseDIEnumeratorERPNS_6MDNodeEbE3$_0EEbT_RNS_5SMLocE.exit"
  store i64 %3, ptr null, align 8
  store i8 %4, ptr null, align 4
  ret i1 false
}
$ llc /tmp/a.ll
llc: ../../llvm/lib/CodeGen/RegisterCoalescer.cpp:1443: bool (anonymous namespace)::RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &, MachineInstr *, bool &): Assertion `MO.getReg() == NewMI.getO
perand(0).getReg() && MO.getSubReg() == 0' failed.
fhahn added a subscriber: fhahn.Oct 2 2023, 10:38 AM

FYI it looks like this is causing crashes on ARM64.

Reproducer for llc:

target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-macosx"

declare i32 @fprintf(ptr, ptr, ...)

declare i32 @putc(i32, ptr)

define void @widget(i32 %arg, i32 %arg1, ptr %arg2, ptr %arg3, ptr %arg4, i32 %arg5, i1 %arg6) {
bb:
  br label %bb7

bb7:                                              ; preds = %bb14, %bb13, %bb
  %phi = phi i32 [ undef, %bb ], [ %mul, %bb14 ], [ %mul, %bb13 ]
  br label %bb8

bb8:                                              ; preds = %bb10, %bb9, %bb8, %bb7
  switch i32 %arg1, label %bb8 [
    i32 10, label %bb9
    i32 64, label %bb16
    i32 0, label %bb13
    i32 39, label %bb10
    i32 34, label %bb10
    i32 1, label %bb12
  ]

bb9:                                              ; preds = %bb8
  store i32 1, ptr %arg2, align 4
  br label %bb8

bb10:                                             ; preds = %bb8, %bb8
  store i32 0, ptr %arg4, align 4
  %call = tail call i32 @putc(i32 %arg5, ptr %arg3)
  %call11 = tail call i32 @putc(i32 %arg, ptr null)
  br label %bb8

bb12:                                             ; preds = %bb12, %bb8
  br label %bb12

bb13:                                             ; preds = %bb8
  %mul = mul i32 1, 1
  br i1 %arg6, label %bb14, label %bb7

bb14:                                             ; preds = %bb13
  %call15 = tail call i32 (ptr, ptr, ...) @fprintf(ptr null, ptr null, ptr null, i32 0, i32 %mul, ptr null)
  br label %bb7

bb16:                                             ; preds = %bb8
  %call17 = tail call i32 (ptr, ptr, ...) @fprintf(ptr null, ptr null, i32 %phi)
  unreachable
}

also a slightly different assert running llc on the following

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"                                                                                                                       
target triple = "x86_64-unknown-linux-gnu"                                                                                                                                                                         
                                                                                                                                                                                                                   
%struct.BlockContext = type { [32 x i8], [32 x i8], [2 x [32 x i8]], [32 x i8], [32 x i8], [32 x i8], [32 x i8], [32 x i8], [2 x [32 x i8]], [2 x [32 x i8]], [32 x i8], [32 x i8], [32 x i8], [32 x i8], [16 x i8]
, [32 x i8], [32 x i8] }                                                                                                                                                                                           
%struct.CdfModeContext = type { [4 x [16 x i16]], [2 x [13 x [16 x i16]]], [9 x [16 x i16]], [5 x [4 x [16 x i16]]], [6 x [16 x i16]], [2 x [16 x i16]], [16 x i16], [2 x [13 x [8 x i16]]], [3 x [13 x [8 x i16]]]
, [8 x i16], [8 x [8 x i16]], [8 x i16], [8 x [8 x i16]], [3 x [8 x i16]], [2 x [7 x [8 x i16]]], [2 x [7 x [5 x [8 x i16]]]], [2 x [8 x [4 x i16]]], [4 x [3 x [4 x i16]]], [22 x [4 x i16]], [4 x i16], [5 x [4 x
 i16]], [4 x [4 x i16]], [4 x i16], [2 x i16], [2 x i16], [7 x [2 x i16]], [7 x [2 x i16]], [4 x [2 x i16]], [22 x [2 x i16]], [6 x [2 x i16]], [2 x [2 x i16]], [6 x [2 x i16]], [3 x [2 x i16]], [4 x [2 x i16]],
 [5 x [2 x i16]], [5 x [2 x i16]], [6 x [2 x i16]], [6 x [2 x i16]], [9 x [2 x i16]], [6 x [3 x [2 x i16]]], [3 x [3 x [2 x i16]]], [2 x [3 x [2 x i16]]], [3 x [3 x [2 x i16]]], [7 x [3 x [2 x i16]]], [3 x [2 x 
i16]], [3 x [2 x i16]], [3 x [2 x i16]], [22 x [2 x i16]], [7 x [3 x [2 x i16]]], [2 x [2 x i16]], [2 x i16], [8 x i8] }                                                                                           
                                                                                                                                                                                                                   
define i32 @decode_sb(ptr %t, i32 %bl, i32 %_msprop1966, i32 %sub.i, i64 %idxprom, i1 %cmp54) #0 {                                                                                                                 
entry:                                                                                                                                                                                                             
  %0 = load i32, ptr null, align 8                                                                                                                                                                                 
  br i1 %cmp54, label %if.end69, label %if.else                                                                                                                                                                    
                                                                                                                                                                                                                   
if.else:                                          ; preds = %entry                                                                                                                                                 
  %shr18 = and i32 %sub.i, 1                                                                                                                                                                                       
  %idxprom.i = zext i32 %shr18 to i64                                                                                                                                                                              
  %arrayidx.i = getelementptr %struct.BlockContext, ptr null, i64 0, i32 14, i64 %idxprom.i                                                                                                                        
  %1 = load i8, ptr %arrayidx.i, align 1                                                                                                                                                                           
  %conv.i = zext i8 %1 to i32                                                                                                                                                                                      
  %and.i = and i32 %conv.i, 1                                                                                                                                                                                      
  %2 = and i64 87960930222080, 1                                                                                                                                                                                   
  %3 = inttoptr i64 %2 to ptr                                                                                                                                                                                      
  %4 = load i32, ptr %3, align 4                                                                                                                                                                                   
  %5 = and i64 %idxprom, 1                                                                                                                                                                                         
  %6 = or i64 %5, 17592186044416                                                                                                                                                                                   
  %7 = inttoptr i64 %6 to ptr
  %8 = load i32, ptr %7, align 4
  %9 = lshr i32 %bl, %sub.i
  %10 = and i32 %9, 2
  %11 = or i32 %bl, %10
  %12 = select i1 %cmp54, i32 %8, i32 %4
  %add.i = or i32 %_msprop1966, %and.i
  %idxprom4 = zext i32 %bl to i64
  %idxprom24 = zext i32 %add.i to i64
  %13 = or i32 %0, 1
  %14 = zext i32 %13 to i64
  %.not2329 = icmp eq i32 %11, 0
  %15 = select i1 %.not2329, i32 1, i32 %12
  %arrayidx25 = getelementptr %struct.CdfModeContext, ptr null, i64 0, i32 3, i64 %idxprom4, i64 %idxprom24
  store i64 %14, ptr null, align 8
  store i32 %15, ptr %t, align 4
  %call53 = tail call i32 null(ptr null, ptr %arrayidx25, i64 0)
  %16 = xor i64 %idxprom, 1
  %17 = inttoptr i64 %16 to ptr
  %_msld1992 = load i32, ptr %17, align 8
  %18 = icmp ne i32 %_msld1992, 0
  %_msprop_icmp1993 = and i1 %18, false
  br i1 %_msprop_icmp1993, label %19, label %20

19:                                               ; preds = %if.else
  unreachable

20:                                               ; preds = %if.else
  br i1 %cmp54, label %land.lhs.true56, label %if.end69

land.lhs.true56:                                  ; preds = %20
  ret i32 0

if.end69:                                         ; preds = %20, %entry
  %bx8.011941201 = phi i32 [ %shr18, %20 ], [ undef, %entry ]
  store i32 %0, ptr null, align 8
  %call79 = tail call fastcc i32 null(ptr %t, i32 0, i32 0, i32 0, i32 0)
  %idxprom666 = zext i32 %bl to i64
  %21 = xor i64 %idxprom666, 87960930222080
  %idxprom675 = sext i32 %bx8.011941201 to i64
  %arrayidx676 = getelementptr %struct.BlockContext, ptr null, i64 0, i32 14, i64 %idxprom675
  %22 = inttoptr i64 %21 to ptr
  %_msld1414 = load i8, ptr %22, align 1
  store i8 %_msld1414, ptr %arrayidx676, align 1
  ret i32 0
}

attributes #0 = { "frame-pointer"="all" "target-cpu"="x86-64" }

clang: /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/CodeGen/InlineSpiller.cpp:1075: bool isRealSpill(const MachineInstr &): Assertion Def.getNumOperands() == 1 && "Implicit def with more than one definition"' failed.`

arsenm reopened this revision.Oct 5 2023, 9:56 AM
This revision is now accepted and ready to land.Oct 5 2023, 9:56 AM