Page MenuHomePhabricator

[CodeGen][TailDuplicator] Don't duplicate blocks with INLINEASM_BR
ClosedPublic

Authored by void on Oct 5 2020, 4:24 AM.

Details

Summary

Tail duplication of a block with an INLINEASM_BR may result in a PHI
node on the indirect branch. This is okay, but it also introduces a copy
for that PHI node *after* the INLINEASM_BR, which is not okay.

See: https://github.com/ClangBuiltLinux/linux/issues/1125

Diff Detail

Event Timeline

void created this revision.Oct 5 2020, 4:24 AM
Herald added a project: Restricted Project. ยท View Herald TranscriptOct 5 2020, 4:24 AM
void requested review of this revision.Oct 5 2020, 4:24 AM

Hm, it doesn't seem like it should be a problem to duplicate an inlineasm_br instruction. I'm interested to see what's going wrong here.

void added a comment.Oct 5 2020, 10:01 AM

Hm, it doesn't seem like it should be a problem to duplicate an inlineasm_br instruction. I'm interested to see what's going wrong here.

After sleeping on it, I think it's when a PHI node's created in the indirect destination. We can't handle that, because we assume that any variable used in an indirect path dominates all blocks coming into it. I'll update this bug to reflect that.

void updated this revision to Diff 296216.Oct 5 2020, 10:08 AM

Improve comment on why we're limiting tail duplication on INLINEASM_BR instruction.

void updated this revision to Diff 296237.Oct 5 2020, 11:10 AM

Add testcase.

void edited the summary of this revision. (Show Details)Oct 5 2020, 12:40 PM

I confirmed that this patch does fix the kernel panic observed in https://github.com/ClangBuiltLinux/linux/issues/1125#issuecomment-703890483.

I don't understand the bug though. Looking at the output of:

llc -mtriple=i686-- -stop-after=early-tailduplication -print-after-all llvm/test/CodeGen/X86/tail-dup-asm-goto.ll -o - 2>&1 | less

Before/After IR Dump After Early Tail Duplication before rebuilding clang with this patch applied (so still in a bad state), but nothing looks wrong to me. So what precisely is the bug?

Before early tail dup:

bb.0.bb:
  successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)

  %2:gr32 = IMPLICIT_DEF
  %0:gr32 = MOV32rm killed %2:gr32, 1, $noreg, 0, $noreg :: (load 4 from `i8** undef`, align 8)
  %3:gr32_abcd = MOV32r0 implicit-def dead $eflags
  %4:gr8 = COPY %3.sub_8bit:gr32_abcd
  TEST8rr %4:gr8, %4:gr8, implicit-def $eflags
  JCC_1 %bb.2, 5, implicit $eflags
  JMP_1 %bb.1

bb.1.bb100:
; predecessors: %bb.0
  successors: %bb.3(0x80000000); %bb.3(100.00%)

  %6:gr32 = IMPLICIT_DEF
  MOV32mi killed %6:gr32, 1, $noreg, 0, $noreg, 0 :: (store 4 into `i8** undef`, align 8)
  JMP_1 %bb.3

bb.2.bb106:
; predecessors: %bb.0
  successors: %bb.3(0x80000000); %bb.3(100.00%)

  ADJCALLSTACKDOWN32 0, 0, 0, implicit-def dead $esp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $esp, implicit $ssp
  CALLpcrel32 @foo, <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $si $sih $sil>, implicit $esp, implicit $ssp, implicit-def $esp, implicit-def 
$ssp
  ADJCALLSTACKUP32 0, 0, implicit-def dead $esp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $esp, implicit $ssp
  %5:gr32 = MOV32r0 implicit-def dead $eflags

bb.3.bb110:
; predecessors: %bb.2, %bb.1
  successors: %bb.5(0x80000000), %bb.4(0x00000000); %bb.5(100.00%), %bb.4(0.00%)

  %1:gr32 = PHI %5:gr32, %bb.2, %0:gr32, %bb.1
  INLINEASM_BR &"" [sideeffect] [mayload] [attdialect], $0:[imm], 42, $1:[imm], 0, $2:[imm], blockaddress(@test1, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[
clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags
  JMP_1 %bb.5

bb.4.bb17.i.i.i (address-taken):
; predecessors: %bb.3
  successors: %bb.5(0x80000000); %bb.5(100.00%)


bb.5.kmem_cache_has_cpu_partial.exit:
; predecessors: %bb.3, %bb.4

  $eax = COPY %1:gr32
  RET 0, $eax

After early tail dup:

bb.0.bb:
  successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)

  %2:gr32 = IMPLICIT_DEF
  %0:gr32 = MOV32rm killed %2:gr32, 1, $noreg, 0, $noreg :: (load 4 from `i8** undef`, align 8)
  %3:gr32_abcd = MOV32r0 implicit-def dead $eflags
  %4:gr8 = COPY %3.sub_8bit:gr32_abcd
  TEST8rr %4:gr8, %4:gr8, implicit-def $eflags
  JCC_1 %bb.2, 5, implicit $eflags
  JMP_1 %bb.1

bb.1.bb100:
; predecessors: %bb.0
  successors: %bb.5(0x80000000), %bb.4(0x00000000); %bb.5(100.00%), %bb.4(0.00%)

  %6:gr32 = IMPLICIT_DEF
  MOV32mi killed %6:gr32, 1, $noreg, 0, $noreg, 0 :: (store 4 into `i8** undef`, align 8)
  INLINEASM_BR &"" [sideeffect] [mayload] [attdialect], $0:[imm], 42, $1:[imm], 0, $2:[imm], blockaddress(@test1, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags
  JMP_1 %bb.5

bb.2.bb106:
; predecessors: %bb.0
  successors: %bb.5(0x80000000), %bb.4(0x00000000); %bb.5(100.00%), %bb.4(0.00%)

  ADJCALLSTACKDOWN32 0, 0, 0, implicit-def dead $esp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $esp, implicit $ssp
  CALLpcrel32 @foo, <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $si $sih $sil>, implicit $esp, implicit $ssp, implicit-def $esp, implicit-def $ssp
  ADJCALLSTACKUP32 0, 0, implicit-def dead $esp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $esp, implicit $ssp
  %5:gr32 = MOV32r0 implicit-def dead $eflags
  INLINEASM_BR &"" [sideeffect] [mayload] [attdialect], $0:[imm], 42, $1:[imm], 0, $2:[imm], blockaddress(@test1, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags
  JMP_1 %bb.5

bb.4.bb17.i.i.i (address-taken):
; predecessors: %bb.1, %bb.2
  successors: %bb.5(0x80000000); %bb.5(100.00%)

  %9:gr32 = PHI %0:gr32, %bb.1, %5:gr32, %bb.2

bb.5.kmem_cache_has_cpu_partial.exit:
; predecessors: %bb.4, %bb.1, %bb.2

  %10:gr32 = PHI %9:gr32, %bb.4, %0:gr32, %bb.1, %5:gr32, %bb.2
  $eax = COPY %10:gr32
  RET 0, $eax

The comment

We assume that all values used on the indirect path
dominates all paths into the indirect block, i.e. don't have PHI nodes.

Doesn't make sense to me. Which PHI node in the above post-early-tail-dup is problematic?

I wonder if there's perhaps a later pass that's messing up, and this change is simply hiding that? Or if duplicating the INLINEASM_BR results in multiple side effects? (Even if the code is never run, the kernel uses .pushsection to squirrel away data in these, so tail duplication is duplicating the data. Is that a problem? Not sure yet.)

void added a comment.Oct 5 2020, 2:57 PM

I confirmed that this patch does fix the kernel panic observed in https://github.com/ClangBuiltLinux/linux/issues/1125#issuecomment-703890483.

I don't understand the bug though. Looking at the output of:

llc -mtriple=i686-- -stop-after=early-tailduplication -print-after-all llvm/test/CodeGen/X86/tail-dup-asm-goto.ll -o - 2>&1 | less

Before/After IR Dump After Early Tail Duplication before rebuilding clang with this patch applied (so still in a bad state), but nothing looks wrong to me. So what precisely is the bug?

Before early tail dup:

bb.0.bb:
  successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)

  %2:gr32 = IMPLICIT_DEF
  %0:gr32 = MOV32rm killed %2:gr32, 1, $noreg, 0, $noreg :: (load 4 from `i8** undef`, align 8)
  %3:gr32_abcd = MOV32r0 implicit-def dead $eflags
  %4:gr8 = COPY %3.sub_8bit:gr32_abcd
  TEST8rr %4:gr8, %4:gr8, implicit-def $eflags
  JCC_1 %bb.2, 5, implicit $eflags
  JMP_1 %bb.1

bb.1.bb100:
; predecessors: %bb.0
  successors: %bb.3(0x80000000); %bb.3(100.00%)

  %6:gr32 = IMPLICIT_DEF
  MOV32mi killed %6:gr32, 1, $noreg, 0, $noreg, 0 :: (store 4 into `i8** undef`, align 8)
  JMP_1 %bb.3

bb.2.bb106:
; predecessors: %bb.0
  successors: %bb.3(0x80000000); %bb.3(100.00%)

  ADJCALLSTACKDOWN32 0, 0, 0, implicit-def dead $esp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $esp, implicit $ssp
  CALLpcrel32 @foo, <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $si $sih $sil>, implicit $esp, implicit $ssp, implicit-def $esp, implicit-def 
$ssp
  ADJCALLSTACKUP32 0, 0, implicit-def dead $esp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $esp, implicit $ssp
  %5:gr32 = MOV32r0 implicit-def dead $eflags

bb.3.bb110:
; predecessors: %bb.2, %bb.1
  successors: %bb.5(0x80000000), %bb.4(0x00000000); %bb.5(100.00%), %bb.4(0.00%)

  %1:gr32 = PHI %5:gr32, %bb.2, %0:gr32, %bb.1

Note that we have a PHI node here that passes through the indirect branch (starting at bb.4.bb17.i.i.i) and used in bb.5.kmem_cache_has_cpu_partial.exit.

INLINEASM_BR &"" [sideeffect] [mayload] [attdialect], $0:[imm], 42, $1:[imm], 0, $2:[imm], blockaddress(@test1, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[

clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags

JMP_1 %bb.5

The duplication of bb.3.bb110 has the effect that the PHI is now pushed down into the indirect destination block. See below.

bb.4.bb17.i.i.i (address-taken):
; predecessors: %bb.3

successors: %bb.5(0x80000000); %bb.5(100.00%)

bb.5.kmem_cache_has_cpu_partial.exit:
; predecessors: %bb.3, %bb.4

$eax = COPY %1:gr32
RET 0, $eax
After early tail dup:

bb.0.bb:

successors: %bb.1(0x40000000), %bb.2(0x40000000); %bb.1(50.00%), %bb.2(50.00%)

%2:gr32 = IMPLICIT_DEF
%0:gr32 = MOV32rm killed %2:gr32, 1, $noreg, 0, $noreg :: (load 4 from `i8** undef`, align 8)
%3:gr32_abcd = MOV32r0 implicit-def dead $eflags
%4:gr8 = COPY %3.sub_8bit:gr32_abcd
TEST8rr %4:gr8, %4:gr8, implicit-def $eflags
JCC_1 %bb.2, 5, implicit $eflags
JMP_1 %bb.1

bb.1.bb100:
; predecessors: %bb.0

successors: %bb.5(0x80000000), %bb.4(0x00000000); %bb.5(100.00%), %bb.4(0.00%)

%6:gr32 = IMPLICIT_DEF
MOV32mi killed %6:gr32, 1, $noreg, 0, $noreg, 0 :: (store 4 into `i8** undef`, align 8)
INLINEASM_BR &"" [sideeffect] [mayload] [attdialect], $0:[imm], 42, $1:[imm], 0, $2:[imm], blockaddress(@test1, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags
JMP_1 %bb.5

bb.2.bb106:
; predecessors: %bb.0

successors: %bb.5(0x80000000), %bb.4(0x00000000); %bb.5(100.00%), %bb.4(0.00%)

ADJCALLSTACKDOWN32 0, 0, 0, implicit-def dead $esp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $esp, implicit $ssp
CALLpcrel32 @foo, <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $si $sih $sil>, implicit $esp, implicit $ssp, implicit-def $esp, implicit-def $ssp
ADJCALLSTACKUP32 0, 0, implicit-def dead $esp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $esp, implicit $ssp
%5:gr32 = MOV32r0 implicit-def dead $eflags
INLINEASM_BR &"" [sideeffect] [mayload] [attdialect], $0:[imm], 42, $1:[imm], 0, $2:[imm], blockaddress(@test1, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags
JMP_1 %bb.5

The PHI node that was in bb.3.bb110 was moved back into the predecessors: %6:gr32 = IMPLICIT_DEF and %5:gr32 = MOV32r0 implicit-def dead $eflags. (Note that %5.gr32 ... is now with the INLINEASM_BR due to the predecessor and bb.3.bb110 merging.) This doesn't resolve the PHI node though, so it ends up in the indirect destination block (bb.4.bb17.i.i.i). Those values also end up in the PHI node in the default destination.

bb.4.bb17.i.i.i (address-taken):
; predecessors: %bb.1, %bb.2

successors: %bb.5(0x80000000); %bb.5(100.00%)

%9:gr32 = PHI %0:gr32, %bb.1, %5:gr32, %bb.2

The problem is this PHI here. We don't have a way to resolve PHI nodes in the entry block to an indirect destination. (This is the reason we don't allow "asm goto" outputs on the indirect paths.) When we go to resolve this PHI node, there's no place to put the put the values that works in all situations; we can't split the critical edges and it's not valid to place the resolved instructions at the ends of the predecessor blocks.

bb.5.kmem_cache_has_cpu_partial.exit:
; predecessors: %bb.4, %bb.1, %bb.2

%10:gr32 = PHI %9:gr32, %bb.4, %0:gr32, %bb.1, %5:gr32, %bb.2
$eax = COPY %10:gr32
RET 0, $eax
The comment

> We assume that all values used on the indirect path
> dominates all paths into the indirect block, i.e. don't have PHI nodes.

Doesn't make sense to me.  Which PHI node in the above post-early-tail-dup is problematic?

I wonder if there's perhaps a later pass that's messing up, and this change is simply hiding that? Or if duplicating the `INLINEASM_BR` results in multiple side effects? (Even if the code is never run, the kernel uses `.pushsection` to squirrel away data in these, so tail duplication is duplicating the data.  Is that a problem? Not sure yet.)

Later passes are messing up because tail duplication created a situation they're not able to handle.

%9:gr32 = PHI %0:gr32, %bb.1, %5:gr32, %bb.2

The problem is this PHI here. We don't have a way to resolve PHI nodes in the entry block to an indirect destination. (This is the reason we don't allow "asm goto" outputs on the indirect paths.) When we go to resolve this PHI node, there's no place to put the put the values that works in all situations; we can't split the critical edges and it's not valid to place the resolved instructions at the ends of the predecessor blocks.

We're also not using "asm goto" outputs in this test case, so I still don't understand why this PHI is problematic.

We don't have a way to resolve PHI nodes in the entry block to an indirect destination.

We should be able to catch cases where we have a MachineBasicBlock whose address is taken, yet starts with a phi, right? Then an assert or -verify-machineinstrs check should help avoid producing such cases.

diff --git a/llvm/lib/CodeGen/MachineVerifier.cpp b/llvm/lib/CodeGen/MachineVerifier.cpp
index 870b4606e51d..dbd1e71c8619 100644
--- a/llvm/lib/CodeGen/MachineVerifier.cpp
+++ b/llvm/lib/CodeGen/MachineVerifier.cpp
@@ -2408,6 +2408,11 @@ void MachineVerifier::checkPHIOps(const MachineBasicBlock &MBB) {
             !PrInfo.isLiveOut(MO0.getReg()))
           report("PHI operand is not live-out from predecessor", &MO0, I);
       }
+
+      if (MBB.isInlineAsmBrIndirectTarget())
+        for (const MachineInstr &MI: Pre)
+          if (MI.getOpcode() == TargetOpcode::INLINEASM_BR)
+            report("PHI in successor of INLINEASM_BR predecassor", &MO1, I);
     }
 
     // Did we see all predecessors?

fails in the existing test cases:

Failed Tests (2):
  LLVM :: CodeGen/X86/callbr-asm-outputs.ll
  LLVM :: CodeGen/X86/callbr-asm-phi-placement.ll

Maybe there's a more specific case that's problematic?


I feel like we SHOULD be able to allow tail dup in this basic case:

declare void @foo()
declare void @bar()

define i32 @baz(i1 %arg_0) {
  br i1 %arg_0, label %bb1, label %bb2
bb1:
  call void @foo()
  br label %bb3
bb2:
  call void @bar()
  br label %bb3
bb3:
  %x = phi i32 [0, %bb1], [1, %bb2]
  callbr void asm sideeffect "jmp ${2:l}", "i,i,X,~{dirflag},~{fpsr},~{flags}"(i32 42, i1 false, i8* blockaddress(@baz, %bb4))
          to label %bb5 [label %bb4]
bb4:
  br label %bb5
bb5:
  ret i32 %x
}

Prior to this patch and James' patch, we would tail duplicate that. With this change, we wont. (It looks to me like we get the codegen of that case correct).

When we go to resolve this PHI node, there's no place to put the put the values that works in all situations; we can't split the critical edges and it's not valid to place the resolved instructions at the ends of the predecessor blocks.

We can (at least: should be able to) handle a PHI node in the indirect targets of an INLINEASM_BR, for any value which is not the output of the INLINEASM_BR itself. We place the required register copies prior to the INLINEASM_BR, rather than after it. This is handled by findPHICopyInsertPoint -- where we did have a bug already, fixed in f7a53d82c0902147909f28a9295a9d00b4b27d38.

I also don't see anything wrong in this test-case. AFAICT, the phi looks to be handled correctly here. Perhaps it's been minimized too far, and is no longer showing the actual problem?

void added a comment.Oct 5 2020, 7:52 PM

%9:gr32 = PHI %0:gr32, %bb.1, %5:gr32, %bb.2

The problem is this PHI here. We don't have a way to resolve PHI nodes in the entry block to an indirect destination. (This is the reason we don't allow "asm goto" outputs on the indirect paths.) When we go to resolve this PHI node, there's no place to put the put the values that works in all situations; we can't split the critical edges and it's not valid to place the resolved instructions at the ends of the predecessor blocks.

We're also not using "asm goto" outputs in this test case, so I still don't understand why this PHI is problematic.

That doesn't matter. It's problematic because of the PHI in the block with the INLINEASM_BR. As I mentioned, the result of the tail duplication is that that PHI is more-or-less moved *down* into the indirect and default destinations. Look at those PHIs and you'll see that they have the same values coming into them.

We don't have a way to resolve PHI nodes in the entry block to an indirect destination.

We should be able to catch cases where we have a MachineBasicBlock whose address is taken, yet starts with a phi, right? Then an assert or -verify-machineinstrs check should help avoid producing such cases.

That's not what's wrong. A PHI is fine in the default branch. It shouldn't exist in the indirect branch, because *the back end can't currently handle it*. And even if it could, the semantics of the asm goto just don't allow for a PHI to exist in the indirect branch.

void added a comment.Oct 5 2020, 7:58 PM

When we go to resolve this PHI node, there's no place to put the put the values that works in all situations; we can't split the critical edges and it's not valid to place the resolved instructions at the ends of the predecessor blocks.

We can (at least: should be able to) handle a PHI node in the indirect targets of an INLINEASM_BR, for any value which is not the output of the INLINEASM_BR itself. We place the required register copies prior to the INLINEASM_BR, rather than after it. This is handled by findPHICopyInsertPoint -- where we did have a bug already, fixed in f7a53d82c0902147909f28a9295a9d00b4b27d38.

I disagree. You'll be inserting an instruction that won't be executed on the default path before the asm goto block. While this may be okay in a majority of cases, I don't suspect that it'll be good to rely on it being always okay.

I also don't see anything wrong in this test-case. AFAICT, the phi looks to be handled correctly here. Perhaps it's been minimized too far, and is no longer showing the actual problem?

The testcase is very simplified from the original. The testcase replicates the problematic tail duplication, but not the issue that caused the memory corruption. You can see the .ll files in the bug that's associated with it.

I've explained way too many times what this patch is doing. You and Nick seem to think that this patch is bad for some reason or that it's confusing. If so, then please present one of your own.

void added a comment.Oct 5 2020, 8:21 PM

Here are the relevant parts of the original issue in get_partial_node. Blocks bb.16.bb100 and bb.17.bb106 are the sources of the values in the PHI node in bb.18.bb110. The bb.19.bb17.i.i.i block is the indirect destination and bb.20.bb113 the default destination.

Before Tail Duplication:

bb.16.bb100:
; predecessors: %bb.15
  successors: %bb.18(0x80000000); %bb.18(100.00%)

  MOV64mr %21:gr64, 1, $noreg, 16, $noreg, %7:gr64 :: (store 8 into %ir.17)
  JMP_1 %bb.18

bb.17.bb106:
; predecessors: %bb.15
  successors: %bb.18(0x80000000); %bb.18(100.00%)

  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
  %56:gr32 = MOV32r0 implicit-def dead $eflags
  $rdi = COPY %19:gr64
  $rsi = COPY %7:gr64
  $edx = COPY %56:gr32
  CALL64pcrel32 @put_cpu_partial, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $edx, implicit-def $rsp, implicit-def $ssp
  ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp

bb.18.bb110:
; predecessors: %bb.17, %bb.16
  successors: %bb.20(0x80000000), %bb.19(0x00000000); %bb.20(100.00%), %bb.19(0.00%)

  %16:gr64 = PHI %4:gr64, %bb.17, %13:gr64, %bb.16
  INLINEASM_BR &"1:.byte 0x0f,0x1f,0x44,0x00,0\0A\09.pushsection __jump_table,  \22aw\22 \0A\09 .balign 8 \0A\09.long 1b - ., ${2:l} - . \0A\09 .quad ${0:c} + ${1:c} - .\0A\09.popsection \0A\09" [sideeffect] [mayload] [attdialect], $0:[imm], @slub_debug_enabled, $1:[imm], 0, $2:[imm], blockaddress(@get_partial_node, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags, !11
  JMP_1 %bb.20

bb.19.bb17.i.i.i (address-taken):
; predecessors: %bb.18
  successors: %bb.20(0x7c000000), %bb.21(0x04000000); %bb.20(96.88%), %bb.21(3.12%)

  TEST32mi %19:gr64, 1, $noreg, 8, $noreg, 2166016, implicit-def $eflags :: (load 4 from %ir.20, align 8)
  JCC_1 %bb.21, 5, implicit $eflags
  JMP_1 %bb.20

bb.20.bb113:
; predecessors: %bb.18, %bb.19
  successors: %bb.21(0x04000000), %bb.3(0x7c000000); %bb.21(3.12%), %bb.3(96.88%)

  %57:gr32 = MOV32rm %19:gr64, 1, $noreg, 44, $noreg :: (load 4 from %ir.22)
  %58:gr32 = SHR32r1 %57:gr32(tied-def 0), implicit-def dead $eflags
  %59:gr32 = SUB32rr %15:gr32(tied-def 0), killed %58:gr32, implicit-def $eflags
  JCC_1 %bb.3, 6, implicit $eflags
  JMP_1 %bb.21

After Tail Duplication:

bb.16.bb100:
; predecessors: %bb.15
  successors: %bb.20(0x80000000), %bb.19(0x00000000); %bb.20(100.00%), %bb.19(0.00%)

  MOV64mr %21:gr64, 1, $noreg, 16, $noreg, %7:gr64 :: (store 8 into %ir.17)
  INLINEASM_BR &"1:.byte 0x0f,0x1f,0x44,0x00,0\0A\09.pushsection __jump_table,  \22aw\22 \0A\09 .balign 8 \0A\09.long 1b - ., ${2:l} - . \0A\09 .quad ${0:c} + ${1:c} - .\0A\09.popsection \0A\09" [sideeffect] [mayload] [attdialect], $0:[imm], @slub_debug_enabled, $1:[imm], 0, $2:[imm], blockaddress(@get_partial_node, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags, !11
  %60:gr64 = COPY %13:gr64
  JMP_1 %bb.20

bb.17.bb106:
; predecessors: %bb.15
  successors: %bb.20(0x80000000), %bb.19(0x00000000); %bb.20(100.00%), %bb.19(0.00%)

  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
  %56:gr32 = MOV32r0 implicit-def dead $eflags
  $rdi = COPY %19:gr64
  $rsi = COPY %7:gr64
  $edx = COPY %56:gr32
  CALL64pcrel32 @put_cpu_partial, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $edx, implicit-def $rsp, implicit-def $ssp
  ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
  INLINEASM_BR &"1:.byte 0x0f,0x1f,0x44,0x00,0\0A\09.pushsection __jump_table,  \22aw\22 \0A\09 .balign 8 \0A\09.long 1b - ., ${2:l} - . \0A\09 .quad ${0:c} + ${1:c} - .\0A\09.popsection \0A\09" [sideeffect] [mayload] [attdialect], $0:[imm], @slub_debug_enabled, $1:[imm], 0, $2:[imm], blockaddress(@get_partial_node, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags, !11
  %61:gr64 = COPY %4:gr64
  JMP_1 %bb.20

bb.19.bb17.i.i.i (address-taken):
; predecessors: %bb.16, %bb.17
  successors: %bb.20(0x7c000000), %bb.21(0x04000000); %bb.20(96.88%), %bb.21(3.12%)

  %62:gr64 = PHI %60:gr64, %bb.16, %61:gr64, %bb.17
  TEST32mi %19:gr64, 1, $noreg, 8, $noreg, 2166016, implicit-def $eflags :: (load 4 from %ir.20, align 8)
  JCC_1 %bb.21, 5, implicit $eflags
  JMP_1 %bb.20

bb.20.bb113:
; predecessors: %bb.19, %bb.16, %bb.17
  successors: %bb.21(0x04000000), %bb.3(0x7c000000); %bb.21(3.12%), %bb.3(96.88%)

  %63:gr64 = PHI %62:gr64, %bb.19, %60:gr64, %bb.16, %61:gr64, %bb.17
  %57:gr32 = MOV32rm %19:gr64, 1, $noreg, 44, $noreg :: (load 4 from %ir.22)
  %58:gr32 = SHR32r1 %57:gr32(tied-def 0), implicit-def dead $eflags
  %59:gr32 = SUB32rr %15:gr32(tied-def 0), killed %58:gr32, implicit-def $eflags
  JCC_1 %bb.3, 6, implicit $eflags
  JMP_1 %bb.21

You'll notice that the INLINEASM_BR was duplicated into its predecessors, and bb.17.bb106 was merged with bb.18.bb110 (the block containing the original INLINEASM_BR instruction).

Here's the code after PHI elimination:

bb.16.bb94:
; predecessors: %bb.14
  successors: %bb.17(0x30000000), %bb.18(0x50000000); %bb.17(37.50%), %bb.18(62.50%)

  TEST64rr %4:gr64, %4:gr64, implicit-def $eflags
  JCC_1 %bb.18, 5, implicit killed $eflags
  JMP_1 %bb.17

bb.17.bb100:
; predecessors: %bb.16
  successors: %bb.20(0x80000000), %bb.19(0x00000000); %bb.20(100.00%), %bb.19(0.00%)

  MOV64mr %21:gr64, 1, $noreg, 16, $noreg, killed %7:gr64 :: (store 8 into %ir.17)
  INLINEASM_BR &"1:.byte 0x0f,0x1f,0x44,0x00,0\0A\09.pushsection __jump_table,  \22aw\22 \0A\09 .balign 8 \0A\09.long 1b - ., ${2:l} - . \0A\09 .quad ${0:c} + ${1:c} - .\0A\09.popsection \0A\09" [sideeffect] [mayload] [attdialect], $0:[imm], @slub_debug_enabled, $1:[imm], 0, $2:[imm], blockaddress(@get_partial_node, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def dead early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def dead early-clobber $eflags, !11
  %60:gr64 = COPY killed %13:gr64
  %69:gr64 = COPY %60:gr64
  %70:gr64 = COPY killed %60:gr64
  JMP_1 %bb.20

bb.18.bb106:
; predecessors: %bb.16
  successors: %bb.20(0x80000000), %bb.19(0x00000000); %bb.20(100.00%), %bb.19(0.00%)

  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
  $rdi = COPY %19:gr64
  $rsi = COPY killed %7:gr64
  $edx = COPY %26:gr32
  CALL64pcrel32 @put_cpu_partial, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit killed $rdi, implicit killed $rsi, implicit killed $edx, implicit-def $rsp, implicit-def $ssp
  ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
  INLINEASM_BR &"1:.byte 0x0f,0x1f,0x44,0x00,0\0A\09.pushsection __jump_table,  \22aw\22 \0A\09 .balign 8 \0A\09.long 1b - ., ${2:l} - . \0A\09 .quad ${0:c} + ${1:c} - .\0A\09.popsection \0A\09" [sideeffect] [mayload] [attdialect], $0:[imm], @slub_debug_enabled, $1:[imm], 0, $2:[imm], blockaddress(@get_partial_node, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def dead early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def dead early-clobber $eflags, !11
  %61:gr64 = COPY killed %4:gr64
  %69:gr64 = COPY %61:gr64
  %70:gr64 = COPY killed %61:gr64
  JMP_1 %bb.20

bb.19.bb17.i.i.i (address-taken):
; predecessors: %bb.17, %bb.18
  successors: %bb.20(0x7c000000), %bb.21(0x04000000); %bb.20(96.88%), %bb.21(3.12%)

  %62:gr64 = COPY killed %69:gr64
  TEST32mi %19:gr64, 1, $noreg, 8, $noreg, 2166016, implicit-def $eflags :: (load 4 from %ir.20, align 8)
  %70:gr64 = COPY %62:gr64
  %71:gr64 = COPY killed %62:gr64
  JCC_1 %bb.21, 5, implicit killed $eflags
  JMP_1 %bb.20

bb.20.bb113:
; predecessors: %bb.19, %bb.17, %bb.18
  successors: %bb.21(0x04000000), %bb.5(0x7c000000); %bb.21(3.12%), %bb.5(96.88%)

  %63:gr64 = COPY killed %70:gr64
  %42:gr16 = COPY killed %37.sub_16bit:gr32
  %43:gr32 = MOVZX32rr16 killed %42:gr16
  %14:gr32 = nsw SUB32rr killed %41:gr32(tied-def 0), killed %43:gr32, implicit-def dead $eflags
  %15:gr32 = ADD32rr killed %3:gr32(tied-def 0), killed %14:gr32, implicit-def dead $eflags
  %57:gr32 = MOV32rm %19:gr64, 1, $noreg, 44, $noreg :: (load 4 from %ir.22)
  %58:gr32 = SHR32r1 killed %57:gr32(tied-def 0), implicit-def dead $eflags
  CMP32rr %15:gr32, killed %58:gr32, implicit-def $eflags
  %64:gr32 = COPY killed %15:gr32
  %65:gr64 = COPY %63:gr64
  %66:gr64 = COPY killed %8:gr64
  %71:gr64 = COPY killed %63:gr64
  JCC_1 %bb.5, 6, implicit killed $eflags
  JMP_1 %bb.21

As you can see, the PHI elimination pushed the copy instructions into the wrong places. As I wrote to James, we shouldn't be trying to shove those copy instructions before the INLINEASM_BR, because that's contrary to how we normally resolve PHI nodes. I.e., the instruction is executed *only* if that branch is taken. The back-end relies upon this. So if we allow for PHI node resolution from the indirect branch, and the subsequent instructions are placed before the asm goto block, then it's violating that assumption. Not only is the code slower, but it may result in incorrect execution.

I disagree. You'll be inserting an instruction that won't be executed on the default path before the asm goto block. While this may be okay in a majority of cases, I don't suspect that it'll be good to rely on it being always okay.

Being able to handle a PHI in the indirect target block is required. We can, and _must_ be able to do so. E.g., consider two asm goto statements both having the same block as their indirect destination.

I've explained way too many times what this patch is doing. You and Nick seem to think that this patch is bad for some reason or that it's confusing. If so, then please present one of your own.

The problem is that what the patch is doing is not consistent with the description of the problem. If "PHIs on indirect targets are bad", this patch cannot be the right fix, because in that case, we'd need to actually prevent PHIs in indirect targets, which this doesn't do. It avoids only one way to cause them to be created.

However, looking at your last example, I do believe it's the _description_ which is wrong. That is -- the bug is actually in tail duplication, not with PHIs in indirect targets in general. (phew!) In the "After Tail Duplication" dump, you can see tail duplication has introduced a COPY after the INLINEASM_BR, which gets used in the indirect block. This is obviously not OK -- the COPY should need to be before the INLINEASM_BR in order to be used in IR block bb17.i.i.i.

INLINEASM_BR &"1:.byte 0x0f,0x1f,0x44,0x00,0\0A\09.pushsection __jump_table,  \22aw\22 \0A\09 .balign 8 \0A\09.long 1b - ., ${2:l} - . \0A\09 .quad ${0:c} + ${1:c} - .\0A\09.popsection \0A\09" [sideeffect] [mayload] [attdialect], $0:[imm], @slub_debug_enabled, $1:[imm], 0, $2:[imm], blockaddress(@get_partial_node, %ir-block.bb17.i.i.i), $3:[clobber], implicit-def early-clobber $df, $4:[clobber], implicit-def early-clobber $fpsw, $5:[clobber], implicit-def early-clobber $eflags, !11

^^ This instruction can jump to bb.19.bb17.i.i.i

%61:gr64 = COPY %4:gr64

Therefore, this copy ^^ cannot be placed here validly.

  JMP_1 %bb.20

bb.19.bb17.i.i.i (address-taken):
; predecessors: %bb.16, %bb.17
  successors: %bb.20(0x7c000000), %bb.21(0x04000000); %bb.20(96.88%), %bb.21(3.12%)

  %62:gr64 = PHI %60:gr64, %bb.16, %61:gr64, %bb.17

^^^ Here's the use.

Here's a slightly modified version of your minimized test-case which exhibits these same invalid copy insertions. (The difference is using the phi variable somewhere else besides just that one phi.) I also got rid of the undefs, just because they seem icky --

declare void @foo()

define i8* @test1(i8** %arg1, i8* %arg2) {
bb:
  %i28.i = load i8*, i8** %arg1, align 8
  %if = icmp ne i8* %i28.i, %arg2
  br i1 %if, label %bb100, label %bb106

bb100:                                            ; preds = %bb
  store i8* null, i8** %arg1, align 8
  br label %bb110

bb106:                                            ; preds = %bb
  call void @foo()
  br label %bb110

bb110:                                            ; preds = %bb106, %bb100
  %i10.1 = phi i8* [ %arg2, %bb106 ], [ %i28.i, %bb100 ]
  callbr void asm sideeffect "#$0 $1 $2", "i,i,X,~{dirflag},~{fpsr},~{flags}"(i32 42, i1 false, i8* blockaddress(@test1, %bb17.i.i.i))
          to label %kmem_cache_has_cpu_partial.exit [label %bb17.i.i.i]

bb17.i.i.i:                                       ; preds = %bb110
  br label %kmem_cache_has_cpu_partial.exit

kmem_cache_has_cpu_partial.exit:                  ; preds = %bb110
  ret i8* %i10.1
}

In all these sorts of bugs, it's useful to ask why it's not broken with invoke (in case it is!). In this case, it looks like what prevents this bug from occurring for invoke is that EH_LABEL is marked isNotDuplicable = 1, checked a few lines up from your change. So, good.

I think the fix you have here is OK for now, if you fix the test and commentary.

As a follow-on, we could re-allow tail duplication, but actually place the copies in the correct location, by calling llvm::findPHICopyInsertPoint within TailDuplicator::appendCopies.

void abandoned this revision.Oct 5 2020, 11:26 PM
void added a comment.Oct 5 2020, 11:29 PM

Okay. Abandoning this change.

void added a comment.Oct 5 2020, 11:50 PM

One thing, I believe we're trying to turn "asm goto" into something it was never meant to be. If you look at gcc's output for this code:

int foo(int a, int b) {
	int x;

	if (a < 42) {
		x = 0;
		asm goto("# 1\n\t.data %l1" : : "r"(b) : : indirect);
	}
	if (b > 42) {
		x = 37;
		asm goto("# 2\n\t.data %l1" : : "r"(a) : : indirect);
	}
indirect:
	x = a + b - x;
ret:
	return x;
}

GCC doesn't combine the indirect destination:

	.file	"asm-goto-phi.c"
	.text
	.p2align 4
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movl	%esi, %eax
	cmpl	$41, %edi
	jg	.L2
#APP
# 6 "asm-goto-phi.c" 1
	# 1
	.data .L5
# 0 "" 2
#NO_APP
.L2:
	xorl	%edx, %edx
	cmpl	$42, %eax
	jle	.L3
#APP
# 10 "asm-goto-phi.c" 1
	# 2
	.data .L4
# 0 "" 2
#NO_APP
.L4:
	movl	$37, %edx
.L3:
	addl	%edi, %eax
	subl	%edx, %eax
	ret
	.p2align 4,,10
	.p2align 3
.L5:
	xorl	%edx, %edx
	addl	%edi, %eax
	subl	%edx, %eax
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (Debian 9.3.0-14) 9.3.0"
	.section	.note.GNU-stack,"",@progbits

This side steps the issue of resolving PHI nodes.

But whatever. You guys come up with something.

nickdesaulniers added a comment.EditedOct 6 2020, 12:11 AM
  • EDIT ** : Bah, had typed all this out before seeing the past 3 posts...

Thanks for the additional context. I assert that that fuller demonstration does indeed have a bug in tail duplication, one that is not exposed by the current test case, so @jyknight may be correct that the current test case was over reduced. From the After tail duplication linked above:

Take a look at the PHI in bb.19.bb17.i.i.i. The values it gets (via the indirect branch of the INLINEASM_BR) are garbage. Pared down to highlight what I'm referring to:

bb.16:
  ...
  INLINEASM_BR ; conditional jump to %bb.19
  %60:gr64 = COPY %13:gr64
  JMP_1 %bb.20

bb.17:
  ...
  INLINEASM_BR ; conditional jump to %bb.19
  %61:gr64 = COPY %4:gr64
  JMP_1 %bb.20

bb.19:
  %62:gr64 = PHI %60:gr64, %bb.16, %61:gr64, %bb.17
  ...

Either way, if we take the indirect branch (via the assembly within the INLINEASM_BR), whatever the results of the PHI of %62 resolve to are going to be hella wrong, since the COPYs occurred after the INLINEASM_BRs. Instead, the COPYs should have occurred before the INLINEASM_BR.

My dump of the current test case post tail duplication above contains no such COPY related bug. I think if the test case was updated to demonstrate that, then we'd have a starting point to discuss whether we can properly resolve PHIs or not.


That we don't have a -verify-machineinstrs pass that detects that case is also ๐Ÿ’ฉ ๐Ÿ’ฉ ๐Ÿ’ฉ . We should be able to walk backwards up the use-def chain of any PHI and find a COPY or def, I suppose.


we shouldn't be trying to shove those copy instructions before the INLINEASM_BR, ...

Disagree. Without doing so we've generated garbage.

... because that's contrary to how we normally resolve PHI nodes. The back-end relies upon this.

I'll take your word on that. TBH, I don't understand why the COPY's are necessary...why don't we just transform %62 to %62 = PHI %13, %bb.16, %4, %bb.17? Trying to minimize live ranges, perhaps?

So if we allow for PHI node resolution from the indirect branch, and the subsequent instructions are placed before the asm goto block, then it's violating that assumption.

Not "[all of] the subsequent instructions," can we just move the newly created COPY to be before the INLINEASM_BR?

Not only is the code slower, but it may result in incorrect execution.

How can hoisting COPYs to be above the INLINEASM_BR that might branch to a block whose PHI will use the result lead to incorrect execution? Slower, maybe, but then I think my question about %62 would help there.


Wait, @void, was your above MIR dumped with f7a53d82c0902147909f28a9295a9d00b4b27d38 not applied, perhaps? Say if you had git checkout 4b0aa5724fea for instance to test the results of the initial "INLINEASM_BR is no longer a terminator" change? Oh, TailDuplicator does not use findPHICopyInsertPoint, so it may have exactly the same bug!

Yeah! I was looking at TailDuplicator::appendCopies before thinking "I bet MachineBasicBlock::iterator Loc = MBB->getFirstTerminator(); behaves differently after @jyknight 's 4b0aa5724fea." What if that was updated with sauce from findPHICopyInsertPoint? (It would need to know the successor BB as it's only given the predecessor BB currently). I bet that's it. It would be nice to reuse findPHICopyInsertPoint if possible (we may not want to hoist all COPYs in CopyInfos, but I'm not sure).

nickdesaulniers added a comment.EditedOct 6 2020, 12:17 AM

In all these sorts of bugs, it's useful to ask why it's not broken with invoke (in case it is!). In this case, it looks like what prevents this bug from occurring for invoke is that EH_LABEL is marked isNotDuplicable = 1, checked a few lines up from your change. So, good.

Ah, that's another thing I was looking at when I said

Or if duplicating the INLINEASM_BR results in multiple side effects? (Even if the code is never run, the kernel uses .pushsection to squirrel away data in these, so tail duplication is duplicating the data. Is that a problem? Not sure yet.)

... That is -- the bug is actually in tail duplication ...
I think the fix you have here is OK for now, if you fix the test and commentary.
As a follow-on, we could re-allow tail duplication, but actually place the copies in the correct location, by calling llvm::findPHICopyInsertPoint within TailDuplicator::appendCopies.

I agree.

llvm/lib/CodeGen/TailDuplicator.cpp
630โ€“634

I'd update this to say something to the effect of:

TailDuplicator::appendCopies() will erroneously place COPYs post INLINEASM_BR instructions
after 4b0aa5724fea, demonstrating the same bug that was fixed in findPHICopyInsertPoint() in
f7a53d82c090.
FIXME: use findPHICopyInsertPoint() in TailDuplicator::appendCopies() rather than
MachineBasicBlock::getFirstTerminator() to find the correct insertion point for the
COPY when replacing PHIs.

With that comment block updated, the commit message updated, and the altered test case from @jyknight , we'd be all set to land this and close out the clang-11 release. Then we have plenty of time to follow up on the FIXME for clang-12. WDYT?

Okay. Abandoning this change.

Sorry, maybe I buried the lede too far into the message? "I think the fix you have here is OK" was at the end. :)

One thing, I believe we're trying to turn "asm goto" into something it was never meant to be. If you look at gcc's output for this code.

Certainly, as discussed on the email thread with Segher, Clang/LLVM has interpreted a restriction on the semantics for asm-goto which GCC does _not_ have. LLVM treats asm-goto similar to the block-label and indirect goto features -- and thus requires that a critical edge to the indirect target cannot be split by making a new target block. GCC does not have such a restriction. We can discuss whether we should modify the semantics of asm-goto in Clang, but not part of this bugfix.

This side steps the issue of resolving PHI nodes.

Sure. But, in LLVM, PHIs also appear in a landingpad, which has the exact same behavior. "findPHICopyInsertPoint" being able to place a COPY earlier in a block is not something brand-new for asm-goto. The PHI code is intended to be able to handle this placement, and does so correctly, in this example at least.

void reclaimed this revision.Oct 6 2020, 4:05 PM

Okay. Abandoning this change.

Sorry, maybe I buried the lede too far into the message? "I think the fix you have here is OK" was at the end. :)

One thing, I believe we're trying to turn "asm goto" into something it was never meant to be. If you look at gcc's output for this code.

Okay. I'll reclaim this revision.

void edited the summary of this revision. (Show Details)Oct 6 2020, 4:11 PM
void updated this revision to Diff 296568.Oct 6 2020, 5:59 PM

Update comment and testcase.

void marked an inline comment as done.Oct 6 2020, 6:00 PM
void updated this revision to Diff 296569.Oct 6 2020, 6:08 PM

Rerun with the correct llc.

nickdesaulniers accepted this revision.Oct 6 2020, 6:24 PM

LGTM; I did quick ninja check-all, and build+boot tests of the problematic config reported in https://github.com/ClangBuiltLinux/linux/issues/1125, as well as x86_64 and aarch64 build+boot tests of linux-next. Everything LGTM. I'll continue to run more tests overnight just in case, but this is a fairly unobtrusive patch IMO. Thanks for narrowing down on the problematic object file, function, and pass so quickly!

This revision is now accepted and ready to land.Oct 6 2020, 6:24 PM
This revision was landed with ongoing or failed builds.Oct 6 2020, 6:45 PM
This revision was automatically updated to reflect the committed changes.

The rest of my tests completed successfully.