This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen]Allow targets to use target specific COPY instructions for live range splitting
ClosedPublic

Authored by yassingh on May 11 2023, 11:28 AM.

Details

Summary

Replacing D143754. Right now the LiveRangeSplitting during register allocation uses
TargetOpcode::COPY instruction for splitting. For AMDGPU target that creates a
problem as we have both vector and scalar copies. Vector copies perform a copy over
a vector register but only on the lanes(threads) that are active. This is mostly sufficient
however we do run into cases when we have to copy the entire vector register and
not just active lane data. One major place where we need that is live range splitting.

Allowing targets to use their own copy instructions(if defined) will provide a lot of
flexibility and ease to lower these pseudo instructions to correct MIR.

  • Introduce getTargetCopyOpcode() virtual function and use if to generate copy in Live range splitting.
  • Replace necessary MI.isCopy() checks with TII.isCopyInstr() in register allocator pipeline.

Diff Detail

Event Timeline

yassingh created this revision.May 11 2023, 11:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2023, 11:28 AM
yassingh requested review of this revision.May 11 2023, 11:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2023, 11:28 AM
yassingh added a comment.EditedMay 11 2023, 11:41 AM

This is still a work in progress, I have tried to describe the challenges in the latest comment on the discourse thread (https://discourse.llvm.org/t/rfc-introduce-generic-predicated-copy-opcode/68494/11).

Edit: Seeking to get it reviewed now.

yassingh added a comment.EditedMay 14 2023, 11:12 PM

As described in the discourse thread, the target's implementation of isCopyInstrImpl() does not allow straightforward use of TII::isCopyInstr() in the regalloc pipeline. One MIPS test crashes(LLVM.CodeGen/Mips::dsp-r1.ll) on which I agree with @arsenm can be a bug, and some tests in X86 and Thumb2 needed updating, which is to be expected?

@atanasyan @SjoerdMeijer

yassingh updated this revision to Diff 523306.May 18 2023, 3:01 AM
  • Marked CodeGen/Mips/dsp-r1.ll XFAIL for now

Ping: Hi! I'm still looking to get some initial feedback, especially on MIPS and X86 test changes.

I think the changes on X86 tests are correct.

arsenm added inline comments.May 19 2023, 8:56 AM
llvm/include/llvm/CodeGen/TargetInstrInfo.h
1967

Register should be passed by value. MRI should also be const.

Needs a doc comment.

I'd also rename this to something like getLiveRangeSplitOpcode?

llvm/lib/CodeGen/TargetInstrInfo.cpp
445

I'd prefer to pass in TII separately rather than jumping through all these hoops. Also, this is only used in the assert so will warn in release build

llvm/test/CodeGen/Mips/dsp-r1.ll
3–5 ↗(On Diff #523306)

Can't do this, just fix the mips isCopy implementation?

yassingh updated this revision to Diff 524217.May 22 2023, 3:36 AM
  • Updated Mips::isCopyInstrImpl()
  • Review comments
sdardis added inline comments.May 22 2023, 3:25 PM
llvm/lib/Target/Mips/MipsSEInstrInfo.cpp
229 ↗(On Diff #524217)

I believe the crash associated with the most immediate version of this patch was due to the definition of those instructions which take an immediate mask to specify which fields to read or write as the immediate operand is the place of the expected source operand.

This if branch can be removed with some changes I think.

The MIPS rddsp and wrdsp instructions can read/write either fields or the entire DSPCond register, provided the instruction flag of 'isMoveReg' is set to zero for (RD|WR)DSP instructions. As those instructions take the DSPConds subregisters as implicit operands, I don't think those instructions really match the expectations of the isCopy hook in their current form.

I would suggest removing isMoveReg = 1 from wrdsp and rddsp so they aren't considered simple copy-like instructions for the moment.

yassingh added inline comments.
llvm/lib/Target/Mips/MipsSEInstrInfo.cpp
229 ↗(On Diff #524217)

Thanks for the feedback @sdardis. I have removed "isMoveReg=1" from both instructions as you suggested. D151181

yassingh updated this revision to Diff 526356.May 28 2023, 7:01 PM

Rebase and ping.

@qcolombet can you take a look?
This will replace the initial implementation of generic pred-copy opcode, D143754.

Hi,

The patch looks reasonable to me.
We need to fix the use of getOperand(0) and getOperand(1) everywhere.

I've highlighted a couple of these. I let you take a closer look.

Cheers,
-Quentin

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1050

Instead of checking operand 0 and 1 directly, we should get the pair from isCopyInstr and check these.

llvm/lib/CodeGen/CalcSpillWeights.cpp
228

Similar comment, everywhere we introduce a isCopyInstr, we need to check the returned pair.

llvm/lib/CodeGen/RegAllocGreedy.cpp
2457–2458

Ditto

yassingh updated this revision to Diff 528713.Jun 5 2023, 11:57 PM

Thanks for the feedback @qcolombet! I have updated all uses of getOperand(0|1)

arsenm added inline comments.Jun 8 2023, 5:00 PM
llvm/include/llvm/CodeGen/TargetInstrInfo.h
1970

MRI should be const

llvm/lib/CodeGen/InlineSpiller.cpp
874

Don't need ? true : false

llvm/lib/CodeGen/LiveRangeEdit.cpp
356

swap these

llvm/lib/CodeGen/LiveRangeShrink.cpp
202

Move this to the function prolog

yassingh updated this revision to Diff 529917.Jun 9 2023, 5:00 AM
  • review comments
yassingh marked 3 inline comments as done.Jun 9 2023, 5:03 AM
yassingh added inline comments.
llvm/lib/CodeGen/InlineSpiller.cpp
874

isCopyInstr() returns std::optional. Should I use 'auto' to get rid of the ternary operator?

arsenm added inline comments.Jun 14 2023, 7:27 AM
llvm/lib/CodeGen/InlineSpiller.cpp
874

or .has_value

yassingh updated this revision to Diff 531664.Jun 15 2023, 2:34 AM
  • review comments
yassingh marked 2 inline comments as done.Jun 15 2023, 2:36 AM
cdevadas accepted this revision.Jun 15 2023, 4:46 AM

LGTM.

This revision is now accepted and ready to land.Jun 15 2023, 4:46 AM
arsenm accepted this revision.Jun 15 2023, 4:48 AM
arsenm added inline comments.
llvm/lib/CodeGen/InlineSpiller.cpp
523

swap these checks

yassingh updated this revision to Diff 532848.Jun 20 2023, 3:12 AM
yassingh marked an inline comment as done.

rebase

Can you push this soon? I have a patch that's going to conflict with the isCopyInstr changes

Can you push this soon? I have a patch that's going to conflict with the isCopyInstr changes

I was planning to first commit all the patches in this series downstream first(together). Since they will cause a major merge conflict. Let me try if it's possible to only commit generic patches without breaking spill handling downstream.

cdevadas added inline comments.Jun 26 2023, 9:29 AM
llvm/include/llvm/CodeGen/TargetInstrInfo.h
1970

Can you make this function to take MachineFunction& as the second argument in the first place?
You are changing it in D143762.

yassingh updated this revision to Diff 534897.Jun 27 2023, 3:38 AM
yassingh marked an inline comment as done.

Move function definiton change in getLiveRangeSplitOpcode from D143762 here.

arsenm accepted this revision.Jun 27 2023, 5:32 AM
kparzysz accepted this revision.Jun 27 2023, 7:08 AM
cdevadas added inline comments.Jul 6 2023, 8:47 AM
llvm/lib/CodeGen/RegAllocGreedy.cpp
2460

Change both Dest and Src to Ref as in the original code.
const MachineOperand &Dest = *DestSrc->Destination;
That will avoid the additional changes you made below.

yassingh updated this revision to Diff 537988.Jul 6 2023, 10:24 PM

change ptr to ref

cdevadas accepted this revision.Jul 7 2023, 5:00 AM
yassingh updated this revision to Diff 538180.Jul 7 2023, 9:58 AM

Rebase before merge

This revision was landed with ongoing or failed builds.Jul 7 2023, 10:00 AM
This revision was automatically updated to reflect the committed changes.

Headsup: This change might be causing a miscompile. We're having some breakages in google because of this change that goes away when we add __attribute__((optnone)) to https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/SourceCode.h#L111. I'm working on confirming it.

Headsup: This change might be causing a miscompile. We're having some breakages in google because of this change that goes away when we add __attribute__((optnone)) to https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/SourceCode.h#L111. I'm working on confirming it.

The easiest experiment would be to change the implementation of isCopyInstr to not bother calling isCopyInstrImpl. The most likely source of any issue is the broader recognition of copy-like operations

Headsup: This change might be causing a miscompile. We're having some breakages in google because of this change that goes away when we add __attribute__((optnone)) to https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/SourceCode.h#L111. I'm working on confirming it.

We also see false -fsanitize=bool reports. Definitely miss-compile.
I propose to revert.

Headsup: This change might be causing a miscompile. We're having some breakages in google because of this change that goes away when we add __attribute__((optnone)) to https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/SourceCode.h#L111. I'm working on confirming it.

As Matt suggested, "The most likely source of any issue is the broader recognition of copy-like operations". It's hard to say anything without looking at your internal implementation of isCopyIstrImpl(). For eg MIPS implementation of the same function required some tweaking before this patch was pushed as it wasn't accurately describing COPY like instructions. D151181

Headsup: This change might be causing a miscompile. We're having some breakages in google because of this change that goes away when we add __attribute__((optnone)) to https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/SourceCode.h#L111. I'm working on confirming it.

As Matt suggested, "The most likely source of any issue is the broader recognition of copy-like operations". It's hard to say anything without looking at your internal implementation of isCopyIstrImpl(). For eg MIPS implementation of the same function required some tweaking before this patch was pushed as it wasn't accurately describing COPY like instructions. D151181

It's still regression. Unless someone has a patch ready to land, we need to revert this.

Headsup: This change might be causing a miscompile. We're having some breakages in google because of this change that goes away when we add __attribute__((optnone)) to https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/SourceCode.h#L111. I'm working on confirming it.

As Matt suggested, "The most likely source of any issue is the broader recognition of copy-like operations". It's hard to say anything without looking at your internal implementation of isCopyIstrImpl(). For eg MIPS implementation of the same function required some tweaking before this patch was pushed as it wasn't accurately describing COPY like instructions. D151181

It's still regression. Unless someone has a patch ready to land, we need to revert this.

This is part of a chain of patches that fixes an AMDGPU backend issue. D124196 being the final one in the chain, the revert won't be that easy.
I'm trying to see if we can avoid the revert. Can you attach a reproducible IR code?

It's still regression. Unless someone has a patch ready to land, we need to revert this.

I'd ask that you provide a reproducer, and perform the experiment I mentioned. If we can just disable the expanded isCopyInstr identification, it will be a lot less painful

It's still regression. Unless someone has a patch ready to land, we need to revert this.

I'd ask that you provide a reproducer, and perform the experiment I mentioned. If we can just disable the expanded isCopyInstr identification, it will be a lot less painful

I tried the suggestion; based on my understanding what I did is:

@@ -1038,10 +1038,10 @@
   /// registers as machine operands, for all other instructions the method calls
   /// target-dependent implementation.
   std::optional<DestSourcePair> isCopyInstr(const MachineInstr &MI) const {
-    if (MI.isCopy()) {
-      return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};
-    }
-    return isCopyInstrImpl(MI);
+    // if (MI.isCopy()) {
+    return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};
+    // }
+    // return isCopyInstrImpl(MI);
   }
 
   bool isFullCopyInstr(const MachineInstr &MI) const {

But clang crashes when I use it to build the target test (and many other targets), I can't include the exact stack trace. But maybe you meant something else by your suggestion?

1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module $xyz.
4.	Running pass 'Greedy Register Allocator' on function

Target: x86_64-grtev4-linux-gnu

I'm still working on a repro.

@@ -1038,10 +1038,10 @@

/// registers as machine operands, for all other instructions the method calls
/// target-dependent implementation.
std::optional<DestSourcePair> isCopyInstr(const MachineInstr &MI) const {
  • if (MI.isCopy()) {
  • return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};
  • }
  • return isCopyInstrImpl(MI);

+ if (MI.isCopy()) {
+ return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};
+
}
+ // return isCopyInstrImpl(MI);

}
 
bool isFullCopyInstr(const MachineInstr &MI) const {
But clang crashes when I use it to build the target test (and many other targets), I can't include the exact stack trace. But maybe you meant something else by your suggestion?
  1. <eof> parser at end of file
  2. Code generation
  3. Running pass 'Function Pass Manager' on module $xyz.
  4. Running pass 'Greedy Register Allocator' on function
Target: x86_64-grtev4-linux-gnu

I'm still working on a repro.

Yes, this would just crash. You want:

   std::optional<DestSourcePair> isCopyInstr(const MachineInstr &MI) const {
   if (MI.isCopy()) {
    return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};
  return std::nullopt;
}

It's still regression. Unless someone has a patch ready to land, we need to revert this.

I'd ask that you provide a reproducer, and perform the experiment I mentioned. If we can just disable the expanded isCopyInstr identification, it will be a lot less painful

I am running creduce for ubsan case whole weekend, still quite large. I can try the experiment if you can provide a draft patch.

This comment was removed by asmok-g.

The issue didn't go away with the changed line.

It's still regression. Unless someone has a patch ready to land, we need to revert this.

I'd ask that you provide a reproducer, and perform the experiment I mentioned. If we can just disable the expanded isCopyInstr identification, it will be a lot less painful

I am running creduce for ubsan case whole weekend, still quite large. I can try the experiment if you can provide a draft patch.

It's still regression. Unless someone has a patch ready to land, we need to revert this.

I'd ask that you provide a reproducer, and perform the experiment I mentioned. If we can just disable the expanded isCopyInstr identification, it will be a lot less painful

I am running creduce for ubsan case whole weekend, still quite large. I can try the experiment if you can provide a draft patch.

https://reviews.llvm.org/differential/diff/541268/ is the draft

There's something wrong elsewhere even if this fixes it, but I can't see what else would change anything

It's still regression. Unless someone has a patch ready to land, we need to revert this.

I'd ask that you provide a reproducer, and perform the experiment I mentioned. If we can just disable the expanded isCopyInstr identification, it will be a lot less painful

I am running creduce for ubsan case whole weekend, still quite large. I can try the experiment if you can provide a draft patch.

It's still regression. Unless someone has a patch ready to land, we need to revert this.

I'd ask that you provide a reproducer, and perform the experiment I mentioned. If we can just disable the expanded isCopyInstr identification, it will be a lot less painful

I am running creduce for ubsan case whole weekend, still quite large. I can try the experiment if you can provide a draft patch.

https://reviews.llvm.org/differential/diff/541268/ is the draft

There's something wrong elsewhere even if this fixes it, but I can't see what else would change anything

I just spotted a bug, isCopyOfBundle isn't using the proper result from isCopyInstr. Copy bundles are niche enough that I'd be surprised if they help your case (x86 I presume?)

There's something wrong elsewhere even if this fixes it, but I can't see what else would change anything

I just spotted a bug, isCopyOfBundle isn't using the proper result from isCopyInstr. Copy bundles are niche enough that I'd be surprised if they help your case (x86 I presume?)

Try 825b7f0ca5f2211ec3c93139f98d1e24048c225c

I'm still very interested in getting a reproducer if this fixes it. It's usually hard to synthetically craft one for cases like this. If you send IR + invocation I can quickly reduce cases where this patch introduces a diff

bgraur added a subscriber: bgraur.Jul 18 2023, 1:27 AM

Try 825b7f0ca5f2211ec3c93139f98d1e24048c225c

I'm still very interested in getting a reproducer if this fixes it. It's usually hard to synthetically craft one for cases like this. If you send IR + invocation I can quickly reduce cases where this patch introduces a diff

Still trying with the IR reduction but the patch didn't help in our case.

Try 825b7f0ca5f2211ec3c93139f98d1e24048c225c

I'm still very interested in getting a reproducer if this fixes it. It's usually hard to synthetically craft one for cases like this. If you send IR + invocation I can quickly reduce cases where this patch introduces a diff

Still trying with the IR reduction but the patch didn't help in our case.

Are you getting the same error as before after both the patches (the experiment one and the bundlecopy fix)?

Try 825b7f0ca5f2211ec3c93139f98d1e24048c225c

I'm still very interested in getting a reproducer if this fixes it. It's usually hard to synthetically craft one for cases like this. If you send IR + invocation I can quickly reduce cases where this patch introduces a diff

Still trying with the IR reduction but the patch didn't help in our case.

Are you getting the same error as before after both the patches (the experiment one and the bundlecopy fix)?

Yes.

alexfh added a subscriber: alexfh.Jul 18 2023, 3:45 PM

I'm not sure this llvm-reduce'd IR snippet retains the essence of the problem, but maybe you could look at it and see if there's an obvious issue with the codegen?

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: cold noreturn nounwind
declare void @llvm.ubsantrap(i8 immarg) #0

declare i1 @_f1()

declare { i64, i64 } @_f2(ptr)

declare { i64, i8 } @_f3()

declare void @_f4()

define fastcc void @_f(ptr %0, ptr %1, i64 %2, ptr %3, ptr %4, i1 %5, i1 %6, i24 %7, i1 %8) #1 {
  %10 = call i1 @_f1()
  %11 = icmp eq i24 %7, 0
  br i1 %11, label %13, label %12

12:                                               ; preds = %9
  call void @_f4()
  br label %common.ret

common.ret:                                       ; preds = %22, %18, %12
  ret void

13:                                               ; preds = %20, %9
  %14 = phi i40 [ undef, %9 ], [ %21, %20 ]
  br i1 %6, label %15, label %16

15:                                               ; preds = %13
  call void @llvm.ubsantrap(i8 0)
  unreachable

16:                                               ; preds = %13
  %17 = call { i64, i64 } @_f2(ptr %3)
  br i1 %5, label %20, label %18

18:                                               ; preds = %16
  %19 = and i40 %14, 4294967295
  store ptr null, ptr %0, align 8
  store i40 %19, ptr %1, align 4
  br i1 %8, label %common.ret, label %20

20:                                               ; preds = %18, %16
  %21 = phi i40 [ %14, %16 ], [ %19, %18 ]
  br i1 %8, label %22, label %13

22:                                               ; preds = %20
  store ptr null, ptr %0, align 8
  %23 = call { i64, i8 } @_f3()
  %24 = load ptr, ptr %0, align 8
  %25 = icmp eq ptr %24, null
  br i1 %25, label %26, label %common.ret

26:                                               ; preds = %22
  store volatile i32 0, ptr null, align 4294967296
  unreachable
}

attributes #0 = { cold noreturn nounwind }
attributes #1 = { "frame-pointer"="all" }

The difference in the generated x86-64 assembly (with clang -O1 -S, before this patch and after it) is as follows:

        .text
        .file   "reduced.ll"
        .globl  _f                              # -- Begin function _f
        .p2align        4, 0x90
        .type   _f,@function
 _f:                                     # @_f
        .cfi_startproc
 # %bb.0:
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %rbx
        subq    $24, %rsp
        .cfi_offset %rbx, -56
        .cfi_offset %r12, -48
        .cfi_offset %r13, -40
        .cfi_offset %r14, -32
        .cfi_offset %r15, -24
        movl    %r9d, %r14d
-       movq    %rcx, %r12
-       movq    %rsi, %r15
-       movq    %rdi, -48(%rbp)                 # 8-byte Spill
+       movq    %rcx, %r15
+       movq    %rsi, %r12
+       movq    %rdi, %rbx
        movzbl  32(%rbp), %r13d
-       movzbl  16(%rbp), %ebx
+       movzbl  16(%rbp), %eax
+       movb    %al, -41(%rbp)                  # 1-byte Spill
        callq   _f1@PLT
        testl   $16777215, 24(%rbp)             # imm = 0xFFFFFF
        je      .LBB0_1
-# %bb.5:
+# %bb.9:
        addq    $24, %rsp
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        jmp     _f4@PLT                         # TAILCALL
 .LBB0_1:                                # %.preheader
        .cfi_def_cfa %rbp, 16
-       movq    %r15, -56(%rbp)                 # 8-byte Spill
-       movl    %r14d, %r15d
-       movq    -48(%rbp), %r14                 # 8-byte Reload
-       testb   $1, %bl
-       jne     .LBB0_7
+       movq    %rbx, -56(%rbp)                 # 8-byte Spill
+       testb   $1, -41(%rbp)                   # 1-byte Folded Reload
+       jne     .LBB0_11
 # %bb.2:                                # %.preheader.split.preheader
                                         # implicit-def: $rbx
        jmp     .LBB0_3
        .p2align        4, 0x90
-.LBB0_4:                                #   in Loop: Header=BB0_3 Depth=1
-       movq    %r14, %rax
+.LBB0_6:                                #   in Loop: Header=BB0_3 Depth=1
        testb   $1, %r13b
-       jne     .LBB0_11
+       jne     .LBB0_7
 .LBB0_3:                                # %.preheader.split
                                         # =>This Inner Loop Header: Depth=1
-       movq    %r12, %rdi
+       movq    %r15, %rdi
        callq   _f2@PLT
-       testb   $1, %r15b
-       jne     .LBB0_4
-# %bb.8:                                #   in Loop: Header=BB0_3 Depth=1
-       movq    $0, (%r14)
-       movq    -56(%rbp), %rcx                 # 8-byte Reload
-       movl    %ebx, (%rcx)
-       movb    $0, 4(%rcx)
-       testb   $1, %r13b
+       testb   $1, %r14b
        jne     .LBB0_6
-# %bb.9:                                #   in Loop: Header=BB0_3 Depth=1
-       movq    %r14, %rax
-       movl    %ebx, %ebx
-       testb   $1, %r13b
-       je      .LBB0_3
-.LBB0_11:
+# %bb.4:                                #   in Loop: Header=BB0_3 Depth=1
+       movq    -56(%rbp), %rax                 # 8-byte Reload
        movq    $0, (%rax)
-       movq    %rax, %rbx
+       movl    %ebx, (%r12)
+       movb    $0, 4(%r12)
+       testb   $1, %r13b
+       jne     .LBB0_10
+# %bb.5:                                #   in Loop: Header=BB0_3 Depth=1
+       movl    %ebx, %ebx
+       jmp     .LBB0_6
+.LBB0_7:
+       movq    -56(%rbp), %rbx                 # 8-byte Reload
+       movq    $0, (%rbx)
        callq   _f3@PLT
        cmpq    $0, (%rbx)
-       je      .LBB0_12
-.LBB0_6:                                # %common.ret
+       je      .LBB0_8
+.LBB0_10:                               # %common.ret
        addq    $24, %rsp
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq
-.LBB0_7:
+.LBB0_11:
        .cfi_def_cfa %rbp, 16
        ud1l    (%eax), %eax
-.LBB0_12:
+.LBB0_8:
        movl    $0, 0
 .Lfunc_end0:
        .size   _f, .Lfunc_end0-_f
        .cfi_endproc
                                         # -- End function
        .section        ".note.GNU-stack","",@progbits
        .addrsig

I'm not sure this llvm-reduce'd IR snippet retains the essence of the problem, but maybe you could look at it and see if there's an obvious issue with the codegen?

I'll look more tomorrow but this diff is fully the isCopyInstrImpl handling some identity copies. The diff disappers without the x86 implementation. It triggers for identity copies only

My repro:

; ModuleID = '<bc file>'
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

%"struct.devtools::inliner::CallArg2.2307.4010.6850.7702.9690.1453.4566.8245.24575.24695.24965.25235.25445.26645.26877.26937.27207.27387.27673.27761.27849.27893.27915.27959.27981.28003.28016.28068.28120.28133.28185.28237.28250.50.102.115" = type { %"class.std::__u::optional.2306.4009.6849.7701.9689.1452.4565.8244.24574.24694.24964.25234.25444.26644.26876.26936.27206.27386.27672.27760.27848.27892.27914.27958.27980.28002.28015.28067.28119.28132.28184.28236.28249.49.101.114" }
%"class.std::__u::optional.2306.4009.6849.7701.9689.1452.4565.8244.24574.24694.24964.25234.25444.26644.26876.26936.27206.27386.27672.27760.27848.27892.27914.27958.27980.28002.28015.28067.28119.28132.28184.28236.28249.49.101.114" = type { %"struct.std::__u::__optional_move_assign_base.base.2305.4008.6848.7700.9688.1451.4564.8243.24573.24693.24963.25233.25443.26643.26875.26935.27205.27385.27671.27759.27847.27891.27913.27957.27979.28001.28014.28066.28118.28131.28183.28235.28248.48.100.113", [3 x i8] }
%"struct.std::__u::__optional_move_assign_base.base.2305.4008.6848.7700.9688.1451.4564.8243.24573.24693.24963.25233.25443.26643.26875.26935.27205.27385.27671.27759.27847.27891.27913.27957.27979.28001.28014.28066.28118.28131.28183.28235.28248.48.100.113" = type { %"struct.std::__u::__optional_copy_assign_base.base.2304.4007.6847.7699.9687.1450.4563.8242.24572.24692.24962.25232.25442.26642.26874.26934.27204.27384.27670.27758.27846.27890.27912.27956.27978.28000.28013.28065.28117.28130.28182.28234.28247.47.99.112" }
%"struct.std::__u::__optional_copy_assign_base.base.2304.4007.6847.7699.9687.1450.4563.8242.24572.24692.24962.25232.25442.26642.26874.26934.27204.27384.27670.27758.27846.27890.27912.27956.27978.28000.28013.28065.28117.28130.28182.28234.28247.47.99.112" = type { %"struct.std::__u::__optional_move_base.base.2303.4006.6846.7698.9686.1449.4562.8241.24571.24691.24961.25231.25441.26641.26873.26933.27203.27383.27669.27757.27845.27889.27911.27955.27977.27999.28012.28064.28116.28129.28181.28233.28246.46.98.111" }
%"struct.std::__u::__optional_move_base.base.2303.4006.6846.7698.9686.1449.4562.8241.24571.24691.24961.25231.25441.26641.26873.26933.27203.27383.27669.27757.27845.27889.27911.27955.27977.27999.28012.28064.28116.28129.28181.28233.28246.46.98.111" = type { %"struct.std::__u::__optional_copy_base.base.2302.4005.6845.7697.9685.1448.4561.8240.24570.24690.24960.25230.25440.26640.26872.26932.27202.27382.27668.27756.27844.27888.27910.27954.27976.27998.28011.28063.28115.28128.28180.28232.28245.45.97.110" }
%"struct.std::__u::__optional_copy_base.base.2302.4005.6845.7697.9685.1448.4561.8240.24570.24690.24960.25230.25440.26640.26872.26932.27202.27382.27668.27756.27844.27888.27910.27954.27976.27998.28011.28063.28115.28128.28180.28232.28245.45.97.110" = type { %"struct.std::__u::__optional_storage_base.base.2301.4004.6844.7696.9684.1447.4560.8239.24569.24689.24959.25229.25439.26639.26871.26931.27201.27381.27667.27755.27843.27887.27909.27953.27975.27997.28010.28062.28114.28127.28179.28231.28244.44.96.109" }
%"struct.std::__u::__optional_storage_base.base.2301.4004.6844.7696.9684.1447.4560.8239.24569.24689.24959.25229.25439.26639.26871.26931.27201.27381.27667.27755.27843.27887.27909.27953.27975.27997.28010.28062.28114.28127.28179.28231.28244.44.96.109" = type { %"struct.std::__u::__optional_destruct_base.base.2300.4003.6843.7695.9683.1446.4559.8238.24568.24688.24958.25228.25438.26638.26870.26930.27200.27380.27666.27754.27842.27886.27908.27952.27974.27996.28009.28061.28113.28126.28178.28230.28243.43.95.108" }
%"struct.std::__u::__optional_destruct_base.base.2300.4003.6843.7695.9683.1446.4559.8238.24568.24688.24958.25228.25438.26638.26870.26930.27200.27380.27666.27754.27842.27886.27908.27952.27974.27996.28009.28061.28113.28126.28178.28230.28243.43.95.108" = type { %union.anon.40.2299.4002.6842.7694.9682.1445.4558.8237.24567.24687.24957.25227.25437.26637.26869.26929.27199.27379.27665.27753.27841.27885.27907.27951.27973.27995.28008.28060.28112.28125.28177.28229.28242.42.94.107, i8 }
%union.anon.40.2299.4002.6842.7694.9682.1445.4558.8237.24567.24687.24957.25227.25437.26637.26869.26929.27199.27379.27665.27753.27841.27885.27907.27951.27973.27995.28008.28060.28112.28125.28177.28229.28242.42.94.107 = type { %"class.clang::CharSourceRange.2289.3992.6832.7684.9672.1435.4548.8227.24566.24686.24956.25226.25436.26636.26868.26928.27198.27378.27664.27752.27840.27884.27906.27950.27972.27994.28007.28059.28111.28124.28176.28228.28241.41.93.106" }
%"class.clang::CharSourceRange.2289.3992.6832.7684.9672.1435.4548.8227.24566.24686.24956.25226.25436.26636.26868.26928.27198.27378.27664.27752.27840.27884.27906.27950.27972.27994.28007.28059.28111.28124.28176.28228.28241.41.93.106" = type <{ %"class.clang::SourceRange.2288.3991.6831.7683.9671.1434.4547.8226.24565.24685.24955.25225.25435.26635.26867.26927.27197.27377.27663.27751.27839.27883.27905.27949.27971.27993.28006.28058.28110.28123.28175.28227.28240.40.92.105", i8, [3 x i8] }>
%"class.clang::SourceRange.2288.3991.6831.7683.9671.1434.4547.8226.24565.24685.24955.25225.25435.26635.26867.26927.27197.27377.27663.27751.27839.27883.27905.27949.27971.27993.28006.28058.28110.28123.28175.28227.28240.40.92.105" = type { %"class.clang::SourceLocation.2287.3990.6830.7682.9670.1433.4546.8225.24555.24675.24945.25215.25425.26625.26857.26917.27187.27367.27653.27741.27829.27873.27895.27939.27961.27983.28005.28057.28109.28122.28174.28226.28239.39.91.104", %"class.clang::SourceLocation.2287.3990.6830.7682.9670.1433.4546.8225.24555.24675.24945.25215.25425.26625.26857.26917.27187.27367.27653.27741.27829.27873.27895.27939.27961.27983.28005.28057.28109.28122.28174.28226.28239.39.91.104" }
%"class.clang::SourceLocation.2287.3990.6830.7682.9670.1433.4546.8225.24555.24675.24945.25215.25425.26625.26857.26917.27187.27367.27653.27741.27829.27873.27895.27939.27961.27983.28005.28057.28109.28122.28174.28226.28239.39.91.104" = type { i32 }
%"struct.std::__u::__optional_destruct_base.2555.4258.7098.7950.9938.1701.4814.8493.24576.24696.24966.25236.25446.26646.26885.26945.27215.27395.27674.27762.27850.27894.27916.27960.27982.28004.28017.28069.28121.28134.28186.28238.28251.51.103.116" = type { %union.anon.40.2299.4002.6842.7694.9682.1445.4558.8237.24567.24687.24957.25227.25437.26637.26869.26929.27199.27379.27665.27753.27841.27885.27907.27951.27973.27995.28008.28060.28112.28125.28177.28229.28242.42.94.107, i8, [3 x i8] }

; Function Attrs: noinline
define void @_ZN8devtools7inliner14ParseCallArgs3ERKN5clang8CallExprERKNS1_12FunctionDeclERNS1_10ASTContextE(ptr %0, ptr %1, i40 %2, ptr %3, i32 %4) #0 {
  br label %9

6:                                                ; preds = %21, %18
  %.sroa.0.0 = phi ptr [ %24, %21 ], [ null, %18 ]
  %.sroa.5.0 = phi ptr [ %25, %21 ], [ null, %18 ]
  %7 = add i32 %10, 1
  %8 = icmp eq i32 %10, %4
  br i1 %8, label %27, label %9

9:                                                ; preds = %6, %5
  %.sroa.5.1 = phi ptr [ null, %5 ], [ %.sroa.5.0, %6 ]
  %10 = phi i32 [ 0, %5 ], [ %7, %6 ]
  %11 = phi i40 [ undef, %5 ], [ %19, %6 ]
  %12 = call ptr @_ZN5clang4Expr27IgnoreUnlessSpelledInSourceEv()
  %13 = load i8, ptr %1, align 8
  %14 = icmp ult i8 %13, -5
  %15 = and i40 %11, 4294967295
  br i1 %14, label %18, label %16

16:                                               ; preds = %9
  %17 = load volatile { i64, i64 }, ptr null, align 4294967296
  br label %18

18:                                               ; preds = %16, %9
  %19 = phi i40 [ %15, %9 ], [ %2, %16 ]
  %20 = icmp ugt ptr %.sroa.5.1, %0
  br i1 %20, label %6, label %21

21:                                               ; preds = %18
  %22 = icmp eq ptr %.sroa.5.1, null
  %23 = zext i1 %22 to i64
  %24 = call ptr @_Znwm(i64 0)
  %25 = getelementptr %"struct.devtools::inliner::CallArg2.2307.4010.6850.7702.9690.1453.4566.8245.24575.24695.24965.25235.25445.26645.26877.26937.27207.27387.27673.27761.27849.27893.27915.27959.27981.28003.28016.28068.28120.28133.28185.28237.28250.50.102.115", ptr %3, i64 %23
  %26 = getelementptr i8, ptr %24, i64 8
  store i40 %19, ptr %26, align 4
  br label %6

27:                                               ; preds = %6
  %28 = getelementptr %"struct.std::__u::__optional_destruct_base.2555.4258.7098.7950.9938.1701.4814.8493.24576.24696.24966.25236.25446.26646.26885.26945.27215.27395.27674.27762.27850.27894.27916.27960.27982.28004.28017.28069.28121.28134.28186.28238.28251.51.103.116", ptr %.sroa.0.0, i64 0, i32 1
  %29 = load i8, ptr %28, align 4
  %30 = icmp eq i8 %29, 0
  br i1 %30, label %32, label %31

31:                                               ; preds = %27
  call void @__ubsan_handle_load_invalid_value_abort(ptr %0)
  unreachable

32:                                               ; preds = %27
  ret void

; uselistorder directives
  uselistorder i32 %10, { 1, 0 }
}

define void @_ZN8devtools7inliner14ParseCallArgs2ERKN5clang8CallExprERKNS1_12FunctionDeclERNS1_10ASTContextE(ptr %0, ptr %1) {
  call void @_ZN8devtools7inliner14ParseCallArgs3ERKN5clang8CallExprERKNS1_12FunctionDeclERNS1_10ASTContextE(ptr %1, ptr %0, i40 0, ptr null, i32 0)
  ret void
}

declare ptr @_ZN5clang4Expr27IgnoreUnlessSpelledInSourceEv()

declare void @__ubsan_handle_load_invalid_value_abort(ptr)

declare ptr @_Znwm(i64)

attributes #0 = { noinline "frame-pointer"="all" }

Before this patch __ubsan_handle_load_invalid_value_abort was not reached, now it is.

llc ./llvm-reduce-42f916.ll -O3 -o ./llvm-reduce-42f916.ll.<revision>.s
diff -u --color ./llvm-reduce-42f916.ll.eb98abab2c83.s ./llvm-reduce-42f916.ll.b7836d856206.s :

--- ./llvm-reduce-42f916.ll.eb98abab2c83.s	2023-07-19 10:38:05.571863413 -0700
+++ ./llvm-reduce-42f916.ll.b7836d856206.s	2023-07-19 10:39:47.587632219 -0700
@@ -23,14 +23,14 @@
 	.cfi_offset %r14, -32
 	.cfi_offset %r15, -24
 	movl	%r8d, %r14d
-	movq	%rcx, -64(%rbp)                 # 8-byte Spill
-	movq	%rdx, -56(%rbp)                 # 8-byte Spill
-	movq	%rsi, %r13
+	movq	%rcx, -56(%rbp)                 # 8-byte Spill
+	movq	%rdx, -48(%rbp)                 # 8-byte Spill
+	movq	%rsi, %r12
 	movq	%rdi, %r15
 	incl	%r14d
 	xorl	%ebx, %ebx
-                                        # implicit-def: $r12
-	movq	%rsi, -48(%rbp)                 # 8-byte Spill
+                                        # implicit-def: $rax
+                                        # kill: killed $rax
 	jmp	.LBB0_3
 	.p2align	4, 0x90
 .LBB0_1:                                #   in Loop: Header=BB0_3 Depth=1
@@ -41,41 +41,37 @@
 	xorl	%edi, %edi
 	callq	_Znwm@PLT
 	shlq	$4, %r15
-	addq	-64(%rbp), %r15                 # 8-byte Folded Reload
-	movq	%r12, %rcx
+	addq	-56(%rbp), %r15                 # 8-byte Folded Reload
+	movq	-64(%rbp), %rdx                 # 8-byte Reload
+	movq	%rdx, %rcx
 	shrq	$32, %rcx
 	movb	%cl, 12(%rax)
-	movl	%r12d, 8(%rax)
+	movl	%edx, 8(%rax)
 	movq	%r15, %rbx
 	movq	%r13, %r15
-	movq	-48(%rbp), %r13                 # 8-byte Reload
 	decl	%r14d
-	je	.LBB0_8
+	je	.LBB0_7
 .LBB0_3:                                # =>This Inner Loop Header: Depth=1
 	callq	_ZN5clang4Expr27IgnoreUnlessSpelledInSourceEv@PLT
-	cmpb	$-5, (%r13)
-	jae	.LBB0_5
+	cmpb	$-5, (%r12)
+	jb	.LBB0_5
 # %bb.4:                                #   in Loop: Header=BB0_3 Depth=1
-	movl	%r12d, %r12d
-	cmpq	%r15, %rbx
-	jbe	.LBB0_1
-	jmp	.LBB0_7
-	.p2align	4, 0x90
-.LBB0_5:                                #   in Loop: Header=BB0_3 Depth=1
 	movq	0, %rax
 	movq	8, %rax
-	movq	-56(%rbp), %r12                 # 8-byte Reload
+	movq	-48(%rbp), %rax                 # 8-byte Reload
+	movq	%rax, -64(%rbp)                 # 8-byte Spill
+.LBB0_5:                                #   in Loop: Header=BB0_3 Depth=1
 	cmpq	%r15, %rbx
 	jbe	.LBB0_1
-.LBB0_7:                                #   in Loop: Header=BB0_3 Depth=1
+# %bb.6:                                #   in Loop: Header=BB0_3 Depth=1
 	xorl	%eax, %eax
 	xorl	%ebx, %ebx
 	decl	%r14d
 	jne	.LBB0_3
-.LBB0_8:
+.LBB0_7:
 	cmpb	$0, 12(%rax)
-	jne	.LBB0_10
-# %bb.9:
+	jne	.LBB0_9
+# %bb.8:
 	addq	$24, %rsp
 	popq	%rbx
 	popq	%r12
@@ -85,7 +81,7 @@
 	popq	%rbp
 	.cfi_def_cfa %rsp, 8
 	retq
-.LBB0_10:
+.LBB0_9:
 	.cfi_def_cfa %rbp, 16
 	movq	%r15, %rdi
 	callq	__ubsan_handle_load_invalid_value_abort@PLT

No diff b7836d856206 vs 645f6dcd69a5(HEAD)

My repro:

; ModuleID = '<bc file>'
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

%"struct.devtools::inliner::CallArg2.2307.4010.6850.7702.9690.1453.4566.8245.24575.24695.24965.25235.25445.26645.26877.26937.27207.27387.27673.27761.27849.27893.27915.27959.27981.28003.28016.28068.28120.28133.28185.28237.28250.50.102.115" = type { %"class.std::__u::optional.2306.4009.6849.7701.9689.1452.4565.8244.24574.24694.24964.25234.25444.26644.26876.26936.27206.27386.27672.27760.27848.27892.27914.27958.27980.28002.28015.28067.28119.28132.28184.28236.28249.49.101.114" }
%"class.std::__u::optional.2306.4009.6849.7701.9689.1452.4565.8244.24574.24694.24964.25234.25444.26644.26876.26936.27206.27386.27672.27760.27848.27892.27914.27958.27980.28002.28015.28067.28119.28132.28184.28236.28249.49.101.114" = type { %"struct.std::__u::__optional_move_assign_base.base.2305.4008.6848.7700.9688.1451.4564.8243.24573.24693.24963.25233.25443.26643.26875.26935.27205.27385.27671.27759.27847.27891.27913.27957.27979.28001.28014.28066.28118.28131.28183.28235.28248.48.100.113", [3 x i8] }
%"struct.std::__u::__optional_move_assign_base.base.2305.4008.6848.7700.9688.1451.4564.8243.24573.24693.24963.25233.25443.26643.26875.26935.27205.27385.27671.27759.27847.27891.27913.27957.27979.28001.28014.28066.28118.28131.28183.28235.28248.48.100.113" = type { %"struct.std::__u::__optional_copy_assign_base.base.2304.4007.6847.7699.9687.1450.4563.8242.24572.24692.24962.25232.25442.26642.26874.26934.27204.27384.27670.27758.27846.27890.27912.27956.27978.28000.28013.28065.28117.28130.28182.28234.28247.47.99.112" }
%"struct.std::__u::__optional_copy_assign_base.base.2304.4007.6847.7699.9687.1450.4563.8242.24572.24692.24962.25232.25442.26642.26874.26934.27204.27384.27670.27758.27846.27890.27912.27956.27978.28000.28013.28065.28117.28130.28182.28234.28247.47.99.112" = type { %"struct.std::__u::__optional_move_base.base.2303.4006.6846.7698.9686.1449.4562.8241.24571.24691.24961.25231.25441.26641.26873.26933.27203.27383.27669.27757.27845.27889.27911.27955.27977.27999.28012.28064.28116.28129.28181.28233.28246.46.98.111" }
%"struct.std::__u::__optional_move_base.base.2303.4006.6846.7698.9686.1449.4562.8241.24571.24691.24961.25231.25441.26641.26873.26933.27203.27383.27669.27757.27845.27889.27911.27955.27977.27999.28012.28064.28116.28129.28181.28233.28246.46.98.111" = type { %"struct.std::__u::__optional_copy_base.base.2302.4005.6845.7697.9685.1448.4561.8240.24570.24690.24960.25230.25440.26640.26872.26932.27202.27382.27668.27756.27844.27888.27910.27954.27976.27998.28011.28063.28115.28128.28180.28232.28245.45.97.110" }
%"struct.std::__u::__optional_copy_base.base.2302.4005.6845.7697.9685.1448.4561.8240.24570.24690.24960.25230.25440.26640.26872.26932.27202.27382.27668.27756.27844.27888.27910.27954.27976.27998.28011.28063.28115.28128.28180.28232.28245.45.97.110" = type { %"struct.std::__u::__optional_storage_base.base.2301.4004.6844.7696.9684.1447.4560.8239.24569.24689.24959.25229.25439.26639.26871.26931.27201.27381.27667.27755.27843.27887.27909.27953.27975.27997.28010.28062.28114.28127.28179.28231.28244.44.96.109" }
%"struct.std::__u::__optional_storage_base.base.2301.4004.6844.7696.9684.1447.4560.8239.24569.24689.24959.25229.25439.26639.26871.26931.27201.27381.27667.27755.27843.27887.27909.27953.27975.27997.28010.28062.28114.28127.28179.28231.28244.44.96.109" = type { %"struct.std::__u::__optional_destruct_base.base.2300.4003.6843.7695.9683.1446.4559.8238.24568.24688.24958.25228.25438.26638.26870.26930.27200.27380.27666.27754.27842.27886.27908.27952.27974.27996.28009.28061.28113.28126.28178.28230.28243.43.95.108" }
%"struct.std::__u::__optional_destruct_base.base.2300.4003.6843.7695.9683.1446.4559.8238.24568.24688.24958.25228.25438.26638.26870.26930.27200.27380.27666.27754.27842.27886.27908.27952.27974.27996.28009.28061.28113.28126.28178.28230.28243.43.95.108" = type { %union.anon.40.2299.4002.6842.7694.9682.1445.4558.8237.24567.24687.24957.25227.25437.26637.26869.26929.27199.27379.27665.27753.27841.27885.27907.27951.27973.27995.28008.28060.28112.28125.28177.28229.28242.42.94.107, i8 }
%union.anon.40.2299.4002.6842.7694.9682.1445.4558.8237.24567.24687.24957.25227.25437.26637.26869.26929.27199.27379.27665.27753.27841.27885.27907.27951.27973.27995.28008.28060.28112.28125.28177.28229.28242.42.94.107 = type { %"class.clang::CharSourceRange.2289.3992.6832.7684.9672.1435.4548.8227.24566.24686.24956.25226.25436.26636.26868.26928.27198.27378.27664.27752.27840.27884.27906.27950.27972.27994.28007.28059.28111.28124.28176.28228.28241.41.93.106" }
%"class.clang::CharSourceRange.2289.3992.6832.7684.9672.1435.4548.8227.24566.24686.24956.25226.25436.26636.26868.26928.27198.27378.27664.27752.27840.27884.27906.27950.27972.27994.28007.28059.28111.28124.28176.28228.28241.41.93.106" = type <{ %"class.clang::SourceRange.2288.3991.6831.7683.9671.1434.4547.8226.24565.24685.24955.25225.25435.26635.26867.26927.27197.27377.27663.27751.27839.27883.27905.27949.27971.27993.28006.28058.28110.28123.28175.28227.28240.40.92.105", i8, [3 x i8] }>
%"class.clang::SourceRange.2288.3991.6831.7683.9671.1434.4547.8226.24565.24685.24955.25225.25435.26635.26867.26927.27197.27377.27663.27751.27839.27883.27905.27949.27971.27993.28006.28058.28110.28123.28175.28227.28240.40.92.105" = type { %"class.clang::SourceLocation.2287.3990.6830.7682.9670.1433.4546.8225.24555.24675.24945.25215.25425.26625.26857.26917.27187.27367.27653.27741.27829.27873.27895.27939.27961.27983.28005.28057.28109.28122.28174.28226.28239.39.91.104", %"class.clang::SourceLocation.2287.3990.6830.7682.9670.1433.4546.8225.24555.24675.24945.25215.25425.26625.26857.26917.27187.27367.27653.27741.27829.27873.27895.27939.27961.27983.28005.28057.28109.28122.28174.28226.28239.39.91.104" }
%"class.clang::SourceLocation.2287.3990.6830.7682.9670.1433.4546.8225.24555.24675.24945.25215.25425.26625.26857.26917.27187.27367.27653.27741.27829.27873.27895.27939.27961.27983.28005.28057.28109.28122.28174.28226.28239.39.91.104" = type { i32 }
%"struct.std::__u::__optional_destruct_base.2555.4258.7098.7950.9938.1701.4814.8493.24576.24696.24966.25236.25446.26646.26885.26945.27215.27395.27674.27762.27850.27894.27916.27960.27982.28004.28017.28069.28121.28134.28186.28238.28251.51.103.116" = type { %union.anon.40.2299.4002.6842.7694.9682.1445.4558.8237.24567.24687.24957.25227.25437.26637.26869.26929.27199.27379.27665.27753.27841.27885.27907.27951.27973.27995.28008.28060.28112.28125.28177.28229.28242.42.94.107, i8, [3 x i8] }

; Function Attrs: noinline
define void @_ZN8devtools7inliner14ParseCallArgs3ERKN5clang8CallExprERKNS1_12FunctionDeclERNS1_10ASTContextE(ptr %0, ptr %1, i40 %2, ptr %3, i32 %4) #0 {
  br label %9

6:                                                ; preds = %21, %18
  %.sroa.0.0 = phi ptr [ %24, %21 ], [ null, %18 ]
  %.sroa.5.0 = phi ptr [ %25, %21 ], [ null, %18 ]
  %7 = add i32 %10, 1
  %8 = icmp eq i32 %10, %4
  br i1 %8, label %27, label %9

9:                                                ; preds = %6, %5
  %.sroa.5.1 = phi ptr [ null, %5 ], [ %.sroa.5.0, %6 ]
  %10 = phi i32 [ 0, %5 ], [ %7, %6 ]
  %11 = phi i40 [ undef, %5 ], [ %19, %6 ]
  %12 = call ptr @_ZN5clang4Expr27IgnoreUnlessSpelledInSourceEv()
  %13 = load i8, ptr %1, align 8
  %14 = icmp ult i8 %13, -5
  %15 = and i40 %11, 4294967295
  br i1 %14, label %18, label %16

16:                                               ; preds = %9
  %17 = load volatile { i64, i64 }, ptr null, align 4294967296
  br label %18

18:                                               ; preds = %16, %9
  %19 = phi i40 [ %15, %9 ], [ %2, %16 ]
  %20 = icmp ugt ptr %.sroa.5.1, %0
  br i1 %20, label %6, label %21

21:                                               ; preds = %18
  %22 = icmp eq ptr %.sroa.5.1, null
  %23 = zext i1 %22 to i64
  %24 = call ptr @_Znwm(i64 0)
  %25 = getelementptr %"struct.devtools::inliner::CallArg2.2307.4010.6850.7702.9690.1453.4566.8245.24575.24695.24965.25235.25445.26645.26877.26937.27207.27387.27673.27761.27849.27893.27915.27959.27981.28003.28016.28068.28120.28133.28185.28237.28250.50.102.115", ptr %3, i64 %23
  %26 = getelementptr i8, ptr %24, i64 8
  store i40 %19, ptr %26, align 4
  br label %6

27:                                               ; preds = %6
  %28 = getelementptr %"struct.std::__u::__optional_destruct_base.2555.4258.7098.7950.9938.1701.4814.8493.24576.24696.24966.25236.25446.26646.26885.26945.27215.27395.27674.27762.27850.27894.27916.27960.27982.28004.28017.28069.28121.28134.28186.28238.28251.51.103.116", ptr %.sroa.0.0, i64 0, i32 1
  %29 = load i8, ptr %28, align 4
  %30 = icmp eq i8 %29, 0
  br i1 %30, label %32, label %31

31:                                               ; preds = %27
  call void @__ubsan_handle_load_invalid_value_abort(ptr %0)
  unreachable

32:                                               ; preds = %27
  ret void

; uselistorder directives
  uselistorder i32 %10, { 1, 0 }
}

define void @_ZN8devtools7inliner14ParseCallArgs2ERKN5clang8CallExprERKNS1_12FunctionDeclERNS1_10ASTContextE(ptr %0, ptr %1) {
  call void @_ZN8devtools7inliner14ParseCallArgs3ERKN5clang8CallExprERKNS1_12FunctionDeclERNS1_10ASTContextE(ptr %1, ptr %0, i40 0, ptr null, i32 0)
  ret void
}

declare ptr @_ZN5clang4Expr27IgnoreUnlessSpelledInSourceEv()

declare void @__ubsan_handle_load_invalid_value_abort(ptr)

declare ptr @_Znwm(i64)

attributes #0 = { noinline "frame-pointer"="all" }

Before this patch __ubsan_handle_load_invalid_value_abort was not reached, now it is.

llc ./llvm-reduce-42f916.ll -O3 -o ./llvm-reduce-42f916.ll.<revision>.s
diff -u --color ./llvm-reduce-42f916.ll.eb98abab2c83.s ./llvm-reduce-42f916.ll.b7836d856206.s :

--- ./llvm-reduce-42f916.ll.eb98abab2c83.s	2023-07-19 10:38:05.571863413 -0700
+++ ./llvm-reduce-42f916.ll.b7836d856206.s	2023-07-19 10:39:47.587632219 -0700
@@ -23,14 +23,14 @@
 	.cfi_offset %r14, -32
 	.cfi_offset %r15, -24
 	movl	%r8d, %r14d
-	movq	%rcx, -64(%rbp)                 # 8-byte Spill
-	movq	%rdx, -56(%rbp)                 # 8-byte Spill
-	movq	%rsi, %r13
+	movq	%rcx, -56(%rbp)                 # 8-byte Spill
+	movq	%rdx, -48(%rbp)                 # 8-byte Spill
+	movq	%rsi, %r12
 	movq	%rdi, %r15
 	incl	%r14d
 	xorl	%ebx, %ebx
-                                        # implicit-def: $r12
-	movq	%rsi, -48(%rbp)                 # 8-byte Spill
+                                        # implicit-def: $rax
+                                        # kill: killed $rax
 	jmp	.LBB0_3
 	.p2align	4, 0x90
 .LBB0_1:                                #   in Loop: Header=BB0_3 Depth=1
@@ -41,41 +41,37 @@
 	xorl	%edi, %edi
 	callq	_Znwm@PLT
 	shlq	$4, %r15
-	addq	-64(%rbp), %r15                 # 8-byte Folded Reload
-	movq	%r12, %rcx
+	addq	-56(%rbp), %r15                 # 8-byte Folded Reload
+	movq	-64(%rbp), %rdx                 # 8-byte Reload
+	movq	%rdx, %rcx
 	shrq	$32, %rcx
 	movb	%cl, 12(%rax)
-	movl	%r12d, 8(%rax)
+	movl	%edx, 8(%rax)
 	movq	%r15, %rbx
 	movq	%r13, %r15
-	movq	-48(%rbp), %r13                 # 8-byte Reload
 	decl	%r14d
-	je	.LBB0_8
+	je	.LBB0_7
 .LBB0_3:                                # =>This Inner Loop Header: Depth=1
 	callq	_ZN5clang4Expr27IgnoreUnlessSpelledInSourceEv@PLT
-	cmpb	$-5, (%r13)
-	jae	.LBB0_5
+	cmpb	$-5, (%r12)
+	jb	.LBB0_5
 # %bb.4:                                #   in Loop: Header=BB0_3 Depth=1
-	movl	%r12d, %r12d
-	cmpq	%r15, %rbx
-	jbe	.LBB0_1
-	jmp	.LBB0_7
-	.p2align	4, 0x90
-.LBB0_5:                                #   in Loop: Header=BB0_3 Depth=1
 	movq	0, %rax
 	movq	8, %rax
-	movq	-56(%rbp), %r12                 # 8-byte Reload
+	movq	-48(%rbp), %rax                 # 8-byte Reload
+	movq	%rax, -64(%rbp)                 # 8-byte Spill
+.LBB0_5:                                #   in Loop: Header=BB0_3 Depth=1
 	cmpq	%r15, %rbx
 	jbe	.LBB0_1
-.LBB0_7:                                #   in Loop: Header=BB0_3 Depth=1
+# %bb.6:                                #   in Loop: Header=BB0_3 Depth=1
 	xorl	%eax, %eax
 	xorl	%ebx, %ebx
 	decl	%r14d
 	jne	.LBB0_3
-.LBB0_8:
+.LBB0_7:
 	cmpb	$0, 12(%rax)
-	jne	.LBB0_10
-# %bb.9:
+	jne	.LBB0_9
+# %bb.8:
 	addq	$24, %rsp
 	popq	%rbx
 	popq	%r12
@@ -85,7 +81,7 @@
 	popq	%rbp
 	.cfi_def_cfa %rsp, 8
 	retq
-.LBB0_10:
+.LBB0_9:
 	.cfi_def_cfa %rbp, 16
 	movq	%r15, %rdi
 	callq	__ubsan_handle_load_invalid_value_abort@PLT

No diff b7836d856206 vs 645f6dcd69a5(HEAD)

I see the diff disappear if I make isCopyInstr skip isCopyInstrImpl. I'm not seeing anything clearly wrong with the codegen decisions here. The apparent block body removal in %bb.4, and the implicit_def; kill in the entry look suspicious, but I'm not seeing how it's wrong.

The isCopyInstrImpl changes 1 spill weight based on an identity mov. That just triggers a bunch of other different spilling decisions and you end up with different code.

If I step back and look at the original IR, it's branching on undef. From the entry in %bb, on the first branch to %bb.7, it's branching on undef here:

bb:
   br label %bb7

bb7:
 ...
  %i9 = phi i40 [ undef, %bb ], [ %i17, %bb5 ]
  ...
  %i13 = and i40 %i9, 4294967295
  br i1 %i12, label %bb16, label %bb14

Maybe this is just an artifact of reduction? If you try opt-bisect-limit, does it point at some other pass?

I see the diff disappear if I make isCopyInstr skip isCopyInstrImpl. I'm not seeing anything clearly wrong with the codegen decisions here. The apparent block body removal in %bb.4, and the implicit_def; kill in the entry look suspicious, but I'm not seeing how it's wrong.

I've tried replacing isCopyInstrImpl() call in isCopyInstr() with a return std::nullopt; and the issue disappeared. I don't know how my experiment is different to what @asmok-g did, but this definitely helps. I tried this in two configurations:

The patch I used:

--- llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -1039,11 +1039,11 @@
   /// target-dependent implementation.
   std::optional<DestSourcePair> isCopyInstr(const MachineInstr &MI) const {
     if (MI.isCopy()) {
       return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};
     }
-    return isCopyInstrImpl(MI);
+    return std::nullopt;
   }

   bool isFullCopyInstr(const MachineInstr &MI) const {
     auto DestSrc = isCopyInstr(MI);
     if (!DestSrc)

Is this change something that can be committed to the tree? Do you need a better test case from us?

If I step back and look at the original IR, it's branching on undef. From the entry in %bb, on the first branch to %bb.7, it's branching on undef here:

bb:
   br label %bb7

bb7:
 ...
  %i9 = phi i40 [ undef, %bb ], [ %i17, %bb5 ]
  ...
  %i13 = and i40 %i9, 4294967295
  br i1 %i12, label %bb16, label %bb14

Maybe this is just an artifact of reduction? If you try opt-bisect-limit, does it point at some other pass?

I'm almost sure it's an artifact of reduction, but it's also not trivial to create an interestingness test that completely avoids this sort of a degradation of the test case to something depending on UB.

Not sure if this is useful when we have IR, but is my creduce

template <class a> void b(a &c, a p2) { c = p2; }
void *operator new(unsigned long, void *);
inline namespace e {
template <class... f> void *g(f... c) { return __builtin_operator_new(c...); }
void *h(unsigned c) { return g(c); }
int l;
template <class> struct aa;
template <class a, class = aa<a>> struct j;
template <class a> struct k {
  union {
    char ab;
    a b;
  };
  bool n;
  k() : ab(), n() {}
};
template <class a> struct ac : k<a> {
  bool m() { return this->n; }
};
template <class r> struct o {
  using ad = r::ae;
};
template <class af> struct ag {
  using ad = af::ah;
};
template <class af> struct s {
  using p = af;
  using ae = o<p>::ad;
  using ah = ag<p>::ad;
  template <class a, class... f> static void ai(p c, a p2, f &&...p3) {
    c.ai(p2, p3...);
  }
};
template <class q> struct aj {
  q ak;
};
int al;
template <class af> aj<typename af ::ae> am(af c) { return {c.allocate(al)}; }
template <class a> struct aa {
  a *allocate(int) { return static_cast<a *>(h(sizeof(a))); }
  typedef a *ae;
  typedef a *ah;
  template <class an, class... f> void ai(an *c, f &&...p2) {
    new (c) an(p2...);
  }
};
template <class a> struct ao {
  using ap = a &;
  template <class an> ao(an) : aq() {}
  ap ar() { return aq; }
  a aq;
};
template <class as> struct at : ao<as> {
  using au = ao<as>;
  template <class av, class aw> at(av c, aw) : au(c) {}
  au::ap ax() { return (*this).ar(); }
};
template <class ay> struct az {
  using p = ay;
  using ba = p;
  using bb = ba;
  using ae = bb::ae;
  ae bc;
  ae bd;
  ae be;
  at<ae> bf;
  az(unsigned, unsigned, ba &);
  ae &bg() { return bf.ax(); }
};
template <class ay> az<ay>::az(unsigned c, unsigned, ba &p3) : bf(nullptr, p3) {
  ba bh;
  auto bi = am(bh);
  bc = bi.ak;
  bd = be = bc;
  bg() = bc + c;
}
template <class a, class ay> struct j {
  typedef a bj;
  typedef ay p;
  typedef s<p> bb;
  typedef bb::ae ae;
  typedef bb::ah bk;
  bk begin() const noexcept;
  bk end() const noexcept;
  void bl(bj &&);
  ae bd;
  ae be;
  at<ae> bf = at<ae>(nullptr, int());
  template <class an> void bm(an &&);
  ae &bg() { return bf.ax(); }
};
template <class a, class ay> j<a, ay>::bk j<a, ay>::begin() const noexcept {
  return bd;
}
template <class a, class ay> j<a, ay>::bk j<a, ay>::end() const noexcept {
  return be;
}
template <class a, class ay> template <class an> void j<a, ay>::bm(an &&c) {
  p x;
  az bn(1, 0, x);
  a bo(c);
  bb::ai(x, bn.be, bo);
  bn.be++;
  b(bd, bn.bd);
  b(be, bn.be);
  ae bq = bn.bg();
  b(bg(), bq);
}
template <class a, class ay> void j<a, ay>::bl(bj &&c) {
  if (bg())
    ;
  else
    bm(c);
}
} // namespace e
template <typename bp> struct br {
  template <typename... bs> br(int, bs... p2) : bt(p2...) {}
  bp bt;
};
template <typename bp> struct bv : br<bp> {
  template <typename... bs> bv(int, bs &&...);
  template <typename bu> bv(bu c) : bv(l, c) {}
  bp &operator*() &;
};
template <typename bp>
template <typename... bs>
bv<bp>::bv(int, bs &&...p2) : br<bp>(l, p2...) {}
template <typename bp> bp &bv<bp>::operator*() & { return this->bt; }
namespace ca {
template <typename bw> struct bx {
  using by = bw;
};
template <typename bz, typename bw> struct cc {
  static bool cb(bw c) { return bz::cd(&c); }
};
template <typename, typename> struct ce;
template <typename bz, typename bw> struct ce<bz, bw *> {
  static bool cb(bw *c) { return cc<bz, bw>::cb(*c); }
};
template <typename, typename, typename> struct cf;
template <typename bz, typename ci> struct cf<bz, ci, ci> {
  static bool cb(ci c) { return ce<bz, ci>::cb(c); }
};
struct cg {
  using cm = int *;
};
struct cj {
  using cm = cg::cm;
};
struct ck {
  using cm = cj::cm;
};
template <class> struct cl;
template <class ci> struct cl<ci *> {
  static ck::cm cb(ci *c) { return (ck::cm)c; }
};
template <typename bz, typename bw> struct cn {
  static bool cr(bw c) { return cf<bz, bw, typename bx<bw>::by>::cb(c); }
};
template <typename bz, typename bw> struct co : cn<bz, bw> {
  using cq = co;
  using cp = ck::cm;
  static cp cv(bw c) {
    cp cs;
    if (!cq::cr(c))
      return 0;
    cs = cl<typename bx<bw>::by>::cb(c);
    return cs;
  }
};
template <typename bz, typename bw> auto ct(bw c) { return co<bz, bw>::cv(c); }
} // namespace ca
using ca::ct;
namespace ca {
template <typename cy> struct cu {
  cy a;
};
template <typename, typename, int...> struct db;
template <typename cw, typename cx, int da> struct db<cw, cx, da> {
  cx ch;
};
struct cz : db<cz, cu<void *>, 0> {};
} // namespace ca
namespace clang {
struct SourceManager;
struct dc {
  using dd = int;
  dd de;
};
struct dg {
  dc df;
  dc d;
};
struct CharSourceRange {
  dg dh;
  bool di;
  CharSourceRange(bool) {}
};
struct LangOptions;
struct u {
  ca::cu<ca::cz> a;
};
struct dj {
  enum dk { v };
  class {
    friend dj;
    unsigned dl;
  } w;
  dk z() { return dk(w.dl); }
};
struct dm : dj {
  u dn;
};
struct CallExpr : dm {
  unsigned y;
  unsigned getNumArgs() { return y; }
  dm *getArg(unsigned);
  static bool cd(dj *c) { return c->z() <= v; }
};
struct ASTContext {
  SourceManager &SourceMgr;
  LangOptions &LangOpts;
  SourceManager &getSourceManager() { return SourceMgr; }
  LangOptions &getLangOpts() { return LangOpts; }
};
namespace tooling {
ac<CharSourceRange> getFileRangeForEdit(const CharSourceRange &,
                                        const SourceManager &,
                                        const LangOptions &, bool);
ac<CharSourceRange> getFileRangeForEdit(CharSourceRange c, ASTContext p2,
                                        bool p3) {
  SourceManager &__trans_tmp_4 = p2.getSourceManager();
  LangOptions &__trans_tmp_5 = p2.getLangOpts();
  return getFileRangeForEdit(c, __trans_tmp_4, __trans_tmp_5, p3);
}
} // namespace tooling
struct CallArg2 {
  ac<CharSourceRange> std_move_call;
};
void ParseCallArgs3(CallExpr &c, ASTContext p2) {
  int num_args = c.getNumArgs();
  j<CallArg2> args;
  for (int i = 0; i < num_args; ++i) {
    dm __trans_tmp_6 = *c.getArg(i), arg = __trans_tmp_6;
    dm *std_move_call = nullptr;
    if (ct<CallExpr>(&arg))
      std_move_call = &arg;
    ac<CharSourceRange> std_move_range;
    if (std_move_call)
      std_move_range = tooling::getFileRangeForEdit(0, p2, true);
    args.bl({std_move_range});
  }
  bv<j<CallArg2>> arg2(args);
  for (auto t : *arg2)
    t.std_move_call.m();
}
extern "C" {
void *pcontext;
void *pcall;
void ParseCallArgs2() {
  ParseCallArgs3(*(CallExpr *)pcall, *(ASTContext *)pcontext);
}
}
} // namespace clang

the produce a different assembly with and without patch
clang -cc1 -triple x86_64-grtev4-linux-gnu -S -target-cpu x86-64 -O1 -fsanitize=bool -x c++ test.cc -o - -w

And if linked into my binary produces UBsan report with D150388, and no report on D150388^.

vitalybuka added a comment.EditedJul 24 2023, 10:07 AM

@yassingh @arsenm
The major target is broken for too long, we need either fix or revert, even if it's a chain of patches.

@yassingh @arsenm
The major target is broken for too long, we need either fix or revert, even if it's a chain of patches.

I just posted about this issue:
https://discourse.llvm.org/t/subreg-to-reg-semantics-or-x86s-zext-implementation-is-broken/72250

Either x86 is broken or the definition of SUBREG_TO_REG is broken

@yassingh @arsenm
The major target is broken for too long, we need either fix or revert, even if it's a chain of patches.

I just posted about this issue:
https://discourse.llvm.org/t/subreg-to-reg-semantics-or-x86s-zext-implementation-is-broken/72250

Either x86 is broken or the definition of SUBREG_TO_REG is broken

So if we have not fix, just discussion, should we revert? Even if it's preexisted unknown bug, I don't think it's appropriate to keep regression in existing functionality, for sake of fixing previously broken or missing functionally.
Is it possible to workaround/revert some change with "if (x86) do old way" ?

@yassingh @arsenm
The major target is broken for too long, we need either fix or revert, even if it's a chain of patches.

I just posted about this issue:
https://discourse.llvm.org/t/subreg-to-reg-semantics-or-x86s-zext-implementation-is-broken/72250

Either x86 is broken or the definition of SUBREG_TO_REG is broken

So if we have not fix, just discussion, should we revert? Even if it's preexisted unknown bug, I don't think it's appropriate to keep regression in existing functionality, for sake of fixing previously broken or missing functionally.
Is it possible to workaround/revert some change with "if (x86) do old way" ?

Let me prepare a patch to insert explicit zeroing here, see if it fixes the issue and how badly it blows up the lit suite

So if we have not fix, just discussion, should we revert? Even if it's preexisted unknown bug, I don't think it's appropriate to keep regression in existing functionality, for sake of fixing previously broken or missing functionally.
Is it possible to workaround/revert some change with "if (x86) do old way" ?

Try https://reviews.llvm.org/D156164 as a workaround. The high-bits-must-be-zero invariant disappears somewhere in the coalescer, this just hacks the resulting broken mov such that it hides the problem as it was before

So if we have not fix, just discussion, should we revert? Even if it's preexisted unknown bug, I don't think it's appropriate to keep regression in existing functionality, for sake of fixing previously broken or missing functionally.
Is it possible to workaround/revert some change with "if (x86) do old way" ?

Try https://reviews.llvm.org/D156164 as a workaround. The high-bits-must-be-zero invariant disappears somewhere in the coalescer, this just hacks the resulting broken mov such that it hides the problem as it was before

Much better, probably real fix incoming

So if we have not fix, just discussion, should we revert? Even if it's preexisted unknown bug, I don't think it's appropriate to keep regression in existing functionality, for sake of fixing previously broken or missing functionally.
Is it possible to workaround/revert some change with "if (x86) do old way" ?

Try https://reviews.llvm.org/D156164 as a workaround. The high-bits-must-be-zero invariant disappears somewhere in the coalescer, this just hacks the resulting broken mov such that it hides the problem as it was before

Much better, probably real fix incoming

There is still neither a fix nor even a workaround in the tree. I think, the right solution in such cases is reverting to the last known good state and investigating asynchronously. I'm not saying this would be the best approach right now, given there is a workaround in review. But it would have been the right choice almost two weeks ago, when the issue was reported.

Can we have at least the workaround submitted?

Can we have at least the workaround submitted?

Could use another review I think given that it should also be cherry picked to the release branch. I should have a more general fix tomorrow

Can we have at least the workaround submitted?

Could use another review I think given that it should also be cherry picked to the release branch. I should have a more general fix tomorrow

I'd be glad to help with the review, but I can't add more to the information @vitalybuka provided. Unfortunately I'm not an expert in the area and can't bring one in this case.

More generally speaking, this situation highlights the difference between reverting to a known good (*) state and fixing forward. When reverting, no thorough review is required and (if all dependent commits are reverted) we're guaranteed to return to a stable state reasonably quick. This way, for each discovered problem there is a commit soon enough that fixes it. This approach makes any release process of an LLVM based toolchain (be it one of the toolchains released by Google, official LLVM releases, toolchains in Linux distros, other companies' private toolchains, etc.) easier and more predictable. On the opposite side, an attempt to fix forward (unless trivial one that would normally require no review at all) can take arbitrarily long to be prepared and reviewed, and can open a completely new can of worms, especially if it triggers an unobvious problem that takes another few days to detect and root-cause.

(*) well, in the worst case, there may be known issues that the problematic commit was meant to resolve, but if "revert and then investigate" approach is applied systematically, these would be long-standing issues with initially a much longer resolution time frame. And here I'm assuming that fixing an issue should not lead to regressions in this or other parts of the project (hopefully, there's no disagreement on this point).

I'm not saying anything new here, just trying to push for a bit more systematic application of the existing patch reversion policy (https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy) and, similarly, in the code review policy (https://llvm.org/docs/CodeReview.html#can-code-be-reviewed-after-it-is-committed).

More proper fix ends with stack at D156346

I've also pushed the stack to https://github.com/arsenm/llvm-project/tree/fix-subreg-to-reg-liveness

vitalybuka reopened this revision.Jul 26 2023, 10:15 PM
This revision is now accepted and ready to land.Jul 26 2023, 10:15 PM