This is an archive of the discontinued LLVM Phabricator instance.

Differential D150388

[CodeGen]Allow targets to use target specific COPY instructions for live range splitting
ClosedPublic

Authored by yassingh on May 11 2023, 11:28 AM.

Details

Reviewers

arsenm
cdevadas
qcolombet
MatzeB
kparzysz
atanasyan
SjoerdMeijer
sdardis

Commits

rGb7836d856206: [CodeGen]Allow targets to use target specific COPY instructions for live range…

Summary

This replaces D143754. Right now, live range splitting during register allocation uses the
TargetOpcode::COPY instruction. For the AMDGPU target that creates a
problem, as we have both vector and scalar copies. A vector copy copies
a vector register, but only in the lanes (threads) that are active. This is mostly sufficient;
however, we do run into cases where we have to copy the entire vector register and
not just the active-lane data. One major place where we need that is live range splitting.
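
The difference between the two kinds of copy can be illustrated with a toy model. This is only a sketch under stated assumptions: the 8-lane register, the `ExecMask` type and both function names are hypothetical and are not AMDGPU or LLVM code.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy model: one 32-bit value per lane, 8 lanes per register.
constexpr std::size_t NumLanes = 8;
using VectorReg = std::array<std::uint32_t, NumLanes>;
using ExecMask = std::bitset<NumLanes>;

// Predicated copy: only lanes whose bit is set in Exec are written.
void predicatedCopy(VectorReg &Dst, const VectorReg &Src, const ExecMask &Exec) {
  for (std::size_t Lane = 0; Lane < NumLanes; ++Lane)
    if (Exec.test(Lane))
      Dst[Lane] = Src[Lane];
}

// Whole-register copy: every lane is written, regardless of any mask.
void wholeRegisterCopy(VectorReg &Dst, const VectorReg &Src) { Dst = Src; }

// Usage: with only the low four lanes active, the predicated copy leaves the
// upper lanes of Dst untouched, so data held in inactive lanes is not moved.
void laneCopyExample() {
  const VectorReg Src{1, 2, 3, 4, 5, 6, 7, 8};
  VectorReg Dst{};
  predicatedCopy(Dst, Src, ExecMask(0x0F));
  assert(Dst[3] == 4 && Dst[4] == 0);
  wholeRegisterCopy(Dst, Src);
  assert(Dst == Src);
}
```

In this model, a split that relied on `predicatedCopy` would drop whatever the inactive lanes held, which is the situation the summary describes for live range splitting.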

Allowing targets to use their own copy instructions (if defined) provides a lot of
flexibility and makes it easier to lower these pseudo instructions to correct MIR.

Introduce the getTargetCopyOpcode() virtual function and use it to generate the copy in live range splitting.
Replace the necessary MI.isCopy() checks with TII.isCopyInstr() in the register allocator pipeline.
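
As a rough illustration of the hook described above, the following toy sketch shows a base class that defaults to a generic copy opcode and a target that overrides it. Everything here is a hypothetical stand-in rather than the LLVM classes: `ToyInstrInfo`, `RegClass`, `TOY_WHOLE_WAVE_COPY` and the single-argument signature are assumptions made for brevity. The review thread later refers to the hook as getLiveRangeSplitOpcode, so that name is used here.

```cpp
#include <cassert>

// Hypothetical opcodes and register classes; not LLVM's definitions.
enum Opcode : unsigned { COPY = 1, TOY_WHOLE_WAVE_COPY = 100 };
enum class RegClass { Scalar, Vector };

struct ToyInstrInfo {
  virtual ~ToyInstrInfo() = default;
  // Default behaviour: live range splitting uses the generic COPY.
  virtual unsigned getLiveRangeSplitOpcode(RegClass) const { return COPY; }
};

// A target that wants a different copy for vector registers overrides the hook.
struct ToyVectorTargetInstrInfo : ToyInstrInfo {
  unsigned getLiveRangeSplitOpcode(RegClass RC) const override {
    return RC == RegClass::Vector ? TOY_WHOLE_WAVE_COPY : COPY;
  }
};

// The splitter asks the target which opcode to emit instead of hard-coding COPY.
unsigned opcodeForSplit(const ToyInstrInfo &TII, RegClass RC) {
  unsigned Opc = TII.getLiveRangeSplitOpcode(RC);
  assert(Opc != 0 && "toy opcodes are nonzero");
  return Opc;
}
```

Targets that do not override the hook keep the existing behaviour, which is why the change can live in generic CodeGen code.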

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yassingh created this revision.May 11 2023, 11:28 AM

Herald added a project: Restricted Project.May 11 2023, 11:28 AM

Herald added subscribers: pengfei, dmgreen, atanasyan and 6 others.

yassingh requested review of this revision.May 11 2023, 11:28 AM

Herald added a subscriber: llvm-commits.

~~This is still a work in progress,~~ I have tried to describe the challenges in the latest comment on the discourse thread (https://discourse.llvm.org/t/rfc-introduce-generic-predicated-copy-opcode/68494/11).

Edit: Seeking to get it reviewed now.

yassingh added a child revision: D150390: [AMDGPU] Introduce and use the new PRED_COPY opcode.May 11 2023, 11:42 AM

Harbormaster completed remote builds in B231398: Diff 521385.May 11 2023, 12:40 PM

yassingh added reviewers: arsenm, cdevadas, qcolombet, MatzeB, kparzysz, atanasyan, SjoerdMeijer.May 14 2023, 11:11 PM

Herald added a subscriber: wdng.May 14 2023, 11:11 PM

As described in the Discourse thread, the target's implementation of isCopyInstrImpl() does not allow straightforward use of TII::isCopyInstr() in the regalloc pipeline. One MIPS test crashes (LLVM.CodeGen/Mips::dsp-r1.ll); I agree with @arsenm that this may be a bug. Some tests in X86 and Thumb2 needed updating, which is to be expected?

@atanasyan @SjoerdMeijer

Marked CodeGen/Mips/dsp-r1.ll XFAIL for now

Harbormaster completed remote builds in B232806: Diff 523306.May 18 2023, 3:41 AM

Ping: Hi! I'm still looking to get some initial feedback, especially on MIPS and X86 test changes.

I think the changes on X86 tests are correct.

arsenm added inline comments.May 19 2023, 8:56 AM

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1967	Register should be passed by value. MRI should also be const. Needs a doc comment. I'd also rename this to something like getLiveRangeSplitOpcode?
llvm/lib/CodeGen/TargetInstrInfo.cpp
445	I'd prefer to pass in TII separately rather than jumping through all these hoops. Also, this is only used in the assert so will warn in release build
llvm/test/CodeGen/Mips/dsp-r1.ll
3–5 ↗	(On Diff #523306)	Can't do this, just fix the mips isCopy implementation?

Updated Mips::isCopyInstrImpl()
Review comments

Harbormaster completed remote builds in B233517: Diff 524217.May 22 2023, 4:13 AM

yassingh added a reviewer: sdardis.May 22 2023, 4:29 AM

cdevadas mentioned this in D150390: [AMDGPU] Introduce and use the new PRED_COPY opcode.May 22 2023, 8:42 AM

sdardis added inline comments.May 22 2023, 3:25 PM

llvm/lib/Target/Mips/MipsSEInstrInfo.cpp
229 ↗	(On Diff #524217)	I believe the crash associated with the most immediate version of this patch was due to the definition of those instructions which take an immediate mask to specify which fields to read or write, as the immediate operand is in the place of the expected source operand. This `if` branch can be removed with some changes, I think. The MIPS `rddsp` and `wrdsp` instructions can read/write either fields or the entire `DSPCond` register, provided the instruction flag of 'isMoveReg' is set to zero for (RD|WR)DSP instructions. As those instructions take the `DSPCond` subregisters as implicit operands, I don't think those instructions really match the expectations of the `isCopy` hook in their current form. I would suggest removing `isMoveReg = 1` from `wrdsp` and `rddsp` so they aren't considered simple copy-like instructions for the moment.

yassingh added a parent revision: D151181: [Mips] Remove isMoveReg=1 from wrdsp and rddsp instructions.May 23 2023, 12:00 AM

yassingh mentioned this in D151181: [Mips] Remove isMoveReg=1 from wrdsp and rddsp instructions.May 23 2023, 12:03 AM

yassingh added inline comments.

llvm/lib/Target/Mips/MipsSEInstrInfo.cpp
229 ↗	(On Diff #524217)	Thanks for the feedback @sdardis. I have removed "isMoveReg=1" from both instructions as you suggested. D151181

Rebase and ping.

Harbormaster completed remote builds in B235088: Diff 526356.May 28 2023, 7:55 PM

@qcolombet can you take a look?
This will replace the initial implementation of generic pred-copy opcode, D143754.

cdevadas mentioned this in D143754: [MachineInstr] Introduce generic predicated copy opcode.Jun 3 2023, 6:00 PM

Hi,

The patch looks reasonable to me.
We need to fix the use of getOperand(0) and getOperand(1) everywhere.

I've highlighted a couple of these. I let you take a closer look.

Cheers,
-Quentin

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1050	Instead of checking operand 0 and 1 directly, we should get the pair from `isCopyInstr` and check these.
llvm/lib/CodeGen/CalcSpillWeights.cpp
228	Similar comment, everywhere we introduce a `isCopyInstr`, we need to check the returned pair.
llvm/lib/CodeGen/RegAllocGreedy.cpp
2457–2458	Ditto
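
The point of these comments can be sketched with a toy model: once target-specific copies are recognized, the destination and source are no longer guaranteed to be operands 0 and 1, so callers should read them from the pair that `isCopyInstr` returns. The types below (`ToyOperand`, `ToyInstr`, and the index fields) are hypothetical stand-ins and not the LLVM classes; only the `DestSourcePair` name and its `Destination`/`Source` pointer members follow the thread.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for MachineOperand and MachineInstr.
struct ToyOperand { unsigned Reg; };
struct ToyInstr {
  bool IsGenericCopy;               // models MI.isCopy()
  bool IsTargetCopy;                // models a copy known only to the target
  std::vector<ToyOperand> Operands;
  unsigned DstIdx, SrcIdx;          // where a target copy keeps dst and src
};

struct DestSourcePair {
  const ToyOperand *Destination;
  const ToyOperand *Source;
};

std::optional<DestSourcePair> isCopyInstr(const ToyInstr &MI) {
  if (MI.IsGenericCopy)
    return DestSourcePair{&MI.Operands[0], &MI.Operands[1]};
  if (MI.IsTargetCopy)
    return DestSourcePair{&MI.Operands[MI.DstIdx], &MI.Operands[MI.SrcIdx]};
  return std::nullopt;
}

// Fragile: assumes the source is always operand 1.
bool naiveCopiesFrom(const ToyInstr &MI, unsigned Reg) {
  return isCopyInstr(MI).has_value() && MI.Operands[1].Reg == Reg;
}

// Preferred: check the operands that isCopyInstr actually reported.
bool copiesFrom(const ToyInstr &MI, unsigned Reg) {
  std::optional<DestSourcePair> DestSrc = isCopyInstr(MI);
  return DestSrc && DestSrc->Source->Reg == Reg;
}
```

For a toy target copy whose operand 1 is an extra operand and whose source is operand 2, `naiveCopiesFrom` inspects the wrong operand while `copiesFrom` does not.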

Thanks for the feedback @qcolombet! I have updated all uses of getOperand(0|1)

Harbormaster completed remote builds in B236831: Diff 528713.Jun 6 2023, 2:01 AM

yassingh added a child revision: D143758: [CodeGen] MRI call back in TargetMachine.Jun 6 2023, 5:46 AM

arsenm added inline comments.Jun 8 2023, 5:00 PM

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1970	MRI should be const
llvm/lib/CodeGen/InlineSpiller.cpp
874	Don't need ? true : false
llvm/lib/CodeGen/LiveRangeEdit.cpp
356	swap these
llvm/lib/CodeGen/LiveRangeShrink.cpp
202	Move this to the function prolog

review comments

yassingh marked 3 inline comments as done.Jun 9 2023, 5:03 AM

yassingh added inline comments.

llvm/lib/CodeGen/InlineSpiller.cpp
874	isCopyInstr() returns std::optional. Should I use 'auto' to get rid of the ternary operator?

Harbormaster completed remote builds in B237731: Diff 529917.Jun 9 2023, 7:20 AM

arsenm added inline comments.Jun 14 2023, 7:27 AM

llvm/lib/CodeGen/InlineSpiller.cpp
874	or .has_value
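
For readers unfamiliar with the idiom under discussion, here is a minimal sketch of the three ways to turn a `std::optional` into a `bool`. The function `findCopySource` is a hypothetical placeholder for a call such as `isCopyInstr()`; it is not code from the patch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical placeholder for a query that may or may not produce a result.
std::optional<int> findCopySource(bool IsCopy) {
  if (IsCopy)
    return 42;
  return std::nullopt;
}

// The form the review objected to: a redundant ternary.
bool verbose(bool IsCopy) { return findCopySource(IsCopy) ? true : false; }

// The suggested alternative: ask the optional directly.
bool viaHasValue(bool IsCopy) { return findCopySource(IsCopy).has_value(); }

// Also valid: std::optional's conversion to bool is explicit, so a plain
// `bool B = Opt;` does not compile, but an explicit cast does.
bool viaExplicitBool(bool IsCopy) {
  return static_cast<bool>(findCopySource(IsCopy));
}
```

All three return the same result; `.has_value()` states the intent without the ternary.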

review comments

yassingh marked 2 inline comments as done.Jun 15 2023, 2:36 AM

LGTM.

This revision is now accepted and ready to land.Jun 15 2023, 4:46 AM

arsenm accepted this revision.Jun 15 2023, 4:48 AM

arsenm added inline comments.

llvm/lib/CodeGen/InlineSpiller.cpp
523	swap these checks

Harbormaster completed remote builds in B239063: Diff 531664.Jun 15 2023, 5:44 AM

rebase

Harbormaster completed remote builds in B239956: Diff 532848.Jun 20 2023, 6:18 AM

Can you push this soon? I have a patch that's going to conflict with the isCopyInstr changes

In D150388#4440353, @arsenm wrote:

Can you push this soon? I have a patch that's going to conflict with the isCopyInstr changes

I was planning to commit all the patches in this series downstream first (together), since they will cause a major merge conflict. Let me see whether it's possible to commit only the generic patches without breaking spill handling downstream.

cdevadas added inline comments.Jun 26 2023, 9:29 AM

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1970	Can you make this function to take MachineFunction& as the second argument in the first place? You are changing it in D143762.

Move function definition change in getLiveRangeSplitOpcode from D143762 here.

Harbormaster completed remote builds in B241422: Diff 534897.Jun 27 2023, 3:39 AM

arsenm accepted this revision.Jun 27 2023, 5:32 AM

cdevadas mentioned this in D143762: [AMDGPU] Enable whole wave register copy.Jun 27 2023, 6:14 AM

kparzysz accepted this revision.Jun 27 2023, 7:08 AM

yassingh removed a child revision: D150390: [AMDGPU] Introduce and use the new PRED_COPY opcode.Jul 2 2023, 8:08 AM

yassingh edited child revisions, added: D143759: [AMDGPU] Implement whole wave register spill; removed: D143758: [CodeGen] MRI call back in TargetMachine.Jul 2 2023, 8:13 AM

yassingh mentioned this in rG0f58cfeb9fff: [Mips] Remove isMoveReg=1 from wrdsp and rddsp instructions.Jul 3 2023, 8:50 AM

cdevadas added inline comments.Jul 6 2023, 8:47 AM

llvm/lib/CodeGen/RegAllocGreedy.cpp
2460	Change both Dest and Src to Ref as in the original code. `const MachineOperand &Dest = *DestSrc->Destination;` That will avoid the additional changes you made below.

change ptr to ref

Harbormaster completed remote builds in B243663: Diff 537988.Jul 6 2023, 11:58 PM

cdevadas accepted this revision.Jul 7 2023, 5:00 AM

Rebase before merge

This revision was landed with ongoing or failed builds.Jul 7 2023, 10:00 AM

Closed by commit rGb7836d856206: [CodeGen]Allow targets to use target specific COPY instructions for live range… (authored by yassingh).

This revision was automatically updated to reflect the committed changes.

yassingh added a commit: rGb7836d856206: [CodeGen]Allow targets to use target specific COPY instructions for live range….

Harbormaster completed remote builds in B243804: Diff 538180.Jul 7 2023, 11:44 AM

Heads-up: This change might be causing a miscompile. We're seeing some breakages at Google because of this change that go away when we add __attribute__((optnone)) to https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/SourceCode.h#L111. I'm working on confirming it.

In D150388#4501339, @asmok-g wrote:

Headsup: This change might be causing a miscompile. We're having some breakages in google because of this change that goes away when we add __attribute__((optnone)) to https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/SourceCode.h#L111. I'm working on confirming it.

The easiest experiment would be to change the implementation of isCopyInstr to not bother calling isCopyInstrImpl. The most likely source of any issue is the broader recognition of copy-like operations

In D150388#4501339, @asmok-g wrote:

Headsup: This change might be causing a miscompile. We're having some breakages in google because of this change that goes away when we add __attribute__((optnone)) to https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/SourceCode.h#L111. I'm working on confirming it.

We also see false -fsanitize=bool reports. Definitely a miscompile.
I propose to revert.

In D150388#4501339, @asmok-g wrote:

Headsup: This change might be causing a miscompile. We're having some breakages in google because of this change that goes away when we add __attribute__((optnone)) to https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/SourceCode.h#L111. I'm working on confirming it.

As Matt suggested, "The most likely source of any issue is the broader recognition of copy-like operations". It's hard to say anything without looking at your internal implementation of isCopyInstrImpl(). For example, the MIPS implementation of the same function required some tweaking before this patch was pushed, as it wasn't accurately describing COPY-like instructions (D151181).

In D150388#4502779, @yassingh wrote:

In D150388#4501339, @asmok-g wrote:

Headsup: This change might be causing a miscompile. We're having some breakages in google because of this change that goes away when we add __attribute__((optnone)) to https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/SourceCode.h#L111. I'm working on confirming it.

As Matt suggested, "The most likely source of any issue is the broader recognition of copy-like operations". It's hard to say anything without looking at your internal implementation of isCopyInstrImpl(). For example, the MIPS implementation of the same function required some tweaking before this patch was pushed, as it wasn't accurately describing COPY-like instructions (D151181).

It's still a regression. Unless someone has a patch ready to land, we need to revert this.

In D150388#4504380, @vitalybuka wrote:

In D150388#4502779, @yassingh wrote:

In D150388#4501339, @asmok-g wrote:

Headsup: This change might be causing a miscompile. We're having some breakages in google because of this change that goes away when we add __attribute__((optnone)) to https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/SourceCode.h#L111. I'm working on confirming it.

As Matt suggested, "The most likely source of any issue is the broader recognition of copy-like operations". It's hard to say anything without looking at your internal implementation of isCopyInstrImpl(). For example, the MIPS implementation of the same function required some tweaking before this patch was pushed, as it wasn't accurately describing COPY-like instructions (D151181).

It's still regression. Unless someone has a patch ready to land, we need to revert this.

This is part of a chain of patches that fixes an AMDGPU backend issue. D124196 being the final one in the chain, the revert won't be that easy.
I'm trying to see if we can avoid the revert. Can you attach a reproducible IR code?

In D150388#4504380, @vitalybuka wrote:

It's still regression. Unless someone has a patch ready to land, we need to revert this.

I'd ask that you provide a reproducer, and perform the experiment I mentioned. If we can just disable the expanded isCopyInstr identification, it will be a lot less painful

In D150388#4504387, @arsenm wrote:

In D150388#4504380, @vitalybuka wrote:

It's still regression. Unless someone has a patch ready to land, we need to revert this.

I'd ask that you provide a reproducer, and perform the experiment I mentioned. If we can just disable the expanded isCopyInstr identification, it will be a lot less painful

I tried the suggestion; based on my understanding what I did is:

@@ -1038,10 +1038,10 @@
   /// registers as machine operands, for all other instructions the method calls
   /// target-dependent implementation.
   std::optional<DestSourcePair> isCopyInstr(const MachineInstr &MI) const {
-    if (MI.isCopy()) {
-      return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};
-    }
-    return isCopyInstrImpl(MI);
+    // if (MI.isCopy()) {
+    return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};
+    // }
+    // return isCopyInstrImpl(MI);
   }
 
   bool isFullCopyInstr(const MachineInstr &MI) const {

But clang crashes when I use it to build the target test (and many other targets). I can't include the exact stack trace. Maybe you meant something else by your suggestion?

1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module $xyz.
4.	Running pass 'Greedy Register Allocator' on function

Target: x86_64-grtev4-linux-gnu

I'm still working on a repro.

Yes, this would just crash. You want:

   std::optional<DestSourcePair> isCopyInstr(const MachineInstr &MI) const {
     if (MI.isCopy())
       return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};
     return std::nullopt;
   }

In D150388#4504387, @arsenm wrote:

In D150388#4504380, @vitalybuka wrote:

It's still regression. Unless someone has a patch ready to land, we need to revert this.

I'd ask that you provide a reproducer, and perform the experiment I mentioned. If we can just disable the expanded isCopyInstr identification, it will be a lot less painful

I have been running creduce on the ubsan case for the whole weekend, and it is still quite large. I can try the experiment if you can provide a draft patch.

asmok-g added a comment.Jul 17 2023, 10:26 AM

This comment was removed by asmok-g.

The issue didn't go away with the changed line.

In D150388#4506932, @vitalybuka wrote:

In D150388#4504387, @arsenm wrote:

In D150388#4504380, @vitalybuka wrote:

It's still regression. Unless someone has a patch ready to land, we need to revert this.

I'd ask that you provide a reproducer, and perform the experiment I mentioned. If we can just disable the expanded isCopyInstr identification, it will be a lot less painful

I am running creduce for ubsan case whole weekend, still quite large. I can try the experiment if you can provide a draft patch.

https://reviews.llvm.org/differential/diff/541268/ is the draft

There's something wrong elsewhere even if this fixes it, but I can't see what else would change anything

In D150388#4508264, @arsenm wrote:

In D150388#4506932, @vitalybuka wrote:

In D150388#4504387, @arsenm wrote:

In D150388#4504380, @vitalybuka wrote:

It's still regression. Unless someone has a patch ready to land, we need to revert this.

I'd ask that you provide a reproducer, and perform the experiment I mentioned. If we can just disable the expanded isCopyInstr identification, it will be a lot less painful

I am running creduce for ubsan case whole weekend, still quite large. I can try the experiment if you can provide a draft patch.

https://reviews.llvm.org/differential/diff/541268/ is the draft

There's something wrong elsewhere even if this fixes it, but I can't see what else would change anything

I just spotted a bug, isCopyOfBundle isn't using the proper result from isCopyInstr. Copy bundles are niche enough that I'd be surprised if they help your case (x86 I presume?)

In D150388#4508276, @arsenm wrote:

There's something wrong elsewhere even if this fixes it, but I can't see what else would change anything

I just spotted a bug, isCopyOfBundle isn't using the proper result from isCopyInstr. Copy bundles are niche enough that I'd be surprised if they help your case (x86 I presume?)

Try 825b7f0ca5f2211ec3c93139f98d1e24048c225c

I'm still very interested in getting a reproducer if this fixes it. It's usually hard to synthetically craft one for cases like this. If you send IR + invocation I can quickly reduce cases where this patch introduces a diff

bgraur added a subscriber: bgraur.Jul 18 2023, 1:27 AM

Try 825b7f0ca5f2211ec3c93139f98d1e24048c225c

I'm still very interested in getting a reproducer if this fixes it. It's usually hard to synthetically craft one for cases like this. If you send IR + invocation I can quickly reduce cases where this patch introduces a diff

Still trying with the IR reduction but the patch didn't help in our case.

In D150388#4509991, @asmok-g wrote:

Try 825b7f0ca5f2211ec3c93139f98d1e24048c225c

I'm still very interested in getting a reproducer if this fixes it. It's usually hard to synthetically craft one for cases like this. If you send IR + invocation I can quickly reduce cases where this patch introduces a diff

Still trying with the IR reduction but the patch didn't help in our case.

Are you getting the same error as before after both the patches (the experiment one and the bundlecopy fix)?

In D150388#4509992, @yassingh wrote:

In D150388#4509991, @asmok-g wrote:

Try 825b7f0ca5f2211ec3c93139f98d1e24048c225c

I'm still very interested in getting a reproducer if this fixes it. It's usually hard to synthetically craft one for cases like this. If you send IR + invocation I can quickly reduce cases where this patch introduces a diff

Still trying with the IR reduction but the patch didn't help in our case.

Are you getting the same error as before after both the patches (the experiment one and the bundlecopy fix)?

Yes.

I'm not sure this llvm-reduce'd IR snippet retains the essence of the problem, but maybe you could look at it and see if there's an obvious issue with the codegen?

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: cold noreturn nounwind
declare void @llvm.ubsantrap(i8 immarg) #0

declare i1 @_f1()

declare { i64, i64 } @_f2(ptr)

declare { i64, i8 } @_f3()

declare void @_f4()

define fastcc void @_f(ptr %0, ptr %1, i64 %2, ptr %3, ptr %4, i1 %5, i1 %6, i24 %7, i1 %8) #1 {
  %10 = call i1 @_f1()
  %11 = icmp eq i24 %7, 0
  br i1 %11, label %13, label %12

12:                                               ; preds = %9
  call void @_f4()
  br label %common.ret

common.ret:                                       ; preds = %22, %18, %12
  ret void

13:                                               ; preds = %20, %9
  %14 = phi i40 [ undef, %9 ], [ %21, %20 ]
  br i1 %6, label %15, label %16

15:                                               ; preds = %13
  call void @llvm.ubsantrap(i8 0)
  unreachable

16:                                               ; preds = %13
  %17 = call { i64, i64 } @_f2(ptr %3)
  br i1 %5, label %20, label %18

18:                                               ; preds = %16
  %19 = and i40 %14, 4294967295
  store ptr null, ptr %0, align 8
  store i40 %19, ptr %1, align 4
  br i1 %8, label %common.ret, label %20

20:                                               ; preds = %18, %16
  %21 = phi i40 [ %14, %16 ], [ %19, %18 ]
  br i1 %8, label %22, label %13

22:                                               ; preds = %20
  store ptr null, ptr %0, align 8
  %23 = call { i64, i8 } @_f3()
  %24 = load ptr, ptr %0, align 8
  %25 = icmp eq ptr %24, null
  br i1 %25, label %26, label %common.ret

26:                                               ; preds = %22
  store volatile i32 0, ptr null, align 4294967296
  unreachable
}

attributes #0 = { cold noreturn nounwind }
attributes #1 = { "frame-pointer"="all" }

The difference in the generated x86-64 assembly (with clang -O1 -S, before this patch and after it) is as follows:

        .text
        .file   "reduced.ll"
        .globl  _f                              # -- Begin function _f
        .p2align        4, 0x90
        .type   _f,@function
 _f:                                     # @_f
        .cfi_startproc
 # %bb.0:
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %rbx
        subq    $24, %rsp
        .cfi_offset %rbx, -56
        .cfi_offset %r12, -48
        .cfi_offset %r13, -40
        .cfi_offset %r14, -32
        .cfi_offset %r15, -24
        movl    %r9d, %r14d
-       movq    %rcx, %r12
-       movq    %rsi, %r15
-       movq    %rdi, -48(%rbp)                 # 8-byte Spill
+       movq    %rcx, %r15
+       movq    %rsi, %r12
+       movq    %rdi, %rbx
        movzbl  32(%rbp), %r13d
-       movzbl  16(%rbp), %ebx
+       movzbl  16(%rbp), %eax
+       movb    %al, -41(%rbp)                  # 1-byte Spill
        callq   _f1@PLT
        testl   $16777215, 24(%rbp)             # imm = 0xFFFFFF
        je      .LBB0_1
-# %bb.5:
+# %bb.9:
        addq    $24, %rsp
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        jmp     _f4@PLT                         # TAILCALL
 .LBB0_1:                                # %.preheader
        .cfi_def_cfa %rbp, 16
-       movq    %r15, -56(%rbp)                 # 8-byte Spill
-       movl    %r14d, %r15d
-       movq    -48(%rbp), %r14                 # 8-byte Reload
-       testb   $1, %bl
-       jne     .LBB0_7
+       movq    %rbx, -56(%rbp)                 # 8-byte Spill
+       testb   $1, -41(%rbp)                   # 1-byte Folded Reload
+       jne     .LBB0_11
 # %bb.2:                                # %.preheader.split.preheader
                                         # implicit-def: $rbx
        jmp     .LBB0_3
        .p2align        4, 0x90
-.LBB0_4:                                #   in Loop: Header=BB0_3 Depth=1
-       movq    %r14, %rax
+.LBB0_6:                                #   in Loop: Header=BB0_3 Depth=1
        testb   $1, %r13b
-       jne     .LBB0_11
+       jne     .LBB0_7
 .LBB0_3:                                # %.preheader.split
                                         # =>This Inner Loop Header: Depth=1
-       movq    %r12, %rdi
+       movq    %r15, %rdi
        callq   _f2@PLT
-       testb   $1, %r15b
-       jne     .LBB0_4
-# %bb.8:                                #   in Loop: Header=BB0_3 Depth=1
-       movq    $0, (%r14)
-       movq    -56(%rbp), %rcx                 # 8-byte Reload
-       movl    %ebx, (%rcx)
-       movb    $0, 4(%rcx)
-       testb   $1, %r13b
+       testb   $1, %r14b
        jne     .LBB0_6
-# %bb.9:                                #   in Loop: Header=BB0_3 Depth=1
-       movq    %r14, %rax
-       movl    %ebx, %ebx
-       testb   $1, %r13b
-       je      .LBB0_3
-.LBB0_11:
+# %bb.4:                                #   in Loop: Header=BB0_3 Depth=1
+       movq    -56(%rbp), %rax                 # 8-byte Reload
        movq    $0, (%rax)
-       movq    %rax, %rbx
+       movl    %ebx, (%r12)
+       movb    $0, 4(%r12)
+       testb   $1, %r13b
+       jne     .LBB0_10
+# %bb.5:                                #   in Loop: Header=BB0_3 Depth=1
+       movl    %ebx, %ebx
+       jmp     .LBB0_6
+.LBB0_7:
+       movq    -56(%rbp), %rbx                 # 8-byte Reload
+       movq    $0, (%rbx)
        callq   _f3@PLT
        cmpq    $0, (%rbx)
-       je      .LBB0_12
-.LBB0_6:                                # %common.ret
+       je      .LBB0_8
+.LBB0_10:                               # %common.ret
        addq    $24, %rsp
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq
-.LBB0_7:
+.LBB0_11:
        .cfi_def_cfa %rbp, 16
        ud1l    (%eax), %eax
-.LBB0_12:
+.LBB0_8:
        movl    $0, 0
 .Lfunc_end0:
        .size   _f, .Lfunc_end0-_f
        .cfi_endproc
                                         # -- End function
        .section        ".note.GNU-stack","",@progbits
        .addrsig

cdevadas mentioned this in D143756: [AMDGPU] Use buildCopy and isCopy helper functions (NFC)..Jul 18 2023, 4:30 PM

cdevadas mentioned this in D143757: [AMDGPU] Enable predicated copy right from instruction selection.

cdevadas mentioned this in D143752: [MachineInstr] Use isCopy helper function (NFC)..Jul 18 2023, 4:32 PM

cdevadas mentioned this in D143753: [MachineInstr] Introduce TII buildCopy helper functions (NFC)..

In D150388#4512745, @alexfh wrote:

I'm not sure this llvm-reduce'd IR snippet retains the essence of the problem, but maybe you could look at it and see if there's an obvious issue with the codegen?

I'll look more tomorrow, but this diff comes entirely from isCopyInstrImpl handling some identity copies. The diff disappears without the x86 implementation. It triggers for identity copies only.

My repro:

; ModuleID = '<bc file>'
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

%"struct.devtools::inliner::CallArg2.2307.4010.6850.7702.9690.1453.4566.8245.24575.24695.24965.25235.25445.26645.26877.26937.27207.27387.27673.27761.27849.27893.27915.27959.27981.28003.28016.28068.28120.28133.28185.28237.28250.50.102.115" = type { %"class.std::__u::optional.2306.4009.6849.7701.9689.1452.4565.8244.24574.24694.24964.25234.25444.26644.26876.26936.27206.27386.27672.27760.27848.27892.27914.27958.27980.28002.28015.28067.28119.28132.28184.28236.28249.49.101.114" }
%"class.std::__u::optional.2306.4009.6849.7701.9689.1452.4565.8244.24574.24694.24964.25234.25444.26644.26876.26936.27206.27386.27672.27760.27848.27892.27914.27958.27980.28002.28015.28067.28119.28132.28184.28236.28249.49.101.114" = type { %"struct.std::__u::__optional_move_assign_base.base.2305.4008.6848.7700.9688.1451.4564.8243.24573.24693.24963.25233.25443.26643.26875.26935.27205.27385.27671.27759.27847.27891.27913.27957.27979.28001.28014.28066.28118.28131.28183.28235.28248.48.100.113", [3 x i8] }
%"struct.std::__u::__optional_move_assign_base.base.2305.4008.6848.7700.9688.1451.4564.8243.24573.24693.24963.25233.25443.26643.26875.26935.27205.27385.27671.27759.27847.27891.27913.27957.27979.28001.28014.28066.28118.28131.28183.28235.28248.48.100.113" = type { %"struct.std::__u::__optional_copy_assign_base.base.2304.4007.6847.7699.9687.1450.4563.8242.24572.24692.24962.25232.25442.26642.26874.26934.27204.27384.27670.27758.27846.27890.27912.27956.27978.28000.28013.28065.28117.28130.28182.28234.28247.47.99.112" }
%"struct.std::__u::__optional_copy_assign_base.base.2304.4007.6847.7699.9687.1450.4563.8242.24572.24692.24962.25232.25442.26642.26874.26934.27204.27384.27670.27758.27846.27890.27912.27956.27978.28000.28013.28065.28117.28130.28182.28234.28247.47.99.112" = type { %"struct.std::__u::__optional_move_base.base.2303.4006.6846.7698.9686.1449.4562.8241.24571.24691.24961.25231.25441.26641.26873.26933.27203.27383.27669.27757.27845.27889.27911.27955.27977.27999.28012.28064.28116.28129.28181.28233.28246.46.98.111" }
%"struct.std::__u::__optional_move_base.base.2303.4006.6846.7698.9686.1449.4562.8241.24571.24691.24961.25231.25441.26641.26873.26933.27203.27383.27669.27757.27845.27889.27911.27955.27977.27999.28012.28064.28116.28129.28181.28233.28246.46.98.111" = type { %"struct.std::__u::__optional_copy_base.base.2302.4005.6845.7697.9685.1448.4561.8240.24570.24690.24960.25230.25440.26640.26872.26932.27202.27382.27668.27756.27844.27888.27910.27954.27976.27998.28011.28063.28115.28128.28180.28232.28245.45.97.110" }
%"struct.std::__u::__optional_copy_base.base.2302.4005.6845.7697.9685.1448.4561.8240.24570.24690.24960.25230.25440.26640.26872.26932.27202.27382.27668.27756.27844.27888.27910.27954.27976.27998.28011.28063.28115.28128.28180.28232.28245.45.97.110" = type { %"struct.std::__u::__optional_storage_base.base.2301.4004.6844.7696.9684.1447.4560.8239.24569.24689.24959.25229.25439.26639.26871.26931.27201.27381.27667.27755.27843.27887.27909.27953.27975.27997.28010.28062.28114.28127.28179.28231.28244.44.96.109" }
%"struct.std::__u::__optional_storage_base.base.2301.4004.6844.7696.9684.1447.4560.8239.24569.24689.24959.25229.25439.26639.26871.26931.27201.27381.27667.27755.27843.27887.27909.27953.27975.27997.28010.28062.28114.28127.28179.28231.28244.44.96.109" = type { %"struct.std::__u::__optional_destruct_base.base.2300.4003.6843.7695.9683.1446.4559.8238.24568.24688.24958.25228.25438.26638.26870.26930.27200.27380.27666.27754.27842.27886.27908.27952.27974.27996.28009.28061.28113.28126.28178.28230.28243.43.95.108" }
%"struct.std::__u::__optional_destruct_base.base.2300.4003.6843.7695.9683.1446.4559.8238.24568.24688.24958.25228.25438.26638.26870.26930.27200.27380.27666.27754.27842.27886.27908.27952.27974.27996.28009.28061.28113.28126.28178.28230.28243.43.95.108" = type { %union.anon.40.2299.4002.6842.7694.9682.1445.4558.8237.24567.24687.24957.25227.25437.26637.26869.26929.27199.27379.27665.27753.27841.27885.27907.27951.27973.27995.28008.28060.28112.28125.28177.28229.28242.42.94.107, i8 }
%union.anon.40.2299.4002.6842.7694.9682.1445.4558.8237.24567.24687.24957.25227.25437.26637.26869.26929.27199.27379.27665.27753.27841.27885.27907.27951.27973.27995.28008.28060.28112.28125.28177.28229.28242.42.94.107 = type { %"class.clang::CharSourceRange.2289.3992.6832.7684.9672.1435.4548.8227.24566.24686.24956.25226.25436.26636.26868.26928.27198.27378.27664.27752.27840.27884.27906.27950.27972.27994.28007.28059.28111.28124.28176.28228.28241.41.93.106" }
%"class.clang::CharSourceRange.2289.3992.6832.7684.9672.1435.4548.8227.24566.24686.24956.25226.25436.26636.26868.26928.27198.27378.27664.27752.27840.27884.27906.27950.27972.27994.28007.28059.28111.28124.28176.28228.28241.41.93.106" = type <{ %"class.clang::SourceRange.2288.3991.6831.7683.9671.1434.4547.8226.24565.24685.24955.25225.25435.26635.26867.26927.27197.27377.27663.27751.27839.27883.27905.27949.27971.27993.28006.28058.28110.28123.28175.28227.28240.40.92.105", i8, [3 x i8] }>
%"class.clang::SourceRange.2288.3991.6831.7683.9671.1434.4547.8226.24565.24685.24955.25225.25435.26635.26867.26927.27197.27377.27663.27751.27839.27883.27905.27949.27971.27993.28006.28058.28110.28123.28175.28227.28240.40.92.105" = type { %"class.clang::SourceLocation.2287.3990.6830.7682.9670.1433.4546.8225.24555.24675.24945.25215.25425.26625.26857.26917.27187.27367.27653.27741.27829.27873.27895.27939.27961.27983.28005.28057.28109.28122.28174.28226.28239.39.91.104", %"class.clang::SourceLocation.2287.3990.6830.7682.9670.1433.4546.8225.24555.24675.24945.25215.25425.26625.26857.26917.27187.27367.27653.27741.27829.27873.27895.27939.27961.27983.28005.28057.28109.28122.28174.28226.28239.39.91.104" }
%"class.clang::SourceLocation.2287.3990.6830.7682.9670.1433.4546.8225.24555.24675.24945.25215.25425.26625.26857.26917.27187.27367.27653.27741.27829.27873.27895.27939.27961.27983.28005.28057.28109.28122.28174.28226.28239.39.91.104" = type { i32 }
%"struct.std::__u::__optional_destruct_base.2555.4258.7098.7950.9938.1701.4814.8493.24576.24696.24966.25236.25446.26646.26885.26945.27215.27395.27674.27762.27850.27894.27916.27960.27982.28004.28017.28069.28121.28134.28186.28238.28251.51.103.116" = type { %union.anon.40.2299.4002.6842.7694.9682.1445.4558.8237.24567.24687.24957.25227.25437.26637.26869.26929.27199.27379.27665.27753.27841.27885.27907.27951.27973.27995.28008.28060.28112.28125.28177.28229.28242.42.94.107, i8, [3 x i8] }

; Function Attrs: noinline
define void @_ZN8devtools7inliner14ParseCallArgs3ERKN5clang8CallExprERKNS1_12FunctionDeclERNS1_10ASTContextE(ptr %0, ptr %1, i40 %2, ptr %3, i32 %4) #0 {
  br label %9

6:                                                ; preds = %21, %18
  %.sroa.0.0 = phi ptr [ %24, %21 ], [ null, %18 ]
  %.sroa.5.0 = phi ptr [ %25, %21 ], [ null, %18 ]
  %7 = add i32 %10, 1
  %8 = icmp eq i32 %10, %4
  br i1 %8, label %27, label %9

9:                                                ; preds = %6, %5
  %.sroa.5.1 = phi ptr [ null, %5 ], [ %.sroa.5.0, %6 ]
  %10 = phi i32 [ 0, %5 ], [ %7, %6 ]
  %11 = phi i40 [ undef, %5 ], [ %19, %6 ]
  %12 = call ptr @_ZN5clang4Expr27IgnoreUnlessSpelledInSourceEv()
  %13 = load i8, ptr %1, align 8
  %14 = icmp ult i8 %13, -5
  %15 = and i40 %11, 4294967295
  br i1 %14, label %18, label %16

16:                                               ; preds = %9
  %17 = load volatile { i64, i64 }, ptr null, align 4294967296
  br label %18

18:                                               ; preds = %16, %9
  %19 = phi i40 [ %15, %9 ], [ %2, %16 ]
  %20 = icmp ugt ptr %.sroa.5.1, %0
  br i1 %20, label %6, label %21

21:                                               ; preds = %18
  %22 = icmp eq ptr %.sroa.5.1, null
  %23 = zext i1 %22 to i64
  %24 = call ptr @_Znwm(i64 0)
  %25 = getelementptr %"struct.devtools::inliner::CallArg2.2307.4010.6850.7702.9690.1453.4566.8245.24575.24695.24965.25235.25445.26645.26877.26937.27207.27387.27673.27761.27849.27893.27915.27959.27981.28003.28016.28068.28120.28133.28185.28237.28250.50.102.115", ptr %3, i64 %23
  %26 = getelementptr i8, ptr %24, i64 8
  store i40 %19, ptr %26, align 4
  br label %6

27:                                               ; preds = %6
  %28 = getelementptr %"struct.std::__u::__optional_destruct_base.2555.4258.7098.7950.9938.1701.4814.8493.24576.24696.24966.25236.25446.26646.26885.26945.27215.27395.27674.27762.27850.27894.27916.27960.27982.28004.28017.28069.28121.28134.28186.28238.28251.51.103.116", ptr %.sroa.0.0, i64 0, i32 1
  %29 = load i8, ptr %28, align 4
  %30 = icmp eq i8 %29, 0
  br i1 %30, label %32, label %31

31:                                               ; preds = %27
  call void @__ubsan_handle_load_invalid_value_abort(ptr %0)
  unreachable

32:                                               ; preds = %27
  ret void

; uselistorder directives
  uselistorder i32 %10, { 1, 0 }
}

define void @_ZN8devtools7inliner14ParseCallArgs2ERKN5clang8CallExprERKNS1_12FunctionDeclERNS1_10ASTContextE(ptr %0, ptr %1) {
  call void @_ZN8devtools7inliner14ParseCallArgs3ERKN5clang8CallExprERKNS1_12FunctionDeclERNS1_10ASTContextE(ptr %1, ptr %0, i40 0, ptr null, i32 0)
  ret void
}

declare ptr @_ZN5clang4Expr27IgnoreUnlessSpelledInSourceEv()

declare void @__ubsan_handle_load_invalid_value_abort(ptr)

declare ptr @_Znwm(i64)

attributes #0 = { noinline "frame-pointer"="all" }

Before this patch, __ubsan_handle_load_invalid_value_abort was not reached; now it is.

llc ./llvm-reduce-42f916.ll -O3 -o ./llvm-reduce-42f916.ll.<revision>.s
diff -u --color ./llvm-reduce-42f916.ll.eb98abab2c83.s ./llvm-reduce-42f916.ll.b7836d856206.s :

--- ./llvm-reduce-42f916.ll.eb98abab2c83.s	2023-07-19 10:38:05.571863413 -0700
+++ ./llvm-reduce-42f916.ll.b7836d856206.s	2023-07-19 10:39:47.587632219 -0700
@@ -23,14 +23,14 @@
 	.cfi_offset %r14, -32
 	.cfi_offset %r15, -24
 	movl	%r8d, %r14d
-	movq	%rcx, -64(%rbp)                 # 8-byte Spill
-	movq	%rdx, -56(%rbp)                 # 8-byte Spill
-	movq	%rsi, %r13
+	movq	%rcx, -56(%rbp)                 # 8-byte Spill
+	movq	%rdx, -48(%rbp)                 # 8-byte Spill
+	movq	%rsi, %r12
 	movq	%rdi, %r15
 	incl	%r14d
 	xorl	%ebx, %ebx
-                                        # implicit-def: $r12
-	movq	%rsi, -48(%rbp)                 # 8-byte Spill
+                                        # implicit-def: $rax
+                                        # kill: killed $rax
 	jmp	.LBB0_3
 	.p2align	4, 0x90
 .LBB0_1:                                #   in Loop: Header=BB0_3 Depth=1
@@ -41,41 +41,37 @@
 	xorl	%edi, %edi
 	callq	_Znwm@PLT
 	shlq	$4, %r15
-	addq	-64(%rbp), %r15                 # 8-byte Folded Reload
-	movq	%r12, %rcx
+	addq	-56(%rbp), %r15                 # 8-byte Folded Reload
+	movq	-64(%rbp), %rdx                 # 8-byte Reload
+	movq	%rdx, %rcx
 	shrq	$32, %rcx
 	movb	%cl, 12(%rax)
-	movl	%r12d, 8(%rax)
+	movl	%edx, 8(%rax)
 	movq	%r15, %rbx
 	movq	%r13, %r15
-	movq	-48(%rbp), %r13                 # 8-byte Reload
 	decl	%r14d
-	je	.LBB0_8
+	je	.LBB0_7
 .LBB0_3:                                # =>This Inner Loop Header: Depth=1
 	callq	_ZN5clang4Expr27IgnoreUnlessSpelledInSourceEv@PLT
-	cmpb	$-5, (%r13)
-	jae	.LBB0_5
+	cmpb	$-5, (%r12)
+	jb	.LBB0_5
 # %bb.4:                                #   in Loop: Header=BB0_3 Depth=1
-	movl	%r12d, %r12d
-	cmpq	%r15, %rbx
-	jbe	.LBB0_1
-	jmp	.LBB0_7
-	.p2align	4, 0x90
-.LBB0_5:                                #   in Loop: Header=BB0_3 Depth=1
 	movq	0, %rax
 	movq	8, %rax
-	movq	-56(%rbp), %r12                 # 8-byte Reload
+	movq	-48(%rbp), %rax                 # 8-byte Reload
+	movq	%rax, -64(%rbp)                 # 8-byte Spill
+.LBB0_5:                                #   in Loop: Header=BB0_3 Depth=1
 	cmpq	%r15, %rbx
 	jbe	.LBB0_1
-.LBB0_7:                                #   in Loop: Header=BB0_3 Depth=1
+# %bb.6:                                #   in Loop: Header=BB0_3 Depth=1
 	xorl	%eax, %eax
 	xorl	%ebx, %ebx
 	decl	%r14d
 	jne	.LBB0_3
-.LBB0_8:
+.LBB0_7:
 	cmpb	$0, 12(%rax)
-	jne	.LBB0_10
-# %bb.9:
+	jne	.LBB0_9
+# %bb.8:
 	addq	$24, %rsp
 	popq	%rbx
 	popq	%r12
@@ -85,7 +81,7 @@
 	popq	%rbp
 	.cfi_def_cfa %rsp, 8
 	retq
-.LBB0_10:
+.LBB0_9:
 	.cfi_def_cfa %rbp, 16
 	movq	%r15, %rdi
 	callq	__ubsan_handle_load_invalid_value_abort@PLT

The output does not differ between b7836d856206 and 645f6dcd69a5 (HEAD).

In D150388#4515524, @vitalybuka wrote:

My repro: [...]

Before this patch __ubsan_handle_load_invalid_value_abort was not reached, now it is.

I see the diff disappear if I make isCopyInstr skip isCopyInstrImpl. I'm not seeing anything clearly wrong with the codegen decisions here. The apparent block body removal in %bb.4, and the implicit_def; kill in the entry look suspicious, but I'm not seeing how it's wrong.

The isCopyInstrImpl path changes one spill weight because of an identity mov. That triggers a cascade of different spilling decisions, and you end up with different code.
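For readers wondering why an "identity mov" can matter at all: the baseline assembly contains `movl %r12d, %r12d`, which appears to correspond to `%15 = and i40 %11, 4294967295` in the repro. On x86-64, a write to a 32-bit register clears the upper 32 bits of the containing 64-bit register, so such an instruction is not a no-op on the full register even though source and destination have the same name. Below is a minimal C++ model of that behaviour. The function names are hypothetical, and this is not a reproduction of LLVM's own logic.

```cpp
#include <cassert>
#include <cstdint>

// Model of a 64-bit register-to-register move (movq): every bit is copied.
constexpr std::uint64_t mov64(std::uint64_t src) { return src; }

// Model of a 32-bit move (movl) as observed on the full 64-bit register:
// the low 32 bits are copied and the upper 32 bits are cleared.
constexpr std::uint64_t mov32(std::uint64_t src) {
  return static_cast<std::uint32_t>(src);
}

// A 64-bit "identity" move leaves a 40-bit value unchanged.
static_assert(mov64(0xAB12345678ULL) == 0xAB12345678ULL,
              "movq preserves all bits");

// A 32-bit "identity" move drops bits 32 and above,
// which is the same as masking with 4294967295.
static_assert(mov32(0xAB12345678ULL) == 0x12345678ULL,
              "movl clears the upper half");
static_assert(mov32(0xAB12345678ULL) == (0xAB12345678ULL & 4294967295ULL),
              "movl matches the and-mask in the IR");
```

Under this model, treating the 32-bit form as a plain copy of the 64-bit value would lose the masking step.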

If I step back and look at the original IR, it's branching on undef. From the entry in %bb, on the first branch to %bb.7, it's branching on undef here:

bb:
   br label %bb7

bb7:
 ...
  %i9 = phi i40 [ undef, %bb ], [ %i17, %bb5 ]
  ...
  %i13 = and i40 %i9, 4294967295
  br i1 %i12, label %bb16, label %bb14

Maybe this is just an artifact of reduction? If you try opt-bisect-limit, does it point at some other pass?
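For context, `-opt-bisect-limit=<N>` lets optional passes run only up to the N-th one, so the first limit at which the failure shows up points at the pass that exposes it. The search itself is an ordinary bisection. The sketch below assumes a hypothetical predicate `failsAt(limit)` that stands for "rebuild with this limit and rerun the failing test"; it is not part of any LLVM tool.

```cpp
#include <cassert>
#include <functional>

// Returns the smallest limit in [1, maxLimit] at which failsAt(limit) is true.
// Assumes limit 0 is good and that the failure persists for every limit at or
// above the first failing one. Returns 0 if maxLimit itself does not fail.
int firstFailingLimit(int maxLimit, const std::function<bool(int)> &failsAt) {
  if (maxLimit <= 0 || !failsAt(maxLimit))
    return 0;
  int lo = 0;        // highest limit known to be good
  int hi = maxLimit; // lowest limit known to fail
  while (hi - lo > 1) {
    int mid = lo + (hi - lo) / 2;
    if (failsAt(mid))
      hi = mid;
    else
      lo = mid;
  }
  return hi;
}
```

If the limit found this way names a pass other than the register allocator, that would support the idea that the reduced test case has drifted away from the original problem.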

In D150388#4516897, @arsenm wrote:

I see the diff disappear if I make isCopyInstr skip isCopyInstrImpl. I'm not seeing anything clearly wrong with the codegen decisions here. The apparent block body removal in %bb.4, and the implicit_def; kill in the entry look suspicious, but I'm not seeing how it's wrong.

I tried replacing the isCopyInstrImpl() call in isCopyInstr() with return std::nullopt; and the issue disappeared. I don't know how my experiment differs from what @asmok-g did, but this definitely helps. I tried this in two configurations:

llvm from f3d0613d852a90563a1e8704930a6e79368f106a + this commit (b7836d856206ec39509d42529f958c920368166b) - tests fail with UndefinedBehaviorSanitizer: invalid-bool-load (on a few instances of seemingly correct code), adding isCopyInstrImpl() -> std::nullopt makes tests pass;
llvm from 3cd3f11c174baa001b337b88c7a6507eb5705cf2 (already includes b7836d856206ec39509d42529f958c920368166b) - tests fail exactly as in the case above, adding isCopyInstrImpl() -> std::nullopt makes tests pass.

The patch I used:

--- llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -1039,11 +1039,11 @@
   /// target-dependent implementation.
   std::optional<DestSourcePair> isCopyInstr(const MachineInstr &MI) const {
     if (MI.isCopy()) {
       return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};
     }
-    return isCopyInstrImpl(MI);
+    return std::nullopt;
   }

   bool isFullCopyInstr(const MachineInstr &MI) const {
     auto DestSrc = isCopyInstr(MI);
     if (!DestSrc)

Is this change something that can be committed to the tree? Do you need a better test case from us?
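To make the effect of that experiment concrete, here is a small self-contained model of the dispatch shown in the patch: a non-virtual wrapper that accepts the generic COPY and otherwise defers to a target hook. `Instr`, `DestSrc`, `InstrInfoModel`, and `TargetWithMoves` are hypothetical stand-ins, not LLVM's types. Replacing the hook call with `std::nullopt` corresponds to using the base model below for every target.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a machine instruction.
struct Instr {
  bool IsGenericCopy; // the target-independent COPY
  bool IsTargetMove;  // a target-specific register-to-register move
  int Dest;
  int Src;
};

// Hypothetical stand-in for a destination/source operand pair.
struct DestSrc {
  int Dest;
  int Src;
};

struct InstrInfoModel {
  virtual ~InstrInfoModel() = default;

  // Same shape as the wrapper in the patch: generic COPY first, then the hook.
  std::optional<DestSrc> isCopyInstr(const Instr &MI) const {
    if (MI.IsGenericCopy)
      return DestSrc{MI.Dest, MI.Src};
    return isCopyInstrImpl(MI);
  }

protected:
  // By default no additional instructions are reported as copies.
  virtual std::optional<DestSrc> isCopyInstrImpl(const Instr &) const {
    return std::nullopt;
  }
};

// A target that reports its own move instructions as copies.
struct TargetWithMoves : InstrInfoModel {
protected:
  std::optional<DestSrc> isCopyInstrImpl(const Instr &MI) const override {
    if (MI.IsTargetMove)
      return DestSrc{MI.Dest, MI.Src};
    return std::nullopt;
  }
};
```

With the hook in place, a target move from a register to itself is reported as a copy; without it, only the generic COPY is.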

If I step back and look at the original IR, it's branching on undef. From the entry in %bb, on the first branch to %bb.7, it's branching on undef here:
bb:
   br label %bb7

bb7:
 ...
  %i9 = phi i40 [ undef, %bb ], [ %i17, %bb5 ]
  ...
  %i13 = and i40 %i9, 4294967295
  br i1 %i12, label %bb16, label %bb14
Maybe this is just an artifact of reduction? If you try opt-bisect-limit, does it point at some other pass?

I'm almost sure it's an artifact of reduction, but it's also not trivial to create an interestingness test that completely avoids this sort of a degradation of the test case to something depending on UB.

Not sure if this is useful when we already have IR, but here is my creduce output:

template <class a> void b(a &c, a p2) { c = p2; }
void *operator new(unsigned long, void *);
inline namespace e {
template <class... f> void *g(f... c) { return __builtin_operator_new(c...); }
void *h(unsigned c) { return g(c); }
int l;
template <class> struct aa;
template <class a, class = aa<a>> struct j;
template <class a> struct k {
  union {
    char ab;
    a b;
  };
  bool n;
  k() : ab(), n() {}
};
template <class a> struct ac : k<a> {
  bool m() { return this->n; }
};
template <class r> struct o {
  using ad = r::ae;
};
template <class af> struct ag {
  using ad = af::ah;
};
template <class af> struct s {
  using p = af;
  using ae = o<p>::ad;
  using ah = ag<p>::ad;
  template <class a, class... f> static void ai(p c, a p2, f &&...p3) {
    c.ai(p2, p3...);
  }
};
template <class q> struct aj {
  q ak;
};
int al;
template <class af> aj<typename af ::ae> am(af c) { return {c.allocate(al)}; }
template <class a> struct aa {
  a *allocate(int) { return static_cast<a *>(h(sizeof(a))); }
  typedef a *ae;
  typedef a *ah;
  template <class an, class... f> void ai(an *c, f &&...p2) {
    new (c) an(p2...);
  }
};
template <class a> struct ao {
  using ap = a &;
  template <class an> ao(an) : aq() {}
  ap ar() { return aq; }
  a aq;
};
template <class as> struct at : ao<as> {
  using au = ao<as>;
  template <class av, class aw> at(av c, aw) : au(c) {}
  au::ap ax() { return (*this).ar(); }
};
template <class ay> struct az {
  using p = ay;
  using ba = p;
  using bb = ba;
  using ae = bb::ae;
  ae bc;
  ae bd;
  ae be;
  at<ae> bf;
  az(unsigned, unsigned, ba &);
  ae &bg() { return bf.ax(); }
};
template <class ay> az<ay>::az(unsigned c, unsigned, ba &p3) : bf(nullptr, p3) {
  ba bh;
  auto bi = am(bh);
  bc = bi.ak;
  bd = be = bc;
  bg() = bc + c;
}
template <class a, class ay> struct j {
  typedef a bj;
  typedef ay p;
  typedef s<p> bb;
  typedef bb::ae ae;
  typedef bb::ah bk;
  bk begin() const noexcept;
  bk end() const noexcept;
  void bl(bj &&);
  ae bd;
  ae be;
  at<ae> bf = at<ae>(nullptr, int());
  template <class an> void bm(an &&);
  ae &bg() { return bf.ax(); }
};
template <class a, class ay> j<a, ay>::bk j<a, ay>::begin() const noexcept {
  return bd;
}
template <class a, class ay> j<a, ay>::bk j<a, ay>::end() const noexcept {
  return be;
}
template <class a, class ay> template <class an> void j<a, ay>::bm(an &&c) {
  p x;
  az bn(1, 0, x);
  a bo(c);
  bb::ai(x, bn.be, bo);
  bn.be++;
  b(bd, bn.bd);
  b(be, bn.be);
  ae bq = bn.bg();
  b(bg(), bq);
}
template <class a, class ay> void j<a, ay>::bl(bj &&c) {
  if (bg())
    ;
  else
    bm(c);
}
} // namespace e
template <typename bp> struct br {
  template <typename... bs> br(int, bs... p2) : bt(p2...) {}
  bp bt;
};
template <typename bp> struct bv : br<bp> {
  template <typename... bs> bv(int, bs &&...);
  template <typename bu> bv(bu c) : bv(l, c) {}
  bp &operator*() &;
};
template <typename bp>
template <typename... bs>
bv<bp>::bv(int, bs &&...p2) : br<bp>(l, p2...) {}
template <typename bp> bp &bv<bp>::operator*() & { return this->bt; }
namespace ca {
template <typename bw> struct bx {
  using by = bw;
};
template <typename bz, typename bw> struct cc {
  static bool cb(bw c) { return bz::cd(&c); }
};
template <typename, typename> struct ce;
template <typename bz, typename bw> struct ce<bz, bw *> {
  static bool cb(bw *c) { return cc<bz, bw>::cb(*c); }
};
template <typename, typename, typename> struct cf;
template <typename bz, typename ci> struct cf<bz, ci, ci> {
  static bool cb(ci c) { return ce<bz, ci>::cb(c); }
};
struct cg {
  using cm = int *;
};
struct cj {
  using cm = cg::cm;
};
struct ck {
  using cm = cj::cm;
};
template <class> struct cl;
template <class ci> struct cl<ci *> {
  static ck::cm cb(ci *c) { return (ck::cm)c; }
};
template <typename bz, typename bw> struct cn {
  static bool cr(bw c) { return cf<bz, bw, typename bx<bw>::by>::cb(c); }
};
template <typename bz, typename bw> struct co : cn<bz, bw> {
  using cq = co;
  using cp = ck::cm;
  static cp cv(bw c) {
    cp cs;
    if (!cq::cr(c))
      return 0;
    cs = cl<typename bx<bw>::by>::cb(c);
    return cs;
  }
};
template <typename bz, typename bw> auto ct(bw c) { return co<bz, bw>::cv(c); }
} // namespace ca
using ca::ct;
namespace ca {
template <typename cy> struct cu {
  cy a;
};
template <typename, typename, int...> struct db;
template <typename cw, typename cx, int da> struct db<cw, cx, da> {
  cx ch;
};
struct cz : db<cz, cu<void *>, 0> {};
} // namespace ca
namespace clang {
struct SourceManager;
struct dc {
  using dd = int;
  dd de;
};
struct dg {
  dc df;
  dc d;
};
struct CharSourceRange {
  dg dh;
  bool di;
  CharSourceRange(bool) {}
};
struct LangOptions;
struct u {
  ca::cu<ca::cz> a;
};
struct dj {
  enum dk { v };
  class {
    friend dj;
    unsigned dl;
  } w;
  dk z() { return dk(w.dl); }
};
struct dm : dj {
  u dn;
};
struct CallExpr : dm {
  unsigned y;
  unsigned getNumArgs() { return y; }
  dm *getArg(unsigned);
  static bool cd(dj *c) { return c->z() <= v; }
};
struct ASTContext {
  SourceManager &SourceMgr;
  LangOptions &LangOpts;
  SourceManager &getSourceManager() { return SourceMgr; }
  LangOptions &getLangOpts() { return LangOpts; }
};
namespace tooling {
ac<CharSourceRange> getFileRangeForEdit(const CharSourceRange &,
                                        const SourceManager &,
                                        const LangOptions &, bool);
ac<CharSourceRange> getFileRangeForEdit(CharSourceRange c, ASTContext p2,
                                        bool p3) {
  SourceManager &__trans_tmp_4 = p2.getSourceManager();
  LangOptions &__trans_tmp_5 = p2.getLangOpts();
  return getFileRangeForEdit(c, __trans_tmp_4, __trans_tmp_5, p3);
}
} // namespace tooling
struct CallArg2 {
  ac<CharSourceRange> std_move_call;
};
void ParseCallArgs3(CallExpr &c, ASTContext p2) {
  int num_args = c.getNumArgs();
  j<CallArg2> args;
  for (int i = 0; i < num_args; ++i) {
    dm __trans_tmp_6 = *c.getArg(i), arg = __trans_tmp_6;
    dm *std_move_call = nullptr;
    if (ct<CallExpr>(&arg))
      std_move_call = &arg;
    ac<CharSourceRange> std_move_range;
    if (std_move_call)
      std_move_range = tooling::getFileRangeForEdit(0, p2, true);
    args.bl({std_move_range});
  }
  bv<j<CallArg2>> arg2(args);
  for (auto t : *arg2)
    t.std_move_call.m();
}
extern "C" {
void *pcontext;
void *pcall;
void ParseCallArgs2() {
  ParseCallArgs3(*(CallExpr *)pcall, *(ASTContext *)pcontext);
}
}
} // namespace clang

This produces different assembly with and without the patch:
clang -cc1 -triple x86_64-grtev4-linux-gnu -S -target-cpu x86-64 -O1 -fsanitize=bool -x c++ test.cc -o - -w

And if linked into my binary produces UBsan report with D150388, and no report on D150388^.

@yassingh @arsenm
A major target has been broken for too long; we need either a fix or a revert, even if it's a chain of patches.

In D150388#4528824, @vitalybuka wrote:

@yassingh @arsenm
A major target has been broken for too long; we need either a fix or a revert, even if it's a chain of patches.

I just posted about this issue:
https://discourse.llvm.org/t/subreg-to-reg-semantics-or-x86s-zext-implementation-is-broken/72250

Either x86 is broken or the definition of SUBREG_TO_REG is broken

In D150388#4529229, @arsenm wrote:

I just posted about this issue:
https://discourse.llvm.org/t/subreg-to-reg-semantics-or-x86s-zext-implementation-is-broken/72250

Either x86 is broken or the definition of SUBREG_TO_REG is broken

So if we have no fix, just discussion, should we revert? Even if it's a pre-existing unknown bug, I don't think it's appropriate to keep a regression in existing functionality for the sake of fixing previously broken or missing functionality.
Is it possible to work around this, or revert part of the change, with "if (x86) do it the old way"?

In D150388#4529268, @vitalybuka wrote:

So if we have no fix, just discussion, should we revert? Even if it's a pre-existing unknown bug, I don't think it's appropriate to keep a regression in existing functionality for the sake of fixing previously broken or missing functionality.
Is it possible to work around this, or revert part of the change, with "if (x86) do it the old way"?

Let me prepare a patch to insert explicit zeroing here, and see whether it fixes the issue and how badly it blows up the lit suite.

In D150388#4529268, @vitalybuka wrote:

So if we have no fix, just discussion, should we revert? Even if it's a pre-existing unknown bug, I don't think it's appropriate to keep a regression in existing functionality for the sake of fixing previously broken or missing functionality.
Is it possible to work around this, or revert part of the change, with "if (x86) do it the old way"?

Try https://reviews.llvm.org/D156164 as a workaround. The high-bits-must-be-zero invariant disappears somewhere in the coalescer; this just hacks the resulting broken mov so that the problem is hidden, as it was before.
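
To make the invariant under discussion concrete, here is a minimal sketch. It is a toy model with hypothetical names (`Reg64`, `writeLow32ZeroExtending`, `writeLow32PreservingHigh`, `highBitsAreZero`), not LLVM code. It relies only on the x86-64 rule that a write to a 32-bit register zeroes bits 63:32 of the containing 64-bit register; the second helper models a copy that replaces the low half without that zeroing, which is the kind of situation where a consumer that assumes "high bits are zero" would read garbage.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one 64-bit register; hypothetical, not LLVM code.
struct Reg64 {
  std::uint64_t Bits = 0;
};

// Models an x86-64 32-bit register write: bits 63:32 become zero.
inline void writeLow32ZeroExtending(Reg64 &R, std::uint32_t V) {
  R.Bits = V;
}

// Models a write that replaces only bits 31:0 and keeps the old high bits.
inline void writeLow32PreservingHigh(Reg64 &R, std::uint32_t V) {
  R.Bits = (R.Bits & 0xFFFFFFFF00000000ULL) | V;
}

// The invariant a consumer relies on when it treats the 64-bit register as
// the zero-extension of its low 32 bits.
inline bool highBitsAreZero(const Reg64 &R) { return (R.Bits >> 32) == 0; }
```

In this model, only the zero-extending write re-establishes the invariant; a low-half write into a register that previously held unrelated data does not.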

In D150388#4529445, @arsenm wrote:

Try https://reviews.llvm.org/D156164 as a workaround. The high-bits-must-be-zero invariant disappears somewhere in the coalescer; this just hacks the resulting broken mov so that the problem is hidden, as it was before.

Much better, probably real fix incoming

In D150388#4529498, @arsenm wrote:

Much better, probably real fix incoming

There is still neither a fix nor even a workaround in the tree. I think the right solution in such cases is to revert to the last known good state and investigate asynchronously. I'm not saying this would be the best approach right now, given there is a workaround in review. But it would have been the right choice almost two weeks ago, when the issue was reported.

Can we have at least the workaround submitted?

In D150388#4533644, @alexfh wrote:

Can we have at least the workaround submitted?

Could use another review I think given that it should also be cherry picked to the release branch. I should have a more general fix tomorrow

In D150388#4533762, @arsenm wrote:

Could use another review I think given that it should also be cherry picked to the release branch. I should have a more general fix tomorrow

I'd be glad to help with the review, but I can't add more to the information @vitalybuka provided. Unfortunately I'm not an expert in the area and can't bring one in this case.

More generally speaking, this situation highlights the difference between reverting to a known good (*) state and fixing forward. When reverting, no thorough review is required and (if all dependent commits are reverted) we're guaranteed to return to a stable state reasonably quickly. This way, for each discovered problem there is a commit soon enough that fixes it. This approach makes any release process of an LLVM based toolchain (be it one of the toolchains released by Google, official LLVM releases, toolchains in Linux distros, other companies' private toolchains, etc.) easier and more predictable. On the other hand, an attempt to fix forward (unless it is a trivial one that would normally require no review at all) can take arbitrarily long to be prepared and reviewed, and can open a completely new can of worms, especially if it triggers an unobvious problem that takes another few days to detect and root-cause.

(*) well, in the worst case, there may be known issues that the problematic commit was meant to resolve, but if "revert and then investigate" approach is applied systematically, these would be long-standing issues with initially a much longer resolution time frame. And here I'm assuming that fixing an issue should not lead to regressions in this or other parts of the project (hopefully, there's no disagreement on this point).

I'm not saying anything new here, just trying to push for a bit more systematic application of the existing patch reversion policy (https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy) and, similarly, of the code review policy (https://llvm.org/docs/CodeReview.html#can-code-be-reviewed-after-it-is-committed).

A more proper fix is the stack ending at D156346.

I've also pushed the stack to https://github.com/arsenm/llvm-project/tree/fix-subreg-to-reg-liveness

vitalybuka mentioned this in D156346: CodeGen: Disable isCopyInstrImpl if there are implicit operands. Jul 26 2023, 3:29 PM

vitalybuka added a reverting change: D156381: Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting". Jul 26 2023, 4:00 PM

alexfh mentioned this in D156381: Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting". Jul 26 2023, 5:53 PM

vitalybuka added a reverting change: rGa496c8be6e63: Revert "[CodeGen]Allow targets to use target specific COPY instructions for…. Jul 26 2023, 10:13 PM

vitalybuka reopened this revision. Jul 26 2023, 10:15 PM

This revision is now accepted and ready to land. Jul 26 2023, 10:15 PM

Patch relanded with 4d42e8b5d1fa87e49768d100dd1bc53515391e89

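Revision Contents

Path	Size
llvm/include/llvm/CodeGen/TargetInstrInfo.h	17 lines
llvm/lib/CodeGen/CalcSpillWeights.cpp	15 lines
llvm/lib/CodeGen/InlineSpiller.cpp	29 lines
llvm/lib/CodeGen/LiveRangeEdit.cpp	3 lines
llvm/lib/CodeGen/LiveRangeShrink.cpp	4 lines
llvm/lib/CodeGen/RegAllocGreedy.cpp	33 lines
llvm/lib/CodeGen/SplitKit.h	7 lines
llvm/lib/CodeGen/SplitKit.cpp	17 lines
llvm/lib/CodeGen/TargetInstrInfo.cpp	7 lines
llvm/test/CodeGen/Mips/madd-msub.ll	84 lines
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll	15 lines
llvm/test/CodeGen/X86/GlobalISel/add-ext.ll	4 lines
llvm/test/CodeGen/X86/dagcombine-cse.ll	20 lines
llvm/test/CodeGen/X86/fold-and-shift-x86_64.ll	4 lines
llvm/test/CodeGen/X86/unfold-masked-merge-scalar-constmask-lowhigh.ll	8 lines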

Diff 534897

llvm/include/llvm/CodeGen/TargetInstrInfo.h

[... 1,038 lines not shown ...]	public:
/// target-dependent implementation.		/// target-dependent implementation.
std::optional<DestSourcePair> isCopyInstr(const MachineInstr &MI) const {		std::optional<DestSourcePair> isCopyInstr(const MachineInstr &MI) const {
if (MI.isCopy()) {		if (MI.isCopy()) {
return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};		return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};
}		}
return isCopyInstrImpl(MI);		return isCopyInstrImpl(MI);
}		}

		bool isFullCopyInstr(const MachineInstr &MI) const {
		auto DestSrc = isCopyInstr(MI);
		if (!DestSrc)
		return false;
		qcolombet (Not Done): Instead of checking operand 0 and 1 directly, we should get the pair from `isCopyInstr` and check these.

		const MachineOperand *DestRegOp = DestSrc->Destination;
		const MachineOperand *SrcRegOp = DestSrc->Source;
		return !DestRegOp->getSubReg() && !SrcRegOp->getSubReg();
		}

/// If the specific machine instruction is an instruction that adds an		/// If the specific machine instruction is an instruction that adds an
/// immediate value and a physical register, and stores the result in		/// immediate value and a physical register, and stores the result in
/// the given physical register \c Reg, return a pair of the source		/// the given physical register \c Reg, return a pair of the source
/// register and the offset which has been added.		/// register and the offset which has been added.
virtual std::optional<RegImmPair> isAddImmediate(const MachineInstr &MI,		virtual std::optional<RegImmPair> isAddImmediate(const MachineInstr &MI,
Register Reg) const {		Register Reg) const {
return std::nullopt;		return std::nullopt;
}		}
[... 894 lines not shown ...]	public:

/// True if the instruction is bound to the top of its basic block and no		/// True if the instruction is bound to the top of its basic block and no
/// other instructions shall be inserted before it. This can be implemented		/// other instructions shall be inserted before it. This can be implemented
/// to prevent register allocator to insert spills before such instructions.		/// to prevent register allocator to insert spills before such instructions.
virtual bool isBasicBlockPrologue(const MachineInstr &MI) const {		virtual bool isBasicBlockPrologue(const MachineInstr &MI) const {
return false;		return false;
}		}

		/// Allows targets to use appropriate copy instruction while splitting live
		arsenm (Not Done): Register should be passed by value. MRI should also be const. Needs a doc comment. I'd also rename this to something like getLiveRangeSplitOpcode?
		/// range of a register in register allocation.
		virtual unsigned getLiveRangeSplitOpcode(Register Reg,
		const MachineFunction &MF) const {
		arsenm (Done): MRI should be const
		cdevadas (Done): Can you make this function take MachineFunction& as the second argument in the first place? You are changing it in D143762.
		return TargetOpcode::COPY;
		}

/// During PHI eleimination lets target to make necessary checks and		/// During PHI eleimination lets target to make necessary checks and
/// insert the copy to the PHI destination register in a target specific		/// insert the copy to the PHI destination register in a target specific
/// manner.		/// manner.
virtual MachineInstr *createPHIDestinationCopy(		virtual MachineInstr *createPHIDestinationCopy(
MachineBasicBlock &MBB, MachineBasicBlock::iterator InsPt,		MachineBasicBlock &MBB, MachineBasicBlock::iterator InsPt,
const DebugLoc &DL, Register Src, Register Dst) const {		const DebugLoc &DL, Register Src, Register Dst) const {
return BuildMI(MBB, InsPt, DL, get(TargetOpcode::COPY), Dst)		return BuildMI(MBB, InsPt, DL, get(TargetOpcode::COPY), Dst)
.addReg(Src);		.addReg(Src);
[... 187 lines not shown ...]
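
As a rough illustration of how a target might use the new hook, the sketch below models it with stand-in types. Everything here is hypothetical (`TargetInstrInfoModel`, `VectorTargetInstrInfoModel`, `WHOLE_VECTOR_COPY`, `RegKind`, `pickSplitOpcode`); the real hook takes LLVM's `Register` and `MachineFunction`, and a real target would decide based on its own register classes. The only behavior taken from the diff is that the default returns `COPY`.

```cpp
#include <cassert>

// Stand-ins for LLVM types and opcodes; all names are hypothetical mocks.
enum Opcode : unsigned { COPY = 1, WHOLE_VECTOR_COPY = 2 };
enum class RegKind { Scalar, Vector };
struct Register {
  unsigned Id;
  RegKind Kind;
};
struct MachineFunction {};

struct TargetInstrInfoModel {
  virtual ~TargetInstrInfoModel() = default;
  // Default mirrors the hook in the diff: plain COPY for every register.
  virtual unsigned getLiveRangeSplitOpcode(Register,
                                           const MachineFunction &) const {
    return COPY;
  }
};

// A target that wants a dedicated pseudo when splitting vector registers.
struct VectorTargetInstrInfoModel : TargetInstrInfoModel {
  unsigned getLiveRangeSplitOpcode(Register Reg,
                                   const MachineFunction &) const override {
    return Reg.Kind == RegKind::Vector ? WHOLE_VECTOR_COPY : COPY;
  }
};

// What a splitter would do: ask the target which opcode to emit.
inline unsigned pickSplitOpcode(const TargetInstrInfoModel &TII, Register Reg,
                                const MachineFunction &MF) {
  return TII.getLiveRangeSplitOpcode(Reg, MF);
}
```

Targets that do not override the hook keep the existing behavior, which is why the default returns the generic opcode.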

llvm/lib/CodeGen/CalcSpillWeights.cpp

[... 91 lines not shown ...]	if (VNI->isPHIDef())
return false;		return false;

MachineInstr *MI = LIS.getInstructionFromIndex(VNI->def);		MachineInstr *MI = LIS.getInstructionFromIndex(VNI->def);
assert(MI && "Dead valno in interval");		assert(MI && "Dead valno in interval");

// Trace copies introduced by live range splitting. The inline		// Trace copies introduced by live range splitting. The inline
// spiller can rematerialize through these copies, so the spill		// spiller can rematerialize through these copies, so the spill
// weight must reflect this.		// weight must reflect this.
while (MI->isFullCopy()) {		while (TII.isFullCopyInstr(*MI)) {
// The copy destination must match the interval register.		// The copy destination must match the interval register.
if (MI->getOperand(0).getReg() != Reg)		if (MI->getOperand(0).getReg() != Reg)
return false;		return false;

// Get the source register.		// Get the source register.
Reg = MI->getOperand(1).getReg();		Reg = MI->getOperand(1).getReg();

// If the original (pre-splitting) registers match this		// If the original (pre-splitting) registers match this
[... 110 lines not shown ...]	for (MachineRegisterInfo::reg_instr_nodbg_iterator

// For local split artifacts, we are interested only in instructions between		// For local split artifacts, we are interested only in instructions between
// the expected start and end of the range.		// the expected start and end of the range.
SlotIndex SI = LIS.getInstructionIndex(*MI);		SlotIndex SI = LIS.getInstructionIndex(*MI);
if (IsLocalSplitArtifact && ((SI < Start) || (SI > End)))		if (IsLocalSplitArtifact && ((SI < Start) || (SI > End)))
continue;		continue;

NumInstr++;		NumInstr++;
if (MI->isIdentityCopy() || MI->isImplicitDef())		bool identityCopy = false;
		auto DestSrc = TII.isCopyInstr(*MI);
		qcolombet (Done): Similar comment, everywhere we introduce a `isCopyInstr`, we need to check the returned pair.
		if (DestSrc) {
		const MachineOperand *DestRegOp = DestSrc->Destination;
		const MachineOperand *SrcRegOp = DestSrc->Source;
		identityCopy = DestRegOp->getReg() == SrcRegOp->getReg() &&
		DestRegOp->getSubReg() == SrcRegOp->getSubReg();
		}

		if (identityCopy || MI->isImplicitDef())
continue;		continue;
if (!Visited.insert(MI).second)		if (!Visited.insert(MI).second)
continue;		continue;

// For terminators that produce values, ask the backend if the register is		// For terminators that produce values, ask the backend if the register is
// not spillable.		// not spillable.
if (TII.isUnspillableTerminator(MI) && MI->definesRegister(LI.reg())) {		if (TII.isUnspillableTerminator(MI) && MI->definesRegister(LI.reg())) {
LI.markNotSpillable();		LI.markNotSpillable();
[... 17 lines not shown ...]	if (IsSpillable) {
// Give extra weight to what looks like a loop induction variable update.		// Give extra weight to what looks like a loop induction variable update.
if (Writes && IsExiting && LIS.isLiveOutOfMBB(LI, MBB))		if (Writes && IsExiting && LIS.isLiveOutOfMBB(LI, MBB))
Weight *= 3;		Weight *= 3;

TotalWeight += Weight;		TotalWeight += Weight;
}		}

// Get allocation hints from copies.		// Get allocation hints from copies.
if (!MI->isCopy())		if (!TII.isCopyInstr(*MI))
continue;		continue;
Register HintReg = copyHint(MI, LI.reg(), TRI, MRI);		Register HintReg = copyHint(MI, LI.reg(), TRI, MRI);
if (!HintReg)		if (!HintReg)
continue;		continue;
// Force hweight onto the stack so that x86 doesn't add hidden precision,		// Force hweight onto the stack so that x86 doesn't add hidden precision,
// making the comparison incorrectly pass (i.e., 1 > 1 == true??).		// making the comparison incorrectly pass (i.e., 1 > 1 == true??).
//		//
// FIXME: we probably shouldn't use floats at all.		// FIXME: we probably shouldn't use floats at all.
[... 54 lines not shown ...]
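
The two predicates this file now derives from `isCopyInstr` can be sketched in isolation. This is a minimal model under stated assumptions: `Operand`, `Instr`, and the free function `isCopyInstr` are hypothetical mocks standing in for `MachineOperand`, `MachineInstr`, and `TargetInstrInfo::isCopyInstr`, and `isIdentityCopyInstr` is an illustrative name for the inline check in the diff, not an existing LLVM function.

```cpp
#include <cassert>
#include <optional>

// Hypothetical mocks of MachineOperand, DestSourcePair and MachineInstr.
struct Operand {
  unsigned Reg = 0;
  unsigned SubReg = 0;
};
struct DestSourcePair {
  const Operand *Destination;
  const Operand *Source;
};
struct Instr {
  bool IsCopyLike = false; // true for COPY or a target-specific copy
  Operand Dst, Src;
};

// Stand-in for TII.isCopyInstr(MI): a pair for copy-like instructions only.
inline std::optional<DestSourcePair> isCopyInstr(const Instr &MI) {
  if (!MI.IsCopyLike)
    return std::nullopt;
  return DestSourcePair{&MI.Dst, &MI.Src};
}

// Full copy: copy-like and neither operand names a sub-register.
inline bool isFullCopyInstr(const Instr &MI) {
  auto DestSrc = isCopyInstr(MI);
  if (!DestSrc)
    return false;
  return !DestSrc->Destination->SubReg && !DestSrc->Source->SubReg;
}

// Identity copy: same register and same sub-register on both sides.
inline bool isIdentityCopyInstr(const Instr &MI) {
  auto DestSrc = isCopyInstr(MI);
  if (!DestSrc)
    return false;
  return DestSrc->Destination->Reg == DestSrc->Source->Reg &&
         DestSrc->Destination->SubReg == DestSrc->Source->SubReg;
}
```

Both checks read the operands from the returned pair rather than from fixed operand indices, which is the point the reviewers raise for target-specific copies.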

llvm/lib/CodeGen/InlineSpiller.cpp

[... 250 lines not shown ...]
// When spilling a virtual register, we also spill any snippets it is connected		// When spilling a virtual register, we also spill any snippets it is connected
// to. The snippets are small live ranges that only have a single real use,		// to. The snippets are small live ranges that only have a single real use,
// leftovers from live range splitting. Spilling them enables memory operand		// leftovers from live range splitting. Spilling them enables memory operand
// folding or tightens the live range around the single use.		// folding or tightens the live range around the single use.
//		//
// This minimizes register pressure and maximizes the store-to-load distance for		// This minimizes register pressure and maximizes the store-to-load distance for
// spill slots which can be important in tight loops.		// spill slots which can be important in tight loops.

/// If MI is a COPY to or from Reg, return the other register, otherwise return		/// isFullCopyOf - If MI is a COPY to or from Reg, return the other register,
/// 0.		/// otherwise return 0.
static Register isCopyOf(const MachineInstr &MI, Register Reg) {		static Register isCopyOf(const MachineInstr &MI, Register Reg,
assert(!MI.isBundled());		const TargetInstrInfo &TII) {
if (!MI.isCopy())		if (!TII.isCopyInstr(MI))
return Register();		return Register();

const MachineOperand &DstOp = MI.getOperand(0);		const MachineOperand &DstOp = MI.getOperand(0);
const MachineOperand &SrcOp = MI.getOperand(1);		const MachineOperand &SrcOp = MI.getOperand(1);

// TODO: Probably only worth allowing subreg copies with undef dests.		// TODO: Probably only worth allowing subreg copies with undef dests.
if (DstOp.getSubReg() != SrcOp.getSubReg())		if (DstOp.getSubReg() != SrcOp.getSubReg())
return Register();		return Register();
if (DstOp.getReg() == Reg)		if (DstOp.getReg() == Reg)
return SrcOp.getReg();		return SrcOp.getReg();
if (SrcOp.getReg() == Reg)		if (SrcOp.getReg() == Reg)
return DstOp.getReg();		return DstOp.getReg();
return Register();		return Register();
}		}

/// Check for a copy bundle as formed by SplitKit.		/// Check for a copy bundle as formed by SplitKit.
static Register isCopyOfBundle(const MachineInstr &FirstMI, Register Reg) {		static Register isCopyOfBundle(const MachineInstr &FirstMI, Register Reg,
		const TargetInstrInfo &TII) {
if (!FirstMI.isBundled())		if (!FirstMI.isBundled())
return isCopyOf(FirstMI, Reg);		return isCopyOf(FirstMI, Reg, TII);

assert(!FirstMI.isBundledWithPred() && FirstMI.isBundledWithSucc() &&		assert(!FirstMI.isBundledWithPred() && FirstMI.isBundledWithSucc() &&
"expected to see first instruction in bundle");		"expected to see first instruction in bundle");

Register SnipReg;		Register SnipReg;
MachineBasicBlock::const_instr_iterator I = FirstMI.getIterator();		MachineBasicBlock::const_instr_iterator I = FirstMI.getIterator();
while (I->isBundledWithSucc()) {		while (I->isBundledWithSucc()) {
const MachineInstr &MI = *I;		const MachineInstr &MI = *I;
if (!MI.isCopy())		if (!TII.isCopyInstr(MI))
return Register();		return Register();

const MachineOperand &DstOp = MI.getOperand(0);		const MachineOperand &DstOp = MI.getOperand(0);
const MachineOperand &SrcOp = MI.getOperand(1);		const MachineOperand &SrcOp = MI.getOperand(1);
if (DstOp.getReg() == Reg) {		if (DstOp.getReg() == Reg) {
if (!SnipReg)		if (!SnipReg)
SnipReg = SrcOp.getReg();		SnipReg = SrcOp.getReg();
else if (SnipReg != SrcOp.getReg())		else if (SnipReg != SrcOp.getReg())
[... 53 lines not shown ...]	bool InlineSpiller::isSnippet(const LiveInterval &SnipLI) {
// Check that all uses satisfy our criteria.		// Check that all uses satisfy our criteria.
for (MachineRegisterInfo::reg_bundle_nodbg_iterator		for (MachineRegisterInfo::reg_bundle_nodbg_iterator
RI = MRI.reg_bundle_nodbg_begin(SnipLI.reg()),		RI = MRI.reg_bundle_nodbg_begin(SnipLI.reg()),
E = MRI.reg_bundle_nodbg_end();		E = MRI.reg_bundle_nodbg_end();
RI != E;) {		RI != E;) {
MachineInstr &MI = *RI++;		MachineInstr &MI = *RI++;

// Allow copies to/from Reg.		// Allow copies to/from Reg.
if (isCopyOfBundle(MI, Reg))		if (isCopyOfBundle(MI, Reg, TII))
continue;		continue;

// Allow stack slot loads.		// Allow stack slot loads.
int FI;		int FI;
if (SnipLI.reg() == TII.isLoadFromStackSlot(MI, FI) && FI == StackSlot)		if (SnipLI.reg() == TII.isLoadFromStackSlot(MI, FI) && FI == StackSlot)
continue;		continue;

// Allow stack slot stores.		// Allow stack slot stores.
[... 21 lines not shown ...]	void InlineSpiller::collectRegsToSpill() {
SnippetCopies.clear();		SnippetCopies.clear();

// Snippets all have the same original, so there can't be any for an original		// Snippets all have the same original, so there can't be any for an original
// register.		// register.
if (Original == Reg)		if (Original == Reg)
return;		return;

for (MachineInstr &MI : llvm::make_early_inc_range(MRI.reg_bundles(Reg))) {		for (MachineInstr &MI : llvm::make_early_inc_range(MRI.reg_bundles(Reg))) {
Register SnipReg = isCopyOfBundle(MI, Reg);		Register SnipReg = isCopyOfBundle(MI, Reg, TII);
if (!isSibling(SnipReg))		if (!isSibling(SnipReg))
continue;		continue;
LiveInterval &SnipLI = LIS.getInterval(SnipReg);		LiveInterval &SnipLI = LIS.getInterval(SnipReg);
if (!isSnippet(SnipLI))		if (!isSnippet(SnipLI))
continue;		continue;
SnippetCopies.insert(&MI);		SnippetCopies.insert(&MI);
if (isRegToSpill(SnipReg))		if (isRegToSpill(SnipReg))
continue;		continue;
[... 106 lines not shown ...]	do {

// Add all of VNI's live range to StackInt.		// Add all of VNI's live range to StackInt.
StackInt->MergeValueInAsValue(*LI, VNI, StackInt->getValNumInfo(0));		StackInt->MergeValueInAsValue(*LI, VNI, StackInt->getValNumInfo(0));
LLVM_DEBUG(dbgs() << "Merged to stack int: " << *StackInt << '\n');		LLVM_DEBUG(dbgs() << "Merged to stack int: " << *StackInt << '\n');

// Find all spills and copies of VNI.		// Find all spills and copies of VNI.
for (MachineInstr &MI :		for (MachineInstr &MI :
llvm::make_early_inc_range(MRI.use_nodbg_bundles(Reg))) {		llvm::make_early_inc_range(MRI.use_nodbg_bundles(Reg))) {
if (!MI.isCopy() && !MI.mayStore())		if (!MI.mayStore() && !TII.isCopyInstr(MI))
		arsenm (Done): swap these checks
continue;		continue;
SlotIndex Idx = LIS.getInstructionIndex(MI);		SlotIndex Idx = LIS.getInstructionIndex(MI);
if (LI->getVNInfoAt(Idx) != VNI)		if (LI->getVNInfoAt(Idx) != VNI)
continue;		continue;

// Follow sibling copies down the dominator tree.		// Follow sibling copies down the dominator tree.
if (Register DstReg = isCopyOfBundle(MI, Reg)) {		if (Register DstReg = isCopyOfBundle(MI, Reg, TII)) {
if (isSibling(DstReg)) {		if (isSibling(DstReg)) {
LiveInterval &DstLI = LIS.getInterval(DstReg);		LiveInterval &DstLI = LIS.getInterval(DstReg);
VNInfo *DstVNI = DstLI.getVNInfoAt(Idx.getRegSlot());		VNInfo *DstVNI = DstLI.getVNInfoAt(Idx.getRegSlot());
assert(DstVNI && "Missing defined value");		assert(DstVNI && "Missing defined value");
assert(DstVNI->def == Idx.getRegSlot() && "Wrong copy def slot");		assert(DstVNI->def == Idx.getRegSlot() && "Wrong copy def slot");

WorkList.push_back(std::make_pair(&DstLI, DstVNI));		WorkList.push_back(std::make_pair(&DstLI, DstVNI));
}		}
[... 327 lines not shown ...]	foldMemoryOperand(ArrayRef<std::pair<MachineInstr *, unsigned>> Ops,
MachineInstr *LoadMI) {		MachineInstr *LoadMI) {
if (Ops.empty())		if (Ops.empty())
return false;		return false;
// Don't attempt folding in bundles.		// Don't attempt folding in bundles.
MachineInstr *MI = Ops.front().first;		MachineInstr *MI = Ops.front().first;
if (Ops.back().first != MI || MI->isBundled())		if (Ops.back().first != MI || MI->isBundled())
return false;		return false;

bool WasCopy = MI->isCopy();		bool WasCopy = TII.isCopyInstr(*MI).has_value();
		arsenm (Done): Don't need ? true : false
		yassingh (Author, Done): isCopyInstr() returns std::optional. Should I use 'auto' to get rid of the ternary operator?
		arsenm (Not Done): or .has_value
Register ImpReg;		Register ImpReg;

// TII::foldMemoryOperand will do what we need here for statepoint		// TII::foldMemoryOperand will do what we need here for statepoint
// (fold load into use and remove corresponding def). We will replace		// (fold load into use and remove corresponding def). We will replace
// uses of removed def with loads (spillAroundUses).		// uses of removed def with loads (spillAroundUses).
// For that to work we need to untie def and use to pass it through		// For that to work we need to untie def and use to pass it through
// foldMemoryOperand and signal foldPatchpoint that it is allowed to		// foldMemoryOperand and signal foldPatchpoint that it is allowed to
// fold them.		// fold them.
[... 268 lines not shown ...]	for (MachineInstr &MI : llvm::make_early_inc_range(MRI.reg_bundles(Reg))) {
// Find the slot index where this instruction reads and writes OldLI.		// Find the slot index where this instruction reads and writes OldLI.
// This is usually the def slot, except for tied early clobbers.		// This is usually the def slot, except for tied early clobbers.
SlotIndex Idx = LIS.getInstructionIndex(MI).getRegSlot();		SlotIndex Idx = LIS.getInstructionIndex(MI).getRegSlot();
if (VNInfo *VNI = OldLI.getVNInfoAt(Idx.getRegSlot(true)))		if (VNInfo *VNI = OldLI.getVNInfoAt(Idx.getRegSlot(true)))
if (SlotIndex::isSameInstr(Idx, VNI->def))		if (SlotIndex::isSameInstr(Idx, VNI->def))
Idx = VNI->def;		Idx = VNI->def;

// Check for a sibling copy.		// Check for a sibling copy.
Register SibReg = isCopyOfBundle(MI, Reg);		Register SibReg = isCopyOfBundle(MI, Reg, TII);
if (SibReg && isSibling(SibReg)) {		if (SibReg && isSibling(SibReg)) {
// This may actually be a copy between snippets.		// This may actually be a copy between snippets.
if (isRegToSpill(SibReg)) {		if (isRegToSpill(SibReg)) {
LLVM_DEBUG(dbgs() << "Found new snippet copy: " << MI);		LLVM_DEBUG(dbgs() << "Found new snippet copy: " << MI);
SnippetCopies.insert(&MI);		SnippetCopies.insert(&MI);
continue;		continue;
}		}
if (RI.Writes) {		if (RI.Writes) {
[... 525 lines not shown ...]
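
The review comments on `isCopyInstr` suggest reading the operands from the returned pair. A minimal sketch of `isCopyOf` written that way follows, together with a bundle walk that tests each bundled instruction. It uses hypothetical mocks (`Operand`, `Instr`, a free `isCopyInstr`, a `std::vector` standing in for a bundle, and the illustrative helper `allMembersAreCopies`); it is not the code that landed.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical mocks of MachineOperand, DestSourcePair and MachineInstr.
struct Operand {
  unsigned Reg = 0;
  unsigned SubReg = 0;
};
struct DestSourcePair {
  const Operand *Destination;
  const Operand *Source;
};
struct Instr {
  bool IsCopyLike = false; // true for COPY or a target-specific copy
  Operand Dst, Src;
};

// Stand-in for TII.isCopyInstr(MI).
inline std::optional<DestSourcePair> isCopyInstr(const Instr &MI) {
  if (!MI.IsCopyLike)
    return std::nullopt;
  return DestSourcePair{&MI.Dst, &MI.Src};
}

// If MI copies to or from Reg, return the other register, otherwise 0.
// Operands come from the pair, not from fixed operand positions.
inline unsigned isCopyOf(const Instr &MI, unsigned Reg) {
  auto DestSrc = isCopyInstr(MI);
  if (!DestSrc)
    return 0;
  const Operand &DstOp = *DestSrc->Destination;
  const Operand &SrcOp = *DestSrc->Source;
  if (DstOp.SubReg != SrcOp.SubReg)
    return 0;
  if (DstOp.Reg == Reg)
    return SrcOp.Reg;
  if (SrcOp.Reg == Reg)
    return DstOp.Reg;
  return 0;
}

// A bundle qualifies only if every member is itself copy-like.
inline bool allMembersAreCopies(const std::vector<Instr> &Bundle) {
  for (const Instr &MI : Bundle)
    if (!isCopyInstr(MI))
      return false;
  return true;
}
```

For a target copy whose destination and source are not operands 0 and 1, only the pair-based form returns the right registers.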

llvm/lib/CodeGen/LiveRangeEdit.cpp

[... 346 lines not shown ...]	if (!Reg.isVirtual()) {
continue;		continue;
}		}
LiveInterval &LI = LIS.getInterval(Reg);		LiveInterval &LI = LIS.getInterval(Reg);

// Shrink read registers, unless it is likely to be expensive and		// Shrink read registers, unless it is likely to be expensive and
// unlikely to change anything. We typically don't want to shrink the		// unlikely to change anything. We typically don't want to shrink the
// PIC base register that has lots of uses everywhere.		// PIC base register that has lots of uses everywhere.
// Always shrink COPY uses that probably come from live range splitting.		// Always shrink COPY uses that probably come from live range splitting.
if ((MI->readsVirtualRegister(Reg) && (MI->isCopy() || MO.isDef())) ||		if ((MI->readsVirtualRegister(Reg) &&
		(MO.isDef() || TII.isCopyInstr(*MI))) ||
		arsenm (Done): swap these
(MO.readsReg() && (MRI.hasOneNonDBGUse(Reg) || useIsKill(LI, MO))))		(MO.readsReg() && (MRI.hasOneNonDBGUse(Reg) || useIsKill(LI, MO))))
ToShrink.insert(&LI);		ToShrink.insert(&LI);
else if (MO.readsReg())		else if (MO.readsReg())
HasLiveVRegUses = true;		HasLiveVRegUses = true;

// Remove defined value.		// Remove defined value.
if (MO.isDef()) {		if (MO.isDef()) {
if (TheDelegate && LI.getVNInfoAt(Idx) != nullptr)		if (TheDelegate && LI.getVNInfoAt(Idx) != nullptr)
[... 145 lines not shown ...]

llvm/lib/CodeGen/LiveRangeShrink.cpp

[... 17 lines not shown ...]
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/iterator_range.h"		#include "llvm/ADT/iterator_range.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"		#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
		#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <iterator>		#include <iterator>
#include <utility>		#include <utility>

using namespace llvm;		using namespace llvm;
[... 70 lines not shown ...]	for (MachineInstr &I : make_range(Start, Start->getParent()->end()))
M[&I] = i++;		M[&I] = i++;
}		}

bool LiveRangeShrink::runOnMachineFunction(MachineFunction &MF) {		bool LiveRangeShrink::runOnMachineFunction(MachineFunction &MF) {
if (skipFunction(MF.getFunction()))		if (skipFunction(MF.getFunction()))
return false;		return false;

MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
		const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();

LLVM_DEBUG(dbgs() << "**** Analysing " << MF.getName() << '\n');		LLVM_DEBUG(dbgs() << "**** Analysing " << MF.getName() << '\n');

InstOrderMap IOM;		InstOrderMap IOM;
// Map from register to instruction order (value of IOM) where the		// Map from register to instruction order (value of IOM) where the
// register is used last. When moving instructions up, we need to		// register is used last. When moving instructions up, we need to
// make sure all its defs (including dead def) will not cross its		// make sure all its defs (including dead def) will not cross its
// last use when moving up.		// last use when moving up.
[... 72 lines not shown ...]	for (MachineBasicBlock::iterator Next = MBB.begin(); Next != MBB.end();) {
} else if (MRI.hasOneNonDBGUse(Reg) && MRI.hasOneDef(Reg) && DefMO &&		} else if (MRI.hasOneNonDBGUse(Reg) && MRI.hasOneDef(Reg) && DefMO &&
MRI.getRegClass(DefMO->getReg()) ==		MRI.getRegClass(DefMO->getReg()) ==
MRI.getRegClass(MO.getReg())) {		MRI.getRegClass(MO.getReg())) {
// The heuristic does not handle different register classes yet		// The heuristic does not handle different register classes yet
// (registers of different sizes, looser/tighter constraints). This		// (registers of different sizes, looser/tighter constraints). This
// is because it needs more accurate model to handle register		// is because it needs more accurate model to handle register
// pressure correctly.		// pressure correctly.
MachineInstr &DefInstr = *MRI.def_instr_begin(Reg);		MachineInstr &DefInstr = *MRI.def_instr_begin(Reg);
if (!DefInstr.isCopy())		if (!TII.isCopyInstr(DefInstr))
		arsenm (Done): Move this to the function prolog
NumEligibleUse++;		NumEligibleUse++;
Insert = FindDominatedInstruction(DefInstr, Insert, IOM);		Insert = FindDominatedInstruction(DefInstr, Insert, IOM);
} else {		} else {
Insert = nullptr;		Insert = nullptr;
break;		break;
}		}
}		}

Show All 37 Lines

llvm/lib/CodeGen/RegAllocGreedy.cpp

Show First 20 Lines • Show All 1,276 Lines • ▼ Show 20 Lines	static LaneBitmask getInstReadLaneMask(const MachineRegisterInfo &MRI,

return Mask;		return Mask;
}		}

/// Return true if \p MI at \P Use reads a subset of the lanes live in \p		/// Return true if \p MI at \P Use reads a subset of the lanes live in \p
/// VirtReg.		/// VirtReg.
static bool readsLaneSubset(const MachineRegisterInfo &MRI,		static bool readsLaneSubset(const MachineRegisterInfo &MRI,
const MachineInstr *MI, const LiveInterval &VirtReg,		const MachineInstr *MI, const LiveInterval &VirtReg,
const TargetRegisterInfo *TRI, SlotIndex Use) {		const TargetRegisterInfo *TRI, SlotIndex Use,
		const TargetInstrInfo *TII) {
// Early check the common case.		// Early check the common case.
if (MI->isCopy() &&		auto DestSrc = TII->isCopyInstr(*MI);
MI->getOperand(0).getSubReg() == MI->getOperand(1).getSubReg())		if (DestSrc &&
		DestSrc->Destination->getSubReg() == DestSrc->Source->getSubReg())
return false;		return false;

// FIXME: We're only considering uses, but should be consider defs too?		// FIXME: We're only considering uses, but should be consider defs too?
LaneBitmask ReadMask = getInstReadLaneMask(MRI, TRI, MI, VirtReg.reg());		LaneBitmask ReadMask = getInstReadLaneMask(MRI, TRI, MI, VirtReg.reg());

LaneBitmask LiveAtMask;		LaneBitmask LiveAtMask;
for (const LiveInterval::SubRange &S : VirtReg.subranges()) {		for (const LiveInterval::SubRange &S : VirtReg.subranges()) {
if (S.liveAt(Use))		if (S.liveAt(Use))
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	unsigned RAGreedy::tryInstructionSplit(const LiveInterval &VirtReg,
unsigned SuperRCNumAllocatableRegs =		unsigned SuperRCNumAllocatableRegs =
RegClassInfo.getNumAllocatableRegs(SuperRC);		RegClassInfo.getNumAllocatableRegs(SuperRC);
// Split around every non-copy instruction if this split will relax		// Split around every non-copy instruction if this split will relax
// the constraints on the virtual register.		// the constraints on the virtual register.
// Otherwise, splitting just inserts uncoalescable copies that do not help		// Otherwise, splitting just inserts uncoalescable copies that do not help
// the allocation.		// the allocation.
for (const SlotIndex Use : Uses) {		for (const SlotIndex Use : Uses) {
if (const MachineInstr *MI = Indexes->getInstructionFromIndex(Use)) {		if (const MachineInstr *MI = Indexes->getInstructionFromIndex(Use)) {
if (MI->isFullCopy() ||		if (TII->isFullCopyInstr(*MI) ||
(SplitSubClass &&		(SplitSubClass &&
SuperRCNumAllocatableRegs ==		SuperRCNumAllocatableRegs ==
getNumAllocatableRegsForConstraints(MI, VirtReg.reg(), SuperRC,		getNumAllocatableRegsForConstraints(MI, VirtReg.reg(), SuperRC,
TII, TRI, RegClassInfo)) ||		TII, TRI, RegClassInfo)) ||
// TODO: Handle split for subranges with subclass constraints?		// TODO: Handle split for subranges with subclass constraints?
(!SplitSubClass && VirtReg.hasSubRanges() &&		(!SplitSubClass && VirtReg.hasSubRanges() &&
!readsLaneSubset(*MRI, MI, VirtReg, TRI, Use))) {		!readsLaneSubset(*MRI, MI, VirtReg, TRI, Use, TII))) {
LLVM_DEBUG(dbgs() << " skip:\t" << Use << '\t' << *MI);		LLVM_DEBUG(dbgs() << " skip:\t" << Use << '\t' << *MI);
continue;		continue;
}		}
}		}
SE->openIntv();		SE->openIntv();
SlotIndex SegStart = SE->enterIntvBefore(Use);		SlotIndex SegStart = SE->enterIntvBefore(Use);
SlotIndex SegStop = SE->leaveIntvAfter(Use);		SlotIndex SegStop = SE->leaveIntvAfter(Use);
SE->useIntv(SegStart, SegStop);		SE->useIntv(SegStart, SegStop);
▲ Show 20 Lines • Show All 770 Lines • ▼ Show 20 Lines	else
CSRCost = CSRCost.getFrequency() * (ActualEntry / FixedEntry);		CSRCost = CSRCost.getFrequency() * (ActualEntry / FixedEntry);
}		}

/// Collect the hint info for \p Reg.		/// Collect the hint info for \p Reg.
/// The results are stored into \p Out.		/// The results are stored into \p Out.
/// \p Out is not cleared before being populated.		/// \p Out is not cleared before being populated.
void RAGreedy::collectHintInfo(Register Reg, HintsInfo &Out) {		void RAGreedy::collectHintInfo(Register Reg, HintsInfo &Out) {
for (const MachineInstr &Instr : MRI->reg_nodbg_instructions(Reg)) {		for (const MachineInstr &Instr : MRI->reg_nodbg_instructions(Reg)) {
if (!Instr.isFullCopy())		if (!TII->isFullCopyInstr(Instr))
continue;		continue;
// Look for the other end of the copy.		// Look for the other end of the copy.
Register OtherReg = Instr.getOperand(0).getReg();		Register OtherReg = Instr.getOperand(0).getReg();
if (OtherReg == Reg) {		if (OtherReg == Reg) {
OtherReg = Instr.getOperand(1).getReg();		OtherReg = Instr.getOperand(1).getReg();
if (OtherReg == Reg)		if (OtherReg == Reg)
continue;		continue;
}		}
▲ Show 20 Lines • Show All 297 Lines • ▼ Show 20 Lines	auto isSpillSlotAccess = [&MFI](const MachineMemOperand *A) {
return MFI.isSpillSlotObjectIndex(cast<FixedStackPseudoSourceValue>(		return MFI.isSpillSlotObjectIndex(cast<FixedStackPseudoSourceValue>(
A->getPseudoValue())->getFrameIndex());		A->getPseudoValue())->getFrameIndex());
};		};
auto isPatchpointInstr = [](const MachineInstr &MI) {		auto isPatchpointInstr = [](const MachineInstr &MI) {
return MI.getOpcode() == TargetOpcode::PATCHPOINT ||		return MI.getOpcode() == TargetOpcode::PATCHPOINT ||
MI.getOpcode() == TargetOpcode::STACKMAP ||		MI.getOpcode() == TargetOpcode::STACKMAP ||
MI.getOpcode() == TargetOpcode::STATEPOINT;		MI.getOpcode() == TargetOpcode::STATEPOINT;
};		};
for (MachineInstr &MI : MBB) {		for (MachineInstr &MI : MBB) {
if (MI.isCopy()) {		auto DestSrc = TII->isCopyInstr(MI);
		qcolombet (Not Done): Ditto
const MachineOperand &Dest = MI.getOperand(0);		if (DestSrc) {
const MachineOperand &Src = MI.getOperand(1);		const MachineOperand *Dest = DestSrc->Destination;
		cdevadas (Not Done): Change both Dest and Src to references, as in the original code: `const MachineOperand &Dest = *DestSrc->Destination;` That will avoid the additional changes you made below.
Register SrcReg = Src.getReg();		const MachineOperand *Src = DestSrc->Source;
Register DestReg = Dest.getReg();		Register SrcReg = Src->getReg();
		Register DestReg = Dest->getReg();
// Only count `COPY`s with a virtual register as source or destination.		// Only count `COPY`s with a virtual register as source or destination.
if (SrcReg.isVirtual() || DestReg.isVirtual()) {		if (SrcReg.isVirtual() || DestReg.isVirtual()) {
if (SrcReg.isVirtual()) {		if (SrcReg.isVirtual()) {
SrcReg = VRM->getPhys(SrcReg);		SrcReg = VRM->getPhys(SrcReg);
if (Src.getSubReg())		if (Src->getSubReg())
SrcReg = TRI->getSubReg(SrcReg, Src.getSubReg());		SrcReg = TRI->getSubReg(SrcReg, Src->getSubReg());
}		}
if (DestReg.isVirtual()) {		if (DestReg.isVirtual()) {
DestReg = VRM->getPhys(DestReg);		DestReg = VRM->getPhys(DestReg);
if (Dest.getSubReg())		if (Dest->getSubReg())
DestReg = TRI->getSubReg(DestReg, Dest.getSubReg());		DestReg = TRI->getSubReg(DestReg, Dest->getSubReg());
}		}
if (SrcReg != DestReg)		if (SrcReg != DestReg)
++Stats.Copies;		++Stats.Copies;
}		}
continue;		continue;
}		}

SmallVector<const MachineMemOperand *, 2> Accesses;		SmallVector<const MachineMemOperand *, 2> Accesses;
▲ Show 20 Lines • Show All 190 Lines • Show Last 20 Lines
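
The reference-binding change that cdevadas suggests in the inline comment above can be sketched as follows. This is only an illustration under stated assumptions: `Operand`, `DestSourcePair`, `isCopyInstr` and `countRealCopy` are simplified, hypothetical stand-ins written for this example and are not the LLVM classes or the code that was committed. The point is that dereferencing the two pointers once lets the rest of the body keep the `.` member accesses of the pre-patch code.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for MachineOperand; the real class holds far more.
struct Operand {
  unsigned Reg;
  unsigned SubReg;
};

// Hypothetical stand-in for the pair returned by a copy query.
struct DestSourcePair {
  const Operand *Destination;
  const Operand *Source;
};

// Mock of an isCopyInstr()-style query: yields the operand pair only when
// the "instruction" is a copy, and an empty optional otherwise.
std::optional<DestSourcePair> isCopyInstr(bool IsCopy, const Operand &Dst,
                                          const Operand &Src) {
  if (!IsCopy)
    return std::nullopt;
  return DestSourcePair{&Dst, &Src};
}

// Returns 1 if the instruction is a copy between two different registers,
// 0 otherwise.
unsigned countRealCopy(bool IsCopy, const Operand &Dst, const Operand &Src) {
  auto DestSrc = isCopyInstr(IsCopy, Dst, Src);
  if (!DestSrc)
    return 0;
  assert(DestSrc->Destination && DestSrc->Source && "operands must be set");
  // Bind references once; later code can use Dest.Reg / Source.Reg
  // instead of switching every access to '->'.
  const Operand &Dest = *DestSrc->Destination;
  const Operand &Source = *DestSrc->Source;
  return Dest.Reg != Source.Reg ? 1u : 0u;
}
```

With references bound this way, only the two declarations differ from the pre-patch shape, which is the reduction in churn the comment asks for.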

llvm/lib/CodeGen/SplitKit.h

Show First 20 Lines • Show All 422 Lines • ▼ Show 20 Lines	private:
/// Add a copy instruction copying \p FromReg to \p ToReg before		/// Add a copy instruction copying \p FromReg to \p ToReg before
/// \p InsertBefore. This can be invoked with a \p LaneMask which may make it		/// \p InsertBefore. This can be invoked with a \p LaneMask which may make it
/// necessary to construct a sequence of copies to cover it exactly.		/// necessary to construct a sequence of copies to cover it exactly.
SlotIndex buildCopy(Register FromReg, Register ToReg, LaneBitmask LaneMask,		SlotIndex buildCopy(Register FromReg, Register ToReg, LaneBitmask LaneMask,
MachineBasicBlock &MBB, MachineBasicBlock::iterator InsertBefore,		MachineBasicBlock &MBB, MachineBasicBlock::iterator InsertBefore,
bool Late, unsigned RegIdx);		bool Late, unsigned RegIdx);

SlotIndex buildSingleSubRegCopy(Register FromReg, Register ToReg,		SlotIndex buildSingleSubRegCopy(Register FromReg, Register ToReg,
MachineBasicBlock &MB, MachineBasicBlock::iterator InsertBefore,		MachineBasicBlock &MB,
unsigned SubIdx, LiveInterval &DestLI, bool Late, SlotIndex Def);		MachineBasicBlock::iterator InsertBefore,
		unsigned SubIdx, LiveInterval &DestLI,
		bool Late, SlotIndex Def,
		const MCInstrDesc &Desc);

public:		public:
/// Create a new SplitEditor for editing the LiveInterval analyzed by SA.		/// Create a new SplitEditor for editing the LiveInterval analyzed by SA.
/// Newly created intervals will be appended to newIntervals.		/// Newly created intervals will be appended to newIntervals.
SplitEditor(SplitAnalysis &SA, LiveIntervals &LIS, VirtRegMap &VRM,		SplitEditor(SplitAnalysis &SA, LiveIntervals &LIS, VirtRegMap &VRM,
MachineDominatorTree &MDT, MachineBlockFrequencyInfo &MBFI,		MachineDominatorTree &MDT, MachineBlockFrequencyInfo &MBFI,
VirtRegAuxInfo &VRAI);		VirtRegAuxInfo &VRAI);

▲ Show 20 Lines • Show All 117 Lines • Show Last 20 Lines

llvm/lib/CodeGen/SplitKit.cpp

Show First 20 Lines • Show All 508 Lines • ▼ Show 20 Lines	void SplitEditor::forceRecompute(unsigned RegIdx, const VNInfo &ParentVNI) {
// This was previously a single mapping. Make sure the old def is represented		// This was previously a single mapping. Make sure the old def is represented
// by a trivial live range.		// by a trivial live range.
addDeadDef(LIS.getInterval(Edit->get(RegIdx)), VNI, false);		addDeadDef(LIS.getInterval(Edit->get(RegIdx)), VNI, false);

// Mark as complex mapped, forced.		// Mark as complex mapped, forced.
VFP = ValueForcePair(nullptr, true);		VFP = ValueForcePair(nullptr, true);
}		}

SlotIndex SplitEditor::buildSingleSubRegCopy(Register FromReg, Register ToReg,		SlotIndex SplitEditor::buildSingleSubRegCopy(
MachineBasicBlock &MBB, MachineBasicBlock::iterator InsertBefore,		Register FromReg, Register ToReg, MachineBasicBlock &MBB,
unsigned SubIdx, LiveInterval &DestLI, bool Late, SlotIndex Def) {		MachineBasicBlock::iterator InsertBefore, unsigned SubIdx,
const MCInstrDesc &Desc = TII.get(TargetOpcode::COPY);		LiveInterval &DestLI, bool Late, SlotIndex Def, const MCInstrDesc &Desc) {
bool FirstCopy = !Def.isValid();		bool FirstCopy = !Def.isValid();
MachineInstr *CopyMI = BuildMI(MBB, InsertBefore, DebugLoc(), Desc)		MachineInstr *CopyMI = BuildMI(MBB, InsertBefore, DebugLoc(), Desc)
.addReg(ToReg, RegState::Define | getUndefRegState(FirstCopy)		.addReg(ToReg, RegState::Define | getUndefRegState(FirstCopy)
| getInternalReadRegState(!FirstCopy), SubIdx)		| getInternalReadRegState(!FirstCopy), SubIdx)
.addReg(FromReg, 0, SubIdx);		.addReg(FromReg, 0, SubIdx);

SlotIndexes &Indexes = *LIS.getSlotIndexes();		SlotIndexes &Indexes = *LIS.getSlotIndexes();
if (FirstCopy) {		if (FirstCopy) {
Def = Indexes.insertMachineInstrInMaps(*CopyMI, Late).getRegSlot();		Def = Indexes.insertMachineInstrInMaps(*CopyMI, Late).getRegSlot();
} else {		} else {
CopyMI->bundleWithPred();		CopyMI->bundleWithPred();
}		}
return Def;		return Def;
}		}

SlotIndex SplitEditor::buildCopy(Register FromReg, Register ToReg,		SlotIndex SplitEditor::buildCopy(Register FromReg, Register ToReg,
LaneBitmask LaneMask, MachineBasicBlock &MBB,		LaneBitmask LaneMask, MachineBasicBlock &MBB,
MachineBasicBlock::iterator InsertBefore, bool Late, unsigned RegIdx) {		MachineBasicBlock::iterator InsertBefore, bool Late, unsigned RegIdx) {
const MCInstrDesc &Desc = TII.get(TargetOpcode::COPY);		const MCInstrDesc &Desc =
		TII.get(TII.getLiveRangeSplitOpcode(FromReg, *MBB.getParent()));
SlotIndexes &Indexes = *LIS.getSlotIndexes();		SlotIndexes &Indexes = *LIS.getSlotIndexes();
if (LaneMask.all() || LaneMask == MRI.getMaxLaneMaskForVReg(FromReg)) {		if (LaneMask.all() || LaneMask == MRI.getMaxLaneMaskForVReg(FromReg)) {
// The full vreg is copied.		// The full vreg is copied.
MachineInstr *CopyMI =		MachineInstr *CopyMI =
BuildMI(MBB, InsertBefore, DebugLoc(), Desc, ToReg).addReg(FromReg);		BuildMI(MBB, InsertBefore, DebugLoc(), Desc, ToReg).addReg(FromReg);
return Indexes.insertMachineInstrInMaps(*CopyMI, Late).getRegSlot();		return Indexes.insertMachineInstrInMaps(*CopyMI, Late).getRegSlot();
}		}

Show All 11 Lines	SlotIndex SplitEditor::buildCopy(Register FromReg, Register ToReg,

// Abort if we cannot possibly implement the COPY with the given indexes.		// Abort if we cannot possibly implement the COPY with the given indexes.
if (!TRI.getCoveringSubRegIndexes(MRI, RC, LaneMask, SubIndexes))		if (!TRI.getCoveringSubRegIndexes(MRI, RC, LaneMask, SubIndexes))
report_fatal_error("Impossible to implement partial COPY");		report_fatal_error("Impossible to implement partial COPY");

SlotIndex Def;		SlotIndex Def;
for (unsigned BestIdx : SubIndexes) {		for (unsigned BestIdx : SubIndexes) {
Def = buildSingleSubRegCopy(FromReg, ToReg, MBB, InsertBefore, BestIdx,		Def = buildSingleSubRegCopy(FromReg, ToReg, MBB, InsertBefore, BestIdx,
DestLI, Late, Def);		DestLI, Late, Def, Desc);
}		}

BumpPtrAllocator &Allocator = LIS.getVNInfoAllocator();		BumpPtrAllocator &Allocator = LIS.getVNInfoAllocator();
DestLI.refineSubRanges(		DestLI.refineSubRanges(
Allocator, LaneMask,		Allocator, LaneMask,
[Def, &Allocator](LiveInterval::SubRange &SR) {		[Def, &Allocator](LiveInterval::SubRange &SR) {
SR.createDeadDef(Def, Allocator);		SR.createDeadDef(Def, Allocator);
},		},
▲ Show 20 Lines • Show All 1,003 Lines • ▼ Show 20 Lines	if (!BI.isOneInstr())
return true;		return true;
// Don't split for single instructions unless explicitly requested.		// Don't split for single instructions unless explicitly requested.
if (!SingleInstrs)		if (!SingleInstrs)
return false;		return false;
// Splitting a live-through range always makes progress.		// Splitting a live-through range always makes progress.
if (BI.LiveIn && BI.LiveOut)		if (BI.LiveIn && BI.LiveOut)
return true;		return true;
// No point in isolating a copy. It has no register class constraints.		// No point in isolating a copy. It has no register class constraints.
if (LIS.getInstructionFromIndex(BI.FirstInstr)->isCopyLike())		MachineInstr *MI = LIS.getInstructionFromIndex(BI.FirstInstr);
		bool copyLike = TII.isCopyInstr(*MI) || MI->isSubregToReg();
		if (copyLike)
return false;		return false;
// Finally, don't isolate an end point that was created by earlier splits.		// Finally, don't isolate an end point that was created by earlier splits.
return isOriginalEndpoint(BI.FirstInstr);		return isOriginalEndpoint(BI.FirstInstr);
}		}

void SplitEditor::splitSingleBlock(const SplitAnalysis::BlockInfo &BI) {		void SplitEditor::splitSingleBlock(const SplitAnalysis::BlockInfo &BI) {
openIntv();		openIntv();
SlotIndex LastSplitPoint = SA.getLastSplitPoint(BI.MBB);		SlotIndex LastSplitPoint = SA.getLastSplitPoint(BI.MBB);
▲ Show 20 Lines • Show All 293 Lines • Show Last 20 Lines
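
The `buildCopy()` change above asks the target which opcode to use instead of hard-coding `TargetOpcode::COPY`. A minimal sketch of that hook pattern is given below. Everything in it is an assumption made for illustration: the `Opcode` values, the `InstrInfo` and `VectorTargetInstrInfo` classes, the single-argument signature, and the rule that registers numbered 256 and up are vector registers are all hypothetical and do not reflect the real `TargetInstrInfo` interface.

```cpp
#include <cassert>

// Hypothetical opcode numbering, for illustration only.
enum Opcode : unsigned { COPY = 1, TARGET_WHOLE_REG_COPY = 100 };

// Simplified stand-in for a target instruction-info base class.
struct InstrInfo {
  virtual ~InstrInfo() = default;
  // Default behaviour: live range splitting uses the generic COPY.
  virtual unsigned getLiveRangeSplitOpcode(unsigned /*Reg*/) const {
    return COPY;
  }
};

// A target that wants a different copy for some registers overrides the hook.
struct VectorTargetInstrInfo : InstrInfo {
  // Assumption for this sketch: registers >= 256 are vector registers.
  static bool isVectorReg(unsigned Reg) { return Reg >= 256; }

  unsigned getLiveRangeSplitOpcode(unsigned Reg) const override {
    return isVectorReg(Reg) ? TARGET_WHOLE_REG_COPY : COPY;
  }
};

// Mirrors the shape of the buildCopy() change: query the target first,
// then emit whatever opcode it returned.
unsigned pickSplitOpcode(const InstrInfo &TII, unsigned FromReg) {
  unsigned Opc = TII.getLiveRangeSplitOpcode(FromReg);
  assert(Opc != 0 && "target must return a valid opcode");
  return Opc;
}
```

Targets that do not override the hook keep the old behaviour, so in this sketch only the overriding target sees a different opcode.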

llvm/lib/CodeGen/TargetInstrInfo.cpp

Show First 20 Lines • Show All 434 Lines • ▼ Show 20 Lines	MachineInstr &TargetInstrInfo::duplicate(MachineBasicBlock &MBB,
assert(!Orig.isNotDuplicable() && "Instruction cannot be duplicated");		assert(!Orig.isNotDuplicable() && "Instruction cannot be duplicated");
MachineFunction &MF = *MBB.getParent();		MachineFunction &MF = *MBB.getParent();
return MF.cloneMachineInstrBundle(MBB, InsertBefore, Orig);		return MF.cloneMachineInstrBundle(MBB, InsertBefore, Orig);
}		}

// If the COPY instruction in MI can be folded to a stack operation, return		// If the COPY instruction in MI can be folded to a stack operation, return
// the register class to use.		// the register class to use.
static const TargetRegisterClass *canFoldCopy(const MachineInstr &MI,		static const TargetRegisterClass *canFoldCopy(const MachineInstr &MI,
		const TargetInstrInfo &TII,
unsigned FoldIdx) {		unsigned FoldIdx) {
assert(MI.isCopy() && "MI must be a COPY instruction");		assert(TII.isCopyInstr(MI) && "MI must be a COPY instruction");
		arsenm (Not Done): I'd prefer to pass in TII separately rather than jumping through all these hoops. Also, this is only used in the assert so will warn in release build
if (MI.getNumOperands() != 2)		if (MI.getNumOperands() != 2)
return nullptr;		return nullptr;
assert(FoldIdx<2 && "FoldIdx refers no nonexistent operand");		assert(FoldIdx<2 && "FoldIdx refers no nonexistent operand");

const MachineOperand &FoldOp = MI.getOperand(FoldIdx);		const MachineOperand &FoldOp = MI.getOperand(FoldIdx);
const MachineOperand &LiveOp = MI.getOperand(1 - FoldIdx);		const MachineOperand &LiveOp = MI.getOperand(1 - FoldIdx);

if (FoldOp.getSubReg() || LiveOp.getSubReg())		if (FoldOp.getSubReg() || LiveOp.getSubReg())
▲ Show 20 Lines • Show All 172 Lines • ▼ Show 20 Lines	if (NewMI) {
// The pass "x86 speculative load hardening" always attaches symbols to		// The pass "x86 speculative load hardening" always attaches symbols to
// call instructions. We need copy it form old instruction.		// call instructions. We need copy it form old instruction.
NewMI->cloneInstrSymbols(MF, MI);		NewMI->cloneInstrSymbols(MF, MI);

return NewMI;		return NewMI;
}		}

// Straight COPY may fold as load/store.		// Straight COPY may fold as load/store.
if (!MI.isCopy() || Ops.size() != 1)		if (!isCopyInstr(MI) || Ops.size() != 1)
return nullptr;		return nullptr;

const TargetRegisterClass *RC = canFoldCopy(MI, Ops[0]);		const TargetRegisterClass *RC = canFoldCopy(MI, *this, Ops[0]);
if (!RC)		if (!RC)
return nullptr;		return nullptr;

const MachineOperand &MO = MI.getOperand(1 - Ops[0]);		const MachineOperand &MO = MI.getOperand(1 - Ops[0]);
MachineBasicBlock::iterator Pos = MI;		MachineBasicBlock::iterator Pos = MI;

if (Flags == MachineMemOperand::MOStore)		if (Flags == MachineMemOperand::MOStore)
storeRegToStackSlot(*MBB, Pos, MO.getReg(), MO.isKill(), FI, RC, TRI,		storeRegToStackSlot(*MBB, Pos, MO.getReg(), MO.isKill(), FI, RC, TRI,
▲ Show 20 Lines • Show All 1,027 Lines • Show Last 20 Lines
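
arsenm's remark on `canFoldCopy()` above concerns a parameter that is read only inside an `assert`: when `NDEBUG` is defined the assertion expands to nothing and the parameter becomes unused, which can trigger an unused-parameter warning. One common way to handle this in C++17 is sketched below. It is not necessarily what the patch ended up doing, and `Info` and `otherOperandIndex` are hypothetical names invented for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for a target-info object with a copy query.
struct Info {
  bool isCopy(int Opcode) const { return Opcode == 1; }
};

// TII is referenced only by the assert. In a build with NDEBUG defined the
// assert disappears, so [[maybe_unused]] tells the compiler that an unused
// TII is intentional and suppresses the warning.
int otherOperandIndex([[maybe_unused]] const Info &TII, int Opcode,
                      int FoldIdx) {
  assert(TII.isCopy(Opcode) && "instruction must be a copy");
  assert(FoldIdx < 2 && "FoldIdx refers to a nonexistent operand");
  // A two-operand copy: the operand that is not folded is the other one.
  return 1 - FoldIdx;
}
```

An alternative with the same effect is to perform the check at the call site, so that the helper does not need the extra parameter at all.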

llvm/test/CodeGen/Mips/madd-msub.ll

	Show All 36 Lines
	; DSP-NEXT: mthi $1, $ac0			; DSP-NEXT: mthi $1, $ac0
	; DSP-NEXT: madd $ac0, $5, $4			; DSP-NEXT: madd $ac0, $5, $4
	; DSP-NEXT: mfhi $2, $ac0			; DSP-NEXT: mfhi $2, $ac0
	; DSP-NEXT: jr $ra			; DSP-NEXT: jr $ra
	; DSP-NEXT: mflo $3, $ac0			; DSP-NEXT: mflo $3, $ac0
	;			;
	; 64-LABEL: madd1:			; 64-LABEL: madd1:
	; 64: # %bb.0: # %entry			; 64: # %bb.0: # %entry
	; 64-NEXT: sll $1, $4, 0			; 64-NEXT: sll $4, $4, 0
	; 64-NEXT: sll $2, $5, 0			; 64-NEXT: sll $5, $5, 0
	; 64-NEXT: dmult $2, $1			; 64-NEXT: dmult $5, $4
	; 64-NEXT: mflo $1			; 64-NEXT: mflo $1
	; 64-NEXT: sll $2, $6, 0			; 64-NEXT: sll $6, $6, 0
	; 64-NEXT: jr $ra			; 64-NEXT: jr $ra
	; 64-NEXT: daddu $2, $1, $2			; 64-NEXT: daddu $2, $1, $6
	;			;
	; 64R6-LABEL: madd1:			; 64R6-LABEL: madd1:
	; 64R6: # %bb.0: # %entry			; 64R6: # %bb.0: # %entry
	; 64R6-NEXT: sll $1, $4, 0			; 64R6-NEXT: sll $4, $4, 0
	; 64R6-NEXT: sll $2, $5, 0			; 64R6-NEXT: sll $5, $5, 0
	; 64R6-NEXT: dmul $1, $2, $1			; 64R6-NEXT: dmul $1, $5, $4
	; 64R6-NEXT: sll $2, $6, 0			; 64R6-NEXT: sll $6, $6, 0
	; 64R6-NEXT: jr $ra			; 64R6-NEXT: jr $ra
	; 64R6-NEXT: daddu $2, $1, $2			; 64R6-NEXT: daddu $2, $1, $6
	;			;
	; 16-LABEL: madd1:			; 16-LABEL: madd1:
	; 16: # %bb.0: # %entry			; 16: # %bb.0: # %entry
	; 16-NEXT: mult $5, $4			; 16-NEXT: mult $5, $4
	; 16-NEXT: mflo $2			; 16-NEXT: mflo $2
	; 16-NEXT: mfhi $3			; 16-NEXT: mfhi $3
	; 16-NEXT: sra $4, $6, 31			; 16-NEXT: sra $4, $6, 31
	; 16-NEXT: addu $4, $3, $4			; 16-NEXT: addu $4, $3, $4
	▲ Show 20 Lines • Show All 99 Lines • ▼ Show 20 Lines
	; DSP-NEXT: mthi $6, $ac0			; DSP-NEXT: mthi $6, $ac0
	; DSP-NEXT: madd $ac0, $5, $4			; DSP-NEXT: madd $ac0, $5, $4
	; DSP-NEXT: mfhi $2, $ac0			; DSP-NEXT: mfhi $2, $ac0
	; DSP-NEXT: jr $ra			; DSP-NEXT: jr $ra
	; DSP-NEXT: mflo $3, $ac0			; DSP-NEXT: mflo $3, $ac0
	;			;
	; 64-LABEL: madd3:			; 64-LABEL: madd3:
	; 64: # %bb.0: # %entry			; 64: # %bb.0: # %entry
	; 64-NEXT: sll $1, $4, 0			; 64-NEXT: sll $4, $4, 0
	; 64-NEXT: sll $2, $5, 0			; 64-NEXT: sll $5, $5, 0
	; 64-NEXT: dmult $2, $1			; 64-NEXT: dmult $5, $4
	; 64-NEXT: mflo $1			; 64-NEXT: mflo $1
	; 64-NEXT: jr $ra			; 64-NEXT: jr $ra
	; 64-NEXT: daddu $2, $1, $6			; 64-NEXT: daddu $2, $1, $6
	;			;
	; 64R6-LABEL: madd3:			; 64R6-LABEL: madd3:
	; 64R6: # %bb.0: # %entry			; 64R6: # %bb.0: # %entry
	; 64R6-NEXT: sll $1, $4, 0			; 64R6-NEXT: sll $4, $4, 0
	; 64R6-NEXT: sll $2, $5, 0			; 64R6-NEXT: sll $5, $5, 0
	; 64R6-NEXT: dmul $1, $2, $1			; 64R6-NEXT: dmul $1, $5, $4
	; 64R6-NEXT: jr $ra			; 64R6-NEXT: jr $ra
	; 64R6-NEXT: daddu $2, $1, $6			; 64R6-NEXT: daddu $2, $1, $6
	;			;
	; 16-LABEL: madd3:			; 16-LABEL: madd3:
	; 16: # %bb.0: # %entry			; 16: # %bb.0: # %entry
	; 16-NEXT: mult $5, $4			; 16-NEXT: mult $5, $4
	; 16-NEXT: mflo $2			; 16-NEXT: mflo $2
	; 16-NEXT: mfhi $3			; 16-NEXT: mfhi $3
	▲ Show 20 Lines • Show All 90 Lines • ▼ Show 20 Lines
	; DSP-NEXT: mthi $1, $ac0			; DSP-NEXT: mthi $1, $ac0
	; DSP-NEXT: msub $ac0, $5, $4			; DSP-NEXT: msub $ac0, $5, $4
	; DSP-NEXT: mfhi $2, $ac0			; DSP-NEXT: mfhi $2, $ac0
	; DSP-NEXT: jr $ra			; DSP-NEXT: jr $ra
	; DSP-NEXT: mflo $3, $ac0			; DSP-NEXT: mflo $3, $ac0
	;			;
	; 64-LABEL: msub1:			; 64-LABEL: msub1:
	; 64: # %bb.0: # %entry			; 64: # %bb.0: # %entry
	; 64-NEXT: sll $1, $4, 0			; 64-NEXT: sll $4, $4, 0
	; 64-NEXT: sll $2, $5, 0			; 64-NEXT: sll $5, $5, 0
	; 64-NEXT: dmult $2, $1			; 64-NEXT: dmult $5, $4
	; 64-NEXT: mflo $1			; 64-NEXT: mflo $1
	; 64-NEXT: sll $2, $6, 0			; 64-NEXT: sll $6, $6, 0
	; 64-NEXT: jr $ra			; 64-NEXT: jr $ra
	; 64-NEXT: dsubu $2, $2, $1			; 64-NEXT: dsubu $2, $6, $1
	;			;
	; 64R6-LABEL: msub1:			; 64R6-LABEL: msub1:
	; 64R6: # %bb.0: # %entry			; 64R6: # %bb.0: # %entry
	; 64R6-NEXT: sll $1, $4, 0			; 64R6-NEXT: sll $4, $4, 0
	; 64R6-NEXT: sll $2, $5, 0			; 64R6-NEXT: sll $5, $5, 0
	; 64R6-NEXT: dmul $1, $2, $1			; 64R6-NEXT: dmul $1, $5, $4
	; 64R6-NEXT: sll $2, $6, 0			; 64R6-NEXT: sll $6, $6, 0
	; 64R6-NEXT: jr $ra			; 64R6-NEXT: jr $ra
	; 64R6-NEXT: dsubu $2, $2, $1			; 64R6-NEXT: dsubu $2, $6, $1
	;			;
	; 16-LABEL: msub1:			; 16-LABEL: msub1:
	; 16: # %bb.0: # %entry			; 16: # %bb.0: # %entry
	; 16-NEXT: mult $5, $4			; 16-NEXT: mult $5, $4
	; 16-NEXT: mflo $2			; 16-NEXT: mflo $2
	; 16-NEXT: mfhi $4			; 16-NEXT: mfhi $4
	; 16-NEXT: subu $3, $6, $2			; 16-NEXT: subu $3, $6, $2
	; 16-NEXT: sltu $6, $2			; 16-NEXT: sltu $6, $2
	▲ Show 20 Lines • Show All 101 Lines • ▼ Show 20 Lines
	; DSP-NEXT: mthi $6, $ac0			; DSP-NEXT: mthi $6, $ac0
	; DSP-NEXT: msub $ac0, $5, $4			; DSP-NEXT: msub $ac0, $5, $4
	; DSP-NEXT: mfhi $2, $ac0			; DSP-NEXT: mfhi $2, $ac0
	; DSP-NEXT: jr $ra			; DSP-NEXT: jr $ra
	; DSP-NEXT: mflo $3, $ac0			; DSP-NEXT: mflo $3, $ac0
	;			;
	; 64-LABEL: msub3:			; 64-LABEL: msub3:
	; 64: # %bb.0: # %entry			; 64: # %bb.0: # %entry
	; 64-NEXT: sll $1, $4, 0			; 64-NEXT: sll $4, $4, 0
	; 64-NEXT: sll $2, $5, 0			; 64-NEXT: sll $5, $5, 0
	; 64-NEXT: dmult $2, $1			; 64-NEXT: dmult $5, $4
	; 64-NEXT: mflo $1			; 64-NEXT: mflo $1
	; 64-NEXT: jr $ra			; 64-NEXT: jr $ra
	; 64-NEXT: dsubu $2, $6, $1			; 64-NEXT: dsubu $2, $6, $1
	;			;
	; 64R6-LABEL: msub3:			; 64R6-LABEL: msub3:
	; 64R6: # %bb.0: # %entry			; 64R6: # %bb.0: # %entry
	; 64R6-NEXT: sll $1, $4, 0			; 64R6-NEXT: sll $4, $4, 0
	; 64R6-NEXT: sll $2, $5, 0			; 64R6-NEXT: sll $5, $5, 0
	; 64R6-NEXT: dmul $1, $2, $1			; 64R6-NEXT: dmul $1, $5, $4
	; 64R6-NEXT: jr $ra			; 64R6-NEXT: jr $ra
	; 64R6-NEXT: dsubu $2, $6, $1			; 64R6-NEXT: dsubu $2, $6, $1
	;			;
	; 16-LABEL: msub3:			; 16-LABEL: msub3:
	; 16: # %bb.0: # %entry			; 16: # %bb.0: # %entry
	; 16-NEXT: mult $5, $4			; 16-NEXT: mult $5, $4
	; 16-NEXT: mflo $2			; 16-NEXT: mflo $2
	; 16-NEXT: mfhi $4			; 16-NEXT: mfhi $4
	▲ Show 20 Lines • Show All 94 Lines • ▼ Show 20 Lines
	; DSP-NEXT: sra $4, $6, 31			; DSP-NEXT: sra $4, $6, 31
	; DSP-NEXT: subu $1, $1, $4			; DSP-NEXT: subu $1, $1, $4
	; DSP-NEXT: subu $2, $1, $2			; DSP-NEXT: subu $2, $1, $2
	; DSP-NEXT: jr $ra			; DSP-NEXT: jr $ra
	; DSP-NEXT: subu $3, $3, $6			; DSP-NEXT: subu $3, $3, $6
	;			;
	; 64-LABEL: msub5:			; 64-LABEL: msub5:
	; 64: # %bb.0: # %entry			; 64: # %bb.0: # %entry
	; 64-NEXT: sll $1, $4, 0			; 64-NEXT: sll $4, $4, 0
	; 64-NEXT: sll $2, $5, 0			; 64-NEXT: sll $5, $5, 0
	; 64-NEXT: dmult $2, $1			; 64-NEXT: dmult $5, $4
	; 64-NEXT: mflo $1			; 64-NEXT: mflo $1
	; 64-NEXT: sll $2, $6, 0			; 64-NEXT: sll $6, $6, 0
	; 64-NEXT: jr $ra			; 64-NEXT: jr $ra
	; 64-NEXT: dsubu $2, $1, $2			; 64-NEXT: dsubu $2, $1, $6
	;			;
	; 64R6-LABEL: msub5:			; 64R6-LABEL: msub5:
	; 64R6: # %bb.0: # %entry			; 64R6: # %bb.0: # %entry
	; 64R6-NEXT: sll $1, $4, 0			; 64R6-NEXT: sll $4, $4, 0
	; 64R6-NEXT: sll $2, $5, 0			; 64R6-NEXT: sll $5, $5, 0
	; 64R6-NEXT: dmul $1, $2, $1			; 64R6-NEXT: dmul $1, $5, $4
	; 64R6-NEXT: sll $2, $6, 0			; 64R6-NEXT: sll $6, $6, 0
	; 64R6-NEXT: jr $ra			; 64R6-NEXT: jr $ra
	; 64R6-NEXT: dsubu $2, $1, $2			; 64R6-NEXT: dsubu $2, $1, $6
	;			;
	; 16-LABEL: msub5:			; 16-LABEL: msub5:
	; 16: # %bb.0: # %entry			; 16: # %bb.0: # %entry
	; 16-NEXT: mult $5, $4			; 16-NEXT: mult $5, $4
	; 16-NEXT: mflo $2			; 16-NEXT: mflo $2
	; 16-NEXT: mfhi $4			; 16-NEXT: mfhi $4
	; 16-NEXT: subu $3, $2, $6			; 16-NEXT: subu $3, $2, $6
	; 16-NEXT: sltu $2, $6			; 16-NEXT: sltu $2, $6
	Show All 13 Lines

llvm/test/CodeGen/Thumb2/mve-float32regloops.ll

	Show First 20 Lines • Show All 1,608 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: wls lr, r0, .LBB19_6			; CHECK-NEXT: wls lr, r0, .LBB19_6
	; CHECK-NEXT: @ %bb.4: @ %while.body.lr.ph			; CHECK-NEXT: @ %bb.4: @ %while.body.lr.ph
	; CHECK-NEXT: @ in Loop: Header=BB19_3 Depth=1			; CHECK-NEXT: @ in Loop: Header=BB19_3 Depth=1
	; CHECK-NEXT: ldr r5, [sp, #28] @ 4-byte Reload			; CHECK-NEXT: ldr r5, [sp, #28] @ 4-byte Reload
	; CHECK-NEXT: .LBB19_5: @ %while.body			; CHECK-NEXT: .LBB19_5: @ %while.body
	; CHECK-NEXT: @ Parent Loop BB19_3 Depth=1			; CHECK-NEXT: @ Parent Loop BB19_3 Depth=1
	; CHECK-NEXT: @ => This Inner Loop Header: Depth=2			; CHECK-NEXT: @ => This Inner Loop Header: Depth=2
	; CHECK-NEXT: vmov r7, s7			; CHECK-NEXT: vmov r7, s7
	; CHECK-NEXT: vldrw.u32 q2, [r9, #16]			; CHECK-NEXT: vldr s0, [r1, #12]
	; CHECK-NEXT: vmov r11, s6			; CHECK-NEXT: vmov r11, s6
	; CHECK-NEXT: vldrw.u32 q1, [r9, #112]			; CHECK-NEXT: vldrw.u32 q1, [r9, #112]
	; CHECK-NEXT: vmov r4, s1
	; CHECK-NEXT: vldr s1, [r1, #12]
	; CHECK-NEXT: vmov r3, s3			; CHECK-NEXT: vmov r3, s3
	; CHECK-NEXT: vldr s3, [r1, #8]			; CHECK-NEXT: vldr s3, [r1, #8]
	; CHECK-NEXT: vstrw.32 q1, [sp, #32] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q1, [sp, #32] @ 16-byte Spill
	; CHECK-NEXT: vldrw.u32 q1, [r9]			; CHECK-NEXT: vldrw.u32 q1, [r9]
	; CHECK-NEXT: vmov r8, s1			; CHECK-NEXT: vmov r8, s0
				; CHECK-NEXT: vldrw.u32 q2, [r9, #16]
	; CHECK-NEXT: ldr r6, [r1, #4]			; CHECK-NEXT: ldr r6, [r1, #4]
	; CHECK-NEXT: vldrw.u32 q7, [r9, #32]			; CHECK-NEXT: vldrw.u32 q7, [r9, #32]
	; CHECK-NEXT: vmul.f32 q1, q1, r8			; CHECK-NEXT: vmul.f32 q1, q1, r8
	; CHECK-NEXT: vmov r0, s3			; CHECK-NEXT: vmov r0, s3
	; CHECK-NEXT: vldrw.u32 q3, [r9, #48]
	; CHECK-NEXT: vfma.f32 q1, q2, r0			; CHECK-NEXT: vfma.f32 q1, q2, r0
				; CHECK-NEXT: vldrw.u32 q3, [r9, #48]
	; CHECK-NEXT: ldr r0, [r1], #16			; CHECK-NEXT: ldr r0, [r1], #16
	; CHECK-NEXT: vfma.f32 q1, q7, r6			; CHECK-NEXT: vfma.f32 q1, q7, r6
				; CHECK-NEXT: vmov r4, s1
	; CHECK-NEXT: vldrw.u32 q6, [r9, #64]			; CHECK-NEXT: vldrw.u32 q6, [r9, #64]
	; CHECK-NEXT: vmov.f32 s2, s1			; CHECK-NEXT: vmov.f32 s1, s0
	; CHECK-NEXT: vfma.f32 q1, q3, r0			; CHECK-NEXT: vfma.f32 q1, q3, r0
				; CHECK-NEXT: vmov.f32 s2, s0
	; CHECK-NEXT: vldrw.u32 q5, [r9, #80]			; CHECK-NEXT: vldrw.u32 q5, [r9, #80]
	; CHECK-NEXT: vfma.f32 q1, q6, r4			; CHECK-NEXT: vfma.f32 q1, q6, r4
	; CHECK-NEXT: vldrw.u32 q4, [r9, #96]			; CHECK-NEXT: vldrw.u32 q4, [r9, #96]
	; CHECK-NEXT: vldrw.u32 q2, [sp, #32] @ 16-byte Reload
	; CHECK-NEXT: vfma.f32 q1, q5, r3			; CHECK-NEXT: vfma.f32 q1, q5, r3
				; CHECK-NEXT: vldrw.u32 q2, [sp, #32] @ 16-byte Reload
	; CHECK-NEXT: vfma.f32 q1, q4, r7			; CHECK-NEXT: vfma.f32 q1, q4, r7
	; CHECK-NEXT: vfma.f32 q1, q2, r11			; CHECK-NEXT: vfma.f32 q1, q2, r11
	; CHECK-NEXT: vstrb.8 q1, [r5], #16			; CHECK-NEXT: vstrb.8 q1, [r5], #16
	; CHECK-NEXT: le lr, .LBB19_5			; CHECK-NEXT: le lr, .LBB19_5
	; CHECK-NEXT: .LBB19_6: @ %while.end			; CHECK-NEXT: .LBB19_6: @ %while.end
	; CHECK-NEXT: @ in Loop: Header=BB19_3 Depth=1			; CHECK-NEXT: @ in Loop: Header=BB19_3 Depth=1
	; CHECK-NEXT: ldr r7, [sp, #4] @ 4-byte Reload			; CHECK-NEXT: ldr r7, [sp, #4] @ 4-byte Reload
	; CHECK-NEXT: cmp r7, #0			; CHECK-NEXT: cmp r7, #0
	▲ Show 20 Lines • Show All 482 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/GlobalISel/add-ext.ll

	Show First 20 Lines • Show All 199 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: imulq $4, %rax, %rax			; CHECK-NEXT: imulq $4, %rax, %rax
	; CHECK-NEXT: addq %rdi, %rax			; CHECK-NEXT: addq %rdi, %rax
	; CHECK-NEXT: leal 2(%rsi), %ecx			; CHECK-NEXT: leal 2(%rsi), %ecx
	; CHECK-NEXT: movl %ecx, %ecx			; CHECK-NEXT: movl %ecx, %ecx
	; CHECK-NEXT: imulq $4, %rcx, %rcx			; CHECK-NEXT: imulq $4, %rcx, %rcx
	; CHECK-NEXT: addq %rdi, %rcx			; CHECK-NEXT: addq %rdi, %rcx
	; CHECK-NEXT: movl (%rcx), %ecx			; CHECK-NEXT: movl (%rcx), %ecx
	; CHECK-NEXT: addl (%rax), %ecx			; CHECK-NEXT: addl (%rax), %ecx
	; CHECK-NEXT: movl %esi, %eax			; CHECK-NEXT: movl %esi, %esi
	; CHECK-NEXT: imulq $4, %rax, %rax			; CHECK-NEXT: imulq $4, %rsi, %rax
	; CHECK-NEXT: addq %rdi, %rax			; CHECK-NEXT: addq %rdi, %rax
	; CHECK-NEXT: movl %ecx, (%rax)			; CHECK-NEXT: movl %ecx, (%rax)
	; CHECK-NEXT: retq			; CHECK-NEXT: retq

	%add1 = add nuw i32 %i, 1			%add1 = add nuw i32 %i, 1
	%idx1 = zext i32 %add1 to i64			%idx1 = zext i32 %add1 to i64
	%gep1 = getelementptr i32, ptr %a, i64 %idx1			%gep1 = getelementptr i32, ptr %a, i64 %idx1
	%load1 = load i32, ptr %gep1, align 4			%load1 = load i32, ptr %gep1, align 4
	Show All 12 Lines

llvm/test/CodeGen/X86/dagcombine-cse.ll

	Show First 20 Lines • Show All 100 Lines • ▼ Show 20 Lines
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
	; X86-NEXT: popl %edi			; X86-NEXT: popl %edi
	; X86-NEXT: popl %ebx			; X86-NEXT: popl %ebx
	; X86-NEXT: popl %ebp			; X86-NEXT: popl %ebp
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: square_high:			; X64-LABEL: square_high:
	; X64: ## %bb.0: ## %entry			; X64: ## %bb.0: ## %entry
	; X64-NEXT: movl %esi, %ecx			; X64-NEXT: movl %esi, %esi
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rsi, %rax
	; X64-NEXT: mulq %rdi			; X64-NEXT: mulq %rdi
	; X64-NEXT: movq %rdx, %rsi			; X64-NEXT: movq %rdx, %rcx
	; X64-NEXT: movq %rax, %r8			; X64-NEXT: movq %rax, %r8
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: mulq %rdi			; X64-NEXT: mulq %rdi
	; X64-NEXT: addq %r8, %rdx			; X64-NEXT: addq %r8, %rdx
	; X64-NEXT: movq %rsi, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: adcq $0, %rax			; X64-NEXT: adcq $0, %rax
	; X64-NEXT: addq %rdx, %r8			; X64-NEXT: addq %rdx, %r8
	; X64-NEXT: adcq %rsi, %rax			; X64-NEXT: adcq %rcx, %rax
	; X64-NEXT: imulq %rcx, %rcx			; X64-NEXT: imulq %rsi, %rsi
	; X64-NEXT: addq %rax, %rcx			; X64-NEXT: addq %rax, %rsi
	; X64-NEXT: shrdq $32, %rcx, %r8			; X64-NEXT: shrdq $32, %rsi, %r8
	; X64-NEXT: shrq $32, %rcx			; X64-NEXT: shrq $32, %rsi
	; X64-NEXT: movq %r8, %rax			; X64-NEXT: movq %r8, %rax
	; X64-NEXT: movq %rcx, %rdx			; X64-NEXT: movq %rsi, %rdx
	; X64-NEXT: retq			; X64-NEXT: retq
	entry:			entry:
	%conv = zext i96 %x to i192			%conv = zext i96 %x to i192
	%mul = mul nuw i192 %conv, %conv			%mul = mul nuw i192 %conv, %conv
	%shr = lshr i192 %mul, 96			%shr = lshr i192 %mul, 96
	%conv2 = trunc i192 %shr to i96			%conv2 = trunc i192 %shr to i96
	ret i96 %conv2			ret i96 %conv2
	}			}

llvm/test/CodeGen/X86/fold-and-shift-x86_64.ll

entry:		entry:
%tmp7 = getelementptr i8, ptr %X, i64 %tmp4		%tmp7 = getelementptr i8, ptr %X, i64 %tmp4
%tmp9 = load i8, ptr %tmp7		%tmp9 = load i8, ptr %tmp7
ret i8 %tmp9		ret i8 %tmp9
}		}

define i8 @t3(ptr %X, i64 %i) {		define i8 @t3(ptr %X, i64 %i) {
; CHECK-LABEL: t3:		; CHECK-LABEL: t3:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: movl %esi, %eax		; CHECK-NEXT: movl %esi, %esi
; CHECK-NEXT: movzbl (%rdi,%rax,4), %eax		; CHECK-NEXT: movzbl (%rdi,%rsi,4), %eax
; CHECK-NEXT: retq		; CHECK-NEXT: retq

entry:		entry:
%tmp2 = shl i64 %i, 2		%tmp2 = shl i64 %i, 2
%tmp4 = and i64 %tmp2, 17179869180		%tmp4 = and i64 %tmp2, 17179869180
%tmp7 = getelementptr i8, ptr %X, i64 %tmp4		%tmp7 = getelementptr i8, ptr %X, i64 %tmp4
%tmp9 = load i8, ptr %tmp7		%tmp9 = load i8, ptr %tmp7
ret i8 %tmp9		ret i8 %tmp9

llvm/test/CodeGen/X86/unfold-masked-merge-scalar-constmask-lowhigh.ll

; CHECK-BMI-NEXT: retq		; CHECK-BMI-NEXT: retq
%my = and i32 %y, -65536		%my = and i32 %y, -65536
%r = or i32 %mx, %my		%r = or i32 %mx, %my
ret i32 %r		ret i32 %r
}		}

define i64 @out64_constmask(i64 %x, i64 %y) {		define i64 @out64_constmask(i64 %x, i64 %y) {
; CHECK-NOBMI-LABEL: out64_constmask:		; CHECK-NOBMI-LABEL: out64_constmask:
; CHECK-NOBMI: # %bb.0:		; CHECK-NOBMI: # %bb.0:
; CHECK-NOBMI-NEXT: movl %edi, %ecx		; CHECK-NOBMI-NEXT: movl %edi, %edi
; CHECK-NOBMI-NEXT: movabsq $-4294967296, %rax # imm = 0xFFFFFFFF00000000		; CHECK-NOBMI-NEXT: movabsq $-4294967296, %rax # imm = 0xFFFFFFFF00000000
; CHECK-NOBMI-NEXT: andq %rsi, %rax		; CHECK-NOBMI-NEXT: andq %rsi, %rax
; CHECK-NOBMI-NEXT: orq %rcx, %rax		; CHECK-NOBMI-NEXT: orq %rdi, %rax
; CHECK-NOBMI-NEXT: retq		; CHECK-NOBMI-NEXT: retq
;		;
; CHECK-BMI-LABEL: out64_constmask:		; CHECK-BMI-LABEL: out64_constmask:
; CHECK-BMI: # %bb.0:		; CHECK-BMI: # %bb.0:
; CHECK-BMI-NEXT: movl %edi, %ecx		; CHECK-BMI-NEXT: movl %edi, %edi
; CHECK-BMI-NEXT: movabsq $-4294967296, %rax # imm = 0xFFFFFFFF00000000		; CHECK-BMI-NEXT: movabsq $-4294967296, %rax # imm = 0xFFFFFFFF00000000
; CHECK-BMI-NEXT: andq %rsi, %rax		; CHECK-BMI-NEXT: andq %rsi, %rax
; CHECK-BMI-NEXT: orq %rcx, %rax		; CHECK-BMI-NEXT: orq %rdi, %rax
; CHECK-BMI-NEXT: retq		; CHECK-BMI-NEXT: retq
%mx = and i64 %x, 4294967295		%mx = and i64 %x, 4294967295
%my = and i64 %y, -4294967296		%my = and i64 %y, -4294967296
%r = or i64 %mx, %my		%r = or i64 %mx, %my
ret i64 %r		ret i64 %r
}		}

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;		;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


Diff 534897
