This is an archive of the discontinued LLVM Phabricator instance.

Differential D107262

[CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block
Closed · Public

Authored by TiehuZhang on Aug 2 2021, 4:09 AM.

Details

Reviewers

sdesmalen
paulwalker-arm
mdchen
fhahn
dmgreen

Commits

rG9cfa9b44a589: [CodeGenPrepare] The instruction to be sunk should be inserted before its user…

Summary

In the current implementation, the instruction to be sunk is inserted before the target instruction without considering the def-use tree, which may cause an "Instruction does not dominate all uses" error. We need to choose a suitable insertion point according to the use chain.

Event Timeline

TiehuZhang created this revision. · Aug 2 2021, 4:09 AM

Herald added a subscriber: hiraditya. · Aug 2 2021, 4:09 AM

TiehuZhang requested review of this revision. · Aug 2 2021, 4:09 AM

Herald added a project: Restricted Project. · Aug 2 2021, 4:09 AM

Herald added a subscriber: llvm-commits.

TiehuZhang edited the summary of this revision. · Aug 2 2021, 4:13 AM

Does this happen because, out of the insert; shuffle chain, only the insert needs to be sunk?
It may be better to remove the UI->getParent() == TargetBB check from if (UI->getParent() == TargetBB || isa<PHINode>(UI)) and update InsertPoint in the loop as we go. An instruction that is already in TargetBB won't need to be cloned, but the loop can update InsertPoint and continue.

llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions-1.ll
Lines 4–5 (on Diff #363430): This can be removed.
Line 25 (on Diff #363430): Bugpoint has a habit of over-reducing test cases, introducing undef values that only make things worse and less maintainable in the long run. Can you make the test not use any undef?

Harbormaster completed remote builds in B117419: Diff 363430. · Aug 2 2021, 4:55 AM

TiehuZhang added a reviewer: fhahn. · Aug 2 2021, 7:57 PM

Hi dmgreen, thanks for your review! I don't quite understand your comment, though. The condition UI->getParent() == TargetBB filters out instructions that are already in TargetBB, to avoid meaningless sinking. Why can it be removed?

TiehuZhang updated this revision to Diff 363675. · Aug 3 2021, 3:27 AM

Harbormaster completed remote builds in B117588: Diff 363675. · Aug 3 2021, 4:14 AM

shouldSinkOperands will return a chain of instructions; in this case, starting at the mul, it will return the shuffle and the insert, as those are the instructions it is profitable to sink. They are visited in reverse order so that the last instruction is sunk first. The shuffle is already in TargetBB, so it doesn't need to be sunk (or cloned), but we still need to update the InsertPoint when sinking the insert, or else it will fail dominance checks.

It looks like there are some Arm tests failing with the current attempt to fix that. The instruction being sunk may have multiple uses, and moving it before the first one we happen to find might not always work; that use might not be the instruction that was originally returned by shouldSinkOperands.

Instead, I think it would be better to make sure we are updating InsertPoint as we go in the ToReplace loop. So we add instructions to ToReplace even if they are already in TargetBB, and in the ToReplace loop we check whether the instruction is already in the correct BB and just update the InsertPoint if so.
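The chain described above can be illustrated with a small standalone model. This is only a sketch, not the LLVM code: a block is modelled as a vector of instruction names, and Block, indexOf and sinkChain are hypothetical names invented for the illustration. It places the InsertPoint update in the first loop rather than in the ToReplace loop, which is one possible placement. It contrasts skipping an operand that is already in the block with using that operand as the new insertion point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy basic block: instruction names in program order (not llvm::BasicBlock).
using Block = std::vector<std::string>;

// Position of Name in BB; assumes Name is present.
std::size_t indexOf(const Block &BB, const std::string &Name) {
  return std::find(BB.begin(), BB.end(), Name) - BB.begin();
}

// Toy model of the two sinking loops. ChainReversed lists the operands to
// sink, starting with the one closest to User (mirroring reverse(OpsToSink)).
// With TrackInsertPoint == false, operands already in the block are skipped
// without touching the insertion point.
Block sinkChain(Block BB, const std::vector<std::string> &ChainReversed,
                const std::string &User, bool TrackInsertPoint) {
  std::string InsertPoint = User;
  std::vector<std::string> ToReplace;
  for (const std::string &Op : ChainReversed) {
    bool InTarget = std::find(BB.begin(), BB.end(), Op) != BB.end();
    if (InTarget) {
      if (TrackInsertPoint)
        InsertPoint = Op; // later clones must land above this operand
      continue;
    }
    ToReplace.push_back(Op);
  }
  for (const std::string &Op : ToReplace) {
    // "Clone" the operand and place it directly before InsertPoint.
    BB.insert(BB.begin() + indexOf(BB, InsertPoint), Op);
    InsertPoint = Op;
  }
  return BB;
}
```

For a block holding shuffle and mul and a reversed chain of shuffle, insert, calling sinkChain with TrackInsertPoint set to false places insert after shuffle, which is the toy equivalent of the dominance failure. With TrackInsertPoint set to true, insert lands before shuffle.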

llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions-1.ll
Line 7 (on Diff #363675): Can this test be added to one of the existing files? There is already a sink-free-instructions.ll test. Otherwise the test should preferably have a better name than sink-free-instructions-1.ll.
Line 29 (on Diff #363675): This value is dead? Can it be returned instead?

TiehuZhang updated this revision to Diff 364461. · Aug 5 2021, 7:30 AM

TiehuZhang edited the summary of this revision.

TiehuZhang updated this revision to Diff 364464. · Aug 5 2021, 7:35 AM

@dmgreen, thanks very much! We can use the condition UI->getParent() == TargetBB from if (UI->getParent() == TargetBB || isa<PHINode>(UI)) to update the InsertPoint. Also, the order of the users returned by users() may not follow the instruction order in a basic block, so the previous patch was not correct.

llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions-1.ll
Line 25 (on Diff #363430): Thanks for your comment. The case is reduced from a csmith test case, so there are some strange instructions. I've made some changes to remove the `undef`.

TiehuZhang updated this revision to Diff 364468. · Aug 5 2021, 7:52 AM

TiehuZhang updated this revision to Diff 364473. · Aug 5 2021, 7:55 AM

Harbormaster completed remote builds in B118164: Diff 364473. · Aug 5 2021, 8:23 AM

Thanks, looking good. But I do still worry about the order of instructions sunk.

I was trying it out, seeing if it would go wrong when we were sinking a lot of operands. I noticed that the add/sub sinking wasn't really working properly though! There is https://reviews.llvm.org/D107623 to improve that and get the shuffles to sink.

With that in, can you add these two tests to show partially sinking two values at the same time:

define <4 x i32> @sinkadd_partial(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
for.cond4.preheader.lr.ph:
  %cmp = icmp slt i8 %f, 0
  %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader

for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
  %e1 = sext <4 x i16> %s1 to <4 x i32>
  %e2 = sext <4 x i16> %s2 to <4 x i32>
  %0 = add <4 x i32> %e1, %e2
  ret <4 x i32> %0

for.cond4.preheader.preheader:                    ; preds = %for.cond4.preheader.lr.ph
  ret <4 x i32> zeroinitializer
}

define <4 x i32> @sinkadd_partial_rev(<8 x i16> %a1, <8 x i16> %a2, i8 %f) {
for.cond4.preheader.lr.ph:
  %cmp = icmp slt i8 %f, 0
  %s2 = shufflevector <8 x i16> %a2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %s1 = shufflevector <8 x i16> %a1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader

for.cond4.preheader.us.preheader:                 ; preds = %for.cond4.preheader.lr.ph
  %e2 = sext <4 x i16> %s2 to <4 x i32>
  %e1 = sext <4 x i16> %s1 to <4 x i32>
  %0 = add <4 x i32> %e1, %e2
  ret <4 x i32> %0

for.cond4.preheader.preheader:                    ; preds = %for.cond4.preheader.lr.ph
  ret <4 x i32> zeroinitializer
}

The order of the extends in the target block becomes important.
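To make the ordering concern concrete, here is a small standalone sketch. It is not the LLVM code, and the visiting order e2, e1 is an assumption about how the operand chain might be traversed. It compares two hypothetical policies for choosing the insertion point among operands that already sit in the target block: "last visited wins" and "earliest in the block wins". The names Block, position, lastSeenInsertPoint and earliestInsertPoint are invented for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy basic block: instruction names in program order.
using Block = std::vector<std::string>;

// Position of Name in BB; assumes Name is present.
std::size_t position(const Block &BB, const std::string &Name) {
  return std::find(BB.begin(), BB.end(), Name) - BB.begin();
}

// Policy 1: the insertion point is whichever in-block operand was visited
// last, regardless of where it sits in the block.
std::string lastSeenInsertPoint(const std::vector<std::string> &Visited,
                                const std::string &User) {
  std::string IP = User;
  for (const std::string &Op : Visited)
    IP = Op;
  return IP;
}

// Policy 2: only ever move the insertion point upwards, so it ends at the
// earliest in-block operand.
std::string earliestInsertPoint(const Block &BB,
                                const std::vector<std::string> &Visited,
                                const std::string &User) {
  std::string IP = User;
  for (const std::string &Op : Visited)
    if (position(BB, Op) < position(BB, IP))
      IP = Op;
  return IP;
}
```

With the sinkadd_partial_rev block order (e2, e1, add) and the assumed visiting order, the first policy picks e1, so a shuffle feeding e2 would be cloned below its user. The second policy picks e2, which is above both extends. With the sinkadd_partial order (e1, e2, add) both policies agree on e1.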

OK, I will, thanks!

TiehuZhang updated this revision to Diff 365144. · Aug 9 2021, 4:00 AM

Harbormaster completed remote builds in B118645: Diff 365144. · Aug 9 2021, 4:35 AM

dmgreen mentioned this in D107794: [AArch64ISelLowering] Avoid sinking mul's ops in some cases. · Aug 10 2021, 1:06 AM

I've committed 013030a0b213a75e0403fcdb5a070d21831ee561. Do you want to rebase and fix those extra testcases here? Or do you think that's best left for a separate patch?

Oh, I'll rebase to the latest version and update my patch.

TiehuZhang updated this revision to Diff 365907. · Aug 11 2021, 9:46 PM

Harbormaster completed remote builds in B119188: Diff 365907. · Aug 11 2021, 11:16 PM

Hi @dmgreen, the previous implementation didn't take the order of the extends into account, so the "Instruction does not dominate all uses" error still appears in one of the test cases you mentioned. I have updated the patch to fix the failing case. Could you please check whether this modification is appropriate? Thanks very much.

Yeah, this sounds good to me. I might have used std::distance(TargetBB->begin(), UI) < std::distance(TargetBB->begin(), InsertPt) myself, as there will only ever be a few sinking nodes and they may not be in the TargetBB, but your way with collecting the block sounds good too.

LGTM
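The std::distance comparison mentioned above can be sketched on a plain std::list standing in for the block's instruction list. This is an illustrative model only; Block and isBefore are hypothetical names, and both iterators are assumed to point into the same list.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy basic block: a linked list of instruction names in program order.
using Block = std::list<std::string>;

// True if A comes strictly before B in BB. Both iterators must point into BB.
// Each std::distance call walks the list from the front, so a single query
// is linear in the size of the block.
bool isBefore(const Block &BB, Block::const_iterator A,
              Block::const_iterator B) {
  return std::distance(BB.begin(), A) < std::distance(BB.begin(), B);
}
```

The trade-off is the one raised in the comment: walking the list per query is simple and costs nothing up front, which suits a handful of sunk operands, whereas numbering the whole block once makes each later comparison a lookup.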

llvm/lib/CodeGen/CodeGenPrepare.cpp
Line 6954: I would either use uint32_t or uint64_t to be specific. I don't think there could be more than 2^32 instructions in a block.
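As a rough illustration of the "collect the block" approach with a fixed-width counter, the sketch below numbers each instruction of a toy block once, in program order. It is an assumption-laden model rather than the patch itself: instructions are plain strings, and numberInstructions is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assign every instruction in the block its position, using a 32-bit counter
// as suggested in the review. Names are assumed to be unique within BB.
std::unordered_map<std::string, std::uint32_t>
numberInstructions(const std::vector<std::string> &BB) {
  std::unordered_map<std::string, std::uint32_t> Order;
  std::uint32_t Next = 0;
  for (const std::string &Name : BB)
    Order[Name] = Next++;
  return Order;
}
```

Once the map is built, deciding which of two instructions comes first is a comparison of two stored numbers.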

This revision is now accepted and ready to land. · Aug 12 2021, 9:17 AM

Closed by commit rG9cfa9b44a589: [CodeGenPrepare] The instruction to be sunk should be inserted before its user… (authored by TiehuZhang, committed by Peilin Guo <guopeilin1@huawei.com>). · Aug 17 2021, 3:58 AM

This revision was automatically updated to reflect the committed changes.

guopeilin added a commit: rG9cfa9b44a589: [CodeGenPrepare] The instruction to be sunk should be inserted before its user….

Why not comesBefore?

Revision Contents

Path	Size
llvm/lib/CodeGen/CodeGenPrepare.cpp	8 lines
llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions-inseltpoison.ll	24 lines
llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions.ll	48 lines

Diff 364468

llvm/lib/CodeGen/CodeGenPrepare.cpp

[... 6,944 preceding lines not shown ...]
 bool CodeGenPrepare::tryToSinkFreeOperands(Instruction *I) {

   // OpsToSink can contain multiple uses in a use chain (e.g.
   // (%u1 with %u1 = shufflevector), (%u2 with %u2 = zext %u1)). The dominating
   // uses must come first, so we process the ops in reverse order so as to not
   // create invalid IR.
   BasicBlock *TargetBB = I->getParent();
   bool Changed = false;
   SmallVector<Use *, 4> ToReplace;
+  Instruction *InsertPoint = I;
   for (Use *U : reverse(OpsToSink)) {
     auto *UI = cast<Instruction>(U->get());
-    if (UI->getParent() == TargetBB || isa<PHINode>(UI))
+    if (isa<PHINode>(UI))
+      continue;
+    if (UI->getParent() == TargetBB) {
+      InsertPoint = UI;
       continue;
+    }
     ToReplace.push_back(U);
   }

   SetVector<Instruction *> MaybeDead;
   DenseMap<Instruction *, Instruction *> NewInstructions;
-  Instruction *InsertPoint = I;
   for (Use *U : ToReplace) {
     auto *UI = cast<Instruction>(U->get());
     Instruction *NI = UI->clone();
     NewInstructions[UI] = NI;
     MaybeDead.insert(UI);
     LLVM_DEBUG(dbgs() << "Sinking " << *UI << " to user " << *I << "\n");
     NI->insertBefore(InsertPoint);
     InsertPoint = NI;
[... 1,341 following lines not shown ...]

llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions-inseltpoison.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -codegenprepare -S | FileCheck %s

 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 target triple = "aarch64-unknown"

 define <8 x i16> @sink_zext(<8 x i8> %a, <8 x i8> %b, i1 %c) {
 ; CHECK-LABEL: @sink_zext(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK: if.then:
-; CHECK-NEXT: [[ZB_1:%.*]] = zext <8 x i8> [[B:%.*]] to <8 x i16>
 ; CHECK-NEXT: [[TMP0:%.*]] = zext <8 x i8> [[A:%.*]] to <8 x i16>
+; CHECK-NEXT: [[ZB_1:%.*]] = zext <8 x i8> [[B:%.*]] to <8 x i16>
 ; CHECK-NEXT: [[RES_1:%.*]] = add <8 x i16> [[TMP0]], [[ZB_1]]
 ; CHECK-NEXT: ret <8 x i16> [[RES_1]]
 ; CHECK: if.else:
-; CHECK-NEXT: [[ZB_2:%.*]] = zext <8 x i8> [[B]] to <8 x i16>
 ; CHECK-NEXT: [[TMP1:%.*]] = zext <8 x i8> [[A]] to <8 x i16>
+; CHECK-NEXT: [[ZB_2:%.*]] = zext <8 x i8> [[B]] to <8 x i16>
 ; CHECK-NEXT: [[RES_2:%.*]] = sub <8 x i16> [[TMP1]], [[ZB_2]]
 ; CHECK-NEXT: ret <8 x i16> [[RES_2]]
 ;
 entry:
   %za = zext <8 x i8> %a to <8 x i16>
   br i1 %c, label %if.then, label %if.else

 if.then:
   %zb.1 = zext <8 x i8> %b to <8 x i16>
   %res.1 = add <8 x i16> %za, %zb.1
   ret <8 x i16> %res.1

 if.else:
   %zb.2 = zext <8 x i8> %b to <8 x i16>
   %res.2 = sub <8 x i16> %za, %zb.2
   ret <8 x i16> %res.2
 }

 define <8 x i16> @sink_sext(<8 x i8> %a, <8 x i8> %b, i1 %c) {
 ; CHECK-LABEL: @sink_sext(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK: if.then:
-; CHECK-NEXT: [[ZB_1:%.*]] = sext <8 x i8> [[B:%.*]] to <8 x i16>
 ; CHECK-NEXT: [[TMP0:%.*]] = sext <8 x i8> [[A:%.*]] to <8 x i16>
+; CHECK-NEXT: [[ZB_1:%.*]] = sext <8 x i8> [[B:%.*]] to <8 x i16>
 ; CHECK-NEXT: [[RES_1:%.*]] = add <8 x i16> [[TMP0]], [[ZB_1]]
 ; CHECK-NEXT: ret <8 x i16> [[RES_1]]
 ; CHECK: if.else:
-; CHECK-NEXT: [[ZB_2:%.*]] = sext <8 x i8> [[B]] to <8 x i16>
 ; CHECK-NEXT: [[TMP1:%.*]] = sext <8 x i8> [[A]] to <8 x i16>
+; CHECK-NEXT: [[ZB_2:%.*]] = sext <8 x i8> [[B]] to <8 x i16>
 ; CHECK-NEXT: [[RES_2:%.*]] = sub <8 x i16> [[TMP1]], [[ZB_2]]
 ; CHECK-NEXT: ret <8 x i16> [[RES_2]]
 ;
 entry:
   %za = sext <8 x i8> %a to <8 x i16>
   br i1 %c, label %if.then, label %if.else

 if.then:
   %zb.1 = sext <8 x i8> %b to <8 x i16>
   %res.1 = add <8 x i16> %za, %zb.1
   ret <8 x i16> %res.1

 if.else:
   %zb.2 = sext <8 x i8> %b to <8 x i16>
   %res.2 = sub <8 x i16> %za, %zb.2
   ret <8 x i16> %res.2
 }

 define <8 x i16> @do_not_sink_nonfree_zext(<8 x i8> %a, <8 x i8> %b, i1 %c) {
 ; CHECK-LABEL: @do_not_sink_nonfree_zext(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK: if.then:
-; CHECK-NEXT: [[ZB_1:%.*]] = sext <8 x i8> [[B:%.*]] to <8 x i16>
 ; CHECK-NEXT: [[TMP0:%.*]] = sext <8 x i8> [[A:%.*]] to <8 x i16>
+; CHECK-NEXT: [[ZB_1:%.*]] = sext <8 x i8> [[B:%.*]] to <8 x i16>
 ; CHECK-NEXT: [[RES_1:%.*]] = add <8 x i16> [[TMP0]], [[ZB_1]]
 ; CHECK-NEXT: ret <8 x i16> [[RES_1]]
 ; CHECK: if.else:
 ; CHECK-NEXT: [[ZB_2:%.*]] = sext <8 x i8> [[B]] to <8 x i16>
 ; CHECK-NEXT: ret <8 x i16> [[ZB_2]]
 ;
 entry:
   %za = sext <8 x i8> %a to <8 x i16>
[... 9 unchanged lines not shown; context: if.else: ...]
   ret <8 x i16> %zb.2
 }

 define <8 x i16> @do_not_sink_nonfree_sext(<8 x i8> %a, <8 x i8> %b, i1 %c) {
 ; CHECK-LABEL: @do_not_sink_nonfree_sext(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK: if.then:
-; CHECK-NEXT: [[ZB_1:%.*]] = sext <8 x i8> [[B:%.*]] to <8 x i16>
 ; CHECK-NEXT: [[TMP0:%.*]] = sext <8 x i8> [[A:%.*]] to <8 x i16>
+; CHECK-NEXT: [[ZB_1:%.*]] = sext <8 x i8> [[B:%.*]] to <8 x i16>
 ; CHECK-NEXT: [[RES_1:%.*]] = add <8 x i16> [[TMP0]], [[ZB_1]]
 ; CHECK-NEXT: ret <8 x i16> [[RES_1]]
 ; CHECK: if.else:
 ; CHECK-NEXT: [[ZB_2:%.*]] = sext <8 x i8> [[B]] to <8 x i16>
 ; CHECK-NEXT: ret <8 x i16> [[ZB_2]]
 ;
 entry:
   %za = sext <8 x i8> %a to <8 x i16>
[... 10 unchanged lines not shown ...]
 }

 ; The masks used are suitable for umull, sink shufflevector to users.
 define <8 x i16> @sink_shufflevector_umull(<16 x i8> %a, <16 x i8> %b) {
 ; CHECK-LABEL: @sink_shufflevector_umull(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK: if.then:
-; CHECK-NEXT: [[S2:%.*]] = shufflevector <16 x i8> [[B:%.*]], <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
 ; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <16 x i8> [[A:%.*]], <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: [[S2:%.*]] = shufflevector <16 x i8> [[B:%.*]], <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
 ; CHECK-NEXT: [[VMULL0:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> [[TMP0]], <8 x i8> [[S2]])
 ; CHECK-NEXT: ret <8 x i16> [[VMULL0]]
 ; CHECK: if.else:
-; CHECK-NEXT: [[S4:%.*]] = shufflevector <16 x i8> [[B]], <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A]], <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT: [[S4:%.*]] = shufflevector <16 x i8> [[B]], <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT: [[VMULL1:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> [[TMP1]], <8 x i8> [[S4]])
 ; CHECK-NEXT: ret <8 x i16> [[VMULL1]]
 ;
 entry:
   %s1 = shufflevector <16 x i8> %a, <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
   %s3 = shufflevector <16 x i8> %a, <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   br i1 undef, label %if.then, label %if.else

[... 12 unchanged lines not shown ...]
 define <8 x i16> @sink_shufflevector_ext_subadd(<16 x i8> %a, <16 x i8> %b) {
 ; CHECK-LABEL: @sink_shufflevector_ext_subadd(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[S1:%.*]] = shufflevector <16 x i8> [[A:%.*]], <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
 ; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i8> [[A]], <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK: if.then:
 ; CHECK-NEXT: [[S2:%.*]] = shufflevector <16 x i8> [[B:%.*]], <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT: [[Z2:%.*]] = zext <8 x i8> [[S2]] to <8 x i16>
 ; CHECK-NEXT: [[TMP0:%.*]] = zext <8 x i8> [[S1]] to <8 x i16>
+; CHECK-NEXT: [[Z2:%.*]] = zext <8 x i8> [[S2]] to <8 x i16>
 ; CHECK-NEXT: [[RES1:%.*]] = add <8 x i16> [[TMP0]], [[Z2]]
 ; CHECK-NEXT: ret <8 x i16> [[RES1]]
 ; CHECK: if.else:
 ; CHECK-NEXT: [[S4:%.*]] = shufflevector <16 x i8> [[B]], <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
-; CHECK-NEXT: [[Z4:%.*]] = sext <8 x i8> [[S4]] to <8 x i16>
 ; CHECK-NEXT: [[TMP1:%.*]] = sext <8 x i8> [[S3]] to <8 x i16>
+; CHECK-NEXT: [[Z4:%.*]] = sext <8 x i8> [[S4]] to <8 x i16>
 ; CHECK-NEXT: [[RES2:%.*]] = sub <8 x i16> [[TMP1]], [[Z4]]
 ; CHECK-NEXT: ret <8 x i16> [[RES2]]
 ;
 entry:
   %s1 = shufflevector <16 x i8> %a, <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
   %z1 = zext <8 x i8> %s1 to <8 x i16>
   %s3 = shufflevector <16 x i8> %a, <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   %z3 = sext <8 x i8> %s3 to <8 x i16>
[... 21 unchanged lines not shown ...]
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[S1:%.]] = shufflevector <16 x i8> [[A:%.]], <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: [[S1:%.]] = shufflevector <16 x i8> [[A:%.]], <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i8> [[A]], <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i8> [[A]], <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[Z3:%.*]] = sext <8 x i8> [[S3]] to <8 x i16>		; CHECK-NEXT: [[Z3:%.*]] = sext <8 x i8> [[S3]] to <8 x i16>
; CHECK-NEXT: call void @user1(<8 x i16> [[Z3]])		; CHECK-NEXT: call void @user1(<8 x i16> [[Z3]])
; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.]], label [[IF_ELSE:%.]]		; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.]], label [[IF_ELSE:%.]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[S2:%.]] = shufflevector <16 x i8> [[B:%.]], <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: [[S2:%.]] = shufflevector <16 x i8> [[B:%.]], <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[Z2:%.*]] = zext <8 x i8> [[S2]] to <8 x i16>
; CHECK-NEXT: [[TMP0:%.*]] = zext <8 x i8> [[S1]] to <8 x i16>		; CHECK-NEXT: [[TMP0:%.*]] = zext <8 x i8> [[S1]] to <8 x i16>
		; CHECK-NEXT: [[Z2:%.*]] = zext <8 x i8> [[S2]] to <8 x i16>
; CHECK-NEXT: [[RES1:%.*]] = add <8 x i16> [[TMP0]], [[Z2]]		; CHECK-NEXT: [[RES1:%.*]] = add <8 x i16> [[TMP0]], [[Z2]]
; CHECK-NEXT: ret <8 x i16> [[RES1]]		; CHECK-NEXT: ret <8 x i16> [[RES1]]
; CHECK: if.else:		; CHECK: if.else:
; CHECK-NEXT: [[S4:%.*]] = shufflevector <16 x i8> [[B]], <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: [[S4:%.*]] = shufflevector <16 x i8> [[B]], <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[Z4:%.*]] = sext <8 x i8> [[S4]] to <8 x i16>
; CHECK-NEXT: [[TMP1:%.*]] = sext <8 x i8> [[S3]] to <8 x i16>		; CHECK-NEXT: [[TMP1:%.*]] = sext <8 x i8> [[S3]] to <8 x i16>
		; CHECK-NEXT: [[Z4:%.*]] = sext <8 x i8> [[S4]] to <8 x i16>
; CHECK-NEXT: [[RES2:%.*]] = sub <8 x i16> [[TMP1]], [[Z4]]		; CHECK-NEXT: [[RES2:%.*]] = sub <8 x i16> [[TMP1]], [[Z4]]
; CHECK-NEXT: ret <8 x i16> [[RES2]]		; CHECK-NEXT: ret <8 x i16> [[RES2]]
;		;
entry:		entry:
%s1 = shufflevector <16 x i8> %a, <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%s1 = shufflevector <16 x i8> %a, <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%z1 = zext <8 x i8> %s1 to <8 x i16>		%z1 = zext <8 x i8> %s1 to <8 x i16>
%s3 = shufflevector <16 x i8> %a, <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%s3 = shufflevector <16 x i8> %a, <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%z3 = sext <8 x i8> %s3 to <8 x i16>		%z3 = sext <8 x i8> %s3 to <8 x i16>
[... 52 lines not shown ...]

llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -codegenprepare -S | FileCheck %s		; RUN: opt < %s -codegenprepare -S | FileCheck %s

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown"		target triple = "aarch64-unknown"

define <8 x i16> @sink_zext(<8 x i8> %a, <8 x i8> %b, i1 %c) {		define <8 x i16> @sink_zext(<8 x i8> %a, <8 x i8> %b, i1 %c) {
; CHECK-LABEL: @sink_zext(		; CHECK-LABEL: @sink_zext(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]		; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[ZB_1:%.*]] = zext <8 x i8> [[B:%.*]] to <8 x i16>
; CHECK-NEXT: [[TMP0:%.*]] = zext <8 x i8> [[A:%.*]] to <8 x i16>		; CHECK-NEXT: [[TMP0:%.*]] = zext <8 x i8> [[A:%.*]] to <8 x i16>
		; CHECK-NEXT: [[ZB_1:%.*]] = zext <8 x i8> [[B:%.*]] to <8 x i16>
; CHECK-NEXT: [[RES_1:%.*]] = add <8 x i16> [[TMP0]], [[ZB_1]]		; CHECK-NEXT: [[RES_1:%.*]] = add <8 x i16> [[TMP0]], [[ZB_1]]
; CHECK-NEXT: ret <8 x i16> [[RES_1]]		; CHECK-NEXT: ret <8 x i16> [[RES_1]]
; CHECK: if.else:		; CHECK: if.else:
; CHECK-NEXT: [[ZB_2:%.*]] = zext <8 x i8> [[B]] to <8 x i16>
; CHECK-NEXT: [[TMP1:%.*]] = zext <8 x i8> [[A]] to <8 x i16>		; CHECK-NEXT: [[TMP1:%.*]] = zext <8 x i8> [[A]] to <8 x i16>
		; CHECK-NEXT: [[ZB_2:%.*]] = zext <8 x i8> [[B]] to <8 x i16>
; CHECK-NEXT: [[RES_2:%.*]] = sub <8 x i16> [[TMP1]], [[ZB_2]]		; CHECK-NEXT: [[RES_2:%.*]] = sub <8 x i16> [[TMP1]], [[ZB_2]]
; CHECK-NEXT: ret <8 x i16> [[RES_2]]		; CHECK-NEXT: ret <8 x i16> [[RES_2]]
;		;
entry:		entry:
%za = zext <8 x i8> %a to <8 x i16>		%za = zext <8 x i8> %a to <8 x i16>
br i1 %c, label %if.then, label %if.else		br i1 %c, label %if.then, label %if.else

if.then:		if.then:
%zb.1 = zext <8 x i8> %b to <8 x i16>		%zb.1 = zext <8 x i8> %b to <8 x i16>
%res.1 = add <8 x i16> %za, %zb.1		%res.1 = add <8 x i16> %za, %zb.1
ret <8 x i16> %res.1		ret <8 x i16> %res.1

if.else:		if.else:
%zb.2 = zext <8 x i8> %b to <8 x i16>		%zb.2 = zext <8 x i8> %b to <8 x i16>
%res.2 = sub <8 x i16> %za, %zb.2		%res.2 = sub <8 x i16> %za, %zb.2
ret <8 x i16> %res.2		ret <8 x i16> %res.2
}		}

define <8 x i16> @sink_sext(<8 x i8> %a, <8 x i8> %b, i1 %c) {		define <8 x i16> @sink_sext(<8 x i8> %a, <8 x i8> %b, i1 %c) {
; CHECK-LABEL: @sink_sext(		; CHECK-LABEL: @sink_sext(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]		; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[ZB_1:%.*]] = sext <8 x i8> [[B:%.*]] to <8 x i16>
; CHECK-NEXT: [[TMP0:%.*]] = sext <8 x i8> [[A:%.*]] to <8 x i16>		; CHECK-NEXT: [[TMP0:%.*]] = sext <8 x i8> [[A:%.*]] to <8 x i16>
		; CHECK-NEXT: [[ZB_1:%.*]] = sext <8 x i8> [[B:%.*]] to <8 x i16>
; CHECK-NEXT: [[RES_1:%.*]] = add <8 x i16> [[TMP0]], [[ZB_1]]		; CHECK-NEXT: [[RES_1:%.*]] = add <8 x i16> [[TMP0]], [[ZB_1]]
; CHECK-NEXT: ret <8 x i16> [[RES_1]]		; CHECK-NEXT: ret <8 x i16> [[RES_1]]
; CHECK: if.else:		; CHECK: if.else:
; CHECK-NEXT: [[ZB_2:%.*]] = sext <8 x i8> [[B]] to <8 x i16>
; CHECK-NEXT: [[TMP1:%.*]] = sext <8 x i8> [[A]] to <8 x i16>		; CHECK-NEXT: [[TMP1:%.*]] = sext <8 x i8> [[A]] to <8 x i16>
		; CHECK-NEXT: [[ZB_2:%.*]] = sext <8 x i8> [[B]] to <8 x i16>
; CHECK-NEXT: [[RES_2:%.*]] = sub <8 x i16> [[TMP1]], [[ZB_2]]		; CHECK-NEXT: [[RES_2:%.*]] = sub <8 x i16> [[TMP1]], [[ZB_2]]
; CHECK-NEXT: ret <8 x i16> [[RES_2]]		; CHECK-NEXT: ret <8 x i16> [[RES_2]]
;		;
entry:		entry:
%za = sext <8 x i8> %a to <8 x i16>		%za = sext <8 x i8> %a to <8 x i16>
br i1 %c, label %if.then, label %if.else		br i1 %c, label %if.then, label %if.else

if.then:		if.then:
%zb.1 = sext <8 x i8> %b to <8 x i16>		%zb.1 = sext <8 x i8> %b to <8 x i16>
%res.1 = add <8 x i16> %za, %zb.1		%res.1 = add <8 x i16> %za, %zb.1
ret <8 x i16> %res.1		ret <8 x i16> %res.1

if.else:		if.else:
%zb.2 = sext <8 x i8> %b to <8 x i16>		%zb.2 = sext <8 x i8> %b to <8 x i16>
%res.2 = sub <8 x i16> %za, %zb.2		%res.2 = sub <8 x i16> %za, %zb.2
ret <8 x i16> %res.2		ret <8 x i16> %res.2
}		}

define <8 x i16> @do_not_sink_nonfree_zext(<8 x i8> %a, <8 x i8> %b, i1 %c) {		define <8 x i16> @do_not_sink_nonfree_zext(<8 x i8> %a, <8 x i8> %b, i1 %c) {
; CHECK-LABEL: @do_not_sink_nonfree_zext(		; CHECK-LABEL: @do_not_sink_nonfree_zext(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]		; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[ZB_1:%.*]] = sext <8 x i8> [[B:%.*]] to <8 x i16>
; CHECK-NEXT: [[TMP0:%.*]] = sext <8 x i8> [[A:%.*]] to <8 x i16>		; CHECK-NEXT: [[TMP0:%.*]] = sext <8 x i8> [[A:%.*]] to <8 x i16>
		; CHECK-NEXT: [[ZB_1:%.*]] = sext <8 x i8> [[B:%.*]] to <8 x i16>
; CHECK-NEXT: [[RES_1:%.*]] = add <8 x i16> [[TMP0]], [[ZB_1]]		; CHECK-NEXT: [[RES_1:%.*]] = add <8 x i16> [[TMP0]], [[ZB_1]]
; CHECK-NEXT: ret <8 x i16> [[RES_1]]		; CHECK-NEXT: ret <8 x i16> [[RES_1]]
; CHECK: if.else:		; CHECK: if.else:
; CHECK-NEXT: [[ZB_2:%.*]] = sext <8 x i8> [[B]] to <8 x i16>		; CHECK-NEXT: [[ZB_2:%.*]] = sext <8 x i8> [[B]] to <8 x i16>
; CHECK-NEXT: ret <8 x i16> [[ZB_2]]		; CHECK-NEXT: ret <8 x i16> [[ZB_2]]
;		;
entry:		entry:
%za = sext <8 x i8> %a to <8 x i16>		%za = sext <8 x i8> %a to <8 x i16>
[... 9 lines not shown ...]	if.else:
ret <8 x i16> %zb.2		ret <8 x i16> %zb.2
}		}

define <8 x i16> @do_not_sink_nonfree_sext(<8 x i8> %a, <8 x i8> %b, i1 %c) {		define <8 x i16> @do_not_sink_nonfree_sext(<8 x i8> %a, <8 x i8> %b, i1 %c) {
; CHECK-LABEL: @do_not_sink_nonfree_sext(		; CHECK-LABEL: @do_not_sink_nonfree_sext(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]		; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[ZB_1:%.*]] = sext <8 x i8> [[B:%.*]] to <8 x i16>
; CHECK-NEXT: [[TMP0:%.*]] = sext <8 x i8> [[A:%.*]] to <8 x i16>		; CHECK-NEXT: [[TMP0:%.*]] = sext <8 x i8> [[A:%.*]] to <8 x i16>
		; CHECK-NEXT: [[ZB_1:%.*]] = sext <8 x i8> [[B:%.*]] to <8 x i16>
; CHECK-NEXT: [[RES_1:%.*]] = add <8 x i16> [[TMP0]], [[ZB_1]]		; CHECK-NEXT: [[RES_1:%.*]] = add <8 x i16> [[TMP0]], [[ZB_1]]
; CHECK-NEXT: ret <8 x i16> [[RES_1]]		; CHECK-NEXT: ret <8 x i16> [[RES_1]]
; CHECK: if.else:		; CHECK: if.else:
; CHECK-NEXT: [[ZB_2:%.*]] = sext <8 x i8> [[B]] to <8 x i16>		; CHECK-NEXT: [[ZB_2:%.*]] = sext <8 x i8> [[B]] to <8 x i16>
; CHECK-NEXT: ret <8 x i16> [[ZB_2]]		; CHECK-NEXT: ret <8 x i16> [[ZB_2]]
;		;
entry:		entry:
%za = sext <8 x i8> %a to <8 x i16>		%za = sext <8 x i8> %a to <8 x i16>
[... 10 lines not shown ...]
}		}

; The masks used are suitable for umull, sink shufflevector to users.		; The masks used are suitable for umull, sink shufflevector to users.
define <8 x i16> @sink_shufflevector_umull(<16 x i8> %a, <16 x i8> %b) {		define <8 x i16> @sink_shufflevector_umull(<16 x i8> %a, <16 x i8> %b) {
; CHECK-LABEL: @sink_shufflevector_umull(		; CHECK-LABEL: @sink_shufflevector_umull(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]		; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[S2:%.*]] = shufflevector <16 x i8> [[B:%.*]], <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <16 x i8> [[A:%.*]], <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <16 x i8> [[A:%.*]], <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; CHECK-NEXT: [[S2:%.*]] = shufflevector <16 x i8> [[B:%.*]], <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[VMULL0:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> [[TMP0]], <8 x i8> [[S2]])		; CHECK-NEXT: [[VMULL0:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> [[TMP0]], <8 x i8> [[S2]])
; CHECK-NEXT: ret <8 x i16> [[VMULL0]]		; CHECK-NEXT: ret <8 x i16> [[VMULL0]]
; CHECK: if.else:		; CHECK: if.else:
; CHECK-NEXT: [[S4:%.*]] = shufflevector <16 x i8> [[B]], <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A]], <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i8> [[A]], <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; CHECK-NEXT: [[S4:%.*]] = shufflevector <16 x i8> [[B]], <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[VMULL1:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> [[TMP1]], <8 x i8> [[S4]])		; CHECK-NEXT: [[VMULL1:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> [[TMP1]], <8 x i8> [[S4]])
; CHECK-NEXT: ret <8 x i16> [[VMULL1]]		; CHECK-NEXT: ret <8 x i16> [[VMULL1]]
;		;
entry:		entry:
%s1 = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%s1 = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%s3 = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%s3 = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
br i1 undef, label %if.then, label %if.else		br i1 undef, label %if.then, label %if.else

[... 83 lines not shown ...]	if.then:
%vmull0 = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> %s1, <8 x i8> %s2) #3		%vmull0 = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> %s1, <8 x i8> %s2) #3
ret <8 x i16> %vmull0		ret <8 x i16> %vmull0

if.else:		if.else:
%s4 = shufflevector <16 x i8> %b, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 10, i32 12, i32 13, i32 14, i32 15>		%s4 = shufflevector <16 x i8> %b, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 10, i32 12, i32 13, i32 14, i32 15>
%vmull1 = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> %s3, <8 x i8> %s4) #3		%vmull1 = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> %s3, <8 x i8> %s4) #3
ret <8 x i16> %vmull1		ret <8 x i16> %vmull1
}		}


; Function Attrs: nounwind readnone		; Function Attrs: nounwind readnone
declare <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8>, <8 x i8>) #2		declare <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8>, <8 x i8>) #2

		; The insertelement should be inserted before shufflevector, otherwise 'does not dominate all uses' error will occur.
		define <4 x i32> @sink_insertment(i16 %e, i8 %f) {
		; CHECK-LABEL: @sink_insertment(
		; CHECK-NEXT: for.cond4.preheader.lr.ph:
		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i8 [[F:%.*]], 0
		; CHECK-NEXT: [[CONV25:%.*]] = sext i16 [[E:%.*]] to i32
		; CHECK-NEXT: br i1 [[CMP]], label [[FOR_COND4_PREHEADER_US_PREHEADER:%.*]], label [[FOR_COND4_PREHEADER_PREHEADER:%.*]]
		; CHECK: for.cond4.preheader.us.preheader:
		; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[CONV25]], i32 0
		; CHECK-NEXT: [[BROADCAST_SPLAT144:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i32> zeroinitializer, [[BROADCAST_SPLAT144]]
		; CHECK-NEXT: ret <4 x i32> [[TMP1]]
		; CHECK: for.cond4.preheader.preheader:
		; CHECK-NEXT: ret <4 x i32> zeroinitializer
		;
		for.cond4.preheader.lr.ph:
		%cmp = icmp slt i8 %f, 0
		%conv25 = sext i16 %e to i32
		%broadcast.splatinsert143 = insertelement <4 x i32> poison, i32 %conv25, i32 0
		br i1 %cmp, label %for.cond4.preheader.us.preheader, label %for.cond4.preheader.preheader

		for.cond4.preheader.us.preheader: ; preds = %for.cond4.preheader.lr.ph
		%broadcast.splat144 = shufflevector <4 x i32> %broadcast.splatinsert143, <4 x i32> poison, <4 x i32> zeroinitializer
		%0 = mul <4 x i32> zeroinitializer, %broadcast.splat144
		ret <4 x i32> %0

		for.cond4.preheader.preheader: ; preds = %for.cond4.preheader.lr.ph
		ret <4 x i32> zeroinitializer
		}
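
A toy model of the ordering rule this test exercises, offered as a sketch rather than the patch's code: a block is a plain list of instruction names, and `Block`, `sinkChain`, and the instruction names are hypothetical stand-ins for the CodeGenPrepare logic. When an operand in the chain already sits in the target block, the insert position moves up to it, so anything sunk afterwards lands before that operand instead of directly before the final user.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model: a basic block is an ordered list of instruction names.
using Block = std::vector<std::string>;

// Sink the operands in `chain` into `target` so that each one precedes its user.
// `chain` is ordered from the operand nearest `user` outwards.
// Assumes `user` is present in `target`.
Block sinkChain(Block target, const std::vector<std::string> &chain,
                const std::string &user) {
  // Start with the insert position at the final user.
  std::size_t insertPos = static_cast<std::size_t>(
      std::find(target.begin(), target.end(), user) - target.begin());

  for (const std::string &op : chain) {
    auto it = std::find(target.begin(), target.end(), op);
    if (it != target.end()) {
      // Already in the target block: nothing to move, but anything sunk
      // later must be placed before this operand.
      insertPos = std::min(insertPos,
                           static_cast<std::size_t>(it - target.begin()));
      continue;
    }
    // Not in the target block yet: place it at the current insert position.
    // insertPos then indexes the new entry, so the next operand goes before it.
    target.insert(target.begin() + static_cast<std::ptrdiff_t>(insertPos), op);
  }
  return target;
}
```

For the `sink_insertment` shape, `sinkChain({"shuffle", "mul", "ret"}, {"shuffle", "insert"}, "mul")` gives `insert, shuffle, mul, ret`. Inserting directly before `mul` would instead give `shuffle, insert, mul, ret`, where the shuffle reads a value defined after it.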