This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
llvm/test/CodeGen/X86/combine-sdiv.ll
llvm/test/CodeGen/X86/haddsub-undef.ll
llvm/test/CodeGen/X86/vselect-avx512.ll

Differential D129537

[X86][DAGISel] Don't widen shuffle element with AVX512
ClosedPublic

Authored by LuoYuanke on Jul 11 2022, 8:04 PM.

Details

Reviewers

RKSimon
craig.topper
pengfei
wxiao3

Commits

rG5fb413421057: [X86][DAGISel] Don't widen shuffle element with AVX512

Summary

Currently the X86 shuffle lowering widens the shuffle element type when
each pair of adjacent mask elements selects an aligned pair of adjacent
source elements. For the example below:

  %t2 = add nsw <16 x i32> %t0, %t1
  %t3 = sub nsw <16 x i32> %t0, %t1
  %t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3,
                      <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4,
                       i32 5, i32 6, i32 7, i32 8, i32 9, i32 10,
                       i32 11, i32 12, i32 13, i32 14, i32 15>

  ret <16 x i32> %t4

The compiler would transform the shuffle to:
  %t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3,
                      <8 x i64> <i32 8, i32 1, i32 2, i32 3, i32 4,
                                 i32 5, i32 6, i32 7>
This may lose the opportunity for ISel to select a masked instruction when
AVX512 is enabled.

This patch prevents the transform when AVX512 is enabled and a shuffle
operand could be combined into a masked operation.
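
To illustrate the widening described in the summary, here is a minimal standalone C++ sketch. It is not the LLVM implementation: the helper name widenShuffleMask is hypothetical, and undef or zeroable lanes (which the real lowering takes into account) are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): try to halve the element count of a
// shuffle mask by merging each adjacent pair of indices into a single index
// over elements that are twice as wide. Returns false if any pair is not an
// aligned, consecutive pair such as (2k, 2k + 1).
bool widenShuffleMask(const std::vector<int> &Mask, std::vector<int> &Widened) {
  Widened.clear();
  if (Mask.size() % 2 != 0)
    return false;
  for (std::size_t I = 0; I < Mask.size(); I += 2) {
    int Lo = Mask[I];
    int Hi = Mask[I + 1];
    // The low index must be even and the high index must directly follow it.
    if (Lo < 0 || Lo % 2 != 0 || Hi != Lo + 1)
      return false;
    Widened.push_back(Lo / 2);
  }
  return true;
}
```

For the 16 x i32 mask from the summary, this yields <8, 1, 2, 3, 4, 5, 6, 7>, matching the widened 8 x i64 shuffle shown above.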

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

LuoYuanke created this revision. Jul 11 2022, 8:04 PM

Herald added a project: Restricted Project. Jul 11 2022, 8:04 PM

Herald added subscribers: jsji, pengfei, hiraditya.

LuoYuanke requested review of this revision. Jul 11 2022, 8:04 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B174790: Diff 443822. Jul 11 2022, 8:38 PM

Rebase.

Harbormaster completed remote builds in B174794: Diff 443828. Jul 11 2022, 10:20 PM

RKSimon added a reviewer: RKSimon. Jul 12 2022, 1:16 PM

wxiao3 added a subscriber: wxiao3. Jul 14 2022, 7:57 PM

RKSimon added inline comments. Jul 18 2022, 6:08 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
44372	This doesn't look right - shouldn't it be something like: APInt Mask = APIntOps::ScaleBitMask(ConstCond->getAPIntValue(), NumElts * 2) ?

LuoYuanke added inline comments. Jul 18 2022, 7:50 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
44372	Thanks, Simon. ScaleBitMask fits this computation perfectly.
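
As a rough illustration of what scaling a select mask up means here, the following standalone C++ sketch replicates every bit of a narrow mask so that it covers twice (or N times) as many lanes. It is a simplified stand-in using uint64_t, not the APIntOps::ScaleBitMask implementation, and the name scaleBitMaskUp is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for widening a select mask: each bit of the
// OldBits-wide Mask is replicated Scale = NewBits / OldBits times, so a
// mask over N wide elements becomes a mask over N * Scale narrow elements.
uint64_t scaleBitMaskUp(uint64_t Mask, unsigned OldBits, unsigned NewBits) {
  assert(OldBits != 0 && NewBits <= 64 && NewBits % OldBits == 0);
  unsigned Scale = NewBits / OldBits;
  uint64_t Result = 0;
  for (unsigned I = 0; I < OldBits; ++I) {
    if (!((Mask >> I) & 1))
      continue;
    // Set the Scale consecutive bits that correspond to old bit I.
    for (unsigned J = 0; J < Scale; ++J)
      Result |= uint64_t(1) << (I * Scale + J);
  }
  return Result;
}
```

For example, an 8-bit mask with only bit 0 set becomes a 16-bit mask with bits 0 and 1 set, which is the relationship between a vXi64 select mask and the equivalent v2Xi32 one.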

Address Simon's comments.

I think we can probably generalize this very easily to any legal widening, e.g. to handle: https://gcc.godbolt.org/z/Pjz5qfYT7 (v32i16 -> v16i32)

Harbormaster completed remote builds in B176040: Diff 445514. Jul 18 2022, 9:57 AM

RKSimon added inline comments. Jul 18 2022, 1:20 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
44346	Check if N->getOpcode() == ISD::VSELECT
44350	auto *ConstCond = dyn_cast<ConstantSDNode>(Cond.getOperand(0)); if (!ConstCond) return SDValue();

LuoYuanke added inline comments. Jul 18 2022, 10:37 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
44346	Is it possible that CondVT is not vXi1 for ISD::VSELECT? I ask because the comment for ISD::VSELECT says "targets may change the condition type":

/// At first, the VSELECT condition is of vXi1 type. Later, targets may
/// change the condition type in order to match the VSELECT node using a
/// pattern. The condition follows the BooleanContent format of the target.

Address Simon's comments.

In D129537#3659947, @RKSimon wrote:

I think we can probably generalize this very easily to any legal widening, e.g. to handle: https://gcc.godbolt.org/z/Pjz5qfYT7 (v32i16 -> v16i32)

Good suggestion. Let me take a look at it.

Harbormaster completed remote builds in B176174: Diff 445706. Jul 19 2022, 1:17 AM

Address Simon's suggestion to generalize the blend/select combine.

LuoYuanke added inline comments. Jul 20 2022, 1:48 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
129	For this case, I'm not sure whether it is worse than the left-side (previous) code.

LuoYuanke added reviewers: craig.topper, pengfei, wxiao3. Jul 20 2022, 1:49 AM

Herald added a subscriber: StephenFan. Jul 20 2022, 1:49 AM

Harbormaster completed remote builds in B176448: Diff 446075. Jul 20 2022, 2:39 AM

RKSimon mentioned this in rGbb4ff39bafdf: [X86] shuffle-blend.ll - add 32-bit test coverage. Jul 20 2022, 3:24 AM

I've added some additional test coverage to shuffle-blend.ll - please can you rebase?

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
129	Add a 128-bit vector limit?

Please can you update the patch title/summary?

Rebase and update test case.

LuoYuanke retitled this revision from [X86][DAGISel] Combine select vXi64 with AVX512 target to [X86][DAGISel] Don't widen shuffle element with AVX512. Jul 20 2022, 4:25 AM

LuoYuanke edited the summary of this revision.

LuoYuanke added inline comments. Jul 20 2022, 4:48 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
129	I'll add a 128-bit vector limit. However, this case is already a 128-bit vector.

Harbormaster completed remote builds in B176472: Diff 446109. Jul 20 2022, 4:58 AM

Limit the vector bit width to >= 128 and add test cases.

RKSimon added inline comments. Jul 20 2022, 5:16 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
4–5	CHECK,AVX512BW,X86-AVX512BW CHECK,AVX512BW,X64-AVX512BW

Harbormaster completed remote builds in B176477: Diff 446114. Jul 20 2022, 5:46 AM

Address Simon's comments.

LuoYuanke marked an inline comment as done. Jul 20 2022, 6:06 AM

LuoYuanke added inline comments.

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
94	`retl` can be merged with `retq` as `ret{{[l|q]}}`. Not sure why utils/update_llc_test_checks.py doesn't merge them.

RKSimon added inline comments. Jul 20 2022, 6:13 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
151	regression? you might need to improve the 128-bit limit logic to account for vXi16 specifically

Harbormaster completed remote builds in B176488: Diff 446132. Jul 20 2022, 6:53 AM

LuoYuanke added inline comments. Jul 20 2022, 7:49 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
151	There is PBLENDW but no PBLENDB instruction, so it is better to widen vXi8 to 16-bit elements. Besides, there are more instructions for 16-bit elements (e.g., movsh). I'll investigate this issue further.

Handle vXi8 specially, because PBLENDW can be applied to vXi16 but not to vXi8.
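
The checks discussed so far can be summarized in a small standalone C++ sketch. It mirrors the canCombineAsMaskOperation helper as it appears in Diff 447318 on this page, but the struct, its fields, and the function name are hypothetical, and the one-use binary-operation test is reduced to a boolean input rather than a real SelectionDAG query.

```cpp
#include <cassert>

// Hypothetical, flattened view of the inputs the check looks at.
struct ShuffleInfo {
  bool HasAVX512;            // subtarget has AVX512
  bool HasBWI;               // subtarget has AVX512BW
  unsigned ScalarBits;       // element width of the shuffle operand
  unsigned VectorBits;       // total vector width
  bool OperandIsOneUseBinOp; // some operand is a binary op with a single use
};

// Returns true when the shuffle should keep its narrow element type so that
// it can later be folded into a masked binary operation.
bool keepNarrowElementsForMasking(const ShuffleInfo &S) {
  if (!S.HasAVX512)
    return false;
  // Masked i8/i16 operations need AVX512BW.
  if ((S.ScalarBits == 16 || S.ScalarBits == 8) && !S.HasBWI)
    return false;
  // Below 512 bits, i8 is better widened to i16 so PBLENDW can be used.
  if (S.ScalarBits == 8 && S.VectorBits < 512)
    return false;
  return S.OperandIsOneUseBinOp;
}
```

With these rules a 128-bit v16i8 blend is still widened, while a 512-bit v64i8 blend of a one-use binary operation on an AVX512BW target is not.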

RKSimon added inline comments. Jul 20 2022, 8:16 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
94	"kmovd %eax" vs "kmovq %rax"
243	pre-commit these additional tests

LuoYuanke added inline comments. Jul 20 2022, 8:19 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
94	Got it. :)
243	Sure. I'll do it.
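
The constants moved into the mask registers in these tests can be derived from the shuffle mask. The standalone C++ sketch below shows one way to compute them; it assumes a pure two-input blend (lane I takes element I from one of the two inputs), and the helper name blendSelectMask is hypothetical rather than an LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: for a two-input shuffle that is a pure blend, set
// bit I of Bits when lane I is taken from the second input. Returns false
// if the mask is not a blend or has more than 64 lanes.
bool blendSelectMask(const std::vector<int> &Mask, uint64_t &Bits) {
  const int NumElts = static_cast<int>(Mask.size());
  if (NumElts > 64)
    return false;
  Bits = 0;
  for (int I = 0; I < NumElts; ++I) {
    if (Mask[I] == I)
      continue; // lane I comes from the first input
    if (Mask[I] != I + NumElts)
      return false; // lane I is moved across lanes, so this is not a blend
    Bits |= uint64_t(1) << I;
  }
  return true;
}
```

For the 64 x i8 mask <64, 65, 2, ..., 63> this gives 3, which is the value loaded by `movl $3` in the new check lines. The select mask has one bit per lane, so a 64-lane blend needs a 64-bit mask; that is presumably why the 64-bit target can use `kmovq %rax` while the 32-bit target cannot.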

Harbormaster completed remote builds in B176516: Diff 446163. Jul 20 2022, 9:23 AM

Rebase

Harbormaster completed remote builds in B176666: Diff 446366. Jul 21 2022, 12:43 AM

RKSimon added inline comments. Jul 22 2022, 2:54 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
19326	This list is going to get longer, and we're likely to miss patterns that only fold to target nodes later on - I'm wondering whether we could consider accepting any TLI.isBinOp() case here?

LGTM - with one minor comment for future work

llvm/lib/Target/X86/X86ISelLowering.cpp
19326	Please can you add a TODO about maybe converting this to TLI.isBinOp()?

This revision is now accepted and ready to land. Jul 25 2022, 6:08 AM

Address Simon's comments.

LuoYuanke added inline comments. Jul 25 2022, 7:08 AM

llvm/test/CodeGen/X86/haddsub-undef.ll
1053	This looks like a regression. I'll take a look at it.

LuoYuanke added inline comments. Jul 25 2022, 7:09 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
19326	Thanks, Simon, for the suggestion. It seems there are regressions in some cases; I'll take a look at them.

Harbormaster completed remote builds in B177366: Diff 447318. Jul 25 2022, 7:28 AM

Yeah - that was what I saw as well - if you want to get this in for 15.x I'd recommend going back to the old switch statement - then investigate the binop general case later (if you solve it soon enough you can request a merge)

Revert to previous version and add TODO for checking TLI.isBinOp().

In D129537#3676257, @RKSimon wrote:

Yeah - that was what I saw as well - if you want to get this in for 15.x I'd recommend going back to the old switch statement - then investigate the binop general case later (if you solve it soon enough you can request a merge)

Thanks, Simon, for the review. All the comments are valuable. Let me land the old switch statement version first and then investigate the general binop case.

Harbormaster completed remote builds in B177519: Diff 447540. Jul 25 2022, 8:47 PM

This revision was landed with ongoing or failed builds. Jul 25 2022, 8:56 PM

Closed by commit rG5fb413421057: [X86][DAGISel] Don't widen shuffle element with AVX512 (authored by LuoYuanke).

This revision was automatically updated to reflect the committed changes.

LuoYuanke added a commit: rG5fb413421057: [X86][DAGISel] Don't widen shuffle element with AVX512.

fhahn added a reverting change: rGf912bab111ad: Revert "[X86][DAGISel] Don't widen shuffle element with AVX512". Jul 28 2022, 7:27 AM

This patch unfortunately causes crashes when building llvm-test-suite optimizing for AVX512.

Reproducer for llc:

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx"

define i32 @test(<32 x i32> %0) #0 {
entry:
  %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1)
  ret i32 %2
}

; Function Attrs: nocallback nofree nosync nounwind readnone willreturn
declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1

attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" }
attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }

I've reverted the patch in the meantime to get current main back into a good state.

LuoYuanke mentioned this in rG6b4c386b1e70: [X86] Add test cases for D129537. Jul 30 2022, 4:43 AM

LuoYuanke mentioned this in D130830: Don't widen shuffle element with AVX512. Jul 30 2022, 8:42 PM

LuoYuanke added a reverting change: D131042: Revert "[X86][DAGISel] Don't widen shuffle element with AVX512". Aug 2 2022, 7:48 PM

LuoYuanke mentioned this in rGf885c08034fe: Don't widen shuffle element with AVX512. Oct 12 2022, 4:23 AM

Revision Contents

Path

Size

llvm/

lib/

Target/

X86/

X86ISelLowering.cpp

32 lines

test/

CodeGen/

X86/

avx512-shuffles/

71 lines

26 lines

87 lines

40 lines

Diff 447318

llvm/lib/Target/X86/X86ISelLowering.cpp

Show First 20 Lines • Show All 19,296 Lines • ▼ Show 20 Lines	if (LowV2Elements == LowV1Elements) {
return true;		return true;
}		}
}		}
}		}

return false;		return false;
}		}

		static bool canCombineAsMaskOperation(SDValue V1, SDValue V2,
		const X86Subtarget &Subtarget,
		SelectionDAG &DAG) {
		if (!Subtarget.hasAVX512())
		return false;

		MVT VT = V1.getSimpleValueType().getScalarType();
		if ((VT == MVT::i16 || VT == MVT::i8) && !Subtarget.hasBWI())
		return false;

		// i8 is better to be widen to i16, because there is PBLENDW for vXi16
		// when the vector bit size is 128 or 256.
		if (VT == MVT::i8 && V1.getSimpleValueType().getSizeInBits() < 512)
		return false;

		auto HasMaskOperation = [&](SDValue V) {
		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
		if (!TLI.isBinOp(V->getOpcode()))
		return false;
		if (!V->hasOneUse())
		return false;

		return true;
		};

		if (HasMaskOperation(V1) || HasMaskOperation(V2))
		return true;

		return false;
		}

// Forward declaration.		// Forward declaration.
static SDValue canonicalizeShuffleMaskWithHorizOp(		static SDValue canonicalizeShuffleMaskWithHorizOp(
MutableArrayRef<SDValue> Ops, MutableArrayRef<int> Mask,		MutableArrayRef<SDValue> Ops, MutableArrayRef<int> Mask,
unsigned RootSizeInBits, const SDLoc &DL, SelectionDAG &DAG,		unsigned RootSizeInBits, const SDLoc &DL, SelectionDAG &DAG,
const X86Subtarget &Subtarget);		const X86Subtarget &Subtarget);

/// Top-level lowering for x86 vector shuffles.		/// Top-level lowering for x86 vector shuffles.
///		///
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	static SDValue lowerVECTOR_SHUFFLE(SDValue Op, const X86Subtarget &Subtarget,
bool V2IsZero = !V2IsUndef && ISD::isBuildVectorAllZeros(V2.getNode());		bool V2IsZero = !V2IsUndef && ISD::isBuildVectorAllZeros(V2.getNode());

// Try to collapse shuffles into using a vector type with fewer elements but		// Try to collapse shuffles into using a vector type with fewer elements but
// wider element types. We cap this to not form integers or floating point		// wider element types. We cap this to not form integers or floating point
// elements wider than 64 bits. It does not seem beneficial to form i128		// elements wider than 64 bits. It does not seem beneficial to form i128
// integers to handle flipping the low and high halves of AVX 256-bit vectors.		// integers to handle flipping the low and high halves of AVX 256-bit vectors.
SmallVector<int, 16> WidenedMask;		SmallVector<int, 16> WidenedMask;
if (VT.getScalarSizeInBits() < 64 && !Is1BitVector &&		if (VT.getScalarSizeInBits() < 64 && !Is1BitVector &&
		!canCombineAsMaskOperation(V1, V2, Subtarget, DAG) &&
canWidenShuffleElements(OrigMask, Zeroable, V2IsZero, WidenedMask)) {		canWidenShuffleElements(OrigMask, Zeroable, V2IsZero, WidenedMask)) {
// Shuffle mask widening should not interfere with a broadcast opportunity		// Shuffle mask widening should not interfere with a broadcast opportunity
// by obfuscating the operands with bitcasts.		// by obfuscating the operands with bitcasts.
// TODO: Avoid lowering directly from this top-level function: make this		// TODO: Avoid lowering directly from this top-level function: make this
// a query (canLowerAsBroadcast) and defer lowering to the type-based calls.		// a query (canLowerAsBroadcast) and defer lowering to the type-based calls.
if (SDValue Broadcast = lowerShuffleAsBroadcast(DL, VT, V1, V2, OrigMask,		if (SDValue Broadcast = lowerShuffleAsBroadcast(DL, VT, V1, V2, OrigMask,
Subtarget, DAG))		Subtarget, DAG))
return Broadcast;		return Broadcast;
▲ Show 20 Lines • Show All 24,918 Lines • ▼ Show 20 Lines	static SDValue combineVSelectToBLENDV(SDNode *N, SelectionDAG &DAG,
// and don't ever optimize vector selects that map to AVX512 mask-registers.		// and don't ever optimize vector selects that map to AVX512 mask-registers.
if (BitWidth < 8 || BitWidth > 64)		if (BitWidth < 8 || BitWidth > 64)
return SDValue();		return SDValue();

auto OnlyUsedAsSelectCond = [](SDValue Cond) {		auto OnlyUsedAsSelectCond = [](SDValue Cond) {
for (SDNode::use_iterator UI = Cond->use_begin(), UE = Cond->use_end();		for (SDNode::use_iterator UI = Cond->use_begin(), UE = Cond->use_end();
UI != UE; ++UI)		UI != UE; ++UI)
if ((UI->getOpcode() != ISD::VSELECT &&		if ((UI->getOpcode() != ISD::VSELECT &&
UI->getOpcode() != X86ISD::BLENDV) ||		UI->getOpcode() != X86ISD::BLENDV) ||
UI.getOperandNo() != 0)		UI.getOperandNo() != 0)
return false;		return false;

return true;		return true;
};		};

APInt DemandedBits(APInt::getSignMask(BitWidth));		APInt DemandedBits(APInt::getSignMask(BitWidth));

if (OnlyUsedAsSelectCond(Cond)) {		if (OnlyUsedAsSelectCond(Cond)) {
KnownBits Known;		KnownBits Known;
TargetLowering::TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(),		TargetLowering::TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(),
!DCI.isBeforeLegalizeOps());		!DCI.isBeforeLegalizeOps());
if (!TLI.SimplifyDemandedBits(Cond, DemandedBits, Known, TLO, 0, true))		if (!TLI.SimplifyDemandedBits(Cond, DemandedBits, Known, TLO, 0, true))
return SDValue();		return SDValue();

// If we changed the computation somewhere in the DAG, this change will		// If we changed the computation somewhere in the DAG, this change will
// affect all users of Cond. Update all the nodes so that we do not use		// affect all users of Cond. Update all the nodes so that we do not use
// the generic VSELECT anymore. Otherwise, we may perform wrong		// the generic VSELECT anymore. Otherwise, we may perform wrong
// optimizations as we messed with the actual expectation for the vector		// optimizations as we messed with the actual expectation for the vector
// boolean values.		// boolean values.
for (SDNode *U : Cond->uses()) {		for (SDNode *U : Cond->uses()) {
if (U->getOpcode() == X86ISD::BLENDV)		if (U->getOpcode() == X86ISD::BLENDV)
continue;		continue;

SDValue SB = DAG.getNode(X86ISD::BLENDV, SDLoc(U), U->getValueType(0),		SDValue SB = DAG.getNode(X86ISD::BLENDV, SDLoc(U), U->getValueType(0),
Cond, U->getOperand(1), U->getOperand(2));		Cond, U->getOperand(1), U->getOperand(2));
DAG.ReplaceAllUsesOfValueWith(SDValue(U, 0), SB);		DAG.ReplaceAllUsesOfValueWith(SDValue(U, 0), SB);
DCI.AddToWorklist(U);		DCI.AddToWorklist(U);
}		}
DCI.CommitTargetLoweringOpt(TLO);		DCI.CommitTargetLoweringOpt(TLO);
return SDValue(N, 0);		return SDValue(N, 0);
}		}

// Otherwise we can still at least try to simplify multiple use bits.		// Otherwise we can still at least try to simplify multiple use bits.
▲ Show 20 Lines • Show All 7,798 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=i686-unknown-linux-gnu -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512F,X86-AVX512F		; RUN: llc < %s -mtriple=i686-unknown-linux-gnu -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512F,X86-AVX512F
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512F,X64-AVX512F		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512F,X64-AVX512F
; RUN: llc < %s -mtriple=i686-unknown-linux-gnu -mattr=+avx512f,+avx512vl,+avx512bw | FileCheck %s --check-prefixes=CHECK,AVX512BW		; RUN: llc < %s -mtriple=i686-unknown-linux-gnu -mattr=+avx512f,+avx512vl,+avx512bw | FileCheck %s --check-prefixes=CHECK,AVX512BW,X86-AVX512BW
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f,+avx512vl,+avx512bw | FileCheck %s --check-prefixes=CHECK,AVX512BW		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f,+avx512vl,+avx512bw | FileCheck %s --check-prefixes=CHECK,AVX512BW,X64-AVX512BW

define <16 x i32> @shuffle_v8i64(<16 x i32> %t0, <16 x i32> %t1) {		define <16 x i32> @shuffle_v8i64(<16 x i32> %t0, <16 x i32> %t1) {
; AVX512F-LABEL: shuffle_v8i64:		; CHECK-LABEL: shuffle_v8i64:
; AVX512F: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; AVX512F-NEXT: vpaddd %zmm1, %zmm0, %zmm2		; CHECK-NEXT: vpaddd %zmm1, %zmm0, %zmm2
; AVX512F-NEXT: vpsubd %zmm1, %zmm0, %zmm0		; CHECK-NEXT: vpsubd %zmm1, %zmm0, %zmm0
; AVX512F-NEXT: movb $-86, %al		; CHECK-NEXT: vshufps {{.*#+}} zmm0 = zmm2[0,1],zmm0[2,3],zmm2[4,5],zmm0[6,7],zmm2[8,9],zmm0[10,11],zmm2[12,13],zmm0[14,15]
; AVX512F-NEXT: kmovw %eax, %k1		; CHECK-NEXT: ret{{[l|q]}}
; AVX512F-NEXT: vmovdqa64 %zmm0, %zmm2 {%k1}
; AVX512F-NEXT: vmovdqa64 %zmm2, %zmm0
; AVX512F-NEXT: ret{{[l|q]}}
;
; AVX512BW-LABEL: shuffle_v8i64:
; AVX512BW: # %bb.0: # %entry
; AVX512BW-NEXT: vpaddd %zmm1, %zmm0, %zmm2
; AVX512BW-NEXT: vpsubd %zmm1, %zmm0, %zmm0
; AVX512BW-NEXT: movb $-86, %al
; AVX512BW-NEXT: kmovd %eax, %k1
; AVX512BW-NEXT: vmovdqa64 %zmm0, %zmm2 {%k1}
; AVX512BW-NEXT: vmovdqa64 %zmm2, %zmm0
; AVX512BW-NEXT: ret{{[l|q]}}
entry:		entry:
%t2 = add nsw <16 x i32> %t0, %t1		%t2 = add nsw <16 x i32> %t0, %t1
%t3 = sub nsw <16 x i32> %t0, %t1		%t3 = sub nsw <16 x i32> %t0, %t1
%t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>		%t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>
ret <16 x i32> %t4		ret <16 x i32> %t4
}		}

define <8 x i32> @shuffle_v4i64(<8 x i32> %t0, <8 x i32> %t1) {		define <8 x i32> @shuffle_v4i64(<8 x i32> %t0, <8 x i32> %t1) {
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
; X64-AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm3		; X64-AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm3
; X64-AVX512F-NEXT: vpaddb %ymm2, %ymm3, %ymm2		; X64-AVX512F-NEXT: vpaddb %ymm2, %ymm3, %ymm2
; X64-AVX512F-NEXT: vpaddb %ymm1, %ymm0, %ymm3		; X64-AVX512F-NEXT: vpaddb %ymm1, %ymm0, %ymm3
; X64-AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm3, %zmm2		; X64-AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm3, %zmm2
; X64-AVX512F-NEXT: vpsubb %ymm1, %ymm0, %ymm0		; X64-AVX512F-NEXT: vpsubb %ymm1, %ymm0, %ymm0
; X64-AVX512F-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm0		; X64-AVX512F-NEXT: vpternlogq $216, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2, %zmm0
; X64-AVX512F-NEXT: retq		; X64-AVX512F-NEXT: retq
;		;
; AVX512BW-LABEL: addb_selectw_64xi8:		; X86-AVX512BW-LABEL: addb_selectw_64xi8:
; AVX512BW: # %bb.0:		; X86-AVX512BW: # %bb.0:
; AVX512BW-NEXT: vpaddb %zmm1, %zmm0, %zmm2		; X86-AVX512BW-NEXT: vpaddb %zmm1, %zmm0, %zmm2
; AVX512BW-NEXT: vpsubb %zmm1, %zmm0, %zmm0		; X86-AVX512BW-NEXT: movl $3, %eax
; AVX512BW-NEXT: movl $1, %eax		; X86-AVX512BW-NEXT: kmovd %eax, %k0
; AVX512BW-NEXT: kmovd %eax, %k1		; X86-AVX512BW-NEXT: kmovd %k0, %k1
; AVX512BW-NEXT: vmovdqu16 %zmm0, %zmm2 {%k1}		; X86-AVX512BW-NEXT: vpsubb %zmm1, %zmm0, %zmm2 {%k1}
; AVX512BW-NEXT: vmovdqa64 %zmm2, %zmm0		; X86-AVX512BW-NEXT: vmovdqa64 %zmm2, %zmm0
; AVX512BW-NEXT: ret{{[l|q]}}		; X86-AVX512BW-NEXT: retl
		;
		; X64-AVX512BW-LABEL: addb_selectw_64xi8:
		; X64-AVX512BW: # %bb.0:
		; X64-AVX512BW-NEXT: vpaddb %zmm1, %zmm0, %zmm2
		; X64-AVX512BW-NEXT: movl $3, %eax
		; X64-AVX512BW-NEXT: kmovq %rax, %k1
		; X64-AVX512BW-NEXT: vpsubb %zmm1, %zmm0, %zmm2 {%k1}
		; X64-AVX512BW-NEXT: vmovdqa64 %zmm2, %zmm0
		; X64-AVX512BW-NEXT: retq
%t2 = add nsw <64 x i8> %t0, %t1		%t2 = add nsw <64 x i8> %t0, %t1
%t3 = sub nsw <64 x i8> %t0, %t1		%t3 = sub nsw <64 x i8> %t0, %t1
%t4 = shufflevector <64 x i8> %t2, <64 x i8> %t3, <64 x i32> <i32 64, i32 65, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>		%t4 = shufflevector <64 x i8> %t2, <64 x i8> %t3, <64 x i32> <i32 64, i32 65, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
ret <64 x i8> %t4		ret <64 x i8> %t4
}		}

define <32 x i8> @addb_selectw_32xi8(<32 x i8> %t0, <32 x i8> %t1) {		define <32 x i8> @addb_selectw_32xi8(<32 x i8> %t0, <32 x i8> %t1) {
; CHECK-LABEL: addb_selectw_32xi8:		; CHECK-LABEL: addb_selectw_32xi8:
Show All 9 Lines	; CHECK-NEXT: ret{{[l|q]}}
ret <32 x i8> %t4		ret <32 x i8> %t4
}		}

define <16 x i8> @addb_selectw_16xi8(<16 x i8> %t0, <16 x i8> %t1) {		define <16 x i8> @addb_selectw_16xi8(<16 x i8> %t0, <16 x i8> %t1) {
; CHECK-LABEL: addb_selectw_16xi8:		; CHECK-LABEL: addb_selectw_16xi8:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vpaddb %xmm1, %xmm0, %xmm2		; CHECK-NEXT: vpaddb %xmm1, %xmm0, %xmm2
; CHECK-NEXT: vpsubb %xmm1, %xmm0, %xmm0		; CHECK-NEXT: vpsubb %xmm1, %xmm0, %xmm0
; CHECK-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],xmm2[1,2,3,4,5,6,7]		; CHECK-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],xmm2[1,2,3,4,5,6,7]
; CHECK-NEXT: ret{{[l|q]}}		; CHECK-NEXT: ret{{[l|q]}}
%t2 = add nsw <16 x i8> %t0, %t1		%t2 = add nsw <16 x i8> %t0, %t1
%t3 = sub nsw <16 x i8> %t0, %t1		%t3 = sub nsw <16 x i8> %t0, %t1
%t4 = shufflevector <16 x i8> %t2, <16 x i8> %t3, <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%t4 = shufflevector <16 x i8> %t2, <16 x i8> %t3, <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x i8> %t4		ret <16 x i8> %t4
}		}

define <8 x i8> @addb_selectw_8xi8(<8 x i8> %t0, <8 x i8> %t1) {		define <8 x i8> @addb_selectw_8xi8(<8 x i8> %t0, <8 x i8> %t1) {
; CHECK-LABEL: addb_selectw_8xi8:		; CHECK-LABEL: addb_selectw_8xi8:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vpaddb %xmm1, %xmm0, %xmm2		; CHECK-NEXT: vpaddb %xmm1, %xmm0, %xmm2
; CHECK-NEXT: vpsubb %xmm1, %xmm0, %xmm0		; CHECK-NEXT: vpsubb %xmm1, %xmm0, %xmm0
; CHECK-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],xmm2[1,2,3,4,5,6,7]		; CHECK-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],xmm2[1,2,3,4,5,6,7]
; CHECK-NEXT: ret{{[l|q]}}		; CHECK-NEXT: ret{{[l|q]}}
%t2 = add nsw <8 x i8> %t0, %t1		%t2 = add nsw <8 x i8> %t0, %t1
%t3 = sub nsw <8 x i8> %t0, %t1		%t3 = sub nsw <8 x i8> %t0, %t1
%t4 = shufflevector <8 x i8> %t2, <8 x i8> %t3, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%t4 = shufflevector <8 x i8> %t2, <8 x i8> %t3, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i8> %t4		ret <8 x i8> %t4
}		}

define <32 x i16> @addw_selectd_32xi16(<32 x i16> %t0, <32 x i16> %t1) {		define <32 x i16> @addw_selectd_32xi16(<32 x i16> %t0, <32 x i16> %t1) {
; AVX512F-LABEL: addw_selectd_32xi16:		; AVX512F-LABEL: addw_selectd_32xi16:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm2		; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm2
; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm3		; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm3
; AVX512F-NEXT: vpaddw %ymm2, %ymm3, %ymm2		; AVX512F-NEXT: vpaddw %ymm2, %ymm3, %ymm2
; AVX512F-NEXT: vpaddw %ymm1, %ymm0, %ymm3		; AVX512F-NEXT: vpaddw %ymm1, %ymm0, %ymm3
; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm3, %zmm2		; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm3, %zmm2
; AVX512F-NEXT: vpsubw %ymm1, %ymm0, %ymm0		; AVX512F-NEXT: vpsubw %ymm1, %ymm0, %ymm0
; AVX512F-NEXT: movw $1, %ax		; AVX512F-NEXT: movw $1, %ax
; AVX512F-NEXT: kmovw %eax, %k1		; AVX512F-NEXT: kmovw %eax, %k1
; AVX512F-NEXT: vmovdqa32 %zmm0, %zmm2 {%k1}		; AVX512F-NEXT: vmovdqa32 %zmm0, %zmm2 {%k1}
; AVX512F-NEXT: vmovdqa64 %zmm2, %zmm0		; AVX512F-NEXT: vmovdqa64 %zmm2, %zmm0
; AVX512F-NEXT: ret{{[l|q]}}		; AVX512F-NEXT: ret{{[l|q]}}
;		;
; AVX512BW-LABEL: addw_selectd_32xi16:		; AVX512BW-LABEL: addw_selectd_32xi16:
; AVX512BW: # %bb.0:		; AVX512BW: # %bb.0:
; AVX512BW-NEXT: vpaddw %zmm1, %zmm0, %zmm2		; AVX512BW-NEXT: vpaddw %zmm1, %zmm0, %zmm2
; AVX512BW-NEXT: vpsubw %zmm1, %zmm0, %zmm0		; AVX512BW-NEXT: movl $3, %eax
; AVX512BW-NEXT: movw $1, %ax
; AVX512BW-NEXT: kmovd %eax, %k1		; AVX512BW-NEXT: kmovd %eax, %k1
; AVX512BW-NEXT: vmovdqa32 %zmm0, %zmm2 {%k1}		; AVX512BW-NEXT: vpsubw %zmm1, %zmm0, %zmm2 {%k1}
; AVX512BW-NEXT: vmovdqa64 %zmm2, %zmm0		; AVX512BW-NEXT: vmovdqa64 %zmm2, %zmm0
; AVX512BW-NEXT: ret{{[l\|q]}}		; AVX512BW-NEXT: ret{{[l\|q]}}
%t2 = add nsw <32 x i16> %t0, %t1		%t2 = add nsw <32 x i16> %t0, %t1
%t3 = sub nsw <32 x i16> %t0, %t1		%t3 = sub nsw <32 x i16> %t0, %t1
%t4 = shufflevector <32 x i16> %t2, <32 x i16> %t3, <32 x i32> <i32 32, i32 33, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>		%t4 = shufflevector <32 x i16> %t2, <32 x i16> %t3, <32 x i32> <i32 32, i32 33, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
ret <32 x i16> %t4		ret <32 x i16> %t4
}		}

[... 9 lines not shown ...]
; CHECK-NEXT: ret{{[l\|q]}}
%t4 = shufflevector <16 x i16> %t2, <16 x i16> %t3, <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%t4 = shufflevector <16 x i16> %t2, <16 x i16> %t3, <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x i16> %t4		ret <16 x i16> %t4
}		}

define <16 x i32> @addd_selectq_16xi32(<16 x i32> %t0, <16 x i32> %t1) {		define <16 x i32> @addd_selectq_16xi32(<16 x i32> %t0, <16 x i32> %t1) {
; AVX512F-LABEL: addd_selectq_16xi32:		; AVX512F-LABEL: addd_selectq_16xi32:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: vpaddd %zmm1, %zmm0, %zmm2		; AVX512F-NEXT: vpaddd %zmm1, %zmm0, %zmm2
; AVX512F-NEXT: vpsubd %zmm1, %zmm0, %zmm0		; AVX512F-NEXT: movw $3, %ax
; AVX512F-NEXT: movb $1, %al
; AVX512F-NEXT: kmovw %eax, %k1		; AVX512F-NEXT: kmovw %eax, %k1
; AVX512F-NEXT: vmovdqa64 %zmm0, %zmm2 {%k1}		; AVX512F-NEXT: vpsubd %zmm1, %zmm0, %zmm2 {%k1}
; AVX512F-NEXT: vmovdqa64 %zmm2, %zmm0		; AVX512F-NEXT: vmovdqa64 %zmm2, %zmm0
; AVX512F-NEXT: ret{{[l\|q]}}		; AVX512F-NEXT: ret{{[l\|q]}}
;		;
; AVX512BW-LABEL: addd_selectq_16xi32:		; AVX512BW-LABEL: addd_selectq_16xi32:
; AVX512BW: # %bb.0:		; AVX512BW: # %bb.0:
; AVX512BW-NEXT: vpaddd %zmm1, %zmm0, %zmm2		; AVX512BW-NEXT: vpaddd %zmm1, %zmm0, %zmm2
; AVX512BW-NEXT: vpsubd %zmm1, %zmm0, %zmm0		; AVX512BW-NEXT: movw $3, %ax
; AVX512BW-NEXT: movb $1, %al
; AVX512BW-NEXT: kmovd %eax, %k1		; AVX512BW-NEXT: kmovd %eax, %k1
; AVX512BW-NEXT: vmovdqa64 %zmm0, %zmm2 {%k1}		; AVX512BW-NEXT: vpsubd %zmm1, %zmm0, %zmm2 {%k1}
; AVX512BW-NEXT: vmovdqa64 %zmm2, %zmm0		; AVX512BW-NEXT: vmovdqa64 %zmm2, %zmm0
; AVX512BW-NEXT: ret{{[l\|q]}}		; AVX512BW-NEXT: ret{{[l\|q]}}
%t2 = add nsw <16 x i32> %t0, %t1		%t2 = add nsw <16 x i32> %t0, %t1
%t3 = sub nsw <16 x i32> %t0, %t1		%t3 = sub nsw <16 x i32> %t0, %t1
%t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3, <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3, <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>

ret <16 x i32> %t4		ret <16 x i32> %t4
}		}
[... 19 lines not shown ...]
; CHECK-NEXT: vpsubd %xmm1, %xmm0, %xmm0		; CHECK-NEXT: vpsubd %xmm1, %xmm0, %xmm0
; CHECK-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm2[2,3]		; CHECK-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm2[2,3]
; CHECK-NEXT: ret{{[l\|q]}}		; CHECK-NEXT: ret{{[l\|q]}}
%t2 = add nsw <4 x i32> %t0, %t1		%t2 = add nsw <4 x i32> %t0, %t1
%t3 = sub nsw <4 x i32> %t0, %t1		%t3 = sub nsw <4 x i32> %t0, %t1
%t4 = shufflevector <4 x i32> %t2, <4 x i32> %t3, <4 x i32> <i32 4, i32 5, i32 2, i32 3>		%t4 = shufflevector <4 x i32> %t2, <4 x i32> %t3, <4 x i32> <i32 4, i32 5, i32 2, i32 3>

ret <4 x i32> %t4		ret <4 x i32> %t4
}		}
		RKSimon: Pre-commit these additional tests.
		LuoYuanke (Author): Sure. I'll do it.
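
For readers following the thread, the widening under discussion can be illustrated with a small standalone sketch. It is not the code in X86ISelLowering.cpp; `widenShuffleMask` and `blendMaskBits` are hypothetical names chosen for this example, and `-1` is assumed to stand for an undef lane. The first function merges adjacent mask elements into elements of twice the width when they form an aligned pair. The second computes which lanes of a pure blend come from the second operand, which corresponds to the immediate loaded into the mask register in the test output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not an LLVM API). Tries to merge each adjacent pair of
// mask elements into one element of twice the width. -1 marks an undef lane.
std::optional<std::vector<int>> widenShuffleMask(const std::vector<int> &mask) {
  assert(mask.size() % 2 == 0 && "expected an even number of mask elements");
  std::vector<int> widened;
  widened.reserve(mask.size() / 2);
  for (std::size_t i = 0; i < mask.size(); i += 2) {
    const int lo = mask[i];
    const int hi = mask[i + 1];
    if (lo < 0 && hi < 0) {
      widened.push_back(-1);      // both halves undef
    } else if (lo >= 0 && lo % 2 == 0 && (hi < 0 || hi == lo + 1)) {
      widened.push_back(lo / 2);  // aligned pair, or undef high half
    } else if (lo < 0 && hi % 2 == 1) {
      widened.push_back(hi / 2);  // undef low half, odd high half
    } else {
      return std::nullopt;        // pair does not form one wide element
    }
  }
  return widened;
}

// Hypothetical helper. If the mask is a pure blend (lane i takes element i of
// either operand), returns a bitmask with bit i set when lane i comes from the
// second operand; otherwise returns std::nullopt.
std::optional<std::uint64_t> blendMaskBits(const std::vector<int> &mask) {
  const int n = static_cast<int>(mask.size());
  if (n > 64)
    return std::nullopt;
  std::uint64_t bits = 0;
  for (int i = 0; i < n; ++i) {
    if (mask[i] < 0 || mask[i] == i)
      continue;                          // undef or taken from first operand
    if (mask[i] == i + n)
      bits |= std::uint64_t{1} << i;     // taken from second operand
    else
      return std::nullopt;               // not a lane-preserving blend
  }
  return bits;
}
```

Under these assumptions, the `<16 x i32>` mask `<16, 17, 2, ..., 15>` from `addd_selectq_16xi32` gives blend bits `0x3` (compare `movw $3, %ax`), while its widened form `<8, 1, ..., 7>` gives `0x1` (compare `movb $1, %al`). The interleaved mask `<0, 17, 2, 19, ...>` in vselect-avx512.ll cannot be widened at all and gives `0xAAAA`.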

llvm/test/CodeGen/X86/combine-sdiv.ll

	[... 2,883 lines not shown ...]
	;			;
	; AVX1-LABEL: combine_vec_sdiv_nonuniform7:			; AVX1-LABEL: combine_vec_sdiv_nonuniform7:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1			; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; AVX1-NEXT: vpsubw %xmm0, %xmm1, %xmm1			; AVX1-NEXT: vpsubw %xmm0, %xmm1, %xmm1
	; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]			; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2ORLATER-LABEL: combine_vec_sdiv_nonuniform7:			; AVX2-LABEL: combine_vec_sdiv_nonuniform7:
	; AVX2ORLATER: # %bb.0:			; AVX2: # %bb.0:
	; AVX2ORLATER-NEXT: vpxor %xmm1, %xmm1, %xmm1			; AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; AVX2ORLATER-NEXT: vpsubw %xmm0, %xmm1, %xmm1			; AVX2-NEXT: vpsubw %xmm0, %xmm1, %xmm1
	; AVX2ORLATER-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]			; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]
	; AVX2ORLATER-NEXT: retq			; AVX2-NEXT: retq
				;
				; AVX512F-LABEL: combine_vec_sdiv_nonuniform7:
				; AVX512F: # %bb.0:
				; AVX512F-NEXT: vpxor %xmm1, %xmm1, %xmm1
				; AVX512F-NEXT: vpsubw %xmm0, %xmm1, %xmm1
				; AVX512F-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]
				; AVX512F-NEXT: retq
				;
				; AVX512BW-LABEL: combine_vec_sdiv_nonuniform7:
				; AVX512BW: # %bb.0:
				; AVX512BW-NEXT: vpxor %xmm1, %xmm1, %xmm1
				; AVX512BW-NEXT: vpsubw %xmm0, %xmm1, %xmm1
				; AVX512BW-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
				; AVX512BW-NEXT: retq
	;			;
	; XOP-LABEL: combine_vec_sdiv_nonuniform7:			; XOP-LABEL: combine_vec_sdiv_nonuniform7:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vpxor %xmm1, %xmm1, %xmm1			; XOP-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; XOP-NEXT: vpsubw %xmm0, %xmm1, %xmm1			; XOP-NEXT: vpsubw %xmm0, %xmm1, %xmm1
	; XOP-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]			; XOP-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
	; XOP-NEXT: retq			; XOP-NEXT: retq
	%1 = sdiv <8 x i16> %x, <i16 -1, i16 -1, i16 -1, i16 -1, i16 1, i16 1, i16 1, i16 1>			%1 = sdiv <8 x i16> %x, <i16 -1, i16 -1, i16 -1, i16 -1, i16 1, i16 1, i16 1, i16 1>
	[... 292 lines not shown ...]

llvm/test/CodeGen/X86/haddsub-undef.ll

	[... 967 lines not shown ...]
	; SSE-SLOW-NEXT: retq			; SSE-SLOW-NEXT: retq
	;			;
	; SSE-FAST-LABEL: PR45747_2:			; SSE-FAST-LABEL: PR45747_2:
	; SSE-FAST: # %bb.0:			; SSE-FAST: # %bb.0:
	; SSE-FAST-NEXT: haddps %xmm1, %xmm1			; SSE-FAST-NEXT: haddps %xmm1, %xmm1
	; SSE-FAST-NEXT: movshdup {{.*#+}} xmm0 = xmm1[1,1,3,3]			; SSE-FAST-NEXT: movshdup {{.*#+}} xmm0 = xmm1[1,1,3,3]
	; SSE-FAST-NEXT: retq			; SSE-FAST-NEXT: retq
	;			;
	; AVX-SLOW-LABEL: PR45747_2:			; AVX1-SLOW-LABEL: PR45747_2:
	; AVX-SLOW: # %bb.0:			; AVX1-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpermilpd {{.*#+}} xmm0 = xmm1[1,0]			; AVX1-SLOW-NEXT: vpermilpd {{.*#+}} xmm0 = xmm1[1,0]
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm1[3,3,1,1]			; AVX1-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm1[3,3,1,1]
	; AVX-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0			; AVX1-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq			; AVX1-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: PR45747_2:			; AVX-FAST-LABEL: PR45747_2:
	; AVX-FAST: # %bb.0:			; AVX-FAST: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %xmm1, %xmm1, %xmm0			; AVX-FAST-NEXT: vhaddps %xmm1, %xmm1, %xmm0
	; AVX-FAST-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]			; AVX-FAST-NEXT: vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
	; AVX-FAST-NEXT: retq			; AVX-FAST-NEXT: retq
				;
				; AVX512-SLOW-LABEL: PR45747_2:
				; AVX512-SLOW: # %bb.0:
				; AVX512-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm1[2,2,2,2]
				; AVX512-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm1[3,3,3,3]
				; AVX512-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
				; AVX512-SLOW-NEXT: retq
	%t0 = shufflevector <4 x float> %b, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 3, i32 undef>			%t0 = shufflevector <4 x float> %b, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 3, i32 undef>
	%t1 = fadd <4 x float> %t0, %b			%t1 = fadd <4 x float> %t0, %b
	%shuffle = shufflevector <4 x float> %t1, <4 x float> undef, <4 x i32> <i32 2, i32 undef, i32 undef, i32 undef>			%shuffle = shufflevector <4 x float> %t1, <4 x float> undef, <4 x i32> <i32 2, i32 undef, i32 undef, i32 undef>
	ret <4 x float> %shuffle			ret <4 x float> %shuffle
	}			}

	define <4 x float> @PR34724_add_v4f32_u123(<4 x float> %0, <4 x float> %1) {			define <4 x float> @PR34724_add_v4f32_u123(<4 x float> %0, <4 x float> %1) {
	; SSE-LABEL: PR34724_add_v4f32_u123:			; SSE-LABEL: PR34724_add_v4f32_u123:
	[... 28 lines not shown ...]
	; SSE-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm2[2,0]			; SSE-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm2[2,0]
	; SSE-SLOW-NEXT: retq			; SSE-SLOW-NEXT: retq
	;			;
	; SSE-FAST-LABEL: PR34724_add_v4f32_0u23:			; SSE-FAST-LABEL: PR34724_add_v4f32_0u23:
	; SSE-FAST: # %bb.0:			; SSE-FAST: # %bb.0:
	; SSE-FAST-NEXT: haddps %xmm1, %xmm0			; SSE-FAST-NEXT: haddps %xmm1, %xmm0
	; SSE-FAST-NEXT: retq			; SSE-FAST-NEXT: retq
	;			;
	; AVX-SLOW-LABEL: PR34724_add_v4f32_0u23:			; AVX1-SLOW-LABEL: PR34724_add_v4f32_0u23:
	; AVX-SLOW: # %bb.0:			; AVX1-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vshufps {{.*#+}} xmm2 = xmm0[0,1],xmm1[0,3]			; AVX1-SLOW-NEXT: vshufps {{.*#+}} xmm2 = xmm0[0,1],xmm1[0,3]
	; AVX-SLOW-NEXT: vshufps {{.*#+}} xmm0 = xmm0[1,1],xmm1[1,2]			; AVX1-SLOW-NEXT: vshufps {{.*#+}} xmm0 = xmm0[1,1],xmm1[1,2]
	; AVX-SLOW-NEXT: vaddps %xmm2, %xmm0, %xmm0			; AVX1-SLOW-NEXT: vaddps %xmm2, %xmm0, %xmm0
	; AVX-SLOW-NEXT: retq			; AVX1-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: PR34724_add_v4f32_0u23:			; AVX-FAST-LABEL: PR34724_add_v4f32_0u23:
	; AVX-FAST: # %bb.0:			; AVX-FAST: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %xmm1, %xmm0, %xmm0			; AVX-FAST-NEXT: vhaddps %xmm1, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-FAST-NEXT: retq
				;
				; AVX512-SLOW-LABEL: PR34724_add_v4f32_0u23:
				; AVX512-SLOW: # %bb.0:
				; AVX512-SLOW-NEXT: vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
				LuoYuanke (Author): This looks like a regression; I'll take a look at it.
				; AVX512-SLOW-NEXT: vaddps %xmm0, %xmm2, %xmm0
				; AVX512-SLOW-NEXT: vmovshdup {{.*#+}} xmm2 = xmm1[1,1,3,3]
				; AVX512-SLOW-NEXT: vaddps %xmm1, %xmm2, %xmm2
				; AVX512-SLOW-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],zero,xmm2[0],zero
				; AVX512-SLOW-NEXT: vmovsldup {{.*#+}} xmm2 = xmm1[0,0,2,2]
				; AVX512-SLOW-NEXT: vaddps %xmm1, %xmm2, %xmm1
				; AVX512-SLOW-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[3]
				; AVX512-SLOW-NEXT: retq
	%3 = shufflevector <4 x float> %0, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			%3 = shufflevector <4 x float> %0, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	%4 = fadd <4 x float> %3, %0			%4 = fadd <4 x float> %3, %0
	%5 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			%5 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	%6 = fadd <4 x float> %5, %1			%6 = fadd <4 x float> %5, %1
	%7 = shufflevector <4 x float> %4, <4 x float> %6, <4 x i32> <i32 0, i32 undef, i32 4, i32 undef>			%7 = shufflevector <4 x float> %4, <4 x float> %6, <4 x i32> <i32 0, i32 undef, i32 4, i32 undef>
	%8 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 undef, i32 2>			%8 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 undef, i32 2>
	%9 = fadd <4 x float> %8, %1			%9 = fadd <4 x float> %8, %1
	%10 = shufflevector <4 x float> %7, <4 x float> %9, <4 x i32> <i32 0, i32 undef, i32 2, i32 7>			%10 = shufflevector <4 x float> %7, <4 x float> %9, <4 x i32> <i32 0, i32 undef, i32 2, i32 7>
	[... 9 lines not shown ...]
	; SSE-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm2[2,3]			; SSE-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm2[2,3]
	; SSE-SLOW-NEXT: retq			; SSE-SLOW-NEXT: retq
	;			;
	; SSE-FAST-LABEL: PR34724_add_v4f32_01u3:			; SSE-FAST-LABEL: PR34724_add_v4f32_01u3:
	; SSE-FAST: # %bb.0:			; SSE-FAST: # %bb.0:
	; SSE-FAST-NEXT: haddps %xmm1, %xmm0			; SSE-FAST-NEXT: haddps %xmm1, %xmm0
	; SSE-FAST-NEXT: retq			; SSE-FAST-NEXT: retq
	;			;
	; AVX-SLOW-LABEL: PR34724_add_v4f32_01u3:			; AVX1-SLOW-LABEL: PR34724_add_v4f32_01u3:
	; AVX-SLOW: # %bb.0:			; AVX1-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vhaddps %xmm0, %xmm0, %xmm0			; AVX1-SLOW-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX-SLOW-NEXT: vmovsldup {{.*#+}} xmm2 = xmm1[0,0,2,2]			; AVX1-SLOW-NEXT: vmovsldup {{.*#+}} xmm2 = xmm1[0,0,2,2]
	; AVX-SLOW-NEXT: vaddps %xmm1, %xmm2, %xmm1			; AVX1-SLOW-NEXT: vaddps %xmm1, %xmm2, %xmm1
	; AVX-SLOW-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]			; AVX1-SLOW-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
	; AVX-SLOW-NEXT: retq			; AVX1-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: PR34724_add_v4f32_01u3:			; AVX-FAST-LABEL: PR34724_add_v4f32_01u3:
	; AVX-FAST: # %bb.0:			; AVX-FAST: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %xmm1, %xmm0, %xmm0			; AVX-FAST-NEXT: vhaddps %xmm1, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-FAST-NEXT: retq
				;
				; AVX512-SLOW-LABEL: PR34724_add_v4f32_01u3:
				; AVX512-SLOW: # %bb.0:
				; AVX512-SLOW-NEXT: vhaddps %xmm0, %xmm0, %xmm0
				; AVX512-SLOW-NEXT: vmovsldup {{.*#+}} xmm2 = xmm1[0,0,2,2]
				; AVX512-SLOW-NEXT: vaddps %xmm1, %xmm2, %xmm1
				; AVX512-SLOW-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[3]
				; AVX512-SLOW-NEXT: retq
	%3 = shufflevector <4 x float> %0, <4 x float> undef, <2 x i32> <i32 0, i32 2>			%3 = shufflevector <4 x float> %0, <4 x float> undef, <2 x i32> <i32 0, i32 2>
	%4 = shufflevector <4 x float> %0, <4 x float> undef, <2 x i32> <i32 1, i32 3>			%4 = shufflevector <4 x float> %0, <4 x float> undef, <2 x i32> <i32 1, i32 3>
	%5 = fadd <2 x float> %3, %4			%5 = fadd <2 x float> %3, %4
	%6 = shufflevector <2 x float> %5, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			%6 = shufflevector <2 x float> %5, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	%7 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 undef, i32 2>			%7 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 undef, i32 2>
	%8 = fadd <4 x float> %7, %1			%8 = fadd <4 x float> %7, %1
	%9 = shufflevector <4 x float> %6, <4 x float> %8, <4 x i32> <i32 0, i32 1, i32 undef, i32 7>			%9 = shufflevector <4 x float> %6, <4 x float> %8, <4 x i32> <i32 0, i32 1, i32 undef, i32 7>
	ret <4 x float> %9			ret <4 x float> %9
	}			}

	define <4 x float> @PR34724_add_v4f32_012u(<4 x float> %0, <4 x float> %1) {			define <4 x float> @PR34724_add_v4f32_012u(<4 x float> %0, <4 x float> %1) {
	; SSE-SLOW-LABEL: PR34724_add_v4f32_012u:			; SSE-SLOW-LABEL: PR34724_add_v4f32_012u:
	; SSE-SLOW: # %bb.0:			; SSE-SLOW: # %bb.0:
	; SSE-SLOW-NEXT: haddps %xmm0, %xmm0			; SSE-SLOW-NEXT: haddps %xmm0, %xmm0
	; SSE-SLOW-NEXT: movshdup {{.*#+}} xmm2 = xmm1[1,1,3,3]			; SSE-SLOW-NEXT: movshdup {{.*#+}} xmm2 = xmm1[1,1,3,3]
	; SSE-SLOW-NEXT: addps %xmm1, %xmm2			; SSE-SLOW-NEXT: addps %xmm1, %xmm2
	; SSE-SLOW-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm2[0]			; SSE-SLOW-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm2[0]
	; SSE-SLOW-NEXT: retq			; SSE-SLOW-NEXT: retq
	;			;
	; SSE-FAST-LABEL: PR34724_add_v4f32_012u:			; SSE-FAST-LABEL: PR34724_add_v4f32_012u:
	; SSE-FAST: # %bb.0:			; SSE-FAST: # %bb.0:
	; SSE-FAST-NEXT: haddps %xmm1, %xmm0			; SSE-FAST-NEXT: haddps %xmm1, %xmm0
	; SSE-FAST-NEXT: retq			; SSE-FAST-NEXT: retq
	;			;
	; AVX-SLOW-LABEL: PR34724_add_v4f32_012u:			; AVX1-SLOW-LABEL: PR34724_add_v4f32_012u:
	; AVX-SLOW: # %bb.0:			; AVX1-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vhaddps %xmm0, %xmm0, %xmm0			; AVX1-SLOW-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX-SLOW-NEXT: vmovshdup {{.*#+}} xmm2 = xmm1[1,1,3,3]			; AVX1-SLOW-NEXT: vmovshdup {{.*#+}} xmm2 = xmm1[1,1,3,3]
	; AVX-SLOW-NEXT: vaddps %xmm1, %xmm2, %xmm1			; AVX1-SLOW-NEXT: vaddps %xmm1, %xmm2, %xmm1
	; AVX-SLOW-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX1-SLOW-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX-SLOW-NEXT: retq			; AVX1-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: PR34724_add_v4f32_012u:			; AVX-FAST-LABEL: PR34724_add_v4f32_012u:
	; AVX-FAST: # %bb.0:			; AVX-FAST: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %xmm1, %xmm0, %xmm0			; AVX-FAST-NEXT: vhaddps %xmm1, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-FAST-NEXT: retq
				;
				; AVX512-SLOW-LABEL: PR34724_add_v4f32_012u:
				; AVX512-SLOW: # %bb.0:
				; AVX512-SLOW-NEXT: vhaddps %xmm0, %xmm0, %xmm0
				; AVX512-SLOW-NEXT: vmovshdup {{.*#+}} xmm2 = xmm1[1,1,3,3]
				; AVX512-SLOW-NEXT: vaddps %xmm1, %xmm2, %xmm1
				; AVX512-SLOW-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],zero
				; AVX512-SLOW-NEXT: retq
	%3 = shufflevector <4 x float> %0, <4 x float> undef, <2 x i32> <i32 0, i32 2>			%3 = shufflevector <4 x float> %0, <4 x float> undef, <2 x i32> <i32 0, i32 2>
	%4 = shufflevector <4 x float> %0, <4 x float> undef, <2 x i32> <i32 1, i32 3>			%4 = shufflevector <4 x float> %0, <4 x float> undef, <2 x i32> <i32 1, i32 3>
	%5 = fadd <2 x float> %3, %4			%5 = fadd <2 x float> %3, %4
	%6 = shufflevector <2 x float> %5, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			%6 = shufflevector <2 x float> %5, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	%7 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			%7 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	%8 = fadd <4 x float> %7, %1			%8 = fadd <4 x float> %7, %1
	%9 = shufflevector <4 x float> %6, <4 x float> %8, <4 x i32> <i32 0, i32 1, i32 4, i32 undef>			%9 = shufflevector <4 x float> %6, <4 x float> %8, <4 x i32> <i32 0, i32 1, i32 4, i32 undef>
	ret <4 x float> %9			ret <4 x float> %9
	[... 207 lines not shown ...]

llvm/test/CodeGen/X86/vselect-avx512.ll

	[... 11 lines not shown ...]
	; CHECK-NEXT: vpshufd {{.*#+}} zmm1 = zmm0[1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14]			; CHECK-NEXT: vpshufd {{.*#+}} zmm1 = zmm0[1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14]
	; CHECK-NEXT: vpminsd %zmm0, %zmm1, %zmm2			; CHECK-NEXT: vpminsd %zmm0, %zmm1, %zmm2
	; CHECK-NEXT: movw $-21846, %ax # imm = 0xAAAA			; CHECK-NEXT: movw $-21846, %ax # imm = 0xAAAA
	; CHECK-NEXT: kmovw %eax, %k1			; CHECK-NEXT: kmovw %eax, %k1
	; CHECK-NEXT: vpmaxsd %zmm0, %zmm1, %zmm2 {%k1}			; CHECK-NEXT: vpmaxsd %zmm0, %zmm1, %zmm2 {%k1}
	; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm2[3,2,1,0,7,6,5,4,11,10,9,8,15,14,13,12]			; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm2[3,2,1,0,7,6,5,4,11,10,9,8,15,14,13,12]
	; CHECK-NEXT: vpminsd %zmm2, %zmm0, %zmm1			; CHECK-NEXT: vpminsd %zmm2, %zmm0, %zmm1
	; CHECK-NEXT: vpmaxsd %zmm2, %zmm0, %zmm0			; CHECK-NEXT: vpmaxsd %zmm2, %zmm0, %zmm0
	; CHECK-NEXT: movb $-86, %al			; CHECK-NEXT: vshufps {{.*#+}} zmm2 = zmm1[0,1],zmm0[2,3],zmm1[4,5],zmm0[6,7],zmm1[8,9],zmm0[10,11],zmm1[12,13],zmm0[14,15]
	; CHECK-NEXT: kmovw %eax, %k2			; CHECK-NEXT: vshufps {{.*#+}} zmm0 = zmm1[1,0],zmm0[3,2],zmm1[5,4],zmm0[7,6],zmm1[9,8],zmm0[11,10],zmm1[13,12],zmm0[15,14]
	; CHECK-NEXT: vmovdqa64 %zmm0, %zmm1 {%k2}			; CHECK-NEXT: vpminsd %zmm2, %zmm0, %zmm1
	; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm1[1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14]			; CHECK-NEXT: vpmaxsd %zmm2, %zmm0, %zmm1 {%k1}
	; CHECK-NEXT: vpminsd %zmm1, %zmm0, %zmm2			; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm1[3,2,1,0,7,6,5,4,11,10,9,8,15,14,13,12]
	; CHECK-NEXT: vpmaxsd %zmm1, %zmm0, %zmm2 {%k1}
	; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm2[3,2,1,0,7,6,5,4,11,10,9,8,15,14,13,12]
	; CHECK-NEXT: vpermq {{.*#+}} zmm0 = zmm0[2,3,0,1,6,7,4,5]			; CHECK-NEXT: vpermq {{.*#+}} zmm0 = zmm0[2,3,0,1,6,7,4,5]
				; CHECK-NEXT: vpminsd %zmm1, %zmm0, %zmm2
				; CHECK-NEXT: movw $-3856, %ax # imm = 0xF0F0
				; CHECK-NEXT: kmovw %eax, %k2
				; CHECK-NEXT: vpmaxsd %zmm1, %zmm0, %zmm2 {%k2}
				; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm2[2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13]
	; CHECK-NEXT: vpminsd %zmm2, %zmm0, %zmm1			; CHECK-NEXT: vpminsd %zmm2, %zmm0, %zmm1
	; CHECK-NEXT: vpmaxsd %zmm2, %zmm0, %zmm0			; CHECK-NEXT: vpmaxsd %zmm2, %zmm0, %zmm0
	; CHECK-NEXT: movb $-52, %al			; CHECK-NEXT: vshufps {{.*#+}} zmm2 = zmm1[0,1],zmm0[2,3],zmm1[4,5],zmm0[6,7],zmm1[8,9],zmm0[10,11],zmm1[12,13],zmm0[14,15]
	; CHECK-NEXT: kmovw %eax, %k3			; CHECK-NEXT: vshufps {{.*#+}} zmm0 = zmm1[1,0],zmm0[3,2],zmm1[5,4],zmm0[7,6],zmm1[9,8],zmm0[11,10],zmm1[13,12],zmm0[15,14]
	; CHECK-NEXT: vmovdqa64 %zmm0, %zmm1 {%k3}
	; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm1[2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13]
	; CHECK-NEXT: vpminsd %zmm1, %zmm0, %zmm2
	; CHECK-NEXT: vpmaxsd %zmm1, %zmm0, %zmm0
	; CHECK-NEXT: vmovdqa64 %zmm0, %zmm2 {%k2}
	; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm2[1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14]
	; CHECK-NEXT: vpminsd %zmm2, %zmm0, %zmm1			; CHECK-NEXT: vpminsd %zmm2, %zmm0, %zmm1
	; CHECK-NEXT: vpmaxsd %zmm2, %zmm0, %zmm1 {%k1}			; CHECK-NEXT: vpmaxsd %zmm2, %zmm0, %zmm1 {%k1}
	; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm1[3,2,1,0,7,6,5,4,11,10,9,8,15,14,13,12]			; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm1[3,2,1,0,7,6,5,4,11,10,9,8,15,14,13,12]
	; CHECK-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm0[4,5,6,7,0,1,2,3]			; CHECK-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm0[4,5,6,7,0,1,2,3]
	; CHECK-NEXT: vpminsd %zmm1, %zmm0, %zmm2			; CHECK-NEXT: vpminsd %zmm1, %zmm0, %zmm2
	; CHECK-NEXT: vpmaxsd %zmm1, %zmm0, %zmm0			; CHECK-NEXT: vpmaxsd %zmm1, %zmm0, %zmm0
	; CHECK-NEXT: vshufi64x2 {{.*#+}} zmm1 = zmm2[2,3,0,1],zmm0[6,7,4,5]			; CHECK-NEXT: vshufi64x2 {{.*#+}} zmm1 = zmm2[2,3,0,1],zmm0[6,7,4,5]
	; CHECK-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm2[0,1,2,3],zmm0[4,5,6,7]			; CHECK-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm2[0,1,2,3],zmm0[4,5,6,7]
	; CHECK-NEXT: vpminsd %zmm0, %zmm1, %zmm2			; CHECK-NEXT: vpminsd %zmm0, %zmm1, %zmm2
	; CHECK-NEXT: vpmaxsd %zmm0, %zmm1, %zmm0			; CHECK-NEXT: vpmaxsd %zmm0, %zmm1, %zmm2 {%k2}
	; CHECK-NEXT: vmovdqa64 %zmm0, %zmm2 {%k3}
	; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm2[2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13]			; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm2[2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13]
	; CHECK-NEXT: vpminsd %zmm2, %zmm0, %zmm1			; CHECK-NEXT: vpminsd %zmm2, %zmm0, %zmm1
	; CHECK-NEXT: vpmaxsd %zmm2, %zmm0, %zmm0			; CHECK-NEXT: vpmaxsd %zmm2, %zmm0, %zmm0
	; CHECK-NEXT: vmovdqa64 %zmm0, %zmm1 {%k2}			; CHECK-NEXT: vshufps {{.*#+}} zmm2 = zmm1[0,1],zmm0[2,3],zmm1[4,5],zmm0[6,7],zmm1[8,9],zmm0[10,11],zmm1[12,13],zmm0[14,15]
	; CHECK-NEXT: vpshufd {{.*#+}} zmm0 = zmm1[1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14]			; CHECK-NEXT: vshufps {{.*#+}} zmm0 = zmm1[1,0],zmm0[3,2],zmm1[5,4],zmm0[7,6],zmm1[9,8],zmm0[11,10],zmm1[13,12],zmm0[15,14]
	; CHECK-NEXT: vpminsd %zmm1, %zmm0, %zmm2			; CHECK-NEXT: vpminsd %zmm2, %zmm0, %zmm1
	; CHECK-NEXT: vpmaxsd %zmm1, %zmm0, %zmm2 {%k1}			; CHECK-NEXT: vpmaxsd %zmm2, %zmm0, %zmm1 {%k1}
	; CHECK-NEXT: vmovdqu64 %zmm2, (%rdi)			; CHECK-NEXT: vmovdqu64 %zmm1, (%rdi)
	; CHECK-NEXT: vzeroupper			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%2 = load <16 x i32>, ptr %0, align 1			%2 = load <16 x i32>, ptr %0, align 1
	%3 = shufflevector <16 x i32> %2, <16 x i32> poison, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>			%3 = shufflevector <16 x i32> %2, <16 x i32> poison, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>
	%4 = tail call <16 x i32> @llvm.smin.v16i32(<16 x i32> %3, <16 x i32> %2) #2			%4 = tail call <16 x i32> @llvm.smin.v16i32(<16 x i32> %3, <16 x i32> %2) #2
	%5 = tail call <16 x i32> @llvm.smax.v16i32(<16 x i32> %3, <16 x i32> %2) #2			%5 = tail call <16 x i32> @llvm.smax.v16i32(<16 x i32> %3, <16 x i32> %2) #2
	%6 = shufflevector <16 x i32> %4, <16 x i32> %5, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>			%6 = shufflevector <16 x i32> %4, <16 x i32> %5, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
	%7 = shufflevector <16 x i32> %6, <16 x i32> poison, <16 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12>			%7 = shufflevector <16 x i32> %6, <16 x i32> poison, <16 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12>
	[... 46 lines not shown ...]

Diff 447318