This is an archive of the discontinued LLVM Phabricator instance.

Differential D129537

[X86][DAGISel] Don't widen shuffle element with AVX512
Closed, Public

Authored by LuoYuanke on Jul 11 2022, 8:04 PM.

Details

Reviewers

RKSimon
craig.topper
pengfei
wxiao3

Commits

rG5fb413421057: [X86][DAGISel] Don't widen shuffle element with AVX512

Summary

Currently, the X86 shuffle lowering widens the shuffle element type when
each pair of adjacent mask elements selects adjacent source elements. For
example, given

  %t2 = add nsw <16 x i32> %t0, %t1
  %t3 = sub nsw <16 x i32> %t0, %t1
  %t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3,
                      <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4,
                       i32 5, i32 6, i32 7, i32 8, i32 9, i32 10,
                       i32 11, i32 12, i32 13, i32 14, i32 15>

  ret <16 x i32> %t4

the compiler transforms the shuffle into
  %t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3,
                      <8 x i64> <i32 8, i32 1, i32 2, i32 3, i32 4,
                                 i32 5, i32 6, i32 7>
This may lose the opportunity for ISel to select a masked instruction when
AVX512 is enabled.

This patch prevents the transform when the AVX512 feature is enabled.
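
The widening described in the summary can be sketched outside LLVM as a check on the mask alone. The following is a minimal illustration, not the code in X86ISelLowering.cpp: the helper name `widenMask` is hypothetical, and undef (-1) mask entries are deliberately not handled.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper, not LLVM's implementation: try to express a shuffle
// mask over 2N narrow elements as a mask over N elements of twice the width.
// Returns std::nullopt when some pair of mask entries is not an aligned,
// adjacent pair. Undef (-1) entries are rejected to keep the sketch short.
std::optional<std::vector<int>> widenMask(const std::vector<int> &mask) {
  if (mask.size() % 2 != 0)
    return std::nullopt;
  std::vector<int> wide;
  wide.reserve(mask.size() / 2);
  for (std::size_t i = 0; i < mask.size(); i += 2) {
    const int lo = mask[i];
    const int hi = mask[i + 1];
    // The pair must start on an even index and cover two consecutive elements.
    if (lo < 0 || lo % 2 != 0 || hi != lo + 1)
      return std::nullopt;
    wide.push_back(lo / 2);
  }
  return wide;
}
```

Applied to the 16-element mask from the summary (16, 17, 2, 3, ..., 15), this yields the 8-element mask 8, 1, 2, ..., 7, matching the widened shuffle shown above.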

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,070 ms	x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp

Event Timeline

LuoYuanke created this revision.Jul 11 2022, 8:04 PM

Herald added a project: Restricted Project.Jul 11 2022, 8:04 PM

Herald added subscribers: jsji, pengfei, hiraditya.

LuoYuanke requested review of this revision.Jul 11 2022, 8:04 PM

Herald added a project: Restricted Project.Jul 11 2022, 8:04 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B174790: Diff 443822.Jul 11 2022, 8:38 PM

Rebase.

Harbormaster completed remote builds in B174794: Diff 443828.Jul 11 2022, 10:20 PM

RKSimon added a reviewer: RKSimon.Jul 12 2022, 1:16 PM

wxiao3 added a subscriber: wxiao3.Jul 14 2022, 7:57 PM

RKSimon added inline comments.Jul 18 2022, 6:08 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
44372	This doesn't look right - shouldn't it be something like: APInt Mask = APIntOps::ScaleBitMask(ConstCond->getAPIntValue(), NumElts * 2) ?

LuoYuanke added inline comments.Jul 18 2022, 7:50 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
44372	Thanks, Simon. ScaleBitMask fits this computation perfectly.
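
To make the difference concrete, here is a standalone sketch of the two computations. It does not use the LLVM API: `scaleMaskUp` is a hypothetical stand-in for the replication that the reviewer's ScaleBitMask suggestion performs when widening a mask, and `shiftOrMask` reproduces the expression from the first diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone helper (not the LLVM API): widen a bit mask from
// oldBits to newBits by replicating each source bit newBits / oldBits times,
// so source bit i ends up in destination bits [i * scale, (i + 1) * scale).
std::uint64_t scaleMaskUp(std::uint64_t mask, unsigned oldBits,
                          unsigned newBits) {
  assert(oldBits > 0 && newBits <= 64 && newBits % oldBits == 0);
  const unsigned scale = newBits / oldBits;
  std::uint64_t out = 0;
  for (unsigned i = 0; i < oldBits; ++i)
    if ((mask >> i) & 1)
      for (unsigned j = 0; j < scale; ++j)
        out |= std::uint64_t{1} << (i * scale + j);
  return out;
}

// The computation from the first diff, kept only for comparison.
std::uint64_t shiftOrMask(std::uint64_t mask) { return (mask << 1) | mask; }
```

For the 8-bit mask 0xAA, replication gives 0xCCCC, while the shift-and-or expression gives 0x1FE, which is the `movw $510` immediate visible in the right-hand column of the shuffle-blend.ll diff for Diff 443828.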

Address Simon's comments.

I think we can probably generalize this very easily to any legal widening, e.g. to handle: https://gcc.godbolt.org/z/Pjz5qfYT7 (v32i16 -> v16i32)

Harbormaster completed remote builds in B176040: Diff 445514.Jul 18 2022, 9:57 AM

RKSimon added inline comments.Jul 18 2022, 1:20 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
44346	Check if N->getOpcode() == ISD::VSELECT
44350	auto *ConstCond = dyn_cast<ConstantSDNode>(Cond.getOperand(0)); if (!ConstCond) return SDValue();

LuoYuanke added inline comments.Jul 18 2022, 10:37 PM

llvm/lib/Target/X86/X86ISelLowering.cpp

44346

Is it possible that CondVT is not vXi1 for ISD::VSELECT? I ask because the comment for ISD::VSELECT says "targets may change the condition type".

/// At first, the VSELECT condition is of vXi1 type. Later, targets may
/// change the condition type in order to match the VSELECT node using a
/// pattern. The condition follows the BooleanContent format of the target.

Address Simon's comments.

In D129537#3659947, @RKSimon wrote:

I think we can probably generalize this very easily to any legal widening, e.g. to handle: https://gcc.godbolt.org/z/Pjz5qfYT7 (v32i16 -> v16i32)

Good suggestion. Let me take a look at it.

Harbormaster completed remote builds in B176174: Diff 445706.Jul 19 2022, 1:17 AM

Address Simon's suggestion to generalize the blend/select combine.

LuoYuanke added inline comments.Jul 20 2022, 1:48 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
60	For this case, I'm not sure whether it is worse than the left-side (previous) code.

LuoYuanke added reviewers: craig.topper, pengfei, wxiao3.Jul 20 2022, 1:49 AM

Herald added a subscriber: StephenFan.Jul 20 2022, 1:49 AM

Harbormaster completed remote builds in B176448: Diff 446075.Jul 20 2022, 2:39 AM

RKSimon mentioned this in rGbb4ff39bafdf: [X86] shuffle-blend.ll - add 32-bit test coverage.Jul 20 2022, 3:24 AM

I've added some additional test coverage to shuffle-blend.ll - please can you rebase?

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
60	Add a 128-bit vector limit?

Please can you update the patch title/summary?

Rebase and update test case.

LuoYuanke retitled this revision from [X86][DAGISel] Combine select vXi64 with AVX512 target to [X86][DAGISel] Don't widen shuffle element with AVX512.Jul 20 2022, 4:25 AM

LuoYuanke edited the summary of this revision.

LuoYuanke added inline comments.Jul 20 2022, 4:48 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
60	I'll add a 128-bit vector limit. However, this case is a 128-bit vector.

Harbormaster completed remote builds in B176472: Diff 446109.Jul 20 2022, 4:58 AM

Limit the vector bit width >=128 and add test cases.

RKSimon added inline comments.Jul 20 2022, 5:16 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
2	CHECK,AVX512BW,X86-AVX512BW CHECK,AVX512BW,X64-AVX512BW

Harbormaster completed remote builds in B176477: Diff 446114.Jul 20 2022, 5:46 AM

Address Simon's comments.

LuoYuanke marked an inline comment as done.Jul 20 2022, 6:06 AM

LuoYuanke added inline comments.

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
68	`retl` and `retq` can be merged with `ret{{[l|q]}}`. Not sure why utils/update_llc_test_checks.py doesn't merge them.

RKSimon added inline comments.Jul 20 2022, 6:13 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
75	regression? you might need to improve the 128-bit limit logic to account for vXi16 specifically

Harbormaster completed remote builds in B176488: Diff 446132.Jul 20 2022, 6:53 AM

LuoYuanke added inline comments.Jul 20 2022, 7:49 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
75	There is a PBLENDW instruction but no PBLENDB, so it is better to widen vXi8 to 16-bit elements. Besides, there are more instructions for 16-bit elements (e.g., movsh). I'll investigate this issue further.

Special-case vXi8, because PBLENDW can be applied to vXi16 but there is no equivalent for vXi8.
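
The blend discussion depends on how a shuffle mask maps to per-lane selection bits, whether those bits end up in a blend immediate or in an AVX512 k-register. The sketch below illustrates only that mapping: `blendBits` is a hypothetical helper, not part of LLVM, and it assumes the LLVM IR convention that mask indices of N or more refer to the second input.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: if `mask` describes a blend (lane i is taken from
// lane i of either input), return a bit mask with bit i set when lane i
// comes from the second input. Returns std::nullopt if any lane moves.
std::optional<std::uint64_t> blendBits(const std::vector<int> &mask) {
  const int n = static_cast<int>(mask.size());
  if (n > 64)
    return std::nullopt;
  std::uint64_t bits = 0;
  for (int i = 0; i < n; ++i) {
    if (mask[i] == i)
      continue;              // lane i comes from the first input
    if (mask[i] != i + n)
      return std::nullopt;   // lane crossing or undef: not a plain blend
    bits |= std::uint64_t{1} << i;
  }
  return bits;
}
```

For the 16 x i32 test mask (0, 1, 18, 19, 4, 5, 22, 23, ...), this gives 0xCCCC. For the same shuffle widened to 8 x i64 (0, 9, 2, 11, 4, 13, 6, 15), it gives 0xAA, which as a signed byte is the `movb $-86` in the left-hand column of the test diff.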

RKSimon added inline comments.Jul 20 2022, 8:16 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
68	"kmovd %eax" vs "kmovq %rax"
74	pre-commit these additional tests

LuoYuanke added inline comments.Jul 20 2022, 8:19 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
68	Got it. :)
74	Sure. I'll do it.

Harbormaster completed remote builds in B176516: Diff 446163.Jul 20 2022, 9:23 AM

Rebase

Harbormaster completed remote builds in B176666: Diff 446366.Jul 21 2022, 12:43 AM

RKSimon added inline comments.Jul 22 2022, 2:54 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
19326	This list is going to get longer, and we're likely to miss patterns that only fold to target nodes later on - I'm wondering whether we could consider accepting any TLI.isBinOp() case here?

LGTM - with one minor comment for future work

llvm/lib/Target/X86/X86ISelLowering.cpp
19326	Please can you add a TODO about maybe converting this to TLI.isBinOp()?

This revision is now accepted and ready to land.Jul 25 2022, 6:08 AM

Address Simon's comments.

LuoYuanke added inline comments.Jul 25 2022, 7:08 AM

llvm/test/CodeGen/X86/haddsub-undef.ll
1053 ↗	(On Diff #447318)	This looks like a regression; I'll take a look at it.

LuoYuanke added inline comments.Jul 25 2022, 7:09 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
19326	Thanks, Simon, for the suggestion. There seem to be regressions in some cases; I'll take a look at them.

Harbormaster completed remote builds in B177366: Diff 447318.Jul 25 2022, 7:28 AM

Yeah - that was what I saw as well - if you want to get this in for 15.x I'd recommend going back to the old switch statement - then investigate the binop general case later (if you solve it soon enough you can request a merge)

Revert to previous version and add TODO for checking TLI.isBinOp().

In D129537#3676257, @RKSimon wrote:

Yeah - that was what I saw as well - if you want to get this in for 15.x I'd recommend going back to the old switch statement - then investigate the binop general case later (if you solve it soon enough you can request a merge)

Thanks, Simon, for the review. All the comments are valuable. Let me land the old switch statement version first and then investigate the general binop case.

Harbormaster completed remote builds in B177519: Diff 447540.Jul 25 2022, 8:47 PM

This revision was landed with ongoing or failed builds.Jul 25 2022, 8:56 PM

Closed by commit rG5fb413421057: [X86][DAGISel] Don't widen shuffle element with AVX512 (authored by LuoYuanke).

This revision was automatically updated to reflect the committed changes.

LuoYuanke added a commit: rG5fb413421057: [X86][DAGISel] Don't widen shuffle element with AVX512.

fhahn added a reverting change: rGf912bab111ad: Revert "[X86][DAGISel] Don't widen shuffle element with AVX512".Jul 28 2022, 7:27 AM

This patch unfortunately causes crashes when building llvm-test-suite optimizing for AVX512.

Reproducer for llc:

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx"

define i32 @test(<32 x i32> %0) #0 {
entry:
  %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1)
  ret i32 %2
}

; Function Attrs: nocallback nofree nosync nounwind readnone willreturn
declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1

attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" }
attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }

I've reverted the patch in the meantime to get current main back into a good state.

LuoYuanke mentioned this in rG6b4c386b1e70: [X86] Add test cases for D129537.Jul 30 2022, 4:43 AM

LuoYuanke mentioned this in D130830: Don't widen shuffle element with AVX512.Jul 30 2022, 8:42 PM

LuoYuanke added a reverting change: D131042: Revert "[X86][DAGISel] Don't widen shuffle element with AVX512".Aug 2 2022, 7:48 PM

LuoYuanke mentioned this in rGf885c08034fe: Don't widen shuffle element with AVX512.Oct 12 2022, 4:23 AM

Revision Contents

Path	Size
llvm/lib/Target/X86/X86ISelLowering.cpp	56 lines
llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll	5 lines

Diff 443828

llvm/lib/Target/X86/X86ISelLowering.cpp

	[...]
	for (int &M : NewMask)			for (int &M : NewMask)
	if (M >= NumElements)			if (M >= NumElements)
	M = -1;			M = -1;
	return DAG.getVectorShuffle(VT, DL, V1, V2, NewMask);			return DAG.getVectorShuffle(VT, DL, V1, V2, NewMask);
	}			}

	// Check for illegal shuffle mask element index values.			// Check for illegal shuffle mask element index values.
	int MaskUpperLimit = OrigMask.size() * (V2IsUndef ? 1 : 2);			int MaskUpperLimit = OrigMask.size() * (V2IsUndef ? 1 : 2);
	(void)MaskUpperLimit;			(void)MaskUpperLimit;
	assert(llvm::all_of(OrigMask,			assert(llvm::all_of(OrigMask,
	[&](int M) { return -1 <= M && M < MaskUpperLimit; }) &&			[&](int M) { return -1 <= M && M < MaskUpperLimit; }) &&
	"Out of bounds shuffle index");			"Out of bounds shuffle index");

	// We actually see shuffles that are entirely re-arrangements of a set of			// We actually see shuffles that are entirely re-arrangements of a set of
	// zero inputs. This mostly happens while decomposing complex shuffles into			// zero inputs. This mostly happens while decomposing complex shuffles into
	// simple ones. Directly lower these as a buildvector of zeros.			// simple ones. Directly lower these as a buildvector of zeros.
	APInt KnownUndef, KnownZero;			APInt KnownUndef, KnownZero;
	[...]
	// sub accomplishes the negation of the replacement pattern.			// sub accomplishes the negation of the replacement pattern.
	if (V == Y)			if (V == Y)
	std::swap(SubOp1, SubOp2);			std::swap(SubOp1, SubOp2);

	SDValue Res = DAG.getNode(ISD::SUB, DL, MaskVT, SubOp1, SubOp2);			SDValue Res = DAG.getNode(ISD::SUB, DL, MaskVT, SubOp1, SubOp2);
	return DAG.getBitcast(VT, Res);			return DAG.getBitcast(VT, Res);
	}			}

				static SDValue combineSelectVxi64(SDNode *N, SelectionDAG &DAG,
				                                  const X86Subtarget &Subtarget) {
				  SDLoc DL(N);
				  SDValue Cond = N->getOperand(0);
				  SDValue LHS = N->getOperand(1);
				  SDValue RHS = N->getOperand(2);
				  EVT VT = LHS.getValueType();
				  EVT CondVT = Cond.getValueType();
				  // Combine
				  //   select vXi1 bitcast (int cond),
				  //          (<vXi64> bitcast <v2Xi32> a),
				  //          (<vXi64> bitcast <v2Xi32> b)
				  // to
				  //   select <v2Xi1> cond, <v2Xi32> a, <v2Xi32> b
				  // to create opportunity for mask instructions with AVX512 instructions.
				  if (!Subtarget.hasAVX512())
				    return SDValue();

				  if (!CondVT.isVector() || CondVT.getVectorElementType() != MVT::i1)
				    return SDValue();
				  if (Cond.getOpcode() != ISD::BITCAST)
				    return SDValue();
				  if (!dyn_cast<ConstantSDNode>(Cond.getOperand(0)))
				    return SDValue();

				  if (VT.getVectorElementType() != MVT::i64)
				    return SDValue();

				  if (LHS.getOpcode() != ISD::BITCAST ||
				      LHS.getOperand(0).getValueType().getVectorElementType() != MVT::i32)
				    return SDValue();
				  if (RHS.getOpcode() != ISD::BITCAST ||
				      RHS.getOperand(0).getValueType().getVectorElementType() != MVT::i32)
				    return SDValue();

				  if (!Cond.hasOneUse() || !LHS.hasOneUse() || !RHS.hasOneUse())
				    return SDValue();

				  int NumElts = VT.getVectorNumElements();
				  EVT ExpandCondVT = EVT::getVectorVT(*DAG.getContext(), MVT::i1, NumElts * 2);
				  EVT ExpandVT = EVT::getVectorVT(*DAG.getContext(), MVT::i32, NumElts * 2);

				  ConstantSDNode *ConstCond = cast<ConstantSDNode>(Cond.getOperand(0));
				  uint64_t Mask = ConstCond->getZExtValue();
				  Mask = (Mask << 1) | Mask;
				  SDValue MaskVal = DAG.getConstant(
				      Mask, DL, EVT::getIntegerVT(*DAG.getContext(), NumElts * 2));
				  SDValue NewCond = DAG.getBitcast(ExpandCondVT, MaskVal);
				  return DAG.getBitcast(VT,
				                        DAG.getSelect(DL, ExpandVT, NewCond, LHS.getOperand(0),
				                                      RHS.getOperand(0)));
				}

	/// Do target-specific dag combines on SELECT and VSELECT nodes.			/// Do target-specific dag combines on SELECT and VSELECT nodes.
	static SDValue combineSelect(SDNode *N, SelectionDAG &DAG,			static SDValue combineSelect(SDNode *N, SelectionDAG &DAG,
	TargetLowering::DAGCombinerInfo &DCI,			TargetLowering::DAGCombinerInfo &DCI,
	const X86Subtarget &Subtarget) {			const X86Subtarget &Subtarget) {
	SDLoc DL(N);			SDLoc DL(N);
	SDValue Cond = N->getOperand(0);			SDValue Cond = N->getOperand(0);
	SDValue LHS = N->getOperand(1);			SDValue LHS = N->getOperand(1);
	SDValue RHS = N->getOperand(2);			SDValue RHS = N->getOperand(2);
	[...]
	break;			break;
	}			}
	}			}

	if (Opcode)			if (Opcode)
	return DAG.getNode(Opcode, DL, N->getValueType(0), LHS, RHS);			return DAG.getNode(Opcode, DL, N->getValueType(0), LHS, RHS);
	}			}

				if (SDValue V = combineSelectVxi64(N, DAG, Subtarget))
				return V;

	// Some mask scalar intrinsics rely on checking if only one bit is set			// Some mask scalar intrinsics rely on checking if only one bit is set
	// and implement it in C code like this:			// and implement it in C code like this:
	// A[0] = (U & 1) ? A[0] : W[0];			// A[0] = (U & 1) ? A[0] : W[0];
	// This creates some redundant instructions that break pattern matching.			// This creates some redundant instructions that break pattern matching.
	// fold (select (setcc (and (X, 1), 0, seteq), Y, Z)) -> select(and(X, 1),Z,Y)			// fold (select (setcc (and (X, 1), 0, seteq), Y, Z)) -> select(and(X, 1),Z,Y)
	if (Subtarget.hasAVX512() && N->getOpcode() == ISD::SELECT &&			if (Subtarget.hasAVX512() && N->getOpcode() == ISD::SELECT &&
	Cond.getOpcode() == ISD::SETCC && (VT == MVT::f32 || VT == MVT::f64)) {			Cond.getOpcode() == ISD::SETCC && (VT == MVT::f32 || VT == MVT::f64)) {
	ISD::CondCode CC = cast<CondCodeSDNode>(Cond.getOperand(2))->get();			ISD::CondCode CC = cast<CondCodeSDNode>(Cond.getOperand(2))->get();
	[...]

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f,+avx512vl,+avx512bw %s -o - | FileCheck %s			; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f,+avx512vl,+avx512bw %s -o - | FileCheck %s

	define <16 x i32> @shuffle_v8i64(<16 x i32> %t0, <16 x i32> %t1) {			define <16 x i32> @shuffle_v8i64(<16 x i32> %t0, <16 x i32> %t1) {
	; CHECK-LABEL: shuffle_v8i64:			; CHECK-LABEL: shuffle_v8i64:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: vpaddd %zmm1, %zmm0, %zmm2			; CHECK-NEXT: vpaddd %zmm1, %zmm0, %zmm2
	; CHECK-NEXT: vpsubd %zmm1, %zmm0, %zmm0			; CHECK-NEXT: movw $510, %ax # imm = 0x1FE
	; CHECK-NEXT: movb $-86, %al
	; CHECK-NEXT: kmovd %eax, %k1			; CHECK-NEXT: kmovd %eax, %k1
	; CHECK-NEXT: vmovdqa64 %zmm0, %zmm2 {%k1}			; CHECK-NEXT: vpsubd %zmm1, %zmm0, %zmm2 {%k1}
	; CHECK-NEXT: vmovdqa64 %zmm2, %zmm0			; CHECK-NEXT: vmovdqa64 %zmm2, %zmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%t2 = add nsw <16 x i32> %t0, %t1			%t2 = add nsw <16 x i32> %t0, %t1
	%t3 = sub nsw <16 x i32> %t0, %t1			%t3 = sub nsw <16 x i32> %t0, %t1
	%t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>			%t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>
	ret <16 x i32> %t4			ret <16 x i32> %t4
	}			}
	[...]
	; CHECK-NEXT: vpsubd %xmm1, %xmm0, %xmm0			; CHECK-NEXT: vpsubd %xmm1, %xmm0, %xmm0
	; CHECK-NEXT: vpblendd {{.*#+}} xmm0 = xmm2[0],xmm0[1],xmm2[2,3]			; CHECK-NEXT: vpblendd {{.*#+}} xmm0 = xmm2[0],xmm0[1],xmm2[2,3]
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%t2 = add nsw <2 x i32> %t0, %t1			%t2 = add nsw <2 x i32> %t0, %t1
	%t3 = sub nsw <2 x i32> %t0, %t1			%t3 = sub nsw <2 x i32> %t0, %t1
	%t4 = shufflevector <2 x i32> %t2, <2 x i32> %t3, <2 x i32> <i32 0, i32 3>			%t4 = shufflevector <2 x i32> %t2, <2 x i32> %t3, <2 x i32> <i32 0, i32 3>
	ret <2 x i32> %t4			ret <2 x i32> %t4
	}			}
