Details

Reviewers

RKSimon
craig.topper
pengfei
wxiao3

Commits

rG5fb413421057: [X86][DAGISel] Don't widen shuffle element with AVX512

Summary

Currently, the X86 shuffle lowering widens the element type of a
shuffle when adjacent mask elements form pairs. For the example below

  %t2 = add nsw <16 x i32> %t0, %t1
  %t3 = sub nsw <16 x i32> %t0, %t1
  %t4 = shufflevector <16 x i32> %t2, <16 x i32> %t3,
                      <16 x i32> <i32 16, i32 17, i32 2, i32 3, i32 4,
                       i32 5, i32 6, i32 7, i32 8, i32 9, i32 10,
                       i32 11, i32 12, i32 13, i32 14, i32 15>

  ret <16 x i32> %t4

The compiler would transform the shuffle to
  %t4 = shufflevector <8 x i64> %t2, <8 x i64> %t3,
                      <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4,
                                 i32 5, i32 6, i32 7>
This may lose the opportunity for ISel to select a masked instruction when
AVX512 is enabled.

This patch prevents the transform when the AVX512 feature is enabled.
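
To make the widening condition concrete, here is a minimal standalone sketch of pairing adjacent mask elements. It is not the code in X86ISelLowering.cpp: `widenShuffleMask` is a hypothetical helper written only for illustration, and it ignores undef and zero sentinels, which the real lowering has to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not LLVM's API): try to halve the number of mask
// elements by pairing adjacent entries. Entry 2*i must be even and entry
// 2*i+1 must be its successor; otherwise the mask cannot be widened.
// Negative (undef/sentinel) entries are simply rejected in this sketch.
std::optional<std::vector<int>> widenShuffleMask(const std::vector<int> &Mask) {
  if (Mask.size() % 2 != 0)
    return std::nullopt;
  std::vector<int> Widened;
  for (std::size_t I = 0; I < Mask.size(); I += 2) {
    int Lo = Mask[I], Hi = Mask[I + 1];
    if (Lo < 0 || Lo % 2 != 0 || Hi != Lo + 1)
      return std::nullopt;
    Widened.push_back(Lo / 2);
  }
  return Widened;
}
```

Applied to the 16-element mask from the summary, this pairing yields the 8-element mask shown for the widened shuffle.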

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,140 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
	60,080 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
	60,040 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
	60,020 ms	x64 debian > libFuzzer.libFuzzer::minimize_crash.test
	60,030 ms	x64 debian > libFuzzer.libFuzzer::out-of-process-fuzz.test
		View Full Test Results (6 Failed)

Event Timeline

LuoYuanke created this revision. Jul 11 2022, 8:04 PM

Herald added a project: Restricted Project. Jul 11 2022, 8:04 PM

Herald added subscribers: jsji, pengfei, hiraditya.

LuoYuanke requested review of this revision. Jul 11 2022, 8:04 PM

Herald added a project: Restricted Project. Jul 11 2022, 8:04 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B174790: Diff 443822. Jul 11 2022, 8:38 PM

Rebase.

Harbormaster completed remote builds in B174794: Diff 443828. Jul 11 2022, 10:20 PM

RKSimon added a reviewer: RKSimon. Jul 12 2022, 1:16 PM

wxiao3 added a subscriber: wxiao3. Jul 14 2022, 7:57 PM

RKSimon added inline comments. Jul 18 2022, 6:08 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
44372	This doesn't look right - shouldn't it be something like: APInt Mask = APIntOps::ScaleBitMask(ConstCond->getAPIntValue(), NumElts * 2) ?

LuoYuanke added inline comments. Jul 18 2022, 7:50 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
44372	Thanks, Simon. ScaleBitMask fits this computation perfectly.
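
As a rough illustration of the point raised in this exchange: when a vXi1 condition is expanded to v2Xi1, each source bit has to be splatted into two adjacent bits, which is what scaling a bit mask up does. The sketch below uses plain 64-bit integers rather than APInt; `scaleMaskBy2` and `shiftOrMask` are hypothetical names introduced here, with the latter mirroring the `(Mask << 1) | Mask` computation from the first diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for scaling a bit mask up by a factor of two:
// bit I of the narrow mask controls bits 2*I and 2*I+1 of the wide mask.
// NumBits is the width of the narrow mask and is assumed to be <= 32.
std::uint64_t scaleMaskBy2(std::uint64_t Mask, unsigned NumBits) {
  assert(NumBits <= 32 && "wide mask must fit in 64 bits");
  std::uint64_t Wide = 0;
  for (unsigned I = 0; I < NumBits; ++I)
    if ((Mask >> I) & 1)
      Wide |= std::uint64_t{3} << (2 * I);
  return Wide;
}

// The computation from the first diff, kept only for comparison.
std::uint64_t shiftOrMask(std::uint64_t Mask) { return (Mask << 1) | Mask; }
```

The two agree when only bit 0 is set, but diverge as soon as a higher bit is set, because the shift-and-or form moves each bit by one position instead of by its doubled index.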

Address Simon's comments.

I think we can probably generalize this very easily to any legal widening, e.g. to handle: https://gcc.godbolt.org/z/Pjz5qfYT7 (v32i16 -> v16i32)

Harbormaster completed remote builds in B176040: Diff 445514. Jul 18 2022, 9:57 AM

RKSimon added inline comments. Jul 18 2022, 1:20 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
44346	Check if N->getOpcode() == ISD::VSELECT
44350	auto *ConstCond = dyn_cast<ConstantSDNode>(Cond.getOperand(0)); if (!ConstCond) return SDValue();

LuoYuanke added inline comments. Jul 18 2022, 10:37 PM

llvm/lib/Target/X86/X86ISelLowering.cpp

44346

Is it possible that CondVT is not vXi1 for ISD::VSELECT? I ask because the comment for ISD::VSELECT says "targets may change the condition type".

/// At first, the VSELECT condition is of vXi1 type. Later, targets may
/// change the condition type in order to match the VSELECT node using a
/// pattern. The condition follows the BooleanContent format of the target.

Address Simon's comments.

In D129537#3659947, @RKSimon wrote:

I think we can probably generalize this very easily to any legal widening, e.g. to handle: https://gcc.godbolt.org/z/Pjz5qfYT7 (v32i16 -> v16i32)

Good suggestion. Let me take a look at it.

Harbormaster completed remote builds in B176174: Diff 445706. Jul 19 2022, 1:17 AM

Address Simon's suggestion to generalize the blend/select combine.

LuoYuanke added inline comments. Jul 20 2022, 1:48 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
94 ↗	(On Diff #446075)	For this case, I'm not sure whether it is worse than the left-side code (the previous code).

LuoYuanke added reviewers: craig.topper, pengfei, wxiao3. Jul 20 2022, 1:49 AM

Herald added a subscriber: StephenFan. Jul 20 2022, 1:49 AM

Harbormaster completed remote builds in B176448: Diff 446075. Jul 20 2022, 2:39 AM

RKSimon mentioned this in rGbb4ff39bafdf: [X86] shuffle-blend.ll - add 32-bit test coverage. Jul 20 2022, 3:24 AM

I've added some additional test coverage to shuffle-blend.ll - please can you rebase?

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
94 ↗	(On Diff #446075)	Add a 128-bit vector limit?

Please can you update the patch title/summary?

Rebase and update test case.

LuoYuanke retitled this revision from [X86][DAGISel] Combine select vXi64 with AVX512 target to [X86][DAGISel] Don't widen shuffle element with AVX512. Jul 20 2022, 4:25 AM

LuoYuanke edited the summary of this revision.

LuoYuanke added inline comments. Jul 20 2022, 4:48 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
94 ↗	(On Diff #446075)	I'll add a 128-bit vector limit. However, in this case it is a 128-bit vector.

Harbormaster completed remote builds in B176472: Diff 446109. Jul 20 2022, 4:58 AM

Limit the vector bit width to >= 128 and add test cases.

RKSimon added inline comments. Jul 20 2022, 5:16 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
5 ↗	(On Diff #446114)	CHECK,AVX512BW,X86-AVX512BW CHECK,AVX512BW,X64-AVX512BW

Harbormaster completed remote builds in B176477: Diff 446114. Jul 20 2022, 5:46 AM

Address Simon's comments.

LuoYuanke marked an inline comment as done. Jul 20 2022, 6:06 AM

LuoYuanke added inline comments.

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
94 ↗	(On Diff #446132)	`retl` and `retq` can be merged with `ret{{[l|q]}}`. Not sure why utils/update_llc_test_checks.py doesn't merge them.

RKSimon added inline comments. Jul 20 2022, 6:13 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
169 ↗	(On Diff #446132)	regression? you might need to improve the 128-bit limit logic to account for vXi16 specifically

Harbormaster completed remote builds in B176488: Diff 446132. Jul 20 2022, 6:53 AM

LuoYuanke added inline comments. Jul 20 2022, 7:49 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
169 ↗	(On Diff #446132)	There is PBLENDW, but there is no PBLENDB instruction, so it is better to widen to v2Xi16 from vXi8. Besides, there are more instructions for 16-bit elements (e.g., movsh). I'll investigate this issue further.

Handle vXi8 specially, because PBLENDW can be applied to vXi16 but not to vXi8.

RKSimon added inline comments. Jul 20 2022, 8:16 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
243 ↗	(On Diff #446163)	pre-commit these additional tests
94 ↗	(On Diff #446132)	"kmovd %eax" vs "kmovq %rax"

LuoYuanke added inline comments. Jul 20 2022, 8:19 AM

llvm/test/CodeGen/X86/avx512-shuffles/shuffle-blend.ll
243 ↗	(On Diff #446163)	Sure. I'll do it.
94 ↗	(On Diff #446132)	Got it. :)

Harbormaster completed remote builds in B176516: Diff 446163. Jul 20 2022, 9:23 AM

Rebase

Harbormaster completed remote builds in B176666: Diff 446366. Jul 21 2022, 12:43 AM

RKSimon added inline comments. Jul 22 2022, 2:54 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
19326	This list is going to get longer, and we're likely to miss patterns that only fold to target nodes later on - I'm wondering whether we could consider accepting any TLI.isBinOp() case here?

LGTM - with one minor comment for future work

llvm/lib/Target/X86/X86ISelLowering.cpp
19326	Please can you add a TODO about maybe converting this to TLI.isBinOp()?

This revision is now accepted and ready to land. Jul 25 2022, 6:08 AM

Address Simon's comments.

LuoYuanke added inline comments. Jul 25 2022, 7:08 AM

llvm/test/CodeGen/X86/haddsub-undef.ll
1053 ↗	(On Diff #447318)	This looks like a regression; I'll take a look at it.

LuoYuanke added inline comments. Jul 25 2022, 7:09 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
19326	Thanks, Simon, for the suggestion. There seem to be regressions in some cases; I'll take a look at them.

Harbormaster completed remote builds in B177366: Diff 447318. Jul 25 2022, 7:28 AM

Yeah - that was what I saw as well - if you want to get this in for 15.x I'd recommend going back to the old switch statement - then investigate the binop general case later (if you solve it soon enough you can request a merge)

Revert to previous version and add TODO for checking TLI.isBinOp().

In D129537#3676257, @RKSimon wrote:

Yeah - that was what I saw as well - if you want to get this in for 15.x I'd recommend going back to the old switch statement - then investigate the binop general case later (if you solve it soon enough you can request a merge)

Thanks, Simon, for the review. All the comments are valuable. Let me land the old switch statement version first and then investigate the general binop case.

Harbormaster completed remote builds in B177519: Diff 447540. Jul 25 2022, 8:47 PM

This revision was landed with ongoing or failed builds. Jul 25 2022, 8:56 PM

Closed by commit rG5fb413421057: [X86][DAGISel] Don't widen shuffle element with AVX512 (authored by LuoYuanke).

This revision was automatically updated to reflect the committed changes.

LuoYuanke added a commit: rG5fb413421057: [X86][DAGISel] Don't widen shuffle element with AVX512.

fhahn added a reverting change: rGf912bab111ad: Revert "[X86][DAGISel] Don't widen shuffle element with AVX512". Jul 28 2022, 7:27 AM

This patch unfortunately causes crashes when building llvm-test-suite optimizing for AVX512.

Reproducer for llc:

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx"

define i32 @test(<32 x i32> %0) #0 {
entry:
  %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1)
  ret i32 %2
}

; Function Attrs: nocallback nofree nosync nounwind readnone willreturn
declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1

attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" }
attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }

I've reverted the patch in the meantime to get current main back into a good state.

LuoYuanke mentioned this in rG6b4c386b1e70: [X86] Add test cases for D129537. Jul 30 2022, 4:43 AM

LuoYuanke mentioned this in D130830: Don't widen shuffle element with AVX512. Jul 30 2022, 8:42 PM

LuoYuanke added a reverting change: D131042: Revert "[X86][DAGISel] Don't widen shuffle element with AVX512". Aug 2 2022, 7:48 PM

LuoYuanke mentioned this in rGf885c08034fe: Don't widen shuffle element with AVX512. Oct 12 2022, 4:23 AM

Diff 443822

llvm/lib/Target/X86/X86ISelLowering.cpp

	Show First 20 Lines • Show All 7,833 Lines • ▼ Show 20 Lines
	VT, DAG.getVectorShuffle(NewVT, DL, V1, V2, WidenedMask));			VT, DAG.getVectorShuffle(NewVT, DL, V1, V2, WidenedMask));
	}			}
	}			}

	SmallVector<SDValue> Ops = {V1, V2};			SmallVector<SDValue> Ops = {V1, V2};
	SmallVector<int> Mask(OrigMask.begin(), OrigMask.end());			SmallVector<int> Mask(OrigMask.begin(), OrigMask.end());

	// Canonicalize the shuffle with any horizontal ops inputs.			// Canonicalize the shuffle with any horizontal ops inputs.
	// NOTE: This may update Ops and Mask.			// NOTE: This may update Ops and Mask.
	if (SDValue HOp = canonicalizeShuffleMaskWithHorizOp(			if (SDValue HOp = canonicalizeShuffleMaskWithHorizOp(
	Ops, Mask, VT.getSizeInBits(), DL, DAG, Subtarget))			Ops, Mask, VT.getSizeInBits(), DL, DAG, Subtarget))
	return DAG.getBitcast(VT, HOp);			return DAG.getBitcast(VT, HOp);

	V1 = DAG.getBitcast(VT, Ops[0]);			V1 = DAG.getBitcast(VT, Ops[0]);
	V2 = DAG.getBitcast(VT, Ops[1]);			V2 = DAG.getBitcast(VT, Ops[1]);
	assert(NumElements == (int)Mask.size() &&			assert(NumElements == (int)Mask.size() &&
	"canonicalizeShuffleMaskWithHorizOp "			"canonicalizeShuffleMaskWithHorizOp "
	▲ Show 20 Lines • Show All 24,909 Lines • ▼ Show 20 Lines
	// sub accomplishes the negation of the replacement pattern.			// sub accomplishes the negation of the replacement pattern.
	if (V == Y)			if (V == Y)
	std::swap(SubOp1, SubOp2);			std::swap(SubOp1, SubOp2);

	SDValue Res = DAG.getNode(ISD::SUB, DL, MaskVT, SubOp1, SubOp2);			SDValue Res = DAG.getNode(ISD::SUB, DL, MaskVT, SubOp1, SubOp2);
	return DAG.getBitcast(VT, Res);			return DAG.getBitcast(VT, Res);
	}			}

				static SDValue combineSelectVxi64(SDNode *N, SelectionDAG &DAG,
				const X86Subtarget &Subtarget) {
				SDLoc DL(N);
				SDValue Cond = N->getOperand(0);
				SDValue LHS = N->getOperand(1);
				SDValue RHS = N->getOperand(2);
				EVT VT = LHS.getValueType();
				EVT CondVT = Cond.getValueType();
				// Combine
				// select vXi1 bitcast (int cond),
				// (<vXi64> bitcast <v2Xi32> a),
				// (<vXi64> bitcast <v2Xi32> b)
				// to
				// select <v2Xi1> cond, <v2Xi32> a, <v2Xi32> b
				// to create opportunity for mask instructions with AVX512 instructions.
				if (!Subtarget.hasAVX512())
				return SDValue();

				if (!CondVT.isVector() || CondVT.getVectorElementType() != MVT::i1)
				return SDValue();
				if (Cond.getOpcode() != ISD::BITCAST)
				return SDValue();
				if (!dyn_cast<ConstantSDNode>(Cond.getOperand(0)))
				return SDValue();

				if (VT.getVectorElementType() != MVT::i64)
				return SDValue();

				if (LHS.getOpcode() != ISD::BITCAST ||
				LHS.getOperand(0).getValueType().getVectorElementType() != MVT::i32)
				return SDValue();
				if (RHS.getOpcode() != ISD::BITCAST ||
				RHS.getOperand(0).getValueType().getVectorElementType() != MVT::i32)
				return SDValue();

				if (!Cond.hasOneUse() || !LHS.hasOneUse() || !RHS.hasOneUse())
				return SDValue();

				int NumElts = VT.getVectorNumElements();
				EVT ExpandCondVT = EVT::getVectorVT(*DAG.getContext(), MVT::i1, NumElts * 2);
				EVT ExpandVT = EVT::getVectorVT(*DAG.getContext(), MVT::i32, NumElts * 2);

				ConstantSDNode *ConstCond = cast<ConstantSDNode>(Cond.getOperand(0));
				uint64_t Mask = ConstCond->getZExtValue();
				Mask = (Mask << 1) | Mask;
				SDValue MaskVal = DAG.getConstant(
				Mask, DL, EVT::getIntegerVT(*DAG.getContext(), NumElts * 2));
				SDValue NewCond = DAG.getBitcast(ExpandCondVT, MaskVal);
				return DAG.getBitcast(VT,
				DAG.getSelect(DL, ExpandVT, NewCond, LHS.getOperand(0),
				RHS.getOperand(0)));
				}

	/// Do target-specific dag combines on SELECT and VSELECT nodes.			/// Do target-specific dag combines on SELECT and VSELECT nodes.
	static SDValue combineSelect(SDNode *N, SelectionDAG &DAG,			static SDValue combineSelect(SDNode *N, SelectionDAG &DAG,
	TargetLowering::DAGCombinerInfo &DCI,			TargetLowering::DAGCombinerInfo &DCI,
	const X86Subtarget &Subtarget) {			const X86Subtarget &Subtarget) {
	SDLoc DL(N);			SDLoc DL(N);
	SDValue Cond = N->getOperand(0);			SDValue Cond = N->getOperand(0);
	SDValue LHS = N->getOperand(1);			SDValue LHS = N->getOperand(1);
	SDValue RHS = N->getOperand(2);			SDValue RHS = N->getOperand(2);
	Show All 25 Lines
	SmallVector<int, 64> Mask;			SmallVector<int, 64> Mask;
	if (createShuffleMaskFromVSELECT(Mask, Cond,			if (createShuffleMaskFromVSELECT(Mask, Cond,
	N->getOpcode() == X86ISD::BLENDV))			N->getOpcode() == X86ISD::BLENDV))
	return DAG.getVectorShuffle(VT, DL, LHS, RHS, Mask);			return DAG.getVectorShuffle(VT, DL, LHS, RHS, Mask);
	}			}

	// fold vselect(cond, pshufb(x), pshufb(y)) -> or (pshufb(x), pshufb(y))			// fold vselect(cond, pshufb(x), pshufb(y)) -> or (pshufb(x), pshufb(y))
	// by forcing the unselected elements to zero.			// by forcing the unselected elements to zero.
	// TODO: Can we handle more shuffles with this?			// TODO: Can we handle more shuffles with this?
	if (N->getOpcode() == ISD::VSELECT && CondVT.isVector() &&			if (N->getOpcode() == ISD::VSELECT && CondVT.isVector() &&
	LHS.getOpcode() == X86ISD::PSHUFB && RHS.getOpcode() == X86ISD::PSHUFB &&			LHS.getOpcode() == X86ISD::PSHUFB && RHS.getOpcode() == X86ISD::PSHUFB &&
	LHS.hasOneUse() && RHS.hasOneUse()) {			LHS.hasOneUse() && RHS.hasOneUse()) {
	MVT SimpleVT = VT.getSimpleVT();			MVT SimpleVT = VT.getSimpleVT();
	SmallVector<SDValue, 1> LHSOps, RHSOps;			SmallVector<SDValue, 1> LHSOps, RHSOps;
	SmallVector<int, 64> LHSMask, RHSMask, CondMask;			SmallVector<int, 64> LHSMask, RHSMask, CondMask;
	if (createShuffleMaskFromVSELECT(CondMask, Cond) &&			if (createShuffleMaskFromVSELECT(CondMask, Cond) &&
	getTargetShuffleMask(LHS.getNode(), SimpleVT, true, LHSOps, LHSMask) &&			getTargetShuffleMask(LHS.getNode(), SimpleVT, true, LHSOps, LHSMask) &&
	getTargetShuffleMask(RHS.getNode(), SimpleVT, true, RHSOps, RHSMask)) {			getTargetShuffleMask(RHS.getNode(), SimpleVT, true, RHSOps, RHSMask)) {
	int NumElts = VT.getVectorNumElements();			int NumElts = VT.getVectorNumElements();
	for (int i = 0; i != NumElts; ++i) {			for (int i = 0; i != NumElts; ++i) {
	// getConstVector sets negative shuffle mask values as undef, so ensure			// getConstVector sets negative shuffle mask values as undef, so ensure
	// we hardcode SM_SentinelZero values to zero (0x80).			// we hardcode SM_SentinelZero values to zero (0x80).
	if (CondMask[i] < NumElts) {			if (CondMask[i] < NumElts) {
	LHSMask[i] = isUndefOrZero(LHSMask[i]) ? 0x80 : LHSMask[i];			LHSMask[i] = isUndefOrZero(LHSMask[i]) ? 0x80 : LHSMask[i];
	RHSMask[i] = 0x80;			RHSMask[i] = 0x80;
	} else {			} else {
	LHSMask[i] = 0x80;			LHSMask[i] = 0x80;
	RHSMask[i] = isUndefOrZero(RHSMask[i]) ? 0x80 : RHSMask[i];			RHSMask[i] = isUndefOrZero(RHSMask[i]) ? 0x80 : RHSMask[i];
	}			}
	}			}
	LHS = DAG.getNode(X86ISD::PSHUFB, DL, VT, LHS.getOperand(0),			LHS = DAG.getNode(X86ISD::PSHUFB, DL, VT, LHS.getOperand(0),
	getConstVector(LHSMask, SimpleVT, DAG, DL, true));			getConstVector(LHSMask, SimpleVT, DAG, DL, true));
	RHS = DAG.getNode(X86ISD::PSHUFB, DL, VT, RHS.getOperand(0),			RHS = DAG.getNode(X86ISD::PSHUFB, DL, VT, RHS.getOperand(0),
	getConstVector(RHSMask, SimpleVT, DAG, DL, true));			getConstVector(RHSMask, SimpleVT, DAG, DL, true));
	return DAG.getNode(ISD::OR, DL, VT, LHS, RHS);			return DAG.getNode(ISD::OR, DL, VT, LHS, RHS);
	}			}
	}			}

	// If we have SSE[12] support, try to form min/max nodes. SSE min/max			// If we have SSE[12] support, try to form min/max nodes. SSE min/max
	// instructions match the semantics of the common C idiom x<y?x:y but not			// instructions match the semantics of the common C idiom x<y?x:y but not
	// x<=y?x:y, because of how they handle negative zero (which can be			// x<=y?x:y, because of how they handle negative zero (which can be
	// ignored in unsafe-math mode).			// ignored in unsafe-math mode).
	// We also try to create v2f32 min/max nodes, which we later widen to v4f32.			// We also try to create v2f32 min/max nodes, which we later widen to v4f32.
	▲ Show 20 Lines • Show All 140 Lines • ▼ Show 20 Lines
	break;			break;
	}			}
	}			}

	if (Opcode)			if (Opcode)
	return DAG.getNode(Opcode, DL, N->getValueType(0), LHS, RHS);			return DAG.getNode(Opcode, DL, N->getValueType(0), LHS, RHS);
	}			}

				if (SDValue V = combineSelectVxi64(N, DAG, Subtarget))
				return V;

	// Some mask scalar intrinsics rely on checking if only one bit is set			// Some mask scalar intrinsics rely on checking if only one bit is set
	// and implement it in C code like this:			// and implement it in C code like this:
	// A[0] = (U & 1) ? A[0] : W[0];			// A[0] = (U & 1) ? A[0] : W[0];
	// This creates some redundant instructions that break pattern matching.			// This creates some redundant instructions that break pattern matching.
	// fold (select (setcc (and (X, 1), 0, seteq), Y, Z)) -> select(and(X, 1),Z,Y)			// fold (select (setcc (and (X, 1), 0, seteq), Y, Z)) -> select(and(X, 1),Z,Y)
	if (Subtarget.hasAVX512() && N->getOpcode() == ISD::SELECT &&			if (Subtarget.hasAVX512() && N->getOpcode() == ISD::SELECT &&
	Cond.getOpcode() == ISD::SETCC && (VT == MVT::f32 || VT == MVT::f64)) {			Cond.getOpcode() == ISD::SETCC && (VT == MVT::f32 || VT == MVT::f64)) {
	ISD::CondCode CC = cast<CondCodeSDNode>(Cond.getOperand(2))->get();			ISD::CondCode CC = cast<CondCodeSDNode>(Cond.getOperand(2))->get();
	▲ Show 20 Lines • Show All 11,717 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[X86][DAGISel] Don't widen shuffle element with AVX512
Closed, Public
