This is an archive of the discontinued LLVM Phabricator instance.


Differential D130339

[CodeGen] Generate efficient assembly for freeze(poison) version of `mm*_cast*` Intel intrinsics
ClosedPublic

Authored by aqjune on Jul 22 2022, 2:48 AM.


Details

Reviewers

craig.topper
RKSimon
spatel
fhahn
nikic
pengfei

Commits

rG02e56e253302: [CodeGen] Generate efficient assembly for freeze(poison) version of `mm*_cast*`…

Summary

This patch makes the variants of the mm*_cast* Intel intrinsics that use shufflevector(freeze(poison), ..) emit efficient assembly.
(These intrinsics are planned to use shufflevector(freeze(poison), ..) after shufflevector's semantics update; relevant thread: D103874)

To do so, this patch:

- Updates LowerAVXCONCAT_VECTORS in X86ISelLowering.cpp to recognize a FREEZE(UNDEF) operand of CONCAT_VECTORS in addition to UNDEF.
- Updates X86InstrVecCompiler.td to recognize an insert_subvector whose first operand is a FREEZE(UNDEF) vector.
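
As a rough illustration of the first item, the following self-contained C++ sketch models how the lowering could classify the CONCAT_VECTORS operands and pick the vector that subvectors are inserted into. `Node`, `Kind`, `Base`, and `chooseBase` are hypothetical stand-ins and not LLVM APIs; only the decision order (zero vector first, then FREEZE(UNDEF) with a single use, then plain UNDEF) follows the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an SDNode; it carries only what the sketch needs.
enum class Kind { Undef, Freeze, Zero, Other };

struct Node {
  Kind kind;
  const Node *operand; // meaningful only for Kind::Freeze
  unsigned uses;       // number of users of this value

  Node(Kind k, const Node *op = nullptr, unsigned u = 1)
      : kind(k), operand(op), uses(u) {}
};

// Same idea as ISD::isFreezeUndef in the patch: a FREEZE whose operand is UNDEF.
bool isFreezeUndef(const Node &n) {
  return n.kind == Kind::Freeze && n.operand != nullptr &&
         n.operand->kind == Kind::Undef;
}

// Hypothetical result type: which vector the subvectors get inserted into.
enum class Base { Zero, FreezeUndef, Undef };

Base chooseBase(const std::vector<const Node *> &subvecs) {
  unsigned numZero = 0, numFreezeUndef = 0;
  for (const Node *sv : subvecs) {
    assert(sv != nullptr && "every operand must be present");
    if (sv->kind == Kind::Undef)
      continue; // plain undef operands are skipped entirely
    if (isFreezeUndef(*sv) && sv->uses == 1)
      ++numFreezeUndef; // only single-use freeze(undef) is counted
    else if (sv->kind == Kind::Zero)
      ++numZero;
    // Any other operand is a real subvector and does not affect the base.
  }
  if (numZero)
    return Base::Zero;
  return numFreezeUndef ? Base::FreezeUndef : Base::Undef;
}
```

In this model, a concat of a real subvector with a single-use freeze(undef) yields `Base::FreezeUndef`, which is the case the new insert_subvector pattern is meant to match.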

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aqjune created this revision. Jul 22 2022, 2:48 AM

Herald added a project: Restricted Project. Jul 22 2022, 2:48 AM

Herald added subscribers: jsji, StephenFan, pengfei, hiraditya.

aqjune requested review of this revision. Jul 22 2022, 2:48 AM

Herald added a subscriber: llvm-commits.

aqjune added inline comments. Jul 22 2022, 2:50 AM

llvm/lib/Target/X86/X86InstrVecCompiler.td
76	I found that using `insert_subvector (freeze undef)` instead of `insert_subvector (freeze (VT undef))` causes this rule to be silently ignored during instruction selection. What would the possible reasons be? I tried to figure it out, but couldn't.

Thank you for looking at this - it's been on my todo pile for far too long!

RKSimon added a reviewer: pengfei. Jul 22 2022, 2:51 AM

Harbormaster completed remote builds in B176956: Diff 446757. Jul 22 2022, 3:39 AM

aqjune added inline comments. Jul 29 2022, 2:02 AM

llvm/lib/Target/X86/X86InstrVecCompiler.td
76	FWIW, `insert_subvector (VT (freeze undef)) ..` also results in generating `vblendps $15, %ymm0, %ymm0, %ymm0`. It seems `(freeze undef)` isn't being recognized. :/ Anyway, `freeze (VT undef)` works well. :)

RKSimon added inline comments. Jul 29 2022, 8:16 AM

llvm/lib/Target/X86/X86InstrVecCompiler.td
74	Could we add a generic pattern for "undef_or_freeze_undef" that we could use here as I imagine a lot more places are going to need something like this and we should try to avoid duplication if we can?
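
One way to picture the suggested helper: a fragment with several alternatives behaves as if the enclosing pattern were written once per alternative. The C++ sketch below shows that expansion on plain strings. It is only a conceptual model, not how TableGen is implemented; `expandAlternatives` and the `FRAG` placeholder are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Produce one concrete pattern per alternative by substituting the
// placeholder in the template. Only the first occurrence is replaced.
std::vector<std::string>
expandAlternatives(const std::string &tmpl, const std::string &placeholder,
                   const std::vector<std::string> &alternatives) {
  std::string::size_type pos = tmpl.find(placeholder);
  assert(pos != std::string::npos && "placeholder must occur in the template");

  std::vector<std::string> patterns;
  for (const std::string &alt : alternatives) {
    std::string pattern = tmpl;
    pattern.replace(pos, placeholder.size(), alt);
    patterns.push_back(pattern);
  }
  return patterns;
}
```

Calling it with the template `(insert_subvector FRAG, subRC:$src, (iPTR 0))` and the alternatives `undef` and `(freeze undef)` gives the two concrete source patterns that a single use of such a helper would stand for.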

craig.topper added inline comments. Jul 29 2022, 10:30 AM

llvm/include/llvm/Target/TargetSelectionDAG.td
459	Don't use SDTUnaryOp here. Use `SDTypeProfile<1, 1, [SDTCisSameAs<0, 1>]>` SDTUnaryOp doesn't force the input and output to have the same type. I think that prevented `VT` from propagating through the freeze to the undef and that left the undef untyped causing the pattern to be dropped.
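
The hypothesis in this comment can be modelled with a small C++ sketch: a type known on the result of `freeze` reaches its operand only if the operator carries a same-type constraint, and a pattern with an untyped node is treated as unusable. This is a toy under stated assumptions, not TableGen's actual inference code; `Pat`, `propagate`, `fullyTyped`, and `makeFreezeUndef` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical miniature pattern tree; an empty type string means "unknown".
struct Pat {
  std::string op;
  std::string type;
  std::unique_ptr<Pat> operand; // null for leaves; one operand is enough here
};

// Push a known result type down to the operand, but only when the operator
// ties result and operand together (the role SDTCisSameAs<0, 1> plays).
void propagate(Pat &p, bool freezeHasSameTypeConstraint) {
  if (!p.operand)
    return;
  bool constrained = p.op == "freeze" && freezeHasSameTypeConstraint;
  if (constrained && !p.type.empty() && p.operand->type.empty())
    p.operand->type = p.type;
  propagate(*p.operand, freezeHasSameTypeConstraint);
}

// A pattern is usable in this model only if every node ended up typed.
bool fullyTyped(const Pat &p) {
  if (p.type.empty())
    return false;
  return !p.operand || fullyTyped(*p.operand);
}

// Build (VT (freeze undef)) with the type given only at the top.
Pat makeFreezeUndef(const std::string &vt) {
  assert(!vt.empty() && "the outer type must be given");
  Pat p;
  p.op = "freeze";
  p.type = vt;
  p.operand.reset(new Pat());
  p.operand->op = "undef";
  return p;
}
```

Without the constraint the inner `undef` stays untyped and `fullyTyped` returns false, which corresponds to the silently dropped rule described earlier in the thread; with the constraint the outer `VT` alone is sufficient.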

craig.topper added inline comments. Jul 29 2022, 10:33 AM

llvm/lib/Target/X86/X86InstrVecCompiler.td
74	With the change I mentioned in TargetSelectionDAG.td you shouldn't need to mention `VT` near freeze or undef. The one at the start of the pattern will be enough.

RKSimon added inline comments. Aug 2 2022, 5:23 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
11497	Is there any reason we couldn't just always use DAG.getFreeze(DAG.getUNDEF(ResVT)) ?

Define undef_or_freeze_undef and use it for the new pattern

aqjune marked 3 inline comments as done. Aug 3 2022, 9:20 AM

aqjune added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
11497	I tried using `DAG.getFreeze(DAG.getUNDEF(ResVT))`, and it needs updates in existing lowering functions to make the following tests pass:
LLVM :: CodeGen/X86/haddsub-undef.ll
LLVM :: CodeGen/X86/oddsubvector.ll
LLVM :: CodeGen/X86/subvector-broadcast.ll
LLVM :: CodeGen/X86/vector-interleaved-load-i16-stride-3.ll
...
It causes insertion of `vinsert*` instructions instead of more efficient ops. I think it is good to keep `DAG.getUNDEF(ResVT)` to avoid regressions.

Harbormaster completed remote builds in B179043: Diff 449682. Aug 3 2022, 10:26 AM

RKSimon added inline comments. Aug 8 2022, 8:45 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
11497	Thanks - I'll take a look at those cases after this patch has gone in - we have some custom vector widening patterns that we'll need to adjust to handle freeze(undef)

LGTM - @craig.topper any more comments?

LGTM

This revision is now accepted and ready to land. Aug 9 2022, 9:07 AM

Closed by commit rG02e56e253302: [CodeGen] Generate efficient assembly for freeze(poison) version of `mm*_cast*`… (authored by aqjune). Aug 10 2022, 9:38 PM

This revision was automatically updated to reflect the committed changes.

aqjune added a commit: rG02e56e253302: [CodeGen] Generate efficient assembly for freeze(poison) version of `mm*_cast*`….

aqjune mentioned this in D103874: [IR] Rename the shufflevector's undef mask to poison. Aug 10 2022, 9:48 PM

aqjune mentioned this in D136737: [Draft] [clang] Add builtin_unspecified_value. Oct 25 2022, 10:37 PM

ManuelJBrito mentioned this in D143287: [Clang][X86] Change X86 cast intrinsics to use __builtin_nondeterministic_value. Feb 24 2023, 5:11 AM

Revision Contents

Path                                                 Size
llvm/include/llvm/CodeGen/SelectionDAGNodes.h        3 lines
llvm/include/llvm/Target/TargetSelectionDAG.td       7 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp       4 lines
llvm/lib/Target/X86/X86ISelLowering.cpp              8 lines
llvm/lib/Target/X86/X86InstrVecCompiler.td           2 lines
llvm/test/CodeGen/X86/avx-intrinsics-fast-isel.ll    3 lines
llvm/test/CodeGen/X86/avx-intrinsics-x86.ll          45 lines
llvm/test/CodeGen/X86/avx512-intrinsics.ll           3 lines
llvm/test/CodeGen/X86/avx512fp16-intrinsics.ll       2 lines

Diff 451715

llvm/include/llvm/CodeGen/SelectionDAGNodes.h

@@ ... @@
 /// Return true if the specified node is a BUILD_VECTOR node of all
 /// ConstantFPSDNode or undef.
 bool isBuildVectorOfConstantFPSDNodes(const SDNode *N);

 /// Return true if the node has at least one operand and all operands of the
 /// specified node are ISD::UNDEF.
 bool allOperandsUndef(const SDNode *N);

+/// Return true if the specified node is FREEZE(UNDEF).
+bool isFreezeUndef(const SDNode *N);
+
 } // end namespace ISD

 //===----------------------------------------------------------------------===//
 /// Unlike LLVM values, Selection DAG nodes may return multiple
 /// values as the result of a computation. Many nodes return multiple values,
 /// from loads (which define a token and a return value) to ADDC (which returns
 /// a result and a carry value), to calls (which may return an arbitrary number
 /// of values).

llvm/include/llvm/Target/TargetSelectionDAG.td

@@ ... @@
 def SDTExtInreg : SDTypeProfile<1, 2, [ // sext_inreg
   SDTCisSameAs<0, 1>, SDTCisInt<0>, SDTCisVT<2, OtherVT>,
   SDTCisVTSmallerThanOp<2, 1>
 ]>;
 def SDTExtInvec : SDTypeProfile<1, 1, [ // sext_invec
   SDTCisInt<0>, SDTCisVec<0>, SDTCisInt<1>, SDTCisVec<1>,
   SDTCisOpSmallerThanOp<1, 0>
 ]>;
+def SDTFreeze : SDTypeProfile<1, 1, [
+  SDTCisSameAs<0, 1>
+]>;

 def SDTSetCC : SDTypeProfile<1, 3, [ // setcc
   SDTCisInt<0>, SDTCisSameAs<1, 2>, SDTCisVT<3, OtherVT>
 ]>;

 def SDTSelect : SDTypeProfile<1, 3, [ // select
   SDTCisInt<1>, SDTCisSameAs<0, 2>, SDTCisSameAs<2, 3>
 ]>;
@@ ... @@
 def ctlz_zero_undef : SDNode<"ISD::CTLZ_ZERO_UNDEF", SDTIntBitCountUnaryOp>;
 def cttz_zero_undef : SDNode<"ISD::CTTZ_ZERO_UNDEF", SDTIntBitCountUnaryOp>;
 def sext : SDNode<"ISD::SIGN_EXTEND", SDTIntExtendOp>;
 def zext : SDNode<"ISD::ZERO_EXTEND", SDTIntExtendOp>;
 def anyext : SDNode<"ISD::ANY_EXTEND" , SDTIntExtendOp>;
 def trunc : SDNode<"ISD::TRUNCATE" , SDTIntTruncOp>;
 def bitconvert : SDNode<"ISD::BITCAST" , SDTUnaryOp>;
 def addrspacecast : SDNode<"ISD::ADDRSPACECAST", SDTUnaryOp>;
+def freeze : SDNode<"ISD::FREEZE" , SDTFreeze>;
 def extractelt : SDNode<"ISD::EXTRACT_VECTOR_ELT", SDTVecExtract>;
 def insertelt : SDNode<"ISD::INSERT_VECTOR_ELT", SDTVecInsert>;

 def vecreduce_add : SDNode<"ISD::VECREDUCE_ADD", SDTVecReduce>;
 def vecreduce_smax : SDNode<"ISD::VECREDUCE_SMAX", SDTVecReduce>;
 def vecreduce_umax : SDNode<"ISD::VECREDUCE_UMAX", SDTVecReduce>;
 def vecreduce_smin : SDNode<"ISD::VECREDUCE_SMIN", SDTVecReduce>;
 def vecreduce_umin : SDNode<"ISD::VECREDUCE_UMIN", SDTVecReduce>;
@@ ... @@ def post_truncstvi8 : PatFrag<(ops node:$val, node:$base, node:$offset),
   let ScalarMemoryVT = i8;
 }
 def post_truncstvi16 : PatFrag<(ops node:$val, node:$base, node:$offset),
                                (post_truncst node:$val, node:$base, node:$offset)> {
   let IsStore = true;
   let ScalarMemoryVT = i16;
 }

+// A helper for matching undef or freeze undef
+def undef_or_freeze_undef : PatFrags<(ops), [(undef), (freeze undef)]>;
+
 // TODO: Split these into volatile and unordered flavors to enable
 // selectively legal optimizations for each. (See D66309)
 def simple_load : PatFrag<(ops node:$ptr),
                           (load node:$ptr), [{
   return cast<LoadSDNode>(N)->isSimple();
 }]>;
 def simple_store : PatFrag<(ops node:$val, node:$ptr),
                            (store node:$val, node:$ptr), [{

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp


@@ ... @@ bool ISD::allOperandsUndef(const SDNode *N) {
   // Return false if the node has no operands.
   // This is "logically inconsistent" with the definition of "all" but
   // is probably the desired behavior.
   if (N->getNumOperands() == 0)
     return false;
   return all_of(N->op_values(), [](SDValue Op) { return Op.isUndef(); });
 }

+bool ISD::isFreezeUndef(const SDNode *N) {
+  return N->getOpcode() == ISD::FREEZE && N->getOperand(0).isUndef();
+}
+
 bool ISD::matchUnaryPredicate(SDValue Op,
                               std::function<bool(ConstantSDNode *)> Match,
                               bool AllowUndefs) {
   // FIXME: Add support for scalar UNDEF cases?
   if (auto *Cst = dyn_cast<ConstantSDNode>(Op))
     return Match(Cst);

   // FIXME: Add support for vector UNDEF cases?

llvm/lib/Target/X86/X86ISelLowering.cpp


@@ ... @@ static SDValue LowerAVXCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG,
                                       const X86Subtarget &Subtarget) {
   SDLoc dl(Op);
   MVT ResVT = Op.getSimpleValueType();

   assert((ResVT.is256BitVector() ||
           ResVT.is512BitVector()) && "Value type must be 256-/512-bit wide");

   unsigned NumOperands = Op.getNumOperands();
+  unsigned NumFreezeUndef = 0;
   unsigned NumZero = 0;
   unsigned NumNonZero = 0;
   unsigned NonZeros = 0;
   for (unsigned i = 0; i != NumOperands; ++i) {
     SDValue SubVec = Op.getOperand(i);
     if (SubVec.isUndef())
       continue;
-    if (ISD::isBuildVectorAllZeros(SubVec.getNode()))
+    if (ISD::isFreezeUndef(SubVec.getNode()) && SubVec.hasOneUse())
+      ++NumFreezeUndef;
+    else if (ISD::isBuildVectorAllZeros(SubVec.getNode()))
       ++NumZero;
     else {
       assert(i < sizeof(NonZeros) * CHAR_BIT); // Ensure the shift is in range.
       NonZeros |= 1 << i;
       ++NumNonZero;
     }
   }

   // If we have more than 2 non-zeros, build each half separately.
   if (NumNonZero > 2) {
     MVT HalfVT = ResVT.getHalfNumVectorElementsVT();
     ArrayRef<SDUse> Ops = Op->ops();
     SDValue Lo = DAG.getNode(ISD::CONCAT_VECTORS, dl, HalfVT,
                              Ops.slice(0, NumOperands/2));
     SDValue Hi = DAG.getNode(ISD::CONCAT_VECTORS, dl, HalfVT,
                              Ops.slice(NumOperands/2));
     return DAG.getNode(ISD::CONCAT_VECTORS, dl, ResVT, Lo, Hi);
   }

   // Otherwise, build it up through insert_subvectors.
   SDValue Vec = NumZero ? getZeroVector(ResVT, Subtarget, DAG, dl)
-                        : DAG.getUNDEF(ResVT);
+                        : (NumFreezeUndef ? DAG.getFreeze(DAG.getUNDEF(ResVT))
+                                          : DAG.getUNDEF(ResVT));

   MVT SubVT = Op.getOperand(0).getSimpleValueType();
   unsigned NumSubElems = SubVT.getVectorNumElements();
   for (unsigned i = 0; i != NumOperands; ++i) {
     if ((NonZeros & (1 << i)) == 0)
       continue;

     Vec = DAG.getNode(ISD::INSERT_SUBVECTOR, dl, ResVT, Vec,

llvm/lib/Target/X86/X86InstrVecCompiler.td

@@ ... @@

 // Patterns for insert_subvector/extract_subvector to/from index=0
 multiclass subvector_subreg_lowering<RegisterClass subRC, ValueType subVT,
                                      RegisterClass RC, ValueType VT,
                                      SubRegIndex subIdx> {
   def : Pat<(subVT (extract_subvector (VT RC:$src), (iPTR 0))),
             (subVT (EXTRACT_SUBREG RC:$src, subIdx))>;

-  def : Pat<(VT (insert_subvector undef, subRC:$src, (iPTR 0))),
+  def : Pat<(VT (insert_subvector undef_or_freeze_undef, subRC:$src, (iPTR 0))),
             (VT (INSERT_SUBREG (IMPLICIT_DEF), subRC:$src, subIdx))>;
 }

 // A 128-bit subvector extract from the first 256-bit vector position is a
 // subregister copy that needs no instruction. Likewise, a 128-bit subvector
 // insert to the first 256-bit vector position is a subregister copy that needs
 // no instruction.
 defm : subvector_subreg_lowering<VR128, v4i32, VR256, v8i32, sub_xmm>;
 defm : subvector_subreg_lowering<VR128, v4f32, VR256, v8f32, sub_xmm>;
 defm : subvector_subreg_lowering<VR128, v2i64, VR256, v4i64, sub_xmm>;
 defm : subvector_subreg_lowering<VR128, v2f64, VR256, v4f64, sub_xmm>;
 defm : subvector_subreg_lowering<VR128, v8i16, VR256, v16i16, sub_xmm>;
 defm : subvector_subreg_lowering<VR128, v16i8, VR256, v32i8, sub_xmm>;

llvm/test/CodeGen/X86/avx-intrinsics-fast-isel.ll

@@ ... @@ ; CHECK-NEXT: ret{{[l|q]}}
   %res = shufflevector <2 x double> %a0, <2 x double> %a0, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
   ret <4 x double> %res
 }

 define <4 x double> @test_mm256_castpd128_pd256_freeze(<2 x double> %a0) nounwind {
 ; CHECK-LABEL: test_mm256_castpd128_pd256_freeze:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
-; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
 ; CHECK-NEXT: ret{{[l|q]}}
   %a1 = freeze <2 x double> poison
   %res = shufflevector <2 x double> %a0, <2 x double> %a1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   ret <4 x double> %res
 }

 define <2 x double> @test_mm256_castpd256_pd128(<4 x double> %a0) nounwind {
 ; CHECK-LABEL: test_mm256_castpd256_pd128:
@@ ... @@ ; CHECK-NEXT: ret{{[l|q]}}
   %res = shufflevector <4 x float> %a0, <4 x float> %a0, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
   ret <8 x float> %res
 }

 define <8 x float> @test_mm256_castps128_ps256_freeze(<4 x float> %a0) nounwind {
 ; CHECK-LABEL: test_mm256_castps128_ps256_freeze:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
-; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
 ; CHECK-NEXT: ret{{[l|q]}}
   %a1 = freeze <4 x float> poison
   %res = shufflevector <4 x float> %a0, <4 x float> %a1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
   ret <8 x float> %res
 }

 define <4 x float> @test_mm256_castps256_ps128(<8 x float> %a0) nounwind {
 ; CHECK-LABEL: test_mm256_castps256_ps128:
@@ ... @@ ; CHECK-NEXT: ret{{[l|q]}}
   %res = shufflevector <2 x i64> %a0, <2 x i64> %a0, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
   ret <4 x i64> %res
 }

 define <4 x i64> @test_mm256_castsi128_si256_freeze(<2 x i64> %a0) nounwind {
 ; CHECK-LABEL: test_mm256_castsi128_si256_freeze:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
-; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
 ; CHECK-NEXT: ret{{[l|q]}}
   %a1 = freeze <2 x i64> poison
   %res = shufflevector <2 x i64> %a0, <2 x i64> %a1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   ret <4 x i64> %res
 }

 define <4 x double> @test_mm256_castsi256_pd(<4 x i64> %a0) nounwind {
 ; CHECK-LABEL: test_mm256_castsi256_pd:

llvm/test/CodeGen/X86/avx-intrinsics-x86.ll

@@ ... @@
 ; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
   %res = call <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> %a0, <2 x i64> %a1, i8 0) ; <<2 x i64>> [#uses=1]
   ret <2 x i64> %res
 }
 declare <2 x i64> @llvm.x86.pclmulqdq(<2 x i64>, <2 x i64>, i8) nounwind readnone


 define <4 x double> @test_mm256_castpd128_pd256_freeze(<2 x double> %a0) nounwind {
-; AVX-LABEL: test_mm256_castpd128_pd256_freeze:
-; AVX: # %bb.0:
-; AVX-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
-; AVX-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0 # encoding: [0xc4,0xe3,0x7d,0x18,0xc0,0x01]
-; AVX-NEXT: ret{{[l|q]}} # encoding: [0xc3]
-;
-; AVX512VL-LABEL: test_mm256_castpd128_pd256_freeze:
-; AVX512VL: # %bb.0:
-; AVX512VL-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
-; AVX512VL-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0 # EVEX TO VEX Compression encoding: [0xc4,0xe3,0x7d,0x18,0xc0,0x01]
-; AVX512VL-NEXT: ret{{[l|q]}} # encoding: [0xc3]
+; CHECK-LABEL: test_mm256_castpd128_pd256_freeze:
+; CHECK: # %bb.0:
+; CHECK-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
+; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
   %a1 = freeze <2 x double> poison
   %res = shufflevector <2 x double> %a0, <2 x double> %a1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   ret <4 x double> %res
 }


 define <8 x float> @test_mm256_castps128_ps256_freeze(<4 x float> %a0) nounwind {
-; AVX-LABEL: test_mm256_castps128_ps256_freeze:
-; AVX: # %bb.0:
-; AVX-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
-; AVX-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0 # encoding: [0xc4,0xe3,0x7d,0x18,0xc0,0x01]
-; AVX-NEXT: ret{{[l|q]}} # encoding: [0xc3]
-;
-; AVX512VL-LABEL: test_mm256_castps128_ps256_freeze:
-; AVX512VL: # %bb.0:
-; AVX512VL-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
-; AVX512VL-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0 # EVEX TO VEX Compression encoding: [0xc4,0xe3,0x7d,0x18,0xc0,0x01]
-; AVX512VL-NEXT: ret{{[l|q]}} # encoding: [0xc3]
+; CHECK-LABEL: test_mm256_castps128_ps256_freeze:
+; CHECK: # %bb.0:
+; CHECK-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
+; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
   %a1 = freeze <4 x float> poison
   %res = shufflevector <4 x float> %a0, <4 x float> %a1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
   ret <8 x float> %res
 }


 define <4 x i64> @test_mm256_castsi128_si256_freeze(<2 x i64> %a0) nounwind {
-; AVX-LABEL: test_mm256_castsi128_si256_freeze:
-; AVX: # %bb.0:
-; AVX-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
-; AVX-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0 # encoding: [0xc4,0xe3,0x7d,0x18,0xc0,0x01]
-; AVX-NEXT: ret{{[l|q]}} # encoding: [0xc3]
-;
-; AVX512VL-LABEL: test_mm256_castsi128_si256_freeze:
-; AVX512VL: # %bb.0:
-; AVX512VL-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
-; AVX512VL-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0 # EVEX TO VEX Compression encoding: [0xc4,0xe3,0x7d,0x18,0xc0,0x01]
-; AVX512VL-NEXT: ret{{[l|q]}} # encoding: [0xc3]
+; CHECK-LABEL: test_mm256_castsi128_si256_freeze:
+; CHECK: # %bb.0:
+; CHECK-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
+; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
   %a1 = freeze <2 x i64> poison
   %res = shufflevector <2 x i64> %a0, <2 x i64> %a1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   ret <4 x i64> %res
 }

llvm/test/CodeGen/X86/avx512-intrinsics.ll


Show First 20 Lines • Show All 7,504 Lines • ▼ Show 20 Lines	; CHECK-NEXT: ret{{[l\|q]}}
ret <8 x double> %res		ret <8 x double> %res
}		}


define <8 x double> @test_mm256_castpd256_pd256_freeze(<4 x double> %a0) nounwind {		define <8 x double> @test_mm256_castpd256_pd256_freeze(<4 x double> %a0) nounwind {
; CHECK-LABEL: test_mm256_castpd256_pd256_freeze:		; CHECK-LABEL: test_mm256_castpd256_pd256_freeze:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0		; CHECK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
; CHECK-NEXT: vinsertf64x4 $1, %ymm0, %zmm0, %zmm0
; CHECK-NEXT: ret{{[l|q]}}		; CHECK-NEXT: ret{{[l|q]}}
%a1 = freeze <4 x double> poison		%a1 = freeze <4 x double> poison
%res = shufflevector <4 x double> %a0, <4 x double> %a1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%res = shufflevector <4 x double> %a0, <4 x double> %a1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x double> %res		ret <8 x double> %res
}		}


define <16 x float> @test_mm256_castps128_ps512_freeze(<4 x float> %a0) nounwind {		define <16 x float> @test_mm256_castps128_ps512_freeze(<4 x float> %a0) nounwind {
[...]	; CHECK-NEXT: ret{{[l|q]}}
ret <16 x float> %res		ret <16 x float> %res
}		}


define <16 x float> @test_mm256_castps256_ps512_freeze(<8 x float> %a0) nounwind {		define <16 x float> @test_mm256_castps256_ps512_freeze(<8 x float> %a0) nounwind {
; CHECK-LABEL: test_mm256_castps256_ps512_freeze:		; CHECK-LABEL: test_mm256_castps256_ps512_freeze:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0		; CHECK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
; CHECK-NEXT: vinsertf64x4 $1, %ymm0, %zmm0, %zmm0
; CHECK-NEXT: ret{{[l|q]}}		; CHECK-NEXT: ret{{[l|q]}}
%a1 = freeze <8 x float> poison		%a1 = freeze <8 x float> poison
%res = shufflevector <8 x float> %a0, <8 x float> %a1, <16x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%res = shufflevector <8 x float> %a0, <8 x float> %a1, <16x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x float> %res		ret <16 x float> %res
}		}


define <8 x i64> @test_mm512_castsi128_si512_freeze(<2 x i64> %a0) nounwind {		define <8 x i64> @test_mm512_castsi128_si512_freeze(<2 x i64> %a0) nounwind {
[...]	; CHECK-NEXT: ret{{[l|q]}}
ret <8 x i64> %res		ret <8 x i64> %res
}		}


define <8 x i64> @test_mm512_castsi256_si512_pd256_freeze(<4 x i64> %a0) nounwind {		define <8 x i64> @test_mm512_castsi256_si512_pd256_freeze(<4 x i64> %a0) nounwind {
; CHECK-LABEL: test_mm512_castsi256_si512_pd256_freeze:		; CHECK-LABEL: test_mm512_castsi256_si512_pd256_freeze:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0		; CHECK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
; CHECK-NEXT: vinsertf64x4 $1, %ymm0, %zmm0, %zmm0
; CHECK-NEXT: ret{{[l|q]}}		; CHECK-NEXT: ret{{[l|q]}}
%a1 = freeze <4 x i64> poison		%a1 = freeze <4 x i64> poison
%res = shufflevector <4 x i64> %a0, <4 x i64> %a1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%res = shufflevector <4 x i64> %a0, <4 x i64> %a1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i64> %res		ret <8 x i64> %res
}		}


define <16 x float> @bad_mask_transition(<8 x double> %a, <8 x double> %b, <8 x double> %c, <8 x double> %d, <16 x float> %e, <16 x float> %f) {		define <16 x float> @bad_mask_transition(<8 x double> %a, <8 x double> %b, <8 x double> %c, <8 x double> %d, <16 x float> %e, <16 x float> %f) {
[...]

llvm/test/CodeGen/X86/avx512fp16-intrinsics.ll

[...]	; CHECK-NEXT: retq
ret <8 x half> %res		ret <8 x half> %res
}		}


define <16 x half> @test_mm256_castph128_ph256_freeze(<8 x half> %a0) nounwind {		define <16 x half> @test_mm256_castph128_ph256_freeze(<8 x half> %a0) nounwind {
; CHECK-LABEL: test_mm256_castph128_ph256_freeze:		; CHECK-LABEL: test_mm256_castph128_ph256_freeze:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0		; CHECK-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%a1 = freeze <8 x half> poison		%a1 = freeze <8 x half> poison
%res = shufflevector <8 x half> %a0, <8 x half> %a1, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%res = shufflevector <8 x half> %a0, <8 x half> %a1, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x half> %res		ret <16 x half> %res
}		}


define <32 x half> @test_mm512_castph128_ph512_freeze(<8 x half> %a0) nounwind {		define <32 x half> @test_mm512_castph128_ph512_freeze(<8 x half> %a0) nounwind {
[...]	; CHECK-NEXT: retq
ret <32 x half> %res		ret <32 x half> %res
}		}


define <32 x half> @test_mm512_castph256_ph512_freeze(<16 x half> %a0) nounwind {		define <32 x half> @test_mm512_castph256_ph512_freeze(<16 x half> %a0) nounwind {
; CHECK-LABEL: test_mm512_castph256_ph512_freeze:		; CHECK-LABEL: test_mm512_castph256_ph512_freeze:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0		; CHECK-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
; CHECK-NEXT: vinsertf64x4 $1, %ymm0, %zmm0, %zmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%a1 = freeze <16 x half> poison		%a1 = freeze <16 x half> poison
%res = shufflevector <16 x half> %a0, <16 x half> %a1, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>		%res = shufflevector <16 x half> %a0, <16 x half> %a1, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
ret <32 x half> %res		ret <32 x half> %res
}		}
