This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support
ClosedPublic

Authored by spatel on Jun 26 2019, 5:26 AM.

Details

Reviewers

efriedma
MatzeB
aemerson
evandro
RKSimon
craig.topper

Commits

rG8cefc37be5ab: [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast…

Summary

This moves the X86-specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (e.g. extract_subvector(v2i64 bitcast(v16i8))). This allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine (note that none of the regressions are due to removing it).

We see a number of regressions, which is why I made it x86-specific in the first place:

AArch64's poor handling of shuffle(bitcast(),bitcast()) cases - AArch64ISD::DUPLANE cases in particular as well as "extract high" patterns.

The merge-store.ll change is interesting - AArch64TargetLowering::allowsMisalignedMemoryAccesses permits 'fast' unaligned v2i64 stores in all cases which is being exposed by removing the bitcast.

Most likely we need to address these issues before this patch can proceed - I'm happy to help if someone can suggest what is the most critical to fix first.
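The index arithmetic behind the new `(DestNumElts % SrcNumElts) == 0` branch in the DAGCombiner.cpp diff can be sketched in isolation. This is a minimal standalone model, not LLVM code: the `ExtractOfBitcast` and `ScaledExtract` structs and the `scaleLittleToBig` function are hypothetical names introduced for illustration, and the sketch omits the `TLI.isOperationLegalOrCustom` check that the real combine also performs.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the values the combine inspects; not an LLVM type.
struct ExtractOfBitcast {
  unsigned SrcNumElts;  // elements in X, the operand of the bitcast
  unsigned DestNumElts; // elements in (bitcast X)
  unsigned ExtNumElts;  // elements in the extract_subvector result
  unsigned Index;       // constant extraction index, counted in Dest elements
};

// Hypothetical result: the extract to perform directly on X.
struct ScaledExtract {
  unsigned NumElts; // elements to extract from X
  unsigned Index;   // extraction index, counted in Src elements
};

// Models: extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')
// for the case where the bitcast result has more elements than X.
std::optional<ScaledExtract> scaleLittleToBig(const ExtractOfBitcast &E) {
  assert(E.SrcNumElts != 0 && E.DestNumElts != 0 && "vectors must be non-empty");
  if (E.DestNumElts % E.SrcNumElts != 0)
    return std::nullopt;
  unsigned DestSrcRatio = E.DestNumElts / E.SrcNumElts;
  // Both the extracted width and the start index must land on a boundary
  // of the wider source elements, otherwise the rewrite is not possible.
  if (E.ExtNumElts % DestSrcRatio != 0 || E.Index % DestSrcRatio != 0)
    return std::nullopt;
  return ScaledExtract{E.ExtNumElts / DestSrcRatio, E.Index / DestSrcRatio};
}
```

For instance, with a 2-element source bitcast to 16 elements, extracting 8 elements at index 8 gives a ratio of 8 and becomes a 1-element extract at index 1, followed by a bitcast back to the original result type. An index of 4 would not be divisible by the ratio, so the sketch returns no result, mirroring the guard in the diff.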

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

RKSimon created this revision. Jun 26 2019, 5:26 AM

Herald added a project: Restricted Project. Jun 26 2019, 5:26 AM

Herald added subscribers: kristof.beyls, javed.absar.

It looks like the arm64-neon-2velem.ll regressions are a shuffle lowering issue, yes; we're creating a DUPLANE where the operand is an extract_subvector, and it doesn't simplify.

The failure to match fcvtl2 probably is just a matter of fixing the pattern in the .td file: it explicitly checks for an operation where the operand is an extract_subvector.

Not sure what the right resolution is for allowsMisalignedMemoryAccesses; we currently only use FeatureSlowMisaligned128Store for certain Exynos targets. Probably not too important.

spatel mentioned this in D71515: [AArch64] match fcvtl2 with bitcasted extract. Dec 14 2019, 8:29 AM

spatel mentioned this in rG5e5e99c041e4: [AArch64] match fcvtl2 with bitcasted extract. Dec 18 2019, 5:55 AM

Commandeering to rebase. D71515 / rG5e5e99c041e4 removes one of the problems, but we probably need at least 1 other fix.

Herald added a subscriber: mcrosier. Dec 18 2019, 8:17 AM

Patch updated:
Rebased to current trunk. The previous rev did not apply cleanly anymore. Some AArch64 diffs are gone as expected. There are x86 test diffs now, but these appear neutral or better AFAICT.

Herald added a subscriber: hiraditya. Dec 18 2019, 8:21 AM

spatel mentioned this in rGb99111b3e4ab: [AArch64] add tests for bitcasted DUPLANE; NFC. Dec 18 2019, 9:22 AM

spatel mentioned this in D71672: [AArch64] match splat of bitcasted extract subvector to DUPLANE. Dec 18 2019, 12:49 PM

spatel mentioned this in rG0b38af89e2c0: [AArch64] match splat of bitcasted extract subvector to DUPLANE. Dec 22 2019, 6:02 AM

Patch updated:
Rebased after AArch64 change - rG0b38af89e2c0
I think that was the last known regression that required fixing.

LGTM - thanks for completing this

This revision is now accepted and ready to land. Dec 22 2019, 9:43 PM

Closed by commit rG8cefc37be5ab: [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast… (authored by spatel). Dec 23 2019, 7:25 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	18 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	71 lines
llvm/lib/Target/X86/X86ISelLowering.cpp	26 lines
llvm/test/CodeGen/AArch64/merge-store.ll	20 lines
llvm/test/CodeGen/ARM/combine-vmovdrr.ll	4 lines
llvm/test/CodeGen/X86/avg-mask.ll	68 lines
llvm/test/CodeGen/X86/madd.ll	12 lines
llvm/test/CodeGen/X86/masked_store_trunc_ssat.ll	4 lines
llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll	36 lines

Diff 235144

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Show First 20 Lines • Show All 18,509 Lines • ▼ Show 20 Lines	if ((SrcNumElts % DestNumElts) == 0) {
unsigned IndexValScaled = N->getConstantOperandVal(1) * SrcDestRatio;		unsigned IndexValScaled = N->getConstantOperandVal(1) * SrcDestRatio;
SDLoc DL(N);		SDLoc DL(N);
SDValue NewIndex = DAG.getIntPtrConstant(IndexValScaled, DL);		SDValue NewIndex = DAG.getIntPtrConstant(IndexValScaled, DL);
SDValue NewExtract = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, NewExtVT,		SDValue NewExtract = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, NewExtVT,
V.getOperand(0), NewIndex);		V.getOperand(0), NewIndex);
return DAG.getBitcast(NVT, NewExtract);		return DAG.getBitcast(NVT, NewExtract);
}		}
}		}
// TODO - handle (DestNumElts % SrcNumElts) == 0		if ((DestNumElts % SrcNumElts) == 0) {
		unsigned DestSrcRatio = DestNumElts / SrcNumElts;
		if ((NVT.getVectorNumElements() % DestSrcRatio) == 0) {
		unsigned NewExtNumElts = NVT.getVectorNumElements() / DestSrcRatio;
		EVT NewExtVT = EVT::getVectorVT(*DAG.getContext(),
		SrcVT.getScalarType(), NewExtNumElts);
		if ((N->getConstantOperandVal(1) % DestSrcRatio) == 0 &&
		TLI.isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, NewExtVT)) {
		unsigned IndexValScaled = N->getConstantOperandVal(1) / DestSrcRatio;
		SDLoc DL(N);
		SDValue NewIndex = DAG.getIntPtrConstant(IndexValScaled, DL);
		SDValue NewExtract = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, NewExtVT,
		V.getOperand(0), NewIndex);
		return DAG.getBitcast(NVT, NewExtract);
		}
		}
		}
}		}

// Combine:		// Combine:
// (extract_subvec (concat V1, V2, ...), i)		// (extract_subvec (concat V1, V2, ...), i)
// Into:		// Into:
// Vi if possible		// Vi if possible
// Only operand 0 is checked as 'concat' assumes all inputs of the same		// Only operand 0 is checked as 'concat' assumes all inputs of the same
// type.		// type.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Show First 20 Lines • Show All 612 Lines • ▼ Show 20 Lines	AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setTargetDAGCombine(ISD::FDIV);		setTargetDAGCombine(ISD::FDIV);

setTargetDAGCombine(ISD::INTRINSIC_WO_CHAIN);		setTargetDAGCombine(ISD::INTRINSIC_WO_CHAIN);

setTargetDAGCombine(ISD::ANY_EXTEND);		setTargetDAGCombine(ISD::ANY_EXTEND);
setTargetDAGCombine(ISD::ZERO_EXTEND);		setTargetDAGCombine(ISD::ZERO_EXTEND);
setTargetDAGCombine(ISD::SIGN_EXTEND);		setTargetDAGCombine(ISD::SIGN_EXTEND);
setTargetDAGCombine(ISD::SIGN_EXTEND_INREG);		setTargetDAGCombine(ISD::SIGN_EXTEND_INREG);
setTargetDAGCombine(ISD::BITCAST);
setTargetDAGCombine(ISD::CONCAT_VECTORS);		setTargetDAGCombine(ISD::CONCAT_VECTORS);
setTargetDAGCombine(ISD::STORE);		setTargetDAGCombine(ISD::STORE);
if (Subtarget->supportsAddressTopByteIgnored())		if (Subtarget->supportsAddressTopByteIgnored())
setTargetDAGCombine(ISD::LOAD);		setTargetDAGCombine(ISD::LOAD);

setTargetDAGCombine(ISD::MUL);		setTargetDAGCombine(ISD::MUL);

setTargetDAGCombine(ISD::SELECT);		setTargetDAGCombine(ISD::SELECT);
▲ Show 20 Lines • Show All 9,550 Lines • ▼ Show 20 Lines	if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(N1)) {
if (VT == MVT::i64 && ShiftAmt == 32 &&		if (VT == MVT::i64 && ShiftAmt == 32 &&
DAG.MaskedValueIsZero(N00, APInt::getHighBitsSet(64, 32)))		DAG.MaskedValueIsZero(N00, APInt::getHighBitsSet(64, 32)))
return DAG.getNode(ISD::ROTR, DL, VT, N0, N1);		return DAG.getNode(ISD::ROTR, DL, VT, N0, N1);
}		}
}		}
return SDValue();		return SDValue();
}		}

static SDValue performBitcastCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI,
SelectionDAG &DAG) {
// Wait 'til after everything is legalized to try this. That way we have
// legal vector types and such.
if (DCI.isBeforeLegalizeOps())
return SDValue();

// Remove extraneous bitcasts around an extract_subvector.
// For example,
// (v4i16 (bitconvert
// (extract_subvector (v2i64 (bitconvert (v8i16 ...)), (i64 1)))))
// becomes
// (extract_subvector ((v8i16 ...), (i64 4)))

// Only interested in 64-bit vectors as the ultimate result.
EVT VT = N->getValueType(0);
if (!VT.isVector() || VT.isScalableVector())
return SDValue();
if (VT.getSimpleVT().getSizeInBits() != 64)
return SDValue();
// Is the operand an extract_subvector starting at the beginning or halfway
// point of the vector? A low half may also come through as an
// EXTRACT_SUBREG, so look for that, too.
SDValue Op0 = N->getOperand(0);
if (Op0->getOpcode() != ISD::EXTRACT_SUBVECTOR &&
!(Op0->isMachineOpcode() &&
Op0->getMachineOpcode() == AArch64::EXTRACT_SUBREG))
return SDValue();
uint64_t idx = cast<ConstantSDNode>(Op0->getOperand(1))->getZExtValue();
if (Op0->getOpcode() == ISD::EXTRACT_SUBVECTOR) {
if (Op0->getValueType(0).getVectorNumElements() != idx && idx != 0)
return SDValue();
} else if (Op0->getMachineOpcode() == AArch64::EXTRACT_SUBREG) {
if (idx != AArch64::dsub)
return SDValue();
// The dsub reference is equivalent to a lane zero subvector reference.
idx = 0;
}
// Look through the bitcast of the input to the extract.
if (Op0->getOperand(0)->getOpcode() != ISD::BITCAST)
return SDValue();
SDValue Source = Op0->getOperand(0)->getOperand(0);
// If the source type has twice the number of elements as our destination
// type, we know this is an extract of the high or low half of the vector.
EVT SVT = Source->getValueType(0);
if (!SVT.isVector() ||
SVT.getVectorNumElements() != VT.getVectorNumElements() * 2)
return SDValue();

LLVM_DEBUG(
dbgs() << "aarch64-lower: bitcast extract_subvector simplification\n");

// Create the simplified form to just extract the low or high half of the
// vector directly rather than bothering with the bitcasts.
SDLoc dl(N);
unsigned NumElements = VT.getVectorNumElements();
if (idx) {
SDValue HalfIdx = DAG.getConstant(NumElements, dl, MVT::i64);
return DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT, Source, HalfIdx);
} else {
SDValue SubReg = DAG.getTargetConstant(AArch64::dsub, dl, MVT::i32);
return SDValue(DAG.getMachineNode(TargetOpcode::EXTRACT_SUBREG, dl, VT,
Source, SubReg),
0);
}
}

static SDValue performConcatVectorsCombine(SDNode *N,		static SDValue performConcatVectorsCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI,		TargetLowering::DAGCombinerInfo &DCI,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
SDLoc dl(N);		SDLoc dl(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDValue N0 = N->getOperand(0), N1 = N->getOperand(1);		SDValue N0 = N->getOperand(0), N1 = N->getOperand(1);

// Optimize concat_vectors of truncated vectors, where the intermediate		// Optimize concat_vectors of truncated vectors, where the intermediate
▲ Show 20 Lines • Show All 2,184 Lines • ▼ Show 20 Lines	SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
case ISD::INTRINSIC_WO_CHAIN:		case ISD::INTRINSIC_WO_CHAIN:
return performIntrinsicCombine(N, DCI, Subtarget);		return performIntrinsicCombine(N, DCI, Subtarget);
case ISD::ANY_EXTEND:		case ISD::ANY_EXTEND:
case ISD::ZERO_EXTEND:		case ISD::ZERO_EXTEND:
case ISD::SIGN_EXTEND:		case ISD::SIGN_EXTEND:
return performExtendCombine(N, DCI, DAG);		return performExtendCombine(N, DCI, DAG);
case ISD::SIGN_EXTEND_INREG:		case ISD::SIGN_EXTEND_INREG:
return performSignExtendInRegCombine(N, DCI, DAG);		return performSignExtendInRegCombine(N, DCI, DAG);
case ISD::BITCAST:
return performBitcastCombine(N, DCI, DAG);
case ISD::CONCAT_VECTORS:		case ISD::CONCAT_VECTORS:
return performConcatVectorsCombine(N, DCI, DAG);		return performConcatVectorsCombine(N, DCI, DAG);
case ISD::SELECT:		case ISD::SELECT:
return performSelectCombine(N, DCI);		return performSelectCombine(N, DCI);
case ISD::VSELECT:		case ISD::VSELECT:
return performVSelectCombine(N, DCI.DAG);		return performVSelectCombine(N, DCI.DAG);
case ISD::LOAD:		case ISD::LOAD:
if (performTBISimplification(N->getOperand(1), DCI, DAG))		if (performTBISimplification(N->getOperand(1), DCI, DAG))

llvm/lib/Target/X86/X86ISelLowering.cpp

Show First 20 Lines • Show All 45,097 Lines • ▼ Show 20 Lines	static SDValue combineExtractSubvector(SDNode *N, SelectionDAG &DAG,
// back to this type.		// back to this type.
if (!N->getValueType(0).isSimple())		if (!N->getValueType(0).isSimple())
return SDValue();		return SDValue();

MVT VT = N->getSimpleValueType(0);		MVT VT = N->getSimpleValueType(0);
SDValue InVec = N->getOperand(0);		SDValue InVec = N->getOperand(0);
SDValue InVecBC = peekThroughBitcasts(InVec);		SDValue InVecBC = peekThroughBitcasts(InVec);
EVT InVecVT = InVec.getValueType();		EVT InVecVT = InVec.getValueType();
EVT InVecBCVT = InVecBC.getValueType();
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();

if (Subtarget.hasAVX() && !Subtarget.hasAVX2() &&		if (Subtarget.hasAVX() && !Subtarget.hasAVX2() &&
TLI.isTypeLegal(InVecVT) &&		TLI.isTypeLegal(InVecVT) &&
InVecVT.getSizeInBits() == 256 && InVecBC.getOpcode() == ISD::AND) {		InVecVT.getSizeInBits() == 256 && InVecBC.getOpcode() == ISD::AND) {
auto isConcatenatedNot = [] (SDValue V) {		auto isConcatenatedNot = [] (SDValue V) {
V = peekThroughBitcasts(V);		V = peekThroughBitcasts(V);
if (!isBitwiseNot(V))		if (!isBitwiseNot(V))
Show All 27 Lines	if (ISD::isBuildVectorAllOnes(InVec.getNode())) {
return getOnesVector(VT, DAG, SDLoc(N));		return getOnesVector(VT, DAG, SDLoc(N));
}		}

if (InVec.getOpcode() == ISD::BUILD_VECTOR)		if (InVec.getOpcode() == ISD::BUILD_VECTOR)
return DAG.getBuildVector(		return DAG.getBuildVector(
VT, SDLoc(N),		VT, SDLoc(N),
InVec.getNode()->ops().slice(IdxVal, VT.getVectorNumElements()));		InVec.getNode()->ops().slice(IdxVal, VT.getVectorNumElements()));

// Try to move vector bitcast after extract_subv by scaling extraction index:
// extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')
// TODO: Move this to DAGCombiner::visitEXTRACT_SUBVECTOR
if (InVec != InVecBC && InVecBCVT.isVector()) {
unsigned SrcNumElts = InVecBCVT.getVectorNumElements();
unsigned DestNumElts = InVecVT.getVectorNumElements();
if ((DestNumElts % SrcNumElts) == 0) {
unsigned DestSrcRatio = DestNumElts / SrcNumElts;
if ((VT.getVectorNumElements() % DestSrcRatio) == 0) {
unsigned NewExtNumElts = VT.getVectorNumElements() / DestSrcRatio;
EVT NewExtVT = EVT::getVectorVT(*DAG.getContext(),
InVecBCVT.getScalarType(), NewExtNumElts);
if ((N->getConstantOperandVal(1) % DestSrcRatio) == 0 &&
TLI.isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, NewExtVT)) {
unsigned IndexValScaled = N->getConstantOperandVal(1) / DestSrcRatio;
SDLoc DL(N);
SDValue NewIndex = DAG.getIntPtrConstant(IndexValScaled, DL);
SDValue NewExtract = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, NewExtVT,
InVecBC, NewIndex);
return DAG.getBitcast(VT, NewExtract);
}
}
}
}

// If we are extracting from an insert into a zero vector, replace with a		// If we are extracting from an insert into a zero vector, replace with a
// smaller insert into zero if we don't access less than the original		// smaller insert into zero if we don't access less than the original
// subvector. Don't do this for i1 vectors.		// subvector. Don't do this for i1 vectors.
if (VT.getVectorElementType() != MVT::i1 &&		if (VT.getVectorElementType() != MVT::i1 &&
InVec.getOpcode() == ISD::INSERT_SUBVECTOR && IdxVal == 0 &&		InVec.getOpcode() == ISD::INSERT_SUBVECTOR && IdxVal == 0 &&
InVec.hasOneUse() && isNullConstant(InVec.getOperand(2)) &&		InVec.hasOneUse() && isNullConstant(InVec.getOperand(2)) &&
ISD::isBuildVectorAllZeros(InVec.getOperand(0).getNode()) &&		ISD::isBuildVectorAllZeros(InVec.getOperand(0).getNode()) &&
InVec.getOperand(1).getValueSizeInBits() <= VT.getSizeInBits()) {		InVec.getOperand(1).getValueSizeInBits() <= VT.getSizeInBits()) {

llvm/test/CodeGen/AArch64/merge-store.ll

	Show All 36 Lines

	; PR21711 - Merge vector stores into wider vector stores.			; PR21711 - Merge vector stores into wider vector stores.

	; On Cyclone, the stores should not get merged into a 16-byte store because			; On Cyclone, the stores should not get merged into a 16-byte store because
	; unaligned 16-byte stores are slow. This test would infinite loop when			; unaligned 16-byte stores are slow. This test would infinite loop when
	; the fastness of unaligned accesses was not specified correctly.			; the fastness of unaligned accesses was not specified correctly.

	define void @merge_vec_extract_stores(<4 x float> %v1, <2 x float>* %ptr) {			define void @merge_vec_extract_stores(<4 x float> %v1, <2 x float>* %ptr) {
	; SPLITTING-LABEL: merge_vec_extract_stores:			; CHECK-LABEL: merge_vec_extract_stores:
	; SPLITTING: // %bb.0:			; CHECK: // %bb.0:
	; SPLITTING-NEXT: ext v1.16b, v0.16b, v0.16b, #8			; CHECK-NEXT: stur q0, [x0, #24]
	; SPLITTING-NEXT: str d0, [x0, #24]			; CHECK-NEXT: ret
	; SPLITTING-NEXT: str d1, [x0, #32]
	; SPLITTING-NEXT: ret
	;
	; MISALIGNED-LABEL: merge_vec_extract_stores:
	; MISALIGNED: // %bb.0:
	; MISALIGNED-NEXT: stur q0, [x0, #24]
	; MISALIGNED-NEXT: ret
	%idx0 = getelementptr inbounds <2 x float>, <2 x float>* %ptr, i64 3			%idx0 = getelementptr inbounds <2 x float>, <2 x float>* %ptr, i64 3
	%idx1 = getelementptr inbounds <2 x float>, <2 x float>* %ptr, i64 4			%idx1 = getelementptr inbounds <2 x float>, <2 x float>* %ptr, i64 4

	%shuffle0 = shufflevector <4 x float> %v1, <4 x float> undef, <2 x i32> <i32 0, i32 1>			%shuffle0 = shufflevector <4 x float> %v1, <4 x float> undef, <2 x i32> <i32 0, i32 1>
	%shuffle1 = shufflevector <4 x float> %v1, <4 x float> undef, <2 x i32> <i32 2, i32 3>			%shuffle1 = shufflevector <4 x float> %v1, <4 x float> undef, <2 x i32> <i32 2, i32 3>

	store <2 x float> %shuffle0, <2 x float>* %idx0, align 8			store <2 x float> %shuffle0, <2 x float>* %idx0, align 8
	store <2 x float> %shuffle1, <2 x float>* %idx1, align 8			store <2 x float> %shuffle1, <2 x float>* %idx1, align 8
	ret void			ret void


	; FIXME: Ideally we would like to use a generic target for this test, but this relies
	; on suppressing store pairs.

	}			}

llvm/test/CodeGen/ARM/combine-vmovdrr.ll

	; RUN: llc %s -o - | FileCheck %s			; RUN: llc %s -o - | FileCheck %s

	target triple = "thumbv7s-apple-ios"			target triple = "thumbv7s-apple-ios"

	declare <8 x i8> @llvm.arm.neon.vtbl2(<8 x i8> %shuffle.i.i307, <8 x i8> %shuffle.i27.i308, <8 x i8> %vtbl2.i25.i)			declare <8 x i8> @llvm.arm.neon.vtbl2(<8 x i8> %shuffle.i.i307, <8 x i8> %shuffle.i27.i308, <8 x i8> %vtbl2.i25.i)

	; Check that we get the motivating example:			; Check that we get the motivating example:
	; The bitcasts force the values to go through the GPRs, whereas			; The bitcasts force the values to go through the GPRs, whereas
	; they are defined on VPRs and used on VPRs.			; they are defined on VPRs and used on VPRs.
	;			;
	; CHECK-LABEL: motivatingExample:			; CHECK-LABEL: motivatingExample:
	; CHECK: vldr [[ARG2_VAL:d[0-9]+]], [r1]			; CHECK: vld1.32 {[[ARG1_VALlo:d[0-9]+]], [[ARG1_VALhi:d[0-9]+]]}, [r0]
	; CHECK-NEXT: vld1.32 {[[ARG1_VALlo:d[0-9]+]], [[ARG1_VALhi:d[0-9]+]]}, [r0]			; CHECK-NEXT: vldr [[ARG2_VAL:d[0-9]+]], [r1]
	; CHECK-NEXT: vtbl.8 [[RES:d[0-9]+]], {[[ARG1_VALlo]], [[ARG1_VALhi]]}, [[ARG2_VAL]]			; CHECK-NEXT: vtbl.8 [[RES:d[0-9]+]], {[[ARG1_VALlo]], [[ARG1_VALhi]]}, [[ARG2_VAL]]
	; CHECK-NEXT: vstr [[RES]], [r1]			; CHECK-NEXT: vstr [[RES]], [r1]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	define void @motivatingExample(<2 x i64>* %addr, <8 x i8>* %addr2) {			define void @motivatingExample(<2 x i64>* %addr, <8 x i8>* %addr2) {
	%shuffle.i.bc.i309 = load <2 x i64>, <2 x i64>* %addr			%shuffle.i.bc.i309 = load <2 x i64>, <2 x i64>* %addr
	%vtbl2.i25.i = load <8 x i8>, <8 x i8>* %addr2			%vtbl2.i25.i = load <8 x i8>, <8 x i8>* %addr2
	%shuffle.i.extract.i310 = extractelement <2 x i64> %shuffle.i.bc.i309, i32 0			%shuffle.i.extract.i310 = extractelement <2 x i64> %shuffle.i.bc.i309, i32 0
	%shuffle.i27.extract.i311 = extractelement <2 x i64> %shuffle.i.bc.i309, i32 1			%shuffle.i27.extract.i311 = extractelement <2 x i64> %shuffle.i.bc.i309, i32 1

llvm/test/CodeGen/X86/avg-mask.ll

	Show First 20 Lines • Show All 124 Lines • ▼ Show 20 Lines
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm3			; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm3
	; AVX512F-NEXT: movq %rdi, %rax			; AVX512F-NEXT: movq %rdi, %rax
	; AVX512F-NEXT: movl %edi, %ecx			; AVX512F-NEXT: movl %edi, %ecx
	; AVX512F-NEXT: kmovw %edi, %k1			; AVX512F-NEXT: kmovw %edi, %k1
	; AVX512F-NEXT: shrq $32, %rdi			; AVX512F-NEXT: shrq $32, %rdi
	; AVX512F-NEXT: shrq $48, %rax			; AVX512F-NEXT: shrq $48, %rax
	; AVX512F-NEXT: shrl $16, %ecx			; AVX512F-NEXT: shrl $16, %ecx
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm4			; AVX512F-NEXT: vpavgb %ymm1, %ymm0, %ymm4
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm5			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm1
	; AVX512F-NEXT: vpavgb %ymm4, %ymm5, %ymm4			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512F-NEXT: vpavgb %ymm1, %ymm0, %ymm0			; AVX512F-NEXT: vpavgb %ymm1, %ymm0, %ymm0
	; AVX512F-NEXT: kmovw %ecx, %k2			; AVX512F-NEXT: kmovw %ecx, %k2
	; AVX512F-NEXT: kmovw %eax, %k3			; AVX512F-NEXT: kmovw %eax, %k3
	; AVX512F-NEXT: kmovw %edi, %k4			; AVX512F-NEXT: kmovw %edi, %k4
	; AVX512F-NEXT: vpternlogd $255, %zmm1, %zmm1, %zmm1 {%k4} {z}			; AVX512F-NEXT: vpternlogd $255, %zmm1, %zmm1, %zmm1 {%k4} {z}
	; AVX512F-NEXT: vpmovdb %zmm1, %xmm1			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vpternlogd $255, %zmm5, %zmm5, %zmm5 {%k3} {z}			; AVX512F-NEXT: vpternlogd $255, %zmm5, %zmm5, %zmm5 {%k3} {z}
	; AVX512F-NEXT: vpmovdb %zmm5, %xmm5			; AVX512F-NEXT: vpmovdb %zmm5, %xmm5
	; AVX512F-NEXT: vinserti128 $1, %xmm5, %ymm1, %ymm1			; AVX512F-NEXT: vinserti128 $1, %xmm5, %ymm1, %ymm1
	; AVX512F-NEXT: vpblendvb %ymm1, %ymm4, %ymm3, %ymm1			; AVX512F-NEXT: vpblendvb %ymm1, %ymm0, %ymm3, %ymm0
	; AVX512F-NEXT: vpternlogd $255, %zmm3, %zmm3, %zmm3 {%k1} {z}			; AVX512F-NEXT: vpternlogd $255, %zmm1, %zmm1, %zmm1 {%k1} {z}
				; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
				; AVX512F-NEXT: vpternlogd $255, %zmm3, %zmm3, %zmm3 {%k2} {z}
	; AVX512F-NEXT: vpmovdb %zmm3, %xmm3			; AVX512F-NEXT: vpmovdb %zmm3, %xmm3
	; AVX512F-NEXT: vpternlogd $255, %zmm4, %zmm4, %zmm4 {%k2} {z}			; AVX512F-NEXT: vinserti128 $1, %xmm3, %ymm1, %ymm1
	; AVX512F-NEXT: vpmovdb %zmm4, %xmm4			; AVX512F-NEXT: vpblendvb %ymm1, %ymm4, %ymm2, %ymm1
	; AVX512F-NEXT: vinserti128 $1, %xmm4, %ymm3, %ymm3			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0
	; AVX512F-NEXT: vpblendvb %ymm3, %ymm0, %ymm2, %ymm0
	; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: avg_v64i8_mask:			; AVX512BWVL-LABEL: avg_v64i8_mask:
	; AVX512BWVL: # %bb.0:			; AVX512BWVL: # %bb.0:
	; AVX512BWVL-NEXT: kmovq %rdi, %k1			; AVX512BWVL-NEXT: kmovq %rdi, %k1
	; AVX512BWVL-NEXT: vpavgb %zmm1, %zmm0, %zmm2 {%k1}			; AVX512BWVL-NEXT: vpavgb %zmm1, %zmm0, %zmm2 {%k1}
	; AVX512BWVL-NEXT: vmovdqa64 %zmm2, %zmm0			; AVX512BWVL-NEXT: vmovdqa64 %zmm2, %zmm0
	; AVX512BWVL-NEXT: retq			; AVX512BWVL-NEXT: retq
	Show All 12 Lines
	; AVX512F-LABEL: avg_v64i8_maskz:			; AVX512F-LABEL: avg_v64i8_maskz:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: movq %rdi, %rax			; AVX512F-NEXT: movq %rdi, %rax
	; AVX512F-NEXT: movl %edi, %ecx			; AVX512F-NEXT: movl %edi, %ecx
	; AVX512F-NEXT: kmovw %edi, %k1			; AVX512F-NEXT: kmovw %edi, %k1
	; AVX512F-NEXT: shrq $32, %rdi			; AVX512F-NEXT: shrq $32, %rdi
	; AVX512F-NEXT: shrq $48, %rax			; AVX512F-NEXT: shrq $48, %rax
	; AVX512F-NEXT: shrl $16, %ecx			; AVX512F-NEXT: shrl $16, %ecx
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; AVX512F-NEXT: vpavgb %ymm1, %ymm0, %ymm2
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm3			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm1
	; AVX512F-NEXT: vpavgb %ymm2, %ymm3, %ymm2			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512F-NEXT: vpavgb %ymm1, %ymm0, %ymm0			; AVX512F-NEXT: vpavgb %ymm1, %ymm0, %ymm0
	; AVX512F-NEXT: kmovw %ecx, %k2			; AVX512F-NEXT: kmovw %ecx, %k2
	; AVX512F-NEXT: kmovw %eax, %k3			; AVX512F-NEXT: kmovw %eax, %k3
	; AVX512F-NEXT: kmovw %edi, %k4			; AVX512F-NEXT: kmovw %edi, %k4
	; AVX512F-NEXT: vpternlogd $255, %zmm1, %zmm1, %zmm1 {%k4} {z}			; AVX512F-NEXT: vpternlogd $255, %zmm1, %zmm1, %zmm1 {%k4} {z}
	; AVX512F-NEXT: vpmovdb %zmm1, %xmm1			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vpternlogd $255, %zmm3, %zmm3, %zmm3 {%k3} {z}			; AVX512F-NEXT: vpternlogd $255, %zmm3, %zmm3, %zmm3 {%k3} {z}
	; AVX512F-NEXT: vpmovdb %zmm3, %xmm3			; AVX512F-NEXT: vpmovdb %zmm3, %xmm3
	; AVX512F-NEXT: vinserti128 $1, %xmm3, %ymm1, %ymm1			; AVX512F-NEXT: vinserti128 $1, %xmm3, %ymm1, %ymm1
	; AVX512F-NEXT: vpand %ymm2, %ymm1, %ymm1			; AVX512F-NEXT: vpand %ymm0, %ymm1, %ymm0
	; AVX512F-NEXT: vpternlogd $255, %zmm2, %zmm2, %zmm2 {%k1} {z}			; AVX512F-NEXT: vpternlogd $255, %zmm1, %zmm1, %zmm1 {%k1} {z}
	; AVX512F-NEXT: vpmovdb %zmm2, %xmm2			; AVX512F-NEXT: vpmovdb %zmm1, %xmm1
	; AVX512F-NEXT: vpternlogd $255, %zmm3, %zmm3, %zmm3 {%k2} {z}			; AVX512F-NEXT: vpternlogd $255, %zmm3, %zmm3, %zmm3 {%k2} {z}
	; AVX512F-NEXT: vpmovdb %zmm3, %xmm3			; AVX512F-NEXT: vpmovdb %zmm3, %xmm3
	; AVX512F-NEXT: vinserti128 $1, %xmm3, %ymm2, %ymm2			; AVX512F-NEXT: vinserti128 $1, %xmm3, %ymm1, %ymm1
	; AVX512F-NEXT: vpand %ymm0, %ymm2, %ymm0			; AVX512F-NEXT: vpand %ymm2, %ymm1, %ymm1
	; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: avg_v64i8_maskz:			; AVX512BWVL-LABEL: avg_v64i8_maskz:
	; AVX512BWVL: # %bb.0:			; AVX512BWVL: # %bb.0:
	; AVX512BWVL-NEXT: kmovq %rdi, %k1			; AVX512BWVL-NEXT: kmovq %rdi, %k1
	; AVX512BWVL-NEXT: vpavgb %zmm1, %zmm0, %zmm0 {%k1} {z}			; AVX512BWVL-NEXT: vpavgb %zmm1, %zmm0, %zmm0 {%k1} {z}
	; AVX512BWVL-NEXT: retq			; AVX512BWVL-NEXT: retq
	%za = zext <64 x i8> %a to <64 x i16>			%za = zext <64 x i8> %a to <64 x i16>
	▲ Show 20 Lines • Show All 116 Lines • ▼ Show 20 Lines
	}			}

	define <32 x i16> @avg_v32i16_mask(<32 x i16> %a, <32 x i16> %b, <32 x i16> %src, i32 %mask) nounwind {			define <32 x i16> @avg_v32i16_mask(<32 x i16> %a, <32 x i16> %b, <32 x i16> %src, i32 %mask) nounwind {
	; AVX512F-LABEL: avg_v32i16_mask:			; AVX512F-LABEL: avg_v32i16_mask:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm3			; AVX512F-NEXT: vextracti64x4 $1, %zmm2, %ymm3
	; AVX512F-NEXT: kmovw %edi, %k1			; AVX512F-NEXT: kmovw %edi, %k1
	; AVX512F-NEXT: shrl $16, %edi			; AVX512F-NEXT: shrl $16, %edi
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm4			; AVX512F-NEXT: vpavgw %ymm1, %ymm0, %ymm4
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm5			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm1
	; AVX512F-NEXT: vpavgw %ymm4, %ymm5, %ymm4			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512F-NEXT: vpavgw %ymm1, %ymm0, %ymm0			; AVX512F-NEXT: vpavgw %ymm1, %ymm0, %ymm0
	; AVX512F-NEXT: kmovw %edi, %k2			; AVX512F-NEXT: kmovw %edi, %k2
	; AVX512F-NEXT: vpternlogd $255, %zmm1, %zmm1, %zmm1 {%k2} {z}			; AVX512F-NEXT: vpternlogd $255, %zmm1, %zmm1, %zmm1 {%k2} {z}
	; AVX512F-NEXT: vpmovdw %zmm1, %ymm1			; AVX512F-NEXT: vpmovdw %zmm1, %ymm1
	; AVX512F-NEXT: vpblendvb %ymm1, %ymm4, %ymm3, %ymm1			; AVX512F-NEXT: vpblendvb %ymm1, %ymm0, %ymm3, %ymm0
	; AVX512F-NEXT: vpternlogd $255, %zmm3, %zmm3, %zmm3 {%k1} {z}			; AVX512F-NEXT: vpternlogd $255, %zmm1, %zmm1, %zmm1 {%k1} {z}
	; AVX512F-NEXT: vpmovdw %zmm3, %ymm3			; AVX512F-NEXT: vpmovdw %zmm1, %ymm1
	; AVX512F-NEXT: vpblendvb %ymm3, %ymm0, %ymm2, %ymm0			; AVX512F-NEXT: vpblendvb %ymm1, %ymm4, %ymm2, %ymm1
	; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: avg_v32i16_mask:			; AVX512BWVL-LABEL: avg_v32i16_mask:
	; AVX512BWVL: # %bb.0:			; AVX512BWVL: # %bb.0:
	; AVX512BWVL-NEXT: kmovd %edi, %k1			; AVX512BWVL-NEXT: kmovd %edi, %k1
	; AVX512BWVL-NEXT: vpavgw %zmm1, %zmm0, %zmm2 {%k1}			; AVX512BWVL-NEXT: vpavgw %zmm1, %zmm0, %zmm2 {%k1}
	; AVX512BWVL-NEXT: vmovdqa64 %zmm2, %zmm0			; AVX512BWVL-NEXT: vmovdqa64 %zmm2, %zmm0
	; AVX512BWVL-NEXT: retq			; AVX512BWVL-NEXT: retq
	%za = zext <32 x i16> %a to <32 x i32>			%za = zext <32 x i16> %a to <32 x i32>
	%zb = zext <32 x i16> %b to <32 x i32>			%zb = zext <32 x i16> %b to <32 x i32>
	%add = add nuw nsw <32 x i32> %za, %zb			%add = add nuw nsw <32 x i32> %za, %zb
	%add1 = add nuw nsw <32 x i32> %add, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			%add1 = add nuw nsw <32 x i32> %add, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	%lshr = lshr <32 x i32> %add1, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			%lshr = lshr <32 x i32> %add1, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	%trunc = trunc <32 x i32> %lshr to <32 x i16>			%trunc = trunc <32 x i32> %lshr to <32 x i16>
	%mask1 = bitcast i32 %mask to <32 x i1>			%mask1 = bitcast i32 %mask to <32 x i1>
	%res = select <32 x i1> %mask1, <32 x i16> %trunc, <32 x i16> %src			%res = select <32 x i1> %mask1, <32 x i16> %trunc, <32 x i16> %src
	ret <32 x i16> %res			ret <32 x i16> %res
	}			}

	define <32 x i16> @avg_v32i16_maskz(<32 x i16> %a, <32 x i16> %b, i32 %mask) nounwind {			define <32 x i16> @avg_v32i16_maskz(<32 x i16> %a, <32 x i16> %b, i32 %mask) nounwind {
	; AVX512F-LABEL: avg_v32i16_maskz:			; AVX512F-LABEL: avg_v32i16_maskz:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: kmovw %edi, %k1			; AVX512F-NEXT: kmovw %edi, %k1
	; AVX512F-NEXT: shrl $16, %edi			; AVX512F-NEXT: shrl $16, %edi
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; AVX512F-NEXT: vpavgw %ymm1, %ymm0, %ymm2
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm3			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm1
	; AVX512F-NEXT: vpavgw %ymm2, %ymm3, %ymm2			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512F-NEXT: vpavgw %ymm1, %ymm0, %ymm0			; AVX512F-NEXT: vpavgw %ymm1, %ymm0, %ymm0
	; AVX512F-NEXT: kmovw %edi, %k2			; AVX512F-NEXT: kmovw %edi, %k2
	; AVX512F-NEXT: vpternlogd $255, %zmm1, %zmm1, %zmm1 {%k2} {z}			; AVX512F-NEXT: vpternlogd $255, %zmm1, %zmm1, %zmm1 {%k2} {z}
	; AVX512F-NEXT: vpmovdw %zmm1, %ymm1			; AVX512F-NEXT: vpmovdw %zmm1, %ymm1
				; AVX512F-NEXT: vpand %ymm0, %ymm1, %ymm0
				; AVX512F-NEXT: vpternlogd $255, %zmm1, %zmm1, %zmm1 {%k1} {z}
				; AVX512F-NEXT: vpmovdw %zmm1, %ymm1
	; AVX512F-NEXT: vpand %ymm2, %ymm1, %ymm1			; AVX512F-NEXT: vpand %ymm2, %ymm1, %ymm1
	; AVX512F-NEXT: vpternlogd $255, %zmm2, %zmm2, %zmm2 {%k1} {z}			; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm0
	; AVX512F-NEXT: vpmovdw %zmm2, %ymm2
	; AVX512F-NEXT: vpand %ymm0, %ymm2, %ymm0
	; AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BWVL-LABEL: avg_v32i16_maskz:			; AVX512BWVL-LABEL: avg_v32i16_maskz:
	; AVX512BWVL: # %bb.0:			; AVX512BWVL: # %bb.0:
	; AVX512BWVL-NEXT: kmovd %edi, %k1			; AVX512BWVL-NEXT: kmovd %edi, %k1
	; AVX512BWVL-NEXT: vpavgw %zmm1, %zmm0, %zmm0 {%k1} {z}			; AVX512BWVL-NEXT: vpavgw %zmm1, %zmm0, %zmm0 {%k1} {z}
	; AVX512BWVL-NEXT: retq			; AVX512BWVL-NEXT: retq
	%za = zext <32 x i16> %a to <32 x i32>			%za = zext <32 x i16> %a to <32 x i32>

llvm/test/CodeGen/X86/madd.ll

	; AVX2-LABEL: pmaddwd_32:			; AVX2-LABEL: pmaddwd_32:
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: vpmaddwd %ymm2, %ymm0, %ymm0			; AVX2-NEXT: vpmaddwd %ymm2, %ymm0, %ymm0
	; AVX2-NEXT: vpmaddwd %ymm3, %ymm1, %ymm1			; AVX2-NEXT: vpmaddwd %ymm3, %ymm1, %ymm1
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512F-LABEL: pmaddwd_32:			; AVX512F-LABEL: pmaddwd_32:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm2
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm3			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm3
	; AVX512F-NEXT: vpmaddwd %ymm2, %ymm3, %ymm2			; AVX512F-NEXT: vpmaddwd %ymm3, %ymm2, %ymm2
	; AVX512F-NEXT: vpmaddwd %ymm1, %ymm0, %ymm0			; AVX512F-NEXT: vpmaddwd %ymm1, %ymm0, %ymm0
	; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: pmaddwd_32:			; AVX512BW-LABEL: pmaddwd_32:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpmaddwd %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpmaddwd %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	; AVX2-LABEL: jumbled_indices16:			; AVX2-LABEL: jumbled_indices16:
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: vpmaddwd %ymm2, %ymm0, %ymm0			; AVX2-NEXT: vpmaddwd %ymm2, %ymm0, %ymm0
	; AVX2-NEXT: vpmaddwd %ymm3, %ymm1, %ymm1			; AVX2-NEXT: vpmaddwd %ymm3, %ymm1, %ymm1
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512F-LABEL: jumbled_indices16:			; AVX512F-LABEL: jumbled_indices16:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm2			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm2
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm3			; AVX512F-NEXT: vextracti64x4 $1, %zmm1, %ymm3
	; AVX512F-NEXT: vpmaddwd %ymm2, %ymm3, %ymm2			; AVX512F-NEXT: vpmaddwd %ymm3, %ymm2, %ymm2
	; AVX512F-NEXT: vpmaddwd %ymm1, %ymm0, %ymm0			; AVX512F-NEXT: vpmaddwd %ymm1, %ymm0, %ymm0
	; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0			; AVX512F-NEXT: vinserti64x4 $1, %ymm2, %zmm0, %zmm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512BW-LABEL: jumbled_indices16:			; AVX512BW-LABEL: jumbled_indices16:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vpmaddwd %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpmaddwd %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq

llvm/test/CodeGen/X86/masked_store_trunc_ssat.ll

	; AVX2-NEXT: je .LBB15_64			; AVX2-NEXT: je .LBB15_64
	; AVX2-NEXT: .LBB15_63: # %cond.store61			; AVX2-NEXT: .LBB15_63: # %cond.store61
	; AVX2-NEXT: vpextrb $15, %xmm0, 31(%rdi)			; AVX2-NEXT: vpextrb $15, %xmm0, 31(%rdi)
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512F-LABEL: truncstore_v32i16_v32i8:			; AVX512F-LABEL: truncstore_v32i16_v32i8:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
				; AVX512F-NEXT: vpxor %xmm2, %xmm2, %xmm2
				; AVX512F-NEXT: vpcmpeqb %ymm2, %ymm1, %ymm1
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm2			; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm2
	; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX512F-NEXT: vpcmpeqb %ymm3, %ymm1, %ymm1
	; AVX512F-NEXT: vpacksswb %ymm2, %ymm0, %ymm0			; AVX512F-NEXT: vpacksswb %ymm2, %ymm0, %ymm0
	; AVX512F-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3]			; AVX512F-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,1,3]
	; AVX512F-NEXT: vpmovmskb %ymm1, %eax			; AVX512F-NEXT: vpmovmskb %ymm1, %eax
	; AVX512F-NEXT: notl %eax			; AVX512F-NEXT: notl %eax
	; AVX512F-NEXT: testb $1, %al			; AVX512F-NEXT: testb $1, %al
	; AVX512F-NEXT: jne .LBB15_1			; AVX512F-NEXT: jne .LBB15_1
	; AVX512F-NEXT: # %bb.2: # %else			; AVX512F-NEXT: # %bb.2: # %else
	; AVX512F-NEXT: testb $2, %al			; AVX512F-NEXT: testb $2, %al

llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll

	; AVX512VBMIVL-NEXT: retq			; AVX512VBMIVL-NEXT: retq
	%res = shufflevector <64 x i8> %x, <64 x i8> %x, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>			%res = shufflevector <64 x i8> %x, <64 x i8> %x, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>
	ret <16 x i8> %res			ret <16 x i8> %res
	}			}

	define <16 x i8> @trunc_shuffle_v64i8_01_05_09_13_17_21_25_29_33_37_41_45_49_53_57_62(<64 x i8> %x) {			define <16 x i8> @trunc_shuffle_v64i8_01_05_09_13_17_21_25_29_33_37_41_45_49_53_57_62(<64 x i8> %x) {
	; AVX512F-LABEL: trunc_shuffle_v64i8_01_05_09_13_17_21_25_29_33_37_41_45_49_53_57_62:			; AVX512F-LABEL: trunc_shuffle_v64i8_01_05_09_13_17_21_25_29_33_37_41_45_49_53_57_62:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512F-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512F-NEXT: vmovdqa {{.*#+}} xmm2 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u>
				; AVX512F-NEXT: vpshufb %xmm2, %xmm1, %xmm1
				; AVX512F-NEXT: vpshufb %xmm2, %xmm0, %xmm2
				; AVX512F-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]
				; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512F-NEXT: vextracti128 $1, %ymm0, %xmm2			; AVX512F-NEXT: vextracti128 $1, %ymm0, %xmm2
	; AVX512F-NEXT: vmovdqa {{.*#+}} xmm3 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u>
	; AVX512F-NEXT: vpshufb %xmm3, %xmm2, %xmm2
	; AVX512F-NEXT: vpshufb %xmm3, %xmm0, %xmm0
	; AVX512F-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
	; AVX512F-NEXT: vextracti128 $1, %ymm1, %xmm2
	; AVX512F-NEXT: vpshufb {{.*#+}} xmm2 = xmm2[u,u,u,u,1,5,9,14,u,u,u,u,u,u,u,u]			; AVX512F-NEXT: vpshufb {{.*#+}} xmm2 = xmm2[u,u,u,u,1,5,9,14,u,u,u,u,u,u,u,u]
	; AVX512F-NEXT: vpshufb {{.*#+}} xmm1 = xmm1[u,u,u,u,1,5,9,13,u,u,u,u,u,u,u,u]			; AVX512F-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[u,u,u,u,1,5,9,13,u,u,u,u,u,u,u,u]
	; AVX512F-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]			; AVX512F-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
	; AVX512F-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]			; AVX512F-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]
	; AVX512F-NEXT: vzeroupper			; AVX512F-NEXT: vzeroupper
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: trunc_shuffle_v64i8_01_05_09_13_17_21_25_29_33_37_41_45_49_53_57_62:			; AVX512VL-LABEL: trunc_shuffle_v64i8_01_05_09_13_17_21_25_29_33_37_41_45_49_53_57_62:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm1			; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
				; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm2 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u>
				; AVX512VL-NEXT: vpshufb %xmm2, %xmm1, %xmm1
				; AVX512VL-NEXT: vpshufb %xmm2, %xmm0, %xmm2
				; AVX512VL-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]
				; AVX512VL-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm2			; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm2
	; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm3 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u>
	; AVX512VL-NEXT: vpshufb %xmm3, %xmm2, %xmm2
	; AVX512VL-NEXT: vpshufb %xmm3, %xmm0, %xmm0
	; AVX512VL-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
	; AVX512VL-NEXT: vextracti128 $1, %ymm1, %xmm2
	; AVX512VL-NEXT: vpshufb {{.*#+}} xmm2 = xmm2[u,u,u,u,1,5,9,14,u,u,u,u,u,u,u,u]			; AVX512VL-NEXT: vpshufb {{.*#+}} xmm2 = xmm2[u,u,u,u,1,5,9,14,u,u,u,u,u,u,u,u]
	; AVX512VL-NEXT: vpshufb {{.*#+}} xmm1 = xmm1[u,u,u,u,1,5,9,13,u,u,u,u,u,u,u,u]			; AVX512VL-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[u,u,u,u,1,5,9,13,u,u,u,u,u,u,u,u]
	; AVX512VL-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]			; AVX512VL-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
	; AVX512VL-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]			; AVX512VL-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]
	; AVX512VL-NEXT: vzeroupper			; AVX512VL-NEXT: vzeroupper
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; AVX512BW-LABEL: trunc_shuffle_v64i8_01_05_09_13_17_21_25_29_33_37_41_45_49_53_57_62:			; AVX512BW-LABEL: trunc_shuffle_v64i8_01_05_09_13_17_21_25_29_33_37_41_45_49_53_57_62:
	; AVX512BW: # %bb.0:			; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm2 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u>			; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm2 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u>
	; AVX512BW-NEXT: vpshufb %xmm2, %xmm1, %xmm1			; AVX512BW-NEXT: vpshufb %xmm2, %xmm1, %xmm1