This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] use narrow load to avoid vector extract
Closed, Public

Authored by spatel on May 25 2017, 4:38 PM.

Details

Reviewers

RKSimon
efriedma
delena
zvi
igorb
niravd

Commits

rG33f4a9728741: [DAGCombiner] use narrow load to avoid vector extract
rL304072: [DAGCombiner] use narrow load to avoid vector extract

Summary

If we have (extract_subvector(load wide vector)) with no other users, that can just be (load narrow vector).
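As a rough illustration of why this fold is sound in the little-endian case, here is a standalone sketch in plain C++ (not LLVM code). The names `Vec`, `loadVec`, `extractSubvector`, and `narrowLoadMatchesExtract` are hypothetical and exist only for this example. It assumes lane i of a vector sits at byte offset i * element size from the base address, and compares "wide load, then extract" against "narrow load at an offset".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a vector value: N lanes of type T.
template <typename T, std::size_t N> using Vec = std::array<T, N>;

// Models (load <N x T>) from a byte address.
template <typename T, std::size_t N>
Vec<T, N> loadVec(const unsigned char *Addr) {
  Vec<T, N> V;
  std::memcpy(V.data(), Addr, sizeof(T) * N);
  return V;
}

// Models (extract_subvector Wide, Idx): lanes [Idx, Idx + M) of the wide value.
template <std::size_t M, typename T, std::size_t N>
Vec<T, M> extractSubvector(const Vec<T, N> &Wide, std::size_t Idx) {
  Vec<T, M> Narrow;
  for (std::size_t I = 0; I != M; ++I)
    Narrow[I] = Wide[Idx + I];
  return Narrow;
}

// Returns true if "wide load + extract" equals "narrow load at an offset".
bool narrowLoadMatchesExtract() {
  unsigned char Memory[16];
  for (unsigned I = 0; I != 16; ++I)
    Memory[I] = static_cast<unsigned char>(I * 7 + 3);

  // Extract lanes 4..7 of an <8 x i16> loaded from Memory.
  const std::size_t Idx = 4;
  auto Wide = loadVec<std::uint16_t, 8>(Memory);
  auto Extracted = extractSubvector<4>(Wide, Idx);

  // Byte offset = extract index * store size of one element.
  const std::size_t Offset = Idx * sizeof(std::uint16_t);
  auto Narrow = loadVec<std::uint16_t, 4>(Memory + Offset);
  return Extracted == Narrow;
}
```

Calling `narrowLoadMatchesExtract()` from a test driver should return true. The sketch does not model the one-use or volatility checks, nor the memory-chain update that the actual patch performs.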

I need help to confirm that all of the test diffs are correct. When I saw how many AArch64 tests were changing, I thought something had gone wrong, but on closer inspection, we just delete the '2' from all of those instructions. Hooray for mnemonics that actually make sense!
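To make the "delete the '2'" observation concrete, the following is a simplified model in plain C++ of the widening add pair `uaddw` / `uaddw2`. The function names and the lane-array types are assumptions made for this sketch, not LLVM or ACLE APIs. The "2" form reads the upper eight bytes of a 128-bit operand; once only those eight bytes are loaded into a 64-bit register, the plain form computes the same result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V8i16 = std::array<std::uint16_t, 8>;
using V8i8 = std::array<std::uint8_t, 8>;
using V16i8 = std::array<std::uint8_t, 16>;

// Models uaddw.8h: zero-extend all 8 bytes of a 64-bit operand and add.
V8i16 uaddw(const V8i16 &Acc, const V8i8 &Narrow) {
  V8i16 R;
  for (std::size_t I = 0; I != 8; ++I)
    R[I] = static_cast<std::uint16_t>(Acc[I] + Narrow[I]);
  return R;
}

// Models uaddw2.8h: zero-extend the upper 8 bytes of a 128-bit operand and add.
V8i16 uaddw2(const V8i16 &Acc, const V16i8 &Wide) {
  V8i16 R;
  for (std::size_t I = 0; I != 8; ++I)
    R[I] = static_cast<std::uint16_t>(Acc[I] + Wide[I + 8]);
  return R;
}

// Returns true if the "before" and "after" instruction sequences agree.
bool highHalfFormsAgree() {
  std::uint8_t B[16];
  for (unsigned I = 0; I != 16; ++I)
    B[I] = static_cast<std::uint8_t>(250 - I * 11);
  V8i16 A = {1, 2, 3, 4, 5, 6, 7, 8};

  // Before the patch: load all 16 bytes, then use the high-half ("2") form.
  V16i8 Wide;
  std::memcpy(Wide.data(), B, 16);
  V8i16 Before = uaddw2(A, Wide);

  // After the patch: load only bytes 8..15, then use the plain form.
  V8i8 High;
  std::memcpy(High.data(), B + 8, 8);
  V8i16 After = uaddw(A, High);
  return Before == After;
}
```

The same reasoning would apply to the other pairs touched by the test diffs (for example `sabdl2` / `sabdl`), although this sketch only models the unsigned widening add.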

The memop chain updating is based on code that already exists multiple times in x86, so I think that should be pulled into a helper function as a follow-up. I wouldn't have gotten that sequence on my own.

Background: this is a potential improvement noticed via regressions caused by making x86's peekThroughBitcasts() not loop on consecutive bitcasts (see comments in D33137).

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. May 25 2017, 4:38 PM

Herald added subscribers: javed.absar, mcrosier. May 25 2017, 4:38 PM

niravd added inline comments. May 25 2017, 7:39 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14472 (On Diff #100327): The hasOneUse restriction seems overly conservative for most targets, and you already have the logic to deal with duplicated loads. This seems like a good place for "isSubVectorExtractFree".

spatel added inline comments. May 26 2017, 6:34 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14472 (On Diff #100327): Yes, I agree this is conservative. If it's OK, I'd prefer to make this a TODO comment in this patch and then follow up with that small enhancement. I want to be extra cautious with memop transforms to make sure we don't hit any perf or correctness problems. Just removing the one-use check (without checking extract cost) will result in another 31 regression test files changing, so I need to take a close look at all those diffs.

LGTM modulo minor comment.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14470 (On Diff #100327): You should use BaseIndexOffset for Offset extraction. This may be better deferred to another patch with a TODO as well.
14472 (On Diff #100327): Makes sense to me.

spatel added inline comments. May 26 2017, 6:47 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14470 (On Diff #100327): Ah, right - I forgot about that API. I'm running up my TODO tab, but yes, let me tack on one more. :) FWIW, this part of the code is also based on code from x86 lowering, so there may be more refactoring opportunity.

LGTM

This revision is now accepted and ready to land. May 26 2017, 7:35 AM

Closed by commit rL304072: [DAGCombiner] use narrow load to avoid vector extract (authored by spatel). May 27 2017, 7:07 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D33866: [DAGCombiner] loosen restriction for creating narrow vector load from extract(wide load). Jun 3 2017, 9:10 AM

spatel mentioned this in D54073: [x86] allow vector load narrowing with multi-use values. Nov 3 2018, 10:26 AM

spatel mentioned this in rL346595: [x86] allow vector load narrowing with multi-use values. Nov 10 2018, 12:08 PM

Revision Contents

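Path	Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	50 lines
llvm/trunk/test/CodeGen/AArch64/ (five files, names missing)	24, 12, 24, 12, and 24 lines
llvm/trunk/test/CodeGen/ARM/vcombine.ll	4 lines
llvm/trunk/test/CodeGen/ARM/vext.ll	8 lines
llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics.ll	6 lines
llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v16.ll	3 lines
llvm/trunk/test/CodeGen/X86/vector-shuffle-avx512.ll	34 lines
llvm/trunk/test/CodeGen/X86/widened-broadcast.ll	73 lines
llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll	12 lines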
Diff 100532

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

... (14,549 lines not shown) ...
   SDValue Y = ConcatR ? DAG.getBitcast(NarrowBVT, RHS.getOperand(ConcatOpNum))
                       : DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, NarrowBVT,
                                     BinOp.getOperand(1),
                                     DAG.getConstant(ExtBOIdx, DL, ExtBOIdxVT));

   SDValue NarrowBinOp = DAG.getNode(BOpcode, DL, NarrowBVT, X, Y);
   return DAG.getBitcast(VT, NarrowBinOp);
 }

+/// If we are extracting a subvector from a wide vector load, convert to a
+/// narrow load to eliminate the extraction:
+/// (extract_subvector (load wide vector)) --> (load narrow vector)
+static SDValue narrowExtractedVectorLoad(SDNode *Extract, SelectionDAG &DAG) {
+  // TODO: Add support for big-endian. The offset calculation must be adjusted.
+  if (DAG.getDataLayout().isBigEndian())
+    return SDValue();
+
+  // TODO: The one-use check is overly conservative. Check the cost of the
+  // extract instead or remove that condition entirely.
+  auto *Ld = dyn_cast<LoadSDNode>(Extract->getOperand(0));
+  auto *ExtIdx = dyn_cast<ConstantSDNode>(Extract->getOperand(1));
+  if (!Ld || !Ld->hasOneUse() || Ld->isVolatile() || !ExtIdx)
+    return SDValue();
+
+  // The narrow load will be offset from the base address of the old load if
+  // we are extracting from something besides index 0 (little-endian).
+  EVT VT = Extract->getValueType(0);
+  SDLoc DL(Extract);
+  SDValue BaseAddr = Ld->getOperand(1);
+  unsigned Offset = ExtIdx->getZExtValue() * VT.getScalarType().getStoreSize();
+
+  // TODO: Use "BaseIndexOffset" to make this more effective.
+  SDValue NewAddr = DAG.getMemBasePlusOffset(BaseAddr, Offset, DL);
+  MachineFunction &MF = DAG.getMachineFunction();
+  MachineMemOperand *MMO = MF.getMachineMemOperand(Ld->getMemOperand(), Offset,
+                                                   VT.getStoreSize());
+  SDValue NewLd = DAG.getLoad(VT, DL, Ld->getChain(), NewAddr, MMO);
+
+  // The new load must have the same position as the old load in terms of memory
+  // dependency. Create a TokenFactor for Ld and NewLd and update uses of Ld's
+  // output chain to use that TokenFactor.
+  // TODO: This code is based on a similar sequence in x86 lowering. It should
+  // be moved to a helper function, so it can be shared and reused.
+  if (Ld->hasAnyUseOfValue(1)) {
+    SDValue OldChain = SDValue(Ld, 1);
+    SDValue NewChain = SDValue(NewLd.getNode(), 1);
+    SDValue TokenFactor = DAG.getNode(ISD::TokenFactor, DL, MVT::Other,
+                                      OldChain, NewChain);
+    DAG.ReplaceAllUsesOfValueWith(OldChain, TokenFactor);
+    DAG.UpdateNodeOperands(TokenFactor.getNode(), OldChain, NewChain);
+  }
+
+  return NewLd;
+}
+
 SDValue DAGCombiner::visitEXTRACT_SUBVECTOR(SDNode* N) {
   EVT NVT = N->getValueType(0);
   SDValue V = N->getOperand(0);

   // Extract from UNDEF is UNDEF.
   if (V.isUndef())
     return DAG.getUNDEF(NVT);

+  if (TLI.isOperationLegalOrCustomOrPromote(ISD::LOAD, NVT))
+    if (SDValue NarrowLoad = narrowExtractedVectorLoad(N, DAG))
+      return NarrowLoad;
+
   // Combine:
   // (extract_subvec (concat V1, V2, ...), i)
   // Into:
   // Vi if possible
   // Only operand 0 is checked as 'concat' assumes all inputs of the same
   // type.
   if (V->getOpcode() == ISD::CONCAT_VECTORS &&
       isa<ConstantSDNode>(N->getOperand(1)) &&
... (2,177 lines not shown) ...
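
The chain update in narrowExtractedVectorLoad can be hard to follow, so here is a toy model of it in plain C++. This is not the SelectionDAG API: `Node`, `replaceAllUses`, and `chainUpdateKeepsOrdering` are hypothetical names, and the explanation is my reading of the sequence rather than something stated in the review. The idea it tries to show: replacing all uses of the old load's chain with the TokenFactor also rewrites the TokenFactor's own operand, so its operands have to be reset afterwards, which appears to be the role of the UpdateNodeOperands call.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy node: a name plus the chain values it depends on.
struct Node {
  std::string Name;
  std::vector<Node *> ChainOps;
};

// Toy analogue of replacing all uses: every node that uses From now uses To.
void replaceAllUses(std::vector<Node *> &Graph, Node *From, Node *To) {
  for (Node *N : Graph)
    for (Node *&Op : N->ChainOps)
      if (Op == From)
        Op = To;
}

// Returns true if the later store ends up ordered after both loads.
bool chainUpdateKeepsOrdering() {
  Node Entry{"entry", {}};
  Node OldLd{"wide load", {&Entry}};
  Node Store{"later store", {&OldLd}}; // must stay ordered after the load
  Node NewLd{"narrow load", {&Entry}}; // shares the old load's input chain
  Node TF{"TokenFactor", {&OldLd, &NewLd}};
  std::vector<Node *> Graph = {&Entry, &OldLd, &Store, &NewLd, &TF};

  // Step 1: redirect users of the old load's chain to the TokenFactor.
  // This also rewrites the TokenFactor's own first operand (a self-cycle).
  replaceAllUses(Graph, &OldLd, &TF);
  bool HadSelfCycle = TF.ChainOps[0] == &TF;

  // Step 2: restore the TokenFactor's operands to (old chain, new chain).
  TF.ChainOps = {&OldLd, &NewLd};

  return HadSelfCycle && Store.ChainOps[0] == &TF &&
         TF.ChainOps[0] == &OldLd && TF.ChainOps[1] == &NewLd;
}
```

In this model the store that originally depended on the wide load now depends on the TokenFactor, which in turn depends on both the old and the new load, so neither load can be reordered past the store.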

llvm/trunk/test/CodeGen/AArch64/arm64-vabs.ll

... (27 lines not shown) ...
 ;CHECK: sabdl.2d
   %tmp2 = load <2 x i32>, <2 x i32>* %B
   %tmp3 = call <2 x i32> @llvm.aarch64.neon.sabd.v2i32(<2 x i32> %tmp1, <2 x i32> %tmp2)
   %tmp4 = zext <2 x i32> %tmp3 to <2 x i64>
   ret <2 x i64> %tmp4
 }

 define <8 x i16> @sabdl2_8h(<16 x i8>* %A, <16 x i8>* %B) nounwind {
 ;CHECK-LABEL: sabdl2_8h:
-;CHECK: sabdl2.8h
+;CHECK: sabdl.8h
   %load1 = load <16 x i8>, <16 x i8>* %A
   %load2 = load <16 x i8>, <16 x i8>* %B
   %tmp1 = shufflevector <16 x i8> %load1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   %tmp2 = shufflevector <16 x i8> %load2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   %tmp3 = call <8 x i8> @llvm.aarch64.neon.sabd.v8i8(<8 x i8> %tmp1, <8 x i8> %tmp2)
   %tmp4 = zext <8 x i8> %tmp3 to <8 x i16>
   ret <8 x i16> %tmp4
 }

 define <4 x i32> @sabdl2_4s(<8 x i16>* %A, <8 x i16>* %B) nounwind {
 ;CHECK-LABEL: sabdl2_4s:
-;CHECK: sabdl2.4s
+;CHECK: sabdl.4s
   %load1 = load <8 x i16>, <8 x i16>* %A
   %load2 = load <8 x i16>, <8 x i16>* %B
   %tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %tmp3 = call <4 x i16> @llvm.aarch64.neon.sabd.v4i16(<4 x i16> %tmp1, <4 x i16> %tmp2)
   %tmp4 = zext <4 x i16> %tmp3 to <4 x i32>
   ret <4 x i32> %tmp4
 }

 define <2 x i64> @sabdl2_2d(<4 x i32>* %A, <4 x i32>* %B) nounwind {
 ;CHECK-LABEL: sabdl2_2d:
-;CHECK: sabdl2.2d
+;CHECK: sabdl.2d
   %load1 = load <4 x i32>, <4 x i32>* %A
   %load2 = load <4 x i32>, <4 x i32>* %B
   %tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %tmp3 = call <2 x i32> @llvm.aarch64.neon.sabd.v2i32(<2 x i32> %tmp1, <2 x i32> %tmp2)
   %tmp4 = zext <2 x i32> %tmp3 to <2 x i64>
   ret <2 x i64> %tmp4
 }
... (25 lines not shown) ...
 ;CHECK: uabdl.2d
   %tmp2 = load <2 x i32>, <2 x i32>* %B
   %tmp3 = call <2 x i32> @llvm.aarch64.neon.uabd.v2i32(<2 x i32> %tmp1, <2 x i32> %tmp2)
   %tmp4 = zext <2 x i32> %tmp3 to <2 x i64>
   ret <2 x i64> %tmp4
 }

 define <8 x i16> @uabdl2_8h(<16 x i8>* %A, <16 x i8>* %B) nounwind {
 ;CHECK-LABEL: uabdl2_8h:
-;CHECK: uabdl2.8h
+;CHECK: uabdl.8h
   %load1 = load <16 x i8>, <16 x i8>* %A
   %load2 = load <16 x i8>, <16 x i8>* %B
   %tmp1 = shufflevector <16 x i8> %load1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   %tmp2 = shufflevector <16 x i8> %load2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>

   %tmp3 = call <8 x i8> @llvm.aarch64.neon.uabd.v8i8(<8 x i8> %tmp1, <8 x i8> %tmp2)
   %tmp4 = zext <8 x i8> %tmp3 to <8 x i16>
   ret <8 x i16> %tmp4
 }

 define <4 x i32> @uabdl2_4s(<8 x i16>* %A, <8 x i16>* %B) nounwind {
 ;CHECK-LABEL: uabdl2_4s:
-;CHECK: uabdl2.4s
+;CHECK: uabdl.4s
   %load1 = load <8 x i16>, <8 x i16>* %A
   %load2 = load <8 x i16>, <8 x i16>* %B
   %tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %tmp3 = call <4 x i16> @llvm.aarch64.neon.uabd.v4i16(<4 x i16> %tmp1, <4 x i16> %tmp2)
   %tmp4 = zext <4 x i16> %tmp3 to <4 x i32>
   ret <4 x i32> %tmp4
 }

 define <2 x i64> @uabdl2_2d(<4 x i32>* %A, <4 x i32>* %B) nounwind {
 ;CHECK-LABEL: uabdl2_2d:
-;CHECK: uabdl2.2d
+;CHECK: uabdl.2d
   %load1 = load <4 x i32>, <4 x i32>* %A
   %load2 = load <4 x i32>, <4 x i32>* %B
   %tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %tmp3 = call <2 x i32> @llvm.aarch64.neon.uabd.v2i32(<2 x i32> %tmp1, <2 x i32> %tmp2)
   %tmp4 = zext <2 x i32> %tmp3 to <2 x i64>
   ret <2 x i64> %tmp4
 }
... (420 lines not shown) ...
 ;CHECK: sabal.2d
   %tmp4.1 = zext <2 x i32> %tmp4 to <2 x i64>
   %tmp4.1.1 = zext <2 x i32> %tmp4 to <2 x i64>
   %tmp5 = add <2 x i64> %tmp3, %tmp4.1
   ret <2 x i64> %tmp5
 }

 define <8 x i16> @sabal2_8h(<16 x i8>* %A, <16 x i8>* %B, <8 x i16>* %C) nounwind {
 ;CHECK-LABEL: sabal2_8h:
-;CHECK: sabal2.8h
+;CHECK: sabal.8h
   %load1 = load <16 x i8>, <16 x i8>* %A
   %load2 = load <16 x i8>, <16 x i8>* %B
   %tmp3 = load <8 x i16>, <8 x i16>* %C
   %tmp1 = shufflevector <16 x i8> %load1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   %tmp2 = shufflevector <16 x i8> %load2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   %tmp4 = call <8 x i8> @llvm.aarch64.neon.sabd.v8i8(<8 x i8> %tmp1, <8 x i8> %tmp2)
   %tmp4.1 = zext <8 x i8> %tmp4 to <8 x i16>
   %tmp5 = add <8 x i16> %tmp3, %tmp4.1
   ret <8 x i16> %tmp5
 }

 define <4 x i32> @sabal2_4s(<8 x i16>* %A, <8 x i16>* %B, <4 x i32>* %C) nounwind {
 ;CHECK-LABEL: sabal2_4s:
-;CHECK: sabal2.4s
+;CHECK: sabal.4s
   %load1 = load <8 x i16>, <8 x i16>* %A
   %load2 = load <8 x i16>, <8 x i16>* %B
   %tmp3 = load <4 x i32>, <4 x i32>* %C
   %tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %tmp4 = call <4 x i16> @llvm.aarch64.neon.sabd.v4i16(<4 x i16> %tmp1, <4 x i16> %tmp2)
   %tmp4.1 = zext <4 x i16> %tmp4 to <4 x i32>
   %tmp5 = add <4 x i32> %tmp3, %tmp4.1
   ret <4 x i32> %tmp5
 }

 define <2 x i64> @sabal2_2d(<4 x i32>* %A, <4 x i32>* %B, <2 x i64>* %C) nounwind {
 ;CHECK-LABEL: sabal2_2d:
-;CHECK: sabal2.2d
+;CHECK: sabal.2d
   %load1 = load <4 x i32>, <4 x i32>* %A
   %load2 = load <4 x i32>, <4 x i32>* %B
   %tmp3 = load <2 x i64>, <2 x i64>* %C
   %tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %tmp4 = call <2 x i32> @llvm.aarch64.neon.sabd.v2i32(<2 x i32> %tmp1, <2 x i32> %tmp2)
   %tmp4.1 = zext <2 x i32> %tmp4 to <2 x i64>
   %tmp5 = add <2 x i64> %tmp3, %tmp4.1
... (33 lines not shown) ...
 ;CHECK: uabal.2d
   %tmp4 = call <2 x i32> @llvm.aarch64.neon.uabd.v2i32(<2 x i32> %tmp1, <2 x i32> %tmp2)
   %tmp4.1 = zext <2 x i32> %tmp4 to <2 x i64>
   %tmp5 = add <2 x i64> %tmp3, %tmp4.1
   ret <2 x i64> %tmp5
 }

 define <8 x i16> @uabal2_8h(<16 x i8>* %A, <16 x i8>* %B, <8 x i16>* %C) nounwind {
 ;CHECK-LABEL: uabal2_8h:
-;CHECK: uabal2.8h
+;CHECK: uabal.8h
   %load1 = load <16 x i8>, <16 x i8>* %A
   %load2 = load <16 x i8>, <16 x i8>* %B
   %tmp3 = load <8 x i16>, <8 x i16>* %C
   %tmp1 = shufflevector <16 x i8> %load1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   %tmp2 = shufflevector <16 x i8> %load2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   %tmp4 = call <8 x i8> @llvm.aarch64.neon.uabd.v8i8(<8 x i8> %tmp1, <8 x i8> %tmp2)
   %tmp4.1 = zext <8 x i8> %tmp4 to <8 x i16>
   %tmp5 = add <8 x i16> %tmp3, %tmp4.1
   ret <8 x i16> %tmp5
 }

 define <4 x i32> @uabal2_4s(<8 x i16>* %A, <8 x i16>* %B, <4 x i32>* %C) nounwind {
 ;CHECK-LABEL: uabal2_4s:
-;CHECK: uabal2.4s
+;CHECK: uabal.4s
   %load1 = load <8 x i16>, <8 x i16>* %A
   %load2 = load <8 x i16>, <8 x i16>* %B
   %tmp3 = load <4 x i32>, <4 x i32>* %C
   %tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %tmp4 = call <4 x i16> @llvm.aarch64.neon.uabd.v4i16(<4 x i16> %tmp1, <4 x i16> %tmp2)
   %tmp4.1 = zext <4 x i16> %tmp4 to <4 x i32>
   %tmp5 = add <4 x i32> %tmp3, %tmp4.1
   ret <4 x i32> %tmp5
 }

 define <2 x i64> @uabal2_2d(<4 x i32>* %A, <4 x i32>* %B, <2 x i64>* %C) nounwind {
 ;CHECK-LABEL: uabal2_2d:
-;CHECK: uabal2.2d
+;CHECK: uabal.2d
   %load1 = load <4 x i32>, <4 x i32>* %A
   %load2 = load <4 x i32>, <4 x i32>* %B
   %tmp3 = load <2 x i64>, <2 x i64>* %C
   %tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %tmp4 = call <2 x i32> @llvm.aarch64.neon.uabd.v2i32(<2 x i32> %tmp1, <2 x i32> %tmp2)
   %tmp4.1 = zext <2 x i32> %tmp4 to <2 x i64>
   %tmp5 = add <2 x i64> %tmp3, %tmp4.1
... (250 lines not shown) ...

llvm/trunk/test/CodeGen/AArch64/arm64-vadd.ll

... (312 lines not shown) ...
 ;CHECK: uaddw.2d
   %tmp2 = load <2 x i32>, <2 x i32>* %B
   %tmp3 = zext <2 x i32> %tmp2 to <2 x i64>
   %tmp4 = add <2 x i64> %tmp1, %tmp3
   ret <2 x i64> %tmp4
 }

 define <8 x i16> @uaddw2_8h(<8 x i16>* %A, <16 x i8>* %B) nounwind {
 ;CHECK-LABEL: uaddw2_8h:
-;CHECK: uaddw2.8h
+;CHECK: uaddw.8h
   %tmp1 = load <8 x i16>, <8 x i16>* %A

   %tmp2 = load <16 x i8>, <16 x i8>* %B
   %high2 = shufflevector <16 x i8> %tmp2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   %ext2 = zext <8 x i8> %high2 to <8 x i16>

   %res = add <8 x i16> %tmp1, %ext2
   ret <8 x i16> %res
 }

 define <4 x i32> @uaddw2_4s(<4 x i32>* %A, <8 x i16>* %B) nounwind {
 ;CHECK-LABEL: uaddw2_4s:
-;CHECK: uaddw2.4s
+;CHECK: uaddw.4s
   %tmp1 = load <4 x i32>, <4 x i32>* %A

   %tmp2 = load <8 x i16>, <8 x i16>* %B
   %high2 = shufflevector <8 x i16> %tmp2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %ext2 = zext <4 x i16> %high2 to <4 x i32>

   %res = add <4 x i32> %tmp1, %ext2
   ret <4 x i32> %res
 }

 define <2 x i64> @uaddw2_2d(<2 x i64>* %A, <4 x i32>* %B) nounwind {
 ;CHECK-LABEL: uaddw2_2d:
-;CHECK: uaddw2.2d
+;CHECK: uaddw.2d
   %tmp1 = load <2 x i64>, <2 x i64>* %A

   %tmp2 = load <4 x i32>, <4 x i32>* %B
   %high2 = shufflevector <4 x i32> %tmp2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %ext2 = zext <2 x i32> %high2 to <2 x i64>

   %res = add <2 x i64> %tmp1, %ext2
   ret <2 x i64> %res
... (26 lines not shown) ...
 ;CHECK: saddw.2d
   %tmp2 = load <2 x i32>, <2 x i32>* %B
   %tmp3 = sext <2 x i32> %tmp2 to <2 x i64>
   %tmp4 = add <2 x i64> %tmp1, %tmp3
   ret <2 x i64> %tmp4
 }

 define <8 x i16> @saddw2_8h(<8 x i16>* %A, <16 x i8>* %B) nounwind {
 ;CHECK-LABEL: saddw2_8h:
-;CHECK: saddw2.8h
+;CHECK: saddw.8h
   %tmp1 = load <8 x i16>, <8 x i16>* %A

   %tmp2 = load <16 x i8>, <16 x i8>* %B
   %high2 = shufflevector <16 x i8> %tmp2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   %ext2 = sext <8 x i8> %high2 to <8 x i16>

   %res = add <8 x i16> %tmp1, %ext2
   ret <8 x i16> %res
 }

 define <4 x i32> @saddw2_4s(<4 x i32>* %A, <8 x i16>* %B) nounwind {
 ;CHECK-LABEL: saddw2_4s:
-;CHECK: saddw2.4s
+;CHECK: saddw.4s
   %tmp1 = load <4 x i32>, <4 x i32>* %A

   %tmp2 = load <8 x i16>, <8 x i16>* %B
   %high2 = shufflevector <8 x i16> %tmp2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %ext2 = sext <4 x i16> %high2 to <4 x i32>

   %res = add <4 x i32> %tmp1, %ext2
   ret <4 x i32> %res
 }

 define <2 x i64> @saddw2_2d(<2 x i64>* %A, <4 x i32>* %B) nounwind {
 ;CHECK-LABEL: saddw2_2d:
-;CHECK: saddw2.2d
+;CHECK: saddw.2d
   %tmp1 = load <2 x i64>, <2 x i64>* %A

   %tmp2 = load <4 x i32>, <4 x i32>* %B
   %high2 = shufflevector <4 x i32> %tmp2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %ext2 = sext <2 x i32> %high2 to <2 x i64>

   %res = add <2 x i64> %tmp1, %ext2
   ret <2 x i64> %res
... (517 lines not shown) ...

llvm/trunk/test/CodeGen/AArch64/arm64-vmul.ll

... (77 lines not shown) ...
 ;CHECK: sqdmull.2d
   %tmp1 = load <2 x i32>, <2 x i32>* %A
   %tmp2 = load <2 x i32>, <2 x i32>* %B
   %tmp3 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)
   ret <2 x i64> %tmp3
 }

 define <4 x i32> @sqdmull2_4s(<8 x i16>* %A, <8 x i16>* %B) nounwind {
 ;CHECK-LABEL: sqdmull2_4s:
-;CHECK: sqdmull2.4s
+;CHECK: sqdmull.4s
   %load1 = load <8 x i16>, <8 x i16>* %A
   %load2 = load <8 x i16>, <8 x i16>* %B
   %tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %tmp3 = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> %tmp1, <4 x i16> %tmp2)
   ret <4 x i32> %tmp3
 }

 define <2 x i64> @sqdmull2_2d(<4 x i32>* %A, <4 x i32>* %B) nounwind {
 ;CHECK-LABEL: sqdmull2_2d:
-;CHECK: sqdmull2.2d
+;CHECK: sqdmull.2d
   %load1 = load <4 x i32>, <4 x i32>* %A
   %load2 = load <4 x i32>, <4 x i32>* %B
   %tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %tmp3 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)
   ret <2 x i64> %tmp3
 }

... (213 lines not shown) ...
 ;CHECK: sqdmlal.2d
   %tmp3 = load <2 x i64>, <2 x i64>* %C
   %tmp4 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)
   %tmp5 = call <2 x i64> @llvm.aarch64.neon.sqadd.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp4)
   ret <2 x i64> %tmp5
 }

 define <4 x i32> @sqdmlal2_4s(<8 x i16>* %A, <8 x i16>* %B, <4 x i32>* %C) nounwind {
 ;CHECK-LABEL: sqdmlal2_4s:
-;CHECK: sqdmlal2.4s
+;CHECK: sqdmlal.4s
   %load1 = load <8 x i16>, <8 x i16>* %A
   %load2 = load <8 x i16>, <8 x i16>* %B
   %tmp3 = load <4 x i32>, <4 x i32>* %C
   %tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
   %tmp4 = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> %tmp1, <4 x i16> %tmp2)
   %tmp5 = call <4 x i32> @llvm.aarch64.neon.sqadd.v4i32(<4 x i32> %tmp3, <4 x i32> %tmp4)
   ret <4 x i32> %tmp5
 }

 define <2 x i64> @sqdmlal2_2d(<4 x i32>* %A, <4 x i32>* %B, <2 x i64>* %C) nounwind {
 ;CHECK-LABEL: sqdmlal2_2d:
-;CHECK: sqdmlal2.2d
+;CHECK: sqdmlal.2d
   %load1 = load <4 x i32>, <4 x i32>* %A
   %load2 = load <4 x i32>, <4 x i32>* %B
   %tmp3 = load <2 x i64>, <2 x i64>* %C
%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%tmp4 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)		%tmp4 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)
%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqadd.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp4)		%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqadd.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp4)
ret <2 x i64> %tmp5		ret <2 x i64> %tmp5
[... 18 lines not shown ...]
;CHECK: sqdmlsl.2d
%tmp3 = load <2 x i64>, <2 x i64>* %C		%tmp3 = load <2 x i64>, <2 x i64>* %C
%tmp4 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)		%tmp4 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)
%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqsub.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp4)		%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqsub.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp4)
ret <2 x i64> %tmp5		ret <2 x i64> %tmp5
}		}

define <4 x i32> @sqdmlsl2_4s(<8 x i16>* %A, <8 x i16>* %B, <4 x i32>* %C) nounwind {		define <4 x i32> @sqdmlsl2_4s(<8 x i16>* %A, <8 x i16>* %B, <4 x i32>* %C) nounwind {
;CHECK-LABEL: sqdmlsl2_4s:		;CHECK-LABEL: sqdmlsl2_4s:
;CHECK: sqdmlsl2.4s		;CHECK: sqdmlsl.4s
%load1 = load <8 x i16>, <8 x i16>* %A		%load1 = load <8 x i16>, <8 x i16>* %A
%load2 = load <8 x i16>, <8 x i16>* %B		%load2 = load <8 x i16>, <8 x i16>* %B
%tmp3 = load <4 x i32>, <4 x i32>* %C		%tmp3 = load <4 x i32>, <4 x i32>* %C
%tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%tmp4 = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> %tmp1, <4 x i16> %tmp2)		%tmp4 = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> %tmp1, <4 x i16> %tmp2)
%tmp5 = call <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32> %tmp3, <4 x i32> %tmp4)		%tmp5 = call <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32> %tmp3, <4 x i32> %tmp4)
ret <4 x i32> %tmp5		ret <4 x i32> %tmp5
}		}

define <2 x i64> @sqdmlsl2_2d(<4 x i32>* %A, <4 x i32>* %B, <2 x i64>* %C) nounwind {		define <2 x i64> @sqdmlsl2_2d(<4 x i32>* %A, <4 x i32>* %B, <2 x i64>* %C) nounwind {
;CHECK-LABEL: sqdmlsl2_2d:		;CHECK-LABEL: sqdmlsl2_2d:
;CHECK: sqdmlsl2.2d		;CHECK: sqdmlsl.2d
%load1 = load <4 x i32>, <4 x i32>* %A		%load1 = load <4 x i32>, <4 x i32>* %A
%load2 = load <4 x i32>, <4 x i32>* %B		%load2 = load <4 x i32>, <4 x i32>* %B
%tmp3 = load <2 x i64>, <2 x i64>* %C		%tmp3 = load <2 x i64>, <2 x i64>* %C
%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%tmp4 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)		%tmp4 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)
%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqsub.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp4)		%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqsub.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp4)
ret <2 x i64> %tmp5		ret <2 x i64> %tmp5
[... 472 lines not shown ...]
;CHECK: sqdmull.2d
%tmp3 = shufflevector <2 x i32> %tmp2, <2 x i32> %tmp2, <2 x i32> <i32 1, i32 1>		%tmp3 = shufflevector <2 x i32> %tmp2, <2 x i32> %tmp2, <2 x i32> <i32 1, i32 1>
%tmp4 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp3)		%tmp4 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp3)
ret <2 x i64> %tmp4		ret <2 x i64> %tmp4
}		}

define <4 x i32> @sqdmull2_lane_4s(<8 x i16>* %A, <8 x i16>* %B) nounwind {		define <4 x i32> @sqdmull2_lane_4s(<8 x i16>* %A, <8 x i16>* %B) nounwind {
;CHECK-LABEL: sqdmull2_lane_4s:		;CHECK-LABEL: sqdmull2_lane_4s:
;CHECK-NOT: dup		;CHECK-NOT: dup
;CHECK: sqdmull2.4s		;CHECK: sqdmull.4s
%load1 = load <8 x i16>, <8 x i16>* %A		%load1 = load <8 x i16>, <8 x i16>* %A
%load2 = load <8 x i16>, <8 x i16>* %B		%load2 = load <8 x i16>, <8 x i16>* %B
%tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>		%tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
%tmp4 = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> %tmp1, <4 x i16> %tmp2)		%tmp4 = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> %tmp1, <4 x i16> %tmp2)
ret <4 x i32> %tmp4		ret <4 x i32> %tmp4
}		}

define <2 x i64> @sqdmull2_lane_2d(<4 x i32>* %A, <4 x i32>* %B) nounwind {		define <2 x i64> @sqdmull2_lane_2d(<4 x i32>* %A, <4 x i32>* %B) nounwind {
;CHECK-LABEL: sqdmull2_lane_2d:		;CHECK-LABEL: sqdmull2_lane_2d:
;CHECK-NOT: dup		;CHECK-NOT: dup
;CHECK: sqdmull2.2d		;CHECK: sqdmull.2d
%load1 = load <4 x i32>, <4 x i32>* %A		%load1 = load <4 x i32>, <4 x i32>* %A
%load2 = load <4 x i32>, <4 x i32>* %B		%load2 = load <4 x i32>, <4 x i32>* %B
%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 1, i32 1>		%tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
%tmp4 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)		%tmp4 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)
ret <2 x i64> %tmp4		ret <2 x i64> %tmp4
}		}

[... 91 lines not shown ...]
;CHECK: sqdmlal.2d
%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp4)		%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp4)
%tmp6 = call <2 x i64> @llvm.aarch64.neon.sqadd.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp5)		%tmp6 = call <2 x i64> @llvm.aarch64.neon.sqadd.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp5)
ret <2 x i64> %tmp6		ret <2 x i64> %tmp6
}		}

define <4 x i32> @sqdmlal2_lane_4s(<8 x i16>* %A, <8 x i16>* %B, <4 x i32>* %C) nounwind {		define <4 x i32> @sqdmlal2_lane_4s(<8 x i16>* %A, <8 x i16>* %B, <4 x i32>* %C) nounwind {
;CHECK-LABEL: sqdmlal2_lane_4s:		;CHECK-LABEL: sqdmlal2_lane_4s:
;CHECK-NOT: dup		;CHECK-NOT: dup
;CHECK: sqdmlal2.4s		;CHECK: sqdmlal.4s
%load1 = load <8 x i16>, <8 x i16>* %A		%load1 = load <8 x i16>, <8 x i16>* %A
%load2 = load <8 x i16>, <8 x i16>* %B		%load2 = load <8 x i16>, <8 x i16>* %B
%tmp3 = load <4 x i32>, <4 x i32>* %C		%tmp3 = load <4 x i32>, <4 x i32>* %C
%tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>		%tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
%tmp5 = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> %tmp1, <4 x i16> %tmp2)		%tmp5 = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> %tmp1, <4 x i16> %tmp2)
%tmp6 = call <4 x i32> @llvm.aarch64.neon.sqadd.v4i32(<4 x i32> %tmp3, <4 x i32> %tmp5)		%tmp6 = call <4 x i32> @llvm.aarch64.neon.sqadd.v4i32(<4 x i32> %tmp3, <4 x i32> %tmp5)
ret <4 x i32> %tmp6		ret <4 x i32> %tmp6
}		}

define <2 x i64> @sqdmlal2_lane_2d(<4 x i32>* %A, <4 x i32>* %B, <2 x i64>* %C) nounwind {		define <2 x i64> @sqdmlal2_lane_2d(<4 x i32>* %A, <4 x i32>* %B, <2 x i64>* %C) nounwind {
;CHECK-LABEL: sqdmlal2_lane_2d:		;CHECK-LABEL: sqdmlal2_lane_2d:
;CHECK-NOT: dup		;CHECK-NOT: dup
;CHECK: sqdmlal2.2d		;CHECK: sqdmlal.2d
%load1 = load <4 x i32>, <4 x i32>* %A		%load1 = load <4 x i32>, <4 x i32>* %A
%load2 = load <4 x i32>, <4 x i32>* %B		%load2 = load <4 x i32>, <4 x i32>* %B
%tmp3 = load <2 x i64>, <2 x i64>* %C		%tmp3 = load <2 x i64>, <2 x i64>* %C
%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 1, i32 1>		%tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)		%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)
%tmp6 = call <2 x i64> @llvm.aarch64.neon.sqadd.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp5)		%tmp6 = call <2 x i64> @llvm.aarch64.neon.sqadd.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp5)
ret <2 x i64> %tmp6		ret <2 x i64> %tmp6
[... 122 lines not shown ...]
;CHECK: sqdmlsl.2d
%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp4)		%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp4)
%tmp6 = call <2 x i64> @llvm.aarch64.neon.sqsub.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp5)		%tmp6 = call <2 x i64> @llvm.aarch64.neon.sqsub.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp5)
ret <2 x i64> %tmp6		ret <2 x i64> %tmp6
}		}

define <4 x i32> @sqdmlsl2_lane_4s(<8 x i16>* %A, <8 x i16>* %B, <4 x i32>* %C) nounwind {		define <4 x i32> @sqdmlsl2_lane_4s(<8 x i16>* %A, <8 x i16>* %B, <4 x i32>* %C) nounwind {
;CHECK-LABEL: sqdmlsl2_lane_4s:		;CHECK-LABEL: sqdmlsl2_lane_4s:
;CHECK-NOT: dup		;CHECK-NOT: dup
;CHECK: sqdmlsl2.4s		;CHECK: sqdmlsl.4s
%load1 = load <8 x i16>, <8 x i16>* %A		%load1 = load <8 x i16>, <8 x i16>* %A
%load2 = load <8 x i16>, <8 x i16>* %B		%load2 = load <8 x i16>, <8 x i16>* %B
%tmp3 = load <4 x i32>, <4 x i32>* %C		%tmp3 = load <4 x i32>, <4 x i32>* %C
%tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>		%tmp2 = shufflevector <8 x i16> %load2, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
%tmp5 = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> %tmp1, <4 x i16> %tmp2)		%tmp5 = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> %tmp1, <4 x i16> %tmp2)
%tmp6 = call <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32> %tmp3, <4 x i32> %tmp5)		%tmp6 = call <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32> %tmp3, <4 x i32> %tmp5)
ret <4 x i32> %tmp6		ret <4 x i32> %tmp6
}		}

define <2 x i64> @sqdmlsl2_lane_2d(<4 x i32>* %A, <4 x i32>* %B, <2 x i64>* %C) nounwind {		define <2 x i64> @sqdmlsl2_lane_2d(<4 x i32>* %A, <4 x i32>* %B, <2 x i64>* %C) nounwind {
;CHECK-LABEL: sqdmlsl2_lane_2d:		;CHECK-LABEL: sqdmlsl2_lane_2d:
;CHECK-NOT: dup		;CHECK-NOT: dup
;CHECK: sqdmlsl2.2d		;CHECK: sqdmlsl.2d
%load1 = load <4 x i32>, <4 x i32>* %A		%load1 = load <4 x i32>, <4 x i32>* %A
%load2 = load <4 x i32>, <4 x i32>* %B		%load2 = load <4 x i32>, <4 x i32>* %B
%tmp3 = load <2 x i64>, <2 x i64>* %C		%tmp3 = load <2 x i64>, <2 x i64>* %C
%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 1, i32 1>		%tmp2 = shufflevector <4 x i32> %load2, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)		%tmp5 = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp1, <2 x i32> %tmp2)
%tmp6 = call <2 x i64> @llvm.aarch64.neon.sqsub.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp5)		%tmp6 = call <2 x i64> @llvm.aarch64.neon.sqsub.v2i64(<2 x i64> %tmp3, <2 x i64> %tmp5)
ret <2 x i64> %tmp6		ret <2 x i64> %tmp6
[... 864 lines not shown ...]

llvm/trunk/test/CodeGen/AArch64/arm64-vshift.ll

[... 1,158 lines not shown ...]
;CHECK: ushll.2d v0, {{v[0-9]+}}, #1
%tmp1 = load <2 x i32>, <2 x i32>* %A		%tmp1 = load <2 x i32>, <2 x i32>* %A
%tmp2 = zext <2 x i32> %tmp1 to <2 x i64>		%tmp2 = zext <2 x i32> %tmp1 to <2 x i64>
%tmp3 = shl <2 x i64> %tmp2, <i64 1, i64 1>		%tmp3 = shl <2 x i64> %tmp2, <i64 1, i64 1>
ret <2 x i64> %tmp3		ret <2 x i64> %tmp3
}		}

define <8 x i16> @ushll2_8h(<16 x i8>* %A) nounwind {		define <8 x i16> @ushll2_8h(<16 x i8>* %A) nounwind {
;CHECK-LABEL: ushll2_8h:		;CHECK-LABEL: ushll2_8h:
;CHECK: ushll2.8h v0, {{v[0-9]+}}, #1		;CHECK: ushll.8h v0, {{v[0-9]+}}, #1
%load1 = load <16 x i8>, <16 x i8>* %A		%load1 = load <16 x i8>, <16 x i8>* %A
%tmp1 = shufflevector <16 x i8> %load1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%tmp1 = shufflevector <16 x i8> %load1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%tmp2 = zext <8 x i8> %tmp1 to <8 x i16>		%tmp2 = zext <8 x i8> %tmp1 to <8 x i16>
%tmp3 = shl <8 x i16> %tmp2, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>		%tmp3 = shl <8 x i16> %tmp2, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
ret <8 x i16> %tmp3		ret <8 x i16> %tmp3
}		}

define <4 x i32> @ushll2_4s(<8 x i16>* %A) nounwind {		define <4 x i32> @ushll2_4s(<8 x i16>* %A) nounwind {
;CHECK-LABEL: ushll2_4s:		;CHECK-LABEL: ushll2_4s:
;CHECK: ushll2.4s v0, {{v[0-9]+}}, #1		;CHECK: ushll.4s v0, {{v[0-9]+}}, #1
%load1 = load <8 x i16>, <8 x i16>* %A		%load1 = load <8 x i16>, <8 x i16>* %A
%tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%tmp2 = zext <4 x i16> %tmp1 to <4 x i32>		%tmp2 = zext <4 x i16> %tmp1 to <4 x i32>
%tmp3 = shl <4 x i32> %tmp2, <i32 1, i32 1, i32 1, i32 1>		%tmp3 = shl <4 x i32> %tmp2, <i32 1, i32 1, i32 1, i32 1>
ret <4 x i32> %tmp3		ret <4 x i32> %tmp3
}		}

define <2 x i64> @ushll2_2d(<4 x i32>* %A) nounwind {		define <2 x i64> @ushll2_2d(<4 x i32>* %A) nounwind {
;CHECK-LABEL: ushll2_2d:		;CHECK-LABEL: ushll2_2d:
;CHECK: ushll2.2d v0, {{v[0-9]+}}, #1		;CHECK: ushll.2d v0, {{v[0-9]+}}, #1
%load1 = load <4 x i32>, <4 x i32>* %A		%load1 = load <4 x i32>, <4 x i32>* %A
%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%tmp2 = zext <2 x i32> %tmp1 to <2 x i64>		%tmp2 = zext <2 x i32> %tmp1 to <2 x i64>
%tmp3 = shl <2 x i64> %tmp2, <i64 1, i64 1>		%tmp3 = shl <2 x i64> %tmp2, <i64 1, i64 1>
ret <2 x i64> %tmp3		ret <2 x i64> %tmp3
}		}

define <8 x i16> @sshll8h(<8 x i8>* %A) nounwind {		define <8 x i16> @sshll8h(<8 x i8>* %A) nounwind {
[... 20 lines not shown ...]
;CHECK: sshll.2d v0, {{v[0-9]+}}, #1
%tmp1 = load <2 x i32>, <2 x i32>* %A		%tmp1 = load <2 x i32>, <2 x i32>* %A
%tmp2 = sext <2 x i32> %tmp1 to <2 x i64>		%tmp2 = sext <2 x i32> %tmp1 to <2 x i64>
%tmp3 = shl <2 x i64> %tmp2, <i64 1, i64 1>		%tmp3 = shl <2 x i64> %tmp2, <i64 1, i64 1>
ret <2 x i64> %tmp3		ret <2 x i64> %tmp3
}		}

define <8 x i16> @sshll2_8h(<16 x i8>* %A) nounwind {		define <8 x i16> @sshll2_8h(<16 x i8>* %A) nounwind {
;CHECK-LABEL: sshll2_8h:		;CHECK-LABEL: sshll2_8h:
;CHECK: sshll2.8h v0, {{v[0-9]+}}, #1		;CHECK: sshll.8h v0, {{v[0-9]+}}, #1
%load1 = load <16 x i8>, <16 x i8>* %A		%load1 = load <16 x i8>, <16 x i8>* %A
%tmp1 = shufflevector <16 x i8> %load1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%tmp1 = shufflevector <16 x i8> %load1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%tmp2 = sext <8 x i8> %tmp1 to <8 x i16>		%tmp2 = sext <8 x i8> %tmp1 to <8 x i16>
%tmp3 = shl <8 x i16> %tmp2, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>		%tmp3 = shl <8 x i16> %tmp2, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
ret <8 x i16> %tmp3		ret <8 x i16> %tmp3
}		}

define <4 x i32> @sshll2_4s(<8 x i16>* %A) nounwind {		define <4 x i32> @sshll2_4s(<8 x i16>* %A) nounwind {
;CHECK-LABEL: sshll2_4s:		;CHECK-LABEL: sshll2_4s:
;CHECK: sshll2.4s v0, {{v[0-9]+}}, #1		;CHECK: sshll.4s v0, {{v[0-9]+}}, #1
%load1 = load <8 x i16>, <8 x i16>* %A		%load1 = load <8 x i16>, <8 x i16>* %A
%tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%tmp1 = shufflevector <8 x i16> %load1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%tmp2 = sext <4 x i16> %tmp1 to <4 x i32>		%tmp2 = sext <4 x i16> %tmp1 to <4 x i32>
%tmp3 = shl <4 x i32> %tmp2, <i32 1, i32 1, i32 1, i32 1>		%tmp3 = shl <4 x i32> %tmp2, <i32 1, i32 1, i32 1, i32 1>
ret <4 x i32> %tmp3		ret <4 x i32> %tmp3
}		}

define <2 x i64> @sshll2_2d(<4 x i32>* %A) nounwind {		define <2 x i64> @sshll2_2d(<4 x i32>* %A) nounwind {
;CHECK-LABEL: sshll2_2d:		;CHECK-LABEL: sshll2_2d:
;CHECK: sshll2.2d v0, {{v[0-9]+}}, #1		;CHECK: sshll.2d v0, {{v[0-9]+}}, #1
%load1 = load <4 x i32>, <4 x i32>* %A		%load1 = load <4 x i32>, <4 x i32>* %A
%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%tmp2 = sext <2 x i32> %tmp1 to <2 x i64>		%tmp2 = sext <2 x i32> %tmp1 to <2 x i64>
%tmp3 = shl <2 x i64> %tmp2, <i64 1, i64 1>		%tmp3 = shl <2 x i64> %tmp2, <i64 1, i64 1>
ret <2 x i64> %tmp3		ret <2 x i64> %tmp3
}		}

define <8 x i8> @sqshli8b(<8 x i8>* %A) nounwind {		define <8 x i8> @sqshli8b(<8 x i8>* %A) nounwind {
[... 674 lines not shown ...]

llvm/trunk/test/CodeGen/AArch64/arm64-vsub.ll

[... 151 lines not shown ...]
;CHECK: ssubl.2d
%tmp3 = sext <2 x i32> %tmp1 to <2 x i64>		%tmp3 = sext <2 x i32> %tmp1 to <2 x i64>
%tmp4 = sext <2 x i32> %tmp2 to <2 x i64>		%tmp4 = sext <2 x i32> %tmp2 to <2 x i64>
%tmp5 = sub <2 x i64> %tmp3, %tmp4		%tmp5 = sub <2 x i64> %tmp3, %tmp4
ret <2 x i64> %tmp5		ret <2 x i64> %tmp5
}		}

define <8 x i16> @ssubl2_8h(<16 x i8>* %A, <16 x i8>* %B) nounwind {		define <8 x i16> @ssubl2_8h(<16 x i8>* %A, <16 x i8>* %B) nounwind {
;CHECK-LABEL: ssubl2_8h:		;CHECK-LABEL: ssubl2_8h:
;CHECK: ssubl2.8h		;CHECK: ssubl.8h
%tmp1 = load <16 x i8>, <16 x i8>* %A		%tmp1 = load <16 x i8>, <16 x i8>* %A
%high1 = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%high1 = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%ext1 = sext <8 x i8> %high1 to <8 x i16>		%ext1 = sext <8 x i8> %high1 to <8 x i16>

%tmp2 = load <16 x i8>, <16 x i8>* %B		%tmp2 = load <16 x i8>, <16 x i8>* %B
%high2 = shufflevector <16 x i8> %tmp2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%high2 = shufflevector <16 x i8> %tmp2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%ext2 = sext <8 x i8> %high2 to <8 x i16>		%ext2 = sext <8 x i8> %high2 to <8 x i16>

%res = sub <8 x i16> %ext1, %ext2		%res = sub <8 x i16> %ext1, %ext2
ret <8 x i16> %res		ret <8 x i16> %res
}		}

define <4 x i32> @ssubl2_4s(<8 x i16>* %A, <8 x i16>* %B) nounwind {		define <4 x i32> @ssubl2_4s(<8 x i16>* %A, <8 x i16>* %B) nounwind {
;CHECK-LABEL: ssubl2_4s:		;CHECK-LABEL: ssubl2_4s:
;CHECK: ssubl2.4s		;CHECK: ssubl.4s
%tmp1 = load <8 x i16>, <8 x i16>* %A		%tmp1 = load <8 x i16>, <8 x i16>* %A
%high1 = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%high1 = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%ext1 = sext <4 x i16> %high1 to <4 x i32>		%ext1 = sext <4 x i16> %high1 to <4 x i32>

%tmp2 = load <8 x i16>, <8 x i16>* %B		%tmp2 = load <8 x i16>, <8 x i16>* %B
%high2 = shufflevector <8 x i16> %tmp2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%high2 = shufflevector <8 x i16> %tmp2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%ext2 = sext <4 x i16> %high2 to <4 x i32>		%ext2 = sext <4 x i16> %high2 to <4 x i32>

%res = sub <4 x i32> %ext1, %ext2		%res = sub <4 x i32> %ext1, %ext2
ret <4 x i32> %res		ret <4 x i32> %res
}		}

define <2 x i64> @ssubl2_2d(<4 x i32>* %A, <4 x i32>* %B) nounwind {		define <2 x i64> @ssubl2_2d(<4 x i32>* %A, <4 x i32>* %B) nounwind {
;CHECK-LABEL: ssubl2_2d:		;CHECK-LABEL: ssubl2_2d:
;CHECK: ssubl2.2d		;CHECK: ssubl.2d
%tmp1 = load <4 x i32>, <4 x i32>* %A		%tmp1 = load <4 x i32>, <4 x i32>* %A
%high1 = shufflevector <4 x i32> %tmp1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%high1 = shufflevector <4 x i32> %tmp1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%ext1 = sext <2 x i32> %high1 to <2 x i64>		%ext1 = sext <2 x i32> %high1 to <2 x i64>

%tmp2 = load <4 x i32>, <4 x i32>* %B		%tmp2 = load <4 x i32>, <4 x i32>* %B
%high2 = shufflevector <4 x i32> %tmp2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%high2 = shufflevector <4 x i32> %tmp2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%ext2 = sext <2 x i32> %high2 to <2 x i64>		%ext2 = sext <2 x i32> %high2 to <2 x i64>

[... 31 lines not shown ...]
;CHECK: usubl.2d
%tmp3 = zext <2 x i32> %tmp1 to <2 x i64>		%tmp3 = zext <2 x i32> %tmp1 to <2 x i64>
%tmp4 = zext <2 x i32> %tmp2 to <2 x i64>		%tmp4 = zext <2 x i32> %tmp2 to <2 x i64>
%tmp5 = sub <2 x i64> %tmp3, %tmp4		%tmp5 = sub <2 x i64> %tmp3, %tmp4
ret <2 x i64> %tmp5		ret <2 x i64> %tmp5
}		}

define <8 x i16> @usubl2_8h(<16 x i8>* %A, <16 x i8>* %B) nounwind {		define <8 x i16> @usubl2_8h(<16 x i8>* %A, <16 x i8>* %B) nounwind {
;CHECK-LABEL: usubl2_8h:		;CHECK-LABEL: usubl2_8h:
;CHECK: usubl2.8h		;CHECK: usubl.8h
%tmp1 = load <16 x i8>, <16 x i8>* %A		%tmp1 = load <16 x i8>, <16 x i8>* %A
%high1 = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%high1 = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%ext1 = zext <8 x i8> %high1 to <8 x i16>		%ext1 = zext <8 x i8> %high1 to <8 x i16>

%tmp2 = load <16 x i8>, <16 x i8>* %B		%tmp2 = load <16 x i8>, <16 x i8>* %B
%high2 = shufflevector <16 x i8> %tmp2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%high2 = shufflevector <16 x i8> %tmp2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%ext2 = zext <8 x i8> %high2 to <8 x i16>		%ext2 = zext <8 x i8> %high2 to <8 x i16>

%res = sub <8 x i16> %ext1, %ext2		%res = sub <8 x i16> %ext1, %ext2
ret <8 x i16> %res		ret <8 x i16> %res
}		}

define <4 x i32> @usubl2_4s(<8 x i16>* %A, <8 x i16>* %B) nounwind {		define <4 x i32> @usubl2_4s(<8 x i16>* %A, <8 x i16>* %B) nounwind {
;CHECK-LABEL: usubl2_4s:		;CHECK-LABEL: usubl2_4s:
;CHECK: usubl2.4s		;CHECK: usubl.4s
%tmp1 = load <8 x i16>, <8 x i16>* %A		%tmp1 = load <8 x i16>, <8 x i16>* %A
%high1 = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%high1 = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%ext1 = zext <4 x i16> %high1 to <4 x i32>		%ext1 = zext <4 x i16> %high1 to <4 x i32>

%tmp2 = load <8 x i16>, <8 x i16>* %B		%tmp2 = load <8 x i16>, <8 x i16>* %B
%high2 = shufflevector <8 x i16> %tmp2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%high2 = shufflevector <8 x i16> %tmp2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%ext2 = zext <4 x i16> %high2 to <4 x i32>		%ext2 = zext <4 x i16> %high2 to <4 x i32>

%res = sub <4 x i32> %ext1, %ext2		%res = sub <4 x i32> %ext1, %ext2
ret <4 x i32> %res		ret <4 x i32> %res
}		}

define <2 x i64> @usubl2_2d(<4 x i32>* %A, <4 x i32>* %B) nounwind {		define <2 x i64> @usubl2_2d(<4 x i32>* %A, <4 x i32>* %B) nounwind {
;CHECK-LABEL: usubl2_2d:		;CHECK-LABEL: usubl2_2d:
;CHECK: usubl2.2d		;CHECK: usubl.2d
%tmp1 = load <4 x i32>, <4 x i32>* %A		%tmp1 = load <4 x i32>, <4 x i32>* %A
%high1 = shufflevector <4 x i32> %tmp1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%high1 = shufflevector <4 x i32> %tmp1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%ext1 = zext <2 x i32> %high1 to <2 x i64>		%ext1 = zext <2 x i32> %high1 to <2 x i64>

%tmp2 = load <4 x i32>, <4 x i32>* %B		%tmp2 = load <4 x i32>, <4 x i32>* %B
%high2 = shufflevector <4 x i32> %tmp2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%high2 = shufflevector <4 x i32> %tmp2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%ext2 = zext <2 x i32> %high2 to <2 x i64>		%ext2 = zext <2 x i32> %high2 to <2 x i64>

[... 28 lines not shown ...]
;CHECK: ssubw.2d
%tmp2 = load <2 x i32>, <2 x i32>* %B		%tmp2 = load <2 x i32>, <2 x i32>* %B
%tmp3 = sext <2 x i32> %tmp2 to <2 x i64>		%tmp3 = sext <2 x i32> %tmp2 to <2 x i64>
%tmp4 = sub <2 x i64> %tmp1, %tmp3		%tmp4 = sub <2 x i64> %tmp1, %tmp3
ret <2 x i64> %tmp4		ret <2 x i64> %tmp4
}		}

define <8 x i16> @ssubw2_8h(<8 x i16>* %A, <16 x i8>* %B) nounwind {		define <8 x i16> @ssubw2_8h(<8 x i16>* %A, <16 x i8>* %B) nounwind {
;CHECK-LABEL: ssubw2_8h:		;CHECK-LABEL: ssubw2_8h:
;CHECK: ssubw2.8h		;CHECK: ssubw.8h
%tmp1 = load <8 x i16>, <8 x i16>* %A		%tmp1 = load <8 x i16>, <8 x i16>* %A

%tmp2 = load <16 x i8>, <16 x i8>* %B		%tmp2 = load <16 x i8>, <16 x i8>* %B
%high2 = shufflevector <16 x i8> %tmp2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%high2 = shufflevector <16 x i8> %tmp2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%ext2 = sext <8 x i8> %high2 to <8 x i16>		%ext2 = sext <8 x i8> %high2 to <8 x i16>

%res = sub <8 x i16> %tmp1, %ext2		%res = sub <8 x i16> %tmp1, %ext2
ret <8 x i16> %res		ret <8 x i16> %res
}		}

define <4 x i32> @ssubw2_4s(<4 x i32>* %A, <8 x i16>* %B) nounwind {		define <4 x i32> @ssubw2_4s(<4 x i32>* %A, <8 x i16>* %B) nounwind {
;CHECK-LABEL: ssubw2_4s:		;CHECK-LABEL: ssubw2_4s:
;CHECK: ssubw2.4s		;CHECK: ssubw.4s
%tmp1 = load <4 x i32>, <4 x i32>* %A		%tmp1 = load <4 x i32>, <4 x i32>* %A

%tmp2 = load <8 x i16>, <8 x i16>* %B		%tmp2 = load <8 x i16>, <8 x i16>* %B
%high2 = shufflevector <8 x i16> %tmp2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%high2 = shufflevector <8 x i16> %tmp2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%ext2 = sext <4 x i16> %high2 to <4 x i32>		%ext2 = sext <4 x i16> %high2 to <4 x i32>

%res = sub <4 x i32> %tmp1, %ext2		%res = sub <4 x i32> %tmp1, %ext2
ret <4 x i32> %res		ret <4 x i32> %res
}		}

define <2 x i64> @ssubw2_2d(<2 x i64>* %A, <4 x i32>* %B) nounwind {		define <2 x i64> @ssubw2_2d(<2 x i64>* %A, <4 x i32>* %B) nounwind {
;CHECK-LABEL: ssubw2_2d:		;CHECK-LABEL: ssubw2_2d:
;CHECK: ssubw2.2d		;CHECK: ssubw.2d
%tmp1 = load <2 x i64>, <2 x i64>* %A		%tmp1 = load <2 x i64>, <2 x i64>* %A

%tmp2 = load <4 x i32>, <4 x i32>* %B		%tmp2 = load <4 x i32>, <4 x i32>* %B
%high2 = shufflevector <4 x i32> %tmp2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%high2 = shufflevector <4 x i32> %tmp2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%ext2 = sext <2 x i32> %high2 to <2 x i64>		%ext2 = sext <2 x i32> %high2 to <2 x i64>

%res = sub <2 x i64> %tmp1, %ext2		%res = sub <2 x i64> %tmp1, %ext2
ret <2 x i64> %res		ret <2 x i64> %res
[... 26 lines not shown ...]
;CHECK: usubw.2d
%tmp2 = load <2 x i32>, <2 x i32>* %B		%tmp2 = load <2 x i32>, <2 x i32>* %B
%tmp3 = zext <2 x i32> %tmp2 to <2 x i64>		%tmp3 = zext <2 x i32> %tmp2 to <2 x i64>
%tmp4 = sub <2 x i64> %tmp1, %tmp3		%tmp4 = sub <2 x i64> %tmp1, %tmp3
ret <2 x i64> %tmp4		ret <2 x i64> %tmp4
}		}

define <8 x i16> @usubw2_8h(<8 x i16>* %A, <16 x i8>* %B) nounwind {		define <8 x i16> @usubw2_8h(<8 x i16>* %A, <16 x i8>* %B) nounwind {
;CHECK-LABEL: usubw2_8h:		;CHECK-LABEL: usubw2_8h:
;CHECK: usubw2.8h		;CHECK: usubw.8h
%tmp1 = load <8 x i16>, <8 x i16>* %A		%tmp1 = load <8 x i16>, <8 x i16>* %A

%tmp2 = load <16 x i8>, <16 x i8>* %B		%tmp2 = load <16 x i8>, <16 x i8>* %B
%high2 = shufflevector <16 x i8> %tmp2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%high2 = shufflevector <16 x i8> %tmp2, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%ext2 = zext <8 x i8> %high2 to <8 x i16>		%ext2 = zext <8 x i8> %high2 to <8 x i16>

%res = sub <8 x i16> %tmp1, %ext2		%res = sub <8 x i16> %tmp1, %ext2
ret <8 x i16> %res		ret <8 x i16> %res
}		}

define <4 x i32> @usubw2_4s(<4 x i32>* %A, <8 x i16>* %B) nounwind {		define <4 x i32> @usubw2_4s(<4 x i32>* %A, <8 x i16>* %B) nounwind {
;CHECK-LABEL: usubw2_4s:		;CHECK-LABEL: usubw2_4s:
;CHECK: usubw2.4s		;CHECK: usubw.4s
%tmp1 = load <4 x i32>, <4 x i32>* %A		%tmp1 = load <4 x i32>, <4 x i32>* %A

%tmp2 = load <8 x i16>, <8 x i16>* %B		%tmp2 = load <8 x i16>, <8 x i16>* %B
%high2 = shufflevector <8 x i16> %tmp2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		%high2 = shufflevector <8 x i16> %tmp2, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%ext2 = zext <4 x i16> %high2 to <4 x i32>		%ext2 = zext <4 x i16> %high2 to <4 x i32>

%res = sub <4 x i32> %tmp1, %ext2		%res = sub <4 x i32> %tmp1, %ext2
ret <4 x i32> %res		ret <4 x i32> %res
}		}

define <2 x i64> @usubw2_2d(<2 x i64>* %A, <4 x i32>* %B) nounwind {		define <2 x i64> @usubw2_2d(<2 x i64>* %A, <4 x i32>* %B) nounwind {
;CHECK-LABEL: usubw2_2d:		;CHECK-LABEL: usubw2_2d:
;CHECK: usubw2.2d		;CHECK: usubw.2d
%tmp1 = load <2 x i64>, <2 x i64>* %A		%tmp1 = load <2 x i64>, <2 x i64>* %A

%tmp2 = load <4 x i32>, <4 x i32>* %B		%tmp2 = load <4 x i32>, <4 x i32>* %B
%high2 = shufflevector <4 x i32> %tmp2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>		%high2 = shufflevector <4 x i32> %tmp2, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%ext2 = zext <2 x i32> %high2 to <2 x i64>		%ext2 = zext <2 x i32> %high2 to <2 x i64>

%res = sub <2 x i64> %tmp1, %ext2		%res = sub <2 x i64> %tmp1, %ext2
ret <2 x i64> %res		ret <2 x i64> %res
}		}

llvm/trunk/test/CodeGen/ARM/vcombine.ll

[93 lines not shown]
; CHECK-BE: vmov r1, r0, d16
%tmp1 = load <8 x i16>, <8 x i16>* %A		%tmp1 = load <8 x i16>, <8 x i16>* %A
%tmp2 = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		%tmp2 = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
ret <4 x i16> %tmp2		ret <4 x i16> %tmp2
}		}

define <8 x i8> @vget_high8(<16 x i8>* %A) nounwind {		define <8 x i8> @vget_high8(<16 x i8>* %A) nounwind {
; CHECK: vget_high8		; CHECK: vget_high8
; CHECK-NOT: vst		; CHECK-NOT: vst
; CHECK-LE: vmov r0, r1, d17		; CHECK-LE-NOT: vld1.64 {d16, d17}, [r0]
		; CHECK-LE: vldr d16, [r0, #8]
		; CHECK-LE: vmov r0, r1, d16
; CHECK-BE: vmov r1, r0, d16		; CHECK-BE: vmov r1, r0, d16
%tmp1 = load <16 x i8>, <16 x i8>* %A		%tmp1 = load <16 x i8>, <16 x i8>* %A
%tmp2 = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%tmp2 = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <8 x i8> %tmp2		ret <8 x i8> %tmp2
}		}

; vcombine(vld1_dup(p), vld1_dup(p2))		; vcombine(vld1_dup(p), vld1_dup(p2))
define <8 x i16> @vcombine_vdup(<8 x i16> %src, i16* nocapture readonly %p) {		define <8 x i16> @vcombine_vdup(<8 x i16> %src, i16* nocapture readonly %p) {
[15 lines not shown]

llvm/trunk/test/CodeGen/ARM/vext.ll

[193 lines not shown]
; CHECK-NEXT: mov pc, lr
%tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <4 x i32> <i32 3, i32 8, i32 5, i32 9>		%tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <4 x i32> <i32 3, i32 8, i32 5, i32 9>
ret <4 x i16> %tmp3		ret <4 x i16> %tmp3
}		}

; An undef in the shuffle list should still be optimizable		; An undef in the shuffle list should still be optimizable
define <4 x i16> @test_undef(<8 x i16>* %A, <8 x i16>* %B) nounwind {		define <4 x i16> @test_undef(<8 x i16>* %A, <8 x i16>* %B) nounwind {
; CHECK-LABEL: test_undef:		; CHECK-LABEL: test_undef:
; CHECK: @ BB#0:		; CHECK: @ BB#0:
; CHECK-NEXT: vld1.64 {d16, d17}, [r1]		; CHECK-NEXT: vldr d16, [r1]
; CHECK-NEXT: vld1.64 {d18, d19}, [r0]		; CHECK-NEXT: vldr d17, [r0, #8]
; CHECK-NEXT: vzip.16 d19, d16		; CHECK-NEXT: vzip.16 d17, d16
; CHECK-NEXT: vmov r0, r1, d19		; CHECK-NEXT: vmov r0, r1, d17
; CHECK-NEXT: mov pc, lr		; CHECK-NEXT: mov pc, lr
%tmp1 = load <8 x i16>, <8 x i16>* %A		%tmp1 = load <8 x i16>, <8 x i16>* %A
%tmp2 = load <8 x i16>, <8 x i16>* %B		%tmp2 = load <8 x i16>, <8 x i16>* %B
%tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <4 x i32> <i32 undef, i32 8, i32 5, i32 9>		%tmp3 = shufflevector <8 x i16> %tmp1, <8 x i16> %tmp2, <4 x i32> <i32 undef, i32 8, i32 5, i32 9>
ret <4 x i16> %tmp3		ret <4 x i16> %tmp3
}		}

; We should ignore a build_vector with more than two sources.		; We should ignore a build_vector with more than two sources.
[142 lines not shown]

llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics.ll

	[2,210 lines not shown]
	; AVX512BW-NEXT: kmovd %edi, %k0			; AVX512BW-NEXT: kmovd %edi, %k0
	; AVX512BW-NEXT: kmovd %esi, %k1			; AVX512BW-NEXT: kmovd %esi, %k1
	; AVX512BW-NEXT: kunpckwd %k1, %k0, %k0			; AVX512BW-NEXT: kunpckwd %k1, %k0, %k0
	; AVX512BW-NEXT: kmovd %k0, %eax			; AVX512BW-NEXT: kmovd %k0, %eax
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512F-32-LABEL: test_int_x86_avx512_kunpck_wd:			; AVX512F-32-LABEL: test_int_x86_avx512_kunpck_wd:
	; AVX512F-32: # BB#0:			; AVX512F-32: # BB#0:
	; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k0			; AVX512F-32-NEXT: kmovw {{[0-9]+}}(%esp), %k0
	; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1			; AVX512F-32-NEXT: kmovw {{[0-9]+}}(%esp), %k1
	; AVX512F-32-NEXT: kunpckwd %k1, %k0, %k0			; AVX512F-32-NEXT: kunpckwd %k0, %k1, %k0
	; AVX512F-32-NEXT: kmovd %k0, %eax			; AVX512F-32-NEXT: kmovd %k0, %eax
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res = call i32 @llvm.x86.avx512.kunpck.wd(i32 %x0, i32 %x1)			%res = call i32 @llvm.x86.avx512.kunpck.wd(i32 %x0, i32 %x1)
	ret i32 %res			ret i32 %res
	}			}

	declare i64 @llvm.x86.avx512.kunpck.dq(i64, i64)			declare i64 @llvm.x86.avx512.kunpck.dq(i64, i64)

	[711 lines not shown]

llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v16.ll

	[276 lines not shown]
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	%c = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 19, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%c = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 19, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <16 x i32> %c			ret <16 x i32> %c
	}			}

	define <8 x float> @shuffle_v16f32_extract_256(float* %RET, float* %a) {			define <8 x float> @shuffle_v16f32_extract_256(float* %RET, float* %a) {
	; ALL-LABEL: shuffle_v16f32_extract_256:			; ALL-LABEL: shuffle_v16f32_extract_256:
	; ALL: # BB#0:			; ALL: # BB#0:
	; ALL-NEXT: vmovups (%rsi), %zmm0			; ALL-NEXT: vmovups 32(%rsi), %ymm0
	; ALL-NEXT: vextractf32x8 $1, %zmm0, %ymm0
	; ALL-NEXT: retq			; ALL-NEXT: retq
	%ptr_a = bitcast float* %a to <16 x float>*			%ptr_a = bitcast float* %a to <16 x float>*
	%v_a = load <16 x float>, <16 x float>* %ptr_a, align 4			%v_a = load <16 x float>, <16 x float>* %ptr_a, align 4
	%v2 = shufflevector <16 x float> %v_a, <16 x float> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			%v2 = shufflevector <16 x float> %v_a, <16 x float> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	ret <8 x float> %v2			ret <8 x float> %v2
	}			}

	define <16 x i32> @shuffle_v16i16_1_0_0_0_5_4_4_4_9_8_8_8_13_12_12_12(<16 x i32> %a, <16 x i32> %b) {			define <16 x i32> @shuffle_v16i16_1_0_0_0_5_4_4_4_9_8_8_8_13_12_12_12(<16 x i32> %a, <16 x i32> %b) {
	[361 lines not shown]

llvm/trunk/test/CodeGen/X86/vector-shuffle-avx512.ll

	[505 lines not shown]
	; SKX64-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<def>			; SKX64-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<def>
	; SKX64-NEXT: movb $20, %al			; SKX64-NEXT: movb $20, %al
	; SKX64-NEXT: kmovd %eax, %k1			; SKX64-NEXT: kmovd %eax, %k1
	; SKX64-NEXT: vexpandps %ymm0, %ymm0 {%k1} {z}			; SKX64-NEXT: vexpandps %ymm0, %ymm0 {%k1} {z}
	; SKX64-NEXT: retq			; SKX64-NEXT: retq
	;			;
	; KNL64-LABEL: expand14:			; KNL64-LABEL: expand14:
	; KNL64: # BB#0:			; KNL64: # BB#0:
				; KNL64-NEXT: vpermilps {{.*#+}} xmm1 = mem[3,3,0,0]
				; KNL64-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[0,1,1,1]
	; KNL64-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]			; KNL64-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; KNL64-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,0,1,3]			; KNL64-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,0,1,3]
	; KNL64-NEXT: vmovaps {{.*#+}} ymm1 = <0,2,4,0,u,u,u,u>
	; KNL64-NEXT: vpermilps {{.*#+}} xmm1 = xmm1[3,3,0,0]
	; KNL64-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[0,1,1,1]
	; KNL64-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1],ymm0[2],ymm1[3],ymm0[4],ymm1[5,6,7]			; KNL64-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1],ymm0[2],ymm1[3],ymm0[4],ymm1[5,6,7]
	; KNL64-NEXT: retq			; KNL64-NEXT: retq
	;			;
	; SKX32-LABEL: expand14:			; SKX32-LABEL: expand14:
	; SKX32: # BB#0:			; SKX32: # BB#0:
	; SKX32-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<def>			; SKX32-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<def>
	; SKX32-NEXT: movb $20, %al			; SKX32-NEXT: movb $20, %al
	; SKX32-NEXT: kmovd %eax, %k1			; SKX32-NEXT: kmovd %eax, %k1
	; SKX32-NEXT: vexpandps %ymm0, %ymm0 {%k1} {z}			; SKX32-NEXT: vexpandps %ymm0, %ymm0 {%k1} {z}
	; SKX32-NEXT: retl			; SKX32-NEXT: retl
	;			;
	; KNL32-LABEL: expand14:			; KNL32-LABEL: expand14:
	; KNL32: # BB#0:			; KNL32: # BB#0:
				; KNL32-NEXT: vpermilps {{.*#+}} xmm1 = mem[3,3,0,0]
				; KNL32-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[0,1,1,1]
	; KNL32-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]			; KNL32-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; KNL32-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,0,1,3]			; KNL32-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,0,1,3]
	; KNL32-NEXT: vmovaps {{.*#+}} ymm1 = <0,2,4,0,u,u,u,u>
	; KNL32-NEXT: vpermilps {{.*#+}} xmm1 = xmm1[3,3,0,0]
	; KNL32-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[0,1,1,1]
	; KNL32-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1],ymm0[2],ymm1[3],ymm0[4],ymm1[5,6,7]			; KNL32-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1],ymm0[2],ymm1[3],ymm0[4],ymm1[5,6,7]
	; KNL32-NEXT: retl			; KNL32-NEXT: retl
	%addV = fadd <4 x float> <float 0.0,float 1.0,float 2.0,float 0.0> , <float 0.0,float 1.0,float 2.0,float 0.0>			%addV = fadd <4 x float> <float 0.0,float 1.0,float 2.0,float 0.0> , <float 0.0,float 1.0,float 2.0,float 0.0>
	%res = shufflevector <4 x float> %addV, <4 x float> %a, <8 x i32> <i32 3, i32 3, i32 4, i32 0, i32 5, i32 0, i32 0, i32 0>			%res = shufflevector <4 x float> %addV, <4 x float> %a, <8 x i32> <i32 3, i32 3, i32 4, i32 0, i32 5, i32 0, i32 0, i32 0>
	ret <8 x float> %res			ret <8 x float> %res
	}			}

	;Negative test.			;Negative test.
	define <8 x float> @expand15(<4 x float> %a) {			define <8 x float> @expand15(<4 x float> %a) {
	; SKX64-LABEL: expand15:			; SKX64-LABEL: expand15:
	; SKX64: # BB#0:			; SKX64: # BB#0:
	; SKX64-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,1,1,3]			; SKX64-NEXT: vpermilps {{.*#+}} xmm1 = mem[0,1,0,0]
	; SKX64-NEXT: vmovaps {{.*#+}} ymm0 = <0,2,4,0,u,u,u,u>			; SKX64-NEXT: vpermilps {{.*#+}} xmm2 = xmm0[0,1,1,3]
	; SKX64-NEXT: vpermilps {{.*#+}} xmm2 = xmm0[0,1,0,0]
	; SKX64-NEXT: vmovaps {{.*#+}} ymm0 = [0,1,8,3,10,3,2,3]			; SKX64-NEXT: vmovaps {{.*#+}} ymm0 = [0,1,8,3,10,3,2,3]
	; SKX64-NEXT: vpermi2ps %ymm1, %ymm2, %ymm0			; SKX64-NEXT: vpermi2ps %ymm2, %ymm1, %ymm0
	; SKX64-NEXT: retq			; SKX64-NEXT: retq
	;			;
	; KNL64-LABEL: expand15:			; KNL64-LABEL: expand15:
	; KNL64: # BB#0:			; KNL64: # BB#0:
				; KNL64-NEXT: vpermilps {{.*#+}} xmm1 = mem[0,1,0,0]
				; KNL64-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[0,1,1,1]
	; KNL64-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]			; KNL64-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; KNL64-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,0,1,3]			; KNL64-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,0,1,3]
	; KNL64-NEXT: vmovaps {{.*#+}} ymm1 = <0,2,4,0,u,u,u,u>
	; KNL64-NEXT: vpermilps {{.*#+}} xmm1 = xmm1[0,1,0,0]
	; KNL64-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[0,1,1,1]
	; KNL64-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1],ymm0[2],ymm1[3],ymm0[4],ymm1[5,6,7]			; KNL64-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1],ymm0[2],ymm1[3],ymm0[4],ymm1[5,6,7]
	; KNL64-NEXT: retq			; KNL64-NEXT: retq
	;			;
	; SKX32-LABEL: expand15:			; SKX32-LABEL: expand15:
	; SKX32: # BB#0:			; SKX32: # BB#0:
	; SKX32-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,1,1,3]			; SKX32-NEXT: vpermilps {{.*#+}} xmm1 = mem[0,1,0,0]
	; SKX32-NEXT: vmovaps {{.*#+}} ymm0 = <0,2,4,0,u,u,u,u>			; SKX32-NEXT: vpermilps {{.*#+}} xmm2 = xmm0[0,1,1,3]
	; SKX32-NEXT: vpermilps {{.*#+}} xmm2 = xmm0[0,1,0,0]
	; SKX32-NEXT: vmovaps {{.*#+}} ymm0 = [0,1,8,3,10,3,2,3]			; SKX32-NEXT: vmovaps {{.*#+}} ymm0 = [0,1,8,3,10,3,2,3]
	; SKX32-NEXT: vpermi2ps %ymm1, %ymm2, %ymm0			; SKX32-NEXT: vpermi2ps %ymm2, %ymm1, %ymm0
	; SKX32-NEXT: retl			; SKX32-NEXT: retl
	;			;
	; KNL32-LABEL: expand15:			; KNL32-LABEL: expand15:
	; KNL32: # BB#0:			; KNL32: # BB#0:
				; KNL32-NEXT: vpermilps {{.*#+}} xmm1 = mem[0,1,0,0]
				; KNL32-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[0,1,1,1]
	; KNL32-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]			; KNL32-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; KNL32-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,0,1,3]			; KNL32-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,0,1,3]
	; KNL32-NEXT: vmovaps {{.*#+}} ymm1 = <0,2,4,0,u,u,u,u>
	; KNL32-NEXT: vpermilps {{.*#+}} xmm1 = xmm1[0,1,0,0]
	; KNL32-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[0,1,1,1]
	; KNL32-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1],ymm0[2],ymm1[3],ymm0[4],ymm1[5,6,7]			; KNL32-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0,1],ymm0[2],ymm1[3],ymm0[4],ymm1[5,6,7]
	; KNL32-NEXT: retl			; KNL32-NEXT: retl
	%addV = fadd <4 x float> <float 0.0,float 1.0,float 2.0,float 0.0> , <float 0.0,float 1.0,float 2.0,float 0.0>			%addV = fadd <4 x float> <float 0.0,float 1.0,float 2.0,float 0.0> , <float 0.0,float 1.0,float 2.0,float 0.0>
	%res = shufflevector <4 x float> %addV, <4 x float> %a, <8 x i32> <i32 0, i32 1, i32 4, i32 0, i32 5, i32 0, i32 0, i32 0>			%res = shufflevector <4 x float> %addV, <4 x float> %a, <8 x i32> <i32 0, i32 1, i32 4, i32 0, i32 5, i32 0, i32 0, i32 0>
	ret <8 x float> %res			ret <8 x float> %res
	}			}


	[286 lines not shown]

llvm/trunk/test/CodeGen/X86/widened-broadcast.ll

	[145 lines not shown]
	; SSE-LABEL: load_splat_8i32_8i32_01010101:			; SSE-LABEL: load_splat_8i32_8i32_01010101:
	; SSE: # BB#0: # %entry			; SSE: # BB#0: # %entry
	; SSE-NEXT: pshufd {{.*#+}} xmm0 = mem[0,1,0,1]			; SSE-NEXT: pshufd {{.*#+}} xmm0 = mem[0,1,0,1]
	; SSE-NEXT: movdqa %xmm0, %xmm1			; SSE-NEXT: movdqa %xmm0, %xmm1
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1-LABEL: load_splat_8i32_8i32_01010101:			; AVX1-LABEL: load_splat_8i32_8i32_01010101:
	; AVX1: # BB#0: # %entry			; AVX1: # BB#0: # %entry
	; AVX1-NEXT: vmovapd (%rdi), %ymm0			; AVX1-NEXT: vmovddup {{.*#+}} xmm0 = mem[0,0]
	; AVX1-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: load_splat_8i32_8i32_01010101:			; AVX2-LABEL: load_splat_8i32_8i32_01010101:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vbroadcastsd (%rdi), %ymm0			; AVX2-NEXT: vbroadcastsd (%rdi), %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	[119 lines not shown]
	; SSE-LABEL: load_splat_16i16_16i16_0101010101010101:			; SSE-LABEL: load_splat_16i16_16i16_0101010101010101:
	; SSE: # BB#0: # %entry			; SSE: # BB#0: # %entry
	; SSE-NEXT: pshufd {{.*#+}} xmm0 = mem[0,0,0,0]			; SSE-NEXT: pshufd {{.*#+}} xmm0 = mem[0,0,0,0]
	; SSE-NEXT: movdqa %xmm0, %xmm1			; SSE-NEXT: movdqa %xmm0, %xmm1
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1-LABEL: load_splat_16i16_16i16_0101010101010101:			; AVX1-LABEL: load_splat_16i16_16i16_0101010101010101:
	; AVX1: # BB#0: # %entry			; AVX1: # BB#0: # %entry
	; AVX1-NEXT: vmovaps (%rdi), %ymm0			; AVX1-NEXT: vpermilps {{.*#+}} xmm0 = mem[0,0,0,0]
	; AVX1-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,0,0,0]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: load_splat_16i16_16i16_0101010101010101:			; AVX2-LABEL: load_splat_16i16_16i16_0101010101010101:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vbroadcastss (%rdi), %ymm0			; AVX2-NEXT: vbroadcastss (%rdi), %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	[9 lines not shown]

	define <16 x i16> @load_splat_16i16_16i16_0123012301230123(<16 x i16>* %ptr) nounwind uwtable readnone ssp {			define <16 x i16> @load_splat_16i16_16i16_0123012301230123(<16 x i16>* %ptr) nounwind uwtable readnone ssp {
	; SSE-LABEL: load_splat_16i16_16i16_0123012301230123:			; SSE-LABEL: load_splat_16i16_16i16_0123012301230123:
	; SSE: # BB#0: # %entry			; SSE: # BB#0: # %entry
	; SSE-NEXT: pshufd {{.*#+}} xmm0 = mem[0,1,0,1]			; SSE-NEXT: pshufd {{.*#+}} xmm0 = mem[0,1,0,1]
	; SSE-NEXT: movdqa %xmm0, %xmm1			; SSE-NEXT: movdqa %xmm0, %xmm1
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1-LABEL: load_splat_16i16_16i16_0123012301230123:			; AVX-LABEL: load_splat_16i16_16i16_0123012301230123:
	; AVX1: # BB#0: # %entry			; AVX: # BB#0: # %entry
	; AVX1-NEXT: vbroadcastsd (%rdi), %ymm0			; AVX-NEXT: vbroadcastsd (%rdi), %ymm0
	; AVX1-NEXT: retq			; AVX-NEXT: retq
	;
	; AVX2-LABEL: load_splat_16i16_16i16_0123012301230123:
	; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vmovaps (%rdi), %ymm0
	; AVX2-NEXT: vbroadcastsd %xmm0, %ymm0
	; AVX2-NEXT: retq
	;
	; AVX512-LABEL: load_splat_16i16_16i16_0123012301230123:
	; AVX512: # BB#0: # %entry
	; AVX512-NEXT: vmovaps (%rdi), %ymm0
	; AVX512-NEXT: vbroadcastsd %xmm0, %ymm0
	; AVX512-NEXT: retq
	entry:			entry:
	%ld = load <16 x i16>, <16 x i16>* %ptr			%ld = load <16 x i16>, <16 x i16>* %ptr
	%ret = shufflevector <16 x i16> %ld, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3,i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>			%ret = shufflevector <16 x i16> %ld, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3,i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
	ret <16 x i16> %ret			ret <16 x i16> %ret
	}			}

	define <16 x i8> @load_splat_16i8_16i8_0101010101010101(<16 x i8>* %ptr) nounwind uwtable readnone ssp {			define <16 x i8> @load_splat_16i8_16i8_0101010101010101(<16 x i8>* %ptr) nounwind uwtable readnone ssp {
	; SSE-LABEL: load_splat_16i8_16i8_0101010101010101:			; SSE-LABEL: load_splat_16i8_16i8_0101010101010101:
	[166 lines not shown]
	; SSE: # BB#0: # %entry			; SSE: # BB#0: # %entry
	; SSE-NEXT: pshuflw {{.*#+}} xmm0 = mem[0,0,0,0,4,5,6,7]			; SSE-NEXT: pshuflw {{.*#+}} xmm0 = mem[0,0,0,0,4,5,6,7]
	; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]			; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]
	; SSE-NEXT: movdqa %xmm0, %xmm1			; SSE-NEXT: movdqa %xmm0, %xmm1
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1-LABEL: load_splat_32i8_32i8_01010101010101010101010101010101:			; AVX1-LABEL: load_splat_32i8_32i8_01010101010101010101010101010101:
	; AVX1: # BB#0: # %entry			; AVX1: # BB#0: # %entry
	; AVX1-NEXT: vmovdqa (%rdi), %ymm0			; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = mem[0,0,0,0,4,5,6,7]
	; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
	; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]			; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: load_splat_32i8_32i8_01010101010101010101010101010101:			; AVX2-LABEL: load_splat_32i8_32i8_01010101010101010101010101010101:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vpbroadcastw (%rdi), %ymm0			; AVX2-NEXT: vpbroadcastw (%rdi), %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	[56 lines not shown]

	define <4 x float> @load_splat_4f32_8f32_0000(<8 x float>* %ptr) nounwind uwtable readnone ssp {			define <4 x float> @load_splat_4f32_8f32_0000(<8 x float>* %ptr) nounwind uwtable readnone ssp {
	; SSE-LABEL: load_splat_4f32_8f32_0000:			; SSE-LABEL: load_splat_4f32_8f32_0000:
	; SSE: # BB#0: # %entry			; SSE: # BB#0: # %entry
	; SSE-NEXT: movaps (%rdi), %xmm0			; SSE-NEXT: movaps (%rdi), %xmm0
	; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,0,0,0]			; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,0,0,0]
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1-LABEL: load_splat_4f32_8f32_0000:			; AVX-LABEL: load_splat_4f32_8f32_0000:
	; AVX1: # BB#0: # %entry			; AVX: # BB#0: # %entry
	; AVX1-NEXT: vmovaps (%rdi), %ymm0			; AVX-NEXT: vbroadcastss (%rdi), %xmm0
	; AVX1-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,0,0,0]			; AVX-NEXT: retq
	; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq
	;
	; AVX2-LABEL: load_splat_4f32_8f32_0000:
	; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vmovaps (%rdi), %ymm0
	; AVX2-NEXT: vbroadcastss %xmm0, %xmm0
	; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq
	;
	; AVX512-LABEL: load_splat_4f32_8f32_0000:
	; AVX512: # BB#0: # %entry
	; AVX512-NEXT: vmovaps (%rdi), %ymm0
	; AVX512-NEXT: vbroadcastss %xmm0, %xmm0
	; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq
	entry:			entry:
	%ld = load <8 x float>, <8 x float>* %ptr			%ld = load <8 x float>, <8 x float>* %ptr
	%ret = shufflevector <8 x float> %ld, <8 x float> undef, <4 x i32> zeroinitializer			%ret = shufflevector <8 x float> %ld, <8 x float> undef, <4 x i32> zeroinitializer
	ret <4 x float> %ret			ret <4 x float> %ret
	}			}

	define <8 x float> @load_splat_8f32_16f32_89898989(<16 x float>* %ptr) nounwind uwtable readnone ssp {			define <8 x float> @load_splat_8f32_16f32_89898989(<16 x float>* %ptr) nounwind uwtable readnone ssp {
	; SSE2-LABEL: load_splat_8f32_16f32_89898989:			; SSE2-LABEL: load_splat_8f32_16f32_89898989:
	; SSE2: # BB#0: # %entry			; SSE2: # BB#0: # %entry
	; SSE2-NEXT: movaps 32(%rdi), %xmm0			; SSE2-NEXT: movaps 32(%rdi), %xmm0
	; SSE2-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0,0]			; SSE2-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0,0]
	; SSE2-NEXT: movaps %xmm0, %xmm1			; SSE2-NEXT: movaps %xmm0, %xmm1
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE42-LABEL: load_splat_8f32_16f32_89898989:			; SSE42-LABEL: load_splat_8f32_16f32_89898989:
	; SSE42: # BB#0: # %entry			; SSE42: # BB#0: # %entry
	; SSE42-NEXT: movddup {{.*#+}} xmm0 = mem[0,0]			; SSE42-NEXT: movddup {{.*#+}} xmm0 = mem[0,0]
	; SSE42-NEXT: movapd %xmm0, %xmm1			; SSE42-NEXT: movapd %xmm0, %xmm1
	; SSE42-NEXT: retq			; SSE42-NEXT: retq
	;			;
	; AVX1-LABEL: load_splat_8f32_16f32_89898989:			; AVX-LABEL: load_splat_8f32_16f32_89898989:
	; AVX1: # BB#0: # %entry			; AVX: # BB#0: # %entry
	; AVX1-NEXT: vbroadcastsd 32(%rdi), %ymm0			; AVX-NEXT: vbroadcastsd 32(%rdi), %ymm0
	; AVX1-NEXT: retq			; AVX-NEXT: retq
	;
	; AVX2-LABEL: load_splat_8f32_16f32_89898989:
	; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vbroadcastsd 32(%rdi), %ymm0
	; AVX2-NEXT: retq
	;
	; AVX512-LABEL: load_splat_8f32_16f32_89898989:
	; AVX512: # BB#0: # %entry
	; AVX512-NEXT: vmovapd (%rdi), %zmm0
	; AVX512-NEXT: vextractf64x4 $1, %zmm0, %ymm0
	; AVX512-NEXT: vbroadcastsd %xmm0, %ymm0
	; AVX512-NEXT: retq
	entry:			entry:
	%ld = load <16 x float>, <16 x float>* %ptr			%ld = load <16 x float>, <16 x float>* %ptr
	%ret = shufflevector <16 x float> %ld, <16 x float> undef, <8 x i32> <i32 8, i32 9, i32 8, i32 9, i32 8, i32 9, i32 8, i32 9>			%ret = shufflevector <16 x float> %ld, <16 x float> undef, <8 x i32> <i32 8, i32 9, i32 8, i32 9, i32 8, i32 9, i32 8, i32 9>
	ret <8 x float> %ret			ret <8 x float> %ret
	}			}

llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll

[51 lines not shown]
; AVX-NEXT: retq
ret <4 x double> %mul		ret <4 x double> %mul
}		}

define <4 x double> @load_factorf64_1(<16 x double>* %ptr) {		define <4 x double> @load_factorf64_1(<16 x double>* %ptr) {
; AVX1-LABEL: load_factorf64_1:		; AVX1-LABEL: load_factorf64_1:
; AVX1: # BB#0:		; AVX1: # BB#0:
; AVX1-NEXT: vmovups (%rdi), %ymm0		; AVX1-NEXT: vmovups (%rdi), %ymm0
; AVX1-NEXT: vmovups 32(%rdi), %ymm1		; AVX1-NEXT: vmovups 32(%rdi), %ymm1
; AVX1-NEXT: vmovups 64(%rdi), %ymm2		; AVX1-NEXT: vinsertf128 $1, 64(%rdi), %ymm0, %ymm0
; AVX1-NEXT: vmovups 96(%rdi), %ymm3		; AVX1-NEXT: vinsertf128 $1, 96(%rdi), %ymm1, %ymm1
; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm1, %ymm1
; AVX1-NEXT: vunpcklpd {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[2]		; AVX1-NEXT: vunpcklpd {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
; AVX1-NEXT: vmulpd %ymm0, %ymm0, %ymm0		; AVX1-NEXT: vmulpd %ymm0, %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: load_factorf64_1:		; AVX2-LABEL: load_factorf64_1:
; AVX2: # BB#0:		; AVX2: # BB#0:
; AVX2-NEXT: vmovupd (%rdi), %ymm0		; AVX2-NEXT: vmovupd (%rdi), %ymm0
; AVX2-NEXT: vmovupd 32(%rdi), %ymm1		; AVX2-NEXT: vmovupd 32(%rdi), %ymm1
; AVX2-NEXT: vmovupd 64(%rdi), %ymm2		; AVX2-NEXT: vinsertf128 $1, 64(%rdi), %ymm0, %ymm0
; AVX2-NEXT: vmovupd 96(%rdi), %ymm3		; AVX2-NEXT: vinsertf128 $1, 96(%rdi), %ymm1, %ymm1
; AVX2-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
; AVX2-NEXT: vinsertf128 $1, %xmm3, %ymm1, %ymm1
; AVX2-NEXT: vunpcklpd {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[2]		; AVX2-NEXT: vunpcklpd {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
; AVX2-NEXT: vmulpd %ymm0, %ymm0, %ymm0		; AVX2-NEXT: vmulpd %ymm0, %ymm0, %ymm0
; AVX2-NEXT: retq		; AVX2-NEXT: retq
%wide.vec = load <16 x double>, <16 x double>* %ptr, align 16		%wide.vec = load <16 x double>, <16 x double>* %ptr, align 16
%strided.v0 = shufflevector <16 x double> %wide.vec, <16 x double> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>		%strided.v0 = shufflevector <16 x double> %wide.vec, <16 x double> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>
%strided.v3 = shufflevector <16 x double> %wide.vec, <16 x double> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>		%strided.v3 = shufflevector <16 x double> %wide.vec, <16 x double> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>
%mul = fmul <4 x double> %strided.v0, %strided.v3		%mul = fmul <4 x double> %strided.v0, %strided.v3
ret <4 x double> %mul		ret <4 x double> %mul
[58 lines not shown]

Revision Contents

Diff 100532

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/trunk/test/CodeGen/AArch64/arm64-vabs.ll

llvm/trunk/test/CodeGen/AArch64/arm64-vadd.ll

llvm/trunk/test/CodeGen/AArch64/arm64-vmul.ll

llvm/trunk/test/CodeGen/AArch64/arm64-vshift.ll

llvm/trunk/test/CodeGen/AArch64/arm64-vsub.ll

llvm/trunk/test/CodeGen/ARM/vcombine.ll

llvm/trunk/test/CodeGen/ARM/vext.ll

llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics.ll

llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v16.ll

llvm/trunk/test/CodeGen/X86/vector-shuffle-avx512.ll

llvm/trunk/test/CodeGen/X86/widened-broadcast.ll

llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll