This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] support neon_sshl and neon_ushl in performIntrinsicCombine.
Closed · Public

Authored by fhahn on May 23 2019, 6:11 AM.


Details

Reviewers

t.p.northover
samparker
dmgreen
anemet

Commits

rG3e2fdbee80b0: [AArch64] support neon_sshl and neon_ushl in performIntrinsicCombine.
rL372565: [AArch64] support neon_sshl and neon_ushl in performIntrinsicCombine.

Summary

If the shift amount is a constant vector whose lanes all hold the same value
(a splat), we can simplify aarch64_neon_sshl to VSHL.
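A minimal sketch of the condition above, written as plain C++ rather than against LLVM's SelectionDAG API. `getSplatShiftAmount` and `canUseImmediateLeftShift` are hypothetical helpers, and the constant shift vector is modelled as a `std::vector<int>`. The range check mirrors the `!IsRightShift && ShiftAmount >= 0 && ShiftAmount < ElemBits` condition in the `tryCombineShiftImm` hunk of the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not LLVM API): returns the common shift amount if every
// lane of a constant shift vector holds the same value, otherwise std::nullopt.
std::optional<int> getSplatShiftAmount(const std::vector<int> &Lanes) {
  if (Lanes.empty())
    return std::nullopt;
  for (std::size_t I = 1; I < Lanes.size(); ++I)
    if (Lanes[I] != Lanes[0])
      return std::nullopt;
  return Lanes[0];
}

// Hypothetical helper: a left shift by a splat amount can use the immediate
// form only if that amount lies in [0, ElemBits).
bool canUseImmediateLeftShift(const std::vector<int> &Lanes, int ElemBits) {
  std::optional<int> Amt = getSplatShiftAmount(Lanes);
  return Amt && *Amt >= 0 && *Amt < ElemBits;
}
```

For example, a `<4 x i32>` shift vector of all 1s qualifies, while `<0, 1, 2, 3>` (not a splat) or a splat of 32 (out of range for 32-bit lanes) does not.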

This pattern can be generated by code that uses vshlq_s32(a, vdupq_n_s32(n))
instead of vshlq_n_s32(a, n) because, before inlining, it is used in
contexts where n is not guaranteed to be constant.
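The equivalence the combine relies on can be checked with a scalar model. `dupN`, `shlByVector` and `shlByImm` are hypothetical stand-ins for `vdupq_n_s32`, `vshlq_s32` and `vshlq_n_s32` (the real intrinsics need `<arm_neon.h>` and an AArch64 target), and the model only covers non-negative shift amounts below 32.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4i32 = std::array<int32_t, 4>;

// Hypothetical model of vdupq_n_s32: broadcast one value to all four lanes.
V4i32 dupN(int32_t N) { return {N, N, N, N}; }

// Hypothetical model of vshlq_s32 (register form): each lane has its own
// shift amount. Assumes every amount is in [0, 31].
V4i32 shlByVector(const V4i32 &A, const V4i32 &Amt) {
  V4i32 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = static_cast<int32_t>(static_cast<uint32_t>(A[I]) << Amt[I]);
  return R;
}

// Hypothetical model of vshlq_n_s32 (immediate form): one amount for all lanes.
V4i32 shlByImm(const V4i32 &A, int N) {
  V4i32 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = static_cast<int32_t>(static_cast<uint32_t>(A[I]) << N);
  return R;
}
```

Under these assumptions `shlByVector(a, dupN(n))` and `shlByImm(a, n)` agree for every n in [0, 31], which is why the splat form can be rewritten into the immediate form once n is known to be constant.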

We can do a similar combine for aarch64_neon_ushl, but we have to be
a bit more careful, because we can only match ushll/ushll2 for vector
shifts with a zero-extended first operand.
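A scalar sketch of why the zero extension matters: USHLL.8H itself widens each 8-bit lane to 16 bits before shifting, so ushl(zext(x), splat(n)) and ushll(x, n) compute the same lanes. `ushll8h` and `zextThenShl` are hypothetical models, not LLVM or ACLE functions, and only the lower-half form (ushll, not ushll2) is covered. Without the zext, the high byte of each 16-bit lane may be non-zero and there is no narrow source register to feed to USHLL.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8u8 = std::array<uint8_t, 8>;
using V8u16 = std::array<uint16_t, 8>;

// Hypothetical model of USHLL.8H: zero-extend each 8-bit lane to 16 bits, then
// shift left by an immediate, which must be less than the source lane width.
V8u16 ushll8h(const V8u8 &Src, unsigned Imm) {
  assert(Imm < 8 && "USHLL immediate must be less than the source lane width");
  V8u16 R{};
  for (int I = 0; I < 8; ++I)
    R[I] = static_cast<uint16_t>(static_cast<uint16_t>(Src[I]) << Imm);
  return R;
}

// Hypothetical model of the IR pattern in the tests: zext <8 x i8> to
// <8 x i16>, followed by a left shift of every lane by the same amount.
V8u16 zextThenShl(const V8u8 &Src, unsigned Amt) {
  V8u16 R{};
  for (int I = 0; I < 8; ++I) {
    uint16_t Wide = Src[I]; // zero extension: the high byte is always 0
    R[I] = static_cast<uint16_t>(Wide << Amt);
  }
  return R;
}
```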

Also adds 2 tests marked with FIXME, where codegen can be improved
further.

Diff Detail

Repository: rL LLVM

Event Timeline

fhahn created this revision. May 23 2019, 6:11 AM

Herald added a project: Restricted Project. May 23 2019, 6:11 AM

Herald added subscribers: hiraditya, kristof.beyls, javed.absar.

Harbormaster completed remote builds in B32380: Diff 200949. May 23 2019, 6:13 AM

I think there are probably other shifts that we can include while we're in the area. Most obviously aarch64_neon_ushl, but maybe others too.

Add support for neon_ushl.

Harbormaster completed remote builds in B36760: Diff 215173. Aug 14 2019, 11:05 AM

In D62308#1513868, @t.p.northover wrote:

I think there are probably other shifts that we can include while we're in the area. Most obviously aarch64_neon_ushl, but maybe others too.

Thanks Tim. I've added support for neon_ushl. I think we have to be a bit more careful with that one, as we only have ushll/ushll2 that take immediates.

fhahn retitled this revision from [AArch64] support neon_sshl in performIntrinsicCombine. to [AArch64] support neon_sshl and neon_ushl in performIntrinsicCombine. Aug 15 2019, 2:04 AM

fhahn edited the summary of this revision.

ping

Ping

Does this also apply to right shifts?

In D62308#1661854, @anemet wrote:

Does this also apply to right shifts?

I think the relevant cases should be handled already. For vector right shifts, it looks like there are only immediate forms, and we turn left shifts with negative immediates into the corresponding right shifts.
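Why a negative left-shift amount can become a right shift: for the register forms, a negative per-lane amount shifts right (arithmetic for SSHL, logical for USHL). The sketch below models a single 32-bit lane; `sshlLane` and `ushlLane` are hypothetical names, and the model deliberately ignores out-of-range amounts and the fact that the instructions read only the low byte of each shift lane.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one 32-bit SSHL lane, restricted to amounts in
// [-31, 31]: positive amounts shift left, negative amounts shift right
// arithmetically.
int32_t sshlLane(int32_t X, int Amt) {
  assert(Amt > -32 && Amt < 32);
  if (Amt >= 0)
    return static_cast<int32_t>(static_cast<uint32_t>(X) << Amt);
  // Right-shifting a negative value is arithmetic on mainstream compilers and
  // guaranteed to be so since C++20.
  return X >> -Amt;
}

// Hypothetical model of one 32-bit USHL lane: negative amounts shift right
// logically.
uint32_t ushlLane(uint32_t X, int Amt) {
  assert(Amt > -32 && Amt < 32);
  return Amt >= 0 ? X << Amt : X >> -Amt;
}
```

So a splat shift amount of -1 behaves like an immediate right shift by 1, which is the kind of rewrite the reply describes.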

ping

Some minor test questions/suggestions. Feel free to commit after addressing.

llvm/test/CodeGen/AArch64/arm64-vshift.ll
Line 1204 (on Diff #215173): I don't see any negative tests for the case where we zero-extend to something other than the next wider type.
Line 1292 (on Diff #215173): Technically, this is not sshll (long).
Line 1318 (on Diff #215173): Isn't it used for the extensions?
Line 1333 (on Diff #215173): I think that we should also have other shl tests (a non-foldable .4s case and perhaps some other sizes).

This revision is now accepted and ready to land. Sep 17 2019, 10:21 PM

Add additional test cases and limit this patch to converting cases with appropriate zext/sext.

I'll submit a separate patch for turning ushl -> shl, if the shift is all constant.

Harbormaster completed remote builds in B38356: Diff 221021. Sep 20 2019, 6:54 AM

In D62308#1676691, @fhahn wrote:

I'll submit a separate patch for turning ushl -> shl, if the shift is all constant.

... and sshl -> shl?

In D62308#1676800, @anemet wrote:

In D62308#1676691, @fhahn wrote:

I'll submit a separate patch for turning ushl -> shl, if the shift is all constant.

... and sshl -> shl?

Yep, let's look at that separately.

llvm/test/CodeGen/AArch64/arm64-vshift.ll
Line 1318 (on Diff #215173): Yeah, I initially thought there might be a long version of sshr, but it looks like there is not.

Closed by commit rL372565: [AArch64] support neon_sshl and neon_ushl in performIntrinsicCombine. (authored by fhahn). Sep 23 2019, 2:38 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                      Size
llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp     25 lines
llvm/trunk/test/CodeGen/AArch64/arm64-vshift.ll           190 lines

Diff 221273

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp


@@ 10,326 unchanged lines omitted @@ static SDValue tryCombineShiftImm(unsigned IID, SDNode *N, SelectionDAG &DAG) {
   case Intrinsic::aarch64_neon_urshl:
     Opcode = AArch64ISD::URSHR_I;
     IsRightShift = true;
     break;
   case Intrinsic::aarch64_neon_sqshlu:
     Opcode = AArch64ISD::SQSHLU_I;
     IsRightShift = false;
     break;
+  case Intrinsic::aarch64_neon_sshl:
+  case Intrinsic::aarch64_neon_ushl: {
+    // ushll/ushll2 provide unsigned shifts with immediate operands and
+    // sshll/sshll2 provide signed shifts with immediates, so we have to make
+    // sure we only match patterns here we can later match to them.
+    SDValue Op0 = N->getOperand(1);
+    if (Op0.getNode()->getOpcode() != (IID == Intrinsic::aarch64_neon_ushl
+                                           ? ISD::ZERO_EXTEND
+                                           : ISD::SIGN_EXTEND))
+      return SDValue();
+
+    EVT FromType = Op0.getOperand(0).getValueType();
+    EVT ToType = Op0.getValueType();
+    unsigned FromSize = FromType.getScalarSizeInBits();
+    if (!FromType.isVector() || !ToType.isVector() ||
+        (FromSize != 8 && FromSize != 16 && FromSize != 32) ||
+        2 * FromSize != ToType.getScalarSizeInBits())
+      return SDValue();
+
+    Opcode = AArch64ISD::VSHL;
+    IsRightShift = false;
+    break;
+  }
   }
 
   if (IsRightShift && ShiftAmount <= -1 && ShiftAmount >= -(int)ElemBits) {
     SDLoc dl(N);
     return DAG.getNode(Opcode, dl, N->getValueType(0), N->getOperand(1),
                        DAG.getConstant(-ShiftAmount, dl, MVT::i32));
   } else if (!IsRightShift && ShiftAmount >= 0 && ShiftAmount < ElemBits) {
     SDLoc dl(N);
@@ 70 unchanged lines omitted @@ static SDValue performIntrinsicCombine(SDNode *N,
   case Intrinsic::aarch64_neon_pmull:
   case Intrinsic::aarch64_neon_sqdmull:
     return tryCombineLongOpWithDup(IID, N, DCI, DAG);
   case Intrinsic::aarch64_neon_sqshl:
   case Intrinsic::aarch64_neon_uqshl:
   case Intrinsic::aarch64_neon_sqshlu:
   case Intrinsic::aarch64_neon_srshl:
   case Intrinsic::aarch64_neon_urshl:
+  case Intrinsic::aarch64_neon_sshl:
+  case Intrinsic::aarch64_neon_ushl:
     return tryCombineShiftImm(IID, N, DAG);
   case Intrinsic::aarch64_crc32b:
   case Intrinsic::aarch64_crc32cb:
     return tryCombineCRC32(0xff, N, DAG);
   case Intrinsic::aarch64_crc32h:
   case Intrinsic::aarch64_crc32ch:
     return tryCombineCRC32(0xffff, N, DAG);
   }
@@ 1,931 unchanged lines omitted @@
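The operand check added to `tryCombineShiftImm` can be restated outside SelectionDAG to see which of the new tests it accepts. In this hypothetical sketch, `ExtKind` and `isFoldableLongShiftOperand` stand in for the opcode comparison and the `EVT` queries; plain integers replace LLVM types, so this is not LLVM code.

```cpp
#include <cassert>

// Hypothetical stand-in for the opcode of the shifted operand.
enum class ExtKind { None, ZeroExtend, SignExtend };

// Hypothetical restatement of the guard: returns true if the first operand
// of ushl/sshl can later be matched to ushll/sshll.
bool isFoldableLongShiftOperand(bool IsUShl, ExtKind Ext, bool FromIsVector,
                                bool ToIsVector, unsigned FromBits,
                                unsigned ToBits) {
  // ushl needs a zero extension (ushll), sshl a sign extension (sshll).
  ExtKind Required = IsUShl ? ExtKind::ZeroExtend : ExtKind::SignExtend;
  if (Ext != Required)
    return false;
  if (!FromIsVector || !ToIsVector)
    return false;
  // Long shifts widen 8->16, 16->32 or 32->64 bit lanes only.
  if (FromBits != 8 && FromBits != 16 && FromBits != 32)
    return false;
  return 2 * FromBits == ToBits;
}
```

With this model, neon.ushll8h_constant_shift (i8 to i16) is accepted, while neon.ushll8h_constant_shift_extend_not_2x and neon.sshl4s_wrong_ext_constant_shift (both i8 to i32) and neon.ushl8_noext_constant_shift (no extension) are rejected, which is consistent with their CHECK lines.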

llvm/trunk/test/CodeGen/AArch64/arm64-vshift.ll

@@ 1,186 unchanged lines omitted @@
 ;CHECK: ushll.2d v0, {{v[0-9]+}}, #1
   %load1 = load <4 x i32>, <4 x i32>* %A
   %tmp1 = shufflevector <4 x i32> %load1, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
   %tmp2 = zext <2 x i32> %tmp1 to <2 x i64>
   %tmp3 = shl <2 x i64> %tmp2, <i64 1, i64 1>
   ret <2 x i64> %tmp3
 }

+declare <16 x i8> @llvm.aarch64.neon.ushl.v16i8(<16 x i8>, <16 x i8>)
+declare <8 x i16> @llvm.aarch64.neon.ushl.v8i16(<8 x i16>, <8 x i16>)
+declare <4 x i32> @llvm.aarch64.neon.ushl.v4i32(<4 x i32>, <4 x i32>)
+declare <2 x i64> @llvm.aarch64.neon.ushl.v2i64(<2 x i64>, <2 x i64>)
+
+define <8 x i16> @neon.ushll8h_constant_shift(<8 x i8>* %A) nounwind {
+;CHECK-LABEL: neon.ushll8h_constant_shift
+;CHECK: ushll.8h v0, {{v[0-9]+}}, #1
+  %tmp1 = load <8 x i8>, <8 x i8>* %A
+  %tmp2 = zext <8 x i8> %tmp1 to <8 x i16>
+  %tmp3 = call <8 x i16> @llvm.aarch64.neon.ushl.v8i16(<8 x i16> %tmp2, <8 x i16> <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>)
+  ret <8 x i16> %tmp3
+}
+
+define <8 x i16> @neon.ushl8h_no_constant_shift(<8 x i8>* %A) nounwind {
+;CHECK-LABEL: neon.ushl8h_no_constant_shift
+;CHECK: ushl.8h v0, v0, v0
+  %tmp1 = load <8 x i8>, <8 x i8>* %A
+  %tmp2 = zext <8 x i8> %tmp1 to <8 x i16>
+  %tmp3 = call <8 x i16> @llvm.aarch64.neon.ushl.v8i16(<8 x i16> %tmp2, <8 x i16> %tmp2)
+  ret <8 x i16> %tmp3
+}
+
+; Here we do not extend to the double the bitwidth, so we cannot fold to ushll.
+define <4 x i32> @neon.ushll8h_constant_shift_extend_not_2x(<4 x i8>* %A) nounwind {
+;CHECK-LABEL: @neon.ushll8h_constant_shift_extend_not_2x
+;CHECK-NOT: ushll.8h v0,
+;CHECK: ldrb w8, [x0]
+;CHECK: movi.4s v1, #1
+;CHECK: fmov s0, w8
+;CHECK: ldrb w8, [x0, #1]
+;CHECK: mov.s v0[1], w8
+;CHECK: ldrb w8, [x0, #2]
+;CHECK: mov.s v0[2], w8
+;CHECK: ldrb w8, [x0, #3]
+;CHECK: mov.s v0[3], w8
+;CHECK: ushl.4s v0, v0, v1
+  %tmp1 = load <4 x i8>, <4 x i8>* %A
+  %tmp2 = zext <4 x i8> %tmp1 to <4 x i32>
+  %tmp3 = call <4 x i32> @llvm.aarch64.neon.ushl.v4i32(<4 x i32> %tmp2, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
+  ret <4 x i32> %tmp3
+}
+
+define <8 x i16> @neon.ushl8_noext_constant_shift(<8 x i16>* %A) nounwind {
+; CHECK-LABEL: neon.ushl8_noext_constant_shift
+; CHECK: ldr q0, [x0]
+; CHECK-NEXT: movi.8h v1, #1
+; CHECK-NEXT: ushl.8h v0, v0, v1
+; CHECK-NEXT: ret
+  %tmp1 = load <8 x i16>, <8 x i16>* %A
+  %tmp3 = call <8 x i16> @llvm.aarch64.neon.ushl.v8i16(<8 x i16> %tmp1, <8 x i16> <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>)
+  ret <8 x i16> %tmp3
+}
+
+define <4 x i32> @neon.ushll4s_constant_shift(<4 x i16>* %A) nounwind {
+;CHECK-LABEL: neon.ushll4s_constant_shift
+;CHECK: ushll.4s v0, {{v[0-9]+}}, #1
+  %tmp1 = load <4 x i16>, <4 x i16>* %A
+  %tmp2 = zext <4 x i16> %tmp1 to <4 x i32>
+  %tmp3 = call <4 x i32> @llvm.aarch64.neon.ushl.v4i32(<4 x i32> %tmp2, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
+  ret <4 x i32> %tmp3
+}
+
+; FIXME: unnecessary ushll.4s v0, v0, #0?
+define <4 x i32> @neon.ushll4s_neg_constant_shift(<4 x i16>* %A) nounwind {
+; CHECK-LABEL: neon.ushll4s_neg_constant_shift
+; CHECK: movi.2d v1, #0xffffffffffffffff
+; CHECK: ushll.4s v0, v0, #0
+; CHECK: ushl.4s v0, v0, v1
+  %tmp1 = load <4 x i16>, <4 x i16>* %A
+  %tmp2 = zext <4 x i16> %tmp1 to <4 x i32>
+  %tmp3 = call <4 x i32> @llvm.aarch64.neon.ushl.v4i32(<4 x i32> %tmp2, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>)
+  ret <4 x i32> %tmp3
+}
+
+; FIXME: should be constant folded.
+define <4 x i32> @neon.ushll4s_constant_fold() nounwind {
+; CHECK-LABEL: neon.ushll4s_constant_fold
+; CHECK: movi.4s v1, #1
+; CHECK-NEXT: ushl.4s v0, v0, v1
+;
+  %tmp3 = call <4 x i32> @llvm.aarch64.neon.ushl.v4i32(<4 x i32> <i32 0, i32 1, i32 2, i32 3>, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
+  ret <4 x i32> %tmp3
+}
+
+define <2 x i64> @neon.ushll2d_constant_shift(<2 x i32>* %A) nounwind {
+;CHECK-LABEL: neon.ushll2d_constant_shift
+;CHECK: ushll.2d v0, {{v[0-9]+}}, #1
+  %tmp1 = load <2 x i32>, <2 x i32>* %A
+  %tmp2 = zext <2 x i32> %tmp1 to <2 x i64>
+  %tmp3 = call <2 x i64> @llvm.aarch64.neon.ushl.v2i64(<2 x i64> %tmp2, <2 x i64> <i64 1, i64 1>)
+  ret <2 x i64> %tmp3
+}
+
 define <8 x i16> @sshll8h(<8 x i8>* %A) nounwind {
 ;CHECK-LABEL: sshll8h:
 ;CHECK: sshll.8h v0, {{v[0-9]+}}, #1
   %tmp1 = load <8 x i8>, <8 x i8>* %A
   %tmp2 = sext <8 x i8> %tmp1 to <8 x i16>
   %tmp3 = shl <8 x i16> %tmp2, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
   ret <8 x i16> %tmp3
 }

-define <4 x i32> @sshll4s(<4 x i16>* %A) nounwind {
-;CHECK-LABEL: sshll4s:
+define <2 x i64> @sshll2d(<2 x i32>* %A) nounwind {
+;CHECK-LABEL: sshll2d:
+;CHECK: sshll.2d v0, {{v[0-9]+}}, #1
+  %tmp1 = load <2 x i32>, <2 x i32>* %A
+  %tmp2 = sext <2 x i32> %tmp1 to <2 x i64>
+  %tmp3 = shl <2 x i64> %tmp2, <i64 1, i64 1>
+  ret <2 x i64> %tmp3
+}
+
+declare <16 x i8> @llvm.aarch64.neon.sshl.v16i8(<16 x i8>, <16 x i8>)
+declare <8 x i16> @llvm.aarch64.neon.sshl.v8i16(<8 x i16>, <8 x i16>)
+declare <4 x i32> @llvm.aarch64.neon.sshl.v4i32(<4 x i32>, <4 x i32>)
+declare <2 x i64> @llvm.aarch64.neon.sshl.v2i64(<2 x i64>, <2 x i64>)
+
+define <16 x i8> @neon.sshl16b_constant_shift(<16 x i8>* %A) nounwind {
+;CHECK-LABEL: neon.sshl16b_constant_shift
+;CHECK: sshl.16b {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
+  %tmp1 = load <16 x i8>, <16 x i8>* %A
+  %tmp2 = call <16 x i8> @llvm.aarch64.neon.sshl.v16i8(<16 x i8> %tmp1, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>)
+  ret <16 x i8> %tmp2
+}
+
+define <8 x i16> @neon.sshll8h_constant_shift(<8 x i8>* %A) nounwind {
+;CHECK-LABEL: neon.sshll8h_constant_shift
+;CHECK: sshll.8h v0, {{v[0-9]+}}, #1
+  %tmp1 = load <8 x i8>, <8 x i8>* %A
+  %tmp2 = sext <8 x i8> %tmp1 to <8 x i16>
+  %tmp3 = call <8 x i16> @llvm.aarch64.neon.sshl.v8i16(<8 x i16> %tmp2, <8 x i16> <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>)
+  ret <8 x i16> %tmp3
+}
+
+define <4 x i32> @neon.sshl4s_wrong_ext_constant_shift(<4 x i8>* %A) nounwind {
+;CHECK-LABEL: neon.sshl4s_wrong_ext_constant_shift
+;CHECK: sshl.4s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
+  %tmp1 = load <4 x i8>, <4 x i8>* %A
+  %tmp2 = sext <4 x i8> %tmp1 to <4 x i32>
+  %tmp3 = call <4 x i32> @llvm.aarch64.neon.sshl.v4i32(<4 x i32> %tmp2, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
+  ret <4 x i32> %tmp3
+}
+
+
+define <4 x i32> @neon.sshll4s_constant_shift(<4 x i16>* %A) nounwind {
+;CHECK-LABEL: neon.sshll4s_constant_shift
 ;CHECK: sshll.4s v0, {{v[0-9]+}}, #1
   %tmp1 = load <4 x i16>, <4 x i16>* %A
   %tmp2 = sext <4 x i16> %tmp1 to <4 x i32>
-  %tmp3 = shl <4 x i32> %tmp2, <i32 1, i32 1, i32 1, i32 1>
+  %tmp3 = call <4 x i32> @llvm.aarch64.neon.sshl.v4i32(<4 x i32> %tmp2, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
   ret <4 x i32> %tmp3
 }

-define <2 x i64> @sshll2d(<2 x i32>* %A) nounwind {
-;CHECK-LABEL: sshll2d:
+define <4 x i32> @neon.sshll4s_neg_constant_shift(<4 x i16>* %A) nounwind {
+;CHECK-LABEL: neon.sshll4s_neg_constant_shift
+;CHECK: movi.2d v1, #0xffffffffffffffff
+;CHECK: sshll.4s v0, v0, #0
+;CHECK: sshl.4s v0, v0, v1
+  %tmp1 = load <4 x i16>, <4 x i16>* %A
+  %tmp2 = sext <4 x i16> %tmp1 to <4 x i32>
+  %tmp3 = call <4 x i32> @llvm.aarch64.neon.sshl.v4i32(<4 x i32> %tmp2, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>)
+  ret <4 x i32> %tmp3
+}
+
+; FIXME: should be constant folded.
+define <4 x i32> @neon.sshl4s_constant_fold() nounwind {
+;CHECK-LABEL: neon.sshl4s_constant_fold
+;CHECK: sshl.4s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
+  %tmp3 = call <4 x i32> @llvm.aarch64.neon.sshl.v4i32(<4 x i32> <i32 0, i32 1, i32 2, i32 3>, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
+  ret <4 x i32> %tmp3
+}
+
+define <4 x i32> @neon.sshl4s_no_fold(<4 x i32>* %A) nounwind {
+;CHECK-LABEL: neon.sshl4s_no_fold
+;CHECK: sshl.4s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
+  %tmp1 = load <4 x i32>, <4 x i32>* %A
+  %tmp3 = call <4 x i32> @llvm.aarch64.neon.sshl.v4i32(<4 x i32> %tmp1, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
+  ret <4 x i32> %tmp3
+}
+
+define <2 x i64> @neon.sshll2d_constant_shift(<2 x i32>* %A) nounwind {
+;CHECK-LABEL: neon.sshll2d_constant_shift
 ;CHECK: sshll.2d v0, {{v[0-9]+}}, #1
   %tmp1 = load <2 x i32>, <2 x i32>* %A
   %tmp2 = sext <2 x i32> %tmp1 to <2 x i64>
-  %tmp3 = shl <2 x i64> %tmp2, <i64 1, i64 1>
+  %tmp3 = call <2 x i64> @llvm.aarch64.neon.sshl.v2i64(<2 x i64> %tmp2, <2 x i64> <i64 1, i64 1>)
+  ret <2 x i64> %tmp3
+}
+
+; FIXME: should be constant folded.
+define <2 x i64> @neon.sshl2d_constant_fold() nounwind {
+;CHECK-LABEL: neon.sshl2d_constant_fold
+;CHECK: sshl.2d {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
+  %tmp3 = call <2 x i64> @llvm.aarch64.neon.sshl.v2i64(<2 x i64> <i64 99, i64 1000>, <2 x i64> <i64 1, i64 1>)
+  ret <2 x i64> %tmp3
+}
+
+define <2 x i64> @neon.sshl2d_no_fold(<2 x i64>* %A) nounwind {
+;CHECK-LABEL: neon.sshl2d_no_fold
+;CHECK: sshl.2d {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
+  %tmp2 = load <2 x i64>, <2 x i64>* %A
+  %tmp3 = call <2 x i64> @llvm.aarch64.neon.sshl.v2i64(<2 x i64> %tmp2, <2 x i64> <i64 2, i64 2>)
   ret <2 x i64> %tmp3
 }

 define <8 x i16> @sshll2_8h(<16 x i8>* %A) nounwind {
 ;CHECK-LABEL: sshll2_8h:
 ;CHECK: sshll.8h v0, {{v[0-9]+}}, #1
   %load1 = load <16 x i8>, <16 x i8>* %A
   %tmp1 = shufflevector <16 x i8> %load1, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
@@ 700 unchanged lines omitted @@
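The FIXME tests neon.ushll4s_constant_fold, neon.sshl4s_constant_fold and neon.sshl2d_constant_fold note that calls with all-constant operands should eventually be constant folded. As a hedged illustration of the values such a fold would produce, here is a hypothetical `foldConstantShl` helper (not an LLVM API). It assumes every shift amount is non-negative and smaller than the lane width, the only case in which ushl and sshl agree with a plain left shift.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical sketch: fold a lane-wise left shift of constant vectors.
// Assumes 0 <= Amts[I] < bit width of T; the shift is done in unsigned
// arithmetic so that negative lanes are handled without undefined behaviour.
template <typename T, std::size_t N>
std::array<T, N> foldConstantShl(const std::array<T, N> &Vals,
                                 const std::array<T, N> &Amts) {
  using U = std::make_unsigned_t<T>;
  std::array<T, N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = static_cast<T>(static_cast<U>(Vals[I]) << Amts[I]);
  return R;
}
```

For example, <0, 1, 2, 3> shifted by a splat of 1 gives <0, 2, 4, 6>, and <99, 1000> gives <198, 2000>.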