This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Basic demand elements for some intrinsics
Closed, Public

Authored by dmgreen on Jan 12 2022, 1:50 AM.

Details

Reviewers

jaykang10
samtebbs
sdesmalen
david-arm
MattDevereau

Commits

rG61888d97f67d: [AArch64] Basic demand elements for some intrinsics

Summary

Many NEON intrinsics operate lane-wise, so an element that is not demanded in the output is not demanded in the input either. This patch teaches that to AArch64TTIImpl::simplifyDemandedVectorEltsIntrinsic for some simple single-input truncating intrinsics, which can help remove unnecessary instructions in the final result.
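
As a rough illustration of the lane-wise property (a standalone sketch, not code from the patch): plain C++ containers and a 64-bit mask stand in for LLVM's vector values and APInt, and `saturatingNarrow`, `demandedInputLanes` and `agreeOnLanes` are hypothetical names chosen for this example. The model narrows each lane independently, in the spirit of sqxtn, so the demanded input lanes are simply the demanded output lanes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical model of a lane-wise saturating narrow: output lane i is
// computed from input lane i only.
std::vector<int16_t> saturatingNarrow(const std::vector<int32_t> &in) {
  const int32_t lo = std::numeric_limits<int16_t>::min();
  const int32_t hi = std::numeric_limits<int16_t>::max();
  std::vector<int16_t> out;
  out.reserve(in.size());
  for (int32_t v : in)
    out.push_back(static_cast<int16_t>(std::max(lo, std::min(v, hi))));
  return out;
}

// For a lane-wise operation the demanded-lane mask passes through unchanged.
uint64_t demandedInputLanes(uint64_t demandedOutputLanes) {
  return demandedOutputLanes;
}

// True if a and b hold the same value in every lane whose bit is set in mask.
bool agreeOnLanes(const std::vector<int16_t> &a,
                  const std::vector<int16_t> &b, uint64_t mask) {
  assert(a.size() == b.size() && a.size() <= 64);
  for (std::size_t i = 0; i < a.size(); ++i)
    if (((mask >> i) & 1) && a[i] != b[i])
      return false;
  return true;
}
```

If only lanes 0 and 1 of the result are demanded, overwriting input lanes 2 and 3 with any value leaves the demanded part of the result unchanged, which is why the operand feeding the intrinsic is free to be simplified.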

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Jan 12 2022, 1:50 AM

Herald added subscribers: hiraditya, kristof.beyls. Jan 12 2022, 1:50 AM

dmgreen requested review of this revision. Jan 12 2022, 1:50 AM

Herald added a project: Restricted Project. Jan 12 2022, 1:50 AM

Harbormaster completed remote builds in B142870: Diff 399241. Jan 12 2022, 1:50 AM

The change looks sensible to me, although I was a little confused by the commit message? It looks like what your patch is doing is just adding an AArch64 version of simplifyDemandedVectorEltsIntrinsic, that allows us to potentially simplify the input operand to the intrinsic based on the demanded elements. So, for example, if we know that we are only going to use the first N elements of the intrinsic result we can use that information to simplify the intrinsic operand too.

dmgreen edited the summary of this revision. Jan 13 2022, 12:35 AM

> It looks like what your patch is doing is just adding an AArch64 version of simplifyDemandedVectorEltsIntrinsic, that allows us to potentially simplify the input operand to the intrinsic based on the demanded elements. So, for example, if we know that we are only going to use the first N elements of the intrinsic result we can use that information to simplify the intrinsic operand too.

Yep. I had written "single element" where I meant "single source". I can see that being confusing, but you have the right idea.

We could do it for binops too, but I've not looked at those here, just truncates with a single input.
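
To make the effect on the rshrn-style cases in demandelts.ll concrete, here is a small standalone model of how the demanded lanes flow backwards through shuffle, intrinsic, shuffle. It is only a sketch under stated assumptions: a 64-bit mask stands in for APInt, and `kUndefLane`, `lanesReadByShuffle` and `secondOperandUnused` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical marker for an undef entry in a shuffle mask.
constexpr int kUndefLane = -1;

// Given a shuffle mask and the demanded lanes of the shuffle result, return
// the lanes of concat(op0, op1) that the shuffle actually reads.
uint64_t lanesReadByShuffle(const std::vector<int> &mask,
                            uint64_t demandedResultLanes) {
  uint64_t read = 0;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (!((demandedResultLanes >> i) & 1) || mask[i] == kUndefLane)
      continue;
    assert(mask[i] >= 0 && mask[i] < 64);
    read |= uint64_t{1} << mask[i];
  }
  return read;
}

// True if none of the read lanes come from the second shuffle operand,
// where op0 contributes the first op0Lanes lanes of the concatenation.
bool secondOperandUnused(uint64_t readLanes, unsigned op0Lanes) {
  return (readLanes >> op0Lanes) == 0;
}
```

With the outer mask <0, 1, undef, undef> only lanes 0 and 1 of the intrinsic result are read. Because the intrinsic is lane-wise, the same two lanes are demanded from its operand, and the inner shuffle <0, 1, 2, 3> of two 2-lane vectors then reads nothing from its second operand, matching the `poison` operand in the updated CHECK lines.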

LGTM!

This revision is now accepted and ready to land. Jan 13 2022, 12:46 AM

This revision was landed with ongoing or failed builds. Jan 13 2022, 3:53 AM

Closed by commit rG61888d97f67d: [AArch64] Basic demand elements for some intrinsics (authored by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG61888d97f67d: [AArch64] Basic demand elements for some intrinsics.

Revision Contents

Path                                                      Size
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h      6 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp    26 lines
llvm/test/Transforms/InstCombine/AArch64/demandelts.ll    23 lines

Diff 399627

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

   }

   InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
                                         TTI::TargetCostKind CostKind);

   Optional<Instruction *> instCombineIntrinsic(InstCombiner &IC,
                                                IntrinsicInst &II) const;

+  Optional<Value *> simplifyDemandedVectorEltsIntrinsic(
+      InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
+      APInt &UndefElts2, APInt &UndefElts3,
+      std::function<void(Instruction *, unsigned, APInt, APInt &)>
+          SimplifyAndSetOp) const;
+
   TypeSize getRegisterBitWidth(TargetTransformInfo::RegisterKind K) const {
     switch (K) {
     case TargetTransformInfo::RGK_Scalar:
       return TypeSize::getFixed(64);
     case TargetTransformInfo::RGK_FixedWidthVector:
       if (ST->hasSVE())
         return TypeSize::getFixed(
             std::max(ST->getMinSVEVectorSizeInBits(), 128u));

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

   case Intrinsic::aarch64_sve_st1:
     return instCombineSVEST1(IC, II, DL);
   case Intrinsic::aarch64_sve_sdiv:
     return instCombineSVESDIV(IC, II);
   }

   return None;
 }

+Optional<Value *> AArch64TTIImpl::simplifyDemandedVectorEltsIntrinsic(
+    InstCombiner &IC, IntrinsicInst &II, APInt OrigDemandedElts,
+    APInt &UndefElts, APInt &UndefElts2, APInt &UndefElts3,
+    std::function<void(Instruction *, unsigned, APInt, APInt &)>
+        SimplifyAndSetOp) const {
+  switch (II.getIntrinsicID()) {
+  default:
+    break;
+  case Intrinsic::aarch64_neon_fcvtxn:
+  case Intrinsic::aarch64_neon_rshrn:
+  case Intrinsic::aarch64_neon_sqrshrn:
+  case Intrinsic::aarch64_neon_sqrshrun:
+  case Intrinsic::aarch64_neon_sqshrn:
+  case Intrinsic::aarch64_neon_sqshrun:
+  case Intrinsic::aarch64_neon_sqxtn:
+  case Intrinsic::aarch64_neon_sqxtun:
+  case Intrinsic::aarch64_neon_uqrshrn:
+  case Intrinsic::aarch64_neon_uqshrn:
+  case Intrinsic::aarch64_neon_uqxtn:
+    SimplifyAndSetOp(&II, 0, OrigDemandedElts, UndefElts);
+    break;
+  }
+
+  return None;
+}
+
 bool AArch64TTIImpl::isWideningInstruction(Type *DstTy, unsigned Opcode,
                                            ArrayRef<const Value *> Args) {

   // A helper that returns a vector type from the given type. The number of
   // elements in type Ty determine the vector width.
   auto toVectorTy = [&](Type *ArgTy) {
     return VectorType::get(ArgTy->getScalarType(),
                            cast<VectorType>(DstTy)->getElementCount());

llvm/test/Transforms/InstCombine/AArch64/demandelts.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -S -instcombine -mtriple aarch64-none-eabi < %s | FileCheck %s

 define <2 x float> @fcvtxn(<2 x double> %d1) {
 ; CHECK-LABEL: @fcvtxn(
-; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x double> [[D1:%.*]], <2 x double> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[I:%.*]] = call <2 x float> @llvm.aarch64.neon.fcvtxn.v2f32.v2f64(<2 x double> [[A]])
+; CHECK-NEXT: [[I:%.*]] = call <2 x float> @llvm.aarch64.neon.fcvtxn.v2f32.v2f64(<2 x double> [[D1:%.*]])
 ; CHECK-NEXT: [[S:%.*]] = shufflevector <2 x float> [[I]], <2 x float> undef, <2 x i32> <i32 0, i32 undef>
 ; CHECK-NEXT: ret <2 x float> [[S]]
 ;
   %a = shufflevector <2 x double> %d1, <2 x double> undef, <2 x i32> <i32 0, i32 0>
   %i = call <2 x float> @llvm.aarch64.neon.fcvtxn.v2f32.v2f64(<2 x double> %a)
   %s = shufflevector <2 x float> %i, <2 x float> undef, <2 x i32> <i32 0, i32 undef>
   ret <2 x float> %s
 }

 define <4 x i16> @rshrn(<2 x i32> %d1, <2 x i32> %d2) {
 ; CHECK-LABEL: @rshrn(
-; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> [[D2:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: [[I:%.*]] = call <4 x i16> @llvm.aarch64.neon.rshrn.v4i16(<4 x i32> [[A]], i32 9)
 ; CHECK-NEXT: [[S:%.*]] = shufflevector <4 x i16> [[I]], <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: ret <4 x i16> [[S]]
 ;
   %a = shufflevector <2 x i32> %d1, <2 x i32> %d2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %i = call <4 x i16> @llvm.aarch64.neon.rshrn.v4i16(<4 x i32> %a, i32 9)
   %s = shufflevector <4 x i16> %i, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
   ret <4 x i16> %s
 }

 define <4 x i16> @sqrshrn(<2 x i32> %d1, <2 x i32> %d2) {
 ; CHECK-LABEL: @sqrshrn(
-; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> [[D2:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: [[I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrshrn.v4i16(<4 x i32> [[A]], i32 9)
 ; CHECK-NEXT: [[S:%.*]] = shufflevector <4 x i16> [[I]], <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: ret <4 x i16> [[S]]
 ;
   %a = shufflevector <2 x i32> %d1, <2 x i32> %d2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %i = call <4 x i16> @llvm.aarch64.neon.sqrshrn.v4i16(<4 x i32> %a, i32 9)
   %s = shufflevector <4 x i16> %i, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
   ret <4 x i16> %s
 }

 define <4 x i16> @sqrshrun(<2 x i32> %d1, <2 x i32> %d2) {
 ; CHECK-LABEL: @sqrshrun(
-; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> [[D2:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: [[I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrshrun.v4i16(<4 x i32> [[A]], i32 9)
 ; CHECK-NEXT: [[S:%.*]] = shufflevector <4 x i16> [[I]], <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: ret <4 x i16> [[S]]
 ;
   %a = shufflevector <2 x i32> %d1, <2 x i32> %d2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %i = call <4 x i16> @llvm.aarch64.neon.sqrshrun.v4i16(<4 x i32> %a, i32 9)
   %s = shufflevector <4 x i16> %i, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
   ret <4 x i16> %s
 }

 define <4 x i16> @sqshrn(<2 x i32> %d1, <2 x i32> %d2) {
 ; CHECK-LABEL: @sqshrn(
-; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> [[D2:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: [[I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqshrn.v4i16(<4 x i32> [[A]], i32 9)
 ; CHECK-NEXT: [[S:%.*]] = shufflevector <4 x i16> [[I]], <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: ret <4 x i16> [[S]]
 ;
   %a = shufflevector <2 x i32> %d1, <2 x i32> %d2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %i = call <4 x i16> @llvm.aarch64.neon.sqshrn.v4i16(<4 x i32> %a, i32 9)
   %s = shufflevector <4 x i16> %i, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
   ret <4 x i16> %s
 }

 define <4 x i16> @sqshrun(<2 x i32> %d1, <2 x i32> %d2) {
 ; CHECK-LABEL: @sqshrun(
-; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> [[D2:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: [[I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqshrun.v4i16(<4 x i32> [[A]], i32 9)
 ; CHECK-NEXT: [[S:%.*]] = shufflevector <4 x i16> [[I]], <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: ret <4 x i16> [[S]]
 ;
   %a = shufflevector <2 x i32> %d1, <2 x i32> %d2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %i = call <4 x i16> @llvm.aarch64.neon.sqshrun.v4i16(<4 x i32> %a, i32 9)
   %s = shufflevector <4 x i16> %i, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
   ret <4 x i16> %s
 }

 define <4 x i16> @sqxtn(<2 x i32> %d1, <2 x i32> %d2) {
 ; CHECK-LABEL: @sqxtn(
-; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> [[D2:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: [[I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqxtn.v4i16(<4 x i32> [[A]])
 ; CHECK-NEXT: [[S:%.*]] = shufflevector <4 x i16> [[I]], <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: ret <4 x i16> [[S]]
 ;
   %a = shufflevector <2 x i32> %d1, <2 x i32> %d2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %i = call <4 x i16> @llvm.aarch64.neon.sqxtn.v4i16(<4 x i32> %a)
   %s = shufflevector <4 x i16> %i, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
   ret <4 x i16> %s
 }

 define <4 x i16> @sqxtun(<2 x i32> %d1, <2 x i32> %d2) {
 ; CHECK-LABEL: @sqxtun(
-; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> [[D2:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: [[I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqxtun.v4i16(<4 x i32> [[A]])
 ; CHECK-NEXT: [[S:%.*]] = shufflevector <4 x i16> [[I]], <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: ret <4 x i16> [[S]]
 ;
   %a = shufflevector <2 x i32> %d1, <2 x i32> %d2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %i = call <4 x i16> @llvm.aarch64.neon.sqxtun.v4i16(<4 x i32> %a)
   %s = shufflevector <4 x i16> %i, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
   ret <4 x i16> %s
 }

 define <4 x i16> @uqrshrn(<2 x i32> %d1, <2 x i32> %d2) {
 ; CHECK-LABEL: @uqrshrn(
-; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> [[D2:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: [[I:%.*]] = call <4 x i16> @llvm.aarch64.neon.uqrshrn.v4i16(<4 x i32> [[A]], i32 9)
 ; CHECK-NEXT: [[S:%.*]] = shufflevector <4 x i16> [[I]], <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: ret <4 x i16> [[S]]
 ;
   %a = shufflevector <2 x i32> %d1, <2 x i32> %d2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %i = call <4 x i16> @llvm.aarch64.neon.uqrshrn.v4i16(<4 x i32> %a, i32 9)
   %s = shufflevector <4 x i16> %i, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
   ret <4 x i16> %s
 }

 define <4 x i16> @uqshrn(<2 x i32> %d1, <2 x i32> %d2) {
 ; CHECK-LABEL: @uqshrn(
-; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> [[D2:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: [[I:%.*]] = call <4 x i16> @llvm.aarch64.neon.uqshrn.v4i16(<4 x i32> [[A]], i32 9)
 ; CHECK-NEXT: [[S:%.*]] = shufflevector <4 x i16> [[I]], <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: ret <4 x i16> [[S]]
 ;
   %a = shufflevector <2 x i32> %d1, <2 x i32> %d2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %i = call <4 x i16> @llvm.aarch64.neon.uqshrn.v4i16(<4 x i32> %a, i32 9)
   %s = shufflevector <4 x i16> %i, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
   ret <4 x i16> %s
 }

 define <4 x i16> @uqxtn(<2 x i32> %d1, <2 x i32> %d2) {
 ; CHECK-LABEL: @uqxtn(
-; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> [[D2:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[A:%.*]] = shufflevector <2 x i32> [[D1:%.*]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: [[I:%.*]] = call <4 x i16> @llvm.aarch64.neon.uqxtn.v4i16(<4 x i32> [[A]])
 ; CHECK-NEXT: [[S:%.*]] = shufflevector <4 x i16> [[I]], <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ; CHECK-NEXT: ret <4 x i16> [[S]]
 ;
   %a = shufflevector <2 x i32> %d1, <2 x i32> %d2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %i = call <4 x i16> @llvm.aarch64.neon.uqxtn.v4i16(<4 x i32> %a)
   %s = shufflevector <4 x i16> %i, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
   ret <4 x i16> %s
[14 more lines not shown]