This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1230	Do you mean SRSHL instead of SQSHL in all the comments?
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-abs-srshl.ll
43	nit: Maybe it's worth splitting the tests out into Positive and Negative tests, i.e. top half positive, bottom half negative? I think that makes it a bit easier to see what's going on.

In D125233#3503702, @david-arm wrote:

Hi @bsmith, this looks like a sensible optimisation! I suppose we can also do something similar when the input is an `and` too? i.e.

%mask = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 0x7FFF)
%and = tail call <vscale x 8 x i16> @llvm.aarch64.sve.and.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> %mask)
%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %and, <vscale x 8 x i16> %splat)

I think it's probably not worth handling a case like this; the abs case is one we have explicitly seen in real-world code. We could in theory go further and add even more cases (shr, for example), but since there is no obvious generic way to do this, we'd have to add lots of special cases for each intrinsic we might care about, all just in case someone happened to write or end up with suboptimal ACLE code (srshl is not compiler generated; it only comes from ACLE intrinsics). I think it's best to extend this only if and when we need it.

Fix typos in comments, SQSHL -> SRSHL.
Rearrange tests to have positive cases first.

bsmith added a comment. May 11 2022, 8:07 AM

This comment was removed by bsmith.

Harbormaster completed remote builds in B163907: Diff 428668. May 11 2022, 8:51 AM

paulwalker-arm added inline comments. May 16 2022, 6:15 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1262–1265	I don't suppose `m_NonNegative()` works, or can be made to work, here? Ideally I'd hope for something similar in the higher up code paths so that perhaps we can catch more cases for free but I think realistically this is the only place where it might work today out-the-box.

Use m_NonNegative instead of manually checking for splats and their values

Harbormaster completed remote builds in B164638: Diff 429708. May 16 2022, 8:08 AM

paulwalker-arm accepted this revision. May 17 2022, 10:17 AM

paulwalker-arm added inline comments.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1234–1235	I think you can use `IRBuilder<> Builder(II);` here.

This revision is now accepted and ready to land. May 17 2022, 10:17 AM

This revision was landed with ongoing or failed builds. May 19 2022, 7:08 AM

Closed by commit rG5f4541fefbfc: [AArch64][SVE] Convert SRSHL to LSL when the fed from an ABS intrinsic (authored by bsmith).

This revision was automatically updated to reflect the committed changes.

bsmith added a commit: rG5f4541fefbfc: [AArch64][SVE] Convert SRSHL to LSL when the fed from an ABS intrinsic.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	38 lines
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-abs-srshl.ll	150 lines

Diff 430664

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

[... 1,221 lines not shown ...]
 static Optional<Instruction *> instCombineMaxMinNM(InstCombiner &IC,
                                                    IntrinsicInst &II) {
   Value *A = II.getArgOperand(0);
   Value *B = II.getArgOperand(1);
   if (A == B)
     return IC.replaceInstUsesWith(II, A);

   return None;
 }

+static Optional<Instruction *> instCombineSVESrshl(InstCombiner &IC,
+                                                   IntrinsicInst &II) {
+  IRBuilder<> Builder(&II);
+  Value *Pred = II.getOperand(0);
+  Value *Vec = II.getOperand(1);
+  Value *Shift = II.getOperand(2);
+
+  // Convert SRSHL into the simpler LSL intrinsic when fed by an ABS intrinsic.
+  Value *AbsPred, *MergedValue;
+  if (!match(Vec, m_Intrinsic<Intrinsic::aarch64_sve_sqabs>(
+                      m_Value(MergedValue), m_Value(AbsPred), m_Value())) &&
+      !match(Vec, m_Intrinsic<Intrinsic::aarch64_sve_abs>(
+                      m_Value(MergedValue), m_Value(AbsPred), m_Value())))
+
+    return None;
+
+  // Transform is valid if any of the following are true:
+  // * The ABS merge value is an undef or non-negative
+  // * The ABS predicate is all active
+  // * The ABS predicate and the SRSHL predicates are the same
+  if (!isa<UndefValue>(MergedValue) &&
+      !match(MergedValue, m_NonNegative()) &&
+      AbsPred != Pred && !isAllActivePredicate(AbsPred))
+    return None;
+
+  // Only valid when the shift amount is non-negative, otherwise the rounding
+  // behaviour of SRSHL cannot be ignored.
+  if (!match(Shift, m_NonNegative()))
+    return None;
+
+  auto LSL = Builder.CreateIntrinsic(Intrinsic::aarch64_sve_lsl, {II.getType()},
+                                     {Pred, Vec, Shift});
+
+  return IC.replaceInstUsesWith(II, LSL);
+}

 Optional<Instruction *>
 AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
                                      IntrinsicInst &II) const {
   Intrinsic::ID IID = II.getIntrinsicID();
   switch (IID) {
   default:
     break;
   case Intrinsic::aarch64_neon_fmaxnm:
[... 51 lines not shown ...]
   case Intrinsic::aarch64_sve_ld1:
     return instCombineSVELD1(IC, II, DL);
   case Intrinsic::aarch64_sve_st1:
     return instCombineSVEST1(IC, II, DL);
   case Intrinsic::aarch64_sve_sdiv:
     return instCombineSVESDIV(IC, II);
   case Intrinsic::aarch64_sve_sel:
     return instCombineSVESel(IC, II);
+  case Intrinsic::aarch64_sve_srshl:
+    return instCombineSVESrshl(IC, II);
   }

   return None;
 }

 Optional<Value *> AArch64TTIImpl::simplifyDemandedVectorEltsIntrinsic(
     InstCombiner &IC, IntrinsicInst &II, APInt OrigDemandedElts,
     APInt &UndefElts, APInt &UndefElts2, APInt &UndefElts3,
[... 1,557 lines not shown ...]
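The shift-amount guard in instCombineSVESrshl rests on SRSHL acting as a plain left shift for non-negative amounts and as a rounding right shift otherwise. The sketch below is a hypothetical scalar model of a single 16-bit lane, not LLVM or hardware code: the function names are invented for illustration, lanes are passed as raw bit patterns, and only shift amounts with magnitude below 16 are modelled.

```cpp
#include <cstdint>

// Hypothetical model of LSL on one 16-bit lane. The shift amount is treated
// as unsigned, so amounts of 16 or more clear the lane.
uint16_t lslLane(uint16_t Bits, uint16_t Amount) {
  if (Amount >= 16)
    return 0;
  return static_cast<uint16_t>(static_cast<uint32_t>(Bits) << Amount);
}

// Hypothetical model of SRSHL on one 16-bit lane, written in signed
// arithmetic so it does not reuse lslLane. Assumes -16 < Shift < 16.
uint16_t srshlLane(uint16_t Bits, int16_t Shift) {
  // Reinterpret the lane as a signed 16-bit value.
  int64_t X = Bits >= 0x8000 ? static_cast<int64_t>(Bits) - 0x10000
                             : static_cast<int64_t>(Bits);
  if (Shift >= 0) {
    // Left shift: multiply by 2^Shift and keep the low 16 bits.
    return static_cast<uint16_t>(X * (int64_t{1} << Shift));
  }
  // Rounding right shift by the negated amount: floor((X + 2^(N-1)) / 2^N).
  int64_t N = -static_cast<int64_t>(Shift);
  int64_t Num = X + (int64_t{1} << (N - 1));
  int64_t Den = int64_t{1} << N;
  int64_t Q = Num / Den; // truncates toward zero
  if (Num % Den != 0 && Num < 0)
    --Q; // adjust truncation to floor for negative numerators
  return static_cast<uint16_t>(Q);
}

// Returns true if the two models agree for every non-negative lane value
// and every shift amount in [0, 15].
bool agreeForNonNegativeShifts() {
  for (uint32_t X = 0; X <= 0x7FFF; ++X)
    for (int16_t S = 0; S < 16; ++S)
      if (srshlLane(static_cast<uint16_t>(X), S) !=
          lslLane(static_cast<uint16_t>(X), static_cast<uint16_t>(S)))
        return false;
  return true;
}
```

In this model a shift of 2 gives the same result from both functions, while a shift of -2 does not: srshlLane rounds and shifts right, whereas lslLane sees a very large unsigned amount and clears the lane. That difference is what the m_NonNegative() check on the shift operand guards against.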

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-abs-srshl.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S -passes=instcombine < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				define <vscale x 8 x i16> @srshl_abs_undef_merge(<vscale x 8 x i16> %a, <vscale x 8 x i1> %pg, <vscale x 8 x i1> %pg2) #0 {
				; CHECK-LABEL: @srshl_abs_undef_merge(
				; CHECK-NEXT: [[ABS:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> [[PG:%.*]], <vscale x 8 x i16> [[A:%.*]])
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 8 x i16> @llvm.aarch64.sve.lsl.nxv8i16(<vscale x 8 x i1> [[PG2:%.*]], <vscale x 8 x i16> [[ABS]], <vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 2, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer))
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP1]]
				;
				%abs = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
				%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg2, <vscale x 8 x i16> %abs, <vscale x 8 x i16> %splat)
				ret <vscale x 8 x i16> %shr
				}

				define <vscale x 8 x i16> @srshl_abs_zero_merge(<vscale x 8 x i16> %a, <vscale x 8 x i1> %pg, <vscale x 8 x i1> %pg2) #0 {
				; CHECK-LABEL: @srshl_abs_zero_merge(
				; CHECK-NEXT: [[ABS:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> [[PG:%.*]], <vscale x 8 x i16> [[A:%.*]])
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 8 x i16> @llvm.aarch64.sve.lsl.nxv8i16(<vscale x 8 x i1> [[PG2:%.*]], <vscale x 8 x i16> [[ABS]], <vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 2, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer))
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP1]]
				;
				%abs = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
				%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg2, <vscale x 8 x i16> %abs, <vscale x 8 x i16> %splat)
				ret <vscale x 8 x i16> %shr
				}

				define <vscale x 8 x i16> @srshl_abs_positive_merge(<vscale x 8 x i16> %a, <vscale x 8 x i1> %pg, <vscale x 8 x i1> %pg2) #0 {
				; CHECK-LABEL: @srshl_abs_positive_merge(
				; CHECK-NEXT: [[ABS:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 2, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer), <vscale x 8 x i1> [[PG:%.*]], <vscale x 8 x i16> [[A:%.*]])
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 8 x i16> @llvm.aarch64.sve.lsl.nxv8i16(<vscale x 8 x i1> [[PG2:%.*]], <vscale x 8 x i16> [[ABS]], <vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 2, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer))
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP1]]
				;
				%absmerge = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
				%abs = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> %absmerge, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
				%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg2, <vscale x 8 x i16> %abs, <vscale x 8 x i16> %splat)
				ret <vscale x 8 x i16> %shr
				}

				define <vscale x 8 x i16> @srshl_abs_all_active_pred(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i1> %pg2) #0 {
				; CHECK-LABEL: @srshl_abs_all_active_pred(
				; CHECK-NEXT: [[PG:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				; CHECK-NEXT: [[ABS:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> [[B:%.*]], <vscale x 8 x i1> [[PG]], <vscale x 8 x i16> [[A:%.*]])
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 8 x i16> @llvm.aarch64.sve.lsl.nxv8i16(<vscale x 8 x i1> [[PG2:%.*]], <vscale x 8 x i16> [[ABS]], <vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 2, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer))
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP1]]
				;
				%pg = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
				%abs = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> %b, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
				%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg2, <vscale x 8 x i16> %abs, <vscale x 8 x i16> %splat)
				ret <vscale x 8 x i16> %shr
				}

				define <vscale x 8 x i16> @srshl_abs_same_pred(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i1> %pg) #0 {
				; CHECK-LABEL: @srshl_abs_same_pred(
				; CHECK-NEXT: [[ABS:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> [[B:%.*]], <vscale x 8 x i1> [[PG:%.*]], <vscale x 8 x i16> [[A:%.*]])
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 8 x i16> @llvm.aarch64.sve.lsl.nxv8i16(<vscale x 8 x i1> [[PG]], <vscale x 8 x i16> [[ABS]], <vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 2, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer))
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP1]]
				;
				%abs = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> %b, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
				%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %abs, <vscale x 8 x i16> %splat)
				ret <vscale x 8 x i16> %shr
				}

				define <vscale x 8 x i16> @srshl_sqabs(<vscale x 8 x i16> %a, <vscale x 8 x i1> %pg, <vscale x 8 x i1> %pg2) #0 {
				; CHECK-LABEL: @srshl_sqabs(
				; CHECK-NEXT: [[ABS:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sqabs.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> [[PG:%.*]], <vscale x 8 x i16> [[A:%.*]])
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 8 x i16> @llvm.aarch64.sve.lsl.nxv8i16(<vscale x 8 x i1> [[PG2:%.*]], <vscale x 8 x i16> [[ABS]], <vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 2, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer))
				; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP1]]
				;
				%abs = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sqabs.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
				%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg2, <vscale x 8 x i16> %abs, <vscale x 8 x i16> %splat)
				ret <vscale x 8 x i16> %shr
				}

				define <vscale x 8 x i16> @srshl_abs_negative_merge(<vscale x 8 x i16> %a, <vscale x 8 x i1> %pg, <vscale x 8 x i1> %pg2) #0 {
				; CHECK-LABEL: @srshl_abs_negative_merge(
				; CHECK-NEXT: [[ABS:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 -1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer), <vscale x 8 x i1> [[PG:%.*]], <vscale x 8 x i16> [[A:%.*]])
				; CHECK-NEXT: [[SHR:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> [[PG2:%.*]], <vscale x 8 x i16> [[ABS]], <vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 2, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer))
				; CHECK-NEXT: ret <vscale x 8 x i16> [[SHR]]
				;
				%absmerge = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 -1)
				%abs = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> %absmerge, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
				%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg2, <vscale x 8 x i16> %abs, <vscale x 8 x i16> %splat)
				ret <vscale x 8 x i16> %shr
				}

				define <vscale x 8 x i16> @srshl_abs_nonconst_merge(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i1> %pg, <vscale x 8 x i1> %pg2) #0 {
				; CHECK-LABEL: @srshl_abs_nonconst_merge(
				; CHECK-NEXT: [[ABS:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> [[B:%.*]], <vscale x 8 x i1> [[PG:%.*]], <vscale x 8 x i16> [[A:%.*]])
				; CHECK-NEXT: [[SHR:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> [[PG2:%.*]], <vscale x 8 x i16> [[ABS]], <vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 2, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer))
				; CHECK-NEXT: ret <vscale x 8 x i16> [[SHR]]
				;
				%abs = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> %b, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
				%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg2, <vscale x 8 x i16> %abs, <vscale x 8 x i16> %splat)
				ret <vscale x 8 x i16> %shr
				}

				define <vscale x 8 x i16> @srshl_abs_not_all_active_pred(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i1> %pg2) #0 {
				; CHECK-LABEL: @srshl_abs_not_all_active_pred(
				; CHECK-NEXT: [[PG:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 8)
				; CHECK-NEXT: [[ABS:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> [[B:%.*]], <vscale x 8 x i1> [[PG]], <vscale x 8 x i16> [[A:%.*]])
				; CHECK-NEXT: [[SHR:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> [[PG2:%.*]], <vscale x 8 x i16> [[ABS]], <vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 2, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer))
				; CHECK-NEXT: ret <vscale x 8 x i16> [[SHR]]
				;
				%pg = tail call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 8)
				%abs = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> %b, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
				%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg2, <vscale x 8 x i16> %abs, <vscale x 8 x i16> %splat)
				ret <vscale x 8 x i16> %shr
				}

				define <vscale x 8 x i16> @srshl_abs_diff_pred(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i1> %pg, <vscale x 8 x i1> %pg2) #0 {
				; CHECK-LABEL: @srshl_abs_diff_pred(
				; CHECK-NEXT: [[ABS:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> [[B:%.*]], <vscale x 8 x i1> [[PG:%.*]], <vscale x 8 x i16> [[A:%.*]])
				; CHECK-NEXT: [[SHR:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> [[PG2:%.*]], <vscale x 8 x i16> [[ABS]], <vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 2, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer))
				; CHECK-NEXT: ret <vscale x 8 x i16> [[SHR]]
				;
				%abs = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> %b, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
				%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg2, <vscale x 8 x i16> %abs, <vscale x 8 x i16> %splat)
				ret <vscale x 8 x i16> %shr
				}

				define <vscale x 8 x i16> @srshl_abs_negative_shift(<vscale x 8 x i16> %a, <vscale x 8 x i1> %pg, <vscale x 8 x i1> %pg2) #0 {
				; CHECK-LABEL: @srshl_abs_negative_shift(
				; CHECK-NEXT: [[ABS:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> [[PG:%.*]], <vscale x 8 x i16> [[A:%.*]])
				; CHECK-NEXT: [[SHR:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> [[PG2:%.*]], <vscale x 8 x i16> [[ABS]], <vscale x 8 x i16> shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 -2, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer))
				; CHECK-NEXT: ret <vscale x 8 x i16> [[SHR]]
				;
				%abs = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
				%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 -2)
				%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg2, <vscale x 8 x i16> %abs, <vscale x 8 x i16> %splat)
				ret <vscale x 8 x i16> %shr
				}

				declare <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 immarg)
				declare <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16)
				declare <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i1>, <vscale x 8 x i16>)
				declare <vscale x 8 x i16> @llvm.aarch64.sve.sqabs.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i1>, <vscale x 8 x i16>)
				declare <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)

				attributes #0 = { "target-features"="+sve,+sve2" }
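
The positive and negative tests above each toggle one input to the guard in instCombineSVESrshl. As a rough aid to reading them, the guard can be restated over plain booleans. This is only a sketch: the struct, its field names, and the function name are hypothetical, and each flag stands in for the corresponding check in the patch (isa<UndefValue>, m_NonNegative on the merge value, predicate equality, isAllActivePredicate, and m_NonNegative on the shift).

```cpp
// Hypothetical summary of what the combine inspects for one SRSHL call whose
// vector operand is already known to come from an ABS or SQABS intrinsic.
struct SrshlAbsCase {
  bool MergeIsUndef;       // ABS merge operand is undef
  bool MergeIsNonNegative; // ABS merge operand matches a non-negative constant
  bool SamePredicate;      // ABS and SRSHL use the same predicate value
  bool AbsPredAllActive;   // ABS predicate is known to be all active
  bool ShiftIsNonNegative; // SRSHL shift operand matches a non-negative constant
};

// Mirrors the two early returns in the patch: at least one merge/predicate
// condition must hold, and the shift amount must be non-negative.
bool wouldConvertToLsl(const SrshlAbsCase &C) {
  bool MergeOrPredicateOk = C.MergeIsUndef || C.MergeIsNonNegative ||
                            C.SamePredicate || C.AbsPredAllActive;
  return MergeOrPredicateOk && C.ShiftIsNonNegative;
}
```

Read against the tests, srshl_abs_undef_merge sets only MergeIsUndef and ShiftIsNonNegative and is converted, while srshl_abs_diff_pred sets only ShiftIsNonNegative and keeps its srshl call.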
