This is an archive of the discontinued LLVM Phabricator instance.

Differential D109325

[ARM] Teach DemandedVectorElts about VMOVN lanes
Closed, Public

Authored by dmgreen on Sep 6 2021, 7:31 AM.

Details

Reviewers

simon_tatham
samtebbs
SjoerdMeijer
ostannard

Commits

rG5a6dfbb8cd26: [ARM] Teach DemandedVectorElts about VMOVN lanes

Summary

The class of instructions that write to narrow top/bottom lanes only demands the even or odd elements of the input lanes, which means that VMOVNT; VMOVNB demands no lanes from the original input. This teaches that to instcombine via the target hooks available through ARMTTIImpl.
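
As a rough illustration of the even/odd lane masks described in the summary, here is a minimal, self-contained C++ sketch. It models lane masks as plain `uint32_t` bit sets (bit i stands for lane i) rather than LLVM's `APInt`, and the helper names `splat2` and `demandedPassthroughLanes` are hypothetical, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for splatting a 2-bit pattern across a lane mask:
// repeats pattern2 for every pair of lanes (bit i == lane i).
inline std::uint32_t splat2(unsigned numLanes, std::uint32_t pattern2) {
  assert(numLanes % 2 == 0 && numLanes <= 32 && pattern2 < 4);
  std::uint32_t mask = 0;
  for (unsigned lane = 0; lane < numLanes; lane += 2)
    mask |= pattern2 << lane;
  return mask;
}

// Lanes of operand 0 that a narrowing top/bottom write still reads.
// A "top" write replaces the odd lanes, so only even lanes are demanded;
// a "bottom" write replaces the even lanes, so only odd lanes are demanded.
inline std::uint32_t demandedPassthroughLanes(unsigned numLanes, bool isTop,
                                              std::uint32_t demandedByUser) {
  const std::uint32_t kept = splat2(numLanes, isTop ? 0x1u : 0x2u);
  return demandedByUser & kept;
}
```

With 8 lanes, a top write keeps mask 0x55 of its first operand and a bottom write keeps 0xAA. Chaining one into the other leaves 0x55 & 0xAA = 0, so nothing of the original input is demanded.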

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Sep 6 2021, 7:31 AM

Herald added subscribers: hiraditya, kristof.beyls. Sep 6 2021, 7:31 AM

dmgreen requested review of this revision. Sep 6 2021, 7:31 AM

Herald added a project: Restricted Project. Sep 6 2021, 7:31 AM

Harbormaster completed remote builds in B122765: Diff 370920. Sep 6 2021, 7:32 AM

Before I continue reading the rest, just wanted to check this:

The class of instructions that write to narrow top/bottom lanes only demands the even or odd elements of the input lanes, which means that VMOVNT; VMOVNB demands no lanes from the original input.

I don't follow the conclusion here that the VMOVNT; VMOVNB don't demand lanes from the original input, because we do read from the original input?

In D109325#2989713, @SjoerdMeijer wrote:

Before I continue reading the rest, just wanted to check this:

The class of instructions that write to narrow top/bottom lanes only demands the even or odd elements of the input lanes, which means that VMOVNT; VMOVNB demands no lanes from the original input.

I don't follow the conclusion here that the VMOVNT; VMOVNB don't demand lanes from the original input, because we do read from the original input?

Oh yeah, good point - the predicated intrinsics will not work this way if they are not acting on all lanes. I had forgotten that the first input is also the passthrough value. I'll update the patch to remove those.

For the non-predicated cases, I mean to say that if you have overwritten the top lanes and you have overwritten the bottom lanes, then no values from the original input are demanded. You have overwritten all the lanes. So a VMOVNT will demand the bottom (even) lanes from the first input and insert new values into the top (odd) lanes. A VMOVNB will demand the top (odd) lanes and write new values into the bottom (even) lanes. A pair of X = VMOVNT A, B; Y = VMOVNB X, C will use none of the lanes of A.
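
The non-predicated case can be checked with a small lane-level model. The following C++ sketch is a simplification under stated assumptions: it only models which lanes are written (the narrowing arithmetic is omitted, and the source values are treated as already narrowed), and `Vec8`, `Vec4`, and `narrowWrite` are hypothetical names, not LLVM or MVE APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec8 = std::array<int, 8>; // narrow result vector, 8 lanes
using Vec4 = std::array<int, 4>; // source vector, 4 lanes

// Writes src into the odd (top) or even (bottom) lanes of the passthrough
// vector and leaves the other lanes untouched.
inline Vec8 narrowWrite(Vec8 passthrough, const Vec4 &src, bool isTop) {
  for (std::size_t i = 0; i < src.size(); ++i)
    passthrough[2 * i + (isTop ? 1 : 0)] = src[i];
  return passthrough;
}

// X = TOP(A, B); Y = BOTTOM(X, C): two different A inputs give the same Y,
// because every lane of A has been overwritten.
inline void demoPairIgnoresFirstInput() {
  const Vec4 b{10, 11, 12, 13};
  const Vec4 c{20, 21, 22, 23};
  const Vec8 a1{1, 2, 3, 4, 5, 6, 7, 8};
  const Vec8 a2{-1, -2, -3, -4, -5, -6, -7, -8};
  const Vec8 y1 = narrowWrite(narrowWrite(a1, b, true), c, false);
  const Vec8 y2 = narrowWrite(narrowWrite(a2, b, true), c, false);
  assert(y1 == y2);
}
```

Because the result does not depend on A, replacing A with poison in such a pair is safe, which is the simplification the updated tests in this revision check for.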

Remove predicated instructions.

Ah, thanks, got it, that's clear! Looks like a good change to me.

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp, line 276:
The constants 2, 4, and 7 here were not immediately obvious to me. Perhaps a comment would help less familiar readers.
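
One reading of these constants, based on the calls in mve-narrow.ll from this revision: each is the zero-based index of the trailing `i32` top/bottom flag, which the lambda reads with `II.getOperand(TopOpc)`. The sketch below is only an illustration. `topFlagIndex` is a hypothetical helper, and the argument counts are taken from the test calls rather than from the intrinsic definitions.

```cpp
#include <cassert>

// Hypothetical helper: when the top/bottom flag is the last call argument,
// its zero-based operand index is the argument count minus one.
inline unsigned topFlagIndex(unsigned numArgs) {
  assert(numArgs >= 3 && "expects at least passthrough, source, and flag");
  return numArgs - 1;
}

// Argument counts as they appear in the test calls of this revision.
inline void demoTopFlagIndices() {
  assert(topFlagIndex(3) == 2); // @llvm.arm.mve.vcvt.narrow: 3 arguments
  assert(topFlagIndex(5) == 4); // @llvm.arm.mve.vqmovn:      5 arguments
  assert(topFlagIndex(8) == 7); // @llvm.arm.mve.vshrn:       8 arguments
}
```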

This revision is now accepted and ready to land. Sep 10 2021, 2:38 AM

Harbormaster completed remote builds in B123393: Diff 371838. Sep 10 2021, 2:59 AM

This revision was landed with ongoing or failed builds. Sep 14 2021, 3:05 AM

Closed by commit rG5a6dfbb8cd26: [ARM] Teach DemandedVectorElts about VMOVN lanes (authored by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG5a6dfbb8cd26: [ARM] Teach DemandedVectorElts about VMOVN lanes.

Revision Contents

Path                                                  Size
llvm/lib/Target/ARM/ARMTargetTransformInfo.h          5 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp        42 lines
llvm/test/Transforms/InstCombine/ARM/mve-narrow.ll    26 lines

Diff 372450

llvm/lib/Target/ARM/ARMTargetTransformInfo.h

... 114 lines not shown (context: public:) ...
   /// SIMD instructions remains unchanged from ARMv7. Only AArch64 SIMD
   /// and Arm MVE are IEEE-754 compliant.
   bool isFPVectorizationPotentiallyUnsafe() {
     return !ST->isTargetDarwin() && !ST->hasMVEFloatOps();
   }

   Optional<Instruction *> instCombineIntrinsic(InstCombiner &IC,
                                                IntrinsicInst &II) const;
+  Optional<Value *> simplifyDemandedVectorEltsIntrinsic(
+      InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
+      APInt &UndefElts2, APInt &UndefElts3,
+      std::function<void(Instruction *, unsigned, APInt, APInt &)>
+          SimplifyAndSetOp) const;

   /// \name Scalar TTI Implementations
   /// @{

   InstructionCost getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx,
                                         const APInt &Imm, Type *Ty);

   using BaseT::getIntImmCost;
... 202 lines not shown ...

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

... 242 lines not shown (context: if (I->hasOneUse()) {) ...
       }
     }
     return None;
   }
   }
   return None;
 }

+Optional<Value *> ARMTTIImpl::simplifyDemandedVectorEltsIntrinsic(
+    InstCombiner &IC, IntrinsicInst &II, APInt OrigDemandedElts,
+    APInt &UndefElts, APInt &UndefElts2, APInt &UndefElts3,
+    std::function<void(Instruction *, unsigned, APInt, APInt &)>
+        SimplifyAndSetOp) const {
+
+  // Compute the demanded bits for a narrowing MVE intrinsic. The TopOpc is the
+  // opcode specifying a Top/Bottom instruction, which can change between
+  // instructions.
+  auto SimplifyNarrowInstrTopBottom =[&](unsigned TopOpc) {
+    unsigned NumElts = cast<FixedVectorType>(II.getType())->getNumElements();
+    unsigned IsTop = cast<ConstantInt>(II.getOperand(TopOpc))->getZExtValue();
+
+    // The only odd/even lanes of operand 0 will only be demanded depending
+    // on whether this is a top/bottom instruction.
+    APInt DemandedElts =
+        APInt::getSplat(NumElts, IsTop ? APInt::getLowBitsSet(2, 1)
+                                       : APInt::getHighBitsSet(2, 1));
+    SimplifyAndSetOp(&II, 0, OrigDemandedElts & DemandedElts, UndefElts);
+    // The other lanes will be defined from the inserted elements.
+    UndefElts &= APInt::getSplat(NumElts, !IsTop ? APInt::getLowBitsSet(2, 1)
+                                                 : APInt::getHighBitsSet(2, 1));
+    return None;
+  };
+
+  switch (II.getIntrinsicID()) {
+  default:
+    break;
+  case Intrinsic::arm_mve_vcvt_narrow:
+    SimplifyNarrowInstrTopBottom(2);
+    break;
+  case Intrinsic::arm_mve_vqmovn:
+    SimplifyNarrowInstrTopBottom(4);
+    break;
+  case Intrinsic::arm_mve_vshrn:
+    SimplifyNarrowInstrTopBottom(7);
+    break;
+  }
+
+  return None;
+}
+
 InstructionCost ARMTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty,
                                           TTI::TargetCostKind CostKind) {
   assert(Ty->isIntegerTy());

   unsigned Bits = Ty->getPrimitiveSizeInBits();
   if (Bits == 0 || Imm.getActiveBits() >= 64)
     return 4;

... 2,031 lines not shown ...

llvm/test/Transforms/InstCombine/ARM/mve-narrow.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -instcombine -S -mtriple=thumbv8.1m.main-none-eabi -mattr=+mve.fp -o - %s | FileCheck %s

 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

 ; Various patterns testing v8i16

 define <8 x i16> @test_shrn_v8i16_t1(<8 x i16> %a, <8 x i16> %b, <4 x i32> %c, <4 x i32> %d) {
 ; CHECK-LABEL: @test_shrn_v8i16_t1(
-; CHECK-NEXT:    [[X:%.*]] = add <8 x i16> [[A:%.*]], <i16 1, i16 -1, i16 1, i16 -1, i16 1, i16 -1, i16 1, i16 -1>
+; CHECK-NEXT:    [[X:%.*]] = add <8 x i16> [[A:%.*]], <i16 1, i16 poison, i16 1, i16 poison, i16 1, i16 poison, i16 1, i16 poison>
 ; CHECK-NEXT:    [[Z:%.*]] = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> [[X]], <4 x i32> [[D:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 1)
 ; CHECK-NEXT:    ret <8 x i16> [[Z]]
 ;
   %x = add <8 x i16> %a, <i16 1, i16 -1, i16 1, i16 -1, i16 1, i16 -1, i16 1, i16 -1>
   %z = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> %x, <4 x i32> %d, i32 16, i32 0, i32 0, i32 0, i32 0, i32 1)
   ret <8 x i16> %z
 }

 define <8 x i16> @test_shrn_v8i16_t2(<8 x i16> %a, <8 x i16> %b, <4 x i32> %c, <4 x i32> %d) {
 ; CHECK-LABEL: @test_shrn_v8i16_t2(
-; CHECK-NEXT:    [[X:%.*]] = add <8 x i16> [[A:%.*]], <i16 -1, i16 1, i16 -1, i16 1, i16 -1, i16 1, i16 -1, i16 1>
+; CHECK-NEXT:    [[X:%.*]] = add <8 x i16> [[A:%.*]], <i16 -1, i16 poison, i16 -1, i16 poison, i16 -1, i16 poison, i16 -1, i16 poison>
 ; CHECK-NEXT:    [[Z:%.*]] = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> [[X]], <4 x i32> [[D:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 1)
 ; CHECK-NEXT:    ret <8 x i16> [[Z]]
 ;
   %x = add <8 x i16> %a, <i16 -1, i16 1, i16 -1, i16 1, i16 -1, i16 1, i16 -1, i16 1>
   %z = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> %x, <4 x i32> %d, i32 16, i32 0, i32 0, i32 0, i32 0, i32 1)
   ret <8 x i16> %z
 }

 define <8 x i16> @test_shrn_v8i16_b1(<8 x i16> %a, <8 x i16> %b, <4 x i32> %c, <4 x i32> %d) {
 ; CHECK-LABEL: @test_shrn_v8i16_b1(
-; CHECK-NEXT:    [[X:%.*]] = add <8 x i16> [[A:%.*]], <i16 1, i16 -1, i16 1, i16 -1, i16 1, i16 -1, i16 1, i16 -1>
+; CHECK-NEXT:    [[X:%.*]] = add <8 x i16> [[A:%.*]], <i16 poison, i16 -1, i16 poison, i16 -1, i16 poison, i16 -1, i16 poison, i16 -1>
 ; CHECK-NEXT:    [[Z:%.*]] = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> [[X]], <4 x i32> [[D:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 0)
 ; CHECK-NEXT:    ret <8 x i16> [[Z]]
 ;
   %x = add <8 x i16> %a, <i16 1, i16 -1, i16 1, i16 -1, i16 1, i16 -1, i16 1, i16 -1>
   %z = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> %x, <4 x i32> %d, i32 16, i32 0, i32 0, i32 0, i32 0, i32 0)
   ret <8 x i16> %z
 }

 define <8 x i16> @test_shrn_v8i16_b2(<8 x i16> %a, <8 x i16> %b, <4 x i32> %c, <4 x i32> %d) {
 ; CHECK-LABEL: @test_shrn_v8i16_b2(
-; CHECK-NEXT:    [[X:%.*]] = add <8 x i16> [[A:%.*]], <i16 -1, i16 1, i16 -1, i16 1, i16 -1, i16 1, i16 -1, i16 1>
+; CHECK-NEXT:    [[X:%.*]] = add <8 x i16> [[A:%.*]], <i16 poison, i16 1, i16 poison, i16 1, i16 poison, i16 1, i16 poison, i16 1>
 ; CHECK-NEXT:    [[Z:%.*]] = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> [[X]], <4 x i32> [[D:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 0)
 ; CHECK-NEXT:    ret <8 x i16> [[Z]]
 ;
   %x = add <8 x i16> %a, <i16 -1, i16 1, i16 -1, i16 1, i16 -1, i16 1, i16 -1, i16 1>
   %z = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> %x, <4 x i32> %d, i32 16, i32 0, i32 0, i32 0, i32 0, i32 0)
   ret <8 x i16> %z
 }

 define <8 x i16> @test_shrn_v8i16_bt(<8 x i16> %a, <8 x i16> %b, <4 x i32> %c, <4 x i32> %d) {
 ; CHECK-LABEL: @test_shrn_v8i16_bt(
-; CHECK-NEXT:    [[X:%.*]] = add <8 x i16> [[A:%.*]], [[B:%.*]]
-; CHECK-NEXT:    [[Y:%.*]] = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> [[X]], <4 x i32> [[C:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 0)
+; CHECK-NEXT:    [[Y:%.*]] = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> poison, <4 x i32> [[C:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 0)
 ; CHECK-NEXT:    [[Z:%.*]] = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> [[Y]], <4 x i32> [[D:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 1)
 ; CHECK-NEXT:    ret <8 x i16> [[Z]]
 ;
   %x = add <8 x i16> %a, %b
   %y = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> %x, <4 x i32> %c, i32 16, i32 0, i32 0, i32 0, i32 0, i32 0)
   %z = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> %y, <4 x i32> %d, i32 16, i32 0, i32 0, i32 0, i32 0, i32 1)
   ret <8 x i16> %z
 }

 define <8 x i16> @test_shrn_v8i16_tb(<8 x i16> %a, <8 x i16> %b, <4 x i32> %c, <4 x i32> %d) {
 ; CHECK-LABEL: @test_shrn_v8i16_tb(
-; CHECK-NEXT:    [[X:%.*]] = add <8 x i16> [[A:%.*]], [[B:%.*]]
-; CHECK-NEXT:    [[Y:%.*]] = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> [[X]], <4 x i32> [[C:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 1)
+; CHECK-NEXT:    [[Y:%.*]] = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> poison, <4 x i32> [[C:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 1)
 ; CHECK-NEXT:    [[Z:%.*]] = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> [[Y]], <4 x i32> [[D:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 0)
 ; CHECK-NEXT:    ret <8 x i16> [[Z]]
 ;
   %x = add <8 x i16> %a, %b
   %y = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> %x, <4 x i32> %c, i32 16, i32 0, i32 0, i32 0, i32 0, i32 1)
   %z = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> %y, <4 x i32> %d, i32 16, i32 0, i32 0, i32 0, i32 0, i32 0)
   ret <8 x i16> %z
 }
... 23 lines not shown ...
   %z = call <8 x i16> @llvm.arm.mve.vshrn.v8i16.v4i32(<8 x i16> %y, <4 x i32> %d, i32 16, i32 0, i32 0, i32 0, i32 0, i32 1)
   ret <8 x i16> %z
 }

 ; Other types and sizes

 define <16 x i8> @test_shrn_v16i8_bt(<16 x i8> %a, <16 x i8> %b, <8 x i16> %c, <8 x i16> %d) {
 ; CHECK-LABEL: @test_shrn_v16i8_bt(
-; CHECK-NEXT:    [[X:%.*]] = add <16 x i8> [[A:%.*]], [[B:%.*]]
-; CHECK-NEXT:    [[Y:%.*]] = call <16 x i8> @llvm.arm.mve.vshrn.v16i8.v8i16(<16 x i8> [[X]], <8 x i16> [[C:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 0)
+; CHECK-NEXT:    [[Y:%.*]] = call <16 x i8> @llvm.arm.mve.vshrn.v16i8.v8i16(<16 x i8> poison, <8 x i16> [[C:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 0)
 ; CHECK-NEXT:    [[Z:%.*]] = call <16 x i8> @llvm.arm.mve.vshrn.v16i8.v8i16(<16 x i8> [[Y]], <8 x i16> [[D:%.*]], i32 16, i32 0, i32 0, i32 0, i32 0, i32 1)
 ; CHECK-NEXT:    ret <16 x i8> [[Z]]
 ;
   %x = add <16 x i8> %a, %b
   %y = call <16 x i8> @llvm.arm.mve.vshrn.v16i8.v8i16(<16 x i8> %x, <8 x i16> %c, i32 16, i32 0, i32 0, i32 0, i32 0, i32 0)
   %z = call <16 x i8> @llvm.arm.mve.vshrn.v16i8.v8i16(<16 x i8> %y, <8 x i16> %d, i32 16, i32 0, i32 0, i32 0, i32 0, i32 1)
   ret <16 x i8> %z
 }
... 48 lines not shown ...
   %x = add <16 x i8> %a, %b
   %y = call <16 x i8> @llvm.arm.mve.vmovn.predicated.v16i8.v8i16.v8i1(<16 x i8> %x, <8 x i16> %c, i32 0, <8 x i1> %p)
   %z = call <16 x i8> @llvm.arm.mve.vmovn.predicated.v16i8.v8i16.v8i1(<16 x i8> %y, <8 x i16> %d, i32 1, <8 x i1> %p)
   ret <16 x i8> %z
 }

 define <8 x i16> @test_qmovn_v8i16_bt(<8 x i16> %a, <8 x i16> %b, <4 x i32> %c, <4 x i32> %d) {
 ; CHECK-LABEL: @test_qmovn_v8i16_bt(
-; CHECK-NEXT:    [[X:%.*]] = add <8 x i16> [[A:%.*]], [[B:%.*]]
-; CHECK-NEXT:    [[Y:%.*]] = call <8 x i16> @llvm.arm.mve.vqmovn.v8i16.v4i32(<8 x i16> [[X]], <4 x i32> [[C:%.*]], i32 0, i32 0, i32 0)
+; CHECK-NEXT:    [[Y:%.*]] = call <8 x i16> @llvm.arm.mve.vqmovn.v8i16.v4i32(<8 x i16> poison, <4 x i32> [[C:%.*]], i32 0, i32 0, i32 0)
 ; CHECK-NEXT:    [[Z:%.*]] = call <8 x i16> @llvm.arm.mve.vqmovn.v8i16.v4i32(<8 x i16> [[Y]], <4 x i32> [[D:%.*]], i32 0, i32 0, i32 1)
 ; CHECK-NEXT:    ret <8 x i16> [[Z]]
 ;
   %x = add <8 x i16> %a, %b
   %y = call <8 x i16> @llvm.arm.mve.vqmovn.v8i16.v4i32(<8 x i16> %x, <4 x i32> %c, i32 0, i32 0, i32 0)
   %z = call <8 x i16> @llvm.arm.mve.vqmovn.v8i16.v4i32(<8 x i16> %y, <4 x i32> %d, i32 0, i32 0, i32 1)
   ret <8 x i16> %z
 }

 define <16 x i8> @test_qmovn_v16i8_bt(<16 x i8> %a, <16 x i8> %b, <8 x i16> %c, <8 x i16> %d) {
 ; CHECK-LABEL: @test_qmovn_v16i8_bt(
-; CHECK-NEXT:    [[X:%.*]] = add <16 x i8> [[A:%.*]], [[B:%.*]]
-; CHECK-NEXT:    [[Y:%.*]] = call <16 x i8> @llvm.arm.mve.vqmovn.v16i8.v8i16(<16 x i8> [[X]], <8 x i16> [[C:%.*]], i32 0, i32 0, i32 0)
+; CHECK-NEXT:    [[Y:%.*]] = call <16 x i8> @llvm.arm.mve.vqmovn.v16i8.v8i16(<16 x i8> poison, <8 x i16> [[C:%.*]], i32 0, i32 0, i32 0)
 ; CHECK-NEXT:    [[Z:%.*]] = call <16 x i8> @llvm.arm.mve.vqmovn.v16i8.v8i16(<16 x i8> [[Y]], <8 x i16> [[D:%.*]], i32 0, i32 0, i32 1)
 ; CHECK-NEXT:    ret <16 x i8> [[Z]]
 ;
   %x = add <16 x i8> %a, %b
   %y = call <16 x i8> @llvm.arm.mve.vqmovn.v16i8.v8i16(<16 x i8> %x, <8 x i16> %c, i32 0, i32 0, i32 0)
   %z = call <16 x i8> @llvm.arm.mve.vqmovn.v16i8.v8i16(<16 x i8> %y, <8 x i16> %d, i32 0, i32 0, i32 1)
   ret <16 x i8> %z
 }
... 21 lines not shown ...
   %x = add <16 x i8> %a, %b
   %y = call <16 x i8> @llvm.arm.mve.vqmovn.predicated.v16i8.v8i16.v8i1(<16 x i8> %x, <8 x i16> %c, i32 0, i32 0, i32 0, <8 x i1> %p)
   %z = call <16 x i8> @llvm.arm.mve.vqmovn.predicated.v16i8.v8i16.v8i1(<16 x i8> %y, <8 x i16> %d, i32 0, i32 0, i32 1, <8 x i1> %p)
   ret <16 x i8> %z
 }

 define <8 x half> @test_cvtn_v8i16_bt(<8 x half> %a, <8 x half> %b, <4 x float> %c, <4 x float> %d) {
 ; CHECK-LABEL: @test_cvtn_v8i16_bt(
-; CHECK-NEXT:    [[X:%.*]] = fadd <8 x half> [[A:%.*]], [[B:%.*]]
-; CHECK-NEXT:    [[Y:%.*]] = call <8 x half> @llvm.arm.mve.vcvt.narrow(<8 x half> [[X]], <4 x float> [[C:%.*]], i32 0)
+; CHECK-NEXT:    [[Y:%.*]] = call <8 x half> @llvm.arm.mve.vcvt.narrow(<8 x half> poison, <4 x float> [[C:%.*]], i32 0)
 ; CHECK-NEXT:    [[Z:%.*]] = call <8 x half> @llvm.arm.mve.vcvt.narrow(<8 x half> [[Y]], <4 x float> [[D:%.*]], i32 1)
 ; CHECK-NEXT:    ret <8 x half> [[Z]]
 ;
   %x = fadd <8 x half> %a, %b
   %y = call <8 x half> @llvm.arm.mve.vcvt.narrow(<8 x half> %x, <4 x float> %c, i32 0)
   %z = call <8 x half> @llvm.arm.mve.vcvt.narrow(<8 x half> %y, <4 x float> %d, i32 1)
   ret <8 x half> %z
 }
... 29 lines not shown ...