This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Prefer SIMD&FP variant of clast[ab]
Closed, Public

Authored by c-rhodes on Jul 11 2022, 3:51 AM.

Details

Reviewers

paulwalker-arm
bsmith
peterwaller-arm
DavidTruby
efriedma

Commits

rG7c3cda551ac7: [AArch64][SVE] Prefer SIMD&FP variant of clast[ab]

Summary

The scalar variant with GPR source/dest has considerably higher latency
than the SIMD&FP scalar variant across a variety of micro-architectures:

Core           Scalar    SIMD&FP
--------------------------------
Neoverse V1     9 cyc      3 cyc
Neoverse N2     8 cyc      3 cyc
Cortex A510     8 cyc      4 cyc
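
As a rough illustration of what these figures mean for a serial dependency chain, the sketch below multiplies the quoted latencies by a trip count. It assumes the chain is purely latency-bound and ignores throughput, register transfers and everything else in the loop. The names `core_latency`, `chain_cycles` and `cycles_saved` are made up for this example and are not part of the patch.

```c
#include <stddef.h>

/* Latencies in cycles, copied from the table in the summary. */
struct core_latency {
    const char *core;
    unsigned scalar;  /* clast[ab] with GPR source/dest */
    unsigned simd_fp; /* clast[ab] with SIMD&FP scalar source/dest */
};

static const struct core_latency k_cores[] = {
    {"Neoverse V1", 9, 3},
    {"Neoverse N2", 8, 3},
    {"Cortex A510", 8, 4},
};

/* Cycles for a serial chain of `iters` dependent operations.
 * Assumption: each operation waits for the previous result, so the
 * chain costs latency * iters and nothing overlaps. */
static unsigned long chain_cycles(unsigned latency, unsigned long iters)
{
    return (unsigned long)latency * iters;
}

/* Difference between the two variants for the same chain length. */
static unsigned long cycles_saved(const struct core_latency *c,
                                  unsigned long iters)
{
    return chain_cycles(c->scalar, iters) - chain_cycles(c->simd_fp, iters);
}
```

For a hypothetical 1000-iteration chain on Neoverse V1, this model gives 9000 cycles for the scalar variant against 3000 for the SIMD&FP variant.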

Diff Detail

Unit Tests: Failed

Time	Test
60,050 ms	x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp
60,040 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
60,050 ms	x64 debian > libFuzzer.libFuzzer::value-profile-load.test

Event Timeline

c-rhodes created this revision. Jul 11 2022, 3:51 AM

Herald added a reviewer: efriedma. Jul 11 2022, 3:51 AM

Herald added a project: Restricted Project.

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

c-rhodes requested review of this revision. Jul 11 2022, 3:51 AM

Herald added projects: Restricted Project, Restricted Project. Jul 11 2022, 3:51 AM

Herald added a subscriber: cfe-commits.

Add full patch context.

georges added a subscriber: georges. Jul 11 2022, 4:27 AM

Harbormaster completed remote builds in B174634: Diff 443594. Jul 11 2022, 4:56 AM

Are you saying that it's faster to use clasta targeting a float register, then move the result to an integer register, rather than use the integer form directly? Or is the issue just that we want to split the operations in case we can simplify the resulting bitcast?

Do we expect SelectionDAG to combine a clasta+bitcast to a single instruction? Do we have test coverage for that?

Matt added a subscriber: Matt. Jul 11 2022, 3:27 PM

dtemirbulatov added a subscriber: dtemirbulatov. Jul 12 2022, 6:55 AM

In D129476#3643382, @efriedma wrote:

Are you saying that it's faster to use clasta targeting a float register, then move the result to an integer register, rather than use the integer form directly? Or is the issue just that we want to split the operations in case we can simplify the resulting bitcast?

The former, for the most part. When clast[ab] is inside a loop and forms a loop-carried dependency, the SIMD&FP variant is considerably faster. A straight bitcast-to-fp + clast[ab] + bitcast-to-int sequence costs a cycle or two more, depending on the micro-architecture, but from what I've observed, in slightly more complicated code the SIMD&FP variant with bitcasts is as fast as the integer variant, if not faster.

Do we expect SelectionDAG to combine a clasta+bitcast to a single instruction? Do we have test coverage for that?

No, there'll be a mov to an integer register.
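
The rewrite discussed above (bitcast-to-fp, clast[ab], bitcast-to-int) can only be valid if the operation selects a lane without interpreting its bits. A minimal scalar sketch of that argument for clastb on 32-bit lanes follows. It is a model under that assumption, not the instruction itself, and the function names are hypothetical. clasta would differ only in which lane is picked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit-preserving reinterpretation, the C analogue of an IR bitcast. */
static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_f32(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Integer form: return the last active lane, or `fallback` if no lane
 * of the predicate `pg` is active. */
static uint32_t model_clastb_u32(const bool *pg, uint32_t fallback,
                                 const uint32_t *data, size_t lanes)
{
    for (size_t i = lanes; i-- > 0;)
        if (pg[i])
            return data[i];
    return fallback;
}

/* Rewritten form: bitcast the inputs to float, select on float-typed
 * values, bitcast the result back. No arithmetic touches the value,
 * so the bit pattern is expected to survive the round trip. */
static uint32_t model_clastb_u32_via_f32(const bool *pg, uint32_t fallback,
                                         const uint32_t *data, size_t lanes)
{
    float acc = bits_f32(fallback);      /* bitcast i32 -> float */
    for (size_t i = lanes; i-- > 0;) {
        if (pg[i]) {
            acc = bits_f32(data[i]);     /* lane viewed as float */
            break;
        }
    }
    return f32_bits(acc);                /* bitcast float -> i32 */
}
```

Both functions should agree for any predicate and data, which is the property the rewritten CHECK lines in the diff rely on.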

Please update the comment, then LGTM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
Line 801: Maybe put a bit of the explanation you just gave into a comment here, for reference.

This revision is now accepted and ready to land. Jul 12 2022, 9:18 AM

In D129476#3645667, @efriedma wrote:

Please update the comment, then LGTM

Done, cheers Eli.
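
Judging only by the updated checks in the diff below, the 16-, 32- and 64-bit integer forms are rewritten to `half`, `float` and `double` respectively, while the 8-bit checks are unchanged. A small sketch of that width-to-type choice follows. The helper name `fp_type_for_int_width` is invented for illustration and does not appear in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Name of the same-width floating-point element type used when an
 * integer clast[ab] is rewritten, or NULL when the integer form is
 * kept (as the unchanged i8 checks in the diff suggest). */
static const char *fp_type_for_int_width(unsigned bits)
{
    switch (bits) {
    case 16: return "half";
    case 32: return "float";
    case 64: return "double";
    default: return NULL; /* e.g. 8-bit elements are left alone */
    }
}
```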

Revision Contents

Path	Size
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_clasta.c	84 lines
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_clastb.c	84 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	41 lines
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-clast.ll	44 lines

Diff 443594

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_clasta.c

 int8_t test_svclasta_n_s8(svbool_t pg, int8_t fallback, svint8_t data)
 {
   return SVE_ACLE_FUNC(svclasta,_n_s8,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclasta_n_s16(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = call i16 @llvm.aarch64.sve.clasta.n.nxv8i16(<vscale x 8 x i1> [[TMP0]], i16 [[FALLBACK:%.*]], <vscale x 8 x i16> [[DATA:%.*]])
-// CHECK-NEXT: ret i16 [[TMP1]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[FALLBACK:%.*]] to half
+// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 8 x i16> [[DATA:%.*]] to <vscale x 8 x half>
+// CHECK-NEXT: [[TMP3:%.*]] = call half @llvm.aarch64.sve.clasta.n.nxv8f16(<vscale x 8 x i1> [[TMP0]], half [[TMP1]], <vscale x 8 x half> [[TMP2]])
+// CHECK-NEXT: [[TMP4:%.*]] = bitcast half [[TMP3]] to i16
+// CHECK-NEXT: ret i16 [[TMP4]]
 //
 // CPP-CHECK-LABEL: @_Z19test_svclasta_n_s16u10__SVBool_tsu11__SVInt16_t(
 // CPP-CHECK-NEXT: entry:
 // CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = call i16 @llvm.aarch64.sve.clasta.n.nxv8i16(<vscale x 8 x i1> [[TMP0]], i16 [[FALLBACK:%.*]], <vscale x 8 x i16> [[DATA:%.*]])
-// CPP-CHECK-NEXT: ret i16 [[TMP1]]
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[FALLBACK:%.*]] to half
+// CPP-CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 8 x i16> [[DATA:%.*]] to <vscale x 8 x half>
+// CPP-CHECK-NEXT: [[TMP3:%.*]] = call half @llvm.aarch64.sve.clasta.n.nxv8f16(<vscale x 8 x i1> [[TMP0]], half [[TMP1]], <vscale x 8 x half> [[TMP2]])
+// CPP-CHECK-NEXT: [[TMP4:%.*]] = bitcast half [[TMP3]] to i16
+// CPP-CHECK-NEXT: ret i16 [[TMP4]]
 //
 int16_t test_svclasta_n_s16(svbool_t pg, int16_t fallback, svint16_t data)
 {
   return SVE_ACLE_FUNC(svclasta,_n_s16,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclasta_n_s32(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.aarch64.sve.clasta.n.nxv4i32(<vscale x 4 x i1> [[TMP0]], i32 [[FALLBACK:%.*]], <vscale x 4 x i32> [[DATA:%.*]])
-// CHECK-NEXT: ret i32 [[TMP1]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[FALLBACK:%.*]] to float
+// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 4 x i32> [[DATA:%.*]] to <vscale x 4 x float>
+// CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.aarch64.sve.clasta.n.nxv4f32(<vscale x 4 x i1> [[TMP0]], float [[TMP1]], <vscale x 4 x float> [[TMP2]])
+// CHECK-NEXT: [[TMP4:%.*]] = bitcast float [[TMP3]] to i32
+// CHECK-NEXT: ret i32 [[TMP4]]
 //
 // CPP-CHECK-LABEL: @_Z19test_svclasta_n_s32u10__SVBool_tiu11__SVInt32_t(
 // CPP-CHECK-NEXT: entry:
 // CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.aarch64.sve.clasta.n.nxv4i32(<vscale x 4 x i1> [[TMP0]], i32 [[FALLBACK:%.*]], <vscale x 4 x i32> [[DATA:%.*]])
-// CPP-CHECK-NEXT: ret i32 [[TMP1]]
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[FALLBACK:%.*]] to float
+// CPP-CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 4 x i32> [[DATA:%.*]] to <vscale x 4 x float>
+// CPP-CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.aarch64.sve.clasta.n.nxv4f32(<vscale x 4 x i1> [[TMP0]], float [[TMP1]], <vscale x 4 x float> [[TMP2]])
+// CPP-CHECK-NEXT: [[TMP4:%.*]] = bitcast float [[TMP3]] to i32
+// CPP-CHECK-NEXT: ret i32 [[TMP4]]
 //
 int32_t test_svclasta_n_s32(svbool_t pg, int32_t fallback, svint32_t data)
 {
   return SVE_ACLE_FUNC(svclasta,_n_s32,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclasta_n_s64(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.aarch64.sve.clasta.n.nxv2i64(<vscale x 2 x i1> [[TMP0]], i64 [[FALLBACK:%.*]], <vscale x 2 x i64> [[DATA:%.*]])
-// CHECK-NEXT: ret i64 [[TMP1]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[FALLBACK:%.*]] to double
+// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 2 x i64> [[DATA:%.*]] to <vscale x 2 x double>
+// CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.aarch64.sve.clasta.n.nxv2f64(<vscale x 2 x i1> [[TMP0]], double [[TMP1]], <vscale x 2 x double> [[TMP2]])
+// CHECK-NEXT: [[TMP4:%.*]] = bitcast double [[TMP3]] to i64
+// CHECK-NEXT: ret i64 [[TMP4]]
 //
 // CPP-CHECK-LABEL: @_Z19test_svclasta_n_s64u10__SVBool_tlu11__SVInt64_t(
 // CPP-CHECK-NEXT: entry:
 // CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.aarch64.sve.clasta.n.nxv2i64(<vscale x 2 x i1> [[TMP0]], i64 [[FALLBACK:%.*]], <vscale x 2 x i64> [[DATA:%.*]])
-// CPP-CHECK-NEXT: ret i64 [[TMP1]]
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[FALLBACK:%.*]] to double
+// CPP-CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 2 x i64> [[DATA:%.*]] to <vscale x 2 x double>
+// CPP-CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.aarch64.sve.clasta.n.nxv2f64(<vscale x 2 x i1> [[TMP0]], double [[TMP1]], <vscale x 2 x double> [[TMP2]])
+// CPP-CHECK-NEXT: [[TMP4:%.*]] = bitcast double [[TMP3]] to i64
+// CPP-CHECK-NEXT: ret i64 [[TMP4]]
 //
 int64_t test_svclasta_n_s64(svbool_t pg, int64_t fallback, svint64_t data)
 {
   return SVE_ACLE_FUNC(svclasta,_n_s64,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclasta_n_u8(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = call i8 @llvm.aarch64.sve.clasta.n.nxv16i8(<vscale x 16 x i1> [[PG:%.*]], i8 [[FALLBACK:%.*]], <vscale x 16 x i8> [[DATA:%.*]])
 // CHECK-NEXT: ret i8 [[TMP0]]
 //
 // CPP-CHECK-LABEL: @_Z18test_svclasta_n_u8u10__SVBool_thu11__SVUint8_t(
 // CPP-CHECK-NEXT: entry:
 // CPP-CHECK-NEXT: [[TMP0:%.*]] = call i8 @llvm.aarch64.sve.clasta.n.nxv16i8(<vscale x 16 x i1> [[PG:%.*]], i8 [[FALLBACK:%.*]], <vscale x 16 x i8> [[DATA:%.*]])
 // CPP-CHECK-NEXT: ret i8 [[TMP0]]
 //
 uint8_t test_svclasta_n_u8(svbool_t pg, uint8_t fallback, svuint8_t data)
 {
   return SVE_ACLE_FUNC(svclasta,_n_u8,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclasta_n_u16(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = call i16 @llvm.aarch64.sve.clasta.n.nxv8i16(<vscale x 8 x i1> [[TMP0]], i16 [[FALLBACK:%.*]], <vscale x 8 x i16> [[DATA:%.*]])
-// CHECK-NEXT: ret i16 [[TMP1]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[FALLBACK:%.*]] to half
+// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 8 x i16> [[DATA:%.*]] to <vscale x 8 x half>
+// CHECK-NEXT: [[TMP3:%.*]] = call half @llvm.aarch64.sve.clasta.n.nxv8f16(<vscale x 8 x i1> [[TMP0]], half [[TMP1]], <vscale x 8 x half> [[TMP2]])
+// CHECK-NEXT: [[TMP4:%.*]] = bitcast half [[TMP3]] to i16
+// CHECK-NEXT: ret i16 [[TMP4]]
 //
 // CPP-CHECK-LABEL: @_Z19test_svclasta_n_u16u10__SVBool_ttu12__SVUint16_t(
 // CPP-CHECK-NEXT: entry:
 // CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = call i16 @llvm.aarch64.sve.clasta.n.nxv8i16(<vscale x 8 x i1> [[TMP0]], i16 [[FALLBACK:%.*]], <vscale x 8 x i16> [[DATA:%.*]])
-// CPP-CHECK-NEXT: ret i16 [[TMP1]]
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[FALLBACK:%.*]] to half
+// CPP-CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 8 x i16> [[DATA:%.*]] to <vscale x 8 x half>
+// CPP-CHECK-NEXT: [[TMP3:%.*]] = call half @llvm.aarch64.sve.clasta.n.nxv8f16(<vscale x 8 x i1> [[TMP0]], half [[TMP1]], <vscale x 8 x half> [[TMP2]])
+// CPP-CHECK-NEXT: [[TMP4:%.*]] = bitcast half [[TMP3]] to i16
+// CPP-CHECK-NEXT: ret i16 [[TMP4]]
 //
 uint16_t test_svclasta_n_u16(svbool_t pg, uint16_t fallback, svuint16_t data)
 {
   return SVE_ACLE_FUNC(svclasta,_n_u16,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclasta_n_u32(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.aarch64.sve.clasta.n.nxv4i32(<vscale x 4 x i1> [[TMP0]], i32 [[FALLBACK:%.*]], <vscale x 4 x i32> [[DATA:%.*]])
-// CHECK-NEXT: ret i32 [[TMP1]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[FALLBACK:%.*]] to float
+// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 4 x i32> [[DATA:%.*]] to <vscale x 4 x float>
+// CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.aarch64.sve.clasta.n.nxv4f32(<vscale x 4 x i1> [[TMP0]], float [[TMP1]], <vscale x 4 x float> [[TMP2]])
+// CHECK-NEXT: [[TMP4:%.*]] = bitcast float [[TMP3]] to i32
+// CHECK-NEXT: ret i32 [[TMP4]]
 //
 // CPP-CHECK-LABEL: @_Z19test_svclasta_n_u32u10__SVBool_tju12__SVUint32_t(
 // CPP-CHECK-NEXT: entry:
 // CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.aarch64.sve.clasta.n.nxv4i32(<vscale x 4 x i1> [[TMP0]], i32 [[FALLBACK:%.*]], <vscale x 4 x i32> [[DATA:%.*]])
-// CPP-CHECK-NEXT: ret i32 [[TMP1]]
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[FALLBACK:%.*]] to float
+// CPP-CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 4 x i32> [[DATA:%.*]] to <vscale x 4 x float>
+// CPP-CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.aarch64.sve.clasta.n.nxv4f32(<vscale x 4 x i1> [[TMP0]], float [[TMP1]], <vscale x 4 x float> [[TMP2]])
+// CPP-CHECK-NEXT: [[TMP4:%.*]] = bitcast float [[TMP3]] to i32
+// CPP-CHECK-NEXT: ret i32 [[TMP4]]
 //
 uint32_t test_svclasta_n_u32(svbool_t pg, uint32_t fallback, svuint32_t data)
 {
   return SVE_ACLE_FUNC(svclasta,_n_u32,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclasta_n_u64(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.aarch64.sve.clasta.n.nxv2i64(<vscale x 2 x i1> [[TMP0]], i64 [[FALLBACK:%.*]], <vscale x 2 x i64> [[DATA:%.*]])
-// CHECK-NEXT: ret i64 [[TMP1]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[FALLBACK:%.*]] to double
+// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 2 x i64> [[DATA:%.*]] to <vscale x 2 x double>
+// CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.aarch64.sve.clasta.n.nxv2f64(<vscale x 2 x i1> [[TMP0]], double [[TMP1]], <vscale x 2 x double> [[TMP2]])
+// CHECK-NEXT: [[TMP4:%.*]] = bitcast double [[TMP3]] to i64
+// CHECK-NEXT: ret i64 [[TMP4]]
 //
 // CPP-CHECK-LABEL: @_Z19test_svclasta_n_u64u10__SVBool_tmu12__SVUint64_t(
 // CPP-CHECK-NEXT: entry:
 // CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.aarch64.sve.clasta.n.nxv2i64(<vscale x 2 x i1> [[TMP0]], i64 [[FALLBACK:%.*]], <vscale x 2 x i64> [[DATA:%.*]])
-// CPP-CHECK-NEXT: ret i64 [[TMP1]]
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[FALLBACK:%.*]] to double
+// CPP-CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 2 x i64> [[DATA:%.*]] to <vscale x 2 x double>
+// CPP-CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.aarch64.sve.clasta.n.nxv2f64(<vscale x 2 x i1> [[TMP0]], double [[TMP1]], <vscale x 2 x double> [[TMP2]])
+// CPP-CHECK-NEXT: [[TMP4:%.*]] = bitcast double [[TMP3]] to i64
+// CPP-CHECK-NEXT: ret i64 [[TMP4]]
 //
 uint64_t test_svclasta_n_u64(svbool_t pg, uint64_t fallback, svuint64_t data)
 {
   return SVE_ACLE_FUNC(svclasta,_n_u64,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclasta_n_f16(
 // CHECK-NEXT: entry:

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_clastb.c

 int8_t test_svclastb_n_s8(svbool_t pg, int8_t fallback, svint8_t data)
 {
   return SVE_ACLE_FUNC(svclastb,_n_s8,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclastb_n_s16(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = call i16 @llvm.aarch64.sve.clastb.n.nxv8i16(<vscale x 8 x i1> [[TMP0]], i16 [[FALLBACK:%.*]], <vscale x 8 x i16> [[DATA:%.*]])
-// CHECK-NEXT: ret i16 [[TMP1]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[FALLBACK:%.*]] to half
+// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 8 x i16> [[DATA:%.*]] to <vscale x 8 x half>
+// CHECK-NEXT: [[TMP3:%.*]] = call half @llvm.aarch64.sve.clastb.n.nxv8f16(<vscale x 8 x i1> [[TMP0]], half [[TMP1]], <vscale x 8 x half> [[TMP2]])
+// CHECK-NEXT: [[TMP4:%.*]] = bitcast half [[TMP3]] to i16
+// CHECK-NEXT: ret i16 [[TMP4]]
 //
 // CPP-CHECK-LABEL: @_Z19test_svclastb_n_s16u10__SVBool_tsu11__SVInt16_t(
 // CPP-CHECK-NEXT: entry:
 // CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = call i16 @llvm.aarch64.sve.clastb.n.nxv8i16(<vscale x 8 x i1> [[TMP0]], i16 [[FALLBACK:%.*]], <vscale x 8 x i16> [[DATA:%.*]])
-// CPP-CHECK-NEXT: ret i16 [[TMP1]]
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[FALLBACK:%.*]] to half
+// CPP-CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 8 x i16> [[DATA:%.*]] to <vscale x 8 x half>
+// CPP-CHECK-NEXT: [[TMP3:%.*]] = call half @llvm.aarch64.sve.clastb.n.nxv8f16(<vscale x 8 x i1> [[TMP0]], half [[TMP1]], <vscale x 8 x half> [[TMP2]])
+// CPP-CHECK-NEXT: [[TMP4:%.*]] = bitcast half [[TMP3]] to i16
+// CPP-CHECK-NEXT: ret i16 [[TMP4]]
 //
 int16_t test_svclastb_n_s16(svbool_t pg, int16_t fallback, svint16_t data)
 {
   return SVE_ACLE_FUNC(svclastb,_n_s16,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclastb_n_s32(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.aarch64.sve.clastb.n.nxv4i32(<vscale x 4 x i1> [[TMP0]], i32 [[FALLBACK:%.*]], <vscale x 4 x i32> [[DATA:%.*]])
-// CHECK-NEXT: ret i32 [[TMP1]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[FALLBACK:%.*]] to float
+// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 4 x i32> [[DATA:%.*]] to <vscale x 4 x float>
+// CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.aarch64.sve.clastb.n.nxv4f32(<vscale x 4 x i1> [[TMP0]], float [[TMP1]], <vscale x 4 x float> [[TMP2]])
+// CHECK-NEXT: [[TMP4:%.*]] = bitcast float [[TMP3]] to i32
+// CHECK-NEXT: ret i32 [[TMP4]]
 //
 // CPP-CHECK-LABEL: @_Z19test_svclastb_n_s32u10__SVBool_tiu11__SVInt32_t(
 // CPP-CHECK-NEXT: entry:
 // CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.aarch64.sve.clastb.n.nxv4i32(<vscale x 4 x i1> [[TMP0]], i32 [[FALLBACK:%.*]], <vscale x 4 x i32> [[DATA:%.*]])
-// CPP-CHECK-NEXT: ret i32 [[TMP1]]
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[FALLBACK:%.*]] to float
+// CPP-CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 4 x i32> [[DATA:%.*]] to <vscale x 4 x float>
+// CPP-CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.aarch64.sve.clastb.n.nxv4f32(<vscale x 4 x i1> [[TMP0]], float [[TMP1]], <vscale x 4 x float> [[TMP2]])
+// CPP-CHECK-NEXT: [[TMP4:%.*]] = bitcast float [[TMP3]] to i32
+// CPP-CHECK-NEXT: ret i32 [[TMP4]]
 //
 int32_t test_svclastb_n_s32(svbool_t pg, int32_t fallback, svint32_t data)
 {
   return SVE_ACLE_FUNC(svclastb,_n_s32,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclastb_n_s64(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.aarch64.sve.clastb.n.nxv2i64(<vscale x 2 x i1> [[TMP0]], i64 [[FALLBACK:%.*]], <vscale x 2 x i64> [[DATA:%.*]])
-// CHECK-NEXT: ret i64 [[TMP1]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[FALLBACK:%.*]] to double
+// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 2 x i64> [[DATA:%.*]] to <vscale x 2 x double>
+// CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.aarch64.sve.clastb.n.nxv2f64(<vscale x 2 x i1> [[TMP0]], double [[TMP1]], <vscale x 2 x double> [[TMP2]])
+// CHECK-NEXT: [[TMP4:%.*]] = bitcast double [[TMP3]] to i64
+// CHECK-NEXT: ret i64 [[TMP4]]
 //
 // CPP-CHECK-LABEL: @_Z19test_svclastb_n_s64u10__SVBool_tlu11__SVInt64_t(
 // CPP-CHECK-NEXT: entry:
 // CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.aarch64.sve.clastb.n.nxv2i64(<vscale x 2 x i1> [[TMP0]], i64 [[FALLBACK:%.*]], <vscale x 2 x i64> [[DATA:%.*]])
-// CPP-CHECK-NEXT: ret i64 [[TMP1]]
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[FALLBACK:%.*]] to double
+// CPP-CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 2 x i64> [[DATA:%.*]] to <vscale x 2 x double>
+// CPP-CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.aarch64.sve.clastb.n.nxv2f64(<vscale x 2 x i1> [[TMP0]], double [[TMP1]], <vscale x 2 x double> [[TMP2]])
+// CPP-CHECK-NEXT: [[TMP4:%.*]] = bitcast double [[TMP3]] to i64
+// CPP-CHECK-NEXT: ret i64 [[TMP4]]
 //
 int64_t test_svclastb_n_s64(svbool_t pg, int64_t fallback, svint64_t data)
 {
   return SVE_ACLE_FUNC(svclastb,_n_s64,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclastb_n_u8(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = call i8 @llvm.aarch64.sve.clastb.n.nxv16i8(<vscale x 16 x i1> [[PG:%.*]], i8 [[FALLBACK:%.*]], <vscale x 16 x i8> [[DATA:%.*]])
 // CHECK-NEXT: ret i8 [[TMP0]]
 //
 // CPP-CHECK-LABEL: @_Z18test_svclastb_n_u8u10__SVBool_thu11__SVUint8_t(
 // CPP-CHECK-NEXT: entry:
 // CPP-CHECK-NEXT: [[TMP0:%.*]] = call i8 @llvm.aarch64.sve.clastb.n.nxv16i8(<vscale x 16 x i1> [[PG:%.*]], i8 [[FALLBACK:%.*]], <vscale x 16 x i8> [[DATA:%.*]])
 // CPP-CHECK-NEXT: ret i8 [[TMP0]]
 //
 uint8_t test_svclastb_n_u8(svbool_t pg, uint8_t fallback, svuint8_t data)
 {
   return SVE_ACLE_FUNC(svclastb,_n_u8,,)(pg, fallback, data);
 }

 // CHECK-LABEL: @test_svclastb_n_u16(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CHECK-NEXT: [[TMP1:%.*]] = call i16 @llvm.aarch64.sve.clastb.n.nxv8i16(<vscale x 8 x i1> [[TMP0]], i16 [[FALLBACK:%.*]], <vscale x 8 x i16> [[DATA:%.*]])
-// CHECK-NEXT: ret i16 [[TMP1]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[FALLBACK:%.*]] to half
+// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 8 x i16> [[DATA:%.*]] to <vscale x 8 x half>
+// CHECK-NEXT: [[TMP3:%.*]] = call half @llvm.aarch64.sve.clastb.n.nxv8f16(<vscale x 8 x i1> [[TMP0]], half [[TMP1]], <vscale x 8 x half> [[TMP2]])
+// CHECK-NEXT: [[TMP4:%.*]] = bitcast half [[TMP3]] to i16
+// CHECK-NEXT: ret i16 [[TMP4]]
 //
 // CPP-CHECK-LABEL: @_Z19test_svclastb_n_u16u10__SVBool_ttu12__SVUint16_t(
 // CPP-CHECK-NEXT: entry:
 // CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
-// CPP-CHECK-NEXT: [[TMP1:%.*]] = call i16 @llvm.aarch64.sve.clastb.n.nxv8i16(<vscale x 8 x i1> [[TMP0]], i16 [[FALLBACK:%.*]], <vscale x 8 x i16> [[DATA:%.*]])
-// CPP-CHECK-NEXT: ret i16 [[TMP1]]
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[FALLBACK:%.*]] to half
+// CPP-CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 8 x i16> [[DATA:%.*]] to <vscale x 8 x half>
+// CPP-CHECK-NEXT: [[TMP3:%.*]] = call half @llvm.aarch64.sve.clastb.n.nxv8f16(<vscale x 8 x i1> [[TMP0]], half [[TMP1]], <vscale x 8 x half> [[TMP2]])
+// CPP-CHECK-NEXT: [[TMP4:%.*]] = bitcast half [[TMP3]] to i16
+// CPP-CHECK-NEXT: ret i16 [[TMP4]]
 //
 uint16_t test_svclastb_n_u16(svbool_t pg, uint16_t fallback, svuint16_t data)
 {
   return SVE_ACLE_FUNC(svclastb,_n_u16,,)(pg, fallback, data);
 }

	// CHECK-LABEL: @test_svclastb_n_u32(			// CHECK-LABEL: @test_svclastb_n_u32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])			// CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
	// CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.aarch64.sve.clastb.n.nxv4i32(<vscale x 4 x i1> [[TMP0]], i32 [[FALLBACK:%.*]], <vscale x 4 x i32> [[DATA:%.*]])			// CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[FALLBACK:%.*]] to float
	// CHECK-NEXT: ret i32 [[TMP1]]			// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 4 x i32> [[DATA:%.*]] to <vscale x 4 x float>
				// CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.aarch64.sve.clastb.n.nxv4f32(<vscale x 4 x i1> [[TMP0]], float [[TMP1]], <vscale x 4 x float> [[TMP2]])
				// CHECK-NEXT: [[TMP4:%.*]] = bitcast float [[TMP3]] to i32
				// CHECK-NEXT: ret i32 [[TMP4]]
	//			//
	// CPP-CHECK-LABEL: @_Z19test_svclastb_n_u32u10__SVBool_tju12__SVUint32_t(			// CPP-CHECK-LABEL: @_Z19test_svclastb_n_u32u10__SVBool_tju12__SVUint32_t(
	// CPP-CHECK-NEXT: entry:			// CPP-CHECK-NEXT: entry:
	// CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])			// CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[PG:%.*]])
	// CPP-CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.aarch64.sve.clastb.n.nxv4i32(<vscale x 4 x i1> [[TMP0]], i32 [[FALLBACK:%.*]], <vscale x 4 x i32> [[DATA:%.*]])			// CPP-CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[FALLBACK:%.*]] to float
	// CPP-CHECK-NEXT: ret i32 [[TMP1]]			// CPP-CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 4 x i32> [[DATA:%.*]] to <vscale x 4 x float>
				// CPP-CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.aarch64.sve.clastb.n.nxv4f32(<vscale x 4 x i1> [[TMP0]], float [[TMP1]], <vscale x 4 x float> [[TMP2]])
				// CPP-CHECK-NEXT: [[TMP4:%.*]] = bitcast float [[TMP3]] to i32
				// CPP-CHECK-NEXT: ret i32 [[TMP4]]
	//			//
	uint32_t test_svclastb_n_u32(svbool_t pg, uint32_t fallback, svuint32_t data)			uint32_t test_svclastb_n_u32(svbool_t pg, uint32_t fallback, svuint32_t data)
	{			{
	return SVE_ACLE_FUNC(svclastb,_n_u32,,)(pg, fallback, data);			return SVE_ACLE_FUNC(svclastb,_n_u32,,)(pg, fallback, data);
	}			}

	// CHECK-LABEL: @test_svclastb_n_u64(			// CHECK-LABEL: @test_svclastb_n_u64(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])			// CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
	// CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.aarch64.sve.clastb.n.nxv2i64(<vscale x 2 x i1> [[TMP0]], i64 [[FALLBACK:%.*]], <vscale x 2 x i64> [[DATA:%.*]])			// CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[FALLBACK:%.*]] to double
	// CHECK-NEXT: ret i64 [[TMP1]]			// CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 2 x i64> [[DATA:%.*]] to <vscale x 2 x double>
				// CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.aarch64.sve.clastb.n.nxv2f64(<vscale x 2 x i1> [[TMP0]], double [[TMP1]], <vscale x 2 x double> [[TMP2]])
				// CHECK-NEXT: [[TMP4:%.*]] = bitcast double [[TMP3]] to i64
				// CHECK-NEXT: ret i64 [[TMP4]]
	//			//
	// CPP-CHECK-LABEL: @_Z19test_svclastb_n_u64u10__SVBool_tmu12__SVUint64_t(			// CPP-CHECK-LABEL: @_Z19test_svclastb_n_u64u10__SVBool_tmu12__SVUint64_t(
	// CPP-CHECK-NEXT: entry:			// CPP-CHECK-NEXT: entry:
	// CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])			// CPP-CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[PG:%.*]])
	// CPP-CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.aarch64.sve.clastb.n.nxv2i64(<vscale x 2 x i1> [[TMP0]], i64 [[FALLBACK:%.*]], <vscale x 2 x i64> [[DATA:%.*]])			// CPP-CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[FALLBACK:%.*]] to double
	// CPP-CHECK-NEXT: ret i64 [[TMP1]]			// CPP-CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 2 x i64> [[DATA:%.*]] to <vscale x 2 x double>
				// CPP-CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.aarch64.sve.clastb.n.nxv2f64(<vscale x 2 x i1> [[TMP0]], double [[TMP1]], <vscale x 2 x double> [[TMP2]])
				// CPP-CHECK-NEXT: [[TMP4:%.*]] = bitcast double [[TMP3]] to i64
				// CPP-CHECK-NEXT: ret i64 [[TMP4]]
	//			//
	uint64_t test_svclastb_n_u64(svbool_t pg, uint64_t fallback, svuint64_t data)			uint64_t test_svclastb_n_u64(svbool_t pg, uint64_t fallback, svuint64_t data)
	{			{
	return SVE_ACLE_FUNC(svclastb,_n_u64,,)(pg, fallback, data);			return SVE_ACLE_FUNC(svclastb,_n_u64,,)(pg, fallback, data);
	}			}

	// CHECK-LABEL: @test_svclastb_n_f16(			// CHECK-LABEL: @test_svclastb_n_f16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

static Optional<Instruction *> instCombineSVELast(InstCombiner &IC,
// The intrinsic is extracting a fixed lane so use an extract instead.		// The intrinsic is extracting a fixed lane so use an extract instead.
auto *IdxTy = Type::getInt64Ty(II.getContext());		auto *IdxTy = Type::getInt64Ty(II.getContext());
auto *Extract = ExtractElementInst::Create(Vec, ConstantInt::get(IdxTy, Idx));		auto *Extract = ExtractElementInst::Create(Vec, ConstantInt::get(IdxTy, Idx));
Extract->insertBefore(&II);		Extract->insertBefore(&II);
Extract->takeName(&II);		Extract->takeName(&II);
return IC.replaceInstUsesWith(II, Extract);		return IC.replaceInstUsesWith(II, Extract);
}		}

		static Optional<Instruction *> instCombineSVECondLast(InstCombiner &IC,
		IntrinsicInst &II) {
		// Replace scalar integer CLAST[AB] intrinsic with optimal SIMD&FP variant.
		efriedma: Maybe put a bit of the explanation you just gave into a comment here, for reference.
		IRBuilder<> Builder(II.getContext());
		Builder.SetInsertPoint(&II);
		Value *Pg = II.getArgOperand(0);
		Value *Fallback = II.getArgOperand(1);
		Value *Vec = II.getArgOperand(2);
		Type *Ty = II.getType();

		if (!Ty->isIntegerTy())
		return None;

		Type *FPTy;
		switch (cast<IntegerType>(Ty)->getBitWidth()) {
		default:
		return None;
		case 16:
		FPTy = Builder.getHalfTy();
		break;
		case 32:
		FPTy = Builder.getFloatTy();
		break;
		case 64:
		FPTy = Builder.getDoubleTy();
		break;
		}

		Value *FPFallBack = Builder.CreateBitCast(Fallback, FPTy);
		auto *FPVTy = VectorType::get(
		FPTy, cast<VectorType>(Vec->getType())->getElementCount());
		Value *FPVec = Builder.CreateBitCast(Vec, FPVTy);
		auto *FPII = Builder.CreateIntrinsic(II.getIntrinsicID(), {FPVec->getType()},
		{Pg, FPFallBack, FPVec});
		Value *FPIItoInt = Builder.CreateBitCast(FPII, II.getType());
		return IC.replaceInstUsesWith(II, FPIItoInt);
		}

static Optional<Instruction *> instCombineRDFFR(InstCombiner &IC,		static Optional<Instruction *> instCombineRDFFR(InstCombiner &IC,
IntrinsicInst &II) {		IntrinsicInst &II) {
LLVMContext &Ctx = II.getContext();		LLVMContext &Ctx = II.getContext();
IRBuilder<> Builder(Ctx);		IRBuilder<> Builder(Ctx);
Builder.SetInsertPoint(&II);		Builder.SetInsertPoint(&II);
// Replace rdffr with predicated rdffr.z intrinsic, so that optimizePTestInstr		// Replace rdffr with predicated rdffr.z intrinsic, so that optimizePTestInstr
// can work with RDFFR_PP for ptest elimination.		// can work with RDFFR_PP for ptest elimination.
auto *AllPat =		auto *AllPat =
AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
case Intrinsic::aarch64_sve_cmpne:		case Intrinsic::aarch64_sve_cmpne:
case Intrinsic::aarch64_sve_cmpne_wide:		case Intrinsic::aarch64_sve_cmpne_wide:
return instCombineSVECmpNE(IC, II);		return instCombineSVECmpNE(IC, II);
case Intrinsic::aarch64_sve_rdffr:		case Intrinsic::aarch64_sve_rdffr:
return instCombineRDFFR(IC, II);		return instCombineRDFFR(IC, II);
case Intrinsic::aarch64_sve_lasta:		case Intrinsic::aarch64_sve_lasta:
case Intrinsic::aarch64_sve_lastb:		case Intrinsic::aarch64_sve_lastb:
return instCombineSVELast(IC, II);		return instCombineSVELast(IC, II);
		case Intrinsic::aarch64_sve_clasta_n:
		case Intrinsic::aarch64_sve_clastb_n:
		return instCombineSVECondLast(IC, II);
case Intrinsic::aarch64_sve_cntd:		case Intrinsic::aarch64_sve_cntd:
return instCombineSVECntElts(IC, II, 2);		return instCombineSVECntElts(IC, II, 2);
case Intrinsic::aarch64_sve_cntw:		case Intrinsic::aarch64_sve_cntw:
return instCombineSVECntElts(IC, II, 4);		return instCombineSVECntElts(IC, II, 4);
case Intrinsic::aarch64_sve_cnth:		case Intrinsic::aarch64_sve_cnth:
return instCombineSVECntElts(IC, II, 8);		return instCombineSVECntElts(IC, II, 8);
case Intrinsic::aarch64_sve_cntb:		case Intrinsic::aarch64_sve_cntb:
return instCombineSVECntElts(IC, II, 16);		return instCombineSVECntElts(IC, II, 16);

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-clast.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -passes=instcombine -S < %s | FileCheck %s

				target triple = "aarch64"

				define i16 @clastb_n_i16(<vscale x 8 x i1> %pg, i16 %a, <vscale x 8 x i16> %b) {
				; CHECK-LABEL: @clastb_n_i16(
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16 [[A:%.*]] to half
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 8 x i16> [[B:%.*]] to <vscale x 8 x half>
				; CHECK-NEXT: [[TMP3:%.*]] = call half @llvm.aarch64.sve.clastb.n.nxv8f16(<vscale x 8 x i1> [[PG:%.*]], half [[TMP1]], <vscale x 8 x half> [[TMP2]])
				; CHECK-NEXT: [[TMP4:%.*]] = bitcast half [[TMP3]] to i16
				; CHECK-NEXT: ret i16 [[TMP4]]
				;
				%out = call i16 @llvm.aarch64.sve.clastb.n.nxv8i16(<vscale x 8 x i1> %pg, i16 %a, <vscale x 8 x i16> %b)
				ret i16 %out
				}

				define i32 @clastb_n_i32(<vscale x 4 x i1> %pg, i32 %a, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: @clastb_n_i32(
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32 [[A:%.*]] to float
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 4 x i32> [[B:%.*]] to <vscale x 4 x float>
				; CHECK-NEXT: [[TMP3:%.*]] = call float @llvm.aarch64.sve.clastb.n.nxv4f32(<vscale x 4 x i1> [[PG:%.*]], float [[TMP1]], <vscale x 4 x float> [[TMP2]])
				; CHECK-NEXT: [[TMP4:%.*]] = bitcast float [[TMP3]] to i32
				; CHECK-NEXT: ret i32 [[TMP4]]
				;
				%out = call i32 @llvm.aarch64.sve.clastb.n.nxv4i32(<vscale x 4 x i1> %pg, i32 %a, <vscale x 4 x i32> %b)
				ret i32 %out
				}

				define i64 @clastb_n_i64(<vscale x 2 x i1> %pg, i64 %a, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: @clastb_n_i64(
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[A:%.*]] to double
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast <vscale x 2 x i64> [[B:%.*]] to <vscale x 2 x double>
				; CHECK-NEXT: [[TMP3:%.*]] = call double @llvm.aarch64.sve.clastb.n.nxv2f64(<vscale x 2 x i1> [[PG:%.*]], double [[TMP1]], <vscale x 2 x double> [[TMP2]])
				; CHECK-NEXT: [[TMP4:%.*]] = bitcast double [[TMP3]] to i64
				; CHECK-NEXT: ret i64 [[TMP4]]
				;
				%out = call i64 @llvm.aarch64.sve.clastb.n.nxv2i64(<vscale x 2 x i1> %pg, i64 %a, <vscale x 2 x i64> %b)
				ret i64 %out
				}

				declare i16 @llvm.aarch64.sve.clastb.n.nxv8i16(<vscale x 8 x i1>, i16, <vscale x 8 x i16>)
				declare i32 @llvm.aarch64.sve.clastb.n.nxv4i32(<vscale x 4 x i1>, i32, <vscale x 4 x i32>)
				declare i64 @llvm.aarch64.sve.clastb.n.nxv2i64(<vscale x 2 x i1>, i64, <vscale x 2 x i64>)
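
For reference, the rewrite checked by these tests relies on bitcasts being pure reinterpretations: CLASTB only selects an element and does no arithmetic, so routing integer data through the floating-point form should return the same bits. The following is a rough scalar model of that reasoning, not code from the patch. The helper names (`bitsToFloat`, `floatToBits`, `clastbModel`, `clastbViaFloat`) are hypothetical, and the model assumes CLASTB returns the last active element, or the fallback when no lane is active.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helpers: reinterpret 32 bits without any numeric conversion,
// which is what an IR bitcast between i32 and float does.
static float bitsToFloat(std::uint32_t v) {
  float f;
  std::memcpy(&f, &v, sizeof f);
  return f;
}

static std::uint32_t floatToBits(float f) {
  std::uint32_t v;
  std::memcpy(&v, &f, sizeof v);
  return v;
}

// Scalar model of CLASTB (an assumption for illustration): return the last
// element whose predicate lane is active, or the fallback if none is active.
template <typename T>
static T clastbModel(const std::vector<bool> &pg, T fallback,
                     const std::vector<T> &data) {
  T result = fallback;
  for (std::size_t i = 0; i < data.size() && i < pg.size(); ++i)
    if (pg[i])
      result = data[i];
  return result;
}

// Model of the rewritten sequence: bitcast the operands to float, run the
// floating-point form, then bitcast the result back to an integer.
static std::uint32_t clastbViaFloat(const std::vector<bool> &pg,
                                    std::uint32_t fallback,
                                    const std::vector<std::uint32_t> &data) {
  std::vector<float> fdata;
  fdata.reserve(data.size());
  for (std::uint32_t v : data)
    fdata.push_back(bitsToFloat(v));
  return floatToBits(clastbModel(pg, bitsToFloat(fallback), fdata));
}
```

In this model `clastbViaFloat` and `clastbModel<std::uint32_t>` agree for the same inputs, which is the property the instcombine rewrite depends on. The latency benefit itself comes from the hardware and is not something this sketch can show.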