This is an archive of the discontinued LLVM Phabricator instance.

[NEON] Define vget_high_f16() and vget_low_f16() intrinsics in AArch64 mode only
ClosedPublic

Authored by kosarev on Apr 15 2018, 6:08 AM.


Details

Reviewers

t.p.northover
rengolin
SjoerdMeijer

Commits

rG1243ebdcdb6e: Revert r330195 "[NEON] Define vget_high_f16() and vget_low_f16() intrinsics in…
rGb3b87c3314d6: [NEON] Define vget_high_f16() and vget_low_f16() intrinsics in AArch64 mode only
rC330248: Revert r330195 "[NEON] Define vget_high_f16() and vget_low_f16() intrinsics in…
rL330248: Revert r330195 "[NEON] Define vget_high_f16() and vget_low_f16() intrinsics in…
rC330195: [NEON] Define vget_high_f16() and vget_low_f16() intrinsics in AArch64 mode only
rL330195: [NEON] Define vget_high_f16() and vget_low_f16() intrinsics in AArch64 mode only

Summary

These are AArch64-specific intrinsics. The patch removes the AArch32-mode test cases and keeps the AArch64 ones in tools/clang/test/CodeGen/aarch64-neon-vget-hilo.c.


Event Timeline

kosarev created this revision. Apr 15 2018, 6:08 AM

Herald added subscribers: kristof.beyls, javed.absar. Apr 15 2018, 6:08 AM

I'm not really familiar with these two intrinsics, so I had a quick look at the ACLE:

T vget_high_ST(T 2 a);
T vget_low_ST(T 2 a);

Gets the high, or low, half of a 128-bit vector. There are 24 intrinsics. ARMv8
adds 4 more intrinsics for 128-bit vectors with float64_t and poly64_t lane
type.

I don't read here that they are unavailable in AArch32. Have I missed something?

The NEON Intrinsics Reference (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0073a/index.html) reads like they are AArch64-only.

Yep, agreed. The new, shiny https://developer.arm.com/technologies/neon/intrinsics also lists it as A64 only.

This revision is now accepted and ready to land. Apr 17 2018, 7:25 AM

Closed by commit rL330195: [NEON] Define vget_high_f16() and vget_low_f16() intrinsics in AArch64 mode only (authored by kosarev). Apr 17 2018, 9:46 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Apr 17 2018, 9:46 AM

Sorry, I have second thoughts on this.
This seems more like a doc issue than anything else. There is no reason why this could not be supported in A32. GCC also supports this, and removing it is a bit user-unfriendly.
Would you mind reverting this?

Sure, will do. Should we treat these intrinsics as ARMv8 or ARMv7/v8? Also, would you mind if I commit a comment under this differential revision explaining the situation?

Thanks, and I am going to try to get some clarity on this doc issue. But it looks like it should be "ARMv7, ARMv8", as it used to be. It makes sense to comment on this in the commit message, if that's what you mean.

In D45668#1070878, @SjoerdMeijer wrote:

Thanks, and I am going to try to get some clarity on this doc issue. But it looks like it should be "ARMv7, ARMv8", as it used to be. It makes sense to comment on this in the commit message, if that's what you mean.

These should be available whenever the float16x4_t and float16x8_t types are available, so v7/A32/A64. I have pushed this change to the docs locally, but I don't know when it will make it to the public documentation, so you will just have to take my word for it when I say this is a documentation bug that will be fixed in a future release.

Thanks James!

Thanks Sjoerd and James. Just added a comment referring to this revision in rL330420.

Revision Contents

Path                                  Size
include/clang/Basic/arm_neon.td       10 lines
test/CodeGen/arm_neon_intrinsics.c    14 lines

Diff 142558

include/clang/Basic/arm_neon.td


 ////////////////////////////////////////////////////////////////////////////////
 // E.3.20 Combining vectors
 def VCOMBINE : NoTestOpInst<"vcombine", "kdd", "csilhfUcUsUiUlPcPs", OP_CONC>;

 ////////////////////////////////////////////////////////////////////////////////
 // E.3.21 Splitting vectors
 let InstName = "vmov" in {
-def VGET_HIGH : NoTestOpInst<"vget_high", "dk", "csilhfUcUsUiUlPcPs", OP_HI>;
-def VGET_LOW : NoTestOpInst<"vget_low", "dk", "csilhfUcUsUiUlPcPs", OP_LO>;
+def VGET_HIGH : NoTestOpInst<"vget_high", "dk", "csilfUcUsUiUlPcPs", OP_HI>;
+def VGET_LOW : NoTestOpInst<"vget_low", "dk", "csilfUcUsUiUlPcPs", OP_LO>;
+}
+let ArchGuard = "__ARM_ARCH >= 8 && defined(__aarch64__)" in {
+let InstName = "vmov" in {
+def VGET_HIGH_F16 : NoTestOpInst<"vget_high", "dk", "h", OP_HI>;
+def VGET_LOW_F16 : NoTestOpInst<"vget_low", "dk", "h", OP_LO>;
+}
 }

 ////////////////////////////////////////////////////////////////////////////////
 // E.3.22 Converting vectors

 let ArchGuard = "(__ARM_FP & 2)" in {
 def VCVT_F16_F32 : SInst<"vcvt_f16_f32", "md", "Hf">;
 def VCVT_F32_F16 : SInst<"vcvt_f32_f16", "wd", "h">;
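As a rough illustration of what the new ArchGuard in this hunk does, the sketch below evaluates the same condition in plain C. It assumes the guard string is emitted as an `#if` condition around the two declarations in the generated arm_neon.h. The macro `F16_HILO_DECLARED` and both function names are hypothetical and exist only for this example. The extra `defined(__ARM_ARCH)` test is not in the patch; it is added here so the sketch also builds cleanly with `-Wundef`.

```c
#include <assert.h>
#include <stdio.h>

/* Same condition as the ArchGuard string in the patch, plus a
   defined(__ARM_ARCH) check added for this sketch only. */
#if defined(__ARM_ARCH) && __ARM_ARCH >= 8 && defined(__aarch64__)
#define F16_HILO_DECLARED 1 /* hypothetical name */
#else
#define F16_HILO_DECLARED 0
#endif

/* Returns 1 when the guarded declarations would be visible, else 0. */
int f16_hilo_declared(void) { return F16_HILO_DECLARED; }

/* Prints which way the guard resolves for the current target. */
void report_f16_hilo_guard(void) {
  printf("vget_high_f16/vget_low_f16 %s under the patch's guard\n",
         f16_hilo_declared() ? "would be declared" : "would be hidden");
}

/* Usage example: the guard should be on exactly when targeting AArch64. */
void check_guard_matches_target(void) {
#if defined(__aarch64__)
  assert(f16_hilo_declared() == 1);
#else
  assert(f16_hilo_declared() == 0);
#endif
}
```

On an AArch32 (or any non-AArch64) target the guard resolves to 0, which is the behaviour the later comments in this review object to.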

test/CodeGen/arm_neon_intrinsics.c



 // CHECK-LABEL: @test_vget_high_s64(
 // CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %a, <1 x i32> <i32 1>
 // CHECK: ret <1 x i64> [[SHUFFLE_I]]
 int64x1_t test_vget_high_s64(int64x2_t a) {
   return vget_high_s64(a);
 }

-// CHECK-LABEL: @test_vget_high_f16(
-// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <8 x half> %a, <8 x half> %a, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
-// CHECK: ret <4 x half> [[SHUFFLE_I]]
-float16x4_t test_vget_high_f16(float16x8_t a) {
-  return vget_high_f16(a);
-}
-
 // CHECK-LABEL: @test_vget_high_f32(
 // CHECK: [[SHUFFLE_I:%.*]] = shufflevector <4 x float> %a, <4 x float> %a, <2 x i32> <i32 2, i32 3>
 // CHECK: ret <2 x float> [[SHUFFLE_I]]
 float32x2_t test_vget_high_f32(float32x4_t a) {
   return vget_high_f32(a);
 }

 // CHECK-LABEL: @test_vget_high_u8(
(283 unchanged lines not shown)

 // CHECK-LABEL: @test_vget_low_s64(
 // CHECK: [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> %a, <2 x i64> %a, <1 x i32> zeroinitializer
 // CHECK: ret <1 x i64> [[SHUFFLE_I]]
 int64x1_t test_vget_low_s64(int64x2_t a) {
   return vget_low_s64(a);
 }

-// CHECK-LABEL: @test_vget_low_f16(
-// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <8 x half> %a, <8 x half> %a, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-// CHECK: ret <4 x half> [[SHUFFLE_I]]
-float16x4_t test_vget_low_f16(float16x8_t a) {
-  return vget_low_f16(a);
-}
-
 // CHECK-LABEL: @test_vget_low_f32(
 // CHECK: [[SHUFFLE_I:%.*]] = shufflevector <4 x float> %a, <4 x float> %a, <2 x i32> <i32 0, i32 1>
 // CHECK: ret <2 x float> [[SHUFFLE_I]]
 float32x2_t test_vget_low_f32(float32x4_t a) {
   return vget_low_f32(a);
 }

 // CHECK-LABEL: @test_vget_low_u8(
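The shufflevector masks in the f16 tests touched by this diff show the lane selection: `<4, 5, 6, 7>` for vget_high_f16 and `<0, 1, 2, 3>` for vget_low_f16. The following is a minimal, host-independent sketch of that behaviour in C. It does not use arm_neon.h; the `model_f16x8` and `model_f16x4` types and the `model_*` functions are hypothetical stand-ins that hold raw 16-bit lane patterns in place of float16x8_t and float16x4_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for float16x8_t and float16x4_t. Each lane holds a
   raw 16-bit pattern, so no half-precision hardware support is needed. */
typedef struct { uint16_t lane[8]; } model_f16x8;
typedef struct { uint16_t lane[4]; } model_f16x4;

/* Models vget_low_f16: copies lanes 0..3 (mask <0, 1, 2, 3>). */
model_f16x4 model_vget_low_f16(model_f16x8 a) {
  model_f16x4 r;
  memcpy(r.lane, &a.lane[0], sizeof r.lane);
  return r;
}

/* Models vget_high_f16: copies lanes 4..7 (mask <4, 5, 6, 7>). */
model_f16x4 model_vget_high_f16(model_f16x8 a) {
  model_f16x4 r;
  memcpy(r.lane, &a.lane[4], sizeof r.lane);
  return r;
}

/* Usage example: lane i of the input holds the value i, so the results
   expose exactly which input lanes were selected. */
void model_selfcheck(void) {
  model_f16x8 v = {{0, 1, 2, 3, 4, 5, 6, 7}};
  model_f16x4 lo = model_vget_low_f16(v);
  model_f16x4 hi = model_vget_high_f16(v);
  for (int i = 0; i < 4; ++i) {
    assert(lo.lane[i] == i);     /* low half: input lanes 0..3 */
    assert(hi.lane[i] == i + 4); /* high half: input lanes 4..7 */
  }
}
```

Nothing in this model depends on the execution state, which is in line with the point made in the thread that the intrinsics should be usable wherever the two vector types exist.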