
Details

Reviewers

MarkMurrayARM
john.brawn
simon_tatham
dmgreen

Commits

rGb14a6f06cc87: [ARM][MVE] vcreateq lane ordering for big endian

Summary

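Use of bitcast caused the lanes returned by vcreateq to be swapped on big
endian: on a big-endian target, an IR bitcast from <2 x i64> to a vector
with narrower lanes reverses the lane order within each 64-bit half. Fix
this by using vreinterpret instead. There is no code change for little
endian. Adds an IR lit test.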

For example, the following code will print different results for big and little endian:

#include <arm_mve.h>
extern "C" {
int printf(const char *, ...);
}


int main() {
    int16x8_t x = vcreateq_s16(0x0003000200010000, 0x0007000600050004);

    printf("8x16 lanes:\n");
    printf("%d:%d\n", 0, vgetq_lane_s16(x, 0));
    printf("%d:%d\n", 1, vgetq_lane_s16(x, 1));
    printf("%d:%d\n", 2, vgetq_lane_s16(x, 2));
    printf("%d:%d\n", 3, vgetq_lane_s16(x, 3));
    printf("%d:%d\n", 4, vgetq_lane_s16(x, 4));
    printf("%d:%d\n", 5, vgetq_lane_s16(x, 5));
    printf("%d:%d\n", 6, vgetq_lane_s16(x, 6));
    printf("%d:%d\n", 7, vgetq_lane_s16(x, 7));

    int8x16_t y = vcreateq_s8(0x0706050403020100, 0x0f0e0d0c0b0a0908);
    printf("16x8 lanes:\n");
    printf("%d:%d\n", 0, vgetq_lane_s8(y, 0));
    printf("%d:%d\n", 1, vgetq_lane_s8(y, 1));
    printf("%d:%d\n", 2, vgetq_lane_s8(y, 2));
    printf("%d:%d\n", 3, vgetq_lane_s8(y, 3));
    printf("%d:%d\n", 4, vgetq_lane_s8(y, 4));
    printf("%d:%d\n", 5, vgetq_lane_s8(y, 5));
    printf("%d:%d\n", 6, vgetq_lane_s8(y, 6));
    printf("%d:%d\n", 7, vgetq_lane_s8(y, 7));
    printf("%d:%d\n", 8, vgetq_lane_s8(y, 8));
    printf("%d:%d\n", 9, vgetq_lane_s8(y, 9));
    printf("%d:%d\n", 10, vgetq_lane_s8(y, 10));
    printf("%d:%d\n", 11, vgetq_lane_s8(y, 11));
    printf("%d:%d\n", 12, vgetq_lane_s8(y, 12));
    printf("%d:%d\n", 13, vgetq_lane_s8(y, 13));
    printf("%d:%d\n", 14, vgetq_lane_s8(y, 14));
    printf("%d:%d\n", 15, vgetq_lane_s8(y, 15));
    return 0;
}

For little endian (correct), built with clang++ -target arm-none-none-eabi -march=armv8.1-m.main+mve.fp -O0:

8x16 lanes:
0:0
1:1
2:2
3:3
4:4
5:5
6:6
7:7
16x8 lanes:
0:0
1:1
2:2
3:3
4:4
5:5
6:6
7:7
8:8
9:9
10:10
11:11
12:12
13:13
14:14
15:15

For big endian (incorrect), built with clang++ -target arm-none-none-eabi -march=armv8.1-m.main+mve.fp -O0 -mbig-endian:

8x16 lanes:
0:3
1:2
2:1
3:0
4:7
5:6
6:5
7:4
16x8 lanes:
0:7
1:6
2:5
3:4
4:3
5:2
6:1
7:0
8:15
9:14
10:13
11:12
12:11
13:10
14:9
15:8

This patch brings the big endian output in line with the little endian output.
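The reversal in groups of four (16-bit lanes) can be reproduced with a small host-side model. The LangRef defines bitcast as if the value were stored to memory and loaded back as the new type. On a big-endian target each 64-bit element is stored most-significant byte first, so the first 16-bit lane read back is the top quarter of `a`. This is an illustrative sketch, not the LLVM implementation. The helper names are hypothetical, and `reinterpret_lane` assumes that MVE numbers 16-bit lanes from the least-significant bits of the register on either byte order. That assumption is consistent with the little-endian output above.

```c
#include <assert.h>
#include <stdint.h>

/* Byte order of the simulated target (hypothetical names). */
typedef enum { TARGET_LE, TARGET_BE } target_endian;

/* Write a 64-bit value to 8 bytes of memory in the target's byte order. */
static void store_u64(uint8_t *dst, uint64_t v, target_endian e) {
    for (int i = 0; i < 8; i++) {
        int shift = (e == TARGET_LE) ? 8 * i : 8 * (7 - i);
        dst[i] = (uint8_t)(v >> shift);
    }
}

/* Read a 16-bit value from 2 bytes of memory in the target's byte order. */
static uint16_t load_u16(const uint8_t *src, target_endian e) {
    if (e == TARGET_LE)
        return (uint16_t)(src[0] | (src[1] << 8));
    return (uint16_t)((src[0] << 8) | src[1]);
}

/* "bitcast <2 x i64> to <8 x i16>", modelled as the LangRef describes it:
   store the <2 x i64> to memory, then load the same bytes as <8 x i16>. */
static uint16_t bitcast_lane(uint64_t a, uint64_t b, int lane, target_endian e) {
    uint8_t mem[16];
    assert(lane >= 0 && lane < 8);
    store_u64(mem, a, e);     /* element 0 occupies bytes 0..7  */
    store_u64(mem + 8, b, e); /* element 1 occupies bytes 8..15 */
    return load_u16(mem + 2 * lane, e);
}

/* Register-level reinterpretation (assumed vreinterpretq semantics):
   16-bit lane i is bits [16*i+15 : 16*i] of the 128-bit register, where
   `a` supplies bits 63..0 and `b` bits 127..64, on either byte order. */
static uint16_t reinterpret_lane(uint64_t a, uint64_t b, int lane) {
    assert(lane >= 0 && lane < 8);
    uint64_t half = (lane < 4) ? a : b;
    return (uint16_t)(half >> (16 * (lane % 4)));
}
```

Using the vcreateq_s16 arguments from the example, `bitcast_lane(a, b, 0, TARGET_BE)` gives 3 and `bitcast_lane(a, b, 4, TARGET_BE)` gives 7. These match the incorrect big-endian lines 0:3 and 4:7. With `TARGET_LE`, `bitcast_lane` returns the lane index, as `reinterpret_lane` does on either byte order.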

Bitcast documentation: https://llvm.org/docs/LangRef.html#bitcast-to-instruction
Helium intrinsics documentation: https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tmatheson created this revision. Apr 30 2021, 1:36 AM

Herald added subscribers: danielkiss, dmgreen, kristof.beyls. Apr 30 2021, 1:36 AM

tmatheson requested review of this revision. Apr 30 2021, 1:36 AM

Herald added a project: Restricted Project. Apr 30 2021, 1:36 AM

Herald added a subscriber: cfe-commits.

tmatheson added reviewers: MarkMurrayARM, greened, john.brawn, simon_tatham. Apr 30 2021, 1:37 AM

tmatheson edited reviewers, added: dmgreen; removed: greened. Apr 30 2021, 1:40 AM

MarkMurrayARM added a comment.

Not sure yet.

clang/test/CodeGen/arm-mve-intrinsics/admin.c
90	Surely there is a problem here also?
158	And a problem here also (with BE)?

tmatheson retitled this revision from [ARM] vcreateq lane ordering for big endian to [ARM][MVE] vcreateq lane ordering for big endian. Apr 30 2021, 1:55 AM

tmatheson edited the summary of this revision.

dmgreen added a comment.

Sounds good to me.

Whilst we are here, are any of the other uses of bitcast in arm_mve.td potentially a problem? I took a quick look and, because they are converting both the inputs and the outputs, I believe they will be OK. (Two wrongs make a right, if you will.)

clang/test/CodeGen/arm-mve-intrinsics/admin.c
1–2	Is this updated with update_cc_test_checks? It may make the output more verbose, but it will be more standard.
90	I don't see why these would be a problem. Can you elaborate?

MarkMurrayARM added inline comments. Apr 30 2021, 2:04 AM

clang/test/CodeGen/arm-mve-intrinsics/admin.c
90	I'm wondering if they need to be swapped in the BE case.

Harbormaster completed remote builds in B101855: Diff 341802. Apr 30 2021, 2:09 AM

tmatheson added inline comments. Apr 30 2021, 2:35 AM

clang/test/CodeGen/arm-mve-intrinsics/admin.c
90	vcreateq is not endianness-aware: it just inserts the two given 64-bit values `a` and `b` into the low and high lanes respectively. The in-memory byte representation of each 64-bit int will differ, but that is not visible here, so the IR is the same for big and little endian. I have also confirmed this locally with runtime output:

uint64x2_t w = vcreateq_u64(0x0000000000000001, 0x0000000000000002);
printf("%d:%llu\n", 0, vgetq_lane_u64(w, 0));
printf("%d:%llu\n", 1, vgetq_lane_u64(w, 1));

which gives, for both little and big endian (with this patch):

0:1
1:2
158	See above

Use update_cc_test_checks

dmgreen added inline comments. Apr 30 2021, 2:46 AM

clang/test/CodeGen/arm-mve-intrinsics/admin.c
86	You have to remove the old checks; the script isn't very good at that. Even better would be to use --check-prefixes=CHECK,CHECK-LE, so that the script can share the snippets that don't change between LE and BE.

remove old check lines that were not automatically removed

Use --check-prefixes=CHECK,CHECK-BE etc to combine common blocks.
Sorry for the churn.

Thanks for the updates. LGTM.

This revision is now accepted and ready to land. Apr 30 2021, 2:58 AM

In D101606#2728320, @dmgreen wrote:

Sounds good to me.

Whilst we are here, are any of the other uses of bitcast in arm_mve.td potentially a problem? I took a quick look and, because they are converting both the inputs and the outputs, I believe they will be OK. (Two wrongs make a right, if you will.)

I had a look and came to the same conclusion; I couldn't find any way to make them break. It's worth noting that they all convert between vectors with the same number of lanes, typically between the signed and unsigned versions of NxM vectors.
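The following sketch illustrates why same-lane-count bitcasts are harmless. It uses the same store-then-load reading of bitcast from the LangRef. The helper names are hypothetical, and the 128-bit vector is modelled on the host as an array of lanes. When source and destination lanes have the same width, every lane is written and read back with the same width and byte order, so no lane moves on either endianness. The sketch also shows the "two wrongs make a right" case: a bitcast to a different lane width followed by the inverse bitcast reproduces the original bytes, and therefore the original lanes. This holds even though the intermediate lanes are permuted on big endian.

```c
#include <assert.h>
#include <stdint.h>

/* Byte order of the simulated target (hypothetical names). */
typedef enum { TARGET_LE, TARGET_BE } target_endian;

/* Write the low n bytes of v to memory in the target's byte order. */
static void store_n(uint8_t *dst, uint64_t v, int n, target_endian e) {
    for (int i = 0; i < n; i++) {
        int shift = (e == TARGET_LE) ? 8 * i : 8 * (n - 1 - i);
        dst[i] = (uint8_t)(v >> shift);
    }
}

/* Read an n-byte value from memory in the target's byte order. */
static uint64_t load_n(const uint8_t *src, int n, target_endian e) {
    uint64_t v = 0;
    for (int i = 0; i < n; i++) {
        int shift = (e == TARGET_LE) ? 8 * i : 8 * (n - 1 - i);
        v |= (uint64_t)src[i] << shift;
    }
    return v;
}

/* Bitcast of a 128-bit vector, modelled as store-then-load: `in` holds
   16/in_bytes lanes of in_bytes each; `out` receives 16/out_bytes lanes. */
static void bitcast128(const uint64_t *in, int in_bytes,
                       uint64_t *out, int out_bytes, target_endian e) {
    uint8_t mem[16];
    for (int i = 0; i < 16 / in_bytes; i++)
        store_n(mem + i * in_bytes, in[i], in_bytes, e);
    for (int i = 0; i < 16 / out_bytes; i++)
        out[i] = load_n(mem + i * out_bytes, out_bytes, e);
}
```

For example, take `{0x0003000200010000, 0x0007000600050004}` and bitcast it from 8-byte to 2-byte lanes with `TARGET_BE`. The intermediate lane 0 is 3, but bitcasting back to 8-byte lanes restores both original values. A 2-byte to 2-byte bitcast leaves every lane in place.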

Harbormaster completed remote builds in B101869: Diff 341819. Apr 30 2021, 3:29 AM

Harbormaster completed remote builds in B101868: Diff 341818. Apr 30 2021, 3:39 AM

Harbormaster completed remote builds in B101873: Diff 341823. Apr 30 2021, 3:46 AM

Closed by commit rGb14a6f06cc87: [ARM][MVE] vcreateq lane ordering for big endian (authored by tmatheson). Apr 30 2021, 5:49 AM

This revision was automatically updated to reflect the committed changes.

tmatheson added a commit: rGb14a6f06cc87: [ARM][MVE] vcreateq lane ordering for big endian.

Diff 341864

clang/include/clang/Basic/arm_mve.td

…
 in {
 def "vreinterpretq_" # desttype: Intrinsic<
 VecOf<desttype>, (args Vector:$x), (vreinterpret $x, VecOf<desttype>)>;
 }
 }

 let params = T.All in {
 let pnt = PNT_None in {
 def vcreateq: Intrinsic<Vector, (args u64:$a, u64:$b),
-(bitcast (ielt_const (ielt_const (undef VecOf<u64>), $a, 0),
+(vreinterpret (ielt_const (ielt_const (undef VecOf<u64>), $a, 0),
 $b, 1), Vector)>;
 def vuninitializedq: Intrinsic<Vector, (args), (undef Vector)>;
 }

 // This is the polymorphic form of vuninitializedq, which takes no type
 // suffix, but takes an _unevaluated_ vector parameter and returns an
 // uninitialized vector of the same vector type.
 //
…

clang/test/CodeGen/arm-mve-intrinsics/admin.c

 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
-// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -S -emit-llvm -o - %s | opt -S -mem2reg -sroa -early-cse | FileCheck %s
+// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -S -emit-llvm -o - %s | opt -S -mem2reg -sroa -early-cse | FileCheck %s --check-prefixes=CHECK,CHECK-LE
-// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -DPOLYMORPHIC -S -emit-llvm -o - %s | opt -S -mem2reg -sroa -early-cse | FileCheck %s
+// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -DPOLYMORPHIC -S -emit-llvm -o - %s | opt -S -mem2reg -sroa -early-cse | FileCheck %s --check-prefixes=CHECK,CHECK-LE
+// RUN: %clang_cc1 -triple thumbebv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -S -emit-llvm -o - %s | opt -S -mem2reg -sroa -early-cse | FileCheck %s --check-prefixes=CHECK,CHECK-BE
+// RUN: %clang_cc1 -triple thumbebv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -DPOLYMORPHIC -S -emit-llvm -o - %s | opt -S -mem2reg -sroa -early-cse | FileCheck %s --check-prefixes=CHECK,CHECK-BE


 #include <arm_mve.h>

-// CHECK-LABEL: @test_vcreateq_f16(
-// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
-// CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x half>
-// CHECK-NEXT: ret <8 x half> [[TMP2]]
+// CHECK-LE-LABEL: @test_vcreateq_f16(
+// CHECK-LE-NEXT: entry:
+// CHECK-LE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-LE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-LE-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x half>
+// CHECK-LE-NEXT: ret <8 x half> [[TMP2]]
+//
+// CHECK-BE-LABEL: @test_vcreateq_f16(
+// CHECK-BE-NEXT: entry:
+// CHECK-BE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-BE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-BE-NEXT: [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.vreinterpretq.v8f16.v2i64(<2 x i64> [[TMP1]])
+// CHECK-BE-NEXT: ret <8 x half> [[TMP2]]
 //
 float16x8_t test_vcreateq_f16(uint64_t a, uint64_t b)
 {
 return vcreateq_f16(a, b);
 }

-// CHECK-LABEL: @test_vcreateq_f32(
-// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
-// CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <4 x float>
-// CHECK-NEXT: ret <4 x float> [[TMP2]]
+// CHECK-LE-LABEL: @test_vcreateq_f32(
+// CHECK-LE-NEXT: entry:
+// CHECK-LE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-LE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-LE-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <4 x float>
+// CHECK-LE-NEXT: ret <4 x float> [[TMP2]]
+//
+// CHECK-BE-LABEL: @test_vcreateq_f32(
+// CHECK-BE-NEXT: entry:
+// CHECK-BE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-BE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-BE-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.vreinterpretq.v4f32.v2i64(<2 x i64> [[TMP1]])
+// CHECK-BE-NEXT: ret <4 x float> [[TMP2]]
 //
 float32x4_t test_vcreateq_f32(uint64_t a, uint64_t b)
 {
 return vcreateq_f32(a, b);
 }

-// CHECK-LABEL: @test_vcreateq_s16(
-// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
-// CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x i16>
-// CHECK-NEXT: ret <8 x i16> [[TMP2]]
+// CHECK-LE-LABEL: @test_vcreateq_s16(
+// CHECK-LE-NEXT: entry:
+// CHECK-LE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-LE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-LE-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x i16>
+// CHECK-LE-NEXT: ret <8 x i16> [[TMP2]]
+//
+// CHECK-BE-LABEL: @test_vcreateq_s16(
+// CHECK-BE-NEXT: entry:
+// CHECK-BE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-BE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-BE-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vreinterpretq.v8i16.v2i64(<2 x i64> [[TMP1]])
+// CHECK-BE-NEXT: ret <8 x i16> [[TMP2]]
 //
 int16x8_t test_vcreateq_s16(uint64_t a, uint64_t b)
 {
 return vcreateq_s16(a, b);
 }

-// CHECK-LABEL: @test_vcreateq_s32(
-// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
-// CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <4 x i32>
-// CHECK-NEXT: ret <4 x i32> [[TMP2]]
+// CHECK-LE-LABEL: @test_vcreateq_s32(
+// CHECK-LE-NEXT: entry:
+// CHECK-LE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-LE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-LE-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <4 x i32>
+// CHECK-LE-NEXT: ret <4 x i32> [[TMP2]]
+//
+// CHECK-BE-LABEL: @test_vcreateq_s32(
+// CHECK-BE-NEXT: entry:
+// CHECK-BE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-BE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-BE-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vreinterpretq.v4i32.v2i64(<2 x i64> [[TMP1]])
+// CHECK-BE-NEXT: ret <4 x i32> [[TMP2]]
 //
 int32x4_t test_vcreateq_s32(uint64_t a, uint64_t b)
 {
 return vcreateq_s32(a, b);
 }

 // CHECK-LABEL: @test_vcreateq_s64(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
 // CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
 // CHECK-NEXT: ret <2 x i64> [[TMP1]]
 //
 int64x2_t test_vcreateq_s64(uint64_t a, uint64_t b)
 {
 return vcreateq_s64(a, b);
 }

-// CHECK-LABEL: @test_vcreateq_s8(
-// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
-// CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <16 x i8>
-// CHECK-NEXT: ret <16 x i8> [[TMP2]]
+// CHECK-LE-LABEL: @test_vcreateq_s8(
+// CHECK-LE-NEXT: entry:
+// CHECK-LE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-LE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-LE-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <16 x i8>
+// CHECK-LE-NEXT: ret <16 x i8> [[TMP2]]
+//
+// CHECK-BE-LABEL: @test_vcreateq_s8(
+// CHECK-BE-NEXT: entry:
+// CHECK-BE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-BE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-BE-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.vreinterpretq.v16i8.v2i64(<2 x i64> [[TMP1]])
+// CHECK-BE-NEXT: ret <16 x i8> [[TMP2]]
 //
 int8x16_t test_vcreateq_s8(uint64_t a, uint64_t b)
 {
 return vcreateq_s8(a, b);
 }

-// CHECK-LABEL: @test_vcreateq_u16(
-// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
-// CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x i16>
-// CHECK-NEXT: ret <8 x i16> [[TMP2]]
+// CHECK-LE-LABEL: @test_vcreateq_u16(
+// CHECK-LE-NEXT: entry:
+// CHECK-LE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-LE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-LE-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x i16>
+// CHECK-LE-NEXT: ret <8 x i16> [[TMP2]]
+//
+// CHECK-BE-LABEL: @test_vcreateq_u16(
+// CHECK-BE-NEXT: entry:
+// CHECK-BE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-BE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-BE-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vreinterpretq.v8i16.v2i64(<2 x i64> [[TMP1]])
+// CHECK-BE-NEXT: ret <8 x i16> [[TMP2]]
 //
 uint16x8_t test_vcreateq_u16(uint64_t a, uint64_t b)
 {
 return vcreateq_u16(a, b);
 }

-// CHECK-LABEL: @test_vcreateq_u32(
-// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
-// CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <4 x i32>
-// CHECK-NEXT: ret <4 x i32> [[TMP2]]
+// CHECK-LE-LABEL: @test_vcreateq_u32(
+// CHECK-LE-NEXT: entry:
+// CHECK-LE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-LE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-LE-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <4 x i32>
+// CHECK-LE-NEXT: ret <4 x i32> [[TMP2]]
+//
+// CHECK-BE-LABEL: @test_vcreateq_u32(
+// CHECK-BE-NEXT: entry:
+// CHECK-BE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-BE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-BE-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vreinterpretq.v4i32.v2i64(<2 x i64> [[TMP1]])
+// CHECK-BE-NEXT: ret <4 x i32> [[TMP2]]
 //
 uint32x4_t test_vcreateq_u32(uint64_t a, uint64_t b)
 {
 return vcreateq_u32(a, b);
 }

 // CHECK-LABEL: @test_vcreateq_u64(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
 // CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
 // CHECK-NEXT: ret <2 x i64> [[TMP1]]
 //
 uint64x2_t test_vcreateq_u64(uint64_t a, uint64_t b)
 {
 return vcreateq_u64(a, b);
 }

-// CHECK-LABEL: @test_vcreateq_u8(
-// CHECK-NEXT: entry:
-// CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
-// CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
-// CHECK-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <16 x i8>
-// CHECK-NEXT: ret <16 x i8> [[TMP2]]
+// CHECK-LE-LABEL: @test_vcreateq_u8(
+// CHECK-LE-NEXT: entry:
+// CHECK-LE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-LE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-LE-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <16 x i8>
+// CHECK-LE-NEXT: ret <16 x i8> [[TMP2]]
+//
+// CHECK-BE-LABEL: @test_vcreateq_u8(
+// CHECK-BE-NEXT: entry:
+// CHECK-BE-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A:%.*]], i64 0
+// CHECK-BE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B:%.*]], i64 1
+// CHECK-BE-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.vreinterpretq.v16i8.v2i64(<2 x i64> [[TMP1]])
+// CHECK-BE-NEXT: ret <16 x i8> [[TMP2]]
 //
 uint8x16_t test_vcreateq_u8(uint64_t a, uint64_t b)
 {
 return vcreateq_u8(a, b);
 }

 // CHECK-LABEL: @test_vuninitializedq_polymorphic_f16(
 // CHECK-NEXT: entry:
…

This is an archive of the discontinued LLVM Phabricator instance.

[ARM][MVE] vcreateq lane ordering for big endian
Closed, Public
