Use of bitcast resulted in lanes being swapped for vcreateq with big
endian. Fix this by using vreinterpret. No code change for little
endian. Adds IR lit test.
For example, the following code will print different results for big and little endian:
#include <arm_mve.h>

extern "C" {
void printf(const char *, ...);
}

int main() {
  int16x8_t x = vcreateq_s16(0x0003000200010000, 0x0007000600050004);
  printf("8x16 lanes:\n");
  printf("%d:%d\n", 0, vgetq_lane_s16(x, 0));
  printf("%d:%d\n", 1, vgetq_lane_s16(x, 1));
  printf("%d:%d\n", 2, vgetq_lane_s16(x, 2));
  printf("%d:%d\n", 3, vgetq_lane_s16(x, 3));
  printf("%d:%d\n", 4, vgetq_lane_s16(x, 4));
  printf("%d:%d\n", 5, vgetq_lane_s16(x, 5));
  printf("%d:%d\n", 6, vgetq_lane_s16(x, 6));
  printf("%d:%d\n", 7, vgetq_lane_s16(x, 7));

  int8x16_t y = vcreateq_s8(0x0706050403020100, 0x0f0e0d0c0b0a0908);
  printf("16x8 lanes:\n");
  printf("%d:%d\n", 0, vgetq_lane_s8(y, 0));
  printf("%d:%d\n", 1, vgetq_lane_s8(y, 1));
  printf("%d:%d\n", 2, vgetq_lane_s8(y, 2));
  printf("%d:%d\n", 3, vgetq_lane_s8(y, 3));
  printf("%d:%d\n", 4, vgetq_lane_s8(y, 4));
  printf("%d:%d\n", 5, vgetq_lane_s8(y, 5));
  printf("%d:%d\n", 6, vgetq_lane_s8(y, 6));
  printf("%d:%d\n", 7, vgetq_lane_s8(y, 7));
  printf("%d:%d\n", 8, vgetq_lane_s8(y, 8));
  printf("%d:%d\n", 9, vgetq_lane_s8(y, 9));
  printf("%d:%d\n", 10, vgetq_lane_s8(y, 10));
  printf("%d:%d\n", 11, vgetq_lane_s8(y, 11));
  printf("%d:%d\n", 12, vgetq_lane_s8(y, 12));
  printf("%d:%d\n", 13, vgetq_lane_s8(y, 13));
  printf("%d:%d\n", 14, vgetq_lane_s8(y, 14));
  printf("%d:%d\n", 15, vgetq_lane_s8(y, 15));
  return 0;
}
For little endian (correct) (clang++ -target arm-none-none-eabi -march=armv8.1-m.main+mve.fp -O0):
8x16 lanes:
0:0
1:1
2:2
3:3
4:4
5:5
6:6
7:7
16x8 lanes:
0:0
1:1
2:2
3:3
4:4
5:5
6:6
7:7
8:8
9:9
10:10
11:11
12:12
13:13
14:14
15:15
For big endian (incorrect) (clang++ -target arm-none-none-eabi -march=armv8.1-m.main+mve.fp -O0 -mbig-endian):
8x16 lanes:
0:3
1:2
2:1
3:0
4:7
5:6
6:5
7:4
16x8 lanes:
0:7
1:6
2:5
3:4
4:3
5:2
6:1
7:0
8:15
9:14
10:13
11:12
12:11
13:10
14:9
15:8
This patch brings the big endian output in line with the little endian output.
Bitcast documentation: https://llvm.org/docs/LangRef.html#bitcast-to-instruction
Helium intrinsics documentation: https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
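To illustrate why a plain bitcast swaps lanes, here is a small host-side sketch. It is not the patch itself. It assumes the LangRef description of bitcast as equivalent to storing the value and loading it back as the new type, and it ignores MVE register layout details. The helper name lanes16_via_memory is hypothetical and is not part of LLVM or arm_mve.h.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (illustration only): models a bitcast from
// <2 x i64> to <8 x i16> as "store the source vector to memory, then
// load the same bytes back as the destination type".
std::array<int16_t, 8> lanes16_via_memory(uint64_t a, uint64_t b,
                                          bool big_endian) {
  const uint64_t elems[2] = {a, b};
  uint8_t mem[16];

  // Store: element 0 occupies the lower addresses. The bytes inside
  // each 64-bit element follow the target's byte order.
  for (std::size_t e = 0; e < 2; ++e) {
    for (std::size_t i = 0; i < 8; ++i) {
      std::size_t shift = big_endian ? 8 * (7 - i) : 8 * i;
      mem[8 * e + i] = static_cast<uint8_t>(elems[e] >> shift);
    }
  }

  // Load: read eight 16-bit lanes from the same bytes, again using
  // the target's byte order within each lane.
  std::array<int16_t, 8> lanes{};
  for (std::size_t l = 0; l < 8; ++l) {
    unsigned first = mem[2 * l];
    unsigned second = mem[2 * l + 1];
    unsigned v = big_endian ? (first << 8 | second) : (second << 8 | first);
    lanes[l] = static_cast<int16_t>(v);
  }
  return lanes;
}
```

Calling the helper with the two vcreateq_s16 arguments from the example above gives lanes 0..7 in order when big_endian is false, and 3, 2, 1, 0, 7, 6, 5, 4 when it is true. This matches the incorrect big endian output shown earlier: the most significant 16 bits of each 64-bit input land in the lowest-numbered lane.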
Was this test updated with update_cc_test_checks.py?
It may make the output more verbose, but it will be more standard.