This is a continuation of https://reviews.llvm.org/D32511. It mainly has:
- use fullfp16 flag instead of fp16
- Update a failing test case that runs with long-tests configuration.
Paths
| Differential D34161
[AArch64] Add ARMv8.2-A FP16 vector intrinsics - Continuation ClosedPublic Authored by az on Jun 13 2017, 12:57 PM.
Details
Summary This is a continuation of https://reviews.llvm.org/D32511. It mainly has:
Diff Detail Event TimelineHerald added subscribers: kristof.beyls, javed.absar, rengolin, aemerson. · View Herald TranscriptJun 13 2017, 12:57 PM This revision is now accepted and ready to land.Jun 13 2017, 3:05 PM Comment Actions Not sure if I am missing something, but the new regression tests arm-v8.2a-neon-intrinsics.c and aarch64-neon-intrinsics.c are failing for me (after first applying patch D32511 and then this one here in D34161). For example, the very first test in arm-v8.2a-neon-intrinsics.c checks that the function argument is <4 x half> %a, but <4 x half> %1 is generated. With anything other than -O0 you will get the %a (and then it becomes a tailcall). Comment Actions
I renamed arm-v8.2a-neon-intrinsics.c to aarch64-v8.2a-neon-intrinsics.c and modified the +fp16 flag. Could it be that you are still running the old file from the old patch with the old flag? If that is not the case, then I will add this: For the newly added file aarch64-v8.2a-neon-intrinsics.c, It passed for me in this patch and in the original one (D32511). However, it failed for other people (in the same way as you are describing: %1 vs. %a) in the original patch and the changes were reverted. That lead me to execute this test individually in D32511. It gave me a warning about the +fp16 flag and I replaced it with the correct +fullfp16 flag after consulting with you about it. I was hoping that fixing the flag and the warning will make the test pass for all. Note that the conversion of <4 x half> %1, to ##<4 x half> %a# is done by mem2reg pass (opt -S mem2reg). I am not sure why this optimization does not work for you on this test. I can either 1) replace the occurrence of variables such as %a by * in the test because we are not really testing the optimization after all or 2) send me your cmake command so that I can build and test in the same way to be able to reproduce and investigate the problem on my side. For the old test arm-neon-intrinsics.c, I did not run it in D32511 because it has REQUIRES: long-tests and our testing configuration did not run those. Once I run it, I can reproduce the problem and I fixed it in this patch. Does it still fail for you? Comment Actions Just to avoid any confusion/mistakes, can you upload the final patch that you intend to commit? [[ABS:%.*]] = call <4 x half> @llvm.fabs.v4f16(<4 x half> %a) to match this: %vabs1.i = tail call <4 x half> @llvm.fabs.v4f16(<4 x half> %a) where the only difference is "tail". So I agree that is a straightforward bugfix and also the correct thing to do (i.e. using option +fullfp16). Comment Actions Ok, I will upload the combination of the two patches here. I am kind of undecided between your suggestion of using -O2 or my previous solution of removing 'opt -S -mem2reg'. the former makes the test more readable but unstable given that future changes to optimizations at -O2 level may result in updating the tests (but, I do no expect much changes given that they are small tests). Comment Actions I am resubmitting the combination of D32511 and D34161 in here to avoid any confusion. Also, I experimented with our build system and figure out a way to reproduce the problem we were discussing and it looks like adding the flag -disable-O0-optnone fixes the issue as this will not disable "opt -memreg" optimization after running clang. Comment Actions Hi Abderrazek,
Closed by commit rL305820: [AArch64] ADD ARMv.2-A FP16 vector intrinsics (authored by az). · Explain WhyJun 20 2017, 11:55 AM This revision was automatically updated to reflect the committed changes. az marked an inline comment as done. Comment Actions Unfortunately this is causing problems in testing: SplitVectorOperand Op #3: t8: ch = llvm.arm.neon.vst4lane<ST32[%0](align=2)> t3, TargetConstant:i32<699>, FrameIndex:i32<0>, undef:v4f16, undef:v4f16, undef:v4f16, undef:v4f16, Constant:i32<1>, Constant:i32<2> fatal error: error in backend: Do not know how to split this operator's operand! This assert is triggered in lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:1457 Reproducer: typedef __attribute__((neon_vector_type(4))) short uint16x4_t; void dotests_1082() { __fp16 result[1]; uint16x4_t __s1_0_3; uint16x4_t __s1_0_2; uint16x4_t __s1_0_1; uint16x4_t __s1_0_0; __builtin_neon_vst4_lane_v(result, __s1_0_0, __s1_0_1, __s1_0_2, __s1_0_3, 1, 8); } Compile with: clang -O2 -target armv8-linux-gnueabihf -mcpu=cortex-a57 -D__ARM_NEON_FP16_INTRINSICS -S t.c Befor this patch the IR looked like this: call void @llvm.arm.neon.vst4lane.p0i8.v4i16(i8* nonnull %0, <4 x i16> undef, <4 x i16> undef, <4 x i16> undef, <4 x i16> undef, i32 1, i32 2) And now after: call void @llvm.arm.neon.vst4lane.p0i8.v4f16(i8* nonnull %0, <4 x half> undef, <4 x half> undef, <4 x half> undef, <4 x half> undef, i32 1, i32 2) Note that the i16 types have changed into half and the intrinsic from i16 to f16.
Revision Contents
Diff 103097 clang/include/clang/Basic/arm_neon.td
clang/lib/Basic/Targets.cpp
clang/lib/CodeGen/CGBuiltin.cpp
clang/lib/CodeGen/CodeGenModule.cpp
clang/lib/CodeGen/CodeGenTypeCache.h
clang/test/CodeGen/aarch64-neon-intrinsics.c
clang/test/CodeGen/aarch64-neon-ldst-one.c
clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics.c
clang/test/CodeGen/arm_neon_intrinsics.c
clang/utils/TableGen/NeonEmitter.cpp
|
Renaming this to HasFullFP16 is more consistent.