Add support for vcadd_* family of intrinsics. This set of intrinsics is
available in Armv8.3-A.
The fp16 versions require the FP16 extension, which has been available
(opt-in) since Armv8.2-A.
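
For readers unfamiliar with the operation, here is a minimal scalar sketch of what the rot90 and rot270 complex additions compute. It is not code from this patch. It assumes the usual layout of interleaved complex numbers (even lanes real, odd lanes imaginary), and the helper names `cadd_rot90_ref` and `cadd_rot270_ref` are hypothetical. With suitable target features, intrinsics such as `vcadd_rot90_f32` would be expected to produce the same lane values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference model, not part of the patch.
 * a, b and out hold interleaved complex numbers:
 * even lanes are real parts, odd lanes are imaginary parts.
 * len is the number of float lanes and must be even. */

/* Per complex pair: out = a + i*b (b rotated by 90 degrees). */
static void cadd_rot90_ref(const float *a, const float *b,
                           float *out, size_t len) {
    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2) {
        out[i]     = a[i]     - b[i + 1]; /* re = a.re - b.im */
        out[i + 1] = a[i + 1] + b[i];     /* im = a.im + b.re */
    }
}

/* Per complex pair: out = a - i*b (b rotated by 270 degrees). */
static void cadd_rot270_ref(const float *a, const float *b,
                            float *out, size_t len) {
    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2) {
        out[i]     = a[i]     + b[i + 1]; /* re = a.re + b.im */
        out[i + 1] = a[i + 1] - b[i];     /* im = a.im - b.re */
    }
}
```

Rotations by 0 and 180 degrees would reduce to an ordinary vector add and subtract, which is why only the 90 and 270 degree forms need dedicated addition intrinsics.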
Differential D70862
[ARM][AArch64] Complex addition Neon intrinsics for Armv8.3-A
Closed, Public. Authored by vhscampos on Nov 29 2019, 9:10 AM.
Event Timeline

Why are you only implementing rot90 and rot270 intrinsics? My quick calculations made rot0 and rot90 the natural ones to implement a bog-standard complex multiplication, but even if I slipped up there I'd expect the others to be useful in some situations.

Sorry, ignore that. I didn't notice you were doing the addition instructions rather than multiplication. LGTM!

This revision is now accepted and ready to land. (Dec 2 2019, 1:36 AM)

Closed by commit rGdcf11c5e86ce: [ARM][AArch64] Complex addition Neon intrinsics for Armv8.3-A (authored by vhscampos). (Dec 2 2019, 6:40 AM)

This revision was automatically updated to reflect the committed changes.

Revision Contents
Diff 231560
clang/include/clang/Basic/arm_neon.td
clang/lib/Basic/Targets/AArch64.cpp
clang/lib/Basic/Targets/ARM.h
clang/lib/Basic/Targets/ARM.cpp
clang/lib/CodeGen/CGBuiltin.cpp
clang/test/CodeGen/aarch64-neon-vcadd.c
clang/test/CodeGen/arm-neon-vcadd.c
llvm/include/llvm/IR/IntrinsicsAArch64.td
llvm/include/llvm/IR/IntrinsicsARM.td
llvm/lib/Target/AArch64/AArch64InstrInfo.td
llvm/lib/Target/ARM/ARMInstrNEON.td
llvm/test/CodeGen/AArch64/neon-vcadd.ll
llvm/test/CodeGen/ARM/neon-vcadd.ll
I take it you can't fuse this with vcadd_rot90 because NeonEmitter tries to call it vcadd_rot90q? If so, I think your solution is reasonable; the rotations are a tiny edge-case in the ISA.