This is an archive of the discontinued LLVM Phabricator instance.

[ARM][AArch64] Complex addition Neon intrinsics for Armv8.3-A
Closed, Public

Authored by vhscampos on Nov 29 2019, 9:10 AM.

Details

Summary

Add support for the vcadd_* family of intrinsics, which perform complex
addition with the second operand rotated by 90 or 270 degrees. These
intrinsics are available in Armv8.3-A.

The fp16 versions require the FP16 extension, which has been available
(opt-in) since Armv8.2-A.
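To make the semantics concrete, here is a minimal scalar reference model in C of what the rot90 and rot270 intrinsics compute. It assumes the ACLE layout of interleaved complex numbers (even lanes hold real parts, odd lanes hold imaginary parts). The function names cadd_rot90_ref and cadd_rot270_ref are hypothetical; the real intrinsics (for example vcadd_rot90_f32) operate on Neon vector types rather than plain arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of vcadd_rot90: out = a + i*b per complex pair.
 * Lanes are interleaved (re, im); n is the total number of float lanes. */
void cadd_rot90_ref(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 2 == 0); /* lanes must come in (re, im) pairs */
    for (size_t k = 0; k < n; k += 2) {
        /* i*(br + i*bi) = -bi + i*br; temporaries make out == a or b safe */
        float re = a[k] - b[k + 1];
        float im = a[k + 1] + b[k];
        out[k] = re;
        out[k + 1] = im;
    }
}

/* Hypothetical scalar model of vcadd_rot270: out = a - i*b per complex pair. */
void cadd_rot270_ref(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 2 == 0);
    for (size_t k = 0; k < n; k += 2) {
        /* -i*(br + i*bi) = bi - i*br */
        float re = a[k] + b[k + 1];
        float im = a[k + 1] - b[k];
        out[k] = re;
        out[k + 1] = im;
    }
}
```

Rotating b by 90 degrees is the same as multiplying it by i, so with a = {1, 2} and b = {10, 20} the rot90 result is {1 - 20, 2 + 10} = {-19, 12}, and the rot270 result is {21, -8}.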

Event Timeline

vhscampos created this revision. Nov 29 2019, 9:10 AM
Herald added projects: Restricted Project, Restricted Project. Nov 29 2019, 9:10 AM

Why are you only implementing rot90 and rot270 intrinsics? My quick calculations made rot0 and rot90 the natural ones to implement a bog-standard complex multiplication, but even if I slipped up there I'd expect the others to be useful in some situations.

clang/include/clang/Basic/arm_neon.td:1687

I take it you can't fuse this with vcadd_rot90 because NeonEmitter tries to call it vcadd_rot90q? If so, I think your solution is reasonable, the rotations are a tiny edge-case in the ISA.

t.p.northover accepted this revision. Dec 2 2019, 1:36 AM

Sorry, ignore that. I didn't notice you were doing the addition instructions rather than multiplication. LGTM!
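A hedged scalar sketch of why addition only needs the rot90 and rot270 forms: rotating b by 0 or 180 degrees before adding degenerates to an ordinary element-wise add or subtract, which existing intrinsics such as vaddq_f32 and vsubq_f32 already cover. The function cadd_rot_ref is hypothetical and models lanes as interleaved (re, im) pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: out = a + (b rotated by rot_deg), where rotating by
 * rot_deg means multiplying b by i^(rot_deg / 90). Lanes are (re, im). */
void cadd_rot_ref(const float *a, const float *b, float *out, size_t n, int rot_deg)
{
    assert(n % 2 == 0);
    for (size_t k = 0; k < n; k += 2) {
        float br = b[k], bi = b[k + 1];
        float rr, ri; /* real and imaginary parts of the rotated b */
        switch (rot_deg) {
        case 0:   rr = br;  ri = bi;  break; /* same as a plain vector add */
        case 90:  rr = -bi; ri = br;  break; /* vcadd_rot90 case */
        case 180: rr = -br; ri = -bi; break; /* same as a plain vector subtract */
        case 270: rr = bi;  ri = -br; break; /* vcadd_rot270 case */
        default:
            assert(0 && "rotation must be 0, 90, 180 or 270");
            return;
        }
        out[k] = a[k] + rr;
        out[k + 1] = a[k + 1] + ri;
    }
}
```

Only the 90 and 270 cases mix the real and imaginary lanes of b, which is why they warrant dedicated intrinsics while 0 and 180 do not.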

This revision is now accepted and ready to land. Dec 2 2019, 1:36 AM
This revision was automatically updated to reflect the committed changes.