This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE2] Asm: implement CDOT instruction
ClosedPublic

Authored by c-rhodes on May 14 2019, 8:29 AM.

Details

Summary

The complex DOT instructions perform a dot-product on quad-tuplets from
two source vectors, and the resulting wide real or wide imaginary value
is accumulated into the destination register. The instructions come in
two forms:

Vector form, e.g.

cdot z0.s, z1.b, z2.b, #90    - complex dot product on four 8-bit quad-tuplets,
                                accumulating results in 32-bit elements. The
                                complex numbers in the second source vector are
                                rotated by 90 degrees.

cdot z0.d, z1.h, z2.h, #180   - complex dot product on four 16-bit quad-tuplets,
                                accumulating results in 64-bit elements.
                                The complex numbers in the second source
                                vector are rotated by 180 degrees.
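
For reference, the vector form can be modelled in scalar code. The sketch
below is not part of this patch, which only adds assembler support. The
names cdot_element and cdot are hypothetical, and the per-rotation operand
selection and signs are an assumption based on one reading of the Arm
pseudocode; check them against the specification. Wraparound of the wide
accumulator is not modelled.

```python
# Scalar reference sketch of the CDOT vector form (assumed semantics).
# A quad-tuplet holds two complex numbers laid out as (re0, im0, re1, im1).

# rotation -> (index in b paired with a's real part,
#              index in b paired with a's imaginary part,
#              sign applied to the second product)
ROTATIONS = {
    0: (0, 1, -1),    # real part of a * b
    90: (1, 0, +1),   # imaginary part of a * b
    180: (0, 1, +1),  # real part of a * conj(b)
    270: (1, 0, -1),  # imaginary part of a * conj(b)
}


def cdot_element(acc, a, b, rot):
    """Accumulate one wide element from one quad-tuplet of each source."""
    sel_a, sel_b, sign = ROTATIONS[rot]
    for pair in range(2):
        re, im = a[2 * pair], a[2 * pair + 1]
        acc += re * b[2 * pair + sel_a] + sign * im * b[2 * pair + sel_b]
    return acc


def cdot(acc, zn, zm, rot):
    """Apply cdot_element to every wide element of the accumulator."""
    return [
        cdot_element(acc[e], zn[4 * e:4 * e + 4], zm[4 * e:4 * e + 4], rot)
        for e in range(len(acc))
    ]
```

With zn = [1, 2, 3, 4] and zm = [5, 6, 7, 8], the complex sum
(1+2i)(5+6i) + (3+4i)(7+8i) is -18 + 68i, so under these assumptions #0
accumulates -18 and #90 accumulates 68.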

Indexed form, e.g.

cdot z0.s, z1.b, z2.b[3], #0  - complex dot product on four 8-bit quad-tuplets,
                                with specified quad-tuplet from second source vector,
                                accumulating results in 32-bit elements.

cdot z0.d, z1.h, z2.h[1], #0  - complex dot product on four 16-bit quad-tuplets,
                                with specified quad-tuplet from second source vector,
                                accumulating results in 64-bit elements.
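
The indexed form can be sketched the same way. This model is not part of
the patch; the names cdot_element and cdot_indexed are hypothetical. It
assumes the index selects one quad-tuplet within each 128-bit segment of
the second source vector (four per segment for .s, two for .d) and that
this quad-tuplet is reused for every element of that segment. The rotation
table is likewise an assumption to be checked against the specification,
and accumulator wraparound is not modelled.

```python
# Scalar reference sketch of the CDOT indexed form (assumed semantics).
# A quad-tuplet holds two complex numbers laid out as (re0, im0, re1, im1).

# rotation -> (index in b paired with a's real part,
#              index in b paired with a's imaginary part,
#              sign applied to the second product)
ROTATIONS = {0: (0, 1, -1), 90: (1, 0, +1), 180: (0, 1, +1), 270: (1, 0, -1)}


def cdot_element(acc, a, b, rot):
    """Accumulate one wide element from one quad-tuplet of each source."""
    sel_a, sel_b, sign = ROTATIONS[rot]
    for pair in range(2):
        re, im = a[2 * pair], a[2 * pair + 1]
        acc += re * b[2 * pair + sel_a] + sign * im * b[2 * pair + sel_b]
    return acc


def cdot_indexed(acc, zn, zm, index, rot, elems_per_segment=4):
    """elems_per_segment is 4 for .s accumulators and 2 for .d accumulators."""
    out = []
    for e, value in enumerate(acc):
        segment = e // elems_per_segment
        # Quad-tuplet of zm selected by the index within this segment.
        q = segment * elems_per_segment + index
        out.append(
            cdot_element(value, zn[4 * e:4 * e + 4], zm[4 * q:4 * q + 4], rot)
        )
    return out
```

For example, cdot_indexed([0] * 4, [1, 2, 3, 4] * 4, list(range(16)), 3, 0)
pairs every quad-tuplet of the first source with elements 12..15 of the
second source.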

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Diff Detail

Repository
rL LLVM

Event Timeline

c-rhodes created this revision. May 14 2019, 8:29 AM
rovka accepted this revision. May 15 2019, 12:50 AM

LGTM

This revision is now accepted and ready to land. May 15 2019, 12:50 AM

Ah, sorry, didn't notice the LGTM until I clicked the submit button.

rovka added a comment. May 15 2019, 1:42 AM

Ah, sorry, didn't notice the LGTM until I clicked the submit button.

No problem, two LGTMs are better than one ;)

This revision was automatically updated to reflect the committed changes.