This is an archive of the discontinued LLVM Phabricator instance.

ARM v8.1a adds Advanced SIMD instructions for Rounding Double Multiply Add/Subtract.
ClosedPublic

Authored by labrinea on Nov 25 2015, 4:28 AM.

Details

Summary

The following instructions are added to the AArch32 instruction set:

  • VQRDMLAH: Vector Saturating Rounding Doubling Multiply Accumulate Returning High Half
  • VQRDMLSH: Vector Saturating Rounding Doubling Multiply Subtract Returning High Half

The following instructions are added to the AArch64 instruction set:

  • SQRDMLAH: Signed Saturating Rounding Doubling Multiply Accumulate Returning High Half
  • SQRDMLSH: Signed Saturating Rounding Doubling Multiply Subtract Returning High Half

This patch adds intrinsic and ACLE macro support for these instructions, as well as corresponding tests.

Diff Detail

Repository
rL LLVM

Event Timeline

labrinea updated this revision to Diff 41128.Nov 25 2015, 4:28 AM
labrinea retitled this revision to ARM v8.1a adds Advanced SIMD instructions for Rounding Double Multiply Add/Subtract.
labrinea updated this object.
labrinea added reviewers: jmolloy, rengolin, cfe-commits.
labrinea updated this object.Nov 25 2015, 5:04 AM

Do these get the right diagnostics when used on CPUs without the new feature? I can't see how __ARM_FEATURE_QRDMX gets wired through to arm_neon.h.

labrinea updated this revision to Diff 41316.Nov 27 2015, 10:11 AM

@t.p.northover you were right: my patch was missing the predefined guard macros for the intrinsics. I've now updated the patch.

t.p.northover accepted this revision.Nov 27 2015, 10:23 AM
t.p.northover added a reviewer: t.p.northover.

Thanks, LGTM!

Tim.

This revision is now accepted and ready to land.Nov 27 2015, 10:23 AM
This revision was automatically updated to reflect the committed changes.
pengbins added inline comments.
cfe/trunk/include/clang/Basic/arm_neon.td
377

@labrinea
It seems QRDMLSH(p0, p1, p2) is not equivalent to vqsub(p0, vqrdmulh(p1, p2)).

  • QRDMLSH(p0, p1, p2)
accum = (p0 << esize) - 2 * (p1 * p2) + rounding_const;
ret = SignedSatQ(accum >> esize, esize);

Here the rounding constant is effectively added: (p0 << esize) + rounding_const

  • vqsub( p0, vqrdmulh(p1, p2))
temp = SignedSatQ((2 * (p1 * p2) + rounding_const) >> esize, esize);
ret = SignedSatQ(p0 - temp, esize);

Here the rounding constant is effectively subtracted: (p0 << esize) - rounding_const

Here is an example where the results differ:
vqrdmlshq_s16 ( -197, -512, 11040) = -24
vqsubq_s16( -197, vqrdmulhq_n_s16(-512, 11040)) = -25
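
The mismatch can be checked without hardware. The following is a minimal sketch that models a single 16-bit lane using the pseudocode quoted in this comment; the function names (signed_sat, qrdmulh, qrdmlsh, qsub) are hypothetical helpers for illustration, not real intrinsics or LLVM code, and the sketch assumes esize = 16 with rounding_const = 1 << (esize - 1).

```python
def signed_sat(value, bits):
    """Clamp value to the signed range of a bits-wide integer."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def qrdmulh(a, b, esize=16):
    """Saturating rounding doubling multiply, returning the high half."""
    rounding_const = 1 << (esize - 1)
    # Python's >> on negative ints floors, matching an arithmetic shift.
    return signed_sat((2 * a * b + rounding_const) >> esize, esize)


def qrdmlsh(acc, a, b, esize=16):
    """Fused form: rounding is applied once, after the subtraction."""
    rounding_const = 1 << (esize - 1)
    accum = (acc << esize) - 2 * a * b + rounding_const
    return signed_sat(accum >> esize, esize)


def qsub(a, b, esize=16):
    """Saturating subtract."""
    return signed_sat(a - b, esize)


if __name__ == "__main__":
    fused = qrdmlsh(-197, -512, 11040)
    split = qsub(-197, qrdmulh(-512, 11040))
    print(fused, split)  # prints: -24 -25
```

With these inputs the doubled product is exactly -172.5 in units of 2^16, so it is a tie case: the split form rounds the product to -172 before subtracting, while the fused form rounds -24.5 once at the end.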