Page MenuHomePhabricator

[AArch64][SVE] Add intrnisics for saturating scalar arithmetic

Authored by andwar on Dec 10 2019, 4:36 AM.



The following intrnisics are added:

  • @llvm.aarch64.sve.sqdec{b|h|w|d|p}
  • @llvm.aarch64.sve.sqinc{b|h|w|d|p}
  • @llvm.aarch64.sve.uqdec{b|h|w|d|p}
  • @llvm.aarch64.sve.uqinc{b|h|w|d|p}

For every intrnisic there a scalar variants (with n32 or n64 suffix) and
vector variants (no suffix).

Diff Detail

Event Timeline

andwar created this revision.Dec 10 2019, 4:36 AM
Herald added a reviewer: efriedma. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
efriedma added inline comments.Dec 10 2019, 4:16 PM

Missing ImmArg markings?


SQDECB always returns a value in a 64-bit register; why are you treating the return value as 32 bits? Even if there's some reason to prefer that form at the IR level, it doesn't seem like a good idea in isel; if you need a sign-extended value, you'll be forced to emit a redundant sign extension.

andwar marked 2 inline comments as done.Dec 13 2019, 9:32 AM
andwar added a subscriber: eli.friedman.

Cheers for taking a look @eli.friedman !


Yes, good point. This means that I will have to duplicate sve_incdec_imm and see_pred_enum with TImmLeaf based equivalents.


There's a 64-bit and 32-bit variant of SQDECB. This pattern is for the 32-bit variant, which returns 32-bit as well 64-bit result. Here we only care about the 32-bit result (because that's what the ACLE intrinsic returns). More specifically, this is meant to allow 1:1 mapping between:

  • int32_t svqdecb_n_s32(int32_t op, uint64_t imm_factor) from ACLE
  • declare i32 @llvm.aarch64.sve.sqdecb.n32(i32, i32, i32) IR intrinsic
  • sqdecb x0, w0, vl3, mul #4 SVE instruction

For the 64-bit variant there's a different intrinsic:

  • int64_t svqdecb_n_s64(int64_t op, uint64_t imm_factor) from ACLE
  • declare i64 @llvm.aarch64.sve.sqdecb.n64(i64, i32, i32) IR intrinsic
  • sqdecb x0, vl4, mul #5 SVE instruction

Also, this multiclass is only used for the intrnisics.

efriedma added inline comments.Dec 13 2019, 10:37 AM

Consider something like the following:

long x(int z) { return svqdecb_n_s32(z, 1);

This function should lower to just a single sqdecb. The way this is written you end up with an unnecessary sxtw.

andwar marked an inline comment as done.Dec 19 2019, 10:36 AM
andwar added inline comments.

I will add extra patterns to cater for this scenario (please check the next patch). The other option would be to rewrite this pattern so that the return value is always i64 and then add some new ISD nodes and truncate the user requests i32. But the overall effect would be similar.

andwar updated this revision to Diff 234751.Dec 19 2019, 10:41 AM
  • Add patterns for scenarios when a 64bit value is requested from an intrinsic returning a 32 bit value (so that unecessary sxtw is avoided)
  • Add test cases for the above
  • Split tests into 4 seperate files (one per instruction)
  • Add missing ImmArg
  • Rebase on top of master
efriedma accepted this revision.Dec 19 2019, 2:25 PM



Okay, that works.

This revision is now accepted and ready to land.Dec 19 2019, 2:25 PM
sdesmalen accepted this revision.Dec 20 2019, 1:29 AM
This revision was automatically updated to reflect the committed changes.