This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Select sadd_sat, uadd_sat, usub_sat and ssub_sat.
ClosedPublic

Authored by dmgreen on Oct 24 2019, 4:15 AM.

Details

Summary

This adds some extra patterns to select AArch64 Neon SQADD, UQADD, SQSUB and UQSUB from the existing target independent _sat nodes.

It does not attempt to replace the existing int_aarch64_neon_uqadd intrinsic nodes as they are apparently used for both scalar and vector, and need to be legal on scalar types for some of the patterns to work. The int_aarch64_neon_uqadd on scalar would move the two integers into floating point registers, perform a Neon uqadd and move the value back. I don't believe this is good idea for uadd_sat to do the same as the scalar alternative is simpler (an adds with a csinv). For signed it may be smaller, but I'm not sure about it being better.

So this just adds some extra patterns for the existing vector instructions, matching on the _sat nodes.

Diff Detail

Event Timeline

dmgreen created this revision.Oct 24 2019, 4:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2019, 4:15 AM

Do we care about FPSR.QC? I remember we had some discussion about it in the past, but I'm not remembering off the top of my head.

I agree it doesn't make sense to move an integer from a GPR to an FPR just to perform a saturating add. Maybe GlobalISel could do something more clever here.

Otherwise looks fine.

Thanks.

Do we care about FPSR.QC? I remember we had some discussion about it in the past, but I'm not remembering off the top of my head.

I had not seen anything on FPSR.QC for AArch64 (or AArch32 Neon), and was presuming it would be fine to generate them from _sats. Was this discussion on the llvm lists somewhere? The aarch32 ACLE intrinsics tried to make some rules regarding q flags, but that's not something that either clang or gcc honour. And I've not seen anything similar for Neon.

efriedma accepted this revision.Oct 28 2019, 5:31 PM

I might be getting QC confused with one of the other flags. I think there was an issue at some point when someone was trying to add intrinsics to read/write the flags, but we didn't provide appropriate guarantees. Not sure that ever got merged.

LGTM

This revision is now accepted and ready to land.Oct 28 2019, 5:31 PM
This revision was automatically updated to reflect the committed changes.