Page MenuHomePhabricator

[SVE][CodeGen] Enable reciprocal estimates for scalable fdiv/fsqrt

Authored by kmclaughlin on Oct 12 2021, 9:27 AM.



This patch enables the use of reciprocal estimates for SVE
when both the -Ofast and -mrecip flags are used.

Diff Detail

Event Timeline

kmclaughlin created this revision.Oct 12 2021, 9:27 AM
kmclaughlin requested review of this revision.Oct 12 2021, 9:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2021, 9:27 AM
david-arm accepted this revision.Oct 13 2021, 1:51 AM

LGTM! I think the patch looks good to go as is, but if you do manage to work out why we're adding the compares and selects for @fsqrt_4f32 and remove them that would be awesome. :)


It's interesting that the fmul here is the predicated form, whereas for fdiv_recip_4f32 it's the unpredicated form. This has nothing to do with your patch though, but perhaps worth investigating in the future?


nit: Just for clarity is it worth renaming these to @fdiv_2f64 and @fdiv_recip_2f64 to be consistent with the f32 versions?


Again, I don't think this is caused by your patch, but it's probably worth investigating why we're selecting between the original input and the estimate based on a zero input. It feels inconsistent with @fsqrt_4f32 where we don't seem to worry about the input.


nit: Again, maybe for consistency it's better to use the name @fsqrt_2f64 here and below?

This revision is now accepted and ready to land.Oct 13 2021, 1:51 AM
Matt added a subscriber: Matt.Oct 13 2021, 2:50 PM
paulwalker-arm added inline comments.

Out of interest is there a reason we ignore f16 vectors here?


Are these required? The patterns should already exist within the instruction definition classes. All that's needed is to add c++ code to lower the intrinsics to these AArch64ISD nodes, which is something we've done for other operations so as not to have duplicate patterns.

kmclaughlin marked 4 inline comments as done.
  • Add lowering of the aarch64_sve_frecp[e|s]_x & aarch64_sve_frsqrt[e|s]_x intrinsics to existing AArch64ISD nodes in AArch64ISelLowering.cpp & removed duplicate tablegen patterns.
  • Added tests for nxv2f16, nxv4f16 & nxv8f16 types

When I created this patch I thought that DAGCombiner::BuildDivEstimate/BuildSqrtEstimate didn't support f16 types, which was incorrect. I've added the f16 vector types here & added tests for these to sve-fp-reciprocal.ll.

david-arm accepted this revision.Oct 21 2021, 3:05 AM

LGTM! It looks like you've addressed @paulwalker-arm's comments. I'm happy for us to look at investigating removing the fcmeq and sel instructions at a later time.


nit: Can you fix the formatting issue before merging please?


nit: For the fsqrt functions can you remove the second %b arguments as they seem to be unused?

paulwalker-arm added inline comments.Oct 21 2021, 3:25 AM

@david-arm The predicate is generated for the unpacked types so that inactive lanes can never trigger floating point exceptions. So this is expected behaviour.

@kmclaughlin This does raise an interesting point though. Is it safe to use the reciprocal instructions for unpacked types? With the answer depending on whether these instruction can generate exceptions.

Just a couple of extra points that depend on the answer to my previous question.

1935–1936 ↗(On Diff #380921)

If the answer to my previous question is that it is unsafe for unpacked vectors then please remove these patterns. If there comes a time that we definitely want to support unpacked vectors I think we'll probably need slightly differ ISEL patterns for those types.

2637 ↗(On Diff #380921)

As above.

kmclaughlin marked 4 inline comments as done.
  • Removed additional patterns and tests for unpacked vector types.
1935–1936 ↗(On Diff #380921)

Hi @paulwalker-arm, I've removed these patterns as the reciprocal instructions can generate exceptions.

This revision was landed with ongoing or failed builds.Oct 25 2021, 3:31 AM
This revision was automatically updated to reflect the committed changes.