This patch enables the use of reciprocal estimates for SVE
when both the -Ofast and -mrecip flags are used.
LGTM! I think the patch looks good to go as is, but if you do manage to work out why we're adding the compares and selects for @fsqrt_4f32 and remove them that would be awesome. :)
It's interesting that the fmul here is the predicated form, whereas for fdiv_recip_4f32 it's the unpredicated form. This has nothing to do with your patch though, but perhaps worth investigating in the future?
nit: Just for clarity is it worth renaming these to @fdiv_2f64 and @fdiv_recip_2f64 to be consistent with the f32 versions?
Again, I don't think this is caused by your patch, but it's probably worth investigating why we're selecting between the original input and the estimate based on a zero input. It feels inconsistent with @fsqrt_4f32 where we don't seem to worry about the input.
nit: Again, maybe for consistency it's better to use the name @fsqrt_2f64 here and below?
Out of interest is there a reason we ignore f16 vectors here?
Are these required? The patterns should already exist within the instruction definition classes. All that's needed is to add c++ code to lower the intrinsics to these AArch64ISD nodes, which is something we've done for other operations so as not to have duplicate patterns.
- Add lowering of the aarch64_sve_frecp[e|s]_x & aarch64_sve_frsqrt[e|s]_x intrinsics to existing AArch64ISD nodes in AArch64ISelLowering.cpp & removed duplicate tablegen patterns.
- Added tests for nxv2f16, nxv4f16 & nxv8f16 types
When I created this patch I thought that DAGCombiner::BuildDivEstimate/BuildSqrtEstimate didn't support f16 types, which was incorrect. I've added the f16 vector types here & added tests for these to sve-fp-reciprocal.ll.
LGTM! It looks like you've addressed @paulwalker-arm's comments. I'm happy for us to look at investigating removing the fcmeq and sel instructions at a later time.
nit: Can you fix the formatting issue before merging please?
nit: For the fsqrt functions can you remove the second %b arguments as they seem to be unused?
@david-arm The predicate is generated for the unpacked types so that inactive lanes can never trigger floating point exceptions. So this is expected behaviour.
@kmclaughlin This does raise an interesting point though. Is it safe to use the reciprocal instructions for unpacked types? With the answer depending on whether these instruction can generate exceptions.
Just a couple of extra points that depend on the answer to my previous question.
|1935–1936 ↗||(On Diff #380921)|
If the answer to my previous question is that it is unsafe for unpacked vectors then please remove these patterns. If there comes a time that we definitely want to support unpacked vectors I think we'll probably need slightly differ ISEL patterns for those types.
|2637 ↗||(On Diff #380921)|