This patch enables the use of reciprocal estimates for SVE
when both the -Ofast and -mrecip flags are used.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
LGTM! I think the patch looks good to go as is, but if you do manage to work out why we're adding the compares and selects for @fsqrt_4f32 and remove them that would be awesome. :)
llvm/test/CodeGen/AArch64/sve-fp-reciprocal.ll | ||
---|---|---|
25 | It's interesting that the fmul here is the predicated form, whereas for fdiv_recip_4f32 it's the unpredicated form. This has nothing to do with your patch though, but perhaps worth investigating in the future? | |
56 | nit: Just for clarity is it worth renaming these to @fdiv_2f64 and @fdiv_recip_2f64 to be consistent with the f32 versions? | |
137 | Again, I don't think this is caused by your patch, but it's probably worth investigating why we're selecting between the original input and the estimate based on a zero input. It feels inconsistent with @fsqrt_4f32 where we don't seem to worry about the input. | |
144 | nit: Again, maybe for consistency it's better to use the name @fsqrt_2f64 here and below? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
8255 | Out of interest is there a reason we ignore f16 vectors here? | |
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
1987–2011 | Are these required? The patterns should already exist within the instruction definition classes. All that's needed is to add c++ code to lower the intrinsics to these AArch64ISD nodes, which is something we've done for other operations so as not to have duplicate patterns. |
- Add lowering of the aarch64_sve_frecp[e|s]_x & aarch64_sve_frsqrt[e|s]_x intrinsics to existing AArch64ISD nodes in AArch64ISelLowering.cpp & removed duplicate tablegen patterns.
- Added tests for nxv2f16, nxv4f16 & nxv8f16 types
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
8255 | When I created this patch I thought that DAGCombiner::BuildDivEstimate/BuildSqrtEstimate didn't support f16 types, which was incorrect. I've added the f16 vector types here & added tests for these to sve-fp-reciprocal.ll. |
LGTM! It looks like you've addressed @paulwalker-arm's comments. I'm happy for us to look at investigating removing the fcmeq and sel instructions at a later time.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
8253 | nit: Can you fix the formatting issue before merging please? | |
llvm/test/CodeGen/AArch64/sve-fp-reciprocal.ll | ||
158 | nit: For the fsqrt functions can you remove the second %b arguments as they seem to be unused? |
llvm/test/CodeGen/AArch64/sve-fp-reciprocal.ll | ||
---|---|---|
25 | @david-arm The predicate is generated for the unpacked types so that inactive lanes can never trigger floating point exceptions. So this is expected behaviour. @kmclaughlin This does raise an interesting point though. Is it safe to use the reciprocal instructions for unpacked types? With the answer depending on whether these instruction can generate exceptions. |
Just a couple of extra points that depend on the answer to my previous question.
llvm/lib/Target/AArch64/SVEInstrFormats.td | ||
---|---|---|
1935–1936 ↗ | (On Diff #380921) | If the answer to my previous question is that it is unsafe for unpacked vectors then please remove these patterns. If there comes a time that we definitely want to support unpacked vectors I think we'll probably need slightly differ ISEL patterns for those types. |
2637 ↗ | (On Diff #380921) | As above. |
- Removed additional patterns and tests for unpacked vector types.
llvm/lib/Target/AArch64/SVEInstrFormats.td | ||
---|---|---|
1935–1936 ↗ | (On Diff #380921) | Hi @paulwalker-arm, I've removed these patterns as the reciprocal instructions can generate exceptions. |
nit: Can you fix the formatting issue before merging please?