This is an archive of the discontinued LLVM Phabricator instance.

[X86] Enable reciprocal estimates for v16f32 vectors by using VRCP14PS/VRSQRT14PS
ClosedPublic

Authored by craig.topper on May 5 2018, 4:16 PM.

Details

Summary

The legacy VRCPPS/VRSQRTPS instructions have no 512-bit forms, but the newer, higher-precision VRCP14PS/VRSQRT14PS do, so we can use those to implement v16f32 reciprocal estimates.

For KNL CPUs we can probably use VRCP28PS/VRSQRT28PS and avoid the Newton-Raphson (NR) step altogether, but I'll leave that for a future patch.
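
To illustrate what the estimate plus NR step amounts to numerically, here is a minimal scalar sketch. It is not the lowering code from this patch: `truncate_mantissa`, `recip_estimate`, `rsqrt_estimate`, `refine_recip`, and `refine_rsqrt` are hypothetical helper names, and the "estimate" is only a software stand-in that assumes roughly 14 good mantissa bits, mimicking the documented precision of the 14-bit instructions rather than reproducing their exact results.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a hardware estimate: keep only the top `bits`
// mantissa bits of an exactly computed float to mimic limited precision.
inline float truncate_mantissa(float x, int bits) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    u &= ~((1u << (23 - bits)) - 1u);  // clear the low (23 - bits) mantissa bits
    std::memcpy(&x, &u, sizeof x);
    return x;
}

// Assumed model of a ~14-bit reciprocal estimate (not the real instruction).
inline float recip_estimate(float a) {
    return truncate_mantissa(1.0f / a, 14);
}

// Assumed model of a ~14-bit reciprocal square root estimate.
inline float rsqrt_estimate(float a) {
    assert(a > 0.0f);
    return truncate_mantissa(1.0f / std::sqrt(a), 14);
}

// One Newton-Raphson step for 1/a: x1 = x0 * (2 - a * x0).
inline float refine_recip(float a, float x0) {
    return x0 * (2.0f - a * x0);
}

// One Newton-Raphson step for 1/sqrt(a): y1 = y0 * (1.5 - 0.5 * a * y0 * y0).
inline float refine_rsqrt(float a, float y0) {
    return y0 * (1.5f - 0.5f * a * y0 * y0);
}
```

Each NR step roughly squares the relative error, so under this model a 14-bit estimate lands close to full single precision after one step. By the same reasoning, an estimate that is already good to about 28 bits would leave nothing for a single-precision NR step to improve.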

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. May 5 2018, 4:16 PM
spatel accepted this revision. May 6 2018, 8:47 AM
spatel added a subscriber: RKSimon.

LGTM - see inline for possible improvements.

lib/Target/X86/X86ISelLowering.cpp
17813–17817 ↗(On Diff #145388)

Potential enhancements for follow-up patches:

  1. Use the new scalar estimate (VRSQRT14SS) if we have the required AVX-ness.
  2. Use VRSQRT14SD for an f64.
  3. Use VRSQRT14PD for vectors of f64.
  4. Repeat all of the above for VRCP14xx.

test/CodeGen/X86/recip-fastmath.ll
1226 ↗(On Diff #145388)

Not sure where the timing is defined (cc @RKSimon), but that vdivps timing can't be right. Agner has it at 32:20. Might want to verify the new instruction sequence timings too.

This revision is now accepted and ready to land. May 6 2018, 8:47 AM
This revision was automatically updated to reflect the committed changes.
LuoYuanke added inline comments.
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
17823

@craig.topper, for v4f32 and v8f32, if avx512f is available, do we prefer RSQRT14 or FRSQRT?

Herald added projects: Restricted Project, Restricted Project. Oct 15 2022, 4:59 PM
craig.topper added inline comments. Oct 15 2022, 6:03 PM
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
17823

FRSQRT is a shorter encoding but the result would probably be more accurate with RSQRT14. Not sure what’s best.

test/CodeGen/X86/recip-fastmath.ll
1226 ↗(On Diff #145388)

KNL is using the Haswell scheduler model, I think, and last I looked all the divide instructions were using a separate InstRW for each instruction. Since Haswell doesn't have VDIVPSZrr, it probably just got some garbage default.

LuoYuanke added inline comments. Oct 15 2022, 6:10 PM
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
17823

Got it. Thanks, Craig.