This is an archive of the discontinued LLVM Phabricator instance.

[x86] enable fast sqrtss tuning for AMD Zen cores
Closed, Public

Authored by spatel on Feb 4 2022, 7:52 AM.

Details

Summary

As discussed in D118534, all of the recent AMD CPUs have relatively fast (<14 cycle latency) "sqrtss" instructions:
https://uops.info/table.html?search=sqrtps&cb_lat=on&cb_tp=on&cb_SNB=on&cb_SKL=on&cb_ZENp=on&cb_ZEN2=on&cb_ZEN3=on&cb_measurements=on&cb_avx=on&cb_sse=on

So we should set this tuning flag so that a plain "sqrt(X)" is lowered to the hardware instruction instead of the estimate expansion (as opposed to reciprocal-sqrt; there is other test coverage for that pattern). The expansion is both slower and less accurate than the hardware instruction.

Event Timeline

spatel created this revision. Feb 4 2022, 7:52 AM
spatel requested review of this revision. Feb 4 2022, 7:52 AM
Herald added a project: Restricted Project. Feb 4 2022, 7:52 AM
lebedev.ri added inline comments. Feb 4 2022, 8:02 AM
llvm/lib/Target/X86/X86.td
1172

Please also add TuningFastVectorFSQRT; I don't see any difference between the scalar and vector variants, at least on znver3.

spatel marked an inline comment as done. Feb 4 2022, 9:05 AM
spatel added inline comments.
llvm/lib/Target/X86/X86.td
1172

Good point. On Zen1, it looks like the 256-bit version would be split, but that still has a reciprocal throughput of 8 cycles, so it's better than the 10+ instructions/cycles in the estimate sequence.

spatel edited the summary of this revision. Feb 4 2022, 9:05 AM
spatel updated this revision to Diff 405992. Feb 4 2022, 9:08 AM
spatel marked an inline comment as done.

Added the fast vector sqrt flag in addition to the fast scalar one. These have throughput <8 (see the uops.info link in the description), so they are better than the estimate sequence.

lebedev.ri accepted this revision. Feb 4 2022, 9:21 AM

LGTM, thank you.
This should be backported, please file an issue.

This revision is now accepted and ready to land. Feb 4 2022, 9:21 AM
This revision was landed with ongoing or failed builds. Feb 4 2022, 10:59 AM
This revision was automatically updated to reflect the committed changes.