rGa3b8695bf592 enabled this for znver3, but the AMD SoG, Agner and uops.info all agree that even znver1 has a fast per-lane shuffle op (VPSHUFB); only the cross-lane shuffles (PERMPS etc.) seem to be slow.
Fixes PR44795
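For readers less familiar with the per-lane vs cross-lane distinction the summary relies on, here is a minimal scalar sketch of the 256-bit forms of the two instructions it names. The function names `model_vpshufb256` and `model_vpermps256` are hypothetical and not part of this patch or of LLVM; the code models only the instructions' documented semantics, not their latency or throughput.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of 256-bit VPSHUFB (AVX2).
// Each 128-bit lane is shuffled independently: a control byte can only
// select a source byte from the same lane as the destination byte.
std::array<std::uint8_t, 32> model_vpshufb256(
    const std::array<std::uint8_t, 32>& src,
    const std::array<std::uint8_t, 32>& ctrl) {
    std::array<std::uint8_t, 32> dst{};
    for (std::size_t i = 0; i < 32; ++i) {
        const std::size_t lane_base = (i / 16) * 16;  // 0 for low lane, 16 for high lane
        if (ctrl[i] & 0x80) {
            dst[i] = 0;  // a set top bit zeroes the destination byte
        } else {
            dst[i] = src[lane_base + (ctrl[i] & 0x0F)];  // low 4 bits index within the lane
        }
    }
    return dst;
}

// Hypothetical scalar model of 256-bit VPERMPS (AVX2).
// Each destination element may select any of the 8 source elements,
// so data can cross the 128-bit lane boundary.
std::array<float, 8> model_vpermps256(
    const std::array<float, 8>& src,
    const std::array<std::uint32_t, 8>& idx) {
    std::array<float, 8> dst{};
    for (std::size_t i = 0; i < 8; ++i) {
        dst[i] = src[idx[i] & 7];  // low 3 bits index across the whole register
    }
    return dst;
}
```

In the first model the source index never leaves the destination's own 16-byte lane, while in the second it ranges over the full register; this is the difference the summary refers to.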
Differential D123306
[X86] Enable fast variable per-lane shuffle tuning on all Ryzen targets
Closed, Public
Authored by RKSimon on Apr 7 2022, 6:17 AM.
Event Timeline
This revision is now accepted and ready to land. Apr 7 2022, 6:58 AM
This revision was landed with ongoing or failed builds. Apr 7 2022, 8:20 AM
Closed by commit rGcf3a09369a29: [X86] Enable fast variable per-lane shuffle tuning on all Ryzen targets… (authored by RKSimon).
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 421224
llvm/lib/Target/X86/X86.td
llvm/test/CodeGen/X86/vector-shuffle-fast-per-lane.ll