rGa3b8695bf592 enabled this tuning for znver3. However, the AMD Software Optimization Guide (SoG), Agner Fog's instruction tables and uops.info all agree that even znver1 has a fast variable per-lane shuffle (VPSHUFB); it is the cross-lane shuffles (e.g. VPERMPS) that appear to be slow.
Fixes PR44795
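To make the per-lane vs cross-lane distinction concrete, here is a small sketch of scalar models of the two 256-bit AVX2 instructions named above. These are illustrative models of the instruction semantics only, not LLVM code; the function names `vpshufb_256` and `vpermps_256` are hypothetical. VPSHUFB can only pick bytes from within the same 128-bit lane, while VPERMPS can move any float to any position, including across the lane boundary.

```python
# Hypothetical scalar models of two AVX2 shuffles (illustration only, not LLVM code).

def vpshufb_256(src: list[int], idx: list[int]) -> list[int]:
    """Model of 256-bit VPSHUFB: the two 128-bit lanes are shuffled independently.

    src and idx hold 32 bytes each. A selector byte with its top bit set zeroes
    the output byte; otherwise its low 4 bits pick a byte from the SAME lane.
    """
    assert len(src) == 32 and len(idx) == 32
    out = []
    for i in range(32):
        lane_base = i & ~15  # 0 for the low lane, 16 for the high lane
        sel = idx[i]
        out.append(0 if sel & 0x80 else src[lane_base + (sel & 0x0F)])
    return out


def vpermps_256(src: list[float], idx: list[int]) -> list[float]:
    """Model of VPERMPS (ymm): each of the 8 output floats may come from any
    input float, including one in the other 128-bit lane (low 3 index bits used)."""
    assert len(src) == 8 and len(idx) == 8
    return [src[idx[i] & 7] for i in range(8)]


if __name__ == "__main__":
    data = list(range(32))
    # Reversing bytes within each lane is a single per-lane shuffle.
    print(vpshufb_256(data, [15 - j for j in range(16)] * 2))
    # Reversing all 8 floats needs elements to cross the lane boundary.
    print(vpermps_256([float(k) for k in range(8)], [7, 6, 5, 4, 3, 2, 1, 0]))
```

Note that an index of 0 in the high lane of VPSHUFB selects byte 16, not byte 0, so a full 32-byte reverse cannot be done with one VPSHUFB; that kind of data movement needs a cross-lane permute.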
Differential D123306: [X86] Enable fast variable per-lane shuffle tuning on all Ryzen targets
Status: Closed, Public. Authored by RKSimon on Apr 7 2022, 6:17 AM.
Event timeline
Apr 7 2022, 6:58 AM: This revision is now accepted and ready to land.
Apr 7 2022, 8:20 AM: This revision was landed with ongoing or failed builds. Closed by commit rGcf3a09369a29: [X86] Enable fast variable per-lane shuffle tuning on all Ryzen targets… (authored by RKSimon). This revision was automatically updated to reflect the committed changes.
Revision contents (Diff 421224):
llvm/lib/Target/X86/X86.td
llvm/test/CodeGen/X86/vector-shuffle-fast-per-lane.ll