This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add inst fixup for `unpckpd` -> `unpckqdq`.
ClosedPublic

Authored by goldstein.w.n on Apr 6 2023, 11:12 AM.

Details

Summary

unpckqdq seems to be treated as a shuffle from a bypass-delay
perspective (which makes sense, as it appears to share the shuffle units
on all micro-architectures).

unpckqdq is slightly preferable to shufpd because it saves 1 byte of
code size and can be used to replace the micro-fused rm version. So,
if the target has no bypass delay, we should do unpckpd ->
unpckqdq instead of shufpd.
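
As a rough illustration of the reasoning above, here is a minimal, self-contained C++ sketch. It is not the code from X86FixupInstTuning.cpp: the names `Vec128`, `Replacement`, `pickReplacement`, and `checkEquivalence` are hypothetical, and the selection rule is a simplification that looks only at bypass delay and ignores any scheduling-model comparison the real pass may perform. The lane model treats an XMM register as two raw 64-bit lanes, so it only shows that the three instructions select the same bits. The byte counts are for the legacy SSE2 register-register encodings without a REX prefix.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Two 64-bit lanes of a 128-bit XMM register, modelled as raw bits.
using Vec128 = std::array<std::uint64_t, 2>;

// unpcklpd / unpckhpd: take the low (or high) lane of each operand.
inline Vec128 unpcklpd(Vec128 a, Vec128 b) { return {a[0], b[0]}; }
inline Vec128 unpckhpd(Vec128 a, Vec128 b) { return {a[1], b[1]}; }

// punpcklqdq / punpckhqdq: the same lane selection in the integer domain.
inline Vec128 punpcklqdq(Vec128 a, Vec128 b) { return {a[0], b[0]}; }
inline Vec128 punpckhqdq(Vec128 a, Vec128 b) { return {a[1], b[1]}; }

// shufpd: imm bit 0 picks the lane taken from a, imm bit 1 the lane from b.
inline Vec128 shufpd(Vec128 a, Vec128 b, unsigned imm) {
  return {a[imm & 1u], b[(imm >> 1) & 1u]};
}

// Encoded size in bytes (legacy SSE2, reg-reg form, no REX prefix).
constexpr int kShufpdBytes = 5;     // 66 0F C6 /r ib
constexpr int kPunpckqdqBytes = 4;  // 66 0F 6C /r or 66 0F 6D /r

// Hypothetical result type for the simplified selection rule below.
enum class Replacement { Shufpd, Punpckqdq };

// Assumption: with no bypass delay, crossing into the integer domain is
// free, so the shorter punpck{l,h}qdq is chosen; otherwise stay with shufpd.
inline Replacement pickReplacement(bool hasBypassDelay) {
  return hasBypassDelay ? Replacement::Shufpd : Replacement::Punpckqdq;
}

// Checks that all three forms produce identical bits for sample inputs.
inline void checkEquivalence() {
  const Vec128 a{0x1111u, 0x2222u};
  const Vec128 b{0x3333u, 0x4444u};
  assert(unpcklpd(a, b) == punpcklqdq(a, b));
  assert(unpckhpd(a, b) == punpckhqdq(a, b));
  assert(unpcklpd(a, b) == shufpd(a, b, 0u));  // imm 0b00: low lanes
  assert(unpckhpd(a, b) == shufpd(a, b, 3u));  // imm 0b11: high lanes
}
```

Calling `checkEquivalence()` exercises the lane model, and `pickReplacement(false)` returns `Replacement::Punpckqdq`, the one-byte-shorter form preferred when there is no bypass delay.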

Event Timeline

goldstein.w.n created this revision. Apr 6 2023, 11:12 AM
Herald added a project: Restricted Project. Apr 6 2023, 11:12 AM
goldstein.w.n requested review of this revision. Apr 6 2023, 11:12 AM
pengfei accepted this revision. Apr 7 2023, 1:18 AM

LGTM.

This revision is now accepted and ready to land. Apr 7 2023, 1:18 AM
pengfei added inline comments. Apr 7 2023, 2:36 AM
llvm/lib/Target/X86/X86FixupInstTuning.cpp
156–157

Update the comments here.

254–261

Change PS to PD.

Rebase + fix shufps selection

goldstein.w.n marked an inline comment as done.

Update comments

goldstein.w.n marked an inline comment as done. Apr 7 2023, 11:03 AM
goldstein.w.n added inline comments.
llvm/lib/Target/X86/X86FixupInstTuning.cpp
156–157

Likewise for the unpckps changes.

254–261

Also added tests for this case in D147726

This revision was automatically updated to reflect the committed changes.
goldstein.w.n marked an inline comment as done.