This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add InstFixup for masked `unpck{l|h}pd` -> masked `shufpd`
ClosedPublic

Authored by goldstein.w.n on Apr 4 2023, 9:05 AM.

Details

Summary

This is a follow up D147507 which removed the prior transformation to
shufps which was incorrect as the mask was for 64-bit double
elements, not 32-bit float elements. Using shufpd for the
replacement, however, preserves the mask semantics and has the same
benefits as shufps.

Diff Detail

Event Timeline

goldstein.w.n created this revision.Apr 4 2023, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 4 2023, 9:05 AM
goldstein.w.n requested review of this revision.Apr 4 2023, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 4 2023, 9:05 AM
pengfei accepted this revision.Apr 4 2023, 7:41 PM

LGTM.

llvm/lib/Target/X86/X86FixupInstTuning.cpp
245–262

No sure if we should use VSHUFPD for them too.

This revision is now accepted and ready to land.Apr 4 2023, 7:41 PM
RKSimon accepted this revision.Apr 5 2023, 12:50 AM

LGTM

llvm/lib/Target/X86/X86FixupInstTuning.cpp
245–262

+1

goldstein.w.n marked 2 inline comments as done.Apr 5 2023, 10:52 PM

shufps -> shufpd

This revision was landed with ongoing or failed builds.Apr 5 2023, 11:37 PM
This revision was automatically updated to reflect the committed changes.
skatkov added a subscriber: skatkov.Apr 7 2023, 1:52 AM
skatkov added inline comments.
llvm/lib/Target/X86/X86FixupInstTuning.cpp
157

Hello Noah, we've got downstream mis-compile on fuzzer testing on this patch.
Specifically reverting constants 0x00 -> 0x44 and 0xFF -> 0xee fixes the issue.

While we do not have IR reproducer for this problem at the moment, may be you would be able to detect the problem from source code?
At least I see that this comment states that for r,r case we should use old constants and for r,r,k new ones.
However it looks like the patch updates more cases. Is it correct?
It just is wild guess as I'm not familiar with this code.

pengfei added inline comments.Apr 7 2023, 2:16 AM
llvm/lib/Target/X86/X86FixupInstTuning.cpp
235–242

Guess problem may come from here.

skatkov added a comment.EditedApr 7 2023, 2:19 AM

If it helps. the bad and good compilations differ only in one line:

vshufps	$0, %xmm0, %xmm1, %xmm0         # xmm0 = xmm1[0,0],xmm0[0,0]

vs

vshufps	$68, %xmm0, %xmm1, %xmm0        # xmm0 = xmm1[0,1],xmm0[0,1]

So probably the problem comes from case which have not been updated but uses a lambda which is updated?

I put D147775 for example, but think we can modify it together with D147728.

@pengfei D147775 works, thanks!

Thanks for the confirmation! Then I'd like to land it as a quick fix.