This is an archive of the discontinued LLVM Phabricator instance.

[X86] Twist shuffle mask when fold HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y))
ClosedPublic

Authored by pengfei on Jun 25 2021, 2:41 AM.

Details

Summary

This patch tries to fix PR50823.

The shuffle mask should be twisted twice before gotten the correct one due to the difference between inner HOP and outer.

Diff Detail

Event Timeline

pengfei created this revision.Jun 25 2021, 2:41 AM
pengfei requested review of this revision.Jun 25 2021, 2:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 25 2021, 2:41 AM

Thanks for looking at this - I've been busy with other things this week and haven't really been keeping up with bug traffic!

llvm/test/CodeGen/X86/pr50823.ll
45

can some of this metadata be stripped out to simplify the test?

This has scaringly many magic numbers.

llvm/lib/Target/X86/X86ISelLowering.cpp
43639

Since the final shuffle has element type of i64/f64,
should this enforce that the source element type is less than that?

43650

Comment didn't save somehow.
Don't you want something like sizeof(i64)/sizeof(current elt size) here?

43664–43666

So what this checking is that ScaledMask0 == ScaledMask1 (i.e. scaled Mask0 and Mask1 are identical)?
Again, special-casing 8 seems weird.

spatel added a subscriber: spatel.Jun 25 2021, 8:48 AM
pengfei marked an inline comment as done.Jun 27 2021, 2:55 AM
pengfei added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
43639

The source type can only be i16, i32, f32 and f64.

43664–43666

Yes. Thanks for reminding.

pengfei updated this revision to Diff 354735.Jun 27 2021, 2:55 AM

Address review comments.

Finally, I figured out the math here. The shuffle mask should be twisted twice before gotten the correct one. But the output happens to be identical when the input mask is <0, 2, 1, 3>. It confused me a long time and I think this is why the bug was hidden.

pengfei added inline comments.Jun 27 2021, 2:57 AM
llvm/test/CodeGen/X86/packss.ll
373

This is a bug. Verified here https://godbolt.org/z/x5Kns4Mha

pengfei retitled this revision from [X86] Limit the scaled element type to i64/f64 to [X86] Twist shuffle mask when fold HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y)).Jun 27 2021, 3:01 AM
pengfei edited the summary of this revision. (Show Details)
RKSimon added inline comments.Jun 29 2021, 11:26 AM
llvm/test/CodeGen/X86/packss.ll
373

Did you manage to track this one down?

llvm/test/CodeGen/X86/pr50823.ll
6

unused declaration?

pengfei added inline comments.Jun 29 2021, 5:54 PM
llvm/test/CodeGen/X86/packss.ll
373

Oops, the words are misleading. I meant the previous generated code was buggy and this patch fixes it. I should have commented on the left side. :)

llvm/test/CodeGen/X86/pr50823.ll
6

Good catch!

pengfei updated this revision to Diff 355415.Jun 29 2021, 5:54 PM

Address review comment.

pengfei updated this revision to Diff 356432.Jul 4 2021, 11:10 PM

Fix Lint warning and rebase.

RKSimon accepted this revision.Jul 5 2021, 5:26 AM

LGTM - confirmed the fix - thanks!

This revision is now accepted and ready to land.Jul 5 2021, 5:26 AM

LGTM - confirmed the fix - thanks!

Thanks for confirming it!