This patch tries to fix PR50823.
The shuffle mask must be twisted twice to obtain the correct one, because of the difference between the inner and outer HOP.
Differential D104903
[X86] Twist shuffle mask when fold HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y))
Closed, Public. Authored by pengfei on Jun 25 2021, 2:41 AM.
Event Timeline
Thanks for looking at this - I've been busy with other things this week and haven't really been keeping up with bug traffic!
This has scarily many magic numbers.
pengfei added inline comments.
Addressed review comments. I finally figured out the math here: the shuffle mask must be twisted twice to obtain the correct one. However, the output happens to be identical when the input mask is <0, 2, 1, 3>. That confused me for a long time, and I think it is why the bug stayed hidden.
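For readers without the X86 background, the fold named in the title can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the code in X86ISelLowering.cpp: `hadd` models a 128-bit horizontal add on four 32-bit lanes, and `shuffle64` is a hypothetical helper in which both inner shuffles move whole 64-bit chunks (pairs of lanes). All function names are made up for illustration, and the sketch does not model the mask twisting this patch adds for other cases.

```python
from itertools import product

def hadd(a, b):
    """Model of a 128-bit horizontal add on four 32-bit lanes."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

def shuffle(src, mask):
    """Single-source shuffle: result[i] = src[mask[i]]."""
    return [src[m] for m in mask]

def shuffle64(x, y, mask2):
    """Hypothetical two-source shuffle that moves whole 64-bit chunks.

    Chunk indices 0-1 come from x, chunk indices 2-3 come from y.
    """
    chunks = [x[0:2], x[2:4], y[0:2], y[2:4]]
    return chunks[mask2[0]] + chunks[mask2[1]]

def unfolded(x, y, mask_lo, mask_hi):
    """HOP(SHUFFLE(X,Y), SHUFFLE(X,Y)): shuffle first, then horizontal add."""
    return hadd(shuffle64(x, y, mask_lo), shuffle64(x, y, mask_hi))

def folded(x, y, mask_lo, mask_hi):
    """SHUFFLE(HOP(X,Y)): horizontal add first, then one post-shuffle.

    In this simplified model the post-shuffle mask is the concatenation
    of the two chunk masks, because each 64-bit input chunk collapses
    into exactly one 32-bit result lane.
    """
    return shuffle(hadd(x, y), list(mask_lo) + list(mask_hi))

def inverse(mask):
    """Inverse of a permutation mask."""
    inv = [0] * len(mask)
    for i, m in enumerate(mask):
        inv[m] = i
    return inv

x = [1, 2, 30, 40]
y = [500, 600, 7000, 8000]

# Inner masks (0, 2) and (1, 3) give the post-shuffle mask <0, 2, 1, 3>.
before = unfolded(x, y, (0, 2), (1, 3))
after = folded(x, y, (0, 2), (1, 3))
print(before, after)

# The two forms agree for every pair of chunk masks in this model.
all_match = all(
    unfolded(x, y, lo, hi) == folded(x, y, lo, hi)
    for lo in product(range(4), repeat=2)
    for hi in product(range(4), repeat=2)
)
print(all_match)

# <0, 2, 1, 3> is its own inverse.
print(inverse([0, 2, 1, 3]))
```

Note that <0, 2, 1, 3> is its own inverse, so a test that uses only this mask cannot tell a permutation applied in the right direction from one applied in the wrong direction. Whether that is the exact mechanism that hid the bug is not stated in this thread.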
pengfei retitled this revision from [X86] Limit the scaled element type to i64/f64 to [X86] Twist shuffle mask when fold HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y)). Jun 27 2021, 3:01 AM
This revision is now accepted and ready to land. Jul 5 2021, 5:26 AM
Closed by commit rG9ab99f773fec: [X86] Twist shuffle mask when fold HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE… (authored by Wang, Pengfei <pengfei.wang@intel.com>). Jul 5 2021, 6:30 AM
This revision was automatically updated to reflect the committed changes.
Thanks for confirming it!
Revision Contents
Diff 356496
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/haddsub-undef.ll
llvm/test/CodeGen/X86/packss.ll
llvm/test/CodeGen/X86/pr50823.ll
Since the final shuffle has element type of i64/f64,
should this enforce that the source element type is less than that?