This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Avoid perfect shuffle when mask has multiple uses
AbandonedPublic

Authored by qiucf on Jan 7 2022, 2:13 AM.

Details

Reviewers
nemanjai
shchenz
jsji
Group Reviewers
Restricted Project
Summary

The perfect shuffle (currently enabled only on big endian) may transform a vector shuffle into multiple merges/inserts, but when the shuffle mask is shared between multiple shuffles, it's better to use a single load with multiple vperm instructions.

An obvious blocker is that the mask is not an operand of vector_shuffle in the DAG, so I have to record all masks and check the number of uses of each mask.
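
The bookkeeping described above can be illustrated outside of LLVM. The following is only a minimal sketch, not the code from this revision: `PermMask`, `countMaskUses`, and `preferSharedVPerm` are hypothetical names, a plain `std::map` stands in for whatever structure the patch actually uses, and the "more than one use" threshold is an assumption made for illustration.

```cpp
#include <array>
#include <map>
#include <vector>

// Hypothetical stand-in for a 16-byte vperm permute-control mask.
using PermMask = std::array<int, 16>;

// Record every mask seen and count how many shuffles use it.
// std::array compares lexicographically, so it can be used as a map key.
std::map<PermMask, unsigned>
countMaskUses(const std::vector<PermMask> &Shuffles) {
  std::map<PermMask, unsigned> Uses;
  for (const PermMask &M : Shuffles)
    ++Uses[M];
  return Uses;
}

// Assumed profitability rule: if a mask is shared by several shuffles,
// one mask load can feed several vperm instructions, so the perfect
// shuffle expansion is skipped for that mask.
bool preferSharedVPerm(const std::map<PermMask, unsigned> &Uses,
                       const PermMask &M) {
  auto It = Uses.find(M);
  return It != Uses.end() && It->second > 1;
}
```

In this sketch a caller would collect the masks of all shuffles first, then query `preferSharedVPerm` while lowering each one. A mask that was never recorded is treated as having no shared uses.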

Diff Detail

Event Timeline

qiucf created this revision. Jan 7 2022, 2:13 AM
qiucf requested review of this revision. Jan 7 2022, 2:13 AM
Herald added a project: Restricted Project. Jan 7 2022, 2:13 AM
shchenz added inline comments. Jan 9 2022, 6:44 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10050

Please fix all the Lint warnings.

10063

Can we first check isLittleEndian and then isFourElementShuffle and then MaskMap[PermMask].second to improve the compile time?

qiucf marked an inline comment as done. Jan 9 2022, 9:21 PM
qiucf added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10063

isFourElementShuffle is also computed; when we think a perfect shuffle is unprofitable, computing it is pointless.

Do you mean we check endianness first, and only calculate MaskMap on big endian? That may improve compile time, but only until little-endian perfect shuffles are implemented.

qiucf updated this revision to Diff 398502. Jan 9 2022, 9:22 PM

Fix clang-format warnings

shchenz added inline comments. Jan 10 2022, 12:31 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10063

Yes, right. I mean we should first check the expressions that have lower complexity.
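
The ordering suggested in this thread relies on short-circuit evaluation of `&&`. The following is a rough sketch under stated assumptions, not the condition from the patch: `shouldUsePerfectShuffle`, `maskHasSingleUse`, and the `ExpensiveLookups` counter are hypothetical, and the use-count check is simply assumed to be the costliest of the three.

```cpp
// Hypothetical counter showing how often the costly check actually runs.
unsigned ExpensiveLookups = 0;

// Stand-in for the mask use-count lookup (assumed to be the expensive part).
bool maskHasSingleUse(unsigned MaskUses) {
  ++ExpensiveLookups;
  return MaskUses <= 1;
}

// Cheapest tests come first. Because && short-circuits, the lookup is
// skipped whenever the endianness or element-count test already fails.
bool shouldUsePerfectShuffle(bool IsLittleEndian, bool IsFourElementShuffle,
                             unsigned MaskUses) {
  return !IsLittleEndian && IsFourElementShuffle &&
         maskHasSingleUse(MaskUses);
}
```

With this ordering, a little-endian query returns without touching the use-count lookup at all, which is the compile-time saving being discussed.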

qiucf planned changes to this revision. Jan 10 2022, 1:37 AM

This method isn't sound: many other optimizations run before perfect shuffle and vperm, and the shuffles they lower shouldn't be counted.

qiucf updated this revision to Diff 401508. Jan 19 2022, 9:51 PM
qiucf updated this revision to Diff 401527. Jan 20 2022, 12:10 AM

This adds a whole lot of computation on the DAG, in addition to having thread-safety issues, and the gain is very small. I am not in favour of something like this.

qiucf abandoned this revision. Mar 6 2022, 7:15 PM

Abandon this in favor of D121082.

Herald added a project: Restricted Project. Mar 6 2022, 7:15 PM