This is an archive of the discontinued LLVM Phabricator instance.

[X86 isel] Remove lane requirement from lowerShuffleAsUNPCKAndPermute
ClosedPublic

Authored by zhuhan0 on Apr 5 2023, 3:51 PM.

Details

Summary

lowerShuffleAsUNPCKAndPermute requires the shuffle mask element to be
in the same lane in both the input and output vectors. This prevents it from
matching certain patterns for example in GHI 61964. Removing the lane
requirement fixes the issue.

The change I'm targeting is in the test llvm/test/CodeGen/X86/pr61964.ll. The
codegen has improved notably with this patch. Otherwise, looks like some
broadcast instructions are replaced with unpck and perm. To check if there's
any other performance change, I ran llvm-test-suite benchmarks from the
SingleSource, MultiSource, and MicroBenchmarks directories:

Tests: 2665
Short Running: 2009 (filtered out)
Same hash: 140 (filtered out)
In Blacklist: 513 (filtered out)
Remaining: 3
Metric: exec_time

Program                                                                exec_time
                                                                       lhs       rhs    diff
 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test   1.64      1.64  0.1%
      test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   1.06      1.06  0.0%
          test-suite :: MultiSource/Applications/JM/lencod/lencod.test   5.25      5.25  0.0%
                                                    Geomean difference    nan       nan  0.0%
      exec_time
l/r         lhs       rhs      diff
count  3.000000  3.000000  3.000000
mean   2.648300  2.649100  0.000462
std    2.269035  2.268849  0.000415
min    1.055500  1.055900  0.000095
25%    1.349300  1.350250  0.000237
50%    1.643100  1.644600  0.000379
75%    3.444700  3.445700  0.000646
max    5.246300  5.246800  0.000913

The patch only hits three cases and the result is neutral. (The 513 blacklisted
benchmarks are the ones under MicroBenchmarks, which --filter-hash does
not work and I manually verified their code did not change).

Diff Detail

Event Timeline

zhuhan0 created this revision.Apr 5 2023, 3:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2023, 3:51 PM
zhuhan0 requested review of this revision.Apr 5 2023, 3:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2023, 3:51 PM
zhuhan0 edited the summary of this revision. (Show Details)Apr 5 2023, 3:57 PM
zhuhan0 added reviewers: RKSimon, MatzeB, nadav.
RKSimon added inline comments.Apr 6 2023, 3:42 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
13320

I'm not certain this will work properly for 512-bit vectors?

zhuhan0 updated this revision to Diff 511479.Apr 6 2023, 11:08 AM

Improve the LO/HI matching logic. Essentially we want the masks to all be in the lower or higher-order bits but they can be in any lane.

Also normalized the masks (NormM = M - NumElts if Op == V2) so that the code is more concise.

zhuhan0 added inline comments.Apr 6 2023, 11:09 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
13320

The new version should work now.

RKSimon added inline comments.Apr 6 2023, 1:38 PM
llvm/test/CodeGen/X86/vector-shuffle-128-v16.ll
313 ↗(On Diff #511479)

Please can you take a look at this regression

zhuhan0 updated this revision to Diff 511816.Apr 7 2023, 3:57 PM

Fix SSE2 regression in vector-shuffle-128-v16.ll.

one minor

llvm/lib/Target/X86/X86ISelLowering.cpp
13559

We can probably avoid the SmallSet by using std::optional?

std::optional<int> UniqueElt;
for (int Elt : Mask) {
  if (Elt == SM_SentinelUndef) {
      NumUndefs++;
      continue;
  }
  if (UniqueElt.has_value() && UniqueElt.value() != Elt)
    return false;
  UniqueElt = Elt;
}
return NumUndefs <= Mask.size() / 2 && UniqueElt.has_value();

(untested)

zhuhan0 added inline comments.Apr 11 2023, 10:47 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
13559

Sweet. This works!

zhuhan0 updated this revision to Diff 512660.Apr 11 2023, 10:47 PM

Address comment.

zhuhan0 retitled this revision from [RFC][X86 isel] Remove lane requirement from lowerShuffleAsUNPCKAndPermute to [X86 isel] Remove lane requirement from lowerShuffleAsUNPCKAndPermute.Apr 11 2023, 10:48 PM
RKSimon accepted this revision.Apr 12 2023, 2:17 AM

LGTM - cheers

This revision is now accepted and ready to land.Apr 12 2023, 2:17 AM
This revision was landed with ongoing or failed builds.Apr 12 2023, 5:11 PM
This revision was automatically updated to reflect the committed changes.
dyung added a subscriber: dyung.Apr 19 2023, 9:42 AM

Hi @zhuhan0 we are seeing a test failure on some internal tests which I bisected back to your change. I have put the a repro up in issue 62242. Can you take a look?

Hi @zhuhan0 we are seeing a test failure on some internal tests which I bisected back to your change. I have put the a repro up in issue 62242. Can you take a look?

Sorry about that. Let me take a look.