This is an archive of the discontinued LLVM Phabricator instance.

[X86, AVX] instcombine vperm2 intrinsics with zero inputs into shuffles
ClosedPublic

Authored by spatel on Mar 23 2015, 2:49 PM.

Details

Summary

This is the IR optimizer follow-on patch for D8563: the x86 backend patch that converts this kind of shuffle back into a vperm2.

This is also a continuation of the transform that started in D8486. In that patch, Andrea suggested that we could convert vperm2 intrinsics that use zero masks into a single shuffle. This is an implementation of that suggestion.

I recognize that we could go even further into bit twiddling hackery to make the code a line or two shorter, but I thought it would hurt readability.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 22519.Mar 23 2015, 2:49 PM
spatel retitled this revision from to [X86, AVX] instcombine vperm2 intrinsics with zero inputs into shuffles.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: andreadb, mkuper, RKSimon, chandlerc.
spatel added a subscriber: Unknown Object (MLST).
spatel updated this revision to Diff 22582.Mar 24 2015, 10:11 AM

Patch updated based on email suggestion by Andrea (thanks!):
Rather than keeping the shuffle operand order fixed based on the inputs to the intrinsic, swap them as needed. This has 2 benefits:

  1. It simplifies the zero vector replacement logic.
  2. It creates a shuffle in a more canonical form; the x86 backend swaps shuffle operands to reduce accesses to the 2nd input in the low half of the result vector.
andreadb accepted this revision.Mar 24 2015, 10:19 AM
andreadb edited edge metadata.

LGTM. Thanks Sanjay!

lib/Transforms/InstCombine/InstCombineCalls.cpp
222–230 ↗(On Diff #22582)

You can move this logic after line 235.

This revision is now accepted and ready to land.Mar 24 2015, 10:19 AM
This revision was automatically updated to reflect the committed changes.

Thanks, Andrea!

For the record, I did a little Clang test case dance with r233109 and r233111
in an attempt to not cause any buildbot breakage with this commit (r233110).
I learned my lesson from the last two similar changes. :)

Is it wrong that a Clang regression test has a dependency on the IR optimizer?