This is an archive of the discontinued LLVM Phabricator instance.

[VectorUtils] add IR-level analysis for widening of shuffle mask

Authored by spatel on Apr 10 2020, 8:51 AM.



This is similar to the recent move/addition of "scaleShuffleMask" (D76508), but there are a couple of differences:

  1. The existing x86 helper (canWidenShuffleElements) always tries to divide-by-2, so it gets called iteratively and wouldn't handle the general case of non-pow-2 length.
  2. The existing x86 code handles "SM_SentinelZero", but we don't have that in IR.

The motivation is to enable shuffle folds in instcombine/vector-combine that are similar to D76844 and D76727, but in the reverse-bitcast direction. Those patterns are visible in the tests for D40633.

Diff Detail

Event Timeline

spatel created this revision.Apr 10 2020, 8:51 AM

This looks good to me, but do we want to have some roundtrip and/or exhaustive tests for this?


This reads weird, can seems out of place here.

spatel updated this revision to Diff 256664.Apr 10 2020, 2:30 PM
spatel marked an inline comment as done.

Patch updated:

  1. Improved code comments/asserts.
  2. Changed logic - allow matching any negative (sentinel) value as long as that value repeats across all elements of a subsection. In the earlier version, we could match partial undef pieces in a widened element, but that means the round-trip with scaleShuffleMask is not always 1-to-1 (partial undefs would get remapped to defined values). This could get us into trouble with poison propagation, so let's avoid that. This way also allows usage from codegen because we would seamlessly handle SentinelZero or any other special mask constants.
  3. Added round-trip unit tests to confirm that this is the opposite of scaleShuffleMask. I haven't tried creating exhaustive unit tests before, so not sure yet how to be both exhaustive and not blow up test timing on something like this where we would want to test across multiple dimensions (mask sizes, values, scale factor).
lebedev.ri marked an inline comment as done.Apr 10 2020, 2:59 PM
lebedev.ri added inline comments.

As a preliminary step, rename this to be narrowShuffleMask() ?


Oh hmm, i almost missed this. We indeed can't define undef shuffle mask elts here:

define <2 x i16> @t(<4 x i8> %x) {
  %t0 = shufflevector <4 x i8> %x, <4 x i8> undef, 4294967295, 1, 2, 3
  %t1 = bitcast <4 x i8> %t0 to <2 x i16>
  ret <2 x i16> %t1
define <2 x i16> @t(<4 x i8> %x) {
  %t0 = bitcast <4 x i8> %x to <2 x i16>
  %t1 = shufflevector <2 x i16> %t0, <2 x i16> undef, 0, 1
  ret <2 x i16> %t1
Transformation doesn't verify!
ERROR: Target is more poisonous than source

<4 x i8> %x = < poison, poison, poison, poison >

<4 x i8> %t0 = < undef, poison, poison, poison >
<2 x i16> %t1 = < #x0000 (0)    [based on undef value], poison >

<2 x i16> %t0 = < poison, poison >
<2 x i16> %t1 = < poison, poison >
Source value: < #x0000 (0), poison >
Target value: < poison, poison >

  0 correct transformations
  1 incorrect transformations
  0 Alive2 errors
lebedev.ri added inline comments.Apr 11 2020, 3:08 AM

Ok, in light of not accepting partial sentinel values, what are your thoughts on the following then:

// Step through the input mask by splitting into Scale-sized subsections.
ScaledMask.reserve(NumElts / Scale);

for (ArrayRef<int> MaskSlice = Mask.take_front(Scale),
                   RemainingMaskElts = Mask.take_back(Mask.size() - Scale);
     !MaskSlice.empty(); MaskSlice = RemainingMaskElts.take_front(Scale),
                   RemainingMaskElts = RemainingMaskElts.take_back(
                       RemainingMaskElts.size() - Scale)) {
  assert((int)MaskSlice.size() == Scale && "Expected Scale-sized slice.");

  // The slice must be homogeneous.
  int OutputElt;

  if (MaskSlice.front() < 0) {
    // Negative values (undef or other "sentinel" values) must be equal across
    // the entire subsection.
    if (!is_splat(MaskSlice))
      return false;
    OutputElt = MaskSlice.front();
  } else {
    // A positive mask element must be cleanly divisible.
    if (MaskSlice.front() % Scale != 0)
      return false;
    // The elements of the subsection must be consecutive.
    auto ExpectedSlice =
        llvm::seq(MaskSlice.front(), MaskSlice.front() + Scale);
    assert(llvm::size(ExpectedSlice) == Scale && "Got wrong sequence.");
    if (!std::equal(adl_begin(MaskSlice), adl_end(MaskSlice),
      return false;
    OutputElt = MaskSlice.front() / Scale;

  // All narrow elements in this subsection map to the same wider element.

This results in ~Scale less divisions.

spatel marked 2 inline comments as done.Apr 11 2020, 5:57 AM
spatel added inline comments.

Yes, that would make more sense if we have both of these utils. Another possibility would be to make a single function with a ScaleUpOrDown bool param, but that's probably less readable in calling code.


Very C++. :)
I'm good with that...especially since you already wrote it and checked the optimization!

lebedev.ri accepted this revision.Apr 11 2020, 8:08 AM

LG otherwise.

This revision is now accepted and ready to land.Apr 11 2020, 8:08 AM
spatel updated this revision to Diff 256782.Apr 11 2020, 10:08 AM

Patch updated:

  1. Rebased after rG1318ddbc14c2 (name change for scaleShuffleMask), so this patch doesn't alter comments in the other function.
  2. Adopted most of the implementation suggestions to improve readability/iteration processing . I couldn't take the 5-line 'for' statement, so used a take_front/drop_front pattern instead. Also used a straight 'for' loop to avoid adding an #include for llvm::seq(). Hopefully, that's same or better readability.
spatel updated this revision to Diff 256785.Apr 11 2020, 10:19 AM

Patch updated again:
Tried to make the documentation comment easier to understand.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2020, 7:27 AM