This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] reduce shuffle of concat of same vector
ClosedPublic

Authored by spatel on Jan 6 2020, 12:56 PM.

Details

Summary

This is possibly a small part towards solving PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024

The vectorizer is creating shuffles of concat like this:

%63 = shufflevector <4 x i64> %x, <4 x i64> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
%64 = shufflevector <8 x i64> %63, <8 x i64> undef, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>

That might be fixable in the vectorizers, but we're not allowed to fold that into a single shuffle in instcombine, so we should have a backend backstop to convert that into the likely simpler form:

%64 = shufflevector <8 x i64> %x, <8 x i64> undef, <8 x i32> <i32 0, i32 0, i32 1, i32 1, i32 2, i32 2, i32 3, i32 3>

Diff Detail

Event Timeline

spatel created this revision.Jan 6 2020, 12:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 6 2020, 12:56 PM
xbolva00 added a subscriber: xbolva00.EditedJan 6 2020, 2:34 PM

LG

Accept button missed?

This revision is now accepted and ready to land.Jan 6 2020, 8:53 PM

Why doesn't partitionShuffleOfConcats catch this?

spatel added a comment.Jan 7 2020, 5:40 AM

Why doesn't partitionShuffleOfConcats catch this?

It's not exactly the pattern that function is looking for:

// Tries to turn a shuffle of two CONCAT_VECTORS into a single concat,
// or turn a shuffle of a single concat into simpler shuffle then concat.

I did consider putting this block into that helper since it's similar, but I didn't see any real shared code savings in the pattern matching. I can still move it inside there for better organization of folds.

RKSimon accepted this revision.Jan 7 2020, 6:06 AM

Its fine to leave it where it is - LGTM cheers.

spatel added a comment.Jan 7 2020, 6:23 AM

Its fine to leave it where it is - LGTM cheers.

On 2nd look at more motivating patterns, I think we should move it into partitionShuffleOfConcats() and loosen the isShuffleMaskLegal() restriction, but I'll make that a follow-on with more tests.

That's because we have illegal vectors like this being produced for AVX1/2 targets:

define <16 x i32> @concat_self_v4i32(<4 x i32> %x) {
  %t90 = shufflevector <4 x i32> %x, <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %t91 = shufflevector <8 x i32> %t90, <8 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %s = shufflevector <16 x i32> %t91, <16 x i32> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>
  ret <16 x i32> %s
}

This patch won't fire as-is on that because the 512-bit vector requires an illegal shuffle mask for AVX1/2.

This revision was automatically updated to reflect the committed changes.