Page MenuHomePhabricator

[X86][SSE] Generalised unpackl/unpckh shuffle matching
ClosedPublic

Authored by RKSimon on Feb 11 2015, 8:54 AM.

Details

Summary

The existing unpck instruction lowering was based on matching explicit shuffle patterns, and missed many alternative shuffle masks (notably commuted masks and duplicate inputs).

This patch adds lowerVectorUnpack() which can be used to thoroughly match any unpckl/unpckh pattern.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 19764.Feb 11 2015, 8:54 AM
RKSimon retitled this revision from to [X86][SSE] Generalised unpackl/unpckh shuffle matching.
RKSimon updated this object.
RKSimon edited the test plan for this revision. (Show Details)
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: Unknown Object (MLST).

Moving this discussion back to phabricator.

Chandler - I can look at adding explicit (commuted) shuffle patterns if you prefer. The added test case 'shuffle_v4i32_40u1' was the start of all of this - canonicalization seemed to be forcing a commute preventing the lowering to punpckldq - the presense of the undefined lane seems to cause it.

The zmm test cases improved because of the poor handling of these shuffles so far.

RKSimon updated this revision to Diff 19986.Feb 15 2015, 9:19 AM

Reduced scope of this to just deal with commuted version of the 64 and 32 bit element shuffles (pre-AVX512) byt adding missing matching patterns.

The core of the problem is that the lowerVectorShuffle canonicalization can't (and shouldn't) be second guessing the importance of undefined lanes and should just sort by the numbers of defined 1st/2nd input lanes. This is a problem for explicit shuffle matching using both inputs, which is just the unpck instructions these days.

qcolombet edited edge metadata.Feb 16 2015, 11:44 AM

Hi Simon,

LGTM, but Chandler may have a different opinion on the way of canonicalize this.

Thanks,
Quentin

lib/Target/X86/X86ISelLowering.cpp
10763 ↗(On Diff #19986)

Shouldn’t the last number be 5?

10863 ↗(On Diff #19986)

Ditto.

RKSimon updated this revision to Diff 20046.Feb 16 2015, 2:15 PM
RKSimon edited edge metadata.

Thanks Quentin, fixed bad shuffle mask and tightened tests. Added extra unpckh test.

qcolombet accepted this revision.Feb 17 2015, 11:17 AM
qcolombet edited edge metadata.

Thanks Simon.

Let us move forward. We can address Chandler’s concerns if any later on.

Q.

This revision is now accepted and ready to land.Feb 17 2015, 11:17 AM
This revision was automatically updated to reflect the committed changes.
chandlerc edited edge metadata.Feb 17 2015, 5:00 PM

I really disagree with the direction of this commit. I'm sorry I've not replied more promptly, but I would like us to go back and look at *why* we cannot canonicalize this problem away, and if we cannot, making isShuffleMaskEquivalent do the commuting itself. I have an idea of how to implement the latter, but I really don't understand what is breaking canonicalizing yet.

I'm probably just being dense (sorry), I do believe there may be a fundamental problem here, but I'm not seeing it yet. undef lanes *never* impact what is canonical in either direction. Is there some opposing canonicalization rules that we have pick between, and that forces some cases to be canonicalized the wrong way?

I agree that adding multiple shuffle pattern matches for the same instruction is a waste.

If we take the shuffle <-1, 0, 5, 1> as an example (2 V1, 1 V2) - lowerVectorShuffle won't to commute this as there are already more V1 inputs than V2, and won't commute to ensure V1 elements are earlier in the shuffle than V2 either, and it is this that is necessary to ensure that the unpcks correctly match.

Altering isShuffleMaskEquivalent to support commutation looks relatively straightforward (as would supporting duplicate inputs - but that is probably unecessary) - would you like me to prepare a patch?