This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Shuffle blends with zero
ClosedPublic

Authored by RKSimon on Oct 25 2015, 3:06 PM.

Details

Summary

This patch generalizes the zeroing of vector elements with the BLEND instructions. Currently, a zero vector will only blend if the shuffled elements are already in their correct lanes. This patch recognises when a vector input is zero (or zeroable) and modifies a local copy of the shuffle mask to support a blend. As a zeroable vector input may not be all zeroes, the zeroable vector is regenerated if necessary.
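The mask rewrite described in the summary can be sketched in standalone C++. This is a minimal illustration and not the code from the patch: `makeBlendMask` and `blendImmediate` are hypothetical names, the mask is a plain `std::vector<int>` using the shufflevector index convention, and the sketch assumes that an input flagged as zeroable will be replaced by an all-zero vector before the blend is emitted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not LLVM's code. Mask values follow the shufflevector
// convention for N lanes: -1 is undef, [0, N) reads V1, [N, 2N) reads V2.
// Returns true and fills BlendMask if every lane can read from its own
// position, redirecting out-of-place reads of an input known to be zeroable.
bool makeBlendMask(const std::vector<int> &Mask, bool V1Zeroable,
                   bool V2Zeroable, std::vector<int> &BlendMask) {
  const int N = static_cast<int>(Mask.size());
  std::vector<int> Local = Mask; // work on a local copy of the mask
  for (int i = 0; i < N; ++i) {
    const int M = Local[i];
    assert(M >= -1 && M < 2 * N && "mask index out of range");
    if (M < 0 || M == i || M == i + N)
      continue; // undef, or already reading its own lane
    if (M < N && V1Zeroable)
      Local[i] = i; // any lane of a zero V1 is as good as another
    else if (M >= N && V2Zeroable)
      Local[i] = i + N; // likewise for a zero V2
    else
      return false; // out-of-place read of a non-zero input: not a blend
  }
  BlendMask = Local;
  return true;
}

// Bit i of a BLENDPS-style immediate selects the second source for lane i.
unsigned blendImmediate(const std::vector<int> &BlendMask) {
  const int N = static_cast<int>(BlendMask.size());
  unsigned Imm = 0;
  for (int i = 0; i < N; ++i)
    if (BlendMask[i] >= N)
      Imm |= 1u << i;
  return Imm;
}
```

With a zeroable V2, the mask <0, 4, 2, 4> becomes <0, 5, 2, 7>, which corresponds to the immediate 0b1010. Without the zeroable flag the same mask is rejected.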

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 38360. Oct 25 2015, 3:06 PM
RKSimon retitled this revision from to [X86][SSE] Shuffle blends with zero.
RKSimon updated this object.
RKSimon added reviewers: spatel, andreadb, delena, qcolombet.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
delena edited edge metadata. Oct 28 2015, 6:24 AM

In the more general case, it will work if one of V1 or V2 is a vector of constants with '0' in the right place. You already check for this when you call computeZeroableShuffleElements():

if (Zeroable[i]) {

You just need to know which input to choose. You don't need to rebuild V1 or V2.
And you can always define a mask for the "zeroable" element, so there is no fallthrough in this case.

If computeZeroableShuffleElements() returned not only the mask but also the input number (V1 or V2) for each zeroable element, you could just use this information.

Both isBuildVectorAllZeros and computeZeroableShuffleElements() treat undef lanes as zeroable, so we have a problem when the shuffle mask wants an actual zero input but the lane that we'd need to blend from is actually UNDEF:

shufflevector <4 x float> %v, <4 x float> <float 0.000000e+00, float undef, float undef, float undef>, <4 x i32> <i32 0, i32 4, i32 2, i32 4>

To use BLENDPS, we'd need to convert this to:

shufflevector <4 x float> %v, <4 x float> <float undef, float 0.000000e+00, float undef, float 0.000000e+00>, <4 x i32> <i32 0, i32 5, i32 2, i32 7>

But it's easier if we just set the whole input vector to zero (since we know it's zeroable anyhow).

I'll add an extra test for this example.
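The undef problem above can be modelled in a few lines of standalone C++. This is only a sketch under stated assumptions: `Lane`, `isZeroableVector`, `blendReadsUndefLane` and `makeZeroVector` are hypothetical names for illustration, not LLVM functions, and a constant vector is modelled as a list of lanes that are either undef or hold a known float.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one lane of a constant vector.
struct Lane {
  bool IsUndef;
  float Value; // ignored when IsUndef is true
};

// True if every lane is undef or zero, so the vector may be treated as zero.
bool isZeroableVector(const std::vector<Lane> &V) {
  for (const Lane &L : V)
    if (!L.IsUndef && L.Value != 0.0f)
      return false;
  return true;
}

// True if the blend mask takes some lane i from V2 (index i + N) while that
// lane of V2 is undef, i.e. V2 cannot supply the zero the shuffle asked for.
bool blendReadsUndefLane(const std::vector<int> &BlendMask,
                         const std::vector<Lane> &V2) {
  assert(BlendMask.size() == V2.size() && "mask and vector width differ");
  const int N = static_cast<int>(V2.size());
  for (int i = 0; i < N; ++i)
    if (BlendMask[i] == i + N && V2[i].IsUndef)
      return true;
  return false;
}

// Regenerate the zeroable input as an explicit all-zero vector.
std::vector<Lane> makeZeroVector(std::size_t N) {
  return std::vector<Lane>(N, Lane{false, 0.0f});
}
```

For V2 = <0.0, undef, undef, undef> the vector is zeroable, yet the blend mask <0, 5, 2, 7> would read undef lanes 1 and 3. Swapping in the regenerated zero vector removes the problem without tracking individual lanes.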

Now it could be that we have cases where a BUILD_VECTOR input with zero/nonzero constants could be matched up (possibly by creating a new BUILD_VECTOR with reordered constants suitable for blending). But that would be a much more involved change and I haven't seen any real-world code that would benefit from it yet, so I just focussed on the zeroing, which I do have examples of.

delena added a subscriber: delena. Oct 28 2015, 10:54 AM

Hi Simon,

Your code is fully correct. I just think that you are missing some opportunities.

I'll take your example and change one element:

shufflevector <4 x float> %v, <4 x float> <float 0.000000e+00, float undef, float undef, float 0.2>, <4 x i32> <i32 0, i32 4, i32 2, i32 7>

It is equivalent to the blend:

shufflevector <4 x float> %v, <4 x float> <float 0.000000e+00, float 0.0, float undef, float 0.2>, <4 x i32> <i32 0, i32 5, i32 2, i32 7>

  • Elena
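The rewrite in this example can be sketched as a reordering of the constant operand. The following standalone C++ is an illustration only, assuming V2 is a fully known constant vector. `ConstLane` and `reorderConstantForBlend` are hypothetical names and this is not how the patch or LLVM implements it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one lane of a constant vector.
struct ConstLane {
  bool IsUndef;
  float Value; // ignored when IsUndef is true
};

// Hypothetical sketch: V1 is an arbitrary value, V2 is a constant vector.
// For each result lane that reads V2 out of place, copy the constant it
// reads into that lane's own position of V2 and point the mask at it.
// Returns false, leaving Mask and V2 untouched, if V1 is read out of place.
bool reorderConstantForBlend(std::vector<int> &Mask,
                             std::vector<ConstLane> &V2) {
  assert(Mask.size() == V2.size() && "mask and vector width differ");
  const int N = static_cast<int>(Mask.size());
  std::vector<int> NewMask = Mask;
  std::vector<ConstLane> NewV2 = V2;
  for (int i = 0; i < N; ++i) {
    const int M = Mask[i];
    if (M < 0 || M == i)
      continue; // undef, or V1 already in place
    if (M < N)
      return false; // V1 is not constant, so it cannot be reordered
    NewV2[i] = V2[M - N]; // move the constant into lane i
    NewMask[i] = i + N;   // and blend it from there
  }
  Mask = NewMask;
  V2 = NewV2;
  return true;
}
```

Applied to the example, the mask <0, 4, 2, 7> with V2 = <0.0, undef, undef, 0.2> becomes <0, 5, 2, 7> with V2 = <0.0, 0.0, undef, 0.2>.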

I've been investigating general build vector support, and the problem I'm finding is that the BUILD_VECTOR node is usually lowered before the shuffle we're trying to match, which prevents easy matching (the constant data has often disappeared inside a LOAD, INSERT_VECTOR_ELT or something else). It should be possible to create a helper function to do this (possibly just expanding getShuffleScalarElt), but it's quite beyond the scope of what I had in mind for this patch.

RKSimon updated this revision to Diff 38739. Oct 29 2015, 7:58 AM
RKSimon edited edge metadata.
delena accepted this revision. Oct 29 2015, 9:09 AM
delena edited edge metadata.

I did not mean to complicate the solution. Please commit; I'll try to check with a debugger what happens with the const vector.

This revision is now accepted and ready to land. Oct 29 2015, 9:09 AM
This revision was automatically updated to reflect the committed changes.