This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Preserve shuffles when one of the vector operands is constant
ClosedPublic

Authored by zvi on Oct 12 2016, 10:31 AM.

Details

Summary

Do *not* perform combines such as:

vector_shuffle<4,1,2,3>(build_vector(Ud, C0, C1 C2), scalar_to_vector(X))
->
build_vector(X, C0, C1, C2)

Keeping the shuffle allows lowering the constant build_vector to a materialized
constant vector (such as a vector-load from the constant-pool or some other idiom).

Diff Detail

Repository
rL LLVM

Event Timeline

zvi updated this revision to Diff 74399.Oct 12 2016, 10:31 AM
zvi retitled this revision from to [X86] Lower BUILD_VECTOR as BLEND.
zvi updated this object.
zvi added reviewers: delena, igorb, RKSimon, spatel.
zvi set the repository for this revision to rL LLVM.
zvi added a reviewer: mkuper.
RKSimon edited edge metadata.Oct 12 2016, 2:55 PM

My concern is whether we should be adding more shuffle specific code to the build vector lowering or whether we should be focussing on having only basic support and merging as much as possible into target shuffle combines.

mkuper edited edge metadata.Oct 12 2016, 4:16 PM

I tend to agree with Simon.

Perhaps extend the build_vector -> vector_shuffle code to catch this case (right now it only performs the combine if the constants are zero), and then make sure x86 lowers the shuffle to a blend?

zvi added a comment.Oct 12 2016, 9:43 PM

I tend to agree with Simon.

Perhaps extend the build_vector -> vector_shuffle code to catch this case (right now it only performs the combine if the constants are zero), and then make sure x86 lowers the shuffle to a blend?

Will improve patch to follow your suggestion. Thanks Simon and Michael!

zvi updated this revision to Diff 74791.EditedOct 16 2016, 5:29 AM
zvi retitled this revision from [X86] Lower BUILD_VECTOR as BLEND to [X86] Lower BUILD_VECTOR as BLEND-matchable shuffle .
zvi updated this object.
zvi edited edge metadata.

This is an improvement over the previous revision, but still not there yet.

Following Simon and Michael's suggestion, this is what i want to achieve:

build_vector(X, C0, C1, C2)
->
vector_shuffle<4,1,2,3>(
  build_vector(Undef, C0, C1, C2),
  scalar_to_vector(X)
)

The new build_vector should be expanded to a constant pool vector-load, and the vector_shuffle should be lowered as a BLEND in shuffle lowering.
The problem i am facing with the above approach is that the constants in the above pattern (C0, C1, C2) are visited by the legalizer before the new build_vector node is visited, and as a result each constant is legalized as a separate constant-pool load which prevents the expansion of the build_vector to a constant-pool load of a vector-load.
In this revision, the constant-pool vector-load is still being created 'manually'.

Would appreciate suggestions on how to avoid the 'manual' creation of the constant-pool load, and, of course, any other comments.

Thanks, Zvi

You may want to make sure you hit this pre-legalization, then. :-)

Also, what I meant was extending the existing build_vector -> vector_shuffle combine, not creating an x86-specific combine to do this. (It would be nice to be able to hit the case where X is an extract by blending with the original vector.)

If you're worried about that having an unexpected effect on other targets (because they don't have the appropriate blends or can't match them) I think a TTI hook would be better than an x86-specific combine.

As @mkuper suggested, we should be able to do more of this in DAGCombiner::reduceBuildVecToShuffle (there is a TODO to add better constant support).

Another area would be to investigate improving isShuffleEquivalent to recognise if a one-use BUILD_VECTOR (not sure if it just needs to be constants) to be regenerated so that it could correctly match with the expected mask - at the moment this barely works for zero vectors only. General constant support in lowerVectorShuffleAsBlend wouldn't go amiss either.

With those in place, LowerBuildVectorAsBLEND might not be needed or could be refactored to generate any subtarget shuffle (not just SSE41 style blends).

zvi added a comment.Oct 17 2016, 12:08 PM

You may want to make sure you hit this pre-legalization, then. :-)

Also, what I meant was extending the existing build_vector -> vector_shuffle combine, not creating an x86-specific combine to do this. (It would be nice to be able to hit the case where X is an extract by blending with the original vector.)

If you're worried about that having an unexpected effect on other targets (because they don't have the appropriate blends or can't match them) I think a TTI hook would be better than an x86-specific combine.

Sorry i misunderstood you about doing this as a target-independent combine.
I actually did consider doing a X86-specific combine, but i got discouraged after seeing that DAGCombiner::visitVECTOR_SHUFFLE will perform the reverse-transformation of what i have in mind (see code starting with comment: "// Attempt to combine a shuffle of 2 inputs of 'scalar sources'"). I will need to work out these supposed conflicting transformations.

zvi added a comment.Oct 17 2016, 12:25 PM

As @mkuper suggested, we should be able to do more of this in DAGCombiner::reduceBuildVecToShuffle (there is a TODO to add better constant support).

Another area would be to investigate improving isShuffleEquivalent to recognise if a one-use BUILD_VECTOR (not sure if it just needs to be constants) to be regenerated so that it could correctly match with the expected mask - at the moment this barely works for zero vectors only. General constant support in lowerVectorShuffleAsBlend wouldn't go amiss either.

With those in place, LowerBuildVectorAsBLEND might not be needed or could be refactored to generate any subtarget shuffle (not just SSE41 style blends).

Thanks for the suggestion. Will definitely look into it.

andreadb edited edge metadata.Oct 17 2016, 12:32 PM
In D25524#571943, @zvi wrote:

You may want to make sure you hit this pre-legalization, then. :-)

Also, what I meant was extending the existing build_vector -> vector_shuffle combine, not creating an x86-specific combine to do this. (It would be nice to be able to hit the case where X is an extract by blending with the original vector.)

If you're worried about that having an unexpected effect on other targets (because they don't have the appropriate blends or can't match them) I think a TTI hook would be better than an x86-specific combine.

Sorry i misunderstood you about doing this as a target-independent combine.
I actually did consider doing a X86-specific combine, but i got discouraged after seeing that DAGCombiner::visitVECTOR_SHUFFLE will perform the reverse-transformation of what i have in mind (see code starting with comment: "// Attempt to combine a shuffle of 2 inputs of 'scalar sources'"). I will need to work out these supposed conflicting transformations.

That logic was introduced at revision 234004 (http://llvm.org/viewvc/llvm-project?view=revision&revision=234004).
I didn't look carefully at the code, however it looks like we need a special case for when the build_vector in input to the shuffle is a build_vector of all constants.

As a side note: I noticed that all the emails generated by phabricator are not goint to llvm-commits. I guess it was not added as "subscriber".

zvi updated this revision to Diff 75016.Oct 18 2016, 8:55 AM
zvi retitled this revision from [X86] Lower BUILD_VECTOR as BLEND-matchable shuffle to [DAGCombine] Preserve shuffles when one of the vector operands is constant.
zvi updated this object.
zvi edited edge metadata.

Updated the patch to a target-independent combine following @andreadb's suggestion.

Hi Zvi,

Nice patch.
I have a couple of suggestions to improve it (see comment below).

Thanks
-Andrea

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13937–13943

One thing I noticed is that we don't perform this transformation if any of the inputs to the shuffle has more than one use. We can check if either N0 and N1 has more than one use, and bail out earlier before we even get to the loop. For simplicity, you may want to move the computation to a helper function (it is up to you).

If we reach this code, then the shuffle has been already canonicalized with rules defined at lines 13754:13788. That means, if one of the operands is 'undef', then this shuffle is effectively a permute (or a splat), and the undef operand is always moved to the "rhs".

By construction, if we have an undef, that undef vector can only be N1. Also, (still by construction), no shuffle indices can reference the undef vector. As a consequence, if one of the operands is Undef, then you don't need to check if N0 is a "suitable" constant. Note that you cannot have both operands undef (that case would be canonicalized before we even reach this code).

If N1 is not undef, then you can check in advance if either N0 or N1 is "unsuitable". If at least one of the operands is not a constant vector (i.e. BothN0N1Const is false), and the other operand is "unsuitable", then we should not combine the shuffle.

The bottom line is: I think that you can probably hoist the 'hasOneUse' checks outside of the loop. You can also avoid to check if a operand is suitable and hoist that check outside of the loop. If we get into the loop, then we know that it is "safe" to attempt to fold the computation. That would let you remove the 'IsConstNotAllZeroes' smallvector.

zvi added a comment.Oct 18 2016, 2:54 PM

Andrea, that's a very good analysis. Will move the conditions out of the loop to early-exits. Thanks!

zvi updated this revision to Diff 75083.EditedOct 18 2016, 4:03 PM
  • Refactored the combine to a helper furnction
  • Hoisted checks on vector contents to outside of loops. Checks in loop are only on mask indices.

Hi Zvi,

see my inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13784

Anywhere else in this file we return an SDValue() when the combine is not successful. It would be nicer if you do the same in this function. So, I suggest that you replace all the occurrences of return {} with return SDValue().

13787–13793

I don't think that this check is correct. For example, one of the operands in input can be a SCALAR_TO_VECTOR. The extra checks for the opcode should not be there.

Basically, I was expecting something like this:

if (N0AnyConst && !N1AnyConst && !ISD::isBuildVectorAllZeros(N0.getNode()))
  return SDValue();
if (!N0AnyConst && N1AnyConst && !ISD::isBuildVectorAllZeros(N1.getNode()))
  return SDValue();
13812–13813

This can never happen because we return at line 13787 if one of the operands cannot be combined.
This if-stmt should be removed.

andreadb added inline comments.Oct 19 2016, 4:49 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13785–13793

If N1 is Undef, then we should never early exit. As I wrote in one of my previous comments, by construction N0 can't be Undef; only N1 can be Undef.

Therefore, you can skip the computation of N0AnyConst and N1AnyConst if N1 is known to be Undef. What I am suggesting is to change that code into something like this:

if (!N1.isUndef()) {
  bool N0AnyConst = isAnyConstantBuildVector(N0.getNode());
  bool N1AnyConst = isAnyConstantBuildVector(N1.getNode());
  if (N0AnyConst && !N1AnyConst && !ISD::isBuildVectorAllZeros(N0.getNode()))
    return SDValue();
  if (!N0AnyConst && N1AnyConst && !ISD::isBuildVectorAllZeros(N1.getNode()))
    return SDValue();
}

The extra check for N->isUndef() in isAnyConstantBuildVector would become redundant if your code checks in advance if N1 is Undef.

mkuper added inline comments.Oct 20 2016, 12:39 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13757

Any chance we can put this in a place that makes more sense?
See discussion on D25685.

zvi added inline comments.Oct 22 2016, 12:11 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13757

Sure

13784

I must have done this subconsciously. Would never have done it in a sound mind :)

13785–13793

I humbly accept your improvement.

The extra check for N->isUndef() in isAnyConstantBuildVector would become redundant if your code checks in advance if N1 is Undef.

Not sure if you are suggesting to change

bool N1AnyConst = isAnyConstantBuildVector(N1.getNode());

to

bool N1AnyConst = ISD::isBuildVectorOfConstantSDNodes(N) ||
        ISD::isBuildVectorOfConstantFPSDNodes(N);

or that is just a side note.

I prefer the former for brevity.

13812–13813

Will remove. Thanks.

zvi updated this revision to Diff 75527.EditedOct 22 2016, 12:40 AM
  • Using @andreadb's improved early-exit conditions
  • Moved isAnyConstantBuildVector to a more suitable location
  • Just to emphasize, combineShuffleOfScalars still needs to be moved to a better location.

LGTM

Thanks Zvi!

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13785–13793

That's okay. Using isAnyConstantBuildVector is fine.
My point was that the check for isUndef() in that function is not longer needed because you check in advance if N1 is undef.

zvi marked 6 inline comments as done.Oct 22 2016, 2:34 AM
zvi added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13785–13793

You're right and I will improve this.

andreadb accepted this revision.Oct 22 2016, 3:13 AM
andreadb edited edge metadata.
This revision is now accepted and ready to land.Oct 22 2016, 3:13 AM
zvi updated this revision to Diff 75533.Oct 22 2016, 11:38 AM
zvi edited edge metadata.

Removed the check for undef from isAnyConstantBuildVector()

RKSimon added inline comments.Oct 23 2016, 1:40 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7755

If possible, do this change (along with the addition of isAnyConstantBuildVector) as a separate NFC pre-commit?

13936–13937

Remove the outer braces to match style guide.

zvi added inline comments.Oct 23 2016, 2:11 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7755

sure

13936–13937

ok

zvi updated this revision to Diff 75543.Oct 23 2016, 2:14 AM

Fixes for @RKSimon's comments.

RKSimon accepted this revision.Oct 23 2016, 2:36 AM
RKSimon edited edge metadata.

LGTM

This revision was automatically updated to reflect the committed changes.