This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Generalize build_vector -> vector_shuffle combine for more than 2 inputs
ClosedPublic

Authored by mkuper on Sep 16 2016, 1:26 PM.

Details

Summary

This generalizes the build_vector -> vector_shuffle combine to support any number of inputs.
The idea is to create a binary tree of shuffles, where the first layer performs pairwise shuffles of the input vectors placing each input element into the correct lane, and the rest of the tree blends these shuffles together.

This doesn't try to be smart and create any sort of "optimal" shuffles - the assumption is that even a "poor" shuffle sequence is better than extracting and inserting the elements one by one.
Also, currently, this fires unconditionally. I'm not sure whether we really want that, or it should depend on the specifics of the input vector use - the edge case is a matrix transpose, where every output vector uses exactly one element from each input.

Diff Detail

Repository
rL LLVM

Event Timeline

mkuper updated this revision to Diff 71694.Sep 16 2016, 1:26 PM
mkuper retitled this revision from to [DAGCombine] Generalize build_vector -> vector_shuffle combine for more than 2 inputs.
mkuper updated this object.
RKSimon edited edge metadata.Sep 17 2016, 9:16 AM

Nice work! There's a lot going on here which makes we wonder whether this should be split into several patches:

1 - createBuildVecShuffle - generalise support for the shuffling of different sized inputs + outputs
2 - shuffling with constant values (not just zeros)
3 - support any number of inputs

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12900 ↗(On Diff #71694)

Support (VT.getSizeInBits() % InVT1.getSizeInBits()) == 0 ? Not just x2?

12906 ↗(On Diff #71694)

Support (InVT1.getSizeInBits() % VT.getSizeInBits()) == 0 ? Not just x2?

12981 ↗(On Diff #71694)

This comment needs a rewrite now that you're supporting any number of inputs.

13005 ↗(On Diff #71694)

Yes VecMap is probably overkill.

13006 ↗(On Diff #71694)

Initialise to VectorMask(NumElems, -1) ?

13020 ↗(On Diff #71694)

Could we not generalise this to any constant? I'm thinking about cases such as float4 where the .w component is set to 1.0.

13050 ↗(On Diff #71694)

Rewrite the comment to explain its one input vector + zero vector.

13073 ↗(On Diff #71694)

for (unsigned In = 0, Len = (VecIn.size() / 2); In < Len; ++In) {

13123 ↗(On Diff #71694)

for (unsigned In = 0, Len = CurSize / 2; In < Len; ++In) {

Nice work! There's a lot going on here which makes we wonder whether this should be split into several patches:

1 - createBuildVecShuffle - generalise support for the shuffling of different sized inputs + outputs
2 - shuffling with constant values (not just zeros)
3 - support any number of inputs

Thanks, Simon!

I'm not sure what you mean by "split". :-)
This patch only contains item 3.

Re 1 - splitting the shuffle logic into "createBuildVecShuffle" doesn't generalize anything, it's just a refactoring, that only makes sense once we have more than 2 inputs. The restrictions on input sizes existed before this patch.
Re 2 - we have only been blending with zeroes for years. I agree we should probably blend with any constant vectors, and will be happy to add yet another TODO. However, that seems orthogonal to this patch.

In any case, I don't see a reason to gate (3) on either (1) or (2).

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12900 ↗(On Diff #71694)

There's a TODO for this in line 12934.
This isn't really new code, we've been doing this only for x2 for ages - some of it is really old, some I've added sometime in 2014. (I refactored it somewhat in r281283 to make this patch easier, but that didn't change the logic.)

12906 ↗(On Diff #71694)

Same as above.

12981 ↗(On Diff #71694)

Argh, right, missed it. Thanks.

13005 ↗(On Diff #71694)

Ok, will rewrite with a search, unless someone objects. Too bad we don't have a SmallMap.

13006 ↗(On Diff #71694)

As usual. :-)

13020 ↗(On Diff #71694)

I think we could.

Note that this isn't really new - the blend with zero logic has been here for a while.
I assume the motivation was that blends with zero are especially cheap, because you don't even need to load the constant. But I agree that general constants should probably also be a win. That's for a separate patch, though. I'll add a TODO.

13050 ↗(On Diff #71694)

I don't think it's true - it can really be a single input vector.

13073 ↗(On Diff #71694)

Right, thanks.

13123 ↗(On Diff #71694)

Right, thanks.

mkuper updated this revision to Diff 71885.Sep 19 2016, 3:34 PM
mkuper edited edge metadata.

Applied Simon's comments.

Nice work! There's a lot going on here which makes we wonder whether this should be split into several patches:

1 - createBuildVecShuffle - generalise support for the shuffling of different sized inputs + outputs
2 - shuffling with constant values (not just zeros)
3 - support any number of inputs

Thanks, Simon!

I'm not sure what you mean by "split". :-)
This patch only contains item 3.

Re 1 - splitting the shuffle logic into "createBuildVecShuffle" doesn't generalize anything, it's just a refactoring, that only makes sense once we have more than 2 inputs. The restrictions on input sizes existed before this patch.
Re 2 - we have only been blending with zeroes for years. I agree we should probably blend with any constant vectors, and will be happy to add yet another TODO. However, that seems orthogonal to this patch.

In any case, I don't see a reason to gate (3) on either (1) or (2).

Sorry for not being clear, but yes my concern was whether the createBuildVecShuffle refactor should be done first as it would make the rest of the patch easier to follow. My proposal for the order of work was based on trying to reduce the size of each patch and show the effect of each feature addition. I can understand if you don't want to take that approach though.

Sorry for not being clear, but yes my concern was whether the createBuildVecShuffle refactor should be done first as it would make the rest of the patch easier to follow. My proposal for the order of work was based on trying to reduce the size of each patch and show the effect of each feature addition. I can understand if you don't want to take that approach though.

Oh, I see what you mean now (I hope. :-) )
I don't think an NFC refactoring would make sense here - implementing a separate shuffle and blend stages would look really odd for only two inputs. I tried to minimize the changes I'm making here by committing rL281283 first, but I guess that didn't help too much.

Ping.

I apologize for this being a relatively hefty patch, but, as above - I don't see a good way to split it up.

delena added inline comments.Sep 28 2016, 1:11 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12883 ↗(On Diff #71885)

Please add a comment here.

13030 ↗(On Diff #71885)

I suppose you can put "assert" here.

13079 ↗(On Diff #71885)

In what case it fails to create a shuffle?

13084 ↗(On Diff #71885)

If you fail in shuffle creation, to you want to try zero-vector?

Thanks, Elena!

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12883 ↗(On Diff #71885)

Sure.

13030 ↗(On Diff #71885)

I guess the comment is ambiguous - by "must" I didn't mean that we never get here if that doesn't happen, but that it's required for the combine. I'll change the comment.

13079 ↗(On Diff #71885)

If the number of elements in the vectors is too different, see line 12938.

13084 ↗(On Diff #71885)

I don't think so.
The only reason it'll fail is if at least one of the input vectors has the wrong size. If this happens, it doesn't really matter what the other input is - there's no advantage to blending with zero, as compared to shuffling with a vector which matches the output size.

delena accepted this revision.Sep 28 2016, 2:48 AM
delena edited edge metadata.

Some minor comments inside.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13030 ↗(On Diff #71885)

You can't create BUILD_VECTOR node of v8i32 and put i8 operands for it.

13038 ↗(On Diff #71885)

line alignment

13045 ↗(On Diff #71885)

at least two?

13073 ↗(On Diff #71885)

line alignment

This revision is now accepted and ready to land.Sep 28 2016, 2:48 AM
mkuper added inline comments.Sep 28 2016, 2:53 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13030 ↗(On Diff #71885)

Yes, that's why I check that, and bail out if the types don't match. But this is the first place that actually checks this (I think?), so this has to be a failure return, not an assert.

13038 ↗(On Diff #71885)

Will fix, thanks.

13045 ↗(On Diff #71885)

VecIn[0] is reserved for the zero vector. So it's really at least one.
(We actually care about the case where it's exactly 1 - regardless of whether we're blending with 0 or not.)

13073 ↗(On Diff #71885)

Will fix, thanks.
I was sure I ran clang-format over this...

zvi added a subscriber: zvi.Sep 28 2016, 6:19 AM
zvi added a comment.Sep 28 2016, 8:21 AM

Great job!

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12884 ↗(On Diff #71885)

consider changing type of VectorMask to ArrayRef<int>

dorit added a subscriber: dorit.Oct 5 2016, 11:40 PM
dorit added a comment.Oct 5 2016, 11:55 PM

+1
Just FYI - this patch gives 15% improvement in one of the Geekbench workloads on Haswell (compared to enabling interleaving for X86 without this patch) :-)

andreadb accepted this revision.Oct 6 2016, 4:25 AM
andreadb edited edge metadata.

LGTM.
Nice patch!

This revision was automatically updated to reflect the committed changes.