This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] narrow shuffle of concatenated vectors
ClosedPublic

Authored by spatel on Apr 10 2019, 3:35 PM.

Details

Summary
// shuffle (concat X, undef), (concat Y, undef), Mask -->
// concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)

Someone with more ARM NEON experience can confirm, but I think the changes with 'vtrn' are improvements.

The x86 changes look neutral or better. There's one test with an extra instruction, but that could be reversed for a subtarget with the right attributes.

But by default, I think we want to avoid the 256-bit op when possible (in my motivating benchmark, a handful of ymm ops sprinkled into a sequence of xmm ops are triggering frequency throttling on Haswell resulting in signficantly worse perf).

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Apr 10 2019, 3:35 PM

From looking at the A57 scheduler model in LLVM this seems like an improvement, with for example VUZPq taking twice as long as VUZPd. But in general I could imagine shuffles on shorter vectors to be cheaper.
It will probably also be easier to spot that one of concatenated subvectors becomes fully undef after doing the shuffling on the subvector first.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17640 ↗(On Diff #194598)

Would it be possible to move/copy this comment to the call of foldShuffleOfConcatUndefs ?

spatel marked an inline comment as done.Apr 11 2019, 10:46 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17640 ↗(On Diff #194598)

I can do that if you think that makes the code clearer, but in general I try to make the function's doxygen comment a bit more vague in case the implementation changes/grows, but not repeat that function-level comment at the callers.

For example, I think this function could be extended to handle a pattern where the 2nd operand is not an undef value as long as the shuffle mask is not choosing elements from that 2nd operand.

Let me know if I am misunderstanding the suggestion.

sdesmalen accepted this revision.Apr 12 2019, 6:27 AM

LGTM. The compiler will have more opportunity to use cheaper shuffles if part of the (sub)vector is undef, so this looks like a general improvement.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17640 ↗(On Diff #194598)

When reading through DAGCombiner code I usually appreciate the "X -> Y" comments that provide an overview of transforms so you can easily see what case is optimised without having to look at further function definitions/doxygen. But that only makes sense if the function handles a single transform. So I agree that if the scope of this function widens, it makes little sense to copy this comment to the call-site.

This revision is now accepted and ready to land.Apr 12 2019, 6:27 AM

LGTM. The compiler will have more opportunity to use cheaper shuffles if part of the (sub)vector is undef, so this looks like a general improvement.

Thanks!

Re: code comment organization - I see your point, and I'll take that as something to be more aware of in general...I've been hacking away in here long enough that it's hard to remember the shock of seeing this mess for the 1st time. :)

This revision was automatically updated to reflect the committed changes.