This is an archive of the discontinued LLVM Phabricator instance.

[InstSimplify] fold identity shuffles (recursing if needed)
ClosedPublic

Authored by spatel on Apr 11 2017, 2:37 PM.

Details

Summary

Please have a hard look to make sure I haven't botched the shuffle mask logic...
This patch simplifies the examples from D31509 and D31927 (PR30630) and catches the basic identity shuffle tests that Zvi recently added.

I'm not sure if we have something like this in DAGCombiner, but we should?

It's worth noting that "MaxRecurse / RecursionLimit" is only 3 on entry at the moment. We might want to bump that up if there are longer shuffle chains like this in the wild.

Diff Detail

Event Timeline

spatel created this revision.Apr 11 2017, 2:37 PM
filcab added a subscriber: filcab.Apr 12 2017, 10:52 AM
filcab added inline comments.
lib/Analysis/InstructionSimplify.cpp
4087

Maybe mention that if we can't find a non-shuffle by the time MaxRecurse is 0, then we bail out completely, we don't try to track things to that last shuffle.

4088

I'd like this name to be a bit more descriptive. Even if it's something like tryToBypassNoOpShuffles or something.

4098

But we might end up with shuffles we'd like to remove if we always skip shuffles with undef, no?
Do you want to add a shuffle similar to your identity_mask tests which shows that we don't remove if there's an undef (assuming it's as easy as making one of the last shuffle mask's elements an undef. No need to go out of your way for a negative test)?

4193

Do we care about bitcasts? Should we add a TODO comment to add support for those later?

spatel added inline comments.Apr 12 2017, 12:34 PM
lib/Analysis/InstructionSimplify.cpp
4098

Yes, this is conservative as a first step, so it will miss some patterns. I wanted to make sure that I have my undef story straight before enhancing this (eg, D31980).

Also, what is the expected behavior in this case?

define <4 x float> @elts_test_vpermilvar_ps(<4 x float> %a0, i32 %a1) {
; CHECK-LABEL: @elts_test_vpermilvar_ps(
; CHECK-NEXT:    ret <4 x float> %a0
;
%1 = insertelement <4 x i32> <i32 0, i32 1, i32 2, i32 3>, i32 %a1, i32 3
%2 = tail call <4 x float> @llvm.x86.avx.vpermilvar.ps(<4 x float> %a0, <4 x i32> %1)
%3 = shufflevector <4 x float> %2, <4 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 undef>
ret <4 x float> %3
}

If we simplify the shuffle to %2, we can no longer eliminate the x86 vperm?

spatel updated this revision to Diff 95020.Apr 12 2017, 1:02 PM
spatel marked 3 inline comments as done.

Patch updated:

  1. Make name more descriptive: "foldIdentityShuffles".
  2. Add comment about the per-element limit for recursion.
  3. Add TODO about bitcasts.
  4. Add test case with an undef mask element - not sure what's best in this case. See earlier example.
zvi edited edge metadata.Apr 19 2017, 3:55 AM

This patch LGTM for starters. The conservative handling of undef mask elements can be improved over time.

andreadb accepted this revision.Apr 19 2017, 6:30 AM

I only have one request. Otherwise it LGTM too.

lib/Analysis/InstructionSimplify.cpp
4188–4191

You can skip this loop if you know in advance that at least one index in Mask is undef.
Otherwise, you would end up with needless recursive calls to foldIdentityShuffles().

This revision is now accepted and ready to land.Apr 19 2017, 6:30 AM
filcab accepted this revision.Apr 19 2017, 6:52 AM

Changes LGTM. Thanks!
(Sorry for not replying earlier)

lib/Analysis/InstructionSimplify.cpp
4098

Being in IR, I'd guess one way is to add target hooks like we have on SDAG to ask the target "is this a shuffle-like operation? If so, what's the mask?".

It will not be fun, though :(

Dealing with undef has a few edge cases, so being conservative here and getting that sorted out in a different patch sounds good.

spatel added inline comments.Apr 19 2017, 9:22 AM
lib/Analysis/InstructionSimplify.cpp
4188–4191

Sure - I'll add this check before committing.

Thanks, all!

This revision was automatically updated to reflect the committed changes.