This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined.
ClosedPublic

Authored by RKSimon on Feb 1 2017, 9:40 AM.

Details

Summary

Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.

We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree.

This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Feb 1 2017, 9:40 AM
andreadb edited edge metadata.Feb 1 2017, 12:51 PM

Hi Simon,

Do you have other code examples (other than function @_clearupper2xi64b) which may benefit from your change?

I am asking this because function @_clearupper2xi64b would be easily optimized before we even reach the code generator.
InstCombine knows how to combine a sequence of constant insertelement into a single shuffle (See function 'foldConstantInsEltIntoShuffle' in InstCombineVectorOps.cpp). If you pass function @_clearupper2xi64b to opt -instcombine you would get this:

define <2 x i64> @_clearupper2xi64b(<2 x i64>) #0 {
  %x32 = bitcast <2 x i64> %0 to <4 x i32>
  %r1 = shufflevector <4 x i32> %x32, <4 x i32> <i32 undef, i32 0, i32 undef, i32 0>, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
  %r = bitcast <4 x i32> %r1 to <2 x i64>
  ret <2 x i64> %r
}

Later on, that shuffle would be lowered into a ANDPS (on pre-SSE4.1 targets):
andps .LCPI0_0(%rip), %xmm0

RKSimon updated this revision to Diff 87117.Feb 4 2017, 3:02 PM

Updated with additional tests - both are stripped down examples of a partially combined build_vector and shuffles all from a single vector source.

andreadb accepted this revision.Feb 6 2017, 4:09 AM

Hi Simon,
thanks for the extra tests.

The patch looks good to me.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7129–7141 ↗(On Diff #87117)

Correct me if I am wrong.
We can only return false from line 7140 if N has no users.

Assuming that the use_empty() is a valid scenario, you can slightly simplify the code by adding an early check to N->use_empty() at the beginning of the function.

This revision is now accepted and ready to land.Feb 6 2017, 4:09 AM

Thanks Andrea

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7129–7141 ↗(On Diff #87117)

We can also return false if N has any users that are not contained in Nodes.

I was going for consistency with SDNode::isOnlyUserOf - AFAICT checking for use_empty() is not going to be much quicker than calling use_begin()/use_end(), matching them and not entering the for loop.

Shout if you disagree.

This revision was automatically updated to reflect the committed changes.