This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR
ClosedPublic

Authored by RKSimon on Mar 22 2015, 7:10 AM.

Details

Summary

This patch attempts to optimize the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed.

At the moment the inputs are only combined if they are only being used once - I'm interested in extending this so that constant inputs are always combined. It would create more constant data but would remove more shuffles (which may be introducing their own constant data for masks anyhow). Comments please.

Also removed a x86 insertps test that was testing for the old shuffle lowering system.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 22419.Mar 22 2015, 7:10 AM
RKSimon retitled this revision from to [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR.
RKSimon updated this object.
RKSimon edited the test plan for this revision. (Show Details)
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: Unknown Object (MLST).

At the moment the inputs are only combined if they are only being used once - I'm interested in extending this so that constant inputs are always combined. It would create more constant data but would remove more shuffles (which may be introducing their own constant data for masks anyhow). Comments please.

We need to be careful here. Constant loads can be much more expensive than shuffles on some targets.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12019 ↗(On Diff #22419)

Add space after if.

12029 ↗(On Diff #22419)

When would this be false?

12042 ↗(On Diff #22419)

It might be better to use ZExt, depending on the target/type. Might be better to use something like this:

Op = isZExtFree(Op.getValueType(), SVT) ? DAG.getZExtOrTrunc(Op, SDLoc(N), SVT) : DAG.getSExtOrTrunc(Op, SDLoc(N), SVT);

Thanks for the review Hal, I'll update the patch later.

At the moment the inputs are only combined if they are only being used once - I'm interested in extending this so that constant inputs are always combined. It would create more constant data but would remove more shuffles (which may be introducing their own constant data for masks anyhow). Comments please.

We need to be careful here. Constant loads can be much more expensive than shuffles on some targets.

Yes that was my concern as well - there isn't an easy way to determine how expensive the shuffle is that we're trying to remove. I'll leave it as it is for now and only combine when the inputs.are only used once.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12029 ↗(On Diff #22419)

We have an early out from the loop if an active operand turns out to be something other than BUILD_VECTOR or SCALAR_TO_VECTOR. I'll move this logic to the opening if() entry into the block.

12042 ↗(On Diff #22419)

Easy enough to add.

RKSimon updated this revision to Diff 22555.Mar 24 2015, 4:45 AM

Updated patch based on Hal's review.

Updated patch

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12032 ↗(On Diff #22555)

I ended up keeping this in as it makes for much easier understanding than all the conditions I would have to add to the if() test. I've added a comment explaining the 'bail out' to make it clearer.

resistor accepted this revision.Apr 1 2015, 9:48 AM
resistor added a reviewer: resistor.
resistor added a subscriber: resistor.

Looks good to me.

This revision is now accepted and ready to land.Apr 1 2015, 9:48 AM
andreadb added inline comments.Apr 1 2015, 10:03 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12033–12035 ↗(On Diff #22555)

On x86, a build_vector with all operands undef excluding the first operand is legalized to a scalar_to_vector. I am not sure if other targets would do the same; in case, I would suggest to remove this code and just emit a build_vector.
Do you have an example that relies on this check? A shuffles with only one non-undef element should have been canonicalized/simplified to a shuffle where one of the operands is undef.

test/CodeGen/X86/mmx-bitcast.ll
78 ↗(On Diff #22555)

I don't think this is related to your patch. However, this looks like a bug to me. Shouldn't this be a 'movq'?

test/CodeGen/X86/sse41.ll
1028–1029 ↗(On Diff #22555)

My question is: do we have an equivalent test for insertps somewhere else? If so, then I think it is OK to remove it. Otherwise I would keep it.

test/CodeGen/X86/vector-shuffle-128-v16.ll
646 ↗(On Diff #22555)

Shouldn't this be 'vmovd'?

Thanks Andrea. Are you happy with me submitting with these changes (after tests) or would prefer another review?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12033–12035 ↗(On Diff #22555)

No specific need - its easy enough to always create a BUILD_VECTOR and rely on a target's lowering logic to do the right thing.

test/CodeGen/X86/mmx-bitcast.ll
78 ↗(On Diff #22555)

movd deals with 32 and 64-bit gprs <-> vector moves. movq does vector load/stores and vector <-> vector moves. Its a funny old world.

test/CodeGen/X86/sse41.ll
1028–1029 ↗(On Diff #22555)

Its a reduced test that with that patch folds to a store of a constant. I'll change it to use non-constant vector data and see if that works.

test/CodeGen/X86/vector-shuffle-128-v16.ll
646 ↗(On Diff #22555)

Fixed.

RKSimon added inline comments.Apr 1 2015, 11:34 AM
test/CodeGen/X86/mmx-bitcast.ll
78 ↗(On Diff #22555)

Actually scrub that - this does appear to be a bug. Its is a (v)movq instruction - but encoded similar to the (v)movd and completely separate to the (v)movq vector version.

andreadb accepted this revision.Apr 1 2015, 11:50 AM
andreadb edited edge metadata.

Thanks Andrea. Are you happy with me submitting with these changes (after tests) or would prefer another review?

The patch looks good to me.
Thanks!

test/CodeGen/X86/mmx-bitcast.ll
78 ↗(On Diff #22555)

Anyway, this problem is not related to your patch. In case, you can raise a bug for it.

spatel added inline comments.Apr 1 2015, 12:31 PM
test/CodeGen/X86/mmx-bitcast.ll
78 ↗(On Diff #22555)

I had a similar question in:
http://reviews.llvm.org/D8691

Looks like some weirdness due to opcode prefixes; one of the movq versions won't work on a 64-bit system.

This revision was automatically updated to reflect the committed changes.