This is an archive of the discontinued LLVM Phabricator instance.

Don't combine a shuffle of two BUILD_VECTORs in general.
ClosedPublic

Authored by efriedma on Dec 14 2016, 5:14 PM.

Details

Summary

This combine causes problems in many situations. (Not sure exactly when it's safe; suggestions welcome.)

Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially fixes https://llvm.org/bugs/show_bug.cgi?id=31301.

It's possible the change to combineShuffleOfScalars is a little too restrictive... but I'm not sure what a good heuristic would be.

Diff Detail

Repository
rL LLVM

Event Timeline

efriedma updated this revision to Diff 81499.Dec 14 2016, 5:14 PM
efriedma retitled this revision from to Restrict some DAGCombines for SHUFFLE_VECTOR nodes..
efriedma updated this object.
efriedma added reviewers: mkuper, craig.topper, RKSimon.
efriedma set the repository for this revision to rL LLVM.
efriedma added subscribers: llvm-commits, rengolin.
mkuper added inline comments.Dec 14 2016, 5:32 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14099

Just to make sure I understand - the reason you're removing the hasOneUse constraint is that for the case where the BUILD_VECTOR ends up a constant pool load, it's better to just load the combined vector instead of shuffling even if other users exist?

14361

Is there a reason this has to land in the same patch as the other change?
Also, could you explain why you consider splats free? On x86, at least, I don't see why that would be the case.

test/CodeGen/ARM/vzip.ll
270

Regenerate this with the update script?

test/CodeGen/X86/mmx-bitcast.ll
83

This looks like a common enough pattern that I'm not excited about regressing it at the expense of the "zip" example.

test/CodeGen/X86/vector-shuffle-128-v16.ll
670

Same as above.

efriedma added inline comments.Dec 14 2016, 5:53 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14099

We might end up with something slightly more expensive in some cases, but it seems like a worthwhile tradeoff in most situations. The hasOneUse() check won't reliably preserve constants anyway; other parts of the code like to fold away bitcasts and insert undefs into constants.

14361

I can land it separately, sure.

Not all broadcasts are free, but some are; for example, on x86 with AVX2, a splat can be folded into a load to form vpbroadcastb. I'll add an x86 testcase which covers this.

test/CodeGen/ARM/vzip.ll
270

There is no update script for ARM, as far as I know; it's possible I should spend some time writing one.

test/CodeGen/X86/mmx-bitcast.ll
83

Mmm... maybe. I would appreciate suggestions.

efriedma updated this revision to Diff 81517.Dec 14 2016, 6:13 PM
efriedma retitled this revision from Restrict some DAGCombines for SHUFFLE_VECTOR nodes. to Don't combine a shuffle of two BUILD_VECTORs in general..
efriedma updated this object.

Split into multiple patches.

RKSimon edited edge metadata.Dec 15 2016, 3:09 AM

Not sure about this, would we be better off attempting a SHUFFLE(BUILD_VECTOR(), BUILD_VECTOR()) -> SHUFFLE(BUILD_VECTOR(), UNDEF) combine replacing repeated insertions with shuffles?

mkuper added inline comments.Dec 15 2016, 9:12 AM
test/CodeGen/ARM/vzip.ll
270

Right, forgot, I'm too used to x86, sorry. :-)

test/CodeGen/X86/mmx-bitcast.ll
83

I don't really have good ones.

My knee-jerk reaction is "do the (simplifying!) combine, and improve build_vector lowering". I guess, in general, that would require looking at the BUILD_VECTOR and trying to find the best way to split it into the best sequence of BUILD_VECTORs and target shuffles. After all, there's nothing guaranteeing the pre-combine BUILD_VECTORs were optimal in any way. What if you're shuffling two BUILD_VECTORs that are splats of the same value?

But this all sounds a bit scary.

efriedma updated this revision to Diff 81622.Dec 15 2016, 11:16 AM
efriedma edited edge metadata.

More "conservative" patch. This just tries to specifically detect splat-ish cases.

Not sure this actually covers every interesting case, but I don't have any other ideas.

Not sure about this, would we be better off attempting a SHUFFLE(BUILD_VECTOR(), BUILD_VECTOR()) -> SHUFFLE(BUILD_VECTOR(), UNDEF) combine replacing repeated insertions with shuffles?

This doesn't reliably produce better code; for example, see https://llvm.org/bugs/show_bug.cgi?id=31301. The particular shuffle written by the user is cheaper than anything we're realistically going to produce from SHUFFLE(BUILD_VECTOR(), UNDEF) unless we add a special case specifically for ZIP(splat, splat), and I don't have the time to write code to special-case every possible permutation of SHUFFLE(splat, splat)/SHUFFLE(SHUFFLE(splat, splat), SHUFFLE(splat, splat))/etc. for every target supported by LLVM.

test/CodeGen/X86/mmx-bitcast.ll
83

(Phabricator won't let me delete this...)

mkuper accepted this revision.Dec 15 2016, 11:54 AM
mkuper edited edge metadata.

This LGTM with a comment nit.
I'm sure there still are cases it'll pessimize, but I think it's an improvement over what we had. Thanks, Eli!

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14122

"a non-constant BUILD_VECTOR operand".

This revision is now accepted and ready to land.Dec 15 2016, 11:54 AM
This revision was automatically updated to reflect the committed changes.