Page MenuHomePhabricator

[DAGCombiner] reduce buildvec of zexted extracted element to shuffle
ClosedPublic

Authored by spatel on Thu, Jan 3, 11:18 AM.

Details

Summary

The motivating case for this is shown in the first regression test. We are transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'.

That's a special-case for a more general pattern as shown here. In all tests, we're avoiding the vector-scalar-vector moves in favor of vector ops.

I suspect that we aren't producing optimal shuffle code in some cases though, so we may want to limit this patch and/or account for those patterns first. But I figured it was worth posting the larger test diffs, so we can see what's happening and make sure the logic is correct.

If we want to limit this patch but still get that 1st motivating case, I see 2 possibilities:

  1. Don't handle patterns where we require translating the source element to a different location in the result.
  2. Don't handle patterns with zero elements in the build vector (only deal with undefs there).

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Thu, Jan 3, 11:18 AM
spatel updated this revision to Diff 181103.Thu, Jan 10, 10:56 AM

Patch updated:
I reduced the reach of this patch to only handle a build vector of undefs + 1-non-undef. If x86 isn't lowering the other cases optimally, then other targets may not be either even if there's no regression test evidence of that.

So now, these should be mostly clear wins. Not sure about some of the SSE2-only shuffling.

RKSimon added inline comments.Sun, Jan 13, 4:24 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16136 ↗(On Diff #181103)

Should ANY_EXTEND be handled as well? SimplifyDemandedBits can reduce ZERO_EXTEND -> ANY_EXTEND more aggressively these days.

test/CodeGen/X86/buildvec-extract.ll
422 ↗(On Diff #181103)

Please can you investigate what's happening here? The xmm0[6,7] at the end seems really weird....

I also wonder if we have much crossover with PR39975 (truncate(extract()) -> extract(bitcast())) to handle TRUNCATE as well as *_EXTEND

spatel marked 2 inline comments as done.Mon, Jan 14, 8:31 AM
spatel added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16136 ↗(On Diff #181103)

That seems ok, and I can update with a draft of that change. But any idea how to make a test that would provide coverage for that pattern? I'm not showing any existing test diffs. Alternatively, I can add a TODO enhancement comment while trying to find a way to make that happen.

test/CodeGen/X86/buildvec-extract.ll
422 ↗(On Diff #181103)

Hmm - I didn't notice that before.

There's some shuffle lowering madness that I haven't stepped through yet that creates several nodes before this becomes the single pshufb.

I filed PR40306 to track this:
https://bugs.llvm.org/show_bug.cgi?id=40306

RKSimon added inline comments.Mon, Jan 14, 8:40 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16136 ↗(On Diff #181103)

TODO is fine

spatel updated this revision to Diff 181582.Mon, Jan 14, 10:17 AM
spatel marked an inline comment as done.

Patch updated:
Added TODO comment for handling ISD::ANY_EXTEND.

spatel updated this revision to Diff 181649.Mon, Jan 14, 1:58 PM

Patch updated:
No code changes, but rebased after rL351103 so we get more vector zext'ing codegen.

RKSimon accepted this revision.Tue, Jan 15, 1:17 AM

LGTM (with one pcg bug request).

test/CodeGen/X86/buildvec-extract.ll
412 ↗(On Diff #181649)

Please can you raise a bug on this - we should do better for this shuffle.

458 ↗(On Diff #181649)

Add this shuffle to the same bug - I think its the same culprit.

This revision is now accepted and ready to land.Tue, Jan 15, 1:17 AM
spatel marked 2 inline comments as done.Tue, Jan 15, 7:47 AM
spatel added inline comments.
test/CodeGen/X86/buildvec-extract.ll
412 ↗(On Diff #181649)
This revision was automatically updated to reflect the committed changes.
spatel marked an inline comment as done.