This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] reduce buildvec of zexted extracted element to shuffle
ClosedPublic

Authored by spatel on Jan 3 2019, 11:18 AM.

Details

Summary

The motivating case for this is shown in the first regression test. We currently transfer the value from vector to scalar and back rather than just zero-extending in the vector domain with 'vpmovzxdq'.

That's a special case of a more general pattern, as shown here. In all tests, we're avoiding the vector-scalar-vector moves in favor of vector ops.
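
A minimal IR sketch in the spirit of that first test (the function name and exact types here are illustrative, not copied from the patch):

; A scalar is extracted, zero-extended, and placed back into lane 0 of a
; wider-element vector; ideally this lowers to a single vpmovzxdq instead
; of a vector->GPR->vector round trip.
define <2 x i64> @zext_extract_to_buildvec(<4 x i32> %x) {
  %e = extractelement <4 x i32> %x, i32 0
  %z = zext i32 %e to i64
  %r = insertelement <2 x i64> undef, i64 %z, i32 0
  ret <2 x i64> %r
}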

I suspect that we aren't producing optimal shuffle code in some cases though, so we may want to limit this patch and/or account for those patterns first. But I figured it was worth posting the larger test diffs, so we can see what's happening and make sure the logic is correct.

If we want to limit this patch but still get that first motivating case, I see 2 possibilities (both sketched below):

  1. Don't handle patterns where we require translating the source element to a different location in the result.
  2. Don't handle patterns with zero elements in the build vector (only deal with undefs there).
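
Hypothetical IR for each of those two pattern classes (again, names and constants are chosen only for illustration):

; 1. The extracted element must move to a different lane of the result:
;    source element 2 becomes result element 0.
define <2 x i64> @zext_extract2_to_lane0(<4 x i32> %x) {
  %e = extractelement <4 x i32> %x, i32 2
  %z = zext i32 %e to i64
  %r = insertelement <2 x i64> undef, i64 %z, i32 0
  ret <2 x i64> %r
}

; 2. The other build vector elements are zero rather than undef.
define <2 x i64> @zext_extract0_into_zero_vec(<4 x i32> %x) {
  %e = extractelement <4 x i32> %x, i32 0
  %z = zext i32 %e to i64
  %r = insertelement <2 x i64> zeroinitializer, i64 %z, i32 0
  ret <2 x i64> %r
}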

Diff Detail

Event Timeline

spatel created this revision.Jan 3 2019, 11:18 AM
spatel updated this revision to Diff 181103.Jan 10 2019, 10:56 AM

Patch updated:
I reduced the reach of this patch to only handle a build vector of undefs plus one non-undef element. If x86 isn't lowering the other cases optimally, then other targets may not be either, even if there's no regression test evidence of that.

So now, these should be mostly clear wins. Not sure about some of the SSE2-only shuffling.

RKSimon added inline comments.Jan 13 2019, 4:24 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16136

Should ANY_EXTEND be handled as well? SimplifyDemandedBits can reduce ZERO_EXTEND -> ANY_EXTEND more aggressively these days.
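
ANY_EXTEND has no direct IR spelling, but a hypothetical input like the following is the kind of coverage being discussed: only the low bits of the extended lane are demanded, so SimplifyDemandedBits could relax the ZERO_EXTEND. It is unverified whether this actually reaches the combine as ANY_EXTEND; it is only a sketch.

; The mask keeps only the low 32 bits of lane 0, so the extended high
; bits of the zext are never demanded.
define <2 x i64> @maybe_any_extend(<4 x i32> %x) {
  %e = extractelement <4 x i32> %x, i32 0
  %z = zext i32 %e to i64
  %i = insertelement <2 x i64> undef, i64 %z, i32 0
  %r = and <2 x i64> %i, <i64 4294967295, i64 4294967295>
  ret <2 x i64> %r
}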

test/CodeGen/X86/buildvec-extract.ll
422

Please can you investigate what's happening here? The xmm0[6,7] at the end seems really weird....

I also wonder if we have much crossover with PR39975 (truncate(extract()) -> extract(bitcast())) to handle TRUNCATE as well as *_EXTEND
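
For reference, a hypothetical little-endian example of that PR39975 shape (indices chosen only for illustration):

; trunc(extractelement) of an i32 lane down to i16.
define i16 @trunc_extract(<4 x i32> %x) {
  %e = extractelement <4 x i32> %x, i32 1
  %t = trunc i32 %e to i16
  ret i16 %t
}
; On a little-endian target this is equivalent to extracting straight
; from a bitcast of the source:
;   %b = bitcast <4 x i32> %x to <8 x i16>
;   %t = extractelement <8 x i16> %b, i32 2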

spatel marked 2 inline comments as done.Jan 14 2019, 8:31 AM
spatel added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16136

That seems ok, and I can update with a draft of that change. But any idea how to make a test that would provide coverage for that pattern? I'm not seeing any existing test diffs. Alternatively, I can add a TODO comment for the enhancement while trying to find a way to make that happen.

test/CodeGen/X86/buildvec-extract.ll
422

Hmm - I didn't notice that before.

There's some shuffle lowering madness that I haven't stepped through yet that creates several nodes before this becomes the single pshufb.

I filed PR40306 to track this:
https://bugs.llvm.org/show_bug.cgi?id=40306

RKSimon added inline comments.Jan 14 2019, 8:40 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16136

TODO is fine

spatel updated this revision to Diff 181582.Jan 14 2019, 10:17 AM
spatel marked an inline comment as done.

Patch updated:
Added TODO comment for handling ISD::ANY_EXTEND.

spatel updated this revision to Diff 181649.Jan 14 2019, 1:58 PM

Patch updated:
No code changes, but rebased after rL351103 so we get more vector zext'ing codegen.

RKSimon accepted this revision.Jan 15 2019, 1:17 AM

LGTM (with one poor-codegen bug request).

test/CodeGen/X86/buildvec-extract.ll
412

Please can you raise a bug on this - we should do better for this shuffle.

456

Add this shuffle to the same bug - I think it's the same culprit.

This revision is now accepted and ready to land.Jan 15 2019, 1:17 AM
spatel marked 2 inline comments as done.Jan 15 2019, 7:47 AM
spatel added inline comments.
test/CodeGen/X86/buildvec-extract.ll
412
This revision was automatically updated to reflect the committed changes.
spatel marked an inline comment as done.

Hello. I am investigating a crash and this assertion failure on AArch64:

lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1548: llvm::SDValue llvm::SelectionDAG::getVectorShuffle(llvm::EVT, const llvm::SDLoc&, llvm::SDValue, llvm::SDValue, llvm::ArrayRef<int>): Assertion `VT.getVectorNumElements() == Mask.size() && "Must have the same number of vector elements as mask elements!"' failed.

Bisecting blames this commit. I've only just started debugging this (it is hitting this new function), but I wanted to share it in case you have any ideas or thoughts. This is my reproducer:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

@a = common dso_local local_unnamed_addr global <4 x i16> zeroinitializer, align 8
@b = common dso_local local_unnamed_addr global i32 0, align 4

; Function Attrs: nounwind
define dso_local i32 @vqmovns_s32() local_unnamed_addr {
entry:
  %0 = load <4 x i16>, <4 x i16>* @a, align 8
  %1 = extractelement <4 x i16> %0, i32 2
  %vgetq_lane = zext i16 %1 to i32
  %2 = insertelement <4 x i32> undef, i32 %vgetq_lane, i64 0
  %vqmovns_s32 = tail call <4 x i16> @llvm.aarch64.neon.sqxtn.v4i16(<4 x i32> %2)
  %3 = extractelement <4 x i16> %vqmovns_s32, i64 0
  %conv = sext i16 %3 to i32
  store i32 %conv, i32* @b, align 4
  ret i32 undef
}

; Function Attrs: nounwind readnone
declare <4 x i16> @llvm.aarch64.neon.sqxtn.v4i16(<4 x i32>)

I see the problem now: I forgot to verify that the build vector and the source vector of the extract element are the same size. Should have a fix committed soon.

Please see:
rL351753
rL351754

Filed a blocking bug for the 8.0 release:
https://bugs.llvm.org/show_bug.cgi?id=40394

Many thanks for your speedy responses and for fixing this!