Page MenuHomePhabricator

[DAGCombiner] Call SimplifyDemandedBits to simplify EXTRACT_VECTOR_ELT
Needs ReviewPublic

Authored by foad on Sep 30 2020, 7:12 AM.

Diff Detail

Event Timeline

foad created this revision.Sep 30 2020, 7:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 30 2020, 7:12 AM
foad requested review of this revision.Sep 30 2020, 7:12 AM
foad added inline comments.Sep 30 2020, 7:19 AM
llvm/test/CodeGen/Mips/cconv/vector.ll
697–758

There are some regressions in this file but also some improvements. I haven't worked out what's going on yet.

llvm/test/CodeGen/Thumb2/mve-vecreduce-mla.ll
249–259

Regression here and in other cases that are now using muls instead of umull/umlal.

llvm/test/CodeGen/X86/vector-fshl-128.ll
178–179

Regression. Quite a few tests are now using pxor+punpckhdq instead of pshufd. I wonder if some kind of combine could spot this case and turn it back into pshufd.

foad added inline comments.Sep 30 2020, 7:34 AM
llvm/test/CodeGen/ARM/dagcombine-anyexttozeroext.ll
46–47

Regression.

llvm/test/CodeGen/ARM/vdup.ll
59–69

Regression in lots of cases in this file.

foad updated this revision to Diff 295595.Oct 1 2020, 9:15 AM

Rebase on D88570.

foad updated this revision to Diff 296629.Oct 7 2020, 2:57 AM

Rebase.

RKSimon added inline comments.Oct 7 2020, 3:44 AM
llvm/test/CodeGen/AMDGPU/sdiv64.ll
502

Is this a regression? It looks like we're lost track that we only need 1 element

llvm/test/CodeGen/ARM/func-argpassing-endian.ll
106 ↗(On Diff #296629)

regression? we're no longer doing a single multiple load

llvm/test/CodeGen/ARM/vdup.ll
59–69

poor duplicate/splat detection?

@foad Any update on this?

yubing added a subscriber: yubing.Dec 4 2020, 4:32 AM
yubing added inline comments.Dec 6 2020, 4:19 AM
llvm/test/CodeGen/X86/vector-fshr-128.ll
179–180

Hi, during combination before legalizeDAG. there are the following Nodes:

  t63: i32 = extract_vector_elt t62, Constant:i32<2>
  t64: i32 = extract_vector_elt t62, Constant:i32<3>
t66: v4i32 = BUILD_VECTOR t63, t64, undef:i32, undef:i32

SimplifyDemandedBits deduce that t64 is a zero, so t66 will transformed into a pxor and a punpckhqd instead of a single pshufd.

Maybe in such a case where a buildvector have elts which are extracted from the same vector, you shouldn't SimplifyDemandedBits for extract_vector_elt.

foad updated this revision to Diff 316897.Jan 15 2021, 3:53 AM

Rebase.

RKSimon added inline comments.Jan 15 2021, 5:54 AM
llvm/test/CodeGen/X86/buildvec-insertvec.ll
783

this should simplify to "store i32 undef, store i32* undef" and be removed - can you check why it isn't please?

llvm/test/CodeGen/X86/vec_setcc.ll
223

why didn't this simplify?

foad added inline comments.Jan 15 2021, 9:53 AM
llvm/test/CodeGen/X86/buildvec-insertvec.ll
783

The value being stored is not undef, it's either -2147483648 or poison, depending on the value of %a0.

Anyway -simplifycfg would change the store into a trap + unreachable, but nothing in llc's codegen pipeline does that.

llvm/test/CodeGen/X86/vec_setcc.ll
223

What simplification are you expecting?

foad added inline comments.Jan 18 2021, 3:40 AM
llvm/test/CodeGen/X86/buildvec-insertvec.ll
783

I've looked into this more carefully now. Hopefully this answer makes more sense.

With my patch %4 = extractelement <4 x i32> zeroinitializer, i32 %2 is simplified into i32 0 based on the known bits of all elements. This happens before we simplify %2, because of the weird way that the DAG combiner runs top-down.

When we visit %2 we simplify it to -2147483648. After that, if we visited the original %4 again, we would simplify it to undef; but with my patch we have already simplified %4 to 0 so it's too late.

I'm not sure what to do about this -- other than change DAGCombine to run bottom-up ;-)

RKSimon added inline comments.Jan 23 2021, 10:19 AM
llvm/test/CodeGen/X86/buildvec-insertvec.ll
783

Do we need to tweak the out-of-range handling to ISD::EXTRACT_VECTOR_ELT indices? Either just for constant indices or we use computeKnownBits to work out if the minimum value always the exceeds the vector element count?

@foad Please can you rebase this?

foad updated this revision to Diff 332981.Mar 24 2021, 7:18 AM

Rebase. There are two failing tests that I have not updated yet:

Failed Tests (2):

LLVM :: CodeGen/AMDGPU/cttz_zero_undef.ll
LLVM :: CodeGen/AMDGPU/scratch-simple.ll
foad added a comment.Mar 24 2021, 7:21 AM

Maybe in such a case where a buildvector have elts which are extracted from the same vector, you shouldn't SimplifyDemandedBits for extract_vector_elt.

Yes, I am coming round to this idea: don't simplify EXTRACT_VECTOR_ELT to a constant if it is used by BUILD_VECTOR, because of the risk of breaking things like shuffle patterns.

RKSimon added inline comments.Mar 24 2021, 7:52 AM
llvm/test/CodeGen/X86/nontemporal-3.ll
2–8

Please can you add a common CHECK prefix:

; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefixes=CHECK,SSE
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4a | FileCheck %s --check-prefixes=CHECK,SSE
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefixes=CHECK,SSE
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512dq,+avx512vl | FileCheck %s --check-prefixes=CHECK,AVX512
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw,+avx512vl | FileCheck %s --check-prefixes=CHECK,AVX512
llvm/test/CodeGen/X86/vec_setcc.ll
223

The 0'th index should be able to extract from the source of the _EXTEND_VECTOR_INREG using SimplifyMultipleUseDemandedBits

foad updated this revision to Diff 333005.Mar 24 2021, 8:24 AM

Add a common CHECK prefix.

RKSimon added inline comments.Mar 29 2021, 3:07 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18650

How come this isn't picking up many of these test cases? Is it being run too late?

foad added inline comments.Mar 29 2021, 3:16 AM
llvm/test/CodeGen/X86/vec_setcc.ll
223

Well %eax is extracted directly from the result of the pcmpeqw, which is the source of the sign_extend_vector_inreg.