This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner][X86] `visitVECTOR_SHUFFLE()`: splats with a single non-undef element are not splats
Needs ReviewPublic

Authored by lebedev.ri on Dec 31 2022, 12:48 PM.

Details

Reviewers
RKSimon
pengfei
Summary

This addresses a significant portion of regressions that would otherwise appear in D140677.

@RKSimon this seems obviously good in general overall,
but there are some dubious changes here, at mostly for SSE2:
we fail to simplify some and/andn masks,
and pull identical target-specific shuffles out of commutative opcodes.

Please can you indicate which of the test changes should be dealt with?

Diff Detail

Event Timeline

lebedev.ri created this revision.Dec 31 2022, 12:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 31 2022, 12:48 PM
lebedev.ri edited the summary of this revision. (Show Details)Dec 31 2022, 1:00 PM

Cheers - I'll review the x86 diffs and get back to you

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23792

Instead of the count() - it might be worth adding a bool arg to isSplat()/isSplatMask() to only match splats with more than a single matching element?

Cheers - I'll review the x86 diffs and get back to you

Thank you for taking a look!
I've de-shedded a number of regressions off of D140677,
but the last few remaining are a last few for a reason :S

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23792

Sure, and i think that should even be default.

Sorry for the delay - I'm slowly returning from Christmas break.

I think if we can address the horizontal-op regressions then the other cases are pretty trivial to address - tweaking canonicalizeShuffleMaskWithHorizOp might help us.

llvm/test/CodeGen/X86/haddsub-3.ll
5–7

(trivial) Looks like you can add back the baseline common AVX prefix to cover both AVX1/AVX2 checks

Sorry for the delay - I'm slowly returning from Christmas break.

I think if we can address the horizontal-op regressions

Err. As far as i was concerned, the horisontal-op changes *weren't* regressions,
because they all happen for the run lines where horisontal math is slow.
Do we actually want those to remain HOps?

then the other cases are pretty trivial to address - tweaking canonicalizeShuffleMaskWithHorizOp might help us.

lebedev.ri marked an inline comment as done.Jan 3 2023, 5:11 PM
lebedev.ri added inline comments.
llvm/test/CodeGen/X86/haddsub-undef.ll
471

@RKSimon these no longer match because previously we had shuffle-fadd-shuffle,
so isHorizontalBinOp() called shouldUseHorizontalOp() with IsSingleSource=false,
and now we have shuffle-fadd, and shouldUseHorizontalOp() is called with IsSingleSource=false.
So this really doesn't look like a regression to me.

lebedev.ri added inline comments.Jan 3 2023, 5:14 PM
llvm/test/CodeGen/X86/haddsub-undef.ll
471

and now we have shuffle-fadd, and shouldUseHorizontalOp() is called with IsSingleSource=true, of course that is

FYI, these are the regressions i was aware of.

llvm/test/CodeGen/X86/vector-narrow-binop.ll
112

This is all-ones mask

170

Failure to pull identical shuffle out of binop

llvm/test/CodeGen/X86/vector-shuffle-128-v4.ll
1540–1550

Missing lowering strategy?

llvm/test/CodeGen/X86/vector-shuffle-combining-avx2.ll
843

We intentionally lower it this way,
but given that we end with an extra shuffle, it may be a regression?

lebedev.ri updated this revision to Diff 486394.Jan 4 2023, 2:21 PM

Rebased, no changes, NFC.

RKSimon added inline comments.Jan 5 2023, 5:08 AM
llvm/test/CodeGen/X86/haddsub-3.ll
5–7

why not just call the common prefix AVX? AVX1/AVX1-ONLY just seems to be confusing tbh.

RKSimon added inline comments.Jan 5 2023, 5:10 AM
llvm/test/CodeGen/X86/haddsub-undef.ll
471

yes - for slow hops, 2shuffles+alu is the threshold for using hop (unless building for size)

RKSimon added inline comments.Jan 5 2023, 5:12 AM
llvm/test/CodeGen/X86/horizontal-sum.ll
66

regression - we've gone from 3hops to 4hops + extra shuffles

lebedev.ri marked an inline comment as done.Jan 9 2023, 5:12 PM
lebedev.ri added inline comments.
llvm/test/CodeGen/X86/haddsub-undef.ll
471

So most of these hop changes are not regressions, correct?

lebedev.ri marked an inline comment as done.Jan 9 2023, 5:49 PM
lebedev.ri added a subscriber: tstellar.
lebedev.ri added inline comments.
llvm/test/CodeGen/X86/horizontal-sum.ll
66

Ok, i'll start with this one i guess.
It also seems reasonably straight-forward,
at least as the first step we need to go from

    t23: v2f32 = vector_shuffle<1,u> t21, undef:v2f32
  t24: v2f32 = fadd t21, t23
    t33: v2f32 = vector_shuffle<1,u> t32, undef:v2f32
  t34: v2f32 = fadd t32, t33
t75: v4f32 = concat_vectors t24, t34

to (pseudocode)

i0: v4f32 = concat_vectors t21, t32
i1: v4f32 = vector_shuffle<1,u,3,u> i0, undef:v4f32
i2: v4f32 = fadd i1, i0

I'm guessing *just* folding concat_vectors of identical opcodes
to a single opcode of multiple concat_vectors may not be a win though,
and shuffles must be matched too. Not sure yet.

But, i'm getting mixed signals here.
@RKSimon Should this kind of straight-forward yak shaving be just committed, or submitted to phab first?

@lebedev.ri I'm sorry about the slow response - but its taking forever to catch up after the Christmas break

Rebased, NFC.
Same regression remains, and given that my attempts to improve other things aren't being well-received, i'm not sure if i should bothed.

lebedev.ri marked an inline comment as done.
lebedev.ri added inline comments.Jan 18 2023, 2:50 PM
llvm/test/CodeGen/X86/vector-narrow-binop.ll
112

Err, no, not all-ones mask. It masks away high half/byte of each i16 element.

lebedev.ri added inline comments.Jan 18 2023, 3:51 PM
llvm/test/CodeGen/X86/vector-narrow-binop.ll
170

X86's narrowShuffle() intentionally does this, we can't treat this as a general shuffle combining issue.
Do we have an inverse of canonicalizeShuffleWithBinOps()?

RKSimon added inline comments.Jan 23 2023, 2:35 AM
llvm/test/CodeGen/X86/haddsub-undef.ll
471

Agreed - all the haddsub-undef.ll changes are improvements or neutral.