This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner][X86] `mergeConsecutiveStores()`: support merging splat-stores of the same value
Needs ReviewPublic

Authored by lebedev.ri on Jan 14 2023, 5:52 PM.

Details

Summary

getStoreMergeCandidates() is quite obvious for this kind
of source - just look for identical store values.
tryStoreMergeOfConstants() basically already supports it,
we just need to bypass some checks that are constant-specific.

For X86, this does appear to be always better,
as long as we store at least a single full vector register.
But the test changes for other targets were more questionable,
so i didn't enable it elsewhere.

Diff Detail

Event Timeline

lebedev.ri created this revision.Jan 14 2023, 5:52 PM
lebedev.ri requested review of this revision.Jan 14 2023, 5:52 PM

Drop unused function declaration, NFC.

obious -> obvious

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
676–677

Is this too strong to assume all the other cases are Splat? If this is the fact, why adding it in StoreSource rather than replacing Unknown?

llvm/test/CodeGen/X86/legalize-shl-vec.ll
249–255

Looks like regression here.

lebedev.ri edited the summary of this revision. (Show Details)Jan 15 2023, 6:31 AM
lebedev.ri marked an inline comment as done.Jan 15 2023, 6:35 AM
lebedev.ri added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
676–677

What do you mean by "too strong"?
Originally we'd just return StoreSource::Unknown and bailout
without looking at other stores. If the other stores *don't* store
the same value, then we'll bailout just the same.

There is probably some generalization -- just because we've matched
a more fine-grained StoreSource, doesn't mean that the stores
*aren't* splatting the same value -- but that is not a required part
of this enhancement.

As for Unknown, i don't know, but it seems not useful to just drop it,
since it can be used to signal invalid state.

lebedev.ri marked an inline comment as done.

Add subvector test coverage.
@spatel should we not split such stores just like we do with stores of CONCAT_VECTOR?

lebedev.ri removed subscribers: arsenm, jvesely, arichardson and 30 others.

Actually upload the right diff.

Add subvector test coverage.
@spatel should we not split such stores just like we do with stores of CONCAT_VECTOR?

Ah, so we already do that in collectConcatOps(). Okay, then these must be improvements.

lebedev.ri added inline comments.Jan 15 2023, 7:25 AM
llvm/test/CodeGen/X86/legalize-shl-vec.ll
249–255

We've traded 4 scalar stores to two vector stores + GPR->XMM xfer + two shuffles + shift.
I wouldn't say it's an obvious regression, since we get less contention in CPU store unit,
but it's not really an improvement, yes.

Can you help spot issues in the tests in elementwise-store-of-scalar-splat.ll / subvectorwise-store-of-vector-splat.ll?
If those do not have regressions, then we need to restrict some other fold.

lebedev.ri marked an inline comment as done.Jan 15 2023, 8:19 AM
lebedev.ri added inline comments.
llvm/test/CodeGen/X86/legalize-shl-vec.ll
249–255

This is an unrelated shuffle combining issue

movq    %r8, %xmm0
pshufd  $68, %xmm0, %xmm0               # xmm0 = xmm0[0,1,0,1]
psrad   $31, %xmm0
pshufd  $245, %xmm0, %xmm0              # xmm0 = xmm0[1,1,3,3]

What we want here, is to splat the sign bit of a 64-bit scalar to the entire XMM.
This should just be: (note the decoded shuffle mask)

movq    %r8, %xmm0
pshufd  $<???>, %xmm0, %xmm0            # xmm0 = xmm0[1,1,1,1]
psrad   $31, %xmm0
lebedev.ri added inline comments.
llvm/test/CodeGen/X86/legalize-shl-vec.ll
249–255

https://reviews.llvm.org/D141806 will address this.
Does anyone see any other regressions?

pengfei added inline comments.Jan 15 2023, 5:50 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
676–677

Alright, how about to use MayBeSplat?

I got the logic now. There is a getStoreSource before calling getStoreMergeCandidates and there is another getStoreSource in getStoreMergeCandidates. I think this is just a trick to make it works. It introduces uncertainty into the result. People may get unexpected result if they use this function somewhere else.

lebedev.ri marked an inline comment as done.Jan 15 2023, 6:05 PM
lebedev.ri added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
676–677

Essentially, we want to guess what strategy will succeed
to merge the stores, looking at a single store.
StoreSource don't really describe the source,
but names a strategy, it's not an ideal enum name.

But likewise, i'm not sure why MayBeSplat is a better strategy name?
It either is if the strategy matched, or isn't if it didn't.

I got the logic now. There is a getStoreSource before calling getStoreMergeCandidates and there is another getStoreSource in getStoreMergeCandidates.

Yes.

I think this is just a trick to make it works.

What is?

It introduces uncertainty into the result. People may get unexpected result if they use this function somewhere else.

Sorry, i do not follow.

pengfei added inline comments.Jan 15 2023, 6:59 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
676–677

StoreSource don't really describe the source, but names a strategy, it's not an ideal enum name.

I understand it now. Thanks for the explanation!
I had wrong assumption about the function. Please ignore the comments.

19442–19443

This is dead now.

pengfei added inline comments.Jan 15 2023, 7:28 PM
llvm/test/CodeGen/X86/legalize-shl-vec.ll
249–255

Can you help spot issues in the tests in elementwise-store-of-scalar-splat.ll / subvectorwise-store-of-vector-splat.ll?

I like the AVX512 broadcast version, i.e., GPR->XMM directly. AVX2 and some AVX512F is suboptimal. I cannot tell whether it's good or not to replace splat store with shufle on SSE2. Should a rep stos be better?

lebedev.ri marked 3 inline comments as done.Jan 16 2023, 7:04 AM
lebedev.ri added inline comments.
llvm/test/CodeGen/X86/legalize-shl-vec.ll
249–255

Should a rep stos be better?

Almost certainly not.

lebedev.ri marked 2 inline comments as done.

Nits.

Rebased, NFC.

RKSimon added inline comments.Jan 22 2023, 4:57 AM
llvm/test/CodeGen/X86/elementwise-store-of-scalar-splat.ll
2

why is this 'SCALAR' given that the triple implicitly assumes SSE2?

lebedev.ri marked an inline comment as done.

@RKSimon thank you for taking a look!
Addressing nits.