This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Support store merging of vector constant stores
ClosedPublic

Authored by reames on Jul 26 2023, 10:48 AM.

Details

Summary

Ran across this when making a change to RISCV memset lowering. Seems very odd that manually merging a store into a vector prevents it from being further merged.

p.s. I'm making this depend on the RISCV mem* lowering, but this could easily be rebased off that change set. If reviewers would prefer to separate, let me know.

Diff Detail

Event Timeline

reames created this revision.Jul 26 2023, 10:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2023, 10:48 AM
reames requested review of this revision.Jul 26 2023, 10:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2023, 10:48 AM
craig.topper added inline comments.Jul 26 2023, 10:04 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
702

I believe this function allows undef elements. Does that case work ok?

reames updated this revision to Diff 546522.Aug 2 2023, 9:51 AM

Add test coverage for case mentioned in review comment.

reames added inline comments.Aug 2 2023, 9:55 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
702

Seems to. I added x86 coverage for these cases, and got reasonable results.

It does seem like maybe we should be consistent in allowing scalar undefs in our constant determination, but that looks like a bigger change as we have to add Undef to the lattice when merging. I guess I could start with always considering undef constant - which is basically what this change does - but either option feels like it should be a separate change.

This revision is now accepted and ready to land.Aug 2 2023, 10:14 AM
This revision was landed with ongoing or failed builds.Aug 2 2023, 2:42 PM
This revision was automatically updated to reflect the committed changes.
bjope added a subscriber: bjope.Aug 9 2023, 2:54 PM
bjope added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19406

Just a heads up. Found some problems with this downstream.

When it crashes it tries to merge these stores:

t218: v4i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
t221: ch = store<(store (s64) into %ir.arrayinit.start, align 1, !tbaa <0x557ab8701858>), trunc to v4i16> t4, t218, t104, undef:i16
t219: v2i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>
t194: ch = store<(store (s64) into %fixed-stack.0 + 15, align 1)> t4, t219, t102, undef:i16

So the first store is a v4i32->v4i16 trunc store.

When we arrive on this line (19406) we got MemVT=v2i32 but Val is t218 from above (so that is a v4i32).
I think the correct thing would be to first truncate to v4i16 and then bitcast to v2i32 (given that we have a trunc store). And maybe the same bug exist for the IsConstantSrc false path as well.

I'll see if I can create an upstream reproducer for this. I only see it downstream in certain configs depending on which vector types that are legal, so not sure how easy it will be.

bjope added inline comments.Aug 10 2023, 4:13 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19406

I find it quite hard to find an in-tree reproducer.

I think I need to fulfil UseVector = (LastLegalVectorType > LastLegalType) && AllowVectors to trigger this path. And then for example TLI.storeOfVectorConstantIsCheap need to return true in order to set LastLegalVectorType (limiting the amount of possible targets a bit). And then it depends on which vector types that are legal. And if I for example use RISCV as target it seems like it folds the truncate of a build vector, rather than introducing a truncating store.

bjope added inline comments.Aug 10 2023, 4:58 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19406

Finally some luck:

; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=skx -o - %s
define void @foo(ptr %p, ptr %q) {
  %v = insertelement <8 x i64> zeroinitializer, i64 0, i32 1
  %tranc = trunc <8 x i64> %v to <8 x i8>
  %p1 = getelementptr i8, ptr %p, i32 0
  %p2 = getelementptr i8, ptr %p, i32 8
  %p3 = getelementptr i8, ptr %p, i32 128
  store <8 x i8> %tranc, ptr %p1
  store <8 x i8> zeroinitializer, ptr %p2
  ret void
}
bjope added inline comments.Aug 10 2023, 5:08 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
19406

@bjope I relanded with the fix for your reported issue since it turned out to be fairly simple. Can you confirm that everything works on your original example?

Also, for future reference, feel free to just revert in a case like this. You don't need to wait for me to see the comments. Clear compiler crash or regression with a patch which reverts cleanly - go for it. You're doing the author a favor.

bjope added a comment.Aug 11 2023, 3:55 AM

@bjope I relanded with the fix for your reported issue since it turned out to be fairly simple. Can you confirm that everything works on your original example?

Also, for future reference, feel free to just revert in a case like this. You don't need to wait for me to see the comments. Clear compiler crash or regression with a patch which reverts cleanly - go for it. You're doing the author a favor.

@reames: Ok, great!, Thanks a lot, and I can confirm that the original test case now pass with the updated patch.