The types of the store values can diverge when checking for valid mask store candidates to eliminate via DSE.
Details
Diff Detail
Event Timeline
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
250 | Is it possible to relax the comparison from type to type size? |
llvm/test/Transforms/DeadStoreElimination/masked-dead-store-no-merge.ll | ||
---|---|---|
3 ↗ | (On Diff #455690) | -tbaa and datalayout should not be necessary here. |
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
245 | I am not sure I understand the meaning of the comment. Could you re-phrase? | |
250 | Yeah it would be good to check the stored sizes instead of requiring the types to match. I think we should remove one store if both masked stores are flipped. | |
llvm/test/Transforms/DeadStoreElimination/masked-dead-store-no-merge.ll | ||
11 ↗ | (On Diff #455690) | Could you instead extend the existing masked store test? llvm/test/Transforms/DeadStoreElimination/masked-dead-store.ll Also, it would be good to pre-commit the test separately and only include the changed lines caused by the patch in this diff. |
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
250 | Well, we have an ulterior motive, VLA uses VL with type to constrain size. |
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
250 | Right, so it sounds like this should also 1) have tests with scalable vectors and 2) different handling for scalable vs non-scalable vectors? |
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
250 | We would like to keep this patch to just the fixed non-scalable vectors like in this example. I can add the type size and I think element count would also be useful, as it is within context of this intrinsic. |
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
250 | @fhahn what does "flipped" mean in your comment. "I think we should remove one store if both masked stores are flipped." I think there was a confusion about was meant by scalable here. When @mcberg2021 said scalable that was referring to a different intrinsic, vp_store, that we will also need to handle here eventually. When @fhahn said scalable that was referring to this intrinsic with a scalable vector type. My suggestion is to check getElementCount and that getScalarSizeInBits for the type match. That will cover both fixed and scalable. Alternatively, since the masks are the same, the element count is guaranteed the same. So we could check only that getPrimitiveSizeInBits() is the same. |
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
250 | Query: What does "flipped" mean? Right now I think we cover both fixed and scalable for masked_store with a type size check and ElementCount check being the same for both stores as is it precludes mixing fixed and scalable and will work for either scenario. |
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
250 | I'm going to venture a reasonable guess for flipped: It's the test example turned around from: tail call void @llvm.masked.store.v4i32.p0(<4 x i32> %v2, ptr %a, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>) tail call void @llvm.masked.store.v4i8.p0(<4 x i8> %v1, ptr %a, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>) to tail call void @llvm.masked.store.v4i8.p0(<4 x i8> %v1, ptr %a, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>) tail call void @llvm.masked.store.v4i32.p0(<4 x i32> %v2, ptr %a, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>) where if you applied a mask to supply the superset for the second store it would be : 1111 1111 1111 1111 meaning the store is fully contained in a constant mask known at compile time. |
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
250 | Yes that's exactly what I had in mind! |
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
252 | This should be fine if KillingTy->getScalarSizeInBits >= DeadTy->getScalarSizeInBits(), right? I might have missed it, but t looks like there's no test case like the one @mcberg2021 mentioned above? |
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
252 | That too should work. I will update for the subsumption case. However I would like to leave the mask decomposition support for a later change. |
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
252 | If the mask is unknown then the store is not writing contiguous bytes. There can be gaps. So it is not sufficient to check that scalar size is larger. |
LGTM, thanks! Please add the extra test case as discussed separately.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
252 | Yeah good point, we will need to combine this with the mask analysis. Still good to add the extra test coverage. The new test seems to have constant masks, it would be good to also have variants where the mask is unknown (e.g. arg to function) |
I am not sure I understand the meaning of the comment. Could you re-phrase?