Remove a dead masked store if another one has the same base pointer and mask.
Details
Diff Detail
Event Timeline
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
11167 | Is this check necessary? I would expect that it doesn't run the DAGCombiner when OptLevel == CodeGenOpt::None. Does it still run the DAGCombiner with -O0? | |
11168 | A store returns a Chain, which always has one use, so this test can be removed. | |
11169 | This seems like an odd case to ever happen, and you did not add a specific test for this case. Why did you add this check? | |
11170 | Is this check necessary? I would expect that if both pointers are equal, they use the same address space? | |
11172 | I think this can be isKnownEQ, because for the predicates to match, the sizes must match too. | |
llvm/test/CodeGen/AArch64/sve-dead-masked-store.ll | ||
12 | Another case that can be handled is where the second store has an 'all true' mask, then it doesn't really matter what the mask of the first store is, e.g. %alltrue.ins = insertelement <vscale x 4 x i1> poison, i1 true, i32 0 %alltrue = shufflevector <vscale x 4 x i1> %splat.ins, <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask) call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %alltrue) |
I found an error in my implementation: we could not remove the store if the chained store is a fixed type and the store we consider to remove is a scalable type, since we don't know scalable type size in the runtime. fixed.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
11121–11124 | Can this be replaced by: ElementCount::isKnownLE(MST1->getMemoryVT().getStoreSize(), MST->getMemoryVT().getStoreSize()) ? |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
11121–11124 | hmm, I could image somethis like: %alltrue.ins = insertelement <vscale x 4 x i1> poison, i1 true, i32 0 %alltrue = shufflevector <vscale x 4 x i1> %alltrue.ins, <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask) call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val1, ptr %a, i32 4, <vscale x 4 x i1> %alltrue) that ends up with st1h, probably TypeSize::KnownLE suits better here. |
Thanks for adding the tests @dtemirbulatov! I've just got two more nits on the tests, then I'm happy to accept.
llvm/test/CodeGen/AArch64/sve-dead-masked-store.ll | ||
---|---|---|
37–38 | Can you change this to: call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask) call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val1, ptr %a, i32 4, <vscale x 4 x i1> %mask) ; with same value for the mask such that it's more clear that the first store is eliminated? Now it code-generates the store of nxv4i64 into two stores, because nxv4i64 is not a legal type and needs splitting. | |
52 | For this test, can you use the same %mask value? |
Adjust according to comments, fixed error with the same mask but different type size by forbid to delete the store.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
11119 |
I'm not sure if this is the reason you made this change, but the test that previously removed the redundant store: define void @dead_masked_store_same_mask_bigger_type(<vscale x 4 x i16> %val, <vscale x 4 x i32> %val1, ptr %a, <vscale x 4 x i1> %mask) { ; CHECK-LABEL: dead_masked_store_same_mask_bigger_type: ; CHECK: // %bb.0: ; CHECK-NEXT: st1w { z1.s }, p0, [x0] ; CHECK-NEXT: ret call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask) call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val1, ptr %a, i32 4, <vscale x 4 x i1> %mask) ret void } Is now doing this: define void @dead_masked_store_same_mask_bigger_type(<vscale x 4 x i16> %val, <vscale x 4 x i32> %val1, ptr %a, <vscale x 4 x i1> %mask) { ; CHECK-LABEL: dead_masked_store_same_mask_bigger_type: ; CHECK: // %bb.0: ; CHECK-NEXT: st1h { z0.s }, p0, [x0] ; CHECK-NEXT: st1w { z1.s }, p0, [x0] ; CHECK-NEXT: ret call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask) call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val1, ptr %a, i32 4, <vscale x 4 x i1> %mask) ret void } The second store is overwriting all the i16 elements that were stored in the first llvm.masked.store, with i32 elements from the second llvm.masked.store, so that means the first store can be removed right? |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
11119 | The mask is the same for both stores, but because we types are different starting then offsets of starting and ending affected elements different too. I think we have to keep both store in those cases. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
11119 | The mask is the same for both stores, but because types are different starting then offsets of starting and ending affected elements different too. I think we have to keep both store in those cases. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
11119 | You're right, your current code is correct. The elements wouldn't necessarily line-up in memory, e.g. call void @llvm.masked.store.v4i16(<4 x i16> <a, b, c, d>, ptr %a, i32 4, <4 x i1> <1, 0, 0, 1>) ; would store `a` and `d` resulting in { a__d____ } call void @llvm.masked.store.v4i32(<4 x i32> <e, f, g, h>, ptr %a, i32 4, <4 x i1> <1, 0, 0, 1>) ; would store `e` and `h` resulting in { ee_d__hh } so the first call can indeed not be removed. |
I'm not sure if this is the reason you made this change, but the test that previously removed the redundant store:
Is now doing this:
The second store is overwriting all the i16 elements that were stored in the first llvm.masked.store, with i32 elements from the second llvm.masked.store, so that means the first store can be removed right?