This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Allow DAGCombine to remove dead masked stores
ClosedPublic

Authored by dtemirbulatov on Feb 1 2023, 7:01 AM.

Details

Summary

Remove a dead masked store if another one has the same base pointer and mask.

Diff Detail

Event Timeline

dtemirbulatov created this revision.Feb 1 2023, 7:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2023, 7:01 AM
dtemirbulatov requested review of this revision.Feb 1 2023, 7:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2023, 7:01 AM
sdesmalen added inline comments.Feb 1 2023, 7:43 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11185

Is this check necessary? I would expect that it doesn't run the DAGCombiner when OptLevel == CodeGenOpt::None. Does it still run the DAGCombiner with -O0?

11186

A store returns a Chain, which always has one use, so this test can be removed.

11187

This seems like an odd case to ever happen, and you did not add a specific test for this case. Why did you add this check?

11188

Is this check necessary? I would expect that if both pointers are equal, they use the same address space?

11190

I think this can be isKnownEQ, because for the predicates to match, the sizes must match too.

llvm/test/CodeGen/AArch64/sve-dead-masked-store.ll
13

Another case that can be handled is where the second store has an 'all true' mask, then it doesn't really matter what the mask of the first store is, e.g.

%alltrue.ins = insertelement <vscale x 4 x i1> poison, i1 true, i32 0
%alltrue = shufflevector  <vscale x 4 x i1> %splat.ins,  <vscale x 4 x i1> poison,  <vscale x 4 x i32> zeroinitializer
call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %alltrue)
dtemirbulatov marked 6 inline comments as done.

Resolved comments.

I found an error in my implementation: we could not remove the store if the chained store is a fixed type and the store we consider to remove is a scalable type, since we don't know scalable type size in the runtime. fixed.

sdesmalen added inline comments.Feb 7 2023, 5:12 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11121–11124

Can this be replaced by:

ElementCount::isKnownLE(MST1->getMemoryVT().getStoreSize(),
                        MST->getMemoryVT().getStoreSize())

?

dtemirbulatov added inline comments.Feb 7 2023, 7:24 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11121–11124

hmm, I could image somethis like:

%alltrue.ins = insertelement <vscale x 4 x i1> poison, i1 true, i32 0
%alltrue = shufflevector  <vscale x 4 x i1> %alltrue.ins,  <vscale x 4 x i1> poison,  <vscale x 4 x i32> zeroinitializer
call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val1, ptr %a, i32 4, <vscale x 4 x i1> %alltrue)

that ends up with st1h, probably TypeSize::KnownLE suits better here.

Replaced to TypeSize::isKnownLE() usage for all constant true case. Added few tests.

Thanks for adding the tests @dtemirbulatov! I've just got two more nits on the tests, then I'm happy to accept.

llvm/test/CodeGen/AArch64/sve-dead-masked-store.ll
37–38

Can you change this to:

call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val1, ptr %a, i32 4, <vscale x 4 x i1> %mask) ; with same value for the mask

such that it's more clear that the first store is eliminated?

Now it code-generates the store of nxv4i64 into two stores, because nxv4i64 is not a legal type and needs splitting.

52

For this test, can you use the same %mask value?

Adjust according to comments, fixed error with the same mask but different type size by forbid to delete the store.

dtemirbulatov marked an inline comment as done.Feb 12 2023, 4:48 PM
sdesmalen added inline comments.Feb 13 2023, 1:44 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11119

fixed error with the same mask but different type size by forbid to delete the store

I'm not sure if this is the reason you made this change, but the test that previously removed the redundant store:

define void @dead_masked_store_same_mask_bigger_type(<vscale x 4 x i16> %val, <vscale x 4 x i32> %val1, ptr %a, <vscale x 4 x i1> %mask) {
; CHECK-LABEL: dead_masked_store_same_mask_bigger_type:
; CHECK:       // %bb.0:
; CHECK-NEXT:    st1w { z1.s }, p0, [x0]
; CHECK-NEXT:    ret
  call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
  call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val1, ptr %a, i32 4, <vscale x 4 x i1> %mask)
  ret void
}

Is now doing this:

define void @dead_masked_store_same_mask_bigger_type(<vscale x 4 x i16> %val, <vscale x 4 x i32> %val1, ptr %a, <vscale x 4 x i1> %mask) {
; CHECK-LABEL: dead_masked_store_same_mask_bigger_type:
; CHECK:       // %bb.0:
; CHECK-NEXT:    st1h { z0.s }, p0, [x0]
; CHECK-NEXT:    st1w { z1.s }, p0, [x0]
; CHECK-NEXT:    ret
  call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
  call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val1, ptr %a, i32 4, <vscale x 4 x i1> %mask)
  ret void
}

The second store is overwriting all the i16 elements that were stored in the first llvm.masked.store, with i32 elements from the second llvm.masked.store, so that means the first store can be removed right?

dtemirbulatov added inline comments.Feb 13 2023, 2:07 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11119

The mask is the same for both stores, but because we types are different starting then offsets of starting and ending affected elements different too. I think we have to keep both store in those cases.

dtemirbulatov marked an inline comment as not done.Feb 13 2023, 2:12 AM
dtemirbulatov added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11119

The mask is the same for both stores, but because types are different starting then offsets of starting and ending affected elements different too. I think we have to keep both store in those cases.

sdesmalen accepted this revision.Feb 13 2023, 3:14 AM
sdesmalen added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11119

You're right, your current code is correct. The elements wouldn't necessarily line-up in memory, e.g.

call void @llvm.masked.store.v4i16(<4 x i16> <a, b, c, d>, ptr %a, i32 4, <4 x i1> <1, 0, 0, 1>)  ; would store `a` and `d` resulting in   { a__d____ }
call void @llvm.masked.store.v4i32(<4 x i32> <e, f, g, h>, ptr %a, i32 4, <4 x i1> <1, 0, 0, 1>)   ; would store `e` and `h` resulting in  { ee_d__hh }

so the first call can indeed not be removed.

This revision is now accepted and ready to land.Feb 13 2023, 3:14 AM
This revision was landed with ongoing or failed builds.Feb 13 2023, 8:12 AM
This revision was automatically updated to reflect the committed changes.