This is an archive of the discontinued LLVM Phabricator instance.

[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI
AcceptedPublic

Authored by StephenFan on Aug 25 2022, 7:02 AM.

Details

Summary

For noop store of the form of LoadI and StoreI,
An invariant should be kept is that the memory state of the related
MemoryLoc before LoadI is the same as before StoreI.
For this example:

define void @pr49927(i32* %q, i32* %p) {
  %v = load i32, i32* %p, align 4
  store i32 %v, i32* %q, align 4
  store i32 %v, i32* %p, align 4
  ret void
}

Here the definition of the store's destination is different with the
definition of the load's destination, which it seems that the
invariant mentioned above is broken. But the definition of the
store's destination would write a value that is LoadI, actually, the
invariant is still kept. So we can safely ignore it.

Fixes https://github.com/llvm/llvm-project/issues/49271

Diff Detail

Event Timeline

StephenFan created this revision.Aug 25 2022, 7:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2022, 7:02 AM
StephenFan requested review of this revision.Aug 25 2022, 7:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2022, 7:02 AM
nikic added a reviewer: nikic.Aug 30 2022, 6:25 AM
nikic added a subscriber: nikic.

Looks reasonable to me

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1879

I feel like this comment is overly rambling, something like "This is a potentially clobbering store, but it writes the same value, so we can safely ignore it" should be sufficient?

StephenFan edited the summary of this revision. (Show Details)

Reduce comments.

nikic accepted this revision.Aug 31 2022, 12:00 AM

LG

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1871

here -> store?

This revision is now accepted and ready to land.Aug 31 2022, 12:00 AM
This revision was landed with ongoing or failed builds.Sep 1 2022, 11:41 PM
This revision was automatically updated to reflect the committed changes.

Do you need some sort of alignment check here?

nikic added a comment.Sep 2 2022, 12:10 AM

Do you need some sort of alignment check here?

Gah, you're right. If it doesn't happen to be a splat value, then writes at an offset become a problem.

Do you need some sort of alignment check here?

Gah, you're right. If it doesn't happen to be a splat value, then writes at an offset become a problem.

https://godbolt.org/z/5oEh8rdY6 Is this the problem you mentioned?

nikic added a comment.Sep 2 2022, 2:51 AM

Do you need some sort of alignment check here?

Gah, you're right. If it doesn't happen to be a splat value, then writes at an offset become a problem.

https://godbolt.org/z/5oEh8rdY6 Is this the problem you mentioned?

Yes, exactly.

Do you need some sort of alignment check here?

Gah, you're right. If it doesn't happen to be a splat value, then writes at an offset become a problem.

https://godbolt.org/z/5oEh8rdY6 Is this the problem you mentioned?

Yes, exactly.

I found GCC would delete the last store. Is this a bug in GCC?

nikic added a comment.Sep 2 2022, 3:12 AM

Do you need some sort of alignment check here?

Gah, you're right. If it doesn't happen to be a splat value, then writes at an offset become a problem.

https://godbolt.org/z/5oEh8rdY6 Is this the problem you mentioned?

Yes, exactly.

I found GCC would delete the last store. Is this a bug in GCC?

It does seem that way. It happens also if alignment is forced to 1, so this is not relying on UB of unaligned stores: https://godbolt.org/z/Yb8Kj3sT3

gcc is doing something with alignment. In the following, foo() is optimized, foo2() is not.

struct A { int x; };
struct __attribute((packed)) A2 { int x; };
void foo(char *a) {
    struct A c = *((struct A *)a);
    *((struct A *)(a + 1)) = c;
    *((struct A *)a) = c;
}
void foo2(char *a) {
    struct A2 c = *((struct A2 *)a);
    *((struct A2 *)(a + 1)) = c;
    *((struct A2 *)a) = c;
}

I'm not sure if gcc honors the alignment attribute in (int __attribute__((aligned (1))) *)a.

efriedma reopened this revision.Sep 2 2022, 2:25 PM
This revision is now accepted and ready to land.Sep 2 2022, 2:25 PM

Check alignment of store value.

@efriedma I have added the check of alignment and corresponding test cases. Are you happy with the change?

efriedma added inline comments.Sep 14 2022, 12:44 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1875

getABITypeAlign(LoadI->getType())) doesn't make sense. It doesn't matter what the ABI alignment of the type is; the question here is whether the alignment of the load is greater than or equal to the size of the load.

No longer compare ABI alignment.

nikic added inline comments.Sep 17 2022, 2:41 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1873

This should use DL.getTypeStoreSize(). Please add tests for these two cases: 1. Non-byte-sized types, e.g. i12 requires align 2 not align 1. 2. Pointer types, for which getScalarSizeInBits() returns zero.

1882

Position of this comment doesn't look right.

xbolva00 edited the summary of this revision. (Show Details)Sep 17 2022, 2:43 AM
nikic added inline comments.Sep 17 2022, 4:06 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1873

Also, don't we have to check alignment on both the load and the store? If you have load align 1, store align 4 they might still be offset.

  1. Add test cases of i12, pointer type, load has inconsistent alignment with store
  2. Improve comments.
  3. Check load's alignment more than just store's alignment.

Use existing DL.

nikic added inline comments.Sep 21 2022, 2:12 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1872

This should use dyn_cast_or_null to account for liveOnEntry defs. Please add a test case like this, that currently crashes:

define void @test(i1 %c, ptr %p, ptr noalias %q) {
entry:
  br i1 %c, label %if, label %join

if:
  store i8 0, ptr %q, align 1
  br label %join

join:
  %v = load i8, ptr %q, align 1
  store i8 0, ptr %p, align 1
  store i8 %v, ptr %q, align 1
  ret void
}
1875

nit: Prefer getValueOperand().

Account for liveonentrydef

nikic added inline comments.Sep 22 2022, 12:57 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1877

Can you please add a test with scalable vectors? Unless we exclude them somewhere earlier already, we would assert here when converting the type size to integer.

llvm/test/Transforms/DeadStoreElimination/noop-stores.ll
702

Can you please add an align 1 variant of this test (negative test)?

  1. Use getValueOperand().
  2. Add test of unaligned i12.
  3. Add test of scalable vector test.
  4. Disable optimization if the type size is scalable.
nikic accepted this revision.Sep 26 2022, 1:55 AM

LGTM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1870

You can make this auto *CurrentDef = cast<MemoryDef>(Current); and save the if. See the assert a few lines above.

alexfh added a subscriber: alexfh.Oct 6 2022, 8:45 AM

We started seeing incorrect behavior in some code after this commit. The test case is here: https://gcc.godbolt.org/z/hhj64noKs.

The elimination of store ptr %9, ptr %0, align 8 at line 362 doesn't seem to be correct, since memory pointed by %0 could previously be assigned %44 (e.g. on line 346: store ptr %44, ptr %0, align 8), which can hold a different value (line 121): %44 = load ptr, ptr %2, align 8. Could you take a look at this?

If you agree that this is a bug, but there is no obvious fix, could you revert the commit while working on a fix?

Thanks!

reduced test case

define void @f(ptr %arg, ptr %arg1, i1 %arg2) {
bb:
  %tmp = load ptr, ptr %arg, align 8
  %tmp3 = load ptr, ptr %arg1, align 8
  store ptr %tmp3, ptr %arg, align 8
  store ptr %tmp, ptr %arg1, align 8
  br i1 %arg2, label %bb5, label %bb6

bb5:
  store ptr %tmp, ptr %arg, align 8 ; this store is incorrectly removed
  store ptr %tmp3, ptr %arg1, align 8
  ret void

bb6:
  ret void
}

we're first swapping the values %arg and %arg1 point to, then conditionally swapping them back to their original values

will revert

aeubanks reopened this revision.Oct 6 2022, 10:54 AM
This revision is now accepted and ready to land.Oct 6 2022, 10:54 AM
nikic added a comment.Oct 6 2022, 11:31 AM

I think the problem here is that we fail to queue up the defining access of the skipped store.

Insert the skiped memorydef's clobbering memorydef to worklist.

Clean format.

reduced test case

define void @f(ptr %arg, ptr %arg1, i1 %arg2) {
bb:
  %tmp = load ptr, ptr %arg, align 8
  %tmp3 = load ptr, ptr %arg1, align 8
  store ptr %tmp3, ptr %arg, align 8
  store ptr %tmp, ptr %arg1, align 8
  br i1 %arg2, label %bb5, label %bb6

bb5:
  store ptr %tmp, ptr %arg, align 8 ; this store is incorrectly removed
  store ptr %tmp3, ptr %arg1, align 8
  ret void

bb6:
  ret void
}

we're first swapping the values %arg and %arg1 point to, then conditionally swapping them back to their original values

will revert

Thanks for providing a reduced test case!

nikic added inline comments.Oct 7 2022, 12:42 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1868

Drop the if here, cast never returns nullptr.

1885

This doesn't looks correct: It will look for a clobber of CurrentDef, while we are interested in clobbers of Def.

It's probably possible to show an issue using a sequence like this:

store i32 %v, ptr %p
store i32 %v2, ptr %p2, !alias.scope !0
store i32 %v, ptr %p, !noalias !0
store i32 %v, ptr %p2

With appropriate metadata to make those two stores non-aliasing.

Just using getDefiningAccess() should be correct. Alternatively using the MemoryLocation-based clobber API.

  1. Rebase.
  2. Use MemoryLocation-based clobbering API
  3. Add test cases.
nikic added inline comments.Oct 9 2022, 1:04 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1886

This should probably use the Store location -- the location itself is the same, but they may differ in AAMDNodes.

Is this code correct in the presence of backedges? We are looking through MemoryPhis above, so we might be looking at a previous loop iteration. This should probably have a loop invariance check on the location.

Use store location.

StephenFan added inline comments.Oct 13 2022, 12:07 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1886

I think the loop invariance check is not needed. The loop invariance check is needed when we do optimization that depends on AliasAnalysis. In noop-stores detection, aliasing analysis is not needed. Although MemorySSA needs alias analysis, it will do the loop invariance check when it searches for clobbering memory access.

nikic added inline comments.Oct 13 2022, 3:36 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1886

MemorySSA will only do the loop invariance check when it looks through a MemoryPhi though -- won't we run into problems because the DSE code already looked through MemoryPhis, so MemorySSA will not know that an invariance check has to be performed?

StephenFan added inline comments.Oct 14 2022, 6:47 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1886

I hope I understand what you mean. IMO, we will run into storeIsNoop Function with a MemoryDef, and then search from MemoryAccess which is returned from getClobberingMemoryAccess(Def). So I think we won't run into problems because the storeIsNoop code already looked through MemoryPhis.