This is an archive of the discontinued LLVM Phabricator instance.

DeadStoreElimination: remove a redundant store even if the load is in a different block.
ClosedPublic

Authored by eeckstein on Aug 7 2015, 5:02 PM.

Details

Summary

DeadStoreElimination does eliminate a store if it stores a value which was loaded from the same memory location.
So far this worked only if the store is in the same block as the load. This patch allows that the load is in another block than the store.
Example:

define i32 @test26(i1, i32*) {
entry:

%l2 = load i32, i32* %1, align 4
br i1 %0, label %bb1, label %bb2

bb1:

br label %bb3

bb2:

; This store is redundant
store i32 %l2, i32* %1, align 4
br label %bb3

bb3:

ret i32 0

}

Diff Detail

Event Timeline

eeckstein updated this revision to Diff 31560.Aug 7 2015, 5:02 PM
eeckstein retitled this revision from to DeadStoreElimination: remove a redundant store even if the load is in a different block..
eeckstein updated this object.
eeckstein added a subscriber: llvm-commits.

Hi Erik,

Couple of comments:
Please upload your patch with the full context.
Please use -instnamer on your test cases to get rid of the %[0-9]+ variables.

Thanks,
-Quentin

Could you please provide more context in the diff for DeadStoreElimination.cpp?

eeckstein updated this revision to Diff 31562.Aug 7 2015, 5:33 PM

renamed variables in test
diff with full context

reames added inline comments.Aug 7 2015, 6:28 PM
lib/Transforms/Scalar/DeadStoreElimination.cpp
496

I think you'd be better to phrase the non-local query here so that it benefits both sub-optimizations.

522

Phasing this is isNonLocal wouldbe much clearer.

So, Karthik posted and got http://reviews.llvm.org/D11143 reviewed

Does this not cover your case?

Additionally, " MD->getNonLocalPointerDependency(SI, Deps);"

This is very compile time expensive to call a lot. I'd really like to know how often yours actually triggers, and what performance different it actually makes relative to Karthik's

Thanks for looking at this patch.

So, Karthik posted and got http://reviews.llvm.org/D11143 reviewed
Does this not cover your case?

I checked this and no, it does not cover my case.

Nevertheless thanks for pointing me to Karthik's patch. I think I will invest a little bit more work and use a similar work-list algorithm as Kathik. It will cover more cases: my current approach (with getNonLocalPointerDependency) has the drawback that it does not work if there is a (harmless) other load between the first load and the store.

Also I assume (but I'm not sure) it will be faster than getNonLocalPointerDependency.

I will upload an updated patch soon. But anyway I don't want to commit anything until Kathik's patch has landed so that there are no conflicts.

Thanks,
Erik

eeckstein updated this revision to Diff 31751.Aug 10 2015, 5:49 PM

This updated version uses a work-list algorithm to do the safety checking. This has 2 advantages compared to MD->getNonLocalPointerDependency:

  1. It ignores load instructions which should not prevent the optimization
  2. It's faster because it bails out at the first memory alias it finds.

I still have to do some measurements. I will post some data in the next time.

Philip, thanks for your comments on the first version. Please let me know if your questions are still relevant with this updated patch.

Some data from the lnt test suite:

Compile time: I did not see any changes which are not noise.
Execution time: There are a few benchmarks with improvements, but I think most (if not all) is noise.

Static statistics data:
This optimization effects redundant stores of the form:

%x = load %p
store %x, %p

With this patch the compiler can eliminate 131 more redundant stores of this kind.

Correction to the static data:

In the original version, the compiler did eliminate 46 redundant stores of this kind.
With this patch, the compiler eliminates 131 redundant stores of this kind.
So the difference is 85 or 2.8x.

LGTM
LNT doesn't have particularly large programs, so not seeing compile
time effects there is not surprising. Your walk may in fact, walk the
entire CFG *for each store* (if load is in entry block, store is near
exit block).

This is, in general, a really expensive way to perform this optimization.

It's sad this can't take advantage of GVN, and instead just directly
compares pointer operands. This means things like bitcasts/etc, will
cause it to not eliminate dead stores it could. Another thing on the
pile of usefulness for separating GVN's analysis and elimination.

However, all that said, i'm fine with this patch, we'll fix any
compile time issues when they pop up by doing it right.

--Dan

Thanks for reviewing!

PS to compile time: while you are right that it goes over the whole CFG for each store, this is only the worst case scenario.
The MemoryIsNotModifiedBetween() is only called if the value and pointer operands match, which is the case for 0.2% of all stores.
And even then it bails out early if it finds a potentially aliased store. On average MemoryIsNotModifiedBetween() traverses only about 3 blocks or 10 instructions per invocation. On average it calls 2 times getModRefInfo per invocation.

eeckstein updated this revision to Diff 31999.Aug 12 2015, 4:22 PM

Here is an updated version with a bug fix in MemoryIsNotModifiedBetween.
I also added a few tests.

I plan to commit it tomorrow unless anyone has additional comments on the update.

Thanks,
Erik

This revision was automatically updated to reflect the committed changes.

committed in r244901