This is an archive of the discontinued LLVM Phabricator instance.

[DSE] Eliminate stores at the end of the function.
ClosedPublic

Authored by fhahn on Jan 13 2020, 9:19 AM.

Details

Summary

This patch add support for eliminating MemoryDefs that do not have any
aliasing users, which indicates that there are no reads/writes to the
memory location until the end of the function.

To eliminate such defs, we have to ensure that the underlying object is
not visible in the caller and does not escape via returning. We need a
separate check for that, as InvisibleToCaller does not consider returns.

Diff Detail

Event Timeline

fhahn created this revision.Jan 13 2020, 9:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2020, 9:19 AM
Herald added subscribers: jfb, hiraditya. · View Herald Transcript

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

Just a general comment here: DSE shouldn't necessarily make all optimizations possible. This one looks like it should be caught by DCE. If it's not, perhaps it would be good to improve DCE. Alternatively, if it's cheap enough, add a comment that it's still worth adding it to DSE.

fhahn added a comment.Jan 14 2020, 4:03 PM

Just a general comment here: DSE shouldn't necessarily make all optimizations possible. This one looks like it should be caught by DCE. If it's not, perhaps it would be good to improve DCE. Alternatively, if it's cheap enough, add a comment that it's still worth adding it to DSE.

Agreed! I tried to structured the patches to match the current functionality of DSE though, to make it easier to compare statistics between both implementations to double check we get the desired results.

Currently I think neither DCE nor ADCE would eliminate the cases covered with this patch, as they don't reason about memory accesses. To eliminate a MemoryDef at 'the end of the function', we have to prove that there are no clobbering reads between the MemoryDef and the function exit. With reworking to bottom-up, I think it might make sense to restructure the code in this patch too: only collect MemoryDefs potentially at 'the end' of a function in the main elimination loop and then afterwards do a bottom-up walk starting at the end nodes, eliminating the applicable MemoryDefs.

fhahn planned changes to this revision.Feb 3 2020, 1:36 PM

Needs to be updated after changing approach to use a bottom-up walk.

fhahn updated this revision to Diff 243567.Feb 10 2020, 8:05 AM

Rebased after D72700 landed.

fhahn updated this revision to Diff 248818.Mar 6 2020, 1:46 PM

rebased

fhahn updated this revision to Diff 257728.Apr 15 2020, 8:17 AM

Ping. Rebased after D77736 landed.

fhahn updated this revision to Diff 270789.Jun 15 2020, 9:58 AM

Rebased on top of latest changes. Ping :)

fhahn updated this revision to Diff 272756.Jun 23 2020, 9:45 AM

Ping :)

Rebased, added a few comments, moved additional tests from todo, improve naming.

asbirlea accepted this revision.Jun 23 2020, 12:47 PM

Thank you for the patch!

Minor comments on the tests for keeping the patch clean, but otherwise lgtm.

Please consider the performance comments for future improvements.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1658

FWIW, this call will add up a lot on compile-time. Those AA calls are very expensive, and the MemorySSAScanLimit is per call, so dependent on the number of MemoryDefs in MemDefs when called in the eliminateDeadWritesAtEndOfFunction.
In the worst case, you'd hit something like:

;a, b, c, d noalias
store to a
store to b
store to c
store to d
load/use d
; repeat read/write to d until MemorySSAScanLimit is hit

The store to a,b,c cannot be eliminated because they each hit the limit, but the work is done towards checking all accesses to d.
A potential helper would be a caching mechanism similar to what MemorySSA does internally. Along the lines of: if the first store to d is found as NoAlias and subsequent access is a load to the same MemoryLocation, this is known NoAlias.
Also worth considering for reducing compile-time, if it's not a pattern as straight-forward as store/load/mem_cpy to a precise locations to just bail-out and not do the AA call.

This comment is solely focused on performance when trying to turn this on.

1989

Also for performance, it would be great to have some metrics into what's more expensive to perform first: isWriteAtEndOfFunction or GetUnderlyingObjects+searching the map InvisibleToCallerAfterRet.

llvm/test/Transforms/DeadStoreElimination/MSSA/combined-partial-overwrites.ll
255 ↗(On Diff #272756)

Look unrelated to the patch? Perhaps commit separately as NFC?

llvm/test/Transforms/DeadStoreElimination/MSSA/memintrinsics.ll
38

Commit first as NFC?
Same for below?

This revision is now accepted and ready to land.Jun 23 2020, 12:47 PM
Whitney added inline comments.
llvm/test/Transforms/DeadStoreElimination/MSSA/mda-with-dbg-values.ll
72–73

update comment.

Whitney added inline comments.Jun 23 2020, 4:27 PM
llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-captures.ll
210

Some of the TODO comments can also be removed.

fhahn marked 4 inline comments as done.Jun 24 2020, 4:53 AM

Thank you for the patch!

Minor comments on the tests for keeping the patch clean, but otherwise lgtm.

Please consider the performance comments for future improvements.

Thanks, I added TODO comments and plan to follow up to them soon. I'll try to collect a few cases with compile-time outliers for further analysis/tuning and share them.

llvm/test/Transforms/DeadStoreElimination/MSSA/combined-partial-overwrites.ll
255 ↗(On Diff #272756)

Done, thanks!

llvm/test/Transforms/DeadStoreElimination/MSSA/mda-with-dbg-values.ll
72–73

Done, thanks!

llvm/test/Transforms/DeadStoreElimination/MSSA/memintrinsics.ll
38

Done, thanks!

llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-captures.ll
210

Done, thanks!

This revision was automatically updated to reflect the committed changes.