Page MenuHomePhabricator

[DSE] Optimize defining access of defs while walking upwards.
AcceptedPublic

Authored by fhahn on Oct 22 2021, 6:33 AM.

Details

Summary

This patch extends the code that walks memory defs upwards to find
clobbering accesses to also try to optimize the clobbering defining
access.

We should be able to find set the optimized access of our starting def
(KillingDef), if the following holds:

  1. It is the first call of getDomMemoryDef for KillingDef (so Current == KillingDef->getDefiningAccess().
  2. No potentially aliasing defs are skipped.

Then if a (partly) aliasing def is encountered, it can be used as
optimized access for KillingDef. No further optimizations can be
applied to KillingDef.

I'd appreciate a careful look, as the existing documentation is not too
clear on what is expected for optimized accesses.

The motivation for this patch is to use the optimized accesses to cover
more cases of redundant stores as follow-up to D111727.

Diff Detail

Event Timeline

fhahn created this revision.Oct 22 2021, 6:33 AM
fhahn requested review of this revision.Oct 22 2021, 6:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 22 2021, 6:33 AM
fhahn updated this revision to Diff 381565.Oct 22 2021, 9:00 AM

remove stray edit of getLocForWriteEx, add some comments.

nikic added inline comments.Oct 25 2021, 2:55 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1410

I think this is incorrect if KillingDef accesses multiple locations. Say you have a memcpy, then KillingLoc will point to the write location only, and that's the only one for which we will check aliasing. This means that the optimized access might look past a clobber of the read location.

nikic requested changes to this revision.Oct 25 2021, 2:28 PM

Marking as changes requested per above comment.

As a more general comment, I'm somewhat concerned about setting and accessing the optimized access outside the caching MSSA walker, as it may result in unexpected behavior changes. E.g. if you run a pass before DSE that uses the walker and which sets an optimized access, but would not be set by this code (e.g. because it requires looking through a MemoryPhi), then that optimized access would be used by D112315. If you did not run the pass beforehand, it wouldn't be used. Conversely, if another pass runs after DSE and uses the caching walker, it will now return the access optimized by DSE, but will it always be the same as the one produced by the walker? Without looking into it in detail, I'm going to guess "no", they can differ due to e.g. different cutoffs. Of course, behavior can already be influenced by adjacent passes because MSSA updating is imprecise (in the sense that it is not equivalent to rebuilding MSSA from scratch), but I think this may make things worse in that a pass dependency would exist even if IR is not modified -- only caches in MSSA are populated/used in an inconsistent manner.

This revision now requires changes to proceed.Oct 25 2021, 2:28 PM
fhahn added a comment.Oct 26 2021, 1:50 PM

Marking as changes requested per above comment.

As a more general comment, I'm somewhat concerned about setting and accessing the optimized access outside the caching MSSA walker, as it may result in unexpected behavior changes. E.g. if you run a pass before DSE that uses the walker and which sets an optimized access, but would not be set by this code (e.g. because it requires looking through a MemoryPhi), then that optimized access would be used by D112315. If you did not run the pass beforehand, it wouldn't be used. Conversely, if another pass runs after DSE and uses the caching walker, it will now return the access optimized by DSE, but will it always be the same as the one produced by the walker? Without looking into it in detail, I'm going to guess "no", they can differ due to e.g. different cutoffs. Of course, behavior can already be influenced by adjacent passes because MSSA updating is imprecise (in the sense that it is not equivalent to rebuilding MSSA from scratch), but I think this may make things worse in that a pass dependency would exist even if IR is not modified -- only caches in MSSA are populated/used in an inconsistent manner.

That's a good point, having results depend on memory-ssa caching can lead to unexpected/hard-to-reproduce changes. But I think we already have the same issue with the existing users of walkers. I'd be fine with just keeping a mapping in DSE, but it seems like it might be helpful for other passes to optimize memory-ssa if possible.

Conversely, if another pass runs after DSE and uses the caching walker, it will now return the access optimized by DSE, but will it always be the same as the one produced by the walker?

I *think* the patch should only set the optimized access *if* we hit a 'guaranteed' write-clobber, so this should be the 'nearest' dominating write-clobber. It intentionally does not set the optimized access for mem-phis or non-aliasing writes. I think this should be consistent with the walker as we only set the optimized access if it could not be skipped by the walker. That is modulo differences in the AA interpretation used between the walker and isOverwrite.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1410

But isn't the optimized access only for write clobbers? This is the part that is not really documented well, but going from the use in getClobberingMemoryAccess it looks like it should be the nearest dominating write clobber

nikic added inline comments.Oct 26 2021, 2:54 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1410

Yes, it's about write clobbers. The problem is that a MemoryDef may access multiple locations, and the optimized access is the nearest write clobber on any of those locations. Your code (unless I'm missing something) will only find clobbers to the write location of KillingDef, but not to any additional read locations it may have.

; assuming p, q noalias
write(p) <-- optimized access found by this implementation
write(q) <-- correct optimized access
memcpy(p, q)
fhahn updated this revision to Diff 382589.Oct 27 2021, 3:23 AM

Do not optimize accesses for instructions also reading from memory. With the current guarantee that getLocForWriteEx only supports instructiosn that modify a single location this should guarantee that KillingDef only accesses a single location.

fhahn marked an inline comment as done.Oct 27 2021, 3:27 AM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1410

You are right, I my thinking about mem-defs was a bit backwards, we also need to consider defs of the read access here. I updated the code to not optimize killing defs that also read from memory. With the current restriction to only support writing instructions with a single memory location that should guarantee that only defs which access a single location (KillingLoc) are optimized.

I added a test in 1a2a7cca3e43 which I think may be mis-optimized due to the issue you described, if we support eliminating redundant memcpys based on earlier memcpys.

Conversely, if another pass runs after DSE and uses the caching walker, it will now return the access optimized by DSE, but will it always be the same as the one produced by the walker?

I *think* the patch should only set the optimized access *if* we hit a 'guaranteed' write-clobber, so this should be the 'nearest' dominating write-clobber. It intentionally does not set the optimized access for mem-phis or non-aliasing writes. I think this should be consistent with the walker as we only set the optimized access if it could not be skipped by the walker. That is modulo differences in the AA interpretation used between the walker and isOverwrite.

Right. For example DSE uses the earliest escape analysis, so it may find and cache a better (i.e. more dominating) clobber than the walker usually would.

Maybe it's okay. Would be great for @asbirlea to chime in.

fhahn marked an inline comment as done.Wed, Nov 3, 6:33 AM

Conversely, if another pass runs after DSE and uses the caching walker, it will now return the access optimized by DSE, but will it always be the same as the one produced by the walker?

I *think* the patch should only set the optimized access *if* we hit a 'guaranteed' write-clobber, so this should be the 'nearest' dominating write-clobber. It intentionally does not set the optimized access for mem-phis or non-aliasing writes. I think this should be consistent with the walker as we only set the optimized access if it could not be skipped by the walker. That is modulo differences in the AA interpretation used between the walker and isOverwrite.

Right. For example DSE uses the earliest escape analysis, so it may find and cache a better (i.e. more dominating) clobber than the walker usually would.

Good point, I should have said 'at least as good' compared to most uses of the walkers, depending on the AA the walker uses.

Maybe it's okay. Would be great for @asbirlea to chime in.

That would be great, as this seems a fundamental question when interacting with MemSSA, i.e. how passes should or shouldn't optimize MemorySSA as they go along.

fhahn added a comment.Tue, Nov 9, 10:44 AM

ping @asbirlea :)

Do you think DSE should optimize MemorySSA or is that better left to the official walkers only?

Allowing passes to do this is a slippery slope...
We already have the issue that it is sometimes hard to reproduce an issue with a single pass, due to the cached state in MemorySSA. This cached state is already dependent on what other passes do, a mix between queries and transforms, which may leave MemorySSA either "over-optimized" (has stored info beyond what it could deduce if built from scratch) or "under-optimized" (could deduce more). Having DSE optimize more is not that far fetched considering this.
I'm inclined to let DSE do this, only because 1) the traversals and inferences are beyond what MSSA alone does now and 2) they're "free" (i.e. DSE does them anyway).
However, I would not be ok with any pass being able to set optimized accesses.

Actionable feedback for this patch: can you introduce a cl::opt flag and set CanOptimize = flag && current_conditions, and document what the flag does and why.
Set the flag to false until there is a concrete case where this will be used (does it further affect the results in D112315?).
Have a separate patch that just turn flag to true and evaluate that.

The flag can be used when attempting to reproduce behavior, to determine if the caching from DSE affects results. It's also a quick way to reverse the decision of allowing any pass to do this.

fhahn updated this revision to Diff 386647.Thu, Nov 11, 1:32 PM

Allowing passes to do this is a slippery slope...
We already have the issue that it is sometimes hard to reproduce an issue with a single pass, due to the cached state in MemorySSA. This cached state is already dependent on what other passes do, a mix between queries and transforms, which may leave MemorySSA either "over-optimized" (has stored info beyond what it could deduce if built from scratch) or "under-optimized" (could deduce more). Having DSE optimize more is not that far fetched considering this.
I'm inclined to let DSE do this, only because 1) the traversals and inferences are beyond what MSSA alone does now and 2) they're "free" (i.e. DSE does them anyway).
However, I would not be ok with any pass being able to set optimized accesses.

That policy sounds good to me, thanks for sharing! Should we add something like that to the MemorySSA docs?

Actionable feedback for this patch: can you introduce a cl::opt flag and set CanOptimize = flag && current_conditions, and document what the flag does and why.
Set the flag to false until there is a concrete case where this will be used (does it further affect the results in D112315?).
Have a separate patch that just turn flag to true and evaluate that.

I added a flag in the latest version.

Allowing passes to do this is a slippery slope...
We already have the issue that it is sometimes hard to reproduce an issue with a single pass, due to the cached state in MemorySSA. This cached state is already dependent on what other passes do, a mix between queries and transforms, which may leave MemorySSA either "over-optimized" (has stored info beyond what it could deduce if built from scratch) or "under-optimized" (could deduce more). Having DSE optimize more is not that far fetched considering this.
I'm inclined to let DSE do this, only because 1) the traversals and inferences are beyond what MSSA alone does now and 2) they're "free" (i.e. DSE does them anyway).
However, I would not be ok with any pass being able to set optimized accesses.

That policy sounds good to me, thanks for sharing! Should we add something like that to the MemorySSA docs?

Yes. I'll send something for review tomorrow.

Actionable feedback for this patch: can you introduce a cl::opt flag and set CanOptimize = flag && current_conditions, and document what the flag does and why.
Set the flag to false until there is a concrete case where this will be used (does it further affect the results in D112315?).
Have a separate patch that just turn flag to true and evaluate that.

I added a flag in the latest version.

Changes LG. @nikic?

nikic requested changes to this revision.Sat, Nov 20, 1:21 PM
nikic added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1337

Looking at how canSkipDef() is defined, I think this is wrong on two fronts:

First, canSkipDef() skips any mayThrow() instruction if !DefVisibleToCaller. But what if this is a non-nounwind call that clobbers KillingLoc?

Second, canSkipDef() skips lifetime intrinsics, but I believe those are considered clobbering by MSSA (they effectively write an undef value to the location).

1386–1387

On an unrelated note, why does this explicitly adjust WalkerStepLimit?

2119

Spurious change

This revision now requires changes to proceed.Sat, Nov 20, 1:21 PM
fhahn updated this revision to Diff 388914.Mon, Nov 22, 7:16 AM

disable MSSA optimizations when stepping over a 'skippable' def

fhahn marked 2 inline comments as done.Mon, Nov 22, 7:21 AM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1337

Good point, it's probably best to disable optimizations for any skipable def to start with.

1386–1387

I'm not sure, but looking at other continues we should be able to drop it.

2119

Removed it here, but I also need to remove the one below as well.

fhahn updated this revision to Diff 388915.Mon, Nov 22, 7:23 AM
fhahn marked 2 inline comments as done.

remove unrelated whitespace changes

nikic accepted this revision.Fri, Nov 26, 10:20 AM

LGTM

This revision is now accepted and ready to land.Fri, Nov 26, 10:20 AM