This is an archive of the discontinued LLVM Phabricator instance.

[MergedLoadStoreMotion] Conditional store elimination
AbandonedPublic

Authored by chill on Jul 7 2021, 3:46 AM.

Details

Summary

This patch implements replacement of certain conditional stores with
unconditional ones (subject to constraints).

A triangle shaped part of the CFG like:

header:
  ...
  br %cond, label %if.then, label %if.end
    
if.then:
  store %s.new, %addr
  br label %if.end
    
if.end:
   ...

would become

 header:
   ...
   br %cond, label %if.then, label %if.else
    
if.then:
   br %if.end
    
if.else:
  %s.old = load %addr
  br label %if.end
    
if.end:
  %s = phi [%s.new, if.then], [%s.old, if.else]
  store %s, %addr

This transformation is correct as long as the store:
a) is not volatile
b) does not introduce an invalid memory access in the new location
c) does not introduce a data race in the new location

For a) volatile stores are simply disqualified from the transformation.

To satisfy b) we can check that on all paths leading up to the end of the
header block

  • the program already contains a write (or, for local objects only, a read) to precisely the same memory location, and
  • for non-local objects only, following that write, there is no instruction that could possibly make subsequent writes invalid (local objects are considered always writable).

To satisfy c), for local objects only, we can check that the address of the
object does not escape the function.

For non-local objects or for escaping local objects we can check that on all
paths leading up to the end of the header block

  • the program already contains a write to precisely the same memory location, and
  • following that write, there is no instruction that could possibly be the tail edge of a "synchronizes-with" relation

If the candidate store is to a local variable, we first traverse the users of
the alloca instruction, noting whether the address escapes and whether a load
or a store to the same address dominates the candidate store (domination is a
stronger constraint than the above "on all paths" one).

Failing that, or for stores to non-local objects, we then traverse the MemorySSA
graph, starting from the MemoryDef that corresponds to the candidate
store. During that travesal:

  • if we reach the initial liveOnEntry, nothing is guaranteed and we fail
  • if we reach a call to an unknown function, a volatile memory access, or an atomic memory access we bail
  • if we reach a simple store to the same memory location, we stop traversing upwards from this MemoryDef only
  • otherwise we contionue the traverse to the incoming MemoryDef

If we didn't bail anywhere in the above traversal, the transformation is
considered correct.

Diff Detail

Event Timeline

chill created this revision.Jul 7 2021, 3:46 AM
chill requested review of this revision.Jul 7 2021, 3:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 7 2021, 3:46 AM
chill added a comment.Jul 7 2021, 4:04 AM

Oh, and I'm going to work on adding tests.

lebedev.ri retitled this revision from Conditional store elimination to [MergedLoadStoreMotion] Conditional store elimination.Jul 7 2021, 4:13 AM
lebedev.ri edited the summary of this revision. (Show Details)

Warning: this pass clearly modifies CFG, and clearly does not update dominator tree,
which means you can not use dominator tree in this pass,
at least not until fixing the pass to preserve it when modifying CFG.

chill added inline comments.Jul 8 2021, 2:43 AM
llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
491

Note to self: should be V == cast<StoreInst>(I)->getValueOperand())

chill updated this revision to Diff 357569.Jul 9 2021, 10:49 AM
chill edited the summary of this revision. (Show Details)

Updated with a very different implementation. Now handles non-local stores as well.
Updated the description with a justification why I think this transformation is correct.

chill added a comment.Jul 9 2021, 10:52 AM

Warning: this pass clearly modifies CFG, and clearly does not update dominator tree,
which means you can not use dominator tree in this pass,
at least not until fixing the pass to preserve it when modifying CFG.

Thanks. I think all accesses to the dominator tree (and on the next variants to the MemorySSA graph) occur
before any changes to the CFG.

chill updated this revision to Diff 358202.Jul 13 2021, 2:23 AM

In this updates:

  • added tests
  • update the Dom tree, preserve DomTreeAnalysis
  • small fixes here and there
bjope resigned from this revision.Jul 13 2021, 2:28 AM

It looks like the patch does Partial Redundancy Elimination for store with triangle CFG... As far as I know, LLVM has passes to support the PRE.
If you have already checked the llvm passes, can you let me know why the passes do not handle the triangle CFG with store instruction please?

chill added a comment.Jul 15 2021, 9:53 AM

It looks like the patch does Partial Redundancy Elimination for store with triangle CFG... As far as I know, LLVM has passes to support the PRE.
If you have already checked the llvm passes, can you let me know why the passes do not handle the triangle CFG with store instruction please?

I'm not aware of a pass that does PRE on stores. GVN does a limited form of PRE
on diamonds, but it does not deal with void instructions (such is stores).
DSE only handles fully redundant stores, as far as I can tell.

Even if we have PRE on stores, we still need this pass (or a variant thereof) to
create the (partial) redundancy, to feed to or as a step of a (hypothetical?)
Partial-DSE pass, since we don't have actually a case for a PRE - no two stores,
one to be considered (partially) redundant.

In any case, I'm open to suggestions about better places for this
transformation.

@mkazantsev I can see you have worked with PRE recently. If possible, can you recommend the point where this transformation is useful please?

SjoerdMeijer added a subscriber: efriedma.

Adding @efriedma to see if this is the right place for this.

Assuming it is, for now, I plan to look at this soon.

High-level question(s) first before I dive more into the details: I was wondering if the main benefit of this is simplification of control flow? Which then probably relies on the store that is being sunk the only instruction in the block? In other words, I was wondering if this is always a good thing to do. Or can we can regress things if it's not the only instruction, or it the other path (the one that does not have the store) now has to deal with the condition?

llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
99

It's probably good to add some of the justification and legality considerations here, i.e. the things you added to the description of this ticket.

At a high level, I think this needs some heuristic to drive it. I can see this being profitable under two circumstances, broadly speaking:

  1. Sinking the store enables DSE to eliminate an earlier store.
  2. Sinking the store enables vectorization.

Otherwise, you're just performing extra memory operations, which doesn't seem like an improvement.

llvm/test/Other/opt-O3-pipeline.ll
144

We don't want an additional MemorySSA run, if we can avoid it.

If we're going to do this, maybe stick the code into Dead Store Elimination?

chill added a comment.Aug 2 2021, 7:52 AM

Hello,

Thanks for the comments and the suggestions.
The main objective of this patch is to simplify the CFG, but, indeed, the patch itself does not
make the CFG simpler.
As it turns out, there's already a very similar transformation done in SimplifyCFG, so I have opted
to adding extra correctness checks (respectively, trigger in more cases) *there*.

I'm abandoning this patch in favour of https://reviews.llvm.org/D107281

chill abandoned this revision.Aug 3 2021, 2:06 AM