This is an archive of the discontinued LLVM Phabricator instance.

[MergedLoadStoreMotion] Sink stores if they have common GEP
Needs ReviewPublic

Authored by dendibakh on Sep 5 2019, 11:03 AM.

Details

Summary

If 2 stores in the diamond have common GEP:

bb1:
  %tmp = getelementptr inbounds i32, i32* %arg1, i64 1
  br i1 %arg2, label %bb2, label %bb3
bb2:                                              ; preds = %bb1
  store i32 42, i32* %tmp, align 4
  br label %bb4
bb3:                                              ; preds = %bb1
  store i32 42, i32* %tmp, align 4
  br label %bb4
bb4:                                              ; preds = %bb2, %bb3
  ret void

Stores will be sunk into %bb4.

Diff Detail

Event Timeline

dendibakh created this revision.Sep 5 2019, 11:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2019, 11:03 AM
dendibakh edited the summary of this revision. (Show Details)Sep 5 2019, 11:04 AM
lebedev.ri resigned from this revision.Sep 6 2019, 5:20 AM

Kind reminder. @bjope , please review.

Kind reminder. @bjope , please review.

Sorry, I haven't had time to look at this.

Just like with the previous patch (that I did help out reviewing), all I can do is to try to figure out if the code is doing what you aim for. Not if this is good in general and how it fits together with other passes.
You should probably add some more reviewers here. I suppose there should be a code owner for these kinds of transforms? Maybe find someone that knows more about the strategy for these kinds of optimizations?
I happen to know that the load hoisting, earlier present in this pass, was moved, I think it was to GVNHoist. So is the long term plan to add these kinds of rewrites to GVNSink?

Another thing to consider as a reviewer is the cost/value perspective (if the benefits of these changes are worth the added code complexity).
You do not write anything about how this impacts any benchmarks etc., so it is really hard for me to understand if this is good in general or just a micro optimization.

FWIW, personally I'm not that interested in this pass. I just happened to fix a problem codegen not being debug invariant once upon a time.

Maybe you can start out by adding some more info in the description about:

  • any measured results (what is the benefit)
  • the "origin" of this patch (is this part of some RFC, any trouble reports about lack of optimizations, just something that you discovered was missing)

And if you do not find anyone who feels more concerned about these kinds of transforms, then I can try to find some time to look at it (since I'm trying to contribute with some patches myself from time to time I actually try to also contribute with reviews... but I actually don't have that much knowledge when it comes to optimization strategies...).

Thanks for getting back to this.
GVNSink and SimplifyCFG only sink common tails of 2 BBs. I.e. they will not cherry-pick any individual stores from the middle of a BB.
That was the motivation for a little improvement. According to my testing it doesn't have measurable performance impact. I saw fluctuations within 1% which I think is caused by code placement.
I will try to find someone to review.

Maybe @efriedma can review this.

Broadly, this seems like a small enough change that it would be worth taking for now, even if we expect the pass to become obsolete eventually. But there isn't much point if you aren't seeing any measurable benefit. Could you give some idea how frequently this triggers?

llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
284

Why does it matter here if the address is a GEP, as opposed to some other Value?

Instead of checking equality, could we check if the pointers are MustAlias? Would that have any significant impact on how frequently the transform triggers?

296

Is there a check somewhere that ensures InsertPt is actually a legal insertion point? You can't insert instructions into a catchswitch.

299

Do we need to do something with the store's alignment here?