This is an archive of the discontinued LLVM Phabricator instance.

Hoist load based on alias analysis
ClosedPublic

Authored by delena on Oct 26 2014, 7:29 AM.

Details

Reviewers
Gerolf
hfinkel
Summary

I use alias analysis to define whether the given instruction is a barrier for load hoisting.
When I find 2 identical loads, I check all preceding instructions in the both basic blocks.
I'll also look at sinking stores in the next patch.

Diff Detail

Event Timeline

delena updated this revision to Diff 15456.Oct 26 2014, 7:29 AM
delena retitled this revision from to Hoist load based on alias analysis.
delena updated this object.
delena edited the test plan for this revision. (Show Details)
delena added a reviewer: Gerolf.
delena set the repository for this revision to rL LLVM.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: Unknown Object (MLST).

As a general note, please upload patches with full context, see: http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
270

This is not really sufficient, we also need to check the other possible Load flags (atomic properties, volatile, alignment). This is easy to do:

Load0->isSameOperationAs(Load1)

(it looks like this is all guarded on a check that the load isSimple, so we know the atomic properties and volatile are a non-issue, but the alignment still needs checking ).

Also, this line has a typo, (Load0->getType() == Load0->getType()) is always true.

test/Transforms/InstMerge/ld_hoist1.ll
19

I think these checks could be more specific. I think you can explicitly match the first gep and load and then match the second get (picking up the name by regex) but specifying the form, and then making sure the second load uses that address.

64

Remove unnecessary attributes.

68

Remove unnecessary metadata.

delena updated this revision to Diff 15474.Oct 27 2014, 1:47 AM
delena edited the test plan for this revision. (Show Details)

I followed Hal's comments and used IsIndenticalToWhenDefefined() - the same interface is used in SimplifyCFG.
isSameOperationAs() does the same with some excluded flags.
I also did all other suggested changes.

Thanks.

Gerolf edited edge metadata.Oct 27 2014, 12:46 PM

Does the new implementation catch any cases the old one didn't? There is extra cost by walking the range of instruction twice, and perhaps a small change would have the same effect.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
265

Typo. Did you mean IsIdenticalToDefined()? Or IsIdendicalLoad()?

267

Are you checking the range for load0 twice? See overall comment.

371

The optimization should work when one load is used outside the block. The optimization can work when both loads are used outside but we have to deal with the PHI nodes accordingly.

Does the new implementation catch any cases the old one didn't? There is extra cost by walking the range of instruction twice, and perhaps a small change would have the same effect.

[Demikhovsky, Elena] Exactly, not every store is a barrier for load hoisting. I check this with AliasAnalysis. I attached a test case where it works.

lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
265

IsIdenticalToDefined is a standard function to compare 2 instructions.

267

No. I check only once for each load. I can't say that instruction is a barrier before I see the load itself. Because I go back from bb start to the current instruction, I do it after I see two identical loads.

371

I did not change the current behavior in this case, but I plan to work more on these optimizations. Conditional load/store are barriers for vectorizer.

hfinkel added inline comments.Oct 28 2014, 7:09 AM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
265

The old code used AA and checked for a MustAlias result. The new code is just checking operand quality, this makes the check less powerful. Please keep the original form of the check *using MustAlias) and use isSameOperationAs as I had suggested.

The problem is that, on either side of the diamond, you could have address-equivalent but type-inequivalent GEPs. BasicAA can easily show that the resulting offset is the same, but they'll never really be hoisted (I can't say how common this is, and I presume that we don't have a regression test for it, but let's not reduce our capability to handle that situation here).

delena updated this revision to Diff 15566.Oct 30 2014, 4:04 AM
delena edited edge metadata.

The problem is that, on either side of the diamond, you could have address-equivalent but type-inequivalent GEPs. BasicAA can easily show that the resulting offset is the same, but they'll never really be hoisted (I can't say how common this is, and I presume that we don't have a regression test for it, but let's not reduce our capability to handle that situation here).

I checked the following code:

int *iin = (int *)in;
if (trigger[i] > 0) {

out[i] = in[i] + (float) 0.5;

}
else {

out[i] /= 3;
iout[i] += iin[i];

}
And tried to hoist loading in[i]. isSameOperationAs() filters it out because the types are different.

I uploaded the new diff anyway.

hfinkel accepted this revision.Oct 30 2014, 6:37 AM
hfinkel added a reviewer: hfinkel.

And tried to hoist loading in[i]. isSameOperationAs() filters it out because the types are different.

Understood, thanks for checking. When dealing with C++ classes, etc. there are a lot of bitcasts flying around, and I did not want to change behavior here without knowing what we might regress. We probably should handle the "one side will need a bitcast" case, but that is a separate issue.

LGTM, thanks!

This revision is now accepted and ready to land.Oct 30 2014, 6:37 AM

Please see below. I still think the compile-time concern needs more attention. Ignoring that the patch LGTM.

delena added a comment.Nov 2 2014, 1:50 AM

Thank you. Committed revision 221091.

I still think the compile-time concern needs more attention.

I do this check only after I see 2 identical loads. The alternative is checking barriers for every Load0 before I know that I have a candidate for hoisting in the second block.

Btw, SimplifyCFG() does half way by hoisting identical instructions from diamond, but we do load hoisting in a separate pass. Here I have a compile-time concern.
The same about store sinking.

  • Elena
delena closed this revision.Oct 24 2015, 11:59 PM