This is an archive of the discontinued LLVM Phabricator instance.

Add MemorySSA alternative to AliasSetTracker in LICM.
AbandonedPublic

Authored by asbirlea on Jul 21 2017, 3:38 PM.

Details

Summary

This patch adds MemorySSA as an alternative to the AliasSetTracker in LICM.
I'm also using this to list several issues I came across, and get feedback on best ways to address them.
At the end of the day, doing such a replacement may require major rewrites to LICM, this patch
still tries to fit into the pattern that already exists in LICM, that, for promotion in particular,
is not ideal for using MemorySSA.

This patch is intended to add the functionality without enabling it by default.
Work for follow-up patches:
---> Teach all loop passes to preserve MemorySSA
---> Expose alias information for Uses in MemorySSA

Issues:

  • (Current approach needs review) Adding MemorySSA as a loop dependency ------>This patch adds MemorySSA as a dependency to all loop passes. It is added as a required pass that gets invalidated if used in LICM. The ideal scenario is to have all loop passes preserve MemorySSA. ------>MemorySSA is *not* used by default in LICM in this patch. However it is always *built*. ------>Patch updates tests to reflect MemorySSA pass being run/invalidated: test/Other/loop-pm-invalidation.ll test/Other/new-pass-manager.ll test/Other/pass-pipelines.ll ------>Patch does *not* update "unittests/Transforms/Scalar/LoopPassManagerTest.cpp" (fails when LICMwMSSA=true)
  • For hoist/sink, it is possible to get "less than perfect" results, due to the fact that we intentionally allow

some imprecision, by using getDefiningAccess instead of getClobberingAccess. MemorySSA has a cap on the number of stores/phis it will walk past when optimizing Uses (memssa-check-limit).
------>(Done - Working as intended) See "BIG NOTE" in LICM.cpp.

  • In order to keep the promotion mechanism the same, we use MemorySSA to create some alias sets.

------>(Major refactor needed, will not address in this patch) Changing the promotion mechanism will require some major changes, I'm not sure this is the path to go on now.
------>(Done) We still face a performance penalty if we allow full sets to be created. This time, we cannot rely on the
imprecision approach used for hoist/sink. See the follow up to BIG NOTE in MemorySSAAliasSets.h
Patch has a cap on the number of sets and number of MemorySSA queries that did work (vs used cached info).
------>(TODO added in MemorySSAAliasSets.h) We can avoid calling into AA for uses. From discussions with dberlin, this info is available and could be exposed by MemorySSA. For defs, we still need to query AA.
------>(Done) [refactor] Move LICM:collectChildrenInLoop to Utils and reuse in MemorySSAAliasSets.h to avoid recursion. This is required to avoid blowing up stack (D35609).

Event Timeline

asbirlea created this revision.Jul 21 2017, 3:38 PM
davide edited edge metadata.Jul 21 2017, 3:47 PM

Thanks, I'll review this soon.
A quick question. a concern people have about MemorySSA is that you pay upfront for O(1) queries, which may have some impact on compile time.
(e.g. in EarlyCSE). AST is already O(N^2), and in some cases badly so, but I wonder if you have compile time measurements of the impact of your change?

As far as I have tested, the cost of building MemorySSA is low because it does not incur all the cost up front.
More precisely, "optimizeUses" is capped. Only when the clobbering info is requested (if this was not part of the initial optimize pass) there's a pass to obtain it.
For hoist/sink, LICM takes advantage of this.

For promotion, LICM does not use it in this patch, and as I mentioned in the description, there is no cap. So the test in PR28670/PR28832 will not be happy.
I have a version that adds a threshold, but it does more work creating the sets. With the current patch, a small extension to MemorySSA should be enough to get a similar threshold.
With sink/hoist/promotion capped like this, the test in the PRs I mentioned runs on par with the saturated AST.

I tried to add extensive comments in code to describe ways to minimize cost, please let me know if they're unclear.

I plan to get more results for compile times after I address all issues, and before landing. I'll appreciate others testing this once this is ready to land.

dberlin edited edge metadata.Jul 21 2017, 5:12 PM

Thanks, I'll review this soon.
A quick question. a concern people have about MemorySSA is that you pay upfront for O(1) queries, which may have some impact on compile time.

FWIW: This is and always has been false accounting in most passes as you know.
Most passes that could use MemorySSA already ask about every load and store in the function.
The only reason people think it's faster to not use MemorySSA, or to avoid the upfront cost, is because our time-passes doesn't account function time to lazy utilities/analysis properly.
If it did, AA/etc time would look *much* larger than it does now, because in reality, it *is* much larger than it seems.

IE if you avoid optimize uses, and call getClobberingAccess on every load/store, "MemorySSA" time goes down, overall time goes up.

(e.g. in EarlyCSE). AST is already O(N^2), and in some cases badly so, but I wonder if you have compile time measurements of the impact of your change?

MemorySSA as alias sets (IE as done promotion) in this manner is not going to be faster in a lot of cases.

But you are going to have to rewrite how promotion in LICM functions (IE not use alias sets) to get the advantages of really using MemorySSA.
RIght now it's kinda wonky

Thanks, I'll review this soon.
A quick question. a concern people have about MemorySSA is that you pay upfront for O(1) queries, which may have some impact on compile time.

FWIW: This is and always has been false accounting in most passes as you know.
Most passes that could use MemorySSA already ask about every load and store in the function.
The only reason people think it's faster to not use MemorySSA, or to avoid the upfront cost, is because our time-passes doesn't account function time to lazy utilities/analysis properly.
If it did, AA/etc time would look *much* larger than it does now, because in reality, it *is* much larger than it seems.

IE if you avoid optimize uses, and call getClobberingAccess on every load/store, "MemorySSA" time goes down, overall time goes up.

Yes :) Sorry if it wasn't clear, I'm fairly sure this is the correct way forward.
You know this already, but for the records.
What I was referring to is that when EarlyCSE was switched to MemorySSA that resulted in a "regression", which was basically "compute MemorySSA" (this doesn't necessarily apply here as the old earlyCSE was using a fairly simple scheme while LICM uses AST which is already costly).
This could've been avoided if GVNHoist was still the default and both updated MemSSA so you didn't pay the cost of recomputing.
I guess the more stuff we move in the compiler to use MemSSA (and teach to preserve, the more we amortize the cost), but I was curious about what was the situation today with this change :)

(e.g. in EarlyCSE). AST is already O(N^2), and in some cases badly so, but I wonder if you have compile time measurements of the impact of your change?

MemorySSA as alias sets (IE as done promotion) in this manner is not going to be faster in a lot of cases.

But you are going to have to rewrite how promotion in LICM functions (IE not use alias sets) to get the advantages of really using MemorySSA.
RIght now it's kinda wonky

fhahn added a subscriber: fhahn.Aug 23 2017, 9:22 AM
jlebar added a subscriber: jlebar.Sep 14 2017, 2:18 PM
asbirlea updated this revision to Diff 115328.Sep 14 2017, 5:33 PM

Update to head following fixes in dependent revisions.
Updated description. Clang-format.
[Still a few items remaining to address]

asbirlea updated this revision to Diff 116086.Sep 20 2017, 2:59 PM

Replace creation of alias sets for promotion with non-recursive method.
There is no gain here compared with AST as long as we have to create these sets (i.e. fit into the current pattern for promotion), and this implementation is more legible and easier to add a threshhold.

Remaining issue in this patch: Resolve what is the best approach to have MemorySSA as a dependency for LICM (before all Loop passes preserve MemorySSA, and without splitting the loop pass pipeline)
Currently failing tests due to this issue:
unittests/Transforms/Scalar/ScalarTests/LoopPassManagerTest
Transforms/LoopUnroll/unroll-loop-invalidation.ll

asbirlea edited the summary of this revision. (Show Details)Sep 20 2017, 3:02 PM
asbirlea edited the summary of this revision. (Show Details)Sep 20 2017, 3:09 PM
asbirlea edited the summary of this revision. (Show Details)Sep 20 2017, 3:12 PM
asbirlea edited the summary of this revision. (Show Details)Sep 20 2017, 3:48 PM
asbirlea edited the summary of this revision. (Show Details)Sep 20 2017, 3:52 PM
asbirlea updated this revision to Diff 116281.Sep 21 2017, 3:20 PM

Remove MemorySSA as the default for LICM. All tests passing.
The default remains the Alias Set Tracker with the current state of the patch.
MemorySSA is still built, but never invalidated or used(!). It is only preserved by LICM
when LICMwMSSA is set to true which causes unittests/Transforms/Scalar/LoopPassManagerTest.cpp to fail.

As such, the patch in its current state:

  • adds most of (all?) the functionality to use MemorySSA in LICM.
  • does not enable any of this functionality
  • it is *not* a NFC, since MemorySSA is added as a Loop dependency and built, so there may be some added overhead.
asbirlea edited the summary of this revision. (Show Details)Sep 21 2017, 3:26 PM
sanjoy requested changes to this revision.Sep 26 2017, 5:42 PM

I have added some generic style comments.

As discussed in person, I think we should split out the MemorySSAAliasSets class, and add some targeted unit tests.

Please also add a comment describing the algorithm at a high level.

include/llvm/Analysis/MemorySSAAliasSets.h
10 ↗(On Diff #107731)

s/enablei/enable/

49 ↗(On Diff #107731)

The whitespace is a bit off here (clang-format?).

98 ↗(On Diff #107731)

You probably should split this out into header and a .cpp -- that way telling the interface apart from the implementation will become easier.

include/llvm/Support/Debug.h
102

I'd prefer to spell this out as UseLICMWithMSSA.

include/llvm/Transforms/Scalar/LoopPassManager.h
350

LLVM style is to avoid braces around single line statements like this.

include/llvm/Transforms/Utils/MemorySSAAliasSets.h
11

s/enablei/enable/

29

Opening namespaces in headers is against LLVM style.

35

I'd prefer having these under accessors.

44

I'm not sure about the name -- how about MustAliasSet? The "single" part is implied by the fact that the name is singular, and this is semantically a "set" and not a "vec".

55

You probably don't need the CanNeverPromote(false) bit.

59

This will be a line shorter without an early return.

87

These need to be cl::opt s

You should also split this header into a .cpp file (and put the cl::opt s in that .cpp file).

91

You should use LLVM_ENABLE_DUMP and LLVM_DUMP_METHOD for this. See the definition and declaration of e.g. DominanceFrontierWrapperPass::dump()

97

Can be a ternary. Same for loopContains

113

I'd structure this as:

if (MemLocI)
  return AA->alias(MemLocI.getValue(), MemLocJ);
// Rest of the code
248

In loopContains you've handled the case where L is nullptr, but you've not done the same here.

254

Can this be for (auto &Acc : *Accesses)?

261

You can do:

if (const auto *Use = dyn_cast_or_null<MemoryUse>(&Acc))
  addToAliasSets(Use);

also, the second if can be an else if.

285

Can you please give this a more obvious name, like replaceMemoryLocation?

Also, the name LoadI is misleading -- it isn't the load instruction, but it is the memory location, right?

This revision now requires changes to proceed.Sep 26 2017, 5:42 PM
asbirlea abandoned this revision.Apr 2 2019, 11:59 AM

Stale, no longer applies.