This is an archive of the discontinued LLVM Phabricator instance.

Use MemorySSA in LICM to do sinking and hoisting.
ClosedPublic

Authored by asbirlea on Nov 22 2017, 2:54 PM.

Details

Summary

Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.

Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which
relied on promotion, in order to test all sinking tests.

Diff Detail

Event Timeline

asbirlea created this revision.Nov 22 2017, 2:54 PM

Gentle ping!

Note this is still NFC since flag to enable use of MemorySSA remains disabled during this transition.
However I enabled testing to maintain correctness.

sanjoy requested changes to this revision.Dec 14 2017, 11:55 PM

Initial round of comments.

lib/Transforms/Scalar/LICM.cpp
159

Can this be an std::unique_ptr?

And why is it on the class as opposed to being a stack variable in runOnLoop like CurAST?

290

This seems unnecessary.

653–659

Probably better to leave this un-initialized; that way the compiler will warn if we forget to initialize it in one branch. You could also use a ternary here.

714

This can probably be a single llvm::any_of on CurLoop->blocks().

1647

How about call it isVolatileLoadOrLiveOnEntry?

1649

The usual idiom I've seen for this is:

if (!Processed.insert(Acc).second)
  return false;
1650

Should we be returning true in this recursive case? IE if we have a MemoryPhi that has one of its inputs as itself and the other input as live-on-entry then shouldn't that be equivalent to just a live-on-entry?

1656

I'd avoid testing for MemoryDef early -- instead I'd suggest doing this as:

if (auto *Phi = dyn_cast_or_null<MemoryPhi>(Acc)) {
} else if (auto *Def = dyn_cast_or_null<MemoryDef>(Acc)) {
}
1660

How about a range for on Phi->operands()? Or even better -- return llvm::all_of(Phi->operands(), <lambda>);

1667

Is this correct even if defining access for Def is a store in the loop?

1678

LLVM style is avoiding braces for one liners. You could also fold the assignment into the condition, like if (MemoryUseOrDef *AccI = ...)

1682

This can just be if (MemoryUse *MU = dyn_cast_or_null<MemoryUse>(MUD))

1699

s/if/If/

1700

This seems fishy -- if volatile loads do not invalidate pointers then why are they treated as defs by MSSA?

lib/Transforms/Scalar/LoopSink.cpp
291–292

How about adding /*MemorySSAUpdater=*/nullptr etc. here?

This revision now requires changes to proceed.Dec 14 2017, 11:55 PM
asbirlea updated this revision to Diff 127629.Dec 19 2017, 5:12 PM
asbirlea marked 14 inline comments as done.

Address comments.

lib/Transforms/Scalar/LICM.cpp
1649

Code in question removed.

1650

Right, but code in question removed.

1660

If using Phi->operands() I need to re-add the conversion from Use to MemoryAccess hidden in the macro in getIncomingValue().
New code would be:

return llvm::all_of(Phi->operands(), [MSSA, &Processed](const Use &Arg) {           
      return isVolatileLoadOrLiveOnEntry(
          cast_or_null<MemoryAccess>(Arg.get()), MSSA, Processed);                      
    });

Does this look better than existing?

Scratch that, code in question removed.

1667

Code updated, see comment below.

1700

Well, because they should be treated as defs. Motivation for special casing this was test/Transforms/LICM/volatile-alias.ll.
I updated the code to no longer consider volatile loads acceptable, meaning test in question will fail. And I believe that's the right thing.

sanjoy requested changes to this revision.Dec 20 2017, 4:31 PM

Some of the comments inline may be naive since I'm not very familiar with MemorySSA.

lib/Transforms/Scalar/LICM.cpp
398

Can this be a std::unique_ptr?

420

Should this assert be checking CurAST != nullptr ^ (MSSAUpdater != nullptr && MSSA != nullptr)?

690

Minor nit: I usually prefer leaving variables like Invalidated uninitialized; that way if I forget to assign to it on one branch of the if I'll get a warning and/or an ASan failure.

697

We should probably early exit if !Op->getType()->isPointerTy() (and assert Op->getType()->isPointerTy() in pointerInvalidatedByLoopWithMSSA etc.

706–711

Minor nit: even though unnecessary, I'd prefer using braces here since for readability.

863

I'm not sure that setting the defining access to nullptr here is correct -- if there is some subtle reason why it is, please add a comment.

865

Instead of two dyn_cast_or_nulls how about:

if (NewMemAcc) {
  if (auto *MemDef = dyn_cast<MemoryDef>(NewMemAcc)) {
    ...
  } else {
    auto *MemUse = cast<MemoryUse>(NewMemAcc);
  }
}

?

867

I'd prefer /*RenameUses=*/true.

873

I'm not clear on how this works -- IIUC you're calling into getClobberingMemoryAccess with NewMemUse's DefiningAccess set to nullptr, but:

  • Why don't we have to do this for MemoryDefs?
  • What if MSSA->getWalker() is a DoNothingMemorySSAWalker?
  • There are places where we should be crashing, like: CachingWalker::getClobberingMemoryAccess line 2078 that calls getClobberingMemoryAccess(DefiningAccess, Q) that calls ClobberWalker::findWalker which has if (auto *MU = dyn_cast<MemoryUse>(Start)).
1134–1135

Can this be cast_or_null?

1661

Stray comment?

1664

Minor nit: probably cleaner to use if (MSSA->isLiveOnEntryDef(Source) || !CurLoop->contains(Source->getBlock())) here.

This revision now requires changes to proceed.Dec 20 2017, 4:31 PM
asbirlea updated this revision to Diff 127969.Dec 21 2017, 4:32 PM
asbirlea marked 17 inline comments as done.

Comments.

asbirlea added inline comments.
lib/Transforms/Scalar/LICM.cpp
420

At some point I was testing with both enabled, but considering the added compile time when doing that, yes :).

873

AFAIU:
For Defs the DefiningAccess is properly updated.
For Uses, there is a DefiningAccess set, but that one is not optimized. When adding an access like in this case, calling getClobberingMemoryAccess will optimize it.
If the walker is DoNothingMemorySSAWalker, it won't optimize?

Pulling in george.burgess.iv to clarify, it's possible I'm missing something here.

Only looked at the bit I was mentioned in.

lib/Transforms/Scalar/LICM.cpp
873

IIUC you're calling into getClobberingMemoryAccess with NewMemUse's DefiningAccess set to nullptr

I'm assuming you meant MemUse, in which case insertUse sets up the defining access to something non-null.

It's sort of sketchy that we're trying to eagerly update the defining access using the walker, though. Is there a reason that we can't call MSSA->getWalker()->getClobberingMemoryAccess(MemUse); when we need optimized clobbers? Doing so should be incredibly cheap when the use has already been optimized, and it lets us be lazy about the work we do (if we don't need an accurate clobber for MemUse, we'll just never compute it)

asbirlea marked an inline comment as done.Dec 22 2017, 11:35 AM
asbirlea added a subscriber: dberlin.
asbirlea added inline comments.
lib/Transforms/Scalar/LICM.cpp
873

I'm assuming you meant MemUse, in which case insertUse sets up the defining access to something non-null.

Thanks for clarifying this.

It's sort of sketchy that we're trying to eagerly update the defining access using the walker, though. Is there a reason that we can't call MSSA->getWalker()->getClobberingMemoryAccess(MemUse); when we need optimized clobbers? Doing so should be incredibly cheap when the use has already been optimized, and it lets us be lazy about the work we do (if we don't need an accurate clobber for MemUse, we'll just never compute it)

The reasoning here is that when making code changes such as hoist/sink, I need accesses to be (reasonably) optimized. I'm not necessarily to looking to get full precision here, I'd be perfectly happy with an API to optimize up to the MemorySSA optimize-uses cap.
In LICM.cpp:1652 (and in the future for promotion) I'm relying on getting the defining access, not the walker->clobbering access, allowing for some imprecision in exchange for not paying the lookup time for blocks with large load/store count.
I had come across a case where after a hoist or sink, this approach did not work for a very small testcase without using the walker to get clobbering.
So the choice I made was to use the walker ahead of time to update accesses I insert when cloning, allowing me to use getDefining afterwards.

Hopefully I clarified the intention here :). Do you think there's a better way to address this problem? I'm open to suggestions.

@dberlin

So, this use renaming is expensive, and i'd like to understand the goal (It requires walking all stores and loads in the dominator subtree rooted at your memory access) It's really meant for when you've inserted defs that may alias existing uses.
If your goal is to replace all uses of an old memory access with some new memoryaccess, you want to use the replace API :)

The code here is creating a new memory access to handle cloning an access in all exit blocks, where that access may alias existing uses, so I think using the replace API isn't enough.
Please let me know if there is a way to handle this differently.

Gentle ping.

asbirlea updated this revision to Diff 131215.Jan 24 2018, 3:07 AM

Add a flag for the decision of using getDefining vs Walker->getClobbering, for optimizing pathological cases.

lebedev.ri added inline comments.
test/Transforms/LICM/sinking.ll
318

But there is no sink-promote.ll in this differential?
Is that in some other patch?

asbirlea updated this revision to Diff 131230.Jan 24 2018, 5:08 AM

Added sink-promote.ll. Thanks for catching this!

lib/Transforms/Scalar/LICM.cpp
873

(To be clear: the addition of the flag resolved my MemorySSA comments. While it's sort of hacky that we need to have it in the first place, the best place to handle "this {block,function} is pathologically big; let's give up instead of taking forever to walk it, and rewalk it, and rewalk it, and ..." is in MSSA, IMO. Until we make MSSA smart enough to actually do that, though, a targeted escape hatch seems like the best way forward to me.)

asbirlea marked 6 inline comments as done.Jan 26 2018, 10:03 AM

Thanks George, marking as Done. Also note that the optimization enables by this flag remains disabled.

Other reviewers: please let me know if there are remaining issues or concerns.
Please note that the use of MemorySSA remains disabled with this patch.

sanjoy added inline comments.Feb 21 2018, 1:11 PM
lib/Transforms/Scalar/LICM.cpp
1

Stray edit?

99

Line too long?

420

Probably better to have two asserts here:

  • assert((MSSAUpdater == nullptr) == (MSSA == nullptr))
  • assert((CurAST != nullptr) ^ (MSSA != nullptr))
871

Note: I did not revisit this bit since it looks like @george.burgess.iv settled it.

1140

I not convinced this is a good idea -- it looks like we're coding to the implementation of getClobberingMemoryAccess here instead of coding to its interface. But if @george.burgess.iv is fine with it, I am too.

1140

It looks like MemorySSA::CachingWalker::getClobberingMemoryAccess has an early exit if OldMemAcc was optimized. Are we okay not updating the defining access in that case?

1653

If MUD is always guaranteed to be a use then this should be cast_or_null. Otherwise please update the comment.

asbirlea updated this revision to Diff 135561.Feb 22 2018, 4:39 PM
asbirlea marked 4 inline comments as done.

Address comments.

asbirlea marked an inline comment as done.Feb 22 2018, 4:39 PM
asbirlea added inline comments.
lib/Transforms/Scalar/LICM.cpp
1140

If it's been optimized, then the defining access should have been updated, and viceversa.
@george.burgess.iv: Could you clarify this please? I don't see resetOptimized being called on any path for the call chain starting with moreToPlace->insertDef, though it would make sense for this to happen. Am I missing something here?

lib/Transforms/Scalar/LICM.cpp
1140

I not convinced this is a good idea -- it looks like we're coding to the implementation of getClobberingMemoryAccess here instead of coding to its interface. But if @george.burgess.iv is fine with it, I am too

In general, I agree, but Walkers are a sticky concept. The model we've sold people is "always call getWalker()->getClobberingMemoryAddress(MA) if you need the optimized access; the walker will handle making repeated calls fast," and we're specifically calling this on MSSA->getWalker().

I'll try to find a way to structure MSSA so that this guarantee is more obvious.

Am I missing something here?

Our theoretical isOptimized bit depends on two things: being flipped by a call to setOptimized, and either our defining access or optimized access having the same ID as it did when setOptimized was called. (...It also depends on stores not appearing out of thin air ;) )

So, when you move a Def, all Uses are implicitly invalided by the replaceAllUsesWith (since their defining access no longer has the same ID as before), and the Def's cached clobber is invalidated when you call setDefiningAccess on it.

Similarly, when you move a Use, we just set its defining access to whatever's closest. So, its cache will be invalidated (or not, if you move it right under its old clobber. But then the optimized use is still correct).

What I'm unsure about is what happens what happens to Defs optimized to a Def when said optimized-to Def is moved. I feel like I asked this during the original reviews for the caching/updater stuff, but I don't remember the answer. I'll do archaeology and get back to you. :)

(If the Def/Def case is uninteresting to this pass, feel free to not block this patch on my search)

lib/Transforms/Scalar/LICM.cpp
1140

OK, so assuming the fixes stick, https://bugs.llvm.org/show_bug.cgi?id=36529 should fix any def/def craziness we may see here.

Moving things around above the defs should also be fine, since if you have a Def A which has an "optimized" clobber B and you add a clobber between them, that sounds like a functional change (or, if nothing can observe that clobber, a waste of CPU time ;) ). Similarly, if B is moved such that a pre-existing clobber should become A's clobber, that sounds like a functional change.

I think this resolves all the questions. Please let me know if not.

Thank you for the fix, George. Yes, I think all issue are addressed now.

asbirlea updated this revision to Diff 147428.May 17 2018, 7:00 PM

Rebase and ping.

asbirlea updated this revision to Diff 166735.Sep 24 2018, 1:08 PM

Rebase patch after resect changes.

Updates:
Change test sink-promote due to PR38989.
Sink is incomplete with MSSA due to recent changes, added TODO and will extend in separate patch.

asbirlea updated this revision to Diff 180775.Jan 8 2019, 5:29 PM

Cleaned up after a series of MemorySSA updates, and rebased with ToT changes.

sanjoy requested changes to this revision.Jan 9 2019, 1:53 PM

I didn't carefully review the MSSA specific stuff since you're the domain expert here. I do have some general comments.

lib/Transforms/Scalar/LICM.cpp
93

Not sure if it is beneficial to split out a separate [Verbose] section -- maybe we can combine both into a large single comment?

107

Just to be clear, if this is true then opt -licm -some_pass_using_mssa is not the same as opt -licm | opt -some_pass_using_mssa? If yes, that needs to be explicitly (and loudly) mentioned.

475–476

Just to be clear, we don't have to tell MSSAU about these new blocks (and the edges between these new blocks)?

617–618

Update comment?

632

So if there is a memory phi then it has to be Acc->begin()?

Btw, it might be cleaner to express this as a loop:

int nonphi = 0;
for (auto *X : *Acc) {
  if (is_phi) { continue; }
  if (X->getInst() != I || nonphi++ == 1) { return false; }
}
return true;
739

Can be return isOnlyMemoryAccess(FI, CurLoop, MSSAU).

This revision now requires changes to proceed.Jan 9 2019, 1:53 PM
asbirlea updated this revision to Diff 180956.Jan 9 2019, 4:04 PM
asbirlea marked 13 inline comments as done.

Address comments.

I also cleaned up the usage of EnableLicmCap. It is only used in 2 places now, with no forced calls to getClobbering (the issue in MemorySSA requiring this is now resolved).
FWIW, I'm planning to replace this with an integer cap in a future patch (i.e. allow a limited number of calls to getClobbering, vs all or none like we have now).

lib/Transforms/Scalar/LICM.cpp
107

Hmm, let me think this out loud.

If the flag is true, there may be more opportunities for sink/hoist found at the cost of compile time, so the end result (module) may be different. But the users of the resulting MemorySSA should not be affected by the flag being true or false.

There may be more things already optimized in MemorySSA with the flag set to false.
And more things to be optimized on demand with the flag set to true.
So printing MemorySSA after LICM may have differences.
But the pass running after LICM should get the same results to MemorySSA clobbering queries.

Does this make sense?

475–476

MSSAU doesn't keep track of blocks with no memory accesses. This code if just duplicating the control flow from inside the loop to outside the loop. MSSAU only starts caring when instructions that touch memory are moved in there.

It also matters that the correct block predecessors are included in MemoryPhis, so when control flow changes to modify that, we need to update MSSA. The 3 lines below address this.

632

Yes, a block may have a single MemoryPhi, and it's always the first one in the list of Defs.

SG! Updated with loop.

sanjoy accepted this revision.Jan 9 2019, 4:40 PM

lgtm

This revision is now accepted and ready to land.Jan 9 2019, 4:40 PM
This revision was automatically updated to reflect the committed changes.