This is an archive of the discontinued LLVM Phabricator instance.

[LICM] Store promotion when memory is thread local
ClosedPublic

Authored by reames on Feb 1 2016, 12:37 PM.

Details

Summary

This patch teaches LICM's implementation of store promotion to exploit the fact that the memory location being accessed might be provable thread local. The fact it's thread local weakens the requirements for where we can insert stores since no other thread can observe the write. This allows us perform store promotion even in cases where the store is not guaranteed to execute in the loop.

Two key assumption worth drawing out is that this assumes a) no-capture is strong enough to imply no-escape, and b) standard allocation functions like malloc, calloc, and operator new return values which can be assumed not to have previously escaped.

In future work, it would be nice to generalize this so that it works without directly seeing the allocation site. I believe that the nocapture return attribute should be suitable for this purpose, but haven't investigated carefully. It's also likely that we could support unescaped allocas with similar reasoning, but since SROA and Mem2Reg should destroy those, they're less interesting than they first might seem.

Diff Detail

Repository
rL LLVM

Event Timeline

reames updated this revision to Diff 46568.Feb 1 2016, 12:37 PM
reames retitled this revision from to [LICM] Store promotion when memory is thread local.
reames updated this object.
reames added a subscriber: llvm-commits.
sanjoy added inline comments.Feb 1 2016, 1:27 PM
lib/Transforms/Scalar/LICM.cpp
928 ↗(On Diff #46568)

Though I think I understand it now, I found the code somewhat
difficult to follow because of the overlap between
GuaranteedToExecute && CanSpeculateLoad. Is it possible to
instead track CanSpeculateLoad and CanSpeculateStore?

reames added inline comments.Feb 1 2016, 2:30 PM
lib/Transforms/Scalar/LICM.cpp
928 ↗(On Diff #46568)

This would be a completely reasonable refactoring, but I'd strongly prefer to do that as a separate change.

sanjoy edited edge metadata.Mar 3 2016, 10:14 AM

Overall the logic lgtm. However, should this be disabled when optimizing for size?

lib/Transforms/Scalar/LICM.cpp
1012 ↗(On Diff #46568)

So part of the confusion here for me is that CanSpeculateLoad looks like it is really tracking something else. Does it make sense to rename it to CanPromoteAliasSet?

reames added a comment.Mar 4 2016, 1:44 PM

Overall the logic lgtm. However, should this be disabled when optimizing for size?

Why would the thread local version be disabled when optimizing for size and not the existing code? That doesn't seem to make any sense.

reames added inline comments.Mar 4 2016, 1:48 PM
lib/Transforms/Scalar/LICM.cpp
1012 ↗(On Diff #46568)

I really not getting the confusion here. Per the comment on the variable you're asking me to rename, it really is answering only the speculation legality question. In particular, that by itself is not enough to establish promotion is legal. That's the whole point.

sanjoy added a comment.Mar 4 2016, 3:19 PM

Overall the logic lgtm. However, should this be disabled when optimizing for size?

Why would the thread local version be disabled when optimizing for size and not the existing code? That doesn't seem to make any sense.

Given that you're potentially duplicating a store, if this change
makes the promotion fire a lot more often maybe that's bad for
-Os. But I have no hard numbers for this.

lib/Transforms/Scalar/LICM.cpp
1012 ↗(On Diff #46568)

I really not getting the confusion here.

That just proves you're smarter than I am. :)

My confusion initially was that you're checking if the store location
is dereferenceable, but yet use that to decide if the load can be
speculated. But now I get that this is really about proving that
any location in the alias set can be speculatively loaded from,
since they're all MustAlias.

Just as a understanding check -- if you remove this clause completely,
that will just be a quality of implementation regression, not a
correctness issue, right?

Overall the logic lgtm. However, should this be disabled when optimizing for size?

Why would the thread local version be disabled when optimizing for size and not the existing code? That doesn't seem to make any sense.

Given that you're potentially duplicating a store, if this change
makes the promotion fire a lot more often maybe that's bad for
-Os. But I have no hard numbers for this.

I'd argue that either the whole transform should be restricted in Os, or not. My change isn't a change to any of the profitability heuristics, just a legality one.

p.s. Once you've considered my points, can you give another explicit LGTM? I want to make sure that still stands after our follow on discussion.

lib/Transforms/Scalar/LICM.cpp
1012 ↗(On Diff #46568)

Correct. Using the store to establish dereferenceability lets the transform kick in more often, but is not required for correctness.

I'll add a clarify comment about the mustalias bit. That's not obvious from the code and might help to prevent future confusion.

sanjoy accepted this revision.Mar 9 2016, 12:46 PM
sanjoy edited edge metadata.

lgtm

This revision is now accepted and ready to land.Mar 9 2016, 12:46 PM
This revision was automatically updated to reflect the committed changes.