This is an archive of the discontinued LLVM Phabricator instance.

[GlobalOpt] Demote globals to locals more aggressively
ClosedPublic

Authored by jmolloy on Oct 28 2015, 7:39 AM.

Details

Summary

Global to local demotion can speed up programs that use globals a lot. It is particularly useful with LTO, when the entire call graph is known and most functions have been internalized.

For a global to be demoted, it must only be accessed by one function and that function:

  1. Must never recurse directly or indirectly, else the GV would be clobbered.
  2. Must never rely on the value in GV at the start of the function (apart from the initializer).

GlobalOpt can already do this, but it is hamstrung and only ever tries to demote globals inside "main", because C++ gives extra guarantees about how main is called - once and only once.

In LTO mode, we can often prove the first property (if the function is internal by this point, we know enough about the callgraph to determine if it could possibly recurse).

The second property can be proven for a subset of functions by proving that all loads from GV are dominated by a store to GV. This is conservative in the name of compile time - this only requires a DominatorTree which is fairly cheap in the grand scheme of things. We could do more fancy stuff with MemoryDependenceAnalysis too to catch more cases but this appears to catch most of the useful ones in my testing.

Some of the churn in this patch is due to reordering the optimizations that get done on globals. We were trying to demote first, and if that failed we were trying to delete the global. Now that we can demote on more occasions, actually demotion isn't as powerful as deletion so we need to swap these around to keep tests passing and not pessimize code.

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy updated this revision to Diff 38658.Oct 28 2015, 7:39 AM
jmolloy retitled this revision from to [GlobalOpt] Demote globals to locals more aggressively.
jmolloy updated this object.
jmolloy added reviewers: dexonsmith, chandlerc, majnemer.
jmolloy set the repository for this revision to rL LLVM.
jmolloy added a subscriber: llvm-commits.
jmolloy added inline comments.
test/Transforms/GlobalOpt/2009-11-16-MallocSingleStoreToGlobalVar.ll
11 ↗(On Diff #38658)

This is an unneeded change and will not be committed.

jmolloy updated this revision to Diff 38659.Oct 28 2015, 7:44 AM
majnemer added inline comments.Oct 28 2015, 8:59 AM
lib/Transforms/IPO/GlobalOpt.cpp
1733

This is not true if the program is not a C++ program. main can be recursively called in C. Also, the name main may be reused for a function other than the entry point if it is not a hosted program which would be important to LTO something like the Linux kernel.

At the very least, we should check the functions prototype.

1772

auto *I

1789

auto *LI

Thanks for working on this! Do you have any performance number?

lib/Transforms/IPO/GlobalOpt.cpp
95

Comments will be great.
It is a little strange to name it "FunctionsKnownNotToRecurse" with a bool that tells if the function will recurse.

1751

I am not sure if these conditions are sufficient to say the function does not recurse. Are indirect calls handled conservatively in the call graph?

mehdi_amini added inline comments.Oct 29 2015, 3:22 PM
lib/Transforms/IPO/GlobalOpt.cpp
1731

Note: I believe this is equivalent to

return !std::any_of(CGN->begin(), CGN->end(), [&] (CallGraphNode::CallRecord &CR) { return CR.second == CGN });
1738

Whether we go with an attribute (which I would support a priori) or not, it seems cleaner to expose the result of this through a module level analyses.

1748

Could the check !F be an assert or are there legitimate cases for this?

1786

I don't get this part, even if the load is larger than the GV you should have a must alias. Isn't the opposite case that is problematic?

Edit: reading later I think I figured: you need to makes sure that every load can be replace by the value forwarded from another store. I'm not sure "must alias" is the best wording here.

LLVM defines: The MustAlias response may only be returned if the two memory objects are guaranteed to always start at exactly the same location. A MustAlias response implies that the pointers compare equal.

1800

No else, just return.

1820

It would be nice if SmallVector (and other ADT) would allow to test a predicate on their content, this pattern would be less verbose. In the meantime I guess we're left with something like:

if(!std::any_of(Stores.begin(), Stores.end(), [&] (Instruction *S) { return DT.dominates(S, L); })
  return false;

If you're not allergic to the STL, or alternatively you may outline this like you did for functionDoesNotCallItself.

1916

So you moved this optimization a after all the previous one (like TryToShrinkGlobalToBoolean, OptimizeOnceStoredGlobal, ...), why?

1978

F->eraseFromParent(); is pretty straightforward and usual, this could deserve a comment.

And thanks for working on this! We were talking internally about improving GlobalOpt just two weeks ago on this aspect of recursive function detection :)

Hi guys,

Thanks for all the swift feedback. I have proposed adding a new "norecurse" function attribute in D14227, and uploaded a patch to FunctionAttrs to populate it in D14228. This patch now depends on those.

David, I agree the "main" hack is indeed a hack. The new "norecurse" attribute allows us to completely remove it as you say by teaching Clang to emit it during codegen for main() in C++ mode. I'll do that in a followup patch when the new attribute is committed and the dust has settled, if that's OK?

Mehdi, I've taken hopefully all your comments on board with this new version. One thing to comment on though:

So you moved this optimization a after all the previous one (like TryToShrinkGlobalToBoolean, OptimizeOnceStoredGlobal, ...), why?

I mentioned this in the description above, but basically it's because demoting globals to locals is less of a powerful optimization than deleting them outright (or shrinking them). With this patch global demotion triggers more often, so if we're not careful we could demote a global to a local that could have been deleted instead. That's why I changed the ordering.

James

jmolloy updated this revision to Diff 38902.Nov 2 2015, 5:33 AM
jmolloy updated this revision to Diff 40057.Nov 12 2015, 8:22 AM

Hi Manman, Mehdi,

Updated patch attached.

  • Now, Clang knows how to add norecurse to "main", so we can remove that logic (hooray!)
  • Removed all the norecurse logic, replaced with a query to F->doesNotRecurse()

Please take a look.

Cheers,

James

mehdi_amini requested changes to this revision.Nov 12 2015, 9:38 AM
mehdi_amini added a reviewer: mehdi_amini.

Looks good, but the changes we discussed on IRC!

lib/Transforms/IPO/GlobalOpt.cpp
1767

To sum up our IRC conversation, we can unconditionally add Load and Stored to the arrays, and check for the size later: we only care if a load is covered by a store at least as large.

1789

Nitpick: current guideline is not to repeat the function name in the doxygen.

1829

Commit this separately now.

This revision now requires changes to proceed.Nov 12 2015, 9:38 AM
jmolloy updated this revision to Diff 40137.Nov 13 2015, 4:38 AM
jmolloy edited edge metadata.

Hi Mehdi, Duncan, Manman,

I've updated the patch following all your comments. Implementing Mehdi's suggestion made the code look significantly neater and will also catch more cases.

I've also undone the reordering of optimizations in ProcessInternalGlobal, as I don't think this is now needed now that "main" isn't implicitly norecurse. It was only done to pacify tests in the first place, and they all pass now without modification.

Cheers,

James

mehdi_amini added inline comments.Nov 13 2015, 7:41 AM
lib/Transforms/IPO/GlobalOpt.cpp
1809

If the function does not recurse and is the only user of the global, why do we need isPointerValueDeadOnEntryToFunction?

Hi Duncan,

I had already committed after Mehdi's LGTM, so I have applied fixes based on your comments in r253192.

Cheers,

James

jmolloy added a comment.EditedNov 16 2015, 2:39 AM

Hi All,

I just want to apologize - after rereading the phab comments here I realize that I actually mistook Mehdi's LGTM on D14228 for this review when looking at my backlog, so I committed it before it was fully approved.

To reply to Mehdi's comment specifically:

If the function does not recurse and is the only user of the global, why do we need isPointerValueDeadOnEntryToFunction?

Consider this:

static int g = 0;
int f() {
  printf("%d\n", g++);
}

g can obviously not be localized, and f does not recurse. However, each invocation of f depends on the previous invocation because g is read before being written. The "isPointerValueDeadOnEntry" function is there to ensure the function is idempotent with respect to the given global.

James

hfinkel accepted this revision.Dec 10 2015, 4:55 PM
hfinkel added a reviewer: hfinkel.
hfinkel added a subscriber: hfinkel.

(will close)

hfinkel accepted this revision.Dec 10 2015, 4:57 PM

This was committed in r253168.

(It seems that, even after marking this as accepted, I cannot close it)

mehdi_amini accepted this revision.Dec 10 2015, 5:00 PM
mehdi_amini edited edge metadata.
This revision is now accepted and ready to land.Dec 10 2015, 5:00 PM
mehdi_amini closed this revision.Dec 10 2015, 5:00 PM