This is an archive of the discontinued LLVM Phabricator instance.

Replace the use of MaxFunctionCount module flag
ClosedPublic

Authored by eraman on Mar 30 2016, 2:19 PM.

Details

Summary

This provides a convenient way to get the ProfileSummary for a module, and uses the maximum function count from the profile summary instead of calling getMaximumFunctionCount. getMaximumFunctionCount itself is not removed in this patch; that will be done in a later patch.

Diff Detail

Repository
rL LLVM

Event Timeline

eraman updated this revision to Diff 52122.Mar 30 2016, 2:19 PM
eraman retitled this revision from to Replace the use of MaxFunctionCount module flag.
eraman updated this object.
eraman added a reviewer: vsk.
eraman added subscribers: davidxl, llvm-commits.
vsk edited edge metadata.Mar 30 2016, 2:40 PM

Thanks, comments inline --

include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Hm, this won't work in multi-threaded environments.

You can either sink this into LLVMContext (preferred!), or make it an llvm::ManagedStatic defined outside of ProfileSummary for more visibility. If you go with the second option, you'll also need a mutex around all cache accesses.

Somewhere down the line you might consider making this cache some kind of (Small)?DenseMap.

lib/ProfileData/ProfileSummary.cpp
36 ↗(On Diff #52122)

See comment above.

371 ↗(On Diff #52122)

nit: I prefer if (Metadata *MD = M->getProfileSummary()) return ...;, but I'll leave it up to you.

unittests/ProfileData/ProfileSummaryTest.cpp
23 ↗(On Diff #52122)

I think this sort of initialization code is supposed to go into a void SetUp() method. Let gtest do its own thing with the test constructor.

52 ↗(On Diff #52122)

It looks like this is worth lifting into a friend bool operator==(...) method for ProfileSummary. I think that would make things a bit easier to read; maybe clients would like to be able to compare summaries too. Wdyt?

eraman added inline comments.Mar 30 2016, 2:51 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Yes indeed. Perhaps it is better to not cache this and instead let the clients cache the summary?

As to the use of a DenseMap, is this ever needed? I believe (and I may be wrong) that the only place multiple modules are simultaneously present is in LTO mode, and there all modules are actually merged into one module, which is where the IPO optimizations take place.

Another crazy thought - we could just cache the ProfileSummary without any key, since all modules compiled in a single process are bound to have the same profile summary (the summary is not module specific).

vsk added inline comments.Mar 30 2016, 2:53 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Sure, using a DenseMap is probably overkill. Having a one-entry cache seems useful though.

Regarding your last point: it's cheap to keep the Module* as a key, and keeping it makes the code less "magical" (no hidden assumptions about which Module you're referring to).

eraman added inline comments.Mar 30 2016, 3:55 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Yeah, that's why I didn't go that route.

I don't see a way to make this part of LLVMContextImpl, since ProfileSummary is not part of Core (and we obviously don't want Core to depend on ProfileData). In fact, that is the reason Module's getProfileSummary returns Metadata instead of ProfileSummary. So the only way to make this work is to make it a ManagedStatic, right?

vsk added inline comments.Mar 30 2016, 4:02 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

There's one more possibility. What if we cache ProfileSummary objects in the clients of this API, instead of in getProfileSummary?

E.g we could cache the summary in CallAnalyzer. An LLVMContext should be readily available there, and that seems like the right place to put this.

eraman added inline comments.Mar 30 2016, 4:27 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

That's what I meant by "Perhaps it is better to not cache this and instead let the clients cache the summary". This won't require using LLVMContext, as the cached summary will be part of the client object (in this case it'll be in the Inliner and passed on to CallAnalyzer).

vsk added inline comments.Mar 30 2016, 4:29 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Ah, sorry for misreading.

But, wouldn't the code be more generic if clients query the LLVMContext for the summary? That way, all clients of LLVMContext have access to the cached summary, not just the inliner.

eraman added inline comments.Mar 30 2016, 4:42 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

That would require the ProfileSummary definition to be visible to LLVMContext, wouldn't it? Or are you suggesting keeping a (Module *, ProfileSummary *) pair in LLVMContext, letting the client query it, and if there is no cached entry, computing the summary and updating the cache to make it available to other optimizations? IMO this is not worth the potential savings.

vsk added inline comments.Mar 30 2016, 4:55 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

I'm suggesting keeping a (Module *, ProfileSummary *) pair in LLVMContextImpl and exposing LLVMContext::getProfileSummary(Module *). The latter would take care of computing and caching the summary. We wouldn't have to expose ProfileCommon.h to all users of LLVMContext (we can just forward-declare ProfileSummary).

ISTM that this approach is identical to caching the summary in the inliner, but is easier to reuse. So, is your concern that it would be trickier to cache the summary in an LLVMContext vs. the inliner? Or would you prefer to not cache the summary anywhere?

eraman added inline comments.Mar 31 2016, 4:00 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Sorry, I am still not getting it. To implement LLVMContext::getProfileSummary(Module *) in the .cpp file - specifically the part that computes the summary - we need the definition of ProfileSummary, not just the forward declaration. It would also have to call getFromMD, which is defined in ProfileSummary.cpp. In short, this would make LLVMCore depend on LLVMProfileData (or move the code in ProfileData into LLVMCore), which is something we don't want to do. Does this make sense?

One thing we could do is keep a ProfileSummary* in Module and use it as a cache populated by ProfileSummary::getProfileSummary(Module *). This would only require a forward declaration of ProfileSummary in Module.

vsk added inline comments.Apr 6 2016, 9:48 AM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

I'm sorry I missed your response! I understand the problem now. I think your solution (keep a ProfileSummary* in Module) is good.

eraman added inline comments.Apr 6 2016, 5:36 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Unfortunately, this is not ideal either. Who would own the ProfileSummary object created from the Metadata? Module cannot, since it doesn't see ProfileSummary's destructor. I think the only options are to a) let the optimizations cache the summary or b) use a ManagedStatic.

vsk added inline comments.Apr 6 2016, 11:21 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

I see, we can't place a unique_ptr<ProfileSummary> into LLVMContext without exposing its class definition.

In that case, I prefer option (b) because it's a slightly more general solution.

davidxl added inline comments.Apr 7 2016, 11:41 AM
include/llvm/ProfileData/ProfileCommon.h
194 ↗(On Diff #52122)

I suggest doing some compile time measurements before adding the caching mechanism. Besides, this won't affect O2 compile time at all.

eraman added inline comments.Apr 12 2016, 1:14 PM
include/llvm/ProfileData/ProfileCommon.h
194 ↗(On Diff #52122)

Here are some numbers:

  • It takes ~1.05 µs (about 3000 cycles) per call to computeProfileSummary on a machine with an Intel Xeon E5-2690 running at 2.9 GHz. The time was obtained by using a NamedRegionTimer to measure 1M calls to computeProfileSummary and taking the user + sys time.
  • To put this in perspective: while compiling a large real application, the time taken by computeProfileSummary is 25-30% of the time taken by CallAnalyzer's analyzeCall. So not caching would result in a 25-30% increase in analyzeCall's time (analyzeCall is now the only client of computeProfileSummary).

Of course, as you point out, this only applies in PGO mode and doesn't affect O2 compiles, but it still seems worthwhile to cache this.

eraman added inline comments.Apr 12 2016, 2:20 PM
include/llvm/ProfileData/ProfileCommon.h
194 ↗(On Diff #52122)

I made the cache into a ManagedStatic and added a SmartScopedLock to control accesses to it. Now a call to getProfileSummary takes ~323 ns (averaged over 20M calls). Of this, 70% is system time.

What is the impact on overall compile time (using a large app)? On the other hand, if caching does not increase code complexity too much, it is also fine. Do you have an updated version of the patch?

eraman updated this revision to Diff 53590.Apr 13 2016, 10:52 AM
eraman edited edge metadata.

Use ManagedStatic and a mutex to access the cache

What is the impact on overall compile time (using a large app)? On the other hand, if caching does not increase code complexity too much, it is also fine. Do you have an updated version of the patch?

I was looking at the compilation time of a fairly large file with a higher threshold (1500) for hot callsites. In this case, the total time spent on inlining (not just inline cost analysis) was around 5.5% of the total compilation time (as reported by -time-passes). If this is representative, we are looking at an increase in compilation time of ~1.4% (25% of 5.5%). Not much, but this is low-hanging fruit, and the code with ManagedStatic is not much more complex.

davidxl added inline comments.Apr 13 2016, 11:38 PM
lib/ProfileData/ProfileSummary.cpp
116 ↗(On Diff #53590)

dyn_cast

unittests/ProfileData/ProfileSummaryTest.cpp
65 ↗(On Diff #53590)

redundant line.

eraman updated this revision to Diff 53809.Apr 14 2016, 4:35 PM

Address David's comments.

looks good to me. Check with Vedant in case there are more comments.

This revision was automatically updated to reflect the committed changes.