This provides a convenient way to get the ProfileSummary for a module and to use the maximum function count from the profile summary instead of calling getMaximumFunctionCount. getMaximumFunctionCount itself is not removed in this patch; that will be done in a later patch.
Diff Detail
- Repository
- rL LLVM
Event Timeline
Thanks, comments inline --
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | Hm, this won't work in multi-threaded environments. You can either sink this into LLVMContext (preferred!), or make it an llvm::ManagedStatic defined outside of ProfileSummary for more visibility. If you go with the second option, you'll also need a mutex around all cache accesses. Somewhere down the line you might consider making this cache some kind of (Small)?DenseMap. |
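A minimal sketch of the mutex-guarded variant of that second option, using a function-local static and std::mutex in place of llvm::ManagedStatic, and stubbed-out placeholder types for Module and ProfileSummary (all names here are illustrative, not the patch's actual API):

```cpp
#include <map>
#include <memory>
#include <mutex>

// Placeholder stand-ins for llvm::Module and llvm::ProfileSummary.
struct Module {};
struct ProfileSummary { unsigned MaxFunctionCount = 0; };

// Stub: the real code would build the summary from the module's
// profile-summary metadata.
static std::unique_ptr<ProfileSummary> computeSummary(Module *M) {
  (void)M;
  return std::unique_ptr<ProfileSummary>(new ProfileSummary());
}

// Every cache access happens under one mutex, so concurrent threads
// sharing this process-wide cache stay safe.
ProfileSummary *getProfileSummary(Module *M) {
  static std::mutex CacheLock;
  static std::map<Module *, std::unique_ptr<ProfileSummary>> Cache;
  std::lock_guard<std::mutex> Guard(CacheLock);
  std::unique_ptr<ProfileSummary> &Entry = Cache[M];
  if (!Entry)
    Entry = computeSummary(M);
  return Entry.get();
}
```

A std::map keyed by Module* stands in for the (Small)?DenseMap suggestion; repeated calls for the same module return the same cached object.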
lib/ProfileData/ProfileSummary.cpp | ||
36 ↗ | (On Diff #52122) | See comment above. |
371 ↗ | (On Diff #52122) | nit: I prefer if (Metadata *MD = M->getProfileSummary()) return ...;, but I'll leave it up to you. |
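For reference, the suggested form declares the variable inside the if condition, which limits its scope to the branch where it is non-null. A self-contained illustration of the idiom (Metadata and the module type here are simplified stand-ins, not the real LLVM classes):

```cpp
// Simplified stand-ins for llvm::Metadata and llvm::Module.
struct Metadata {};
struct ModuleStub {
  Metadata *ProfileSummaryMD = nullptr;
  Metadata *getProfileSummary() { return ProfileSummaryMD; }
};

bool hasProfileSummary(ModuleStub *M) {
  // Declare-and-test in one condition: MD is only visible in the
  // branch where it is known to be non-null.
  if (Metadata *MD = M->getProfileSummary())
    return MD != nullptr; // real code would use MD here
  return false;
}
```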
unittests/ProfileData/ProfileSummaryTest.cpp | ||
23 ↗ | (On Diff #52122) | I think this sort of initialization code is supposed to go into a void SetUp() method. Let gtest do its own thing with the test constructor. |
52 ↗ | (On Diff #52122) | It looks like this is worth lifting into a friend bool operator==(...) method for ProfileSummary. I think that would make things a bit easier to read; maybe clients would like to be able to compare summaries too. Wdyt? |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | Yes indeed. Perhaps it is better not to cache this and instead let the clients cache the summary? As to the use of a DenseMap, is it ever needed? I believe (and I may be wrong) that the only place where multiple modules are simultaneously present is LTO mode, and there all modules are actually merged into one module, which is where the IPO optimizations take place. Another crazy thought: we could just cache the ProfileSummary without any key, since all modules compiled in a single process are bound to have the same profile summary (the summary is not module specific). |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | Sure, using a DenseMap is probably overkill. Having a one-entry cache seems useful though. Regarding your last point: it's cheap to keep the Module* as a key, and keeping it makes the code less "magical" (no hidden assumptions about which Module you're referring to). |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | Yeah, that's why I didn't go that route. I don't see a way to make this part of LLVMContextImpl, since ProfileSummary is not part of Core (and we obviously don't want Core to depend on ProfileData). In fact, that is the reason why Module's getProfileSummary returns Metadata instead of ProfileSummary. So the only way to make this work is to make it a ManagedStatic, right? |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | There's one more possibility. What if we cache ProfileSummary objects in the clients of this API, instead of in getProfileSummary? E.g., we could cache the summary in CallAnalyzer. An LLVMContext should be readily available there, and that seems like the right place to put this. |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | That's what I meant by "Perhaps it is better to not cache this and instead let the clients cache the summary". This won't require using LLVMContext as the cached summary will be part of the client object (in this case it'll be in Inliner and passed on to CallAnalyzer) |
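A sketch of that client-side arrangement, with an illustrative InlinerStub owning the summary and handing a non-owning pointer down (the real classes would be Inliner and CallAnalyzer; all names and types here are made up):

```cpp
#include <memory>

// Placeholder stand-ins for the real LLVM types.
struct Module {};
struct ProfileSummary { unsigned MaxFunctionCount = 0; };

// Stub: real code would build the summary from module metadata.
static std::unique_ptr<ProfileSummary> computeSummary(Module *M) {
  (void)M;
  return std::unique_ptr<ProfileSummary>(new ProfileSummary());
}

// The pass object owns the cached summary; no global state, no locking.
class InlinerStub {
  std::unique_ptr<ProfileSummary> Summary;
public:
  // Computed lazily, at most once per pass object.
  ProfileSummary *getSummary(Module *M) {
    if (!Summary)
      Summary = computeSummary(M);
    return Summary.get();
  }
};

// A CallAnalyzer-like consumer just receives a non-owning pointer.
struct CallAnalyzerStub {
  ProfileSummary *Summary;
  explicit CallAnalyzerStub(ProfileSummary *PS) : Summary(PS) {}
};
```

Because the cache lives in the pass object rather than in a global, its lifetime and thread-safety questions disappear along with it.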
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | Ah, sorry for misreading. But, wouldn't the code be more generic if clients query the LLVMContext for the summary? That way, all clients of LLVMContext have access to the cached summary, not just the inliner. |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | That would require the ProfileSummary definition to be visible to LLVMContext, wouldn't it? Or are you suggesting keeping a (Module *, ProfileSummary *) pair in LLVMContext, letting the client query it, and, if there is no cached entry, computing the summary and updating the cache to make it available to other optimizations? IMO this is not worth the potential savings. |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | I'm suggesting keeping a (Module *, ProfileSummary *) pair in LLVMContextImpl and exposing LLVMContext::getProfileSummary(Module *). The latter would take care of computing and caching the summary. We wouldn't have to expose ProfileCommon.h to all users of LLVMContext (we can just forward-declare ProfileSummary). ISTM that this approach is identical to caching the summary in the inliner, but is easier to reuse. So, is your concern that it would be trickier to cache the summary in an LLVMContext vs. the inliner? Or would you prefer to not cache the summary anywhere? |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | Sorry, I am still not getting it. To implement LLVMContext::getProfileSummary(Module *) in the .cpp file - specifically the part that computes the summary - we need the definition of ProfileSummary, not just the forward declaration. It would also have to call getFromMD, which is defined in ProfileSummary.cpp. In short, this would make LLVMCore depend on LLVMProfileData (or move the code in ProfileData into LLVMCore), which is something we don't want to do. Does this make sense? One thing we could do is keep a ProfileSummary* in Module and use it as a cache that is populated by ProfileSummary::getProfileSummary(Module *). This would only require a forward declaration of ProfileSummary in Module. |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | I'm sorry I missed your response! I understand the problem now. I think your solution (keep a ProfileSummary* in Module) is good. |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | Unfortunately, this is not ideal either. Who should own the ProfileSummary object created out of Metadata? Module cannot since it doesn't see ProfileSummary destructor. I think the only options are to a) let the optimizations cache the summary or b) use ManagedStatic. |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
75 ↗ | (On Diff #52122) | I see, we can't place a unique_ptr<ProfileSummary> into LLVMContext without exposing its class definition. In that case, I prefer option (b) because it's a slightly more general solution. |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
194 ↗ | (On Diff #52122) | I suggest doing some compile-time measurements before adding the caching mechanism. Besides, this won't affect O2 compile time at all. |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
194 ↗ | (On Diff #52122) | Here are some numbers:
Of course, as you point out, this only applies in PGO mode and doesn't affect O2 compiles, but it still seems worthwhile to cache this. |
include/llvm/ProfileData/ProfileCommon.h | ||
---|---|---|
194 ↗ | (On Diff #52122) | I made the cache a ManagedStatic and added a SmartScopedLock to control accesses to the cache. Now, a call to getProfileSummary takes ~323 ns (averaged over 20M calls). Of this, 70% is system time. |
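The per-call figure above can be reproduced with a simple averaging loop; a sketch using std::chrono (the function being timed is a stand-in for getProfileSummary, and 20M is just the iteration count from the measurement above):

```cpp
#include <chrono>

// Stand-in for the call being measured (getProfileSummary in the review).
static volatile unsigned Sink = 0;
static void callUnderTest() { Sink = Sink + 1; }

// Average cost of one call in nanoseconds, over NumCalls iterations.
double averageNsPerCall(long long NumCalls) {
  auto Start = std::chrono::steady_clock::now();
  for (long long I = 0; I < NumCalls; ++I)
    callUnderTest();
  auto End = std::chrono::steady_clock::now();
  auto Ns =
      std::chrono::duration_cast<std::chrono::nanoseconds>(End - Start).count();
  return static_cast<double>(Ns) / static_cast<double>(NumCalls);
}
```

A wall-clock average like this folds in system time, which is why the 70% system-time split is worth reporting separately.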
What is the impact on overall compile time (using a large app)? On the other hand, if caching does not increase code complexity too much, it is also fine. Do you have an updated version of the patch?
I was looking at the compilation time of a fairly large file with a higher threshold (1500) for hot callsites. In this case, the total time spent on inlining (not just inline cost analysis) was around 5.5% of the total compilation time (as reported by -time-passes). If this is representative, we are looking at increasing the compilation time by ~1.4% (25% of 5.5%). Not much, but this is low-hanging fruit, and the code with ManagedStatic is not much more complex.