This is the compiler part of the context split work. Please refer to D107299 for the original whole patch.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for working on split context representation. This really makes context first class citizen in the framework, and the speed up it brought us is also very nice.
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
403 | nit: having the word Type in type name does not add value, and the class also doesn't have anything specific for sample. Suggest we call it CallSiteLocation. There're a few other others with Type in its name too. | |
420 | LineOffset 0 can be a valid call site location given this is offset not absolute line number. | |
440–441 | SampleContextRefType makes people think of SampleContext&, how about SampleContextStorageType -> SampleContextFrameVector, SampleContextRefType -> SampleContextFrames? Accordingly SampleContext::getFullContext->SampleContext::getContextFrames. | |
449–457 | This needs update. One thing I was looking for a clarification from comments somewhere but didn't find it: whether context-less profile would have a non-empty FullContext which only contains leaf? From the code it seems answer is no? But then this is not consistent since FullContext of CS profile contains leaf frame. Or would it be possible to remove leaf frame in FullContext for CS profile too? leaf doesn't need call site location either. | |
501–502 | Can we remove this and replace all its call with getName? | |
505 | Use std::ostringstream to help repeated string append? | |
515 | Seems std::string toString() const is the conventional name for string converter. | |
526 | Sanity check/assert to make sure the new name does not conflict leaf frame in FullContext? Or is this only allowed for non-CS case? | |
532 | Should we assert that CState is not unknown? i.e. even if Context only has one frame, this is still considered context profile (something like [foo] before, as opposed to foo). | |
582–589 | This sequence of copy, edit and compare could be simplified? if (ThisContext.back().CallerName != ThatContext.back().CallerName) return false; Even better, would it be possible to canonicalize context, so location of leaf frame is always (0, 0)? | |
600 | Clarify in comment whether this is empty for non-CS profile. | |
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h | ||
93–107 | This is a comparer only as it doesn't provide sorting directly. So ProfileComparer? | |
llvm/lib/Transforms/IPO/SampleContextTracker.cpp | ||
67 | One step further, ContextFramesToRemove? | |
307–309 | Curious what triggered this order change? |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
403 | CallSiteLocation sounds good. | |
420 | Good catch. Switched to using INT32_MAX as an invalid line offset. | |
440–441 | Sounds good. | |
449–457 | The new context representation should be consistent with the old one for CS profile, i.e, it can contain a leaf frame only. The sample reader creates such leaf-only context. For non-CS profile, the context string will be empty. Excluding leaf frame from FullContext should also work and it is cleaner by design. But the current way is more natural to implement, for example the IsPrefixOf operation, and some operations in llvm-profgen that appends the leaf probe into an existing calling context. Also the constructor takes in an arrayRef of the underlying vector, which, if separated, will need some special handling in the reader. | |
501–502 | Yes, done. | |
505 | Done. | |
515 | Changed to toString | |
526 | Assert added. This supposed to use by non-CS case. | |
532 | Sounds good, assert added. | |
582–589 | The API can be used to compare A:1 @ B with A:1 @ B:2 @ C, where we have to drop the line number of B:2. The location of leaf frame was (0,0) previously, now I'm changing it to (-1,0). But the that context can still have a valid line number like B:2 where we have to drop the line number for comparision. | |
600 | Done. | |
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h | ||
93–107 | Sounds good. | |
llvm/lib/Transforms/IPO/SampleContextTracker.cpp | ||
67 | Renamed to ContextFramesToRemove. | |
307–309 | The context of an inlined or merged node may be out-of-sync with its tree path so getContextFor will AV in that case. This can happen when functions on a tree path are not processed in top-down order, due to recursions. |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
449–457 | makes sense, thanks for clarification. though it is indeed inconsistent with non-CS profile in the sense that FullContext doesn't contain leaf for non-CS, but contain leaf for CS profile. | |
582–589 | Ok, if we have to ignore the location part of B, can we simplified the copy, edit and compare sequence as suggested above still? (maybe i'm missing something, but it looks to me that you answered the 2nd question only?) |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
582–589 | Do you mean something like below? if (ThisContext.back().CallerName != ThatContext.back().CallerName) return false; if (ThisContext.back().hasLineLocation()) return ThisContext.back().linelocation == ThatContext.back().linelocation return true; |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
582–589 | return true --> return ThisContext.drop_back() == ThatContext.drop_back() |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
582–589 | Yes if actually expect ThisContext (the prefix) to have location for its leaf (i thought the prefix's leaf won't have location as shown in your example), otherwise we could omit the location compare too. |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
582–589 | That's true with the current uses of this function. I'm adding an assert for that and removing the line location check. Supporting the location compare is slow, especially this function is frequently used in profile reading. |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
490 | The implicit operator here seems unnecessary given you have getContextFrames below providing the same thing. Is this needed? | |
546 | The == and != operators are not always opposites. If another instance of this object has the same State/Name/FullContext == and != will both evaluate as true. Should != check on all of these fields/be a direct negation of ==? |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
423 | This one can now be removed as well. |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
586 | is this more heavy operator than direct implementation of >(like <). say if two contexts are very long and only the last frame is different. it will take 2 * n for (*this != That) plus !(*this < That), rather than n. |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
420 | On the second thought, sticking with LineOffset 0 for leaf frame is more natural. We have a bench places in the compiler/reader/writer/llvm-profgen that uses 0 as leaf line location. By default, SampleContext should represent a valid call context with the leaf frame, so most of the time we should just automatically strip off the leaf line location., such as when doing prefix check, getting the full context string. However there is a case in llvm-profgen where SampleContext is used to represent intermediate context for pseudo probes when the leaf probe is not ready. This causes diversion, especially in the full context string printing. Given that's the only place we diverge, I'm adding a flag to getContextString to tolerate that, instead of introducing the hasLineLocation API. So with the latest change, by default, we should assume the last element of the context vector is the leaf frame, except in the special llvm-profgen case. Does that make sense? |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
26 | I don't see DJB hash used anywhere, remove as well? | |
llvm/lib/Transforms/IPO/SampleContextTracker.cpp | ||
307–309 | Could you share a concrete example for such case? I was expecting context to be in-sync with its position in trie, because when we promote a node in trie, we also adjust its context accordingly. Does it also happen before this change? |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
420 | I think as long as we don't use 0 line offset to indicate no line info, it's fine. Actually if we use CallSiteLocation to represent leaf, this really isn't a call site (call site always has a line location), but rather a frame. The comment is also inaccurate: "Represent a callsite with caller function name and line location". We have typedef std::pair<std::string, LineLocation> FrameLocation; in llvm-profgen, once we have string buffers owning the underlying strings, that FrameLocation can be merged too. Btw, I hope we can unify string converter to be toString. Right now we have CallSiteLocation::str(), SampleContext::getContextString() and SampleContext::toString. SampleContext::toString properly dispatches for CS and non-CS case, but SampleContext::getContextString() doesn't. How about merge the two and let toString take a boolean? |
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h | ||
---|---|---|
98 | I remember you mentioned this is the bottleneck, so how about adding another check to filter more cases, specifically use the frame size(or context string size). like the above code. |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
26 | Removed. | |
420 | Right CallSiteLocation is really for a callsite, we just take advantage of it to represent a leaf frame in SampleContext, and we intentionally ignore the line location for leaf frame. This isn't ideal, but easier to implement. Alternatively, the LineLocation field can be made an Optional field, but that adds the complexity most of the time we don't need. The FrameLocation type is being replaced by D108437. I'm renaming CallSiteLocation::str() as toString. SampleContext::getContextString() is a static function that works on an ArrayRef object. This is mostly for the sake of intermediate contexts (without leaf probe) in llvm-profgen. SampleContext::toString is not supposed to work with intermediate contexts. I can move getContextString out of SampleContext if it is confusing here. | |
586 | Good point. I just removed the > operator which is only used by sorting profiles for dumping. The sorting will use < instead. | |
llvm/lib/Transforms/IPO/SampleContextTracker.cpp | ||
307–309 | Unfortunately I don't seem to find that extreme case after adjusting the context promotion order, i.e, with a comparer. On the other hand, if we are not going to do anything for an inlined or merged context, we can bail out earlier without taking time to find its context node? |
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h | ||
---|---|---|
98 | This is not the bottleneck. The bottleneck is in sample reader when trying to load contexts out of the current module. There we sort all contexts in the profile first and selectively load the contexts which appear to be a subtree of a given context. Using ordered set here is to enforce an order of promoting the contexts of a function later. We have seen cases that different promoting order led to different codegen, especially when recursion is involved. |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
420 | Ok, keeping static function getContextString is fine. Given we have SampleContextFrames, SampleContextFrameVector, perhaps this CallSiteLocation should be renamed to ContextFrame to reflect what it really is? | |
llvm/lib/Transforms/IPO/SampleContextTracker.cpp | ||
307–309 | Avoid context trie look up sounds good. Just wanted to make sure we don't have lurking issues as I don't see how they can be out of sync - it could be some other issues. |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
420 | ContextFrame sounds good. How about SampleContextFrame? |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
420 | Sounds good too, though it may look too similar to SampleContextFrames, which can cause confusion. But either way is fine. |
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h | ||
---|---|---|
98 | I see, thanks for the clarification. |
llvm/include/llvm/ProfileData/SampleProf.h | ||
---|---|---|
546 | Ah I missed that it was comparing the dereferenced this which is invoking the == operator directly. Thought it was comparing the pointers. |
I don't see DJB hash used anywhere, remove as well?