Page MenuHomePhabricator

[CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample and context-sensitive counter
ClosedPublic

Authored by wlei on Dec 3 2020, 10:15 AM.

Details

Summary

As we plan to support both CSSPGO and AutoFDO for llvm-profgen, we will have different kinds of perf sample and different kinds of sample counter(cs/non-cs, with/without pseudo probe) which both need to do aggregation in hash map. This change implements the hashable interface(Hashable) and the unified base class for them to have better extensibility and reusability.

Currently perf trace sample and sample counter with context implemented this Hashable and the class hierarchy is like:

| Hashable  
           | PerfSample
                          | HybridSample
                          | LBRSample
           | ContextKey
                          | StringBasedCtxKey
                          | ProbeBasedCtxKey
                          | CallsiteBasedCtxKey
           | ...

For perf sample, HybridSample includes the call stack with LBR stack and LBRSample only includes LBR stack.
For context key used for virtual unwinding counter aggregation, we use string based context(StringBasedCtxKey) by default for good debug experience. For pseudo probe, we switch to use a stack of probe pointer(ProbeBasedCtxKey) to avoid redundant string handling. In the future, we can also speed up StringBasedCtxKey to use the original function frame stack, which is here named CallsiteBasedCtxKey

Implementation

  • Class specifying Hashable should implement getHashCode and isEqual. Here we make getHashCode a non-virtual function to avoid vtable overhead, so derived class should calculate and assign the base class's HashCode manually. This also provides the flexibility for calculating the hash code incrementally(like rolling hash) during frame stack unwinding
  • isEqual is a virtual function, which will have perf overhead. In the future, if we redesign a better hash function, then we can just skip this or switch to non-virtual function.
  • Added PerfSample and ContextKey as base class for perf sample and counter context key, leveraging llvm-style RTTI for this.
  • Added StringBasedCtxKey class extending ContextKey to use string as context id.
  • Refactor AggregationCounter to take all kinds of PerfSample as key
  • Refactor ContextSampleCounter to take all kinds of ContextKey as key
  • Other refactoring work:
    • Create a wrapper class SampleCounter to wrap RangeCounter and BranchCounter
    • Hoist ContextId and FunctionProfile out of populateFunctionBodySamples and populateFunctionBoundarySamples to reuse them in ProfileGenerator

Diff Detail

Event Timeline

wlei created this revision.Dec 3 2020, 10:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 3 2020, 10:15 AM
wlei requested review of this revision.Dec 3 2020, 10:15 AM
wlei retitled this revision from [CSSPGO][llvm-profgen] Rafactor to unify hashable interface for trace sample and context-sensitive counter to [CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample and context-sensitive counter.Dec 3 2020, 9:00 PM
wlei edited the summary of this revision. (Show Details)
wlei added reviewers: wmi, davidxl, hoy, wenlei.
wlei updated this revision to Diff 309451.Dec 3 2020, 9:09 PM
wlei edited the summary of this revision. (Show Details)

add more comments

wlei updated this revision to Diff 309453.Dec 3 2020, 9:32 PM

fix a missing genHashCode, also add a assertion for this

wlei updated this revision to Diff 310368.Dec 8 2020, 3:15 PM

rebase

hoy added inline comments.Dec 9 2020, 6:22 PM
llvm/tools/llvm-profgen/PerfReader.h
104

Nit: could this be defined as an overloaded == operator?

wlei added inline comments.Dec 10 2020, 1:03 PM
llvm/tools/llvm-profgen/PerfReader.h
104

== should work. Here using Equal is intentional, I want to diff from == to indicate that this is a 'Equal' function for hash not for the regular compare. Currently we exactly compare the elements like ==, but maybe in the future if we have a good hash function, we can use the hash code for the comparison. Or we can have different custom Equal function, like ContextKeyEqual, SampleEqual. Or maybe the 'Equal' is not a proper naming. Any thoughts on this?

hoy added inline comments.Dec 10 2020, 3:14 PM
llvm/tools/llvm-profgen/PerfReader.h
104

Sounds good. It makes sense to name the comparison more specifically.

hoy added inline comments.Dec 13 2020, 11:23 PM
llvm/tools/llvm-profgen/PerfReader.cpp
76

Name it getOrCreateSampleCounter?

llvm/tools/llvm-profgen/PerfReader.h
110

I don't see a consumer of the two APIs. Maybe exclude them from this patch? I'm wondering if the shared pointer should be returned (for reference counting) when there is a need of exposing the underlying data.

wmi added inline comments.Dec 14 2020, 12:22 PM
llvm/tools/llvm-profgen/PerfReader.h
116
263

Can you use Hashable as a base class instead so you can remove the isEqual virtual function (The CRTP pattern: https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern#Static_polymorphism)? It will also simplify the interface using Hashable.

struct StringBasedCtxKey : public Hashable<StringBasedCtxKey> {
...
}

And other type of Key later on:
struct ProbeBasedCtxKey : public Hashable<ProbeBasedCtxKey> {
...
}

wmi added inline comments.Dec 14 2020, 12:39 PM
llvm/tools/llvm-profgen/PerfReader.h
298

Can we rename it to ContextSampleCounterMap and its object to something like CtxCounterMap?

wlei updated this revision to Diff 311958.Dec 15 2020, 10:19 AM

address reviewers' feedback

wlei marked 4 inline comments as done.Dec 15 2020, 10:22 AM
wlei added inline comments.
llvm/tools/llvm-profgen/PerfReader.cpp
76

renamed

llvm/tools/llvm-profgen/PerfReader.h
110

Removed getData(), getPtr() is used when getting the key data, like PerfRead.cpp:232

if the shared pointer should be returned

Not sure const shared_ptr<Key> *K = dyn_cast<shared_ptr<Key>>(I.first.getPtr()); can work for this, let me try.
Also since it's the hashkey, it's only used like:

for(auto I : Map) {
    const Key *K = dyn_cast<Key>(I.first.getPtr());
    // use K
}

So K is always used inside of Map's scope.

263

Thanks for your suggestion and detailed example.
Here I want to put the different types derived from the same base class in the same container.
like :
struct StringBasedCtxKey : public CtxKey {
...
}
struct ProbeBasedCtxKey : public CtxKey {
...
}
Then I can use only one container unordered_map<CtxKey, ...> ContextSampleCounterMap . If use CRTP, I guess it need two containers. any ideas about this?

298

Good point, the value is actually a counter, renamed.

wmi added inline comments.Dec 15 2020, 12:16 PM
llvm/tools/llvm-profgen/PerfReader.h
263

Ok, I don't have better idea. A possible solution is described in https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern#Pitfalls, but that prevents Hashable class from being reused by PerfSample without duplication.

I am fine with your existing solution If no one else has better idea.

wlei marked 2 inline comments as done.Dec 15 2020, 1:02 PM
wlei added inline comments.
llvm/tools/llvm-profgen/PerfReader.h
263

Yeah, we want both PerfSample and ContextKey could implements a unified Hashable. Thank you.

hoy accepted this revision.Dec 16 2020, 12:20 AM
hoy added inline comments.
llvm/tools/llvm-profgen/PerfReader.h
110

Current getPtr() implementation and its use looks good. I was thinking getPtr() might leak the underlying data out of the shared pointer scope when destroyed.

263

Both patterns have their specialties. The current solution looks a bit cleaner though it falls back to use virtual methods to solve hash collision. Hopefully that could be mitigated by using a more efficient hash code.

This revision is now accepted and ready to land.Dec 16 2020, 12:20 AM

Thanks for the refactoring and clean up, looks great!

LBRSample will be added when AutoFDO support is moved into llvm-profgen, right? For AutoFDO, we could use the same infrastructure except that context will always be empty.

What is CallsiteBasedCtxKey in the description? A stack of call site addresses? Brief comment for each of the leaf class that was mentioned in the description (even if not implemented) would be useful.

llvm/tools/llvm-profgen/PerfReader.cpp
216

Does this map also needs to own a copy of the context string?

nit: perhaps remove this comment below? determinism mentioned above is good enough.

This is due to a build failure on sanitizer build(asan/msan/ubsan)

llvm/tools/llvm-profgen/PerfReader.h
171–174

nit: make hashCombine a lambda function inside genHashCode if there's no other use?

192

nit: rename it AggregatedCounter? we have AggregatedSamples as names.

wlei updated this revision to Diff 312294.Dec 16 2020, 1:30 PM

Address Wenlei's feedback

wlei marked 3 inline comments as done.Dec 16 2020, 1:33 PM

LBRSample will be added when AutoFDO support is moved into llvm-profgen, right? For AutoFDO, we could use the same infrastructure except that context will always be empty.

Good question, Yes, my initial thought is to decouple this by using separated LBRSample when it comes to AutoFDO, perhaps this's good for readability. I guess your suggestion is for the performance since we won't have virtual call. That's good. If so and we don't have other kinds of perf sample, we might don't need the base class PerfSample and can just name HybridSample to more general name(like PerfSample).

What is CallsiteBasedCtxKey in the description? A stack of call site addresses? Brief comment for each of the leaf class that was mentioned in the description (even if not implemented) would be useful.

Yes, it's a vector of original call site address(Callstack). I was thinking whether we could use this to replace the string based context key in unwinder stage, later in ProfileGenerator convert to string. Just an idea. Will add more comments to the summary part.

llvm/tools/llvm-profgen/PerfReader.cpp
216

Good point. Changed to StringRef and remove this comment.

wenlei accepted this revision.Dec 16 2020, 1:57 PM

LBRSample will be added when AutoFDO support is moved into llvm-profgen, right? For AutoFDO, we could use the same infrastructure except that context will always be empty.

Good question, Yes, my initial thought is to decouple this by using separated LBRSample when it comes to AutoFDO, perhaps this's good for readability. I guess your suggestion is for the performance since we won't have virtual call. That's good. If so and we don't have other kinds of perf sample, we might don't need the base class PerfSample and can just name HybridSample to more general name(like PerfSample).

Right.. LBR sample can be a Hybrid sample with empty stack, in which case we don't need hierarchy. But we can refine and deal with that later.

What is CallsiteBasedCtxKey in the description? A stack of call site addresses? Brief comment for each of the leaf class that was mentioned in the description (even if not implemented) would be useful.

Yes, it's a vector of original call site address(Callstack). I was thinking whether we could use this to replace the string based context key in unwinder stage, later in ProfileGenerator convert to string. Just an idea. Will add more comments to the summary part.

wlei edited the summary of this revision. (Show Details)Dec 18 2020, 2:35 PM
wlei added a comment.Jan 6 2021, 10:23 AM

Kindly ping @wmi ,
I have addressed feedbacks from Hongtao and Wenlei, want to know whether this's also good for you or any other questions.
Also for other pseudo probe patches(https://reviews.llvm.org/D92334, https://reviews.llvm.org/D92896, https://reviews.llvm.org/D92998). Thanks!

wmi accepted this revision.Jan 7 2021, 4:56 PM

Sorry for the delay. LGTM.

This revision was landed with ongoing or failed builds.Jan 13 2021, 11:07 AM
This revision was automatically updated to reflect the committed changes.