This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profgen] Allow unsymbolized profile as perf input
ClosedPublic

Authored by wlei on Oct 13 2021, 11:34 AM.

Details

Summary

This change allows the unsymbolized profile as input. The unsymbolized profile is created by llvm-profgen with --skip-symbolization and it's after the sample aggregation but before symbolization , so it has much small file size. It can be used for sample merging and trimming, also is useful for debugging or adding test cases. A switch --unsymbolized-profile=file-patch is added for this.

Format of unsymbolized profile:

[context stack1]    # If it's a CS profile
   number of entries in RangeCounter
   from_1-to_1:count_1
   from_2-to_2:count_2
   ......
   from_n-to_n:count_n
   number of entries in BranchCounter
   src_1->dst_1:count_1
   src_2->dst_2:count_2
   ......
   src_n->dst_n:count_n
 [context stack2]
   ......

Diff Detail

Event Timeline

wlei created this revision.Oct 13 2021, 11:34 AM
wlei requested review of this revision.Oct 13 2021, 11:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2021, 11:34 AM
wlei retitled this revision from [llvm-profgen] Allow unsymbolized profile as input to [llvm-profgen] Allow unsymbolized profile as perf input.Oct 13 2021, 11:53 AM
wlei edited the summary of this revision. (Show Details)
wlei added reviewers: hoy, wenlei.
hoy added a comment.Oct 18 2021, 10:35 AM

Thanks for bringing up the support of raw profile as input. Can you please describe the format of raw profile in the summary and somewhere in the code (RawProfileReader) ?

wenlei added inline comments.Oct 20 2021, 3:22 PM
llvm/tools/llvm-profgen/PerfReader.cpp
310–311

How about we use PerfInputFile PerfReaderBase::convertPerfDataToTrace(ProfiledBinary *Binary, PerfInputFile &PerfInput), and return a new PerfInput from the function?

719–726

Can we move this a lambda instead of global function since it's only used in readSampleCounters?

If lambda grows too big, we can use member functions too.

728

Some comment showing the expected input for parsing would be helpful.

758

If we use this flag to control profile reading too, we need to update flag description.

775–776

The underlying object created by make_shared stays on heap, but SampleCounters.emplace creates another copy of the object? is that intentional or did I miss anything?

784

Would it better if hash code is lazily generated instead of requiring an explicit call?

getHashCode() {
  if (HashCode == 0) 
     genHashCode()
  ...
}
794–795

So if user specify skip-symbolization and the input is raw profile, we are just reading it in and then writing it out without doing anything?

llvm/tools/llvm-profgen/PerfReader.h
62

nit: all caps is usually only used for macro. For enum, it's better to follow normal variable naming, which is also the convention used in llvm

62–67

These are not mutually exclusive types. e.g. PERFSCRIPT_UNKNOWN can be either PERFSCRIPT_LBR or PERFSCRIPT_LBR_STACK.

Logically there're two things being represented, the format and the content. To avoid exponential combination and to have a clear separation, perhaps use format, content masks instead, or even separate them out?

We can have format = RawProfile|PerfScript|PerfData, and content = LBR|LBRStack|Aggregated.

73–74

nit: it's cleaner to keep it a POD type and remove the ctor. we can use initializer list to create the object.

llvm/tools/llvm-profgen/llvm-profgen.cpp
52

I'm not sure if raw profile is a good name. the real raw profile is the perf.data, after some processing we get perf script, and with more processing we get this "raw profile". Perhaps we need a alternative name, unsymbolized profile?

109

nit: createPerfInputFile -> getPerfInputFile

wlei updated this revision to Diff 381603.Oct 22 2021, 11:00 AM
wlei marked 7 inline comments as done.

rebase and addressing feedbacks

wlei added a comment.Oct 22 2021, 11:00 AM

Thanks for bringing up the support of raw profile as input. Can you please describe the format of raw profile in the summary and somewhere in the code (RawProfileReader) ?

Sounds good. added the format to the head of class.

llvm/tools/llvm-profgen/PerfReader.cpp
310–311

Fixed.

719–726

Moved to a lambda.

728

comments added

758

description updated

784

Here to explicitly call genHashCode is intentional, the reason is we want to avoid making genHashCode a virtual function, i, e, avoid call genHashCode after casting to base class. so we separate like:

derived class:  HashCode = genHashCode();

base class :  getHashCode{return HashCode;}
794–795

Yes, otherwise user give skip-symbolization with unsymbolized profile, it won't have output. And I actually used it during debugging or maybe in the future we can do something after read then output it.

Added a warning to this

llvm/tools/llvm-profgen/PerfReader.h
62

Fixed, thanks!

62–67

Makes sense.Changed to separate them out.

llvm/tools/llvm-profgen/llvm-profgen.cpp
52

yeah, it's clearer to use --unsymbolized-profile, changed. The name is a little long then, so add an alias to it.

wlei added inline comments.Oct 22 2021, 11:27 AM
llvm/tools/llvm-profgen/PerfReader.cpp
775–776

(Seems phabricator lost my comments..)

SampleCounters key is a Hashable<ContextKey> type.

My understanding is emplace will call the ctor of Hashable, it's like:

template <class T> class Hashable {
public:
  std::shared_ptr<T> Data;
  Hashable(const std::shared_ptr<T> &D) : Data(D) {}
...

Hashable only own the shared_ptr, do not own the underlying object, so it should only copy the shared_ptr.

wenlei added inline comments.Oct 22 2021, 12:38 PM
llvm/tools/llvm-profgen/PerfReader.cpp
358–359

nit: we can do return {PerfTraceFile, PerfFormat::PerfScript, PerfContent::Unknown};

784

Not sure if I understand. Why do we want to avoid call genHashCode after casting to base class?

llvm/tools/llvm-profgen/PerfReader.h
62–67

Now that format and content are separated, perhaps they no longer need to use mask within each? Sorry if I wasn't clear, mask or separate field are two ways to have separation for format and content.

70

We still want to reserve 0 as unknown or invalid?

78–79

Can we use enum types instead of integer?

651

Also comment to explain the difference between CS and non-CS profile?

wlei updated this revision to Diff 381651.Oct 22 2021, 2:08 PM
wlei marked 4 inline comments as done.

Updating D111750: [llvm-profgen] Allow unsymbolized profile as perf input

llvm/tools/llvm-profgen/PerfReader.cpp
358–359

Good to know this, thanks

784

Currently we have two type of key, StrKey and ProbeKey and they derived from the base ContextKey.

We have a hash map to store them, like unordered_map<ContextKey, ...>, the logic of insert is like

StrKey* key = ..

hashmap[key] = ...;

In the hashmap, StrKey*/ProbeKey* implicitly be cast to a ContextKey* class then call ContextKey->getHashCode , so getHashCode should be a virtual function which has overhead because StrKey and ProbeKey have different genHashCode.

So to avoid this, we can explicitly call StrKey->genHashCode() before being casting to base and store it into a variable HashCode

then base's getHashCode just read the HasCode, no need a virtual function.

llvm/tools/llvm-profgen/PerfReader.h
62–67

I see, thanks for the clarification.

651

comments updated

wlei edited the summary of this revision. (Show Details)Oct 22 2021, 2:16 PM
wenlei accepted this revision.Oct 23 2021, 4:43 PM

lgtm, thanks.

llvm/tools/llvm-profgen/PerfReader.cpp
784

so getHashCode should be a virtual function which has overhead because StrKey and ProbeKey have different genHashCode.

So to avoid this

Why do we want to avoid this? performance reason? It looks to me that having getHashCode as virtual function is natural and clean.

This pattern perhaps is not related to this patch though.

llvm/tools/llvm-profgen/PerfReader.h
664

nit: this does not align with [context stack1]

This revision is now accepted and ready to land.Oct 23 2021, 4:43 PM
hoy added inline comments.Oct 24 2021, 10:16 PM
llvm/tools/llvm-profgen/PerfReader.cpp
789

Add a test for this?

llvm/tools/llvm-profgen/PerfReader.h
669

What is the main reason of making this type hierarchy? It looks like UnsymbolizedProfileReader doesn't need most of the interfaces PerfReaderBase provides. Conceptually it sounds to me that the two classes function independently, and if we'd like code sharing, UnsymbolizedProfileReader can be made the base class of PerfReaderBase or make a new base class that simply reads in something and outputs an symbolized profile?

wenlei added inline comments.Oct 25 2021, 11:54 AM
llvm/tools/llvm-profgen/PerfReader.h
669

UnsymbolizedProfileReader can be made the base class of PerfReaderBase

I don't think this is a good idea. Conceptually, PerfReader is not a special kind of UnsymbolizedProfileReader.

hoy added inline comments.Oct 25 2021, 12:10 PM
llvm/tools/llvm-profgen/PerfReader.h
669

Right, they are independent of each other. PerfReader really deals with perf input. They only share in the raw output writting. A new base class makes more sense?

wlei added inline comments.Oct 25 2021, 12:30 PM
llvm/tools/llvm-profgen/PerfReader.cpp
784

Yeah, my initial intention is performance reason. I can try it in a separate patch.

789

We have the test, see noinline-cs-noprobe.test.

llvm/tools/llvm-profgen/PerfReader.h
664

updated comments

669

It seems to me the hierarchy should be like

PerfReaderBase.  --->   PerfScriptReader.   ---> LBRPerfReader
                                            ---> HybridPerfReader

                 --->   UnsymbolizedProfileReader

This will make the hierarchy deep though.

hoy added inline comments.Oct 25 2021, 1:23 PM
llvm/tools/llvm-profgen/PerfReader.h
669

That's the current hierarchy, IIUC? I was wondering if UnsymbolizedProfileReader should inherit from a base class that's not PerfReaderBase. If you look at the implementation of PerfReaderBase, there isn't much UnsymbolizedProfileReader shares. The writing of unsymbolized profile is probably not needed for UnsymbolizedProfileReader.

wlei added inline comments.Oct 25 2021, 2:44 PM
llvm/tools/llvm-profgen/PerfReader.h
669

Yeah, It's mostly similar to your suggestion. I meant just rename current PerfReaderBase to PerfScriptReader. then we can use PerfReaderBase as the new base.

wlei updated this revision to Diff 382157.Oct 25 2021, 5:44 PM

Refactor the code: separate the perfscript reader logic out of perfreaderbase
and create a new class PerfScriptReader and let LBRPerfReader and HybridPerfreader inherit from it.

wenlei added inline comments.Oct 25 2021, 5:55 PM
llvm/tools/llvm-profgen/PerfReader.cpp
784

Yes, a separate patch is fine. But I feel we might be over optimizing here unless the perf delta is visible.

llvm/tools/llvm-profgen/PerfReader.h
669

The hierarchy in the latest version looks good to me.

However, I'm wondering if we could move LBRPerfReader::generateRawProfile into PerfScriptReader? That way we could have HybridPerfReader and LBRPerfReader both inherit from PerfScriptReader directly. The LBRPerfReader-> HybridPerfReader hierarchy seems a bit weird, but I understand it was for reusing LBRPerfReader::generateRawProfile. Don't bother if this can cause bigger change.

hoy added inline comments.Oct 25 2021, 6:24 PM
llvm/tools/llvm-profgen/PerfReader.h
669

I now see what you mean. The renaming sounds good. It separates UnsymbolizedProfileReader which doesn't do anything with perf (perf data or perf script) from the old hierarchy. Perhaps rename the new PerfReaderBase to ProfileReaderBase?

wlei added inline comments.Oct 25 2021, 7:15 PM
llvm/tools/llvm-profgen/PerfReader.h
669

However, I'm wondering if we could move LBRPerfReader::generateRawProfile into PerfScriptReader?

Sounds good, not a big change.

Perhaps rename the new PerfReaderBase to ProfileReaderBase?

Just want to confirm this, do you mean also change the file name? currently the file is PerfReader.cpp/h. It seems the base class is used to read perf.data, perf script, unsymbolized profile, it has both perf and profile term, I don't have any preference though.

hoy added inline comments.Oct 25 2021, 8:01 PM
llvm/tools/llvm-profgen/PerfReader.h
669

I see. I’m fine with the PerfReaderBase class name and PerfReader file names to minimize the changes. Perf is the main scenario we support.

wlei updated this revision to Diff 382178.Oct 25 2021, 8:58 PM

move LBRPerfReader::generateUnsymbolizedProfile into PerfScriptReader

hoy accepted this revision.Oct 25 2021, 10:00 PM

lgtm, thanks!

This revision was landed with ongoing or failed builds.Oct 25 2021, 11:58 PM
This revision was automatically updated to reflect the committed changes.