This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profgen] Read sample profiles for post-processing.
ClosedPublic

Authored by hoy on Mar 14 2022, 5:55 PM.

Details

Summary

Sometimes we would like to run post-processing repeatedly on the original sample profile for tuning. In order to avoid regenerating the original profile from scratch every time, this change adds the support of reading in the original profile (called symbolized profile) and running the post-processor on it.

Diff Detail

Event Timeline

hoy created this revision.Mar 14 2022, 5:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2022, 5:55 PM
Herald added subscribers: modimo, wenlei. · View Herald Transcript
hoy requested review of this revision.Mar 14 2022, 5:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2022, 5:55 PM
hoy edited the summary of this revision. (Show Details)Mar 14 2022, 6:00 PM
hoy added reviewers: wenlei, wlei.

original non-modified profile

nit: rephrase to be clearer what it means, or just simplify "llvm pgo profile".

llvm/test/tools/llvm-profgen/cold-profile-trimming-symbolized.test
3

Maybe we should probably support no-diff llvm profile to llvm profile conversion? You mentioned the call site samples as a difference and I'm wondering whether this is an easy to fix difference.

llvm/tools/llvm-profgen/PerfReader.h
68 ↗(On Diff #415288)

I'd prefer not to introduce a new term, given we already have a proper name for it - LLVM PGO profile. Also this is not really PerfFormat anyways.

I think it's fine and probably clearer to create a side channel to deal with llvm profile as input, rather than trying to fit in the existing perf format/reader structure. Taking llvm profile as input is indeed a special mode and use of llvm-profgen anyways.

730 ↗(On Diff #415288)

This doesn't look right. This is no longer a perf profile, but rather a proper LLVM profile, so inheriting from PerfReaderBase isn't good. As you can see, there's really no notion of trace, so parsePerfTraces really just calls LLVM's profile reader.

Maybe instead of passing a reader to profile generator, we can just either pass the counters like we do today, or a ProfileMap which is what profile reader gives us. This may need two separate ctor for profile generator. But I think that's better than the (awkward) inheritance we have here.

llvm/tools/llvm-profgen/llvm-profgen.cpp
53

This is just PGO profile right? If so, I don't think we need yet another name for it. We can say something like llvm-pgo-profile to be clear.

Do you intend to make this work for all kinds of pgo profiles as input? like CSSPGO/AutoFDO, flat vs nest etc.

hoy added inline comments.Mar 23 2022, 10:19 AM
llvm/tools/llvm-profgen/PerfReader.h
730 ↗(On Diff #415288)

I was thinking about that. Not only for the profile generator, the reader would also need a different constructor or a different type hierarchy and the work flow of llvm-profgen may not look as clean as now. As shown below (from llvm-profgen::main), the current approach tries to reuse the PerfReaderBase and ProfileGeneratorBase. I guess it's a tradeoff between adding a new base class for Reader or diverging the workflow structure a little bit.

PerfInputFile PerfFile = getPerfInputFile();
 std::unique_ptr<PerfReaderBase> Reader =
     PerfReaderBase::create(Binary.get(), PerfFile);
 // Parse perf events and samples
 Reader->parsePerfTraces();

 if (SkipSymbolization)
   return EXIT_SUCCESS;

 std::unique_ptr<ProfileGeneratorBase> Generator =
     ProfileGeneratorBase::create(Binary.get(), Reader->getSampleCounters(),
                                  Reader->profileIsCSFlat());
 Generator->generateProfile();
 Generator->write();
llvm/tools/llvm-profgen/llvm-profgen.cpp
53

llvm-pgo-profile sounds better. Yes, I'd like all kinds for sample profiles to be read and postpreocessed by the tool.

wenlei added inline comments.Mar 23 2022, 10:51 AM
llvm/tools/llvm-profgen/PerfReader.h
730 ↗(On Diff #415288)

the reader would also need a different constructor or a different type hierarchy and the work flow of llvm-profgen may not look as clean as now

It'd be a very thin wrapper over the actually llvm profile reader. Maybe a standalone function is all we need, instead of a new reader class. we take file as input, produced ProfileMap and pass it to generator. The use case and workflow is indeed special, it's like a side channel not tied to the main use of the tool (raw profile to llvm profile). Forcing that special workflow to use the same structure of the rest creates the problems. Yes, it's tradeoff, but shove llvm profile reader under the hierarchy of perf reader doesn't look like a good one to me.

hoy added inline comments.Mar 23 2022, 5:06 PM
llvm/tools/llvm-profgen/PerfReader.h
730 ↗(On Diff #415288)

Changed to using a separate channel for sample profile input.

llvm/tools/llvm-profgen/llvm-profgen.cpp
53

I ended up ing llvm-sample-profile.

hoy updated this revision to Diff 417790.Mar 23 2022, 5:07 PM

Using a separate channel for sample profile input.

hoy retitled this revision from [llvm-profgen] Read symbolized profiles for post-processing. to [llvm-profgen] Read sample profiles for post-processing..Mar 23 2022, 5:07 PM
hoy edited the summary of this revision. (Show Details)Mar 23 2022, 5:23 PM
hoy added inline comments.
llvm/test/tools/llvm-profgen/cold-profile-trimming-symbolized.test
3

The check here is to make sure the profile post-processing is effective. There's no cross-type profile conversion here. The non-diff profile conversion I mentioned was about converting CS flat profile to CS nest which should be independent of this diff. I can make a separate fix for that though.

hoy updated this revision to Diff 417796.Mar 23 2022, 5:42 PM

Moving pseudo probe decoding out of probe profile generation so that it'll run for sample profile input case.

wenlei added inline comments.Mar 24 2022, 10:26 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
431–435

So for pgo profile as input, we still need probe decoding because preinliner needs context size based on profile, right?

llvm/tools/llvm-profgen/ProfileGenerator.h
136–139

Both Reader->getSampleCounters() and Reader->getProfiles() return by reference so there should be no copy, why do we need move ctor here?

hoy added inline comments.Mar 25 2022, 12:55 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
431–435

Exactly.

llvm/tools/llvm-profgen/ProfileGenerator.h
136–139

This is because previously the SampleCounters field is a reference and it must be initialized in constructors. I'm now changing it to a value field and using the move constructor to initialize it when needed.

wenlei added inline comments.Mar 25 2022, 1:52 PM
llvm/tools/llvm-profgen/ProfileGenerator.h
136–139

Ok, move still has some cost as a new object is constructed. In this case can we just use pointer type instead?

IIUC, the change is to allow them to be "optional" depending on whether input is perf profile or llvm profile, for that it seems pointer which is nullable fits well..

hoy added inline comments.Mar 25 2022, 2:07 PM
llvm/tools/llvm-profgen/ProfileGenerator.h
136–139

Yes, pointer type should work for SampleCounters since it's always read-only. For ProfileMap, which is not optional and is mutable during the postprocessing, a pointer type may result in ProfileGenerator changing the Reader's buffer. This doesn't sound right conceptually, though technically this has no difference with std::move stealing the reader buffer. Another issue with using pointer type for ProfileMap is that it has to be explicitly initialized (maybe via unique_ptr) in one of the constructors.

hoy updated this revision to Diff 418323.Mar 25 2022, 2:21 PM

Using pointer type for SampleCounters.

hoy updated this revision to Diff 418324.Mar 25 2022, 2:25 PM

Updating D121655: [llvm-profgen] Read sample profiles for post-processing.

wenlei added inline comments.Mar 25 2022, 3:28 PM
llvm/tools/llvm-profgen/ProfileGenerator.h
136–139

I am not sure if the reasoning to use move ctor makes sense, but I also don't have a strong opinion.

For ProfileMap, which is not optional and is mutable during the postprocessing, a pointer type may result in ProfileGenerator changing the Reader's buffer. This doesn't sound right conceptually, though technically this has no difference with std::move stealing the reader buffer.

Changing reader's buffer is not the problem of the API, and using move ctor API also does not solve the problem. In fact using move may make it worse, because the Reader's buffer become invalid instead of changed.

Another issue with using pointer type for ProfileMap is that it has to be explicitly initialized (maybe via unique_ptr) in one of the constructors.

Why is explicit initialization an *issue*?

hoy added inline comments.Mar 28 2022, 9:39 AM
llvm/tools/llvm-profgen/ProfileGenerator.h
136–139

An explicit initialization may need an explicit destruction, unless some wrapping DS is used like unique_ptr. However, using unique_ptr in one case and raw ptr in the other makes it inconsistent, ie., how do you define the ProfileMap field. Using raw ptrs for both will require an explicit delete for one case and we need a way to tell it apart.

Ideally what comes out of the reader should be readonly, however, it looks like not easy to achieve without duplicating the whole data. Sounds like using move constructor is easy to implement. WDYT?

wenlei added inline comments.Mar 28 2022, 11:32 PM
llvm/tools/llvm-profgen/ProfileGenerator.h
136–139

An explicit initialization may need an explicit destruction, unless some wrapping DS is used like unique_ptr. However, using unique_ptr in one case and raw ptr in the other makes it inconsistent, ie., how do you define the ProfileMap field. Using raw ptrs for both will require an explicit delete for one case and we need a way to tell it apart.

You can let reader own the data (so destruction belongs to reader as well), then ProfileGenerator only takes an escaped raw pointer. explicit initialization is justing assign a value to raw pointer.

Ideally what comes out of the reader should be readonly, however, it looks like not easy to achieve without duplicating the whole data. Sounds like using move constructor is easy to implement. WDYT?

I don't have strong opinion, but I don't see how move achieves what you want, it's not readonly anyways.

hoy added inline comments.Mar 29 2022, 9:27 AM
llvm/tools/llvm-profgen/ProfileGenerator.h
136–139

You can let reader own the data (so destruction belongs to reader as well), then ProfileGenerator only takes an escaped raw pointer. explicit initialization is justing assign a value to raw pointer.

That'd require a change to the reader and the generator's constructor, just to be consistent with the other path.

Right, neither solves the readonly issue. The std::move one seems a bit clearer. I'm inlined to go with it if you're fine with both.

wenlei accepted this revision.Mar 29 2022, 11:44 PM
wenlei added inline comments.
llvm/tools/llvm-profgen/ProfileGenerator.h
136–139

sounds good.

This revision is now accepted and ready to land.Mar 29 2022, 11:44 PM
This revision was landed with ongoing or failed builds.Mar 30 2022, 1:51 PM
This revision was automatically updated to reflect the committed changes.

Hi, I have found that cs-preinline-sample-profile.test is flaky around 10% of the runs. Could you please check what might be the issue?

hoy added a comment.Mar 31 2022, 10:46 AM

Hi, I have found that cs-preinline-sample-profile.test is flaky around 10% of the runs. Could you please check what might be the issue?

Thanks for reporting this issue. Let me double check and try to repro.

Hi, I have found that cs-preinline-sample-profile.test is flaky around 10% of the runs. Could you please check what might be the issue?

Thanks for reporting this issue. Let me double check and try to repro.

An update on this. It appears this new test exposed an existing issue. This should be fixed by D122844.