Sometimes we would like to run post-processing repeatedly on the original sample profile for tuning. To avoid regenerating the original profile from scratch every time, this change adds support for reading in the original profile (called the symbolized profile) and running the post-processor on it.
- Repository
- rG LLVM Github Monorepo
Event Timeline
original non-modified profile
nit: rephrase to make clearer what it means, or just simplify to "llvm pgo profile".
llvm/test/tools/llvm-profgen/cold-profile-trimming-symbolized.test | ||
---|---|---|
3 | Maybe we should support no-diff llvm profile to llvm profile conversion? You mentioned the call site samples as a difference, and I'm wondering whether this difference is easy to fix. | |
llvm/tools/llvm-profgen/PerfReader.h | ||
68 ↗ | (On Diff #415288) | I'd prefer not to introduce a new term, given we already have a proper name for it - LLVM PGO profile. Also this is not really PerfFormat anyways. I think it's fine and probably clearer to create a side channel to deal with llvm profile as input, rather than trying to fit in the existing perf format/reader structure. Taking llvm profile as input is indeed a special mode and use of llvm-profgen anyways. |
730 ↗ | (On Diff #415288) | This doesn't look right. This is no longer a perf profile, but rather a proper LLVM profile, so inheriting from PerfReaderBase isn't good. As you can see, there's really no notion of trace, so parsePerfTraces really just calls LLVM's profile reader. Maybe instead of passing a reader to profile generator, we can just either pass the counters like we do today, or a ProfileMap which is what profile reader gives us. This may need two separate ctor for profile generator. But I think that's better than the (awkward) inheritance we have here. |
llvm/tools/llvm-profgen/llvm-profgen.cpp | ||
53 | This is just PGO profile right? If so, I don't think we need yet another name for it. We can say something like llvm-pgo-profile to be clear. Do you intend to make this work for all kinds of pgo profiles as input? like CSSPGO/AutoFDO, flat vs nest etc. |
llvm/tools/llvm-profgen/PerfReader.h | ||
---|---|---|
730 ↗ | (On Diff #415288) | I was thinking about that. It is not only the profile generator: the reader would also need a different constructor or a different type hierarchy, and the workflow of llvm-profgen may not look as clean as it does now. As shown below (from llvm-profgen::main), the current approach tries to reuse the PerfReaderBase and ProfileGeneratorBase. I guess it's a tradeoff between adding a new base class for Reader or diverging the workflow structure a little bit. PerfInputFile PerfFile = getPerfInputFile(); std::unique_ptr<PerfReaderBase> Reader = PerfReaderBase::create(Binary.get(), PerfFile); // Parse perf events and samples Reader->parsePerfTraces(); if (SkipSymbolization) return EXIT_SUCCESS; std::unique_ptr<ProfileGeneratorBase> Generator = ProfileGeneratorBase::create(Binary.get(), Reader->getSampleCounters(), Reader->profileIsCSFlat()); Generator->generateProfile(); Generator->write(); |
llvm/tools/llvm-profgen/llvm-profgen.cpp | ||
53 | llvm-pgo-profile sounds better. Yes, I'd like all kinds of sample profiles to be read and post-processed by the tool. |
llvm/tools/llvm-profgen/PerfReader.h | ||
---|---|---|
730 ↗ | (On Diff #415288) |
It'd be a very thin wrapper over the actual llvm profile reader. Maybe a standalone function is all we need, instead of a new reader class: we take a file as input, produce a ProfileMap, and pass it to the generator. The use case and workflow are indeed special; it's like a side channel not tied to the main use of the tool (raw profile to llvm profile). Forcing that special workflow to use the same structure as the rest creates the problems. Yes, it's a tradeoff, but shoving the llvm profile reader under the hierarchy of the perf reader doesn't look like a good one to me. |
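The side-channel idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch's code: `ProfileMap` here is a plain `std::map` stand-in, and `readProfileText`, `PostProcessor`, and the `name:count` text format are hypothetical names invented for the example. The real tool would call LLVM's sample profile reader instead of the toy parser.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for the ProfileMap that the LLVM profile reader
// produces: function name -> total sample count.
using ProfileMap = std::map<std::string, uint64_t>;

// Hypothetical standalone reader: parses "name:count" lines from text.
// No reader class and no perf-trace notion is involved.
ProfileMap readProfileText(const std::string &Text) {
  ProfileMap Profiles;
  std::istringstream In(Text);
  std::string Line;
  while (std::getline(In, Line)) {
    auto Colon = Line.rfind(':');
    if (Colon == std::string::npos)
      continue; // skip lines without a count separator
    Profiles[Line.substr(0, Colon)] += std::stoull(Line.substr(Colon + 1));
  }
  return Profiles;
}

// Hypothetical generator that only post-processes an existing profile.
class PostProcessor {
public:
  explicit PostProcessor(ProfileMap Profiles) : Profiles(std::move(Profiles)) {}

  // Drop functions whose sample count is below the threshold.
  void trimCold(uint64_t Threshold) {
    for (auto It = Profiles.begin(); It != Profiles.end();)
      It = It->second < Threshold ? Profiles.erase(It) : std::next(It);
  }

  const ProfileMap &profiles() const { return Profiles; }

private:
  ProfileMap Profiles;
};

// Usage: read once, then post-process; returns the number of surviving entries.
std::size_t runExample() {
  PostProcessor PP(readProfileText("main:100\nfoo:3\nbar:40\n"));
  PP.trimCold(10);
  assert(PP.profiles().count("foo") == 0);
  return PP.profiles().size();
}
```

The point of the shape is that the profile-input path never touches the perf reader hierarchy: a free function yields the map, and the generator receives it directly.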
llvm/test/tools/llvm-profgen/cold-profile-trimming-symbolized.test | ||
---|---|---|
3 | The check here is to make sure the profile post-processing is effective. There's no cross-type profile conversion here. The non-diff profile conversion I mentioned was about converting CS flat profile to CS nest which should be independent of this diff. I can make a separate fix for that though. |
Moving pseudo probe decoding out of probe profile generation so that it'll run for sample profile input case.
llvm/tools/llvm-profgen/ProfileGenerator.cpp | ||
---|---|---|
423–427 | So for pgo profile as input, we still need probe decoding because preinliner needs context size based on profile, right? | |
llvm/tools/llvm-profgen/ProfileGenerator.h | ||
135–139 | Both Reader->getSampleCounters() and Reader->getProfiles() return by reference so there should be no copy, why do we need move ctor here? |
llvm/tools/llvm-profgen/ProfileGenerator.cpp | ||
---|---|---|
423–427 | Exactly. | |
llvm/tools/llvm-profgen/ProfileGenerator.h | ||
135–139 | This is because previously the SampleCounters field is a reference and it must be initialized in constructors. I'm now changing it to a value field and using the move constructor to initialize it when needed. |
llvm/tools/llvm-profgen/ProfileGenerator.h | ||
---|---|---|
135–139 | Ok, move still has some cost as a new object is constructed. In this case can we just use a pointer type instead? IIUC, the change is to allow them to be "optional" depending on whether the input is a perf profile or an llvm profile; for that, a pointer, which is nullable, seems to fit well. |
llvm/tools/llvm-profgen/ProfileGenerator.h | ||
---|---|---|
135–139 | Yes, a pointer type should work for SampleCounters since it's always read-only. For ProfileMap, which is not optional and is mutable during the postprocessing, a pointer type may result in ProfileGenerator changing the Reader's buffer. This doesn't sound right conceptually, though technically it is no different from std::move stealing the reader's buffer. Another issue with using a pointer type for ProfileMap is that it has to be explicitly initialized (maybe via unique_ptr) in one of the constructors. |
llvm/tools/llvm-profgen/ProfileGenerator.h | ||
---|---|---|
135–139 | I am not sure if the reasoning to use move ctor makes sense, but I also don't have a strong opinion.
Changing the reader's buffer is not a problem of the API, and using the move ctor API also does not solve the problem. In fact, using move may make it worse, because the Reader's buffer becomes invalid instead of changed.
Why is explicit initialization an *issue*? |
llvm/tools/llvm-profgen/ProfileGenerator.h | ||
---|---|---|
135–139 | An explicit initialization may need an explicit destruction, unless some wrapping data structure such as unique_ptr is used. However, using unique_ptr in one case and a raw pointer in the other makes it inconsistent, i.e., how do you define the ProfileMap field? Using raw pointers for both will require an explicit delete for one case, and we need a way to tell them apart. Ideally what comes out of the reader should be read-only; however, that does not look easy to achieve without duplicating the whole data. Using the move constructor sounds easy to implement. WDYT? |
llvm/tools/llvm-profgen/ProfileGenerator.h | ||
---|---|---|
135–139 |
You can let the reader own the data (so destruction belongs to the reader as well); then ProfileGenerator only takes an escaped raw pointer. Explicit initialization is just assigning a value to the raw pointer.
I don't have strong opinion, but I don't see how move achieves what you want, it's not readonly anyways. |
llvm/tools/llvm-profgen/ProfileGenerator.h | ||
---|---|---|
135–139 |
That'd require a change to the reader and the generator's constructor, just to be consistent with the other path. Right, neither solves the read-only issue. The std::move one seems a bit clearer. I'm inclined to go with it if you're fine with both. |
llvm/tools/llvm-profgen/ProfileGenerator.h | ||
---|---|---|
135–139 | sounds good. |
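For readers following the ownership discussion above, here is a minimal sketch of the two construction paths that were weighed. It is an illustration only: `Generator`, `scale`, and `hasCounters` are hypothetical names, and both `SampleCounters` and `ProfileMap` are simplified to `std::map` stand-ins rather than the real llvm-profgen types. It assumes the outcome described in the thread: read-only counters can be referenced through a nullable pointer, while the mutable profile map is taken by move.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for the real llvm-profgen data structures.
using ProfileMap = std::map<std::string, uint64_t>;
using SampleCounters = std::map<std::string, uint64_t>;

// Hypothetical generator with one constructor per input kind.
class Generator {
public:
  // Perf-input path: counters are read-only, so a non-owning pointer suffices.
  explicit Generator(const SampleCounters *Counters) : Counters(Counters) {}

  // Profile-input path: the generator takes over the map's contents by move,
  // so later mutation happens on storage the generator owns.
  explicit Generator(ProfileMap &&Profiles) : Profiles(std::move(Profiles)) {}

  // Example of a mutating post-processing step.
  void scale(uint64_t Factor) {
    for (auto &Entry : Profiles)
      Entry.second *= Factor;
  }

  bool hasCounters() const { return Counters != nullptr; }
  const ProfileMap &profiles() const { return Profiles; }

private:
  const SampleCounters *Counters = nullptr; // optional, owned by the reader
  ProfileMap Profiles;                      // owned by the generator
};
```

As the reviewer points out, the move does not make the reader's data read-only: after `Generator G(std::move(Map))`, the source map is left in a valid but unspecified state, so the reader should not be queried for it again.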
Hi, I have found that cs-preinline-sample-profile.test is flaky around 10% of the runs. Could you please check what might be the issue?
An update on this. It appears this new test exposed an existing issue. This should be fixed by D122844.