This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profgen] Clean up unnecessary memory reservations between phases.
ClosedPublic

Authored by hoy on Jan 31 2022, 9:52 PM.

Details

Summary

Cleaning up data structures that are not used after a certain point. This further brings down peak memory usage by 15% for a large benchmark.

Before:

note: Before parsePerfTraces
note: VM: 40.73 GB   RSS: 39.18 GB
note: Before parseAndAggregateTrace
note: VM: 40.73 GB   RSS: 39.18 GB
note: After parseAndAggregateTrace
note: VM: 88.93 GB   RSS: 87.97 GB
note: Before generateUnsymbolizedProfile
note: VM: 88.95 GB   RSS: 87.99 GB
note: After generateUnsymbolizedProfile
note: VM: 93.50 GB   RSS: 92.53 GB
note: After computeSizeForProfiledFunctions
note: VM: 101.13 GB   RSS: 99.36 GB
note: After generateProbeBasedProfile
note: VM: 215.61 GB   RSS: 210.88 GB
note: After postProcessProfiles
note: VM: 237.48 GB   RSS: 212.50 GB

After:

note: Before parsePerfTraces
note: VM: 40.73 GB   RSS: 39.18 GB
note: Before parseAndAggregateTrace
note: VM: 40.73 GB   RSS: 39.18 GB
note: After parseAndAggregateTrace
note: VM: 88.93 GB   RSS: 87.96 GB
note: Before generateUnsymbolizedProfile
note: VM: 88.95 GB   RSS: 87.97 GB
note: After generateUnsymbolizedProfile
note: VM: 93.50 GB   RSS: 92.51 GB
note: After computeSizeForProfiledFunctions
note: VM: 93.50 GB   RSS: 92.53 GB
note: After generateProbeBasedProfile
note: VM: 164.87 GB   RSS: 163.55 GB
note: After postProcessProfiles
note: VM: 182.28 GB   RSS: 179.43 GB

Diff Detail

Event Timeline

hoy created this revision.Jan 31 2022, 9:52 PM
hoy requested review of this revision.Jan 31 2022, 9:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2022, 9:52 PM
hoy edited the summary of this revision. (Show Details)Jan 31 2022, 10:04 PM
hoy added reviewers: wenlei, wlei.
wlei accepted this revision.Jan 31 2022, 10:07 PM

LGTM, thanks for the cleanup!

This revision is now accepted and ready to land.Jan 31 2022, 10:07 PM
wenlei added a comment.Feb 1 2022, 9:20 AM

Good change!

llvm/tools/llvm-profgen/llvm-profgen.cpp
161 ↗(On Diff #404810)

nit: using a lexical scope for this purpose might be better.

hoy added inline comments.Feb 1 2022, 9:40 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
161 ↗(On Diff #404810)

Lexical scope is nice, but in this case Reader and Generator have overlapping lifetimes, i.e, we need Reader to be released before Generator->write(); but after Generator->generateProfile();. So one lexical scope including both objects may not work.

Actually maybe we can define Reader->getSampleCounters() as a unique_ptr too, so that when it is transferred to Generator, Generator will take its ownership and release it when appropriate.

wenlei added inline comments.Feb 1 2022, 9:49 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
161 ↗(On Diff #404810)

Lexical scope is nice, but in this case Reader and Generator have overlapping lifetimes, i.e, we need Reader to be released before Generator->write(); but after Generator->generateProfile();. So one lexical scope including both objects may not work.

I was thinking about hoisting the unique_ptr for Generator above, initialized with null, then set to the real thing after reader is available. This way the lexical scope only applies to reader.

Actually maybe we can define Reader->getSampleCounters() as a unique_ptr too, so that when it is transferred to Generator, Generator will take its ownership and release it when appropriate.

I'm not sure if this is a good idea. With this, we can leave Reader in a corrupted state after calling the getter, which is probably not a good API design.

hoy added inline comments.Feb 1 2022, 10:04 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
161 ↗(On Diff #404810)

So far Reader is released between two Generator invocations. This relies on the knowledge that Reader is only used in first invocation. I was thinking that Reader could be released immediately after the needed data is passed into Generator. Usually this would be done by passing SampleCounters by value from Reader into Generator. We are now passing it by reference which holds reader live until Generator doesn't use it. This seems a tangled definition without a clear boundary between the two objects. Maybe defining SampleCounters as a shared pointer is better?

I was thinking about hoisting the unique_ptr for Generator above, initialized with null, then set to the real thing after reader is available. This way the lexical scope only applies to reader.

Yeah, this way should work. The adjusted code look like below. Still feels a bit weird. Maybe just make the release call explicitly if we don't want to change the api?

PerfInputFile PerfFile = getPerfInputFile();
std::unique_ptr<ProfileGeneratorBase> Generator;

{
    std::unique_ptr<PerfReaderBase> Reader =
        PerfReaderBase::create(Binary.get(), PerfFile);
    // Parse perf events and samples
    Reader->parsePerfTraces();

    if (SkipSymbolization)
      return EXIT_SUCCESS;

    Generator =
      ProfileGeneratorBase::create(Binary.get(), Reader->getSampleCounters(),
                                   Reader->profileIsCSFlat());
    Generator->generateProfile();
  }

  Generator->write();
wenlei added inline comments.Feb 1 2022, 10:52 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
161 ↗(On Diff #404810)

I don't have strong opinion. It's ok to leave it as is, but add a comment to explain why the explicit release/scope is needed.

Still feels a bit weird.

It is just as weird as a release in the middle. :)

hoy added inline comments.Feb 1 2022, 12:11 PM
llvm/tools/llvm-profgen/llvm-profgen.cpp
161 ↗(On Diff #404810)

It is just as weird as a release in the middle. :)

Agreed. I hope we'll have some time to readdress the api design in the future.

hoy updated this revision to Diff 405041.Feb 1 2022, 12:11 PM
hoy edited the summary of this revision. (Show Details)

Updating D118677: [llvm-profgen] Clean up unnecessary memory reservations between phases.

wenlei accepted this revision.Feb 1 2022, 12:12 PM

lgtm, thanks.

This revision was landed with ongoing or failed builds.Feb 1 2022, 12:48 PM
This revision was automatically updated to reflect the committed changes.
hoy reopened this revision.Feb 1 2022, 2:42 PM
hoy added inline comments.
llvm/tools/llvm-profgen/llvm-profgen.cpp
163 ↗(On Diff #405070)

Should call reset instead of release.

This revision is now accepted and ready to land.Feb 1 2022, 2:42 PM
hoy updated this revision to Diff 405108.Feb 1 2022, 2:47 PM

Updating D118677: [llvm-profgen] Clean up unnecessary memory reservations between phases.

hoy added inline comments.Feb 1 2022, 3:44 PM
llvm/tools/llvm-profgen/llvm-profgen.cpp
163 ↗(On Diff #405070)

OK, looks like Reader is needed by Generator->write(), especially for UnsymbolizedProfileReader, where the context strings are built inside the reader. I'll remove the release here. This is not on the critical path. The critical path is Generator->generateProfile();.

hoy updated this revision to Diff 405114.Feb 1 2022, 3:45 PM

Updating D118677: [llvm-profgen] Clean up unnecessary memory reservations between phases.

wenlei added a comment.Feb 1 2022, 3:46 PM

Is the memory reduction in the description still accurate?

hoy added a comment.Feb 1 2022, 3:48 PM

Is the memory reduction in the description still accurate?

Yes, since that was measured based on using unique_ptr release which didn't really free anything.

hoy added a comment.Feb 1 2022, 3:54 PM

Thanks for letting me know. The commit has been reverted. I'll recheck it in with a fix.

This revision was landed with ongoing or failed builds.Feb 1 2022, 4:29 PM
This revision was automatically updated to reflect the committed changes.