This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Do not import pseudo probe desc in thinLTO
ClosedPublic

Authored by hoy on Jun 30 2021, 4:55 PM.

Details

Summary

Previously we reliedy on pseudo probe descriptors to look up precomputed GUID during probe emission for inlined probes. Since we are moving to always using unique linkage names, GUID for functions can be computed in place from dwarf names. This eliminates the need of importing pseudo probe descs in thinlto, since those descs should be emitted by the original modules.

This significantly reduces thinlto memory footprint in some extreme case where the number of imported modules for a single module is massive.

Test Plan:

Diff Detail

Event Timeline

hoy created this revision.Jun 30 2021, 4:55 PM
hoy requested review of this revision.Jun 30 2021, 4:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2021, 4:55 PM
hoy edited reviewers, added: wlei; removed: wle358.

Thanks for the change and cleanup. Do we see identical probe and probe_desc section for final binary for larger builds w/ and w/o this change?

hoy added a comment.EditedJun 30 2021, 5:31 PM

Thanks for the change and cleanup. Do we see identical probe and probe_desc section for final binary for larger builds w/ and w/o this change?

That's a good point. I have tried with a giant binary, and the generated probe sections are decodable. Haven't compared smaller workloads. Let me do that.

hoy updated this revision to Diff 355996.Jul 1 2021, 1:09 PM

Updating D105248: [CSSPGO] Do not import pseudo probe desc in thinLTO

hoy added a comment.EditedJul 1 2021, 1:10 PM

Verified that for SPEC, the generated profiles w/ and w/o this changes are very close. The runtime performance of pass2 has no difference two. Regarding the pseudo probe sections, the .pseudo_probe section is exactly the same for all benchmarks. The .pseudo_probe_desc sections are mostly the same, except for a couple benchmarks, for which the probe decoding gives exact result. There might be a glitch in setting the comdat group name for duplicated functions, like the ones with .llvm. suffix, or an issue of our internal thinlto dedup work.

wmi added a comment.Jul 2 2021, 10:02 AM

-funique-internal-linkage-names is not enabled by default. Will -fpseudo-probe-for-profiling implicitly enable -funique-internal-linkage-names?

-funique-internal-linkage-names is not enabled by default. Will -fpseudo-probe-for-profiling implicitly enable -funique-internal-linkage-names?

We still pass -funique-internal-linkage-names separately. I think the situation is similar to line-based AFDO.

Verified that for SPEC, the generated profiles w/ and w/o this changes are very close. The runtime performance of pass2 has no difference two. Regarding the pseudo probe sections, the .pseudo_probe section is exactly the same for all benchmarks. The .pseudo_probe_desc sections are mostly the same, except for a couple benchmarks, for which the probe decoding gives exact result. There might be a glitch in setting the comdat group name for duplicated functions, like the ones with .llvm. suffix, or an issue of our internal thinlto dedup work.

Thanks for checking. For pseudo_probe_desc, do you know what is the actual difference, and why decoding leads to same output with different pseudo_probe_desc?

hoy added a comment.Jul 6 2021, 3:40 PM

Verified that for SPEC, the generated profiles w/ and w/o this changes are very close. The runtime performance of pass2 has no difference two. Regarding the pseudo probe sections, the .pseudo_probe section is exactly the same for all benchmarks. The .pseudo_probe_desc sections are mostly the same, except for a couple benchmarks, for which the probe decoding gives exact result. There might be a glitch in setting the comdat group name for duplicated functions, like the ones with .llvm. suffix, or an issue of our internal thinlto dedup work.

Thanks for checking. For pseudo_probe_desc, do you know what is the actual difference, and why decoding leads to same output with different pseudo_probe_desc?

It appeared due to a glitch of our internal thinlto comdat deduplication change. Only part of the duplicated comdat groups were removed. llvm-profgen can tolerate that by taking only the last duplicate.

wenlei accepted this revision.Jul 13 2021, 5:30 PM

It appeared due to a glitch of our internal thinlto comdat deduplication change. Only part of the duplicated comdat groups were removed. llvm-profgen can tolerate that by taking only the last duplicate.

I think it'd be good to track it down why that happens - the expectation is to have identical probe_desc before/after this change. Though this change lgtm.

This revision is now accepted and ready to land.Jul 13 2021, 5:30 PM
This revision was landed with ongoing or failed builds.Jul 13 2021, 6:26 PM
This revision was automatically updated to reflect the committed changes.