This is an archive of the discontinued LLVM Phabricator instance.

[InstrProf] Prevent duplicate functions in correlated data
ClosedPublic

Authored by ellis on Dec 20 2021, 12:14 PM.

Details

Summary

When using debug info for profile correlation, avoid adding duplicate
functions in the synthetic Data section.

Before this patch, n duplicate function entries in the Data section would
cause counter values to be a factor of n larger. I built instrumented
clang with and without debug info correlation and got these summaries.

# With Debug Info Correlate
$ llvm-profdata show default.profdata
Instrumentation level: IR  entry_first = 0
Total functions: 182530
Maximum function count: 52034
Maximum internal block count: 5763

# Without
$ llvm-profdata show default.profdata
Instrumentation level: IR  entry_first = 0
Total functions: 183212
Maximum function count: 52034
Maximum internal block count: 5766

The slight difference in counts seem to be mostly from FileSystem and
Map functions and the difference in the number of instrumented functions
seems to come from missing debug info like destructors without source.

Diff Detail

Event Timeline

ellis created this revision.Dec 20 2021, 12:14 PM
ellis published this revision for review.Dec 20 2021, 12:16 PM
ellis edited the summary of this revision. (Show Details)
ellis added reviewers: kyulee, MaskRay.
Herald added a project: Restricted Project. · View Herald TranscriptDec 20 2021, 12:17 PM

I presume without debug-info related correlation (normal IRPGO), the metadata is merged by their linkage types/contents.
For my education, I wonder whether IRPGO might have the same named function with different different CFG shapes -- e.g., static functions.
This approach seems to pick the first appearance of metadata for the same named functions, and ignore the rest assuming they are all identical.
And also can we have a test case for this?

ellis added a comment.Dec 20 2021, 1:49 PM

I presume without debug-info related correlation (normal IRPGO), the metadata is merged by their linkage types/contents.
For my education, I wonder whether IRPGO might have the same named function with different different CFG shapes -- e.g., static functions.
This approach seems to pick the first appearance of metadata for the same named functions, and ignore the rest assuming they are all identical.

IRPGO uses GlobalValue::getGlobalIdentifier() as the function name, which will prepend the filename if necessary. https://llvm.org/doxygen/Globals_8cpp_source.html#l00133
I suppose it might be more accurate to track the counter pointers and add an assert that if there are two entries with the same counter pointer, then they must have the same function name.

And also can we have a test case for this?

I can try to find a reproducer, but I assume it would require more than one module.

ellis updated this revision to Diff 395762.Dec 21 2021, 3:08 PM

Use CounterOffset to find duplicate probes and refactor the compiler-rt test to check this case.

Herald added a project: Restricted Project. · View Herald TranscriptDec 21 2021, 3:08 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
ellis updated this revision to Diff 395911.Dec 22 2021, 11:39 AM

Clear CounterOffsets set to save memory.

Makes sense to check deduplication using counter address instead of function name.
LGTM.

kyulee accepted this revision.Dec 28 2021, 12:38 PM
This revision is now accepted and ready to land.Dec 28 2021, 12:38 PM
This revision was landed with ongoing or failed builds.Dec 28 2021, 2:21 PM
This revision was automatically updated to reflect the committed changes.