Debug info correlation is an option in InstrProfiling pass, which is used by
both IR instrumentation and front-end instrumentation. So, Clang coverage can
also benefits the binary size saving from it.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Update.
Binary may still contains __llvm_prf_names sections for functions that are not emitted.
Thanks for working on this! I've added a few other reviewers that might be interested. In particular it might conflict with the stack D138846 IIRC
It seems that the __llvm_prf_names is retained in this mode. What is the overhead of this section generally? Can we instead use debug info to lookup function names?
It seems that the __llvm_prf_names is retained in this mode. What is the overhead of this section generally? Can we instead use debug info to lookup function names?
With debug info correlation enabled, __llvm_prf_names section will only contain functions names that are not emitted to the final binary, not even in debug info. So, this section is still much smaller compared to when not enabling debug info correlation. This is used to indicate that those functions are covered but not executed.
Example from the test case above:
$ cat a.cpp struct A { void unused_func() {} }; void used_func() {} int main() { used_func(); return 0; } $ clang -fprofile-instr-generate -fcoverage-mapping -mllvm -enable-name-compression=false -mllvm -debug-info-correlate -g a.cpp -o a.out $ llvm-objdump --section=__llvm_prf_names --full-contents a.out a.out: file format elf64-x86-64 Contents of section __llvm_prf_names: 7b65 14005f5a 4e314131 31756e75 7365645f .._ZN1A11unused_ 7b75 66756e63 4576 funcEv
Thanks for the explanation!
clang/test/CodeGen/coverage-debug-info-correlate.c | ||
---|---|---|
1 ↗ | (On Diff #554048) | I think this test can be generalized to check that __llvm_profile_raw_version is emitted with or without debug info correlation. |
compiler-rt/test/profile/Darwin/coverage-debug-info-correlate.cpp | ||
10 | NIT | |
30 | Is it worth testing that %t.normal.profdata emits the same coverage? | |
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp | ||
1074–1089 | I don't quite understand why this was changed. Is there is a test case that covers this? |
compiler-rt/test/profile/Darwin/coverage-debug-info-correlate.cpp | ||
---|---|---|
30 | Added testing for %t.normal.profdata. | |
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp | ||
1074–1089 | This part of code is used by llvm-cov. Originally, it always checks if the instrumented binary contains the __llvm_prf_names section and assumes all instrumented function names are store there. Now, it will firstly create InstrProfSymtab from the indexed profile file and add remaining function names to this InstrProfSymtab from __llvm_prf_names section in the instrumented binary, if there is one. This is covered by // RUN: llvm-cov export --format=lcov --instr-profile=%t.profdata %t | FileCheck %s -check-prefix=NAME, where _Z9used_funcv and main are stored in debug info and later moved to indexed profile file, and _ZN1A11unused_funcEv is stored in __llvm_prf_names section in the binary. |
Yes, I think it's feasible by creating a fake variable debug info that contains all the skipped function names in the source file for each object file, but that's a bit more overhead in debug info compared to just plain __llvm_prf_names in binary, which is only about 0.2% or less of original size [1].
[1]: https://bugs.chromium.org/p/chromium/issues/detail?id=1463755#c8
Since the .debug_str section can be stripped before execution time, I don't count that towards the binary size overhead. But I believe the __llvm_prf_names section cannot be stripped as is. So it would be an improvement if we could completely remove the __llvm_prf_names section and emit names in the .debug_str section.
I see that you have one case where the overhead of the names section is just 0.2%. I wonder if this is common or if there exist real-world scenarios where the overhead is more significant. If not then leaving the names section in the binary might be ok, but we should probably leave a note explaining how we could remove it if we emit the names in debug info.
Remove profile name section.
Unused coverage function names are stored in debug info as a fake global variable info:
0x0000000b: DW_TAG_compile_unit ... 0x00000071: DW_TAG_variable DW_AT_name ("__llvm_coverage_names") DW_AT_type (0x00000097 "Coverage Type") DW_AT_location (DW_OP_addr 0x0) 0x00000084: DW_TAG_LLVM_annotation DW_AT_name ("Cov Function Name") DW_AT_const_value ("bar") 0x0000008d: DW_TAG_LLVM_annotation DW_AT_name ("Cov Function Name") DW_AT_const_value ("baz")
llvm-cov will need to retrieve those names from the binary's debug info.
The reasons for not writing those names into the indexed profile file are following:
- llvm-cov always need to read llvm_covfun and llvm_covmap from the unstripped binary, so it doesn't matter if those names are read from the binary or the indexed profile.
- Since those names are just an array of strings, they don't have counter/data info. When llvm-profdata writes indexed profile, it iterates profile data to write records, so those unreferenced names are skipped even if I append them to the end of profile symtab. I'm not sure how we can simply append those names into the end of function name strings at indexed profile.
Overall, there are basically two types of function names:
- Instrumented function names, which are stored in dwarf and later copied to indexed profile. llvm-cov reads them from indexed profile.
- Unused function names(could be empty), which are stored in dwarf. llvm-cov reads them from the debug info.
It looks like debug-info-correlate-coverage.ll was renamed twice. Is this intended?
compiler-rt/lib/profile/InstrProfilingWriter.c | ||
---|---|---|
276–277 ↗ | (On Diff #556805) | Can we remove changes to this file now that we are stripping the names section? |
llvm/lib/ProfileData/InstrProfCorrelator.cpp | ||
423 | I think this return should be on the line above to break out of the for loop early. Or did I miscount the brackets? | |
llvm/lib/ProfileData/InstrProfReader.cpp | ||
577–578 ↗ | (On Diff #556805) | I think these changes can also be removed |
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp | ||
769 | Just so that I understand. Does this code run only for names of unused functions? Or will we store all function names with CovFunctionNameAnnotation? |
I just moved llvm/test/Instrumentation/InstrProfiling/debug-info-correlate-coverage.ll. to llvm/test/Instrumentation/InstrProfiling/debug-info-correlate-byte-coverage.ll and created a new test file llvm/test/Instrumentation/InstrProfiling/debug-info-correlate-clang-coverage.ll.
compiler-rt/lib/profile/InstrProfilingWriter.c | ||
---|---|---|
276–277 ↗ | (On Diff #556805) | Reverted all changes in this file. |
llvm/lib/ProfileData/InstrProfCorrelator.cpp | ||
423 | Removed that if. It's already checked inside IsDIEOfCovName. | |
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp | ||
769 | Just the names of unused functions. All instrumented functions names will still be stored at the same place as before. |
Got it! Sorry I got confused by the UI.
Sounds good to me! I'm excited to see clang instrumentation can take full advantage of debug info correlation!
Looks like this breaks check-clang on mac: http://45.33.8.238/macm1/69297/step_7.txt
Please take a look and revert for now if it takes a while to fix.
llvm/lib/ProfileData/InstrProfCorrelator.cpp | ||
---|---|---|
42 | I missed this earlier, but I think that spelling this out in full, that is Coverage Function Name, would be more consistent with the Coverage Type. |
llvm/lib/ProfileData/InstrProfCorrelator.cpp | ||
---|---|---|
42 |
For some unknown reasons, linker still generates __llvm_prf_names section in the final binary on mac even if the object file doesn't have this section. And I cannot use llvm-objdump to get anything from that section.
$ llvm-readelf -S ../tmp/a.out ... Section { Index: 15 Name: __llvm_prf_names (5F 5F 6C 6C 76 6D 5F 70 72 66 5F 6E 61 6D 65 73) Segment: __DATA (5F 5F 44 41 54 41 00 00 00 00 00 00 00 00 00 00) Address: 0x100018000 Size: 0x0 Offset: 98304 Alignment: 0 RelocationOffset: 0x0 RelocationCount: 0 Type: Regular (0x0) Attributes [ (0x0) ] Reserved1: 0x0 Reserved2: 0x0 Reserved3: 0x0 } ... $ llvm-objdump --section=__llvm_prf_names --full-contents ../tmp/a.out ../tmp/a.out: file format mach-o arm64
I will delete the test for now as it's not working on mac: compiler-rt/test/profile/Darwin/coverage-debug-info-correlate.cpp
I have seen that MachO sections remain in the header even if the size is zero which seems to be the case here. You could use llvm-objdump to verify that the section exists but the size is zero.
I feel like either this feature not working well with mac or we should keep __llvm_prf_names section in the binary.
Current idea is to completely get rid of __llvm_prf_names section in the binary. Even if I fixed the issue https://reviews.llvm.org/D157913#4648296, llvm-cov will need to read information from 3 different places on mac: the binary (for __llvm_cov_fun and __llvm_cov_map), debug info (unused function names, which could live in __llvm_prf_names) and indexed profile file for (counter, data, instrumented function names). If we keep __llvm_prf_names in the binary, then llvm-cov doesn't need to look into debug info again.
We've been investigating a serious performance regression in our latest Clang roll that is affecting coverage. Whereas previously, we could process all coverage data in under an hour, now the job times out after 6 hours. After investigation, I found out that this change is the culprit. Specifically, it seems like llvm-cov has gotten significantly slower.
Before this change:
$ time llvm-cov show -instr-profile merged.profdata -summary-only e91b5a1a324aa156 llvm-cov 0.92s user 0.12s system 99% cpu 1.038 total
After this change:
$ time llvm-cov show -instr-profile merged.profdata -summary-only e91b5a1a324aa156 llvm-cov 5.61s user 0.89s system 99% cpu 6.495 total
I did some investigation and it seems like the time increase is related to InstrProfSymtab that used in the CoverageReader. I saw that subsequent changes fix some issues trying to avoid unnecessary copies, but that still hasn't addressed the slowdown since the tip-of-tree is still exhibiting this issue.
It's also possible that there isn't any issue that was introduced here, the problem is that the implementation simply doesn't scale. For example, the binary e91b5a1a324aa156 I used here has 95527 symbol entries, most of them are the __covrec_* symbols, and that's one of the smaller binaries we have in our build.
Regardless, this is still a serious regression and I'm wondering if there's anything we can do short of reverting this change?
NIT