Page MenuHomePhabricator

[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat
ClosedPublic

Authored by MaskRay on Sat, May 29, 2:32 PM.

Details

Summary

__profd_* variables are referenced by code only when value profiling is
enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just
waste space on ELF/Mach-O. We change the comdat symbol from __profd_* to
__profc_* because an internal symbol does not provide deduplication features
on COFF. The choice doesn't matter on ELF.

(In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no __profd_* symbols.)

On Windows this enables further optimization. We are no longer affected by the
link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can
cause duplicate definition error.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html
We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585).
This avoids many /INCLUDE: directives in .drectve.

Here is rnk's measurement for Chrome:

This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%:

#BEFORE

$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
1047758867

$ du -cksh base_unittests.exe
82M     base_unittests.exe
82M     total

# AFTER

$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
937886499

$ du -cksh base_unittests.exe
78M     base_unittests.exe
78M     total

The change is NFC for Mach-O.

Diff Detail

Event Timeline

MaskRay created this revision.Sat, May 29, 2:32 PM
MaskRay requested review of this revision.Sat, May 29, 2:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptSat, May 29, 2:32 PM
davidxl added inline comments.Sun, May 30, 3:33 PM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
854

Probably define a helper : profDataReferencedInCode(). Currently it is the same as enablesValueProfiling, but may change in the future. It also makes code more readable.

862

why changing behavior on ELF?

MaskRay marked an inline comment as done.Tue, Jun 1, 10:09 AM
MaskRay added inline comments.
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
862

COFF now uses __profc_* as the comdat symbol. __profd_* is always internal for non-value-profiling cases. So the group name must be __profc_*.

For ELF for implementation simplicity we use the same __profc_*.

MaskRay updated this revision to Diff 349012.Tue, Jun 1, 10:24 AM
MaskRay marked an inline comment as done.
MaskRay edited the summary of this revision. (Show Details)

define profDataReferencedbyCode()

Herald added a project: Restricted Project. · View Herald TranscriptTue, Jun 1, 10:24 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
MaskRay updated this revision to Diff 349013.Tue, Jun 1, 10:24 AM

remove accidental a.out

davidxl accepted this revision.Tue, Jun 1, 3:35 PM

lgtm, but wait for Reid and Vedant's comment.

This revision is now accepted and ready to land.Tue, Jun 1, 3:35 PM
vsk added a comment.Wed, Jun 2, 2:11 PM

While I haven't spotted any issues with this patch, I feel it'd be better for someone with more familiarity with COFF linking to offer a second +1.

rnk added a comment.Wed, Jun 2, 2:58 PM

Thanks! This is really cool. It will probably help reduce coverage object file size by a lot. Let me see if I can get some numbers for that.

BTW, if you are adventurous, Chromium does support cross-compiling from Linux to Windows, so you can do everything short of running tests from a Linux machine. It is then possible to execute those binaries remotely on Chrome swarming test infra, although that may require some ACLs and is more involved.

There is a possible behavior change here with linker GC. Previously, the /include: directives would make __profd_* into a GC root. Now, the __profd is kept alive by .text references to __profc symbols. If GC removes the text keeping some counters and data alive, the binary will no longer contain profile data for that function. I believe the coverage tools display "no info" different from "counters present and zero". I think this is OK in our usage, and if it isn't, there is an easy workaround: disable GC for coverage builds.

rnk accepted this revision.Thu, Jun 3, 11:05 AM

lgtm

This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%:

#BEFORE

$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
1047758867

$ du -cksh base_unittests.exe
82M     base_unittests.exe
82M     total


# AFTER

$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
937886499

$ du -cksh base_unittests.exe
78M     base_unittests.exe
78M     total

I should've precisely measured executable size in bytes, but the exe size reduction is roughly 5%. Linker GC was enabled, so I'm assuming that 5% is from dropping profile data for unreferenced counters.

Inspecting a random object file shows that we are still adding __covrec_* to llvm.used, and that accounts for lots of /include: directives.

Thanks for testing! I have figured out how to test check-profile on Windows. Will try to study how to cross compile Chromium on Linux. for Windows.

MaskRay updated this revision to Diff 349668.Thu, Jun 3, 1:11 PM
MaskRay edited the summary of this revision. (Show Details)

Replace one enablesValueProfiling with profDataReferencedByCode

MaskRay edited the summary of this revision. (Show Details)Thu, Jun 3, 1:12 PM
This revision was landed with ongoing or failed builds.Thu, Jun 3, 1:16 PM
This revision was automatically updated to reflect the committed changes.

This breaks instrprof-dynamic-one-shared.test instrprof-dynamic-two-shared.test in check-profile on macOS. I reverted this for now in e9a9c850989e2392b7f16dea2449312c15bd0240

https://bugs.chromium.org/p/chromium/issues/detail?id=1216425 has detailed output, but it repros locally without problems.

MaskRay reopened this revision.Fri, Jun 4, 10:34 AM
This revision is now accepted and ready to land.Fri, Jun 4, 10:34 AM
MaskRay updated this revision to Diff 349915.Fri, Jun 4, 10:35 AM

Re-upload after revert

Herald added a project: Restricted Project. · View Herald TranscriptFri, Jun 4, 10:35 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

@vsk Do you mind investigating macOS Posix/instrprof-dynamic-one-shared.test and Posix/instrprof-dynamic-two-shared.test failures on https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8845381350758503248/+/steps/package_clang/0/stdout ? :) I don't have a macOS machine.

I think the only behavior change for macOS is this block:

// Data variables can be private if not referenced by code.
if (!DataReferencedByCode) {
  Linkage = GlobalValue::PrivateLinkage;
  Visibility = GlobalValue::DefaultVisibility;
}

If not easy to fix, I'll just exclude isOSBinFormatMachO() for now...

rnk added a comment.Fri, Jun 4, 11:38 AM

I think MachO doesn't have comdats, so we need to leave the __profd_ linkage as it was. I think private linkage also prevents atomization, which inhibits GC / dead stripping. I don't think this change really makes sense for MachO, so you could reasonably reland with that change.

llvm/test/Instrumentation/InstrProfiling/linkage.ll
64

I think we don't want this change for macho. If we make __profd_ private on Mac, the linker won't be able to deduplicate these __profd globals, and there will be multiple copies pointing to the same counters.

MaskRay updated this revision to Diff 349953.Fri, Jun 4, 1:17 PM
MaskRay edited the summary of this revision. (Show Details)

Exclude Mach-O.

MaskRay updated this revision to Diff 349956.Fri, Jun 4, 1:27 PM

better description

This revision was landed with ongoing or failed builds.Fri, Jun 4, 1:28 PM
This revision was automatically updated to reflect the committed changes.
MaskRay added inline comments.Fri, Jun 4, 1:44 PM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
935

I am conservative here. For ELF __profd_* can be unconditionally private, even in DataReferencedByCode == true mode. This will decrease object file sizes for regular PGO.

But there may be too many changes now. I'll postpone doing this after this proves to be stable.

MaskRay added a comment.EditedFri, Jun 4, 11:59 PM

I measured a stage 2 -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_BUILD_TYPE=Release -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build on Linux x86-64.

% ls -1 /tmp/out/s3-custom/**/*.o(.) | wc -l
2174

% stat -c %s /tmp/out/s3-custom/**/*.o(.) | awk '{s+=$1}END{print s}'
6174683864
% stat -c %s /tmp/out/s2-custom/**/*.o(.) | awk '{s+=$1}END{print s}'                                                                                                                                                                                                                      
5992574864

% stat -c %s /tmp/out/s3-custom/bin/clang-13
683184016
% stat -c %s /tmp/out/s2-custom/bin/clang-13                         
617970256

For ELF the object file size decrease is 3%. The non-stripped clang size decrease is 10%.

D103717 can bring the benefit to regular PGO.