This is an archive of the discontinued LLVM Phabricator instance.

Differential D98061

[InstrProfiling] Generate runtime hook for Fuchsia
ClosedPublic

Authored by phosek on Mar 5 2021, 10:49 AM.

Details

Reviewers

vsk
davidxl

Commits

rG389dc94d4be7: [InstrProfiling] Generate runtime hook for Fuchsia
rG87fd09b25f88: [InstrProfiling] Generate runtime hook for ELF platforms

Summary

When none of the translation units in the binary have been instrumented,
we shouldn't need to link the profile runtime. However, because we pass
-u__llvm_profile_runtime on Linux and Fuchsia, the runtime would still
be pulled in and incur some overhead. On Fuchsia, which uses runtime
counter relocation, it also means that we cannot reference the bias
variable unconditionally.

This change modifies the InstrProfiling pass to pull in the profile
runtime only when needed, by declaring the __llvm_profile_runtime symbol
in a translation unit only when that translation unit needs it. For now
we restrict this to Fuchsia, but it can later be expanded to other
platforms. This approach was already used prior to 9a041a75221ca, when
we changed the pass to always generate __llvm_profile_runtime due to a
TAPI limitation. That limitation may no longer apply, and it certainly
doesn't apply on platforms like Fuchsia.
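
The linking behavior described in the summary can be illustrated with a toy model of archive member extraction. This is only a sketch under simplifying assumptions, not how any real linker is implemented: `ArchiveMember`, `extractMembers`, `toyProfileArchive`, the member names, and the symbol `toy_profile_init` are all hypothetical. Only `__llvm_profile_runtime` comes from the review itself.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model of one member (object file) of a static archive.
struct ArchiveMember {
  std::string Name;
  std::set<std::string> Defines;    // symbols this member defines
  std::set<std::string> References; // symbols this member needs from elsewhere
};

// Returns the members a linker would extract from Archive, starting from the
// given unresolved symbols (references from input objects plus any -u options).
// A member is extracted only if it defines a symbol that is still unresolved.
std::vector<std::string>
extractMembers(std::set<std::string> Unresolved,
               const std::vector<ArchiveMember> &Archive) {
  std::set<std::string> Resolved;
  std::set<std::string> Taken;
  std::vector<std::string> Extracted;
  bool Changed = true;
  while (Changed) { // repeat until no further member is pulled in
    Changed = false;
    for (const ArchiveMember &M : Archive) {
      if (Taken.count(M.Name))
        continue;
      bool Needed = false;
      for (const std::string &Sym : M.Defines)
        if (Unresolved.count(Sym))
          Needed = true;
      if (!Needed)
        continue;
      Taken.insert(M.Name);
      Extracted.push_back(M.Name);
      for (const std::string &Sym : M.Defines) {
        Resolved.insert(Sym);
        Unresolved.erase(Sym);
      }
      for (const std::string &Sym : M.References)
        if (!Resolved.count(Sym))
          Unresolved.insert(Sym);
      Changed = true;
    }
  }
  return Extracted;
}

// Hypothetical two-member stand-in for the profile runtime archive.
std::vector<ArchiveMember> toyProfileArchive() {
  return {
      {"runtime_hook.o", {"__llvm_profile_runtime"}, {"toy_profile_init"}},
      {"profile_init.o", {"toy_profile_init"}, {}},
  };
}
```

In this model, passing `-u__llvm_profile_runtime` corresponds to seeding the unresolved set with that symbol, which extracts both members even if no object references them. With an empty seed, nothing is extracted, which is the outcome the change aims for when no translation unit is instrumented.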

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

phosek created this revision. Mar 5 2021, 10:49 AM

Herald added a subscriber: hiraditya. Mar 5 2021, 10:49 AM

phosek requested review of this revision. Mar 5 2021, 10:49 AM

Herald added projects: Restricted Project, Restricted Project. Mar 5 2021, 10:49 AM

Herald added subscribers: llvm-commits, cfe-commits.

phosek added inline comments. Mar 5 2021, 10:50 AM

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
1138–1139	@vsk do you know why we need this function instead of just using `llvm.compiler.used`/`llvm.used` for the symbol? I used that approach for ELF and it seems to be working fine.

Harbormaster completed remote builds in B92355: Diff 328587. Mar 6 2021, 4:01 AM

@vsk do you have any thoughts on this?

@ributzka may have stronger thoughts about when -fprofile-instr-generate must imply that a known set of symbols appear with external visibility. Up until now, the answer has been "always", and this is what tapi enforces for MachO. It's awkward to have this be inconsistent between MachO/ELF, but if there's a compelling performance reason then I think it's fine.

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
1138–1139	I don't have the context for this, since this code is from before I started working on llvm. I'm guessing, but maybe it's possible that llvm(.compiler)?.used didn't exist or work well when this code was written.

In D98061#2615239, @vsk wrote:

> @ributzka may have stronger thoughts about when -fprofile-instr-generate must imply that a known set of symbols appear with external visibility. Up until now, the answer has been "always", and this is what tapi enforces for MachO. It's awkward to have this be inconsistent between MachO/ELF, but if there's a compelling performance reason then I think it's fine.

From the perspective of Fuchsia, we've seen a noticeable impact of this change when using -fprofile-instr-generate together with -fprofile-list to apply instrumentation selectively only to modified files.

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
1138–1139	Would it be OK with you if I sent out a separate change to remove this?

In D98061#2615250, @phosek wrote:

> In D98061#2615239, @vsk wrote:
>
>> @ributzka may have stronger thoughts about when -fprofile-instr-generate must imply that a known set of symbols appear with external visibility. Up until now, the answer has been "always", and this is what tapi enforces for MachO. It's awkward to have this be inconsistent between MachO/ELF, but if there's a compelling performance reason then I think it's fine.
>
> From the perspective of Fuchsia, we've seen a noticeable impact of this change when using -fprofile-instr-generate together with -fprofile-list to apply instrumentation selectively only to modified files.

What kind of impact do you see? If #counters > 0, is it mostly binary size cost? If #counters == 0, what's the avg. overhead of writing out the full profile? Can it be fixed by doing an early-exit in the runtime initializer, writing out an empty .profraw?

That raises a question about tooling support: some workflows (like the Xcode coverage plugin) might assume that a program compiled with -fprofile-instr-generate always creates a .profraw. If there's no profile written at all for the #counters == 0 case, that could be a breaking change.

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
1138–1139	Thanks, yes, that would be great.

In D98061#2615334, @vsk wrote:

> In D98061#2615250, @phosek wrote:
>
>> In D98061#2615239, @vsk wrote:
>>
>>> @ributzka may have stronger thoughts about when -fprofile-instr-generate must imply that a known set of symbols appear with external visibility. Up until now, the answer has been "always", and this is what tapi enforces for MachO. It's awkward to have this be inconsistent between MachO/ELF, but if there's a compelling performance reason then I think it's fine.
>>
>> From the perspective of Fuchsia, we've seen a noticeable impact of this change when using -fprofile-instr-generate together with -fprofile-list to apply instrumentation selectively only to modified files.
>
> What kind of impact do you see? If #counters > 0, is it mostly binary size cost? If #counters == 0, what's the avg. overhead of writing out the full profile?

It depends a bit on the runtime and the platform. In Fuchsia, where we always use the continuous mode, there's quite a bit of upfront cost (see https://github.com/llvm/llvm-project/blob/75f3f778052cdcd89e79f7a42a50915ee5ce2281/compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c#L109): we need to allocate a memory object, map it into the address space, publish it, and write some additional information into the log.

While it may not be so bad for a single binary, over hundreds of tests it adds up, and with this change we saw the total test execution time go down from 30 to 18 minutes. There are other steps we're taking, like eliminating the need for logging, but that's unlikely to eliminate all of the overhead.

> Can it be fixed by doing an early-exit in the runtime initializer, writing out an empty .profraw?

I considered that initially, but that's less efficient than the approach implemented here, especially when it comes to binary size.

> That raises a question about tooling support: some workflows (like the Xcode coverage plugin) might assume that a program compiled with -fprofile-instr-generate always creates a .profraw. If there's no profile written at all for the #counters == 0 case, that could be a breaking change.

That's a good point, would it be better to put this behind a (frontend or backend) flag?

In D98061#2615386, @phosek wrote:

> In D98061#2615334, @vsk wrote:
>
>> In D98061#2615250, @phosek wrote:
>>
>>> In D98061#2615239, @vsk wrote:
>>>
>>>> @ributzka may have stronger thoughts about when -fprofile-instr-generate must imply that a known set of symbols appear with external visibility. Up until now, the answer has been "always", and this is what tapi enforces for MachO. It's awkward to have this be inconsistent between MachO/ELF, but if there's a compelling performance reason then I think it's fine.
>>>
>>> From the perspective of Fuchsia, we've seen a noticeable impact of this change when using -fprofile-instr-generate together with -fprofile-list to apply instrumentation selectively only to modified files.
>>
>> What kind of impact do you see? If #counters > 0, is it mostly binary size cost? If #counters == 0, what's the avg. overhead of writing out the full profile?
>
> It depends a bit on the runtime and the platform. In Fuchsia, where we always use the continuous mode, there's quite a bit of upfront cost (see https://github.com/llvm/llvm-project/blob/75f3f778052cdcd89e79f7a42a50915ee5ce2281/compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c#L109): we need to allocate a memory object, map it into the address space, publish it, and write some additional information into the log.
>
> While it may not be so bad for a single binary, over hundreds of tests it adds up, and with this change we saw the total test execution time go down from 30 to 18 minutes. There are other steps we're taking, like eliminating the need for logging, but that's unlikely to eliminate all of the overhead.
>
>> Can it be fixed by doing an early-exit in the runtime initializer, writing out an empty .profraw?
>
> I considered that initially, but that's less efficient than the approach implemented here, especially when it comes to binary size.

I don't follow where the binary size cost comes from (is it the cost of writing out many empty .profraw headers?), but it sounds like the 30 -> 18min speed up is achieved by not registering essentially an empty profile with the VM, so it does follow that unconditionally writing out an empty .profraw won't work as well as disabling the runtime initializer entirely.

>> That raises a question about tooling support: some workflows (like the Xcode coverage plugin) might assume that a program compiled with -fprofile-instr-generate always creates a .profraw. If there's no profile written at all for the #counters == 0 case, that could be a breaking change.
>
> That's a good point, would it be better to put this behind a (frontend or backend) flag?

I don't think it should be an option because I have doubts about how discoverable it'd be. My preference would be to add a section to the clang coverage docs explaining the different guarantees for .profraw emission.

Is the case where there are no counters very rare? And in those cases, how much overhead can the runtime hook incur? I assume it is small compared with the actual instrumentation?

In D98061#2615575, @davidxl wrote:

> Is the case where there are no counters very rare? And in those cases, how much overhead can the runtime hook incur? I assume it is small compared with the actual instrumentation?

I'll try to explain our use case in more detail; hopefully that'll make the issue clearer.

We're collecting coverage during pre-submit testing and we're trying to reduce the overhead of instrumentation to provide fast turnaround time, so we're only instrumenting modified code. So if a patch modifies the header foo.h, we'd generate profile.list with the following content:

src:include/foo.h

and set the global flag -fprofile-list=profile.list (because we don't know all the places where foo.h is included). For binaries that contain TUs that include foo.h, those TUs get instrumented and the profile runtime is linked into the binary as expected.

There are also going to be binaries in our system build that don't use foo.h and ideally shouldn't have any overhead since nothing inside them is instrumented (they should behave as if -fprofile-instr-generate wasn't set for them at all). That's not the case today because if you set -fprofile-instr-generate, the driver passes -u__llvm_profile_runtime to the linker, which "pulls in" the profile runtime, introducing the extra bloat and startup overhead I described earlier.

This change is trying to address the issue by declaring __llvm_profile_runtime only when it's needed (that is, only when the TU contains some instrumented functions) rather than doing it unconditionally in the driver, where we don't yet know whether the runtime will be needed.
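
The per-TU decision described above can be sketched roughly as follows. This is a simplified illustration, not the code from the patch: `ToyTarget`, `ToyModule`, and `shouldDeclareRuntimeHook` are hypothetical stand-ins, and only the name `needsRuntimeHookUnconditionally` and its Fuchsia check mirror the helper shown in the diff.

```cpp
// Hypothetical stand-ins for the target triple and the module, reduced to
// the two facts this decision depends on.
struct ToyTarget {
  bool IsFuchsia;
};
struct ToyModule {
  unsigned NumInstrumentedFunctions;
};

// Mirrors the helper in the diff: on Fuchsia the hook is only needed if any
// counters are present; every other target keeps it unconditionally.
bool needsRuntimeHookUnconditionally(const ToyTarget &T) {
  return !T.IsFuchsia;
}

// Assumed usage: declare __llvm_profile_runtime in this TU either because
// the target always wants it, or because the TU has instrumented functions.
bool shouldDeclareRuntimeHook(const ToyTarget &T, const ToyModule &M) {
  if (needsRuntimeHookUnconditionally(T))
    return true;
  return M.NumInstrumentedFunctions > 0;
}
```

Under these assumptions, a Fuchsia TU with no instrumented functions declares no hook, so nothing in that TU forces the linker to pull in the profile runtime.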

Thanks for the background. This patch looks good at a high level. Vedant can help with a detailed review.

Thanks for the detailed explanation of the -fprofile-list workflow; given the different constraints, this patch lgtm. Please document the divergent behavior re: no .profraw file when #counters == 0 for non-MachO in the clang docs.

This revision is now accepted and ready to land. Mar 10 2021, 12:11 PM

phosek updated this revision to Diff 329875. Mar 11 2021, 1:26 AM

Harbormaster completed remote builds in B93243: Diff 329875. Mar 11 2021, 7:51 AM

Closed by commit rG87fd09b25f88: [InstrProfiling] Generate runtime hook for ELF platforms (authored by phosek). Mar 11 2021, 12:29 PM

This revision was automatically updated to reflect the committed changes.

phosek added a commit: rG87fd09b25f88: [InstrProfiling] Generate runtime hook for ELF platforms.

hans added a reverting change: rGf50aef745c3b: Revert "[InstrProfiling] Don't generate __llvm_profile_runtime_user". Mar 12 2021, 4:55 AM

phosek reopened this revision. Mar 12 2021, 2:34 PM

This revision is now accepted and ready to land. Mar 12 2021, 2:34 PM

phosek mentioned this in D98325: [InstrProfiling] Don't generate __llvm_profile_runtime_user. Mar 12 2021, 6:25 PM

I am a bit concerned that whether the file is generated or not is now dependent on the instrumentation and linker garbage collection.

> There are also going to be binaries in our system build that don't use foo.h and ideally shouldn't have any overhead since nothing inside them is instrumented (they should behave as if -fprofile-instr-generate wasn't set for them at all). That's not the case today because if you set -fprofile-instr-generate, the driver passes -u__llvm_profile_runtime to the linker, which "pulls in" the profile runtime, introducing the extra bloat and startup overhead I described earlier.

The overhead is just __llvm_profile_write_file, right? It just writes a 100+ byte file, which has very little overhead.

Some sanitizers can be used in a link-only manner without instrumentation, e.g. -fsanitize=leak does not need instrumentation. The source code just loses __has_feature(leak_sanitizer) detection.
Link-only -fsanitize=address can catch double free and mismatching new/delete.

Do we expect that libclang_rt.profile- can provide other features which may be useful even if there is nothing to instrument according to -fprofile-list?
If yes, making the library conditionally not linked can lose such features.

Another case is ld.lld --thinlto-index-only always creates *.imports and *.thinlto.bc files, to convey to the build system that the files are correctly generated.

phosek updated this revision to Diff 332739. Mar 23 2021, 11:23 AM

In D98061#2623959, @MaskRay wrote:

> I am a bit concerned that whether the file is generated or not is now dependent on the instrumentation and linker garbage collection.

That's a fair concern, do you know about use cases where this would cause issues?

>> There are also going to be binaries in our system build that don't use foo.h and ideally shouldn't have any overhead since nothing inside them is instrumented (they should behave as if -fprofile-instr-generate wasn't set for them at all). That's not the case today because if you set -fprofile-instr-generate, the driver passes -u__llvm_profile_runtime to the linker, which "pulls in" the profile runtime, introducing the extra bloat and startup overhead I described earlier.
>
> The overhead is just __llvm_profile_write_file, right? It just writes a 100+ byte file, which has very little overhead.

It could be more if you use the continuous mode, where you'd also need to mmap the profile and do some additional setup.

> Some sanitizers can be used in a link-only manner without instrumentation, e.g. -fsanitize=leak does not need instrumentation. The source code just loses __has_feature(leak_sanitizer) detection.
> Link-only -fsanitize=address can catch double free and mismatching new/delete.
>
> Do we expect that libclang_rt.profile- can provide other features which may be useful even if there is nothing to instrument according to -fprofile-list?

I'm not aware of any such features being planned right now.

> If yes, making the library conditionally not linked can lose such features.
>
> Another case is ld.lld --thinlto-index-only always creates *.imports and *.thinlto.bc files, to convey to the build system that the files are correctly generated.

Since each instrumented binary (that is, each executable and shared library) generates its own profile, how many profiles get generated really depends on how many instrumented shared libraries your binary depends on. Furthermore, if you use profile merging, you cannot even predict what the profile names are going to be, so I assumed that build systems/test runners would need some other mechanism to find out what profiles were generated; at least that's the case for us.

> This approach was already used prior to 9a041a75221ca, but we changed it to always generate the __llvm_profile_runtime due to a TAPI limitation.

D43794 (rG9a041a75221ca) does not seem to affect the ELF behavior.

> We can stop passing -u__llvm_profile_runtime to the linker on Linux and Fuchsia since the generated undefined symbol in each translation unit that needed it serves the same purpose.

This restores the behavior before @davidxl's rG170cd100ed6f38ec5826dbd1bd6930ddfd3490a4 (2015).
While having less code on the clang driver side is a plus to me, I don't know whether it should come at the cost of conditional presence/absence of the .profraw file.
(you can specify LLVM_PROFILE_FILE without %m; -fprofile-profile-generate by default does not use %m)

>>> There are also going to be binaries in our system build that don't use foo.h and ideally shouldn't have any overhead since nothing inside them is instrumented (they should behave as if -fprofile-instr-generate wasn't set for them at all). That's not the case today because if you set -fprofile-instr-generate, the driver passes -u__llvm_profile_runtime to the linker, which "pulls in" the profile runtime, introducing the extra bloat and startup overhead I described earlier.
>>
>> The overhead is just __llvm_profile_write_file, right? It just writes a 100+ byte file, which has very little overhead.
>
> It could be more if you use the continuous mode, where you'd also need to mmap the profile and do some additional setup.

Can the cost be avoided in the runtime?

llvm/test/Instrumentation/InstrProfiling/linkage.ll
13 ↗	(On Diff #332739)	If the symbol is now present, a CHECK line should be kept.

Harbormaster completed remote builds in B95310: Diff 332739. Mar 23 2021, 3:40 PM

I have reworked the change and restricted it only to Fuchsia for now which is where this change is still desirable.

Harbormaster completed remote builds in B118994: Diff 365635. Aug 10 2021, 5:50 PM

This revision was landed with ongoing or failed builds. Aug 10 2021, 11:21 PM

Closed by commit rG389dc94d4be7: [InstrProfiling] Generate runtime hook for Fuchsia (authored by phosek).

This revision was automatically updated to reflect the committed changes.

phosek mentioned this in rGc0c1c3cf93ec: Revert "[InstrProfiling] Emit bias variable eagerly".

phosek added a commit: rG389dc94d4be7: [InstrProfiling] Generate runtime hook for Fuchsia.

Hi Petr,

This looks to be the change that most likely broke a test on Windows Debug - would you mind taking a look? Here's the relevant test and stack trace:

FAIL: LLVM :: Instrumentation/InstrProfiling/linkage.ll (56524 of 77140)

TEST 'LLVM :: Instrumentation/InstrProfiling/linkage.ll' FAILED ****

Script:

: 'RUN: at line 3'; d:\a\_work\1\b\llvm\debug\bin\opt.exe < D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll -mtriple=x86_64-apple-macosx10.10.0 -instrprof -S | d:\a\_work\1\b\llvm\debug\bin\filecheck.exe D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll --check-prefixes=MACHO
: 'RUN: at line 4'; d:\a\_work\1\b\llvm\debug\bin\opt.exe < D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll -mtriple=x86_64-unknown-linux -instrprof -S | d:\a\_work\1\b\llvm\debug\bin\filecheck.exe D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll --check-prefixes=ELF
: 'RUN: at line 5'; d:\a\_work\1\b\llvm\debug\bin\opt.exe < D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll -mtriple=x86_64-unknown-fuchsia -instrprof -S | d:\a\_work\1\b\llvm\debug\bin\filecheck.exe D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll --check-prefixes=ELF
: 'RUN: at line 6'; d:\a\_work\1\b\llvm\debug\bin\opt.exe < D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll -mtriple=x86_64-pc-win32-coff -instrprof -S | d:\a\_work\1\b\llvm\debug\bin\filecheck.exe D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll --check-prefixes=COFF
: 'RUN: at line 7'; d:\a\_work\1\b\llvm\debug\bin\opt.exe < D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll -mtriple=x86_64-apple-macosx10.10.0 -passes=instrprof -S | d:\a\_work\1\b\llvm\debug\bin\filecheck.exe D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll --check-prefixes=MACHO
: 'RUN: at line 8'; d:\a\_work\1\b\llvm\debug\bin\opt.exe < D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll -mtriple=x86_64-unknown-linux -passes=instrprof -S | d:\a\_work\1\b\llvm\debug\bin\filecheck.exe D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll --check-prefixes=ELF
: 'RUN: at line 9'; d:\a\_work\1\b\llvm\debug\bin\opt.exe < D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll -mtriple=x86_64-unknown-fuchsia -passes=instrprof -S | d:\a\_work\1\b\llvm\debug\bin\filecheck.exe D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll --check-prefixes=ELF
: 'RUN: at line 10'; d:\a\_work\1\b\llvm\debug\bin\opt.exe < D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll -mtriple=x86_64-pc-win32-coff -passes=instrprof -S | d:\a\_work\1\b\llvm\debug\bin\filecheck.exe D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll --check-prefixes=COFF

Exit Code: 2

Command Output (stdout):

$ ":" "RUN: at line 3"
$ "d:\a\_work\1\b\llvm\debug\bin\opt.exe" "-mtriple=x86_64-apple-macosx10.10.0" "-instrprof" "-S"
$ "d:\a\_work\1\b\llvm\debug\bin\filecheck.exe" "D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll" "--check-prefixes=MACHO"
$ ":" "RUN: at line 4"
$ "d:\a\_work\1\b\llvm\debug\bin\opt.exe" "-mtriple=x86_64-unknown-linux" "-instrprof" "-S"
$ "d:\a\_work\1\b\llvm\debug\bin\filecheck.exe" "D:\a\_work\1\s\llvm-project\llvm\test\Instrumentation\InstrProfiling\linkage.ll" "--check-prefixes=ELF"
$ ":" "RUN: at line 5"
$ "d:\a\_work\1\b\llvm\debug\bin\opt.exe" "-mtriple=x86_64-unknown-fuchsia" "-instrprof" "-S"

command stderr:

PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.

Stack dump:

0. Program arguments: d:\\a\\_work\\1\\b\\llvm\\debug\\bin\\opt.exe -mtriple=x86_64-unknown-fuchsia -instrprof -S

#0 0x00007ff6c05a6844 failwithmessage d:\agent\_work\4\s\src\vctools\crt\vcstartup\src\rtc\error.cpp:213:0

#1 0x00007ff6c05a69e4 _RTC_UninitUse d:\agent\_work\4\s\src\vctools\crt\vcstartup\src\rtc\error.cpp:362:0

#2 0x00007ff6be864982 llvm::InstrProfiling::run(class llvm::Module &, class std::function<(class llvm::Function &)>) D:\a\_work\1\s\llvm-project\llvm\lib\Transforms\Instrumentation\InstrProfiling.cpp:590:0

#3 0x00007ff6be8644ad llvm::InstrProfiling::run(class llvm::Module &, class llvm::AnalysisManager<class llvm::Module> &) D:\a\_work\1\s\llvm-project\llvm\lib\Transforms\Instrumentation\InstrProfiling.cpp:417:0

#4 0x00007ff6bf82d229 llvm::detail::PassModel<class llvm::Module, class llvm::InstrProfiling, class llvm::PreservedAnalyses, class llvm::AnalysisManager<class llvm::Module>>::run(class llvm::Module &, class llvm::AnalysisManager<class llvm::Module> &) D:\a\_work\1\s\llvm-project\llvm\include\llvm\IR\PassManagerInternal.h:85:0

#5 0x00007ff6bc402b9e llvm::PassManager<class llvm::Module, class llvm::AnalysisManager<class llvm::Module>>::run(class llvm::Module &, class llvm::AnalysisManager<class llvm::Module> &) D:\a\_work\1\s\llvm-project\llvm\include\llvm\IR\PassManager.h:509:0

#6 0x00007ff6bc3bd844 llvm::runPassPipeline(class llvm::StringRef, class llvm::Module &, class llvm::TargetMachine *, class llvm::TargetLibraryInfoImpl *, class llvm::ToolOutputFile *, class llvm::ToolOutputFile *, class llvm::ToolOutputFile *, class llvm::StringRef, class llvm::ArrayRef<class llvm::StringRef>, enum llvm::opt_tool::OutputKind, enum llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) D:\a\_work\1\s\llvm-project\llvm\tools\opt\NewPMDriver.cpp:456:0

#7 0x00007ff6bc417060 main D:\a\_work\1\s\llvm-project\llvm\tools\opt\opt.cpp:830:0

#8 0x00007ff6c05a6039 invoke_main d:\agent\_work\4\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:79:0

kile added a subscriber: stella.stamenova. Aug 16 2021, 11:28 AM

Apologies, reporting on the incorrect revision :(

probinson mentioned this in D127506: [PS4/PS5][profiling] Go back to the old way of doing a runtime hook. Jun 10 2022, 9:18 AM

probinson mentioned this in rG39fb84343ec5: [PS4/PS5][profiling] Go back to the old way of doing a runtime hook. Jun 16 2022, 11:37 AM

probinson mentioned this in rG3f6030255d7a: Reland "[PS4/PS5][profiling] Go back to the old way of doing a runtime hook". Jun 16 2022, 11:54 AM

gulfem mentioned this in rGd6aed77f0d19: [InstrProfiling] No runtime hook for unused funcs. Sep 15 2022, 7:05 PM

Revision Contents

Path                                                      Size
clang/docs/UsersManual.rst                                8 lines
clang/lib/Driver/ToolChains/Fuchsia.h                     3 lines
clang/lib/Driver/ToolChains/Fuchsia.cpp                   10 lines
clang/test/Driver/fuchsia.c                               2 lines
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp    59 lines
llvm/test/Instrumentation/InstrProfiling/profiling.ll     3 lines

Diff 365666

clang/docs/UsersManual.rst

(2,339 unchanged lines hidden)

 In these cases, you can use the flag ``-fno-profile-instr-generate`` (or
 ``-fno-profile-generate``) to disable profile generation, and
 ``-fno-profile-instr-use`` (or ``-fno-profile-use``) to disable profile use.

 Note that these flags should appear after the corresponding profile
 flags to have an effect.

+.. note::
+
+  When none of the translation units inside a binary is instrumented, in the
+  case of Fuchsia the profile runtime will not be linked into the binary and
+  no profile will be produced, while on other platforms the profile runtime
+  will be linked and profile will be produced but there will not be any
+  counters.
+
 Instrumenting only selected files or functions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Sometimes it's useful to only instrument certain files or functions. For
 example in automated testing infrastructure, it may be desirable to only
 instrument files or functions that were modified by a patch to reduce the
 overhead of instrumenting a full system.

(1,604 unchanged lines hidden)

clang/lib/Driver/ToolChains/Fuchsia.h

(65 unchanged lines hidden; context: public:)
   }

   std::string ComputeEffectiveClangTriple(const llvm::opt::ArgList &Args,
                                           types::ID InputType) const override;

   SanitizerMask getSupportedSanitizers() const override;
   SanitizerMask getDefaultSanitizers() const override;

-  void addProfileRTLibs(const llvm::opt::ArgList &Args,
-                        llvm::opt::ArgStringList &CmdArgs) const override;
-
   RuntimeLibType
   GetRuntimeLibType(const llvm::opt::ArgList &Args) const override;
   CXXStdlibType
   GetCXXStdlibType(const llvm::opt::ArgList &Args) const override;

   void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
                              llvm::opt::ArgStringList &CC1Args,
                              Action::OffloadKind DeviceOffloadKind) const override;
(22 unchanged lines hidden)

clang/lib/Driver/ToolChains/Fuchsia.cpp

(431 unchanged lines hidden; context: case llvm::Triple::x86_64:)
     Res |= SanitizerKind::SafeStack;
     break;
   default:
     // TODO: Enable SafeStack on RISC-V once tested.
     break;
   }
   return Res;
 }
-
-void Fuchsia::addProfileRTLibs(const llvm::opt::ArgList &Args,
-                               llvm::opt::ArgStringList &CmdArgs) const {
-  // Add linker option -u__llvm_profile_runtime to cause runtime
-  // initialization module to be linked in.
-  if (needsProfileRT(Args))
-    CmdArgs.push_back(Args.MakeArgString(
-        Twine("-u", llvm::getInstrProfRuntimeHookVarName())));
-  ToolChain::addProfileRTLibs(Args, CmdArgs);
-}

clang/test/Driver/fuchsia.c

(243 unchanged lines hidden)
 // CHECK-SPLIT-DWARF: "-split-dwarf-output" "fuchsia.dwo"

 // RUN: %clang %s -### --target=aarch64-unknown-fuchsia \
 // RUN: -fprofile-instr-generate -fcoverage-mapping \
 // RUN: -resource-dir=%S/Inputs/resource_dir_with_per_target_subdir \
 // RUN: -fuse-ld=lld 2>&1 \
 // RUN: | FileCheck %s -check-prefix=CHECK-PROFRT-AARCH64
 // CHECK-PROFRT-AARCH64: "-resource-dir" "[[RESOURCE_DIR:[^"]+]]"
-// CHECK-PROFRT-AARCH64: "-u__llvm_profile_runtime"
 // CHECK-PROFRT-AARCH64: "[[RESOURCE_DIR]]{{/|\\\\}}lib{{/|\\\\}}aarch64-unknown-fuchsia{{/|\\\\}}libclang_rt.profile.a"

 // RUN: %clang %s -### --target=x86_64-unknown-fuchsia \
 // RUN: -fprofile-instr-generate -fcoverage-mapping \
 // RUN: -resource-dir=%S/Inputs/resource_dir_with_per_target_subdir \
 // RUN: -fuse-ld=lld 2>&1 \
 // RUN: | FileCheck %s -check-prefix=CHECK-PROFRT-X86_64
 // CHECK-PROFRT-X86_64: "-resource-dir" "[[RESOURCE_DIR:[^"]+]]"
-// CHECK-PROFRT-X86_64: "-u__llvm_profile_runtime"
 // CHECK-PROFRT-X86_64: "[[RESOURCE_DIR]]{{/|\\\\}}lib{{/|\\\\}}x86_64-unknown-fuchsia{{/|\\\\}}libclang_rt.profile.a"

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp

[... 514 lines not shown ...]	void InstrProfiling::promoteCounterLoadStores(Function *F) {
// Do a post-order traversal of the loops so that counter updates can be		// Do a post-order traversal of the loops so that counter updates can be
// iteratively hoisted outside the loop nest.		// iteratively hoisted outside the loop nest.
for (auto *Loop : llvm::reverse(Loops)) {		for (auto *Loop : llvm::reverse(Loops)) {
PGOCounterPromoter Promoter(LoopPromotionCandidates, *Loop, LI, BFI.get());		PGOCounterPromoter Promoter(LoopPromotionCandidates, *Loop, LI, BFI.get());
Promoter.run(&TotalCountersPromoted);		Promoter.run(&TotalCountersPromoted);
}		}
}		}

		static bool needsRuntimeHookUnconditionally(const Triple &TT) {
		// On Fuchsia, we only need runtime hook if any counters are present.
		if (TT.isOSFuchsia())
		return false;

		return true;
		}

/// Check if the module contains uses of any profiling intrinsics.		/// Check if the module contains uses of any profiling intrinsics.
static bool containsProfilingIntrinsics(Module &M) {		static bool containsProfilingIntrinsics(Module &M) {
if (auto *F = M.getFunction(		if (auto *F = M.getFunction(
Intrinsic::getName(llvm::Intrinsic::instrprof_increment)))		Intrinsic::getName(llvm::Intrinsic::instrprof_increment)))
if (!F->use_empty())		if (!F->use_empty())
return true;		return true;
if (auto *F = M.getFunction(		if (auto *F = M.getFunction(
Intrinsic::getName(llvm::Intrinsic::instrprof_increment_step)))		Intrinsic::getName(llvm::Intrinsic::instrprof_increment_step)))
[... 12 lines not shown ...]	bool InstrProfiling::run(
this->GetTLI = std::move(GetTLI);		this->GetTLI = std::move(GetTLI);
NamesVar = nullptr;		NamesVar = nullptr;
NamesSize = 0;		NamesSize = 0;
ProfileDataMap.clear();		ProfileDataMap.clear();
CompilerUsedVars.clear();		CompilerUsedVars.clear();
UsedVars.clear();		UsedVars.clear();
TT = Triple(M.getTargetTriple());		TT = Triple(M.getTargetTriple());

		bool MadeChange = false;

// Emit the runtime hook even if no counters are present.		// Emit the runtime hook even if no counters are present.
bool MadeChange = emitRuntimeHook();		if (needsRuntimeHookUnconditionally(TT))
		MadeChange = emitRuntimeHook();

// Improve compile time by avoiding linear scans when there is no work.		// Improve compile time by avoiding linear scans when there is no work.
GlobalVariable *CoverageNamesVar =		GlobalVariable *CoverageNamesVar =
M.getNamedGlobal(getCoverageUnusedNamesVarName());		M.getNamedGlobal(getCoverageUnusedNamesVarName());
if (!containsProfilingIntrinsics(M) && !CoverageNamesVar)		if (!containsProfilingIntrinsics(M) && !CoverageNamesVar)
return MadeChange;		return MadeChange;

// We did not know how many value sites there would be inside		// We did not know how many value sites there would be inside
[... 22 lines not shown ...]	if (CoverageNamesVar) {
MadeChange = true;		MadeChange = true;
}		}

if (!MadeChange)		if (!MadeChange)
return false;		return false;

emitVNodes();		emitVNodes();
emitNameData();		emitNameData();
		emitRuntimeHook();
emitRegistration();		emitRegistration();
emitUses();		emitUses();
emitInitialization();		emitInitialization();
return true;		return true;
}		}
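
The control flow this revision gives `InstrProfiling::run` can be sketched without LLVM as shown here. This is a simplified model under stated assumptions, not the pass itself: `TargetOS` and `reachesEmitRuntimeHook` are hypothetical names introduced for illustration, and `HasProfilingWork` stands in for "the module contains profiling intrinsics or a coverage names variable".

```cpp
#include <cassert>

// Hypothetical stand-in for the OS queries on llvm::Triple; not the LLVM API.
enum class TargetOS { Linux, Fuchsia, Darwin, Windows };

// Mirrors needsRuntimeHookUnconditionally() from the right-hand side of the
// diff: every platform except Fuchsia asks for the hook up front.
bool needsRuntimeHookUnconditionally(TargetOS OS) {
  return OS != TargetOS::Fuchsia;
}

// Simplified model of whether run() ever calls emitRuntimeHook().
// Assumption: HasProfilingWork means the module has something to lower.
bool reachesEmitRuntimeHook(TargetOS OS, bool HasProfilingWork) {
  if (needsRuntimeHookUnconditionally(OS))
    return true; // Called before the "no work" early return.
  // On Fuchsia the only call is the late one, next to emitNameData().
  return HasProfilingWork;
}

// Small self-check of the model.
void checkRunModel() {
  assert(!reachesEmitRuntimeHook(TargetOS::Fuchsia, false));
  assert(reachesEmitRuntimeHook(TargetOS::Fuchsia, true));
  assert(reachesEmitRuntimeHook(TargetOS::Darwin, false));
}
```

Reaching `emitRuntimeHook()` does not mean a symbol is emitted. The function itself still returns early on Linux and when the module already has the hook variable.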

static FunctionCallee getOrInsertValueProfilingCall(		static FunctionCallee getOrInsertValueProfilingCall(
Module &M, const TargetLibraryInfo &TLI,		Module &M, const TargetLibraryInfo &TLI,
[... 505 lines not shown ...]	if (NamesVar) {
IRB.CreateCall(NamesRegisterF, {IRB.CreateBitCast(NamesVar, VoidPtrTy),		IRB.CreateCall(NamesRegisterF, {IRB.CreateBitCast(NamesVar, VoidPtrTy),
IRB.getInt64(NamesSize)});		IRB.getInt64(NamesSize)});
}		}

IRB.CreateRetVoid();		IRB.CreateRetVoid();
}		}

bool InstrProfiling::emitRuntimeHook() {		bool InstrProfiling::emitRuntimeHook() {
// We expect the linker to be invoked with -u<hook_var> flag for Linux or		// We expect the linker to be invoked with -u<hook_var> flag for Linux
// Fuchsia, in which case there is no need to emit the user function.		// in which case there is no need to emit the external variable.
if (TT.isOSLinux() || TT.isOSFuchsia())		if (TT.isOSLinux())
return false;		return false;

// If the module's provided its own runtime, we don't need to do anything.		// If the module's provided its own runtime, we don't need to do anything.
if (M->getGlobalVariable(getInstrProfRuntimeHookVarName()))		if (M->getGlobalVariable(getInstrProfRuntimeHookVarName()))
return false;		return false;

// Declare an external variable that will pull in the runtime initialization.		// Declare an external variable that will pull in the runtime initialization.
auto *Int32Ty = Type::getInt32Ty(M->getContext());		auto *Int32Ty = Type::getInt32Ty(M->getContext());
auto *Var =		auto *Var =
new GlobalVariable(*M, Int32Ty, false, GlobalValue::ExternalLinkage,		new GlobalVariable(*M, Int32Ty, false, GlobalValue::ExternalLinkage,
nullptr, getInstrProfRuntimeHookVarName());		nullptr, getInstrProfRuntimeHookVarName());

		if (TT.isOSBinFormatELF()) {
		phosek (author): @vsk do you know why we need this function instead of just using `llvm.compiler.used`/`llvm.used` for the symbol? I used that approach for ELF and it seems to be working fine.
		vsk: I don't have the context for this, since this code is from before I started working on llvm. I'm guessing, but maybe it's possible that llvm(.compiler)?.used didn't exist or work well when this code was written.
		phosek (author): Would it be OK with you if I sent out a separate change to remove this?
		vsk: Thanks, yes, that would be great.
		// Mark the user variable as used so that it isn't stripped out.
		CompilerUsedVars.push_back(Var);
		} else {
// Make a function that uses it.		// Make a function that uses it.
auto *User = Function::Create(FunctionType::get(Int32Ty, false),		auto *User = Function::Create(FunctionType::get(Int32Ty, false),
GlobalValue::LinkOnceODRLinkage,		GlobalValue::LinkOnceODRLinkage,
getInstrProfRuntimeHookVarUseFuncName(), M);		getInstrProfRuntimeHookVarUseFuncName(), M);
User->addFnAttr(Attribute::NoInline);		User->addFnAttr(Attribute::NoInline);
if (Options.NoRedZone)		if (Options.NoRedZone)
User->addFnAttr(Attribute::NoRedZone);		User->addFnAttr(Attribute::NoRedZone);
User->setVisibility(GlobalValue::HiddenVisibility);		User->setVisibility(GlobalValue::HiddenVisibility);
if (TT.supportsCOMDAT())		if (TT.supportsCOMDAT())
User->setComdat(M->getOrInsertComdat(User->getName()));		User->setComdat(M->getOrInsertComdat(User->getName()));

IRBuilder<> IRB(BasicBlock::Create(M->getContext(), "", User));		IRBuilder<> IRB(BasicBlock::Create(M->getContext(), "", User));
auto *Load = IRB.CreateLoad(Int32Ty, Var);		auto *Load = IRB.CreateLoad(Int32Ty, Var);
IRB.CreateRet(Load);		IRB.CreateRet(Load);

// Mark the user variable as used so that it isn't stripped out.		// Mark the function as used so that it isn't stripped out.
CompilerUsedVars.push_back(User);		CompilerUsedVars.push_back(User);
		}
return true;		return true;
}		}

void InstrProfiling::emitUses() {		void InstrProfiling::emitUses() {
// The metadata sections are parallel arrays. Optimizers (e.g.		// The metadata sections are parallel arrays. Optimizers (e.g.
// GlobalOpt/ConstantMerge) may not discard associated sections as a unit, so		// GlobalOpt/ConstantMerge) may not discard associated sections as a unit, so
// we conservatively retain all unconditionally in the compiler.		// we conservatively retain all unconditionally in the compiler.
//		//
[... 45 lines not shown ...]

llvm/test/Instrumentation/InstrProfiling/profiling.ll

[... 48 lines not shown ...]	define void @baz() {
call void @llvm.instrprof.increment(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @__profn_baz, i32 0, i32 0), i64 0, i32 3, i32 0)		call void @llvm.instrprof.increment(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @__profn_baz, i32 0, i32 0), i64 0, i32 3, i32 0)
call void @llvm.instrprof.increment(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @__profn_baz, i32 0, i32 0), i64 0, i32 3, i32 1)		call void @llvm.instrprof.increment(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @__profn_baz, i32 0, i32 0), i64 0, i32 3, i32 1)
call void @llvm.instrprof.increment(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @__profn_baz, i32 0, i32 0), i64 0, i32 3, i32 2)		call void @llvm.instrprof.increment(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @__profn_baz, i32 0, i32 0), i64 0, i32 3, i32 2)
ret void		ret void
}		}

declare void @llvm.instrprof.increment(i8*, i64, i32, i32)		declare void @llvm.instrprof.increment(i8*, i64, i32, i32)

; ELF: @llvm.compiler.used = appending global {{.*}} @__llvm_profile_runtime_user {{.*}} @__profd_foo {{.*}} @__profd_bar {{.*}} @__profd_baz		; ELF: @llvm.compiler.used = appending global {{.*}} @__llvm_profile_runtime {{.*}} @__profd_foo {{.*}} @__profd_bar {{.*}} @__profd_baz
; MACHO: @llvm.used = appending global {{.*}} @__llvm_profile_runtime_user {{.*}} @__profd_foo {{.*}} @__profd_bar {{.*}} @__profd_baz		; MACHO: @llvm.used = appending global {{.*}} @__llvm_profile_runtime_user {{.*}} @__profd_foo {{.*}} @__profd_bar {{.*}} @__profd_baz
; WIN: @llvm.compiler.used = appending global {{.*}} @__llvm_profile_runtime_user {{.*}} @__profd_foo {{.*}} @__profd_bar {{.*}} @__profd_baz		; WIN: @llvm.compiler.used = appending global {{.*}} @__llvm_profile_runtime_user {{.*}} @__profd_foo {{.*}} @__profd_bar {{.*}} @__profd_baz

; ELF_GENERIC: define internal void @__llvm_profile_register_functions() unnamed_addr {		; ELF_GENERIC: define internal void @__llvm_profile_register_functions() unnamed_addr {
		; ELF_GENERIC-NEXT: call void @__llvm_profile_register_function(i8* bitcast (i32* @__llvm_profile_runtime to i8*))
; ELF_GENERIC-NEXT: call void @__llvm_profile_register_function(i8* bitcast ({ i64, i64, i64, i8, i8, i32, [2 x i16] }* @__profd_foo to i8*))		; ELF_GENERIC-NEXT: call void @__llvm_profile_register_function(i8* bitcast ({ i64, i64, i64, i8, i8, i32, [2 x i16] }* @__profd_foo to i8*))
; ELF_GENERIC-NEXT: call void @__llvm_profile_register_function(i8* bitcast ({ i64, i64, i64, i8, i8, i32, [2 x i16] }* @__profd_bar to i8*))		; ELF_GENERIC-NEXT: call void @__llvm_profile_register_function(i8* bitcast ({ i64, i64, i64, i8, i8, i32, [2 x i16] }* @__profd_bar to i8*))
; ELF_GENERIC-NEXT: call void @__llvm_profile_register_function(i8* bitcast ({ i64, i64, i64, i8, i8, i32, [2 x i16] }* @__profd_baz to i8*))		; ELF_GENERIC-NEXT: call void @__llvm_profile_register_function(i8* bitcast ({ i64, i64, i64, i8, i8, i32, [2 x i16] }* @__profd_baz to i8*))
; ELF_GENERIC-NEXT: call void @__llvm_profile_register_names_function(i8* getelementptr inbounds {{.*}} @__llvm_prf_nm		; ELF_GENERIC-NEXT: call void @__llvm_profile_register_names_function(i8* getelementptr inbounds {{.*}} @__llvm_prf_nm
; ELF_GENERIC-NEXT: ret void		; ELF_GENERIC-NEXT: ret void
; ELF_GENERIC-NEXT: }		; ELF_GENERIC-NEXT: }

; ELF_LINUX-NOT: @__llvm_profile_register_functions()		; ELF_LINUX-NOT: @__llvm_profile_register_functions()