
Machine IR Profile
Needs ReviewPublic

Authored by ellis on Thu, Jun 10, 2:15 PM.

Details

Reviewers
alexshap
Summary

Machine IR Profile (MIP)

tl;dr

This is a proposal to introduce a new instrumentation pass that can produce optimization profiles with a focus on binary size and runtime performance of the instrumented binaries.

Our instrumented binaries record machine function call counts, machine function timestamps, machine basic block coverage, and a subset of the dynamic call graph. There is also a more lightweight mode that only collects machine function coverage data that has negligible runtime overhead and a binary size increase of 2-5% for instrumented binaries.

This is just the first patch of the WIP MIP project. The full branch can be found at https://github.com/ellishg/llvm-project

Motivation

In the mobile space, increasing binary size has an outsized impact on both runtime performance and download speed. Current instrumentation implementations such as XRay and GCov produce binaries that are too slow and too large to run on real mobile devices. We propose a new pass that injects instrumentation code at the machine IR level. At runtime, we write profile data to our custom __llvm_mipraw section, which is eventually dumped to a .mipraw file. At build time, we emit a .mipmap file which we use to map function information to data in the .mipraw file. The result is that no redundant function info is stored in the binary, which allows our instrumentation to have minimal size overhead.
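As a rough sketch of this split (hypothetical Python, not the actual MIP implementation; the names and values are invented), the shipped binary holds only compact per-function counters, while the name-to-offset metadata lives off-device in the .mipmap:

```python
# Off-device metadata extracted at build time: function name -> offset of
# that function's data in the raw section.
mipmap = {"_Z3fooi": 0, "_Z3bari": 1}

# On-device raw section dumped at runtime: one coverage byte per function,
# initialized to 0xFF and overwritten with 0x00 when the function runs.
mipraw = bytes([0xFF, 0x00])

# Post-processing joins the two files into a readable profile; no function
# names ever need to be stored in the instrumented binary itself.
profile = {name: mipraw[off] == 0x00 for name, off in mipmap.items()}
# _Z3fooi was never called, _Z3bari was.
```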

MIP has been implemented on ELF and Mach-O targets for x86_64, AArch64, and Armv7 with Thumb and Thumb2.

Performance

Our focus for now is on the performance and size of binaries that have injected instrumentation instead of binaries that have been optimized with instrumentation profiles. We collected some basic results from MultiSource/Benchmarks in llvm-test-suite for both MIP and clang’s instrumentation using the -fprofile-generate flag. It should be noted that this comparison is not fair because clang’s instrumentation collects much more data than just function coverage. However, we expect fully-featured MIP to have similar metrics.

Instrumented Binary Size

At the moment, we have implemented function coverage which injects one x86_64 instruction (7 bytes) and one byte of global data for each instrumented function, which should have minimal impact on binary size and runtime performance. In fact, our results show that we should expect MIP instrumented binaries to be only 2-5% larger. We contrast this with clang’s instrumentation, which can increase the binary size by 500-900%.

Instrumented Execution Time

We found that MIP instrumentation caused negligible execution time regressions. Again, we can (unfairly) contrast this with -fprofile-generate, which increased execution time by 1-40%.

Usage

We use the -fmachine-profile-generate clang flag to produce an instrumented binary and then use llvm-objcopy to extract the .mipmap file.

$ clang -g -fmachine-profile-generate main.cpp
$ llvm-objcopy --dump-section=__llvm_mipmap=default.mipmap a.out /dev/null
$ llvm-strip -g a.out -o a.out.stripped

This will produce the instrumented binary a.out and a map file default.mipmap.

When we run the binary, it will produce a default.mipraw file containing the profile data for that run.

$ ./a.out.stripped
$ ls
a.out    default.mipmap    default.mipraw    main.cpp

Then we use our custom tool to postprocess the raw profile and produce the final profile default.mip.

$ llvm-mipdata create -p default.mip default.mipmap
$ llvm-mipdata merge -p default.mip default.mipraw

If our binary has debug info, we can use it to report source information along with the profile data.

$ llvm-mipdata show -p default.mip --debug a.out
_Z3fooi
  Source Info: /home/main.cpp:9
  Call Count: 0
  Block Coverage:
     COLD COLD COLD COLD COLD

_Z3bari
  Source Info: /home/main.cpp:16
  Call Count: 1
  Block Coverage:
     HOT  HOT  COLD HOT  HOT

Finally, we can consume the profile using the clang flag -fmachine-profile-use= to produce a profile-optimized binary.

$ clang -fmachine-profile-use=default.mip main.cpp

Diff Detail

Event Timeline

ellis requested review of this revision.Thu, Jun 10, 2:15 PM
ellis created this object with visibility "ellis (Ellis Hoag)".
ellis created this object with edit policy "ellis (Ellis Hoag)".
Herald added a project: Restricted Project. Thu, Jun 10, 2:15 PM
ellis edited the summary of this revision. (Show Details)Thu, Jun 10, 2:17 PM
ellis edited the summary of this revision. (Show Details)
ellis edited the summary of this revision. (Show Details)Thu, Jun 10, 3:03 PM
ellis changed the visibility from "ellis (Ellis Hoag)" to "Public (No Login Required)".
ellis changed the edit policy from "ellis (Ellis Hoag)" to "All Users".
ellis added a comment.EditedThu, Jun 10, 3:11 PM

bcain added a subscriber: bcain.Thu, Jun 10, 3:28 PM
kyulee added a subscriber: kyulee.Thu, Jun 10, 3:30 PM
ellis added a subscriber: dberris.Thu, Jun 10, 3:38 PM

Very interesting work. A high level question, the CSFDO/CSPGO instruments the program after the inlining pass, so it can provide the same level of coverage for machine IR. What additional information do we expect from MIR profile that is not available in CSFDO?

ellis added a comment.Thu, Jun 10, 4:13 PM

Very interesting work. A high level question, the CSFDO/CSPGO instruments the program after the inlining pass, so it can provide the same level of coverage for machine IR. What additional information do we expect from MIR profile that is not available in CSFDO?

One of the main features of MIP is that the instrumented binaries do not contain any excess metadata so they are very small. In fact, all the metadata is extracted out into a .mipmap file. Later diffs (https://github.com/ellishg/llvm-project) add more profile data including function call counts, function call order, and samples of the return address register which we can use to generate a dynamic call graph that includes dynamic dispatch calls.

Can you compare the instrumentation overhead with -fcs-profile-generate? (as with -fprofile-generate). You may also want to disable value profiling (which can be expensive) in -fcs-profile-generate.

In the MIP description, it mentions that the counter size is 1 byte. This is good enough for coverage testing, but not enough to track hotness.

vsk added a subscriber: vsk.Thu, Jun 10, 4:42 PM

Thanks for sharing this! Specifically re: the -fprofile-generate v. MIP comparison for code coverage, perhaps it would be more fair to compare against -finstrument-function-entry-bare. The latter only adds one call per function, whereas -fprofile-generate can instrument a function in more than one place.

ellis added a comment.Thu, Jun 10, 4:44 PM

In the MIP description, it mentions that the counter size is 1 byte. This is good enough for coverage testing, but not enough to track hotness.

The counter is 1 byte when only function coverage is enabled (-fmachine-profile-function-coverage). There is another mode (-fmachine-profile-call-graph) which has two 4 byte counters, one for call counts, and the other for call order, which can be used for hotness.
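One plausible way to picture "call count plus call order" (a hypothetical sketch invented here, not code from the patch) is a per-function pair of counters where the order slot is stamped from a global ticket on first entry:

```python
import itertools

# Hypothetical model of the two 4-byte counters: [call_count, call_order].
# The order value is assigned once, on first entry, from a global ticket;
# the call count keeps incrementing and can be used for hotness.
next_order = itertools.count(1)
counters = {}

def on_function_entry(func):
    entry = counters.setdefault(func, [0, 0])
    if entry[0] == 0:            # first call: record call order
        entry[1] = next(next_order)
    entry[0] += 1

for f in ["main", "foo", "bar", "foo"]:
    on_function_entry(f)
# counters: main=[1, 1], foo=[2, 2], bar=[1, 3]
```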

lanza added a subscriber: lanza.Thu, Jun 10, 5:38 PM
ellis edited the summary of this revision. (Show Details)Thu, Jun 10, 6:25 PM
ellis updated this revision to Diff 351323.Thu, Jun 10, 6:27 PM
ellis edited the summary of this revision. (Show Details)

Fix small alignment bug in codegen

ellis added a comment.Thu, Jun 10, 7:05 PM

Can you compare the instrumentation overhead with -fcs-profile-generate? (as with -fprofile-generate). You may also want to disable value profiling (which can be expensive) in -fcs-profile-generate.

Thanks for sharing this! Specifically re: the -fprofile-generate v. MIP comparison for code coverage, perhaps it would be more fair to compare against -finstrument-function-entry-bare. The latter only adds one call per function, whereas -fprofile-generate can instrument a function in more than one place.

@davidxl @vsk

Yes, I am planning to run some more simple benchmarks from llvm-test-suite for code size and performance of the instrumented binaries. I can compare MIP vs -finstrument-function-entry-bare, but are there any other flags I should compare against?

You may compare the overhead with -fsanitize-coverage=func,inline-8bit-counters,pc-table. The latter only instruments the entry block of a function and uses a one-byte counter.

ellis added a comment.Fri, Jun 11, 3:46 PM

I've collected some size and runtime metrics from llvm-test-suite. The steps and results can be found in this gist: https://gist.github.com/ellishg/92a68cf82bfdeccd10225154425edc69

Keep in mind that a major goal is that MIP instrumented binaries should be as small as possible without sacrificing usability or functionality. The metrics that I measured in the gist are the binary size and the execution time of instrumented binaries. Like I said in the description, MIP instrumented binaries are usually 2-5% larger than non-instrumented binaries. We can compare this to the -fsanitize-coverage=func,inline-bool-flag,pc-table flag which seems to produce instrumented binaries that can be 100x larger. The extra size likely comes from the function metadata that is left in the binary, whereas MIP extracts all excess metadata out of the binary.

I also looked into comparing against -finstrument-function-entry-bare and -finstrument-functions. It seems that they inject calls to void __cyg_profile_func_enter_bare(); or void __cyg_profile_func_enter (void *this_fn, void *call_site); at the start of each function. The first flag doesn't really provide enough context to know the caller and mark it as covered. The second does provide this information, but it would still require some work to collect profile data in a consumable way as MIP does. That being said, these flags seem to provide similar size and runtime overhead as MIP, albeit with much less functionality.

It should also be noted that all this is only considering MIP function coverage. MIP can profile much more data including call counts and the dynamic call graph, likely with very similar performance metrics.

Do you have runtime numbers with -fcs-profile-generate -mllvm -disable-vp=true?

This will provide more data (BB count) than what MIP can do.

Besides, more options can be added to control the type of profile data collected in IRPGO to further reduce overhead.

David

ellis added a comment.Fri, Jun 11, 4:40 PM

Do you have runtime numbers with -fcs-profile-generate -mllvm -disable-vp=true?

This will provide more data (BB count) than what MIP can do.

Besides, more options can be added to control the type of profile data collected in IRPGO to further reduce overhead.

David

I've added those numbers here https://gist.github.com/ellishg/92a68cf82bfdeccd10225154425edc69#gistcomment-3777527

The flags -fcs-profile-generate -mllvm -disable-vp=true appear to increase binary size by 4-9x, and execution time is increased in many cases.

ellis updated this revision to Diff 351601.Fri, Jun 11, 5:28 PM

Fix clang-tidy errors

ellis added a comment.Fri, Jun 11, 5:52 PM

@davidxl

Can you use the same set of benchmarks for comparison?

I'm not sure what you mean. In https://gist.github.com/ellishg/92a68cf82bfdeccd10225154425edc69 I used llvm-test-suite/MultiSource/Benchmarks/ for all tests.

  1. is MIP used in the test only for function coverage or have call counts collected as well?

Just function coverage for simplicity, but full MIP does not add very much to the size.

  2. for -fcs-profile-generate, the size increase may mostly come from the name section. Can you compare .text + .data + .bss?

Sure, I can do this next week.

ellis updated this revision to Diff 351625.Fri, Jun 11, 8:12 PM

MIP instrumentation pass should happen before branch relaxation

ellis added a comment.Sat, Jun 12, 4:55 PM

@davidxl

I've added text size data to https://gist.github.com/ellishg/92a68cf82bfdeccd10225154425edc69#gistcomment-3778109 and hopefully it is more clear. I randomly chose the test MultiSource/Benchmarks/FreeBench/fourinarow.test to show the size data. You can see the raw data in https://gist.github.com/ellishg/156639f24d728a88067f903cb53e1643

Base

"size..text": 4629
"size..data": 8,
"size..bss": 120,

MIP

"size..text": 4741,
"size..data": 8,
"size.__llvm_mipmap": 768,
"size.__llvm_mipraw": 41,
"size..bss": 120,

-fcs-profile-generate

"size..text": 24322,
"size..data": 176,
"size.__llvm_prf_cnts": 1336,
"size.__llvm_prf_data": 816,
"size.__llvm_prf_names": 117,
"size.__llvm_prf_vnds": 24576,
"size..bss": 8952,
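A quick sanity check on these numbers (Python arithmetic over the section sizes quoted above; __llvm_mipmap is excluded from the MIP total since it is extracted before deployment, and __llvm_prf_vnds is excluded from the -fcs total since it is only emitted due to a compiler bug):

```python
# Section sizes (bytes) from the fourinarow test quoted above.
base = 4629 + 8 + 120                       # .text + .data + .bss
mip = 4741 + 8 + 41 + 120                   # __llvm_mipmap extracted; mipraw stays
cs = 24322 + 176 + 1336 + 816 + 117 + 8952  # __llvm_prf_vnds excluded (bug)

mip_overhead = mip / base - 1               # about 3.2%, inside the stated 2-5%
cs_ratio = cs / base                        # about 7.5x, inside the stated 4-9x
```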

Thanks for the data. With value profile disabled, __llvm_prf_vnds section
should not be emitted -- there is a bug in the compiler.

Thinking about MIP's use case a little, it seems that it actually matches what XRay does. XRay has very low runtime overhead and can be turned on always: https://llvm.org/docs/XRay.html. Have you compared with XRay and considered using that?

David

ellis added a comment.


Repository:

rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION

https://reviews.llvm.org/D104060/new/

https://reviews.llvm.org/D104060

Thinking about MIP's use case a little, it seems that it actually matches what XRay does. XRay has very low runtime overhead and can be turned on always: https://llvm.org/docs/XRay.html. Have you compared with XRay and considered using that?

In the space we care about, any increase in binary size regresses runtime performance. Yes there are other instrumentations that provide similar features with low runtime overhead, but MIP seems to be the only one that extracts out all metadata to minimize binary size overhead.

I've collected some section size data from the fourinarow test and have more size comparisons in https://gist.github.com/ellishg/92a68cf82bfdeccd10225154425edc69#gistcomment-3778753
Please let me know if there are more flags I can use to turn off XRay features so that this is a more fair comparison.

XRay

-fxray-instrument -fxray-instrumentation-bundle=function-entry

size..text: 154802
size..data: 10468
size.xray_instr_map: 512
size..bss: 594144

Thinking about MIP's use case a little, it seems that it actually matches what XRay does. XRay has very low runtime overhead and can be turned on always: https://llvm.org/docs/XRay.html. Have you compared with XRay and considered using that?

Perhaps xray could be improved to reduce code size overhead to have feature parity with MIP?

Thanks for sharing this! Specifically re: the -fprofile-generate v. MIP comparison for code coverage, perhaps it would be more fair to compare against -finstrument-function-entry-bare. The latter only adds one call per function, whereas -fprofile-generate can add instrument a function in more than one place.

+1

Some historical context, I wrote an instrumentation little while back https://reviews.llvm.org/D74362 which adds similar instrumentation for function entry but at the IR level.

David

ellis added a comment.


Yes there are other instrumentations that provide similar features with low runtime overhead, but MIP seems to be the only one that extracts out all metadata to minimize binary size overhead.

I echo this. There are instrumentations with similar features (e.g., code coverage), but they are required to keep metadata in the final binary. The other lightweight instrumentations (without metadata) are far from complete enough to match the features that MIP provides.
The key differentiation of MIP is that the metadata (mipmap) is expressed with relative relocations, which makes it extractable on any build target and platform; it took us a non-trivial amount of time to get this right.
In theory, we could improve other existing instrumentations to extract metadata like MIP does, but this may require a significant amount of changes to the architecture and format.

MIP has achieved great size reduction for the instrumented binary. My understanding is that the savings are mainly from the following:

  1. Smaller counter size (1 byte or 4 bytes instead of 8 bytes for IR PGO)
  2. Extractable per-function metadata (mipmap). Using this technique may increase object file size (due to extra relocations), but will reduce executable size.

I don't understand (in block coverage mode) how the .text size can be reduced. I have not looked at the patch in detail, but the test case shows that the counter update is simply a 'movb 0, $counter' even in non-coverage mode, is that expected?

I think 2) is something worth introducing to IRPGO under a flag.

David

MaskRay added inline comments.Sun, Jun 13, 3:00 PM
llvm/lib/CodeGen/MIPSectionEmitter.cpp
61

I want to try out the patch but I have noticed some layering violation.

If you do a -DBUILD_SHARED_LIBS=on build, you'll get errors like

ld.lld: error: undefined symbol: llvm::AsmPrinter::createTempSymbol(llvm::Twine const&) const
>>> referenced by MIPSectionEmitter.cpp:60 (/home/maskray/llvm/llvm/lib/CodeGen/MIPSectionEmitter.cpp:60)
>>>               lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MIPSectionEmitter.cpp.o:(llvm::MIPSectionEmitter::runOnMachineFunctionStart(llvm::MachineFunction&))
>>> referenced by MIPSectionEmitter.cpp:128 (/home/maskray/llvm/llvm/lib/CodeGen/MIPSectionEmitter.cpp:128)
>>>               lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MIPSectionEmitter.cpp.o:(llvm::MIPSectionEmitter::emitMIPHeader(llvm::MachineProfile::MIPFileType))
>>> referenced by MIPSectionEmitter.cpp:237 (/home/maskray/llvm/llvm/lib/CodeGen/MIPSectionEmitter.cpp:237)
>>>               lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MIPSectionEmitter.cpp.o:(llvm::MIPSectionEmitter::emitMIPFunctionInfo(llvm::MIPSectionEmitter::MFInfo&))

because of -Wl,-z,defs

MIP has achieved great size reduction for the instrumented binary. My understanding is that the savings are mainly from the following:

  1. Smaller counter size (1 byte or 4 bytes instead of 8 bytes for IR PGO)
  2. Extractable per-function metadata (mipmap). Using this technique may increase object file size (due to extra relocations), but will reduce executable size.

That is correct. For function coverage MIP injects only one movb instruction on x86, so the total overhead for every function is 1 byte of global data + 7 bytes of text.

I don't understand (in block coverage mode) how the .text size can be reduced. I have not looked at the patch in detail, but the test case shows that the counter update is simply a 'movb 0, $counter' even in non-coverage mode, is that expected?

Let me try to clear something up. The size tests I ran in https://gist.github.com/ellishg/92a68cf82bfdeccd10225154425edc69 were for function coverage only and block coverage disabled. For block coverage, we inject a single movb instruction and another global byte for each basic block so the size overhead is very similar to function coverage. Can you point me to the test case you are referring to?

I think 2) is something worth introducing to IRPGO under a flag.

David

ellis updated this revision to Diff 352006.Mon, Jun 14, 3:17 PM

Fix build failures when using -DBUILD_SHARED_LIBS=on. Thanks @MaskRay

MaskRay added a comment.EditedMon, Jun 14, 4:33 PM

I have played with the patch.

-fmachine-profile-generate only inserts a __llvm_mip_call_counts_caller function call. There is no basic block instrumentation, so this is just the function entry count coverage mode.

-fmachine-profile-generate -fmachine-profile-function-coverage changes the __llvm_mip_call_counts_caller call to movb $0, counter(%rip).
So this matches the traditional binary coverage mode.
This mode is supported by clang -fsanitize-coverage=func,inline-bool-flag,pc-table.
inline-bool-flag uses a conditional set because IIUC under concurrency this is faster than a racy write.
If needed -fsanitize-coverage=inline-bool-flag can introduce a mode to use a racy write.

-fmachine-profile-generate -fmachine-profile-block-coverage inserts movb $0, counter(%rip) for machine basic blocks.
This is a vertex profile (less powerful than an edge profile).
This mode is supported by clang -fsanitize-coverage=edge,inline-bool-flag,pc-table
Mapping the information to source files will require debug info (-g1).

Traditional gcc/clang coverage features (-fprofile-arcs/-fprofile-instr-generate/-fprofile-generate) are all about edge profiles and use word-size counters.
If the size is a concern, it is probably reasonable to use 32-bit counters, but smaller counters may not be suitable for PGO.


__llvm_mip_call_counts_caller is slow.
It is a function with a custom call convention using RAX as the argument on x86-64.
The impl detail function saves and restores many vector registers.
I haven't studied why __llvm_mip_call_counts_caller is needed.


__llvm_prf_data (-fprofile-generate, -fprofile-instr-generate) vs __llvm_mipmap (-fmachine-profile-generate)

In the absence of value profiling, __llvm_prf_data uses:

.quad NameRef
.quad FuncHash
.quad .L__profc_fun
.quad fun   # need a dynamic relocation; used by raw profile reader
.quad 0     # value profiling
.long NumCounters
.long 0     # value profiling, 2 unused value sites

If we want to save size for small linked images, we can change some .quad to .long.
e.g. if the number of functions is smaller than 2**16 (or slightly larger), we can use a 32-bit hash.
.L__profc_fun can use .long if the size cannot overflow 32-bit.

Note that 2 fields are only used by value profiling.

__llvm_mipmap has these fields. I added an inline comment that -shared doesn't work.

        .section        __llvm_mipmap,"aw",@progbits
        .globl  _Z3fooPiS_$MAP
        .p2align        3
_Z3fooPiS_$MAP:
.Lref2:
  ### not sure why this is needed
        .long   __start___llvm_mipraw-.Lref2    # Raw Section Start PC Offset

  ##### this does not link in -fpic -shared mode
        .long   _Z3fooPiS_$RAW-.Lref2           # Raw Profile Symbol PC Offset

        .long   _Z3fooPiS_-.Lref2               # Function PC Offset
        .long   .Lmip_func_end0-_Z3fooPiS_      # Function Size
        .long   0x0                             # CFG Signature
        .long   0                               # Non-entry Block Count
        .long   10                              # Function Name Length
        .ascii  "_Z3fooPiS_"

So the existing instrumentations are either in Clang or LLVM IR. Sort them by their position in the compiler pipeline:

  • -fprofile-instr-generate: clang CodeGen. Very few optimizations can be applied.
  • -fprofile-arcs: very early in the optimization pipeline. The coverage information is from debug info, so instrumenting early keeps the debug info faithful.
  • -fcs-profile-generate: early in the optimization pipeline, before inlining
  • -fprofile-generate: somewhat early in the optimization pipeline, after inlining

My understanding about -fprofile-instr-generate vs -fprofile-generate.

In clang you have full information about line/column/region information.
So -fprofile-instr-generate works well for coverage.

At IR level, we can use debug info to convey some source-level information to IR but some information is lost (e.g. column information), so clang -fprofile-arcs is less accurate.

However, the frontend is not at a good position applying various optimizations.

For instance, the important Kirchhoff's circuit law (aka spanning tree) optimization is not implemented in -fprofile-instr-generate. (I added the optimization to clang -fprofile-arcs).
So in bad cases (e.g. libvpx) -fprofile-instr-generate can be 15% slower than -fprofile-arcs/-fprofile-generate.
(
As for why it is so efficient: normally the edge count |E| is only slightly larger than the vertex count |V|. There is a large difference between |E|-|V|+1 and |E|.

For instance, a no-branch function just needs one counter.
)
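The spanning-tree point can be made concrete with a toy diamond CFG (a hypothetical sketch, not LLVM code): instrument only the edges outside a spanning tree, then recover the remaining edge counts by flow conservation:

```python
# Diamond CFG: entry branches to a or b, both fall through to exit.
V = ["entry", "a", "b", "exit"]
E = [("entry", "a"), ("entry", "b"), ("a", "exit"), ("b", "exit")]

# Minimal counters for a connected CFG: |E| - |V| + 1 (here 1, not 4).
n_counters = len(E) - len(V) + 1

# Instrument just the non-tree edge entry->a; with the function entry
# count known, flow conservation recovers every other edge count.
entry_count = 10
count = {("entry", "a"): 7}
count[("entry", "b")] = entry_count - count[("entry", "a")]
count[("a", "exit")] = count[("entry", "a")]   # 'a' has one in-edge, one out-edge
count[("b", "exit")] = count[("entry", "b")]
```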

The loop optimization (instead of adding a counter N times, add N to it) cannot be enabled.
The benefit is relatively small, though.

The frontend cannot apply inlining or some early optimizations to greatly decrease the number of counters.

Now this patch series adds machine basic blocks instrumentation.
I wonder what it can do while the regular IR instrumentation cannot.

Machine basic block instrumentation has some awkward points.
Semantic information is lost. The loop optimization definitely cannot be applied.
If an IR basic block maps to multiple machine basic blocks, you need multiple increments for each MBB while with IR BB you may just need one (e.g. dominator).
Edge profiling is tricky. Edge profiling requires splitting critical edges - it is not clear how you can do this after the machine basic block layout is finalized.
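For reference, a critical edge runs from a block with several successors to a block with several predecessors; counting it exactly requires inserting a new block on the edge, which is what becomes hard after block layout is final. A small sketch (hypothetical CFG, not MIP code) that finds such edges:

```python
# Successor lists for a small CFG; C->D is the only critical edge because
# C has two successors and D has two predecessors.
succs = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}

# Invert the successor map to get predecessor lists.
preds = {}
for src, outs in succs.items():
    for dst in outs:
        preds.setdefault(dst, []).append(src)

critical = [(s, d)
            for s, outs in succs.items()
            for d in outs
            if len(outs) > 1 and len(preds[d]) > 1]
```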

Good summary.

For instance, the important Kirchhoff's circuit law (aka spanning tree) optimization is not implemented. (I added the optimization to clang -fprofile-generate).

You probably meant -fprofile-instr-generate :)

So in bad cases (e.g. libvpx) -fprofile-instr-generate can be 15% slower
than -fprofile-arcs/-fprofile-generate.

The loop optimization (instead of adding a counter N times, add N to it)
cannot be enabled.
The benefit is relatively small, though.

The frontend cannot apply inlining or some early optimizations to greatly
decrease the number of counters.

Instrumenting machine basic blocks feels awkward to me.
Now, much semantic information is lost. The loop optimization definitely cannot be applied.
Edge profiling is tricky. Edge profiling requires splitting critical edges - it is not clear how you can do this after the machine basic block layout is finalized.

David


Good summary.

For instance, the important Kirchhoff's circuit law (aka spanning tree) optimization is not implemented. (I added the optimization to clang -fprofile-generate).

You probably meant -fprofile-instr-generate :)

Thanks for noticing the typo:) Corrected. @xur implemented the optimization for -fprofile-generate (D12781). I ported it to -fprofile-arcs.
-fprofile-instr-generate doesn't have the optimization.

ellis added a comment.EditedWed, Jun 16, 4:16 PM

I've created a gist comment to help explain some of the unique implementation details and discuss reasons for adding a new framework instead of extending some existing PGI framework.

https://gist.github.com/ellishg/46f04bc7761e64841653f0bcad64a1aa
(The gist has been moved to a comment so people can easily add comments)

We can also take discussions offline if you have more questions. You can email me (ellis.sparky.hoag@gmail.com) and @kyulee (kyulee@fb.com) to setup a VC.

CC @MaskRay @davidxl

ellis added a comment.Wed, Jun 16, 5:09 PM

Section Layout

MIP's major feature is the ability to extract all metadata from the instrumented binary to reduce its size. This is possible by using two sections: __llvm_mipraw to store profile data and __llvm_mipmap to store function metadata. The map section is extracted from the binary before runtime, and the raw section is dumped at runtime. The two sections are then combined into a final .mip profile. Each section has a 24-byte header and a block of data for each instrumented function.

__llvm_mipraw

The contents of this section depend on the type of instrumentation used, but the data is always "unreadable" without the map section. For function coverage, we simply allocate a single byte for each instrumented function and initialize it to all ones.

_Z3foov$RAW:
  .byte  0xff
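A minimal sketch of how the extracted raw bytes could be interpreted. The convention assumed here — that the instrumented entry point clears its byte, so a remaining 0xff means "never executed" — follows naturally from the all-ones initialization above but is an assumption, not something stated in the patch:

```python
# Hypothetical interpretation of the function-coverage raw section:
# one byte per instrumented function, initialized to 0xff.
# Assumption: the instrumented entry clears its byte on execution.

def covered_functions(raw_bytes: bytes) -> list:
    """Return indices of functions whose coverage byte was touched."""
    return [i for i, b in enumerate(raw_bytes) if b != 0xFF]

# Three instrumented functions; only the middle one ran:
print(covered_functions(bytes([0xFF, 0x00, 0xFF])))  # → [1]
```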

__llvm_mipmap

The map section allows us to map instrumented functions to their profile data in the raw section. The two most important values are the function name and the offset to the profile data in the raw section. With these values we can read a function's profile data using the offset.

_Z3foov$MAP:
.Lref:
  .long  __start___llvm_mipraw-.Lref     # Raw Section Start PC Offset
  .long  _Z3foov$RAW-.Lref               # Raw Profile Symbol PC Offset
  .long  _Z3foov-.Lref                   # Function PC Offset
  .long  [[FOO_END]]-_Z3foov             # Function Size
  .long  0x70c9fa27                      # CFG Signature
  .long  2                               # Non-entry Block Count
  .long  [[FOO_BLOCK0]]-_Z3foov          # Block 0 Offset
  .long  [[FOO_BLOCK1]]-_Z3foov          # Block 1 Offset
  .long  7                               # Function Name Length
  .ascii  "_Z3foov"

The main challenge is storing the offset to the profile data without using dynamic relocations. This is complicated by the fact that we use comdat sections within the __llvm_mipraw section and that ELF does not seem to directly support section relative addresses. The solution is to use PC relative relocations. __start___llvm_mipraw-.Lref gives us the PC relative offset to the start of the raw section and _Z3foov$RAW-.Lref gives us the offset to the profile data for this function relative to the same PC. After we extract the map section, we can subtract these to get the value we want, the section relative raw profile data offset.

(_Z3foov$RAW-.Lref) - (__start___llvm_mipraw-.Lref) = _Z3foov$RAW - __start___llvm_mipraw
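The cancellation above can be sketched in a few lines (the addresses below are hypothetical; the point is only that the shared .Lref term drops out, so no dynamic relocation is needed):

```python
# Both map fields are PC-relative to the same .Lref, so subtracting
# them cancels .Lref and leaves a purely section-relative offset.

def section_relative_offset(raw_symbol_minus_ref: int,
                            raw_start_minus_ref: int) -> int:
    # (_Z3foov$RAW - .Lref) - (__start___llvm_mipraw - .Lref)
    #     = _Z3foov$RAW - __start___llvm_mipraw
    return raw_symbol_minus_ref - raw_start_minus_ref

# Hypothetical layout: raw section starts at 0x1000, this function's
# profile byte sits at 0x1003, and .Lref is at 0x2000.
assert section_relative_offset(0x1003 - 0x2000, 0x1000 - 0x2000) == 3
```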

We can use the same trick to encode the function address; we just need to also add the address of the raw section, which can be looked up in the binary. This is useful for looking up debug info and joining it with our final profile data.

The other values are relatively straightforward. The function size and block offsets allow us to map block coverage data to debug info. The control flow graph signature is used to verify that the function has not changed since the profile was collected. Lastly, we have the mangled function name to help identify the function.
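As an illustration, here is a hedged sketch of decoding one $MAP entry per the field layout shown above. Field widths are taken from the .long/.ascii directives; any alignment padding between entries is ignored, which is an assumption about the on-disk layout:

```python
import struct

# Decode one $MAP entry following the directives shown earlier.

def parse_map_entry(data: bytes, off: int = 0) -> dict:
    (raw_start, raw_sym, func_pc, func_size,
     cfg_sig, n_blocks) = struct.unpack_from('<6i', data, off)
    off += 24
    block_offsets = list(struct.unpack_from('<%di' % n_blocks, data, off))
    off += 4 * n_blocks
    (name_len,) = struct.unpack_from('<i', data, off)
    off += 4
    name = data[off:off + name_len].decode()
    return {
        'raw_offset': raw_sym - raw_start,  # section-relative (see above)
        'func_size': func_size,
        'cfg_signature': cfg_sig & 0xFFFFFFFF,
        'block_offsets': block_offsets,
        'name': name,
    }
```

For example, an entry packed with a raw-start offset of 16 and a raw-symbol offset of 32 decodes to a section-relative profile offset of 16.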

Integration with Existing PGI?

There seems to be a question about why we chose to implement MIP from scratch instead of extending an existing framework. If we were to extend -fprofile-instr-generate we would need to have extractable metadata, which may be too invasive to implement.

  • As shown above, a lot of work was done to make sure the metadata can be extracted correctly.
  • Existing pgi has structured raw data that would need to be moved to the extractable metadata section.
  • Our MIP tool has a llvm-mipdata create command which converts the map section to an “empty” profile that can later be merged with raw profiles. Existing pgi tools do not have this extra step.

MIP Edge Profile

We omitted the code that adds call edge profiles to reduce the complexity of the current review, but we do plan to upload it. For each instrumented function, we sample return address values so we can build a dynamic call graph from the profile. The format is largely the same, but we added a size-configurable static buffer to hold the return address values. This approach correctly profiles dynamic dispatch calls that are found in Objective-C and Swift as well as any normal function calls. We can, for example, identify candidates for objc_direct using this call edge data.

Optimization

We also omitted the profile consumption and optimization code from the current review. Our focus for optimization is on outlining to reduce size and function ordering to reduce page faults. When we consume profile data with function coverage, block coverage, or function call counts we can make smarter inlining and outlining decisions based on hotness. When we have profile data with timestamps, call counts, or the dynamic call graph then we can generate an optimal function order that reduces the number of page faults.
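As a toy illustration of the ordering idea (not MIP's actual algorithm): given per-function first-call timestamps, laying functions out in call order keeps startup code on contiguous pages. The timestamp-0 convention for never-called functions is an assumption for this sketch:

```python
# Order functions by first-call timestamp; never-called functions
# (timestamp 0 here, an assumed convention) go last so hot startup
# code packs together and touches fewer pages.

def order_by_first_call(timestamps: dict) -> list:
    called = [f for f, t in timestamps.items() if t > 0]
    uncalled = [f for f, t in timestamps.items() if t == 0]
    return sorted(called, key=lambda f: timestamps[f]) + sorted(uncalled)

print(order_by_first_call({'main': 1, 'init': 2, 'cold': 0, 'draw': 5}))
# → ['main', 'init', 'draw', 'cold']
```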

MaskRay added a comment.EditedWed, Jun 16, 5:32 PM

I think people's main question is what distinguishing features make MachineFunction instrumentation appealing.

MIP Edge Profile, Optimization

The two are very inconvenient at the MachineFunction/MachineBasicBlock level...
I don't know how you can make edge profiling work for BB transitions...

I changed clang -fprofile-arcs to use critical edge splitting instead of PHI nodes and noticed some code generation improvement.
I know this is difficult/infeasible if you cannot split machine basic blocks...

For the profile format, there is indeed a bit redundancy in -fprofile-generate/-fprofile-instr-generate.
Some fields are reserved even if value profiling is not used. I do not have a good idea how we can save the space for coverage usage.
Some fields are 64-bit for generality. As I mentioned, a 32-bit CFG signature makes it less robust when the number of functions exceeds roughly 2**16.
The 32-bit Function PC Offset is probably sufficient for most usage but will not work with medium/large code model programs.

I have been slowly trying to make -fprofile-generate/-fprofile-instr-generate object files/linked images smaller (e.g. D103372) since last year (I have much to learn..).
(I can test on Linux and Windows, so I'll try making both work. I don't have Mach-O but I am happy to report whatever issues I have found, though.)
I do plan to try PC-relative relocations (I made such improvement for XRay: D78082/D78590/D87977; the only portability issue is that we will require the integrated assembler for mips64)
and probably make the symbol in __llvm_prf_data local alias to avoid an R_*_RELATIVE dynamic relocation.
(I need to study more about llvm-profdata.)

ellis added a comment.Wed, Jun 16, 6:04 PM

__llvm_mip_call_counts_caller is slow.
It is a function with a custom call convention using RAX as the argument on x86-64.
The impl detail function saves and restores many vector registers.
I haven't studied why __llvm_mip_call_counts_caller is needed.

Yes, __llvm_mip_call_counts_caller is not optimal, but we wanted to first have correctness. Since we are injecting calls to the runtime at the very beginning of functions, we save/restore the stack frame in __llvm_mip_call_counts_caller. In our return address instrumentation code, we also use this helper function to pass the return address register to the runtime.

__llvm_mipmap has these fields. I added an inline comment that -shared doesn't work.

Unfortunately, yes, it seems -shared does not work, but I don't know enough about it to have ideas for fixes at the moment.

        .section        __llvm_mipmap,"aw",@progbits
        .globl  _Z3fooPiS_$MAP
        .p2align        3
_Z3fooPiS_$MAP:
.Lref2:
  ### not sure why this is needed
        .long   __start___llvm_mipraw-.Lref2    # Raw Section Start PC Offset

  ##### this does not link in -fpic -shared mode
        .long   _Z3fooPiS_$RAW-.Lref2           # Raw Profile Symbol PC Offset

        .long   _Z3fooPiS_-.Lref2               # Function PC Offset
        .long   .Lmip_func_end0-_Z3fooPiS_      # Function Size
        .long   0x0                             # CFG Signature
        .long   0                               # Non-entry Block Count
        .long   10                              # Function Name Length
        .ascii  "_Z3fooPiS_"

In the previous comment I describe these fields in detail.

Now this patch series adds machine basic blocks instrumentation.
I wonder what it can do while the regular IR instrumentation cannot.

Machine basic block instrumentation has some awkward points.
Semantic information is lost. The loop optimization definitely cannot be applied.
If an IR basic block maps to multiple machine basic blocks, you need multiple increments for each MBB while with IR BB you may just need one (e.g. dominator).
Edge profiling is tricky. Edge profiling requires splitting critical edges - it is not clear how you can do this after the machine basic block layout is finalized.

The benefit of instrumenting machine basic blocks is that we can easily mark MBBs that were not executed as candidates for outlining. We can definitely apply Kirchhoff's circuit law optimization to reduce the number of stores.

ellis added a comment.Wed, Jun 16, 6:08 PM

I have been slowly trying to make -fprofile-generate/-fprofile-instr-generate object files/linked images smaller (e.g. D103372) since last year (I have much to learn..).
(I can test on Linux and Windows, so I'll try making both work. I don't have Mach-O but I am happy to report whatever issues I have found, though.)
I do plan to try PC-relative relocations (I made such improvement for XRay: D78082/D78590/D87977; the only portability issue is that we will require the integrated assembler for mips64)
and probably make the symbol in __llvm_prf_data local alias to avoid an R_*_RELATIVE dynamic relocation.
(I need to study more about llvm-profdata.)

I'm really happy to see this work! I also have much to learn so I'll try to keep an eye out for related diffs in the future.

ellis added a comment.Wed, Jun 16, 6:17 PM

For the profile format, there is indeed a bit redundancy in -fprofile-generate/-fprofile-instr-generate.
Some fields are reserved even if value profiling is not used. I do not have a good idea how we can save the space for coverage usage.
Some fields are 64-bit for generality. As I mentioned, a 32-bit CFG signature makes it less robust when the number of functions exceeds roughly 2**16.

Actually, the 32-bit CFG signature we use in MIP is not unique to functions. If two functions have the same basic block layout, they will have the same CFG signature and that is not a problem. The field is used to determine if a function has changed from when it was profiled. I'm not sure if this is different from the other profile formats.
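A sketch of how such a signature could be used. This is illustrative only — the actual hash MIP computes is not shown in this review; the point is that any stable 32-bit hash of the block layout supports change detection, and identical layouts may legitimately collide:

```python
import zlib

# zlib.crc32 is a stand-in here, not MIP's actual signature function.
# The signature changes when the function's block layout changes, so
# stale profiles can be rejected at merge time.

def cfg_signature(block_offsets: list, func_size: int) -> int:
    payload = repr((func_size, block_offsets)).encode()
    return zlib.crc32(payload) & 0xFFFFFFFF

old = cfg_signature([8, 24], 64)
assert cfg_signature([8, 24], 64) == old   # unchanged function: match
assert cfg_signature([8, 32], 64) != old   # block layout changed: reject
```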

The 32-bit Function PC Offset is probably sufficient for most usage but will not work with medium/large code model programs.

That is true, but we wanted to use 32-bit values to maintain consistency with armv7 targets. We could probably add a flag to support 64-bit values if we need to.

ellis updated this revision to Diff 352597.Wed, Jun 16, 6:37 PM

The __llvm_mipmap section should not have the SHF_ALLOC bit set.

I think people's main question is what distinguishing features make MachineFunction instrumentation appealing.

MIP Edge Profile, Optimization

The two are very inconvenient at the MachineFunction/MachineBasicBlock level...
I don't know how you can make edge profiling work for BB transitions...

MIP does not (cannot) collect BB edge data, but it collects machine basic block coverage as needed (optional).
But, MIP can collect call-edge data for all call-sites including dynamic dispatch calls that are not covered by LLVM IR instrumentation.
As commented earlier, MIP is initially designed for mobile applications where the majority of calls are dynamic. In this world, size or minimum-size optimizations are typically enabled.
So, traditional speed optimizations like inlining or vectorization from BB edge profiles were not a great concern.
Instead, the MIP data were mainly used for ordering, separation, or outlining to minimize CPU penalties while saving as much size as possible.

We understand that if we want an IR level profile (e.g. BB edge profiles) for IR level optimizations, it would be tricky because MIP instruments Machine IR.
However, internally we've already experimented with a SamplePGO-like conversion to generate an LLVM IR profile from MIP using symbolication.
Certainly this will lose precision, but it's generally good enough in this domain because the majority of speed optimizations will be blocked anyhow under minimum size-opt.

Nonetheless, I think it's also worth revisiting MIP-like implementation at IR level to support full IR profiles including BB edge profile by reusing as much LLVM IR instrumentation code as possible.
I do still think refactoring the profile format of the existing LLVM IR instrumentation to integrate MIP would be too disruptive, breaking the existing infra or usage.
Instead, I'd like to keep the mipmap (metadata) layout in MC to make it extractable, and this means we may retain pseudo ops all the way down to MC from IR.

ellis updated this revision to Diff 352871.Thu, Jun 17, 4:33 PM

Raw symbols should have hidden visibility so that -fPIC -shared works

Some email conversations are not on Phabricator. I record a copy here so that people who are not subscribed can have a full view.

davidxl: I believe you mean -fprofile-generate or -fcs-profile-generate. -fprofile-instr-generate is based on front end AST and eventually will be hidden under -fcoverage-test for source coverage purposes only.
davidxl: As you can see we are currently making an effort to unify/simplify the PGO implementation. Having yet another instrumentation mechanism is doing the opposite.

As shown above, a lot of work was done to make sure the metadata can be extracted correctly.
Existing pgi has structured raw data that would need to be moved to the extractable metadata section.
Our MIP tool has a llvm-mipdata create command which converts the map section to an “empty” profile that can later be merged with raw profiles. Existing pgi tools do not have this extra step.

davidxl: As I said, size improvement efforts (under options) are welcome for the existing IRPGO. Another benefit is that we can have consolidated effort on improving related toolings.

MIP Edge Profile

davidxl: Adding this duplicate functionality (edge profiling) makes it even less compelling to do it MIR level.
davidxl: For the dynamic dispatching profiling, does it handle any number of indirect targets or only supports topN?

Optimization

davidxl: See above, adding any missing features in the existing framework is the preferred approach. I have yet to see convincing arguments that spinning off a new instrumentation framework is the way to go.


maskray: I think people's main question is what distinguishing features make MachineFunction instrumentation appealing.

maskray: > MIP Edge Profile, Optimization

maskray: The two are very inconvenient at the MachineFunction/MachineBasicBlock level...
maskray: I don't know how you can make edge profiling work for BB transitions...

kyulee: MIP does not (cannot) collect BB edge data but MachineBlock coverage as needed (optional).
kyulee: But, MIP can collect call-edge data for all call-sites including dynamic dispatch calls that are not covered by LLVM IR instrumentation.

davidxl: IR instrumentation supports indirect call target profiling. I suppose MIP has a lightweight mechanism at the cost of tracking precision? Anyway, I don't think this is something IR instrumentation cannot have.

kyulee: As commented earlier, MIP is initially designed for mobile applications where majority calls are dynamic. In this world, size or minimum size optimizations are typically enabled.
kyulee: So, traditional speed optimizations like inlining or vectorization from BB edge profiles were not a great concern.
kyulee: Instead, the MIP data were mainly used for ordering, separation, or outlining to minimize CPU penalties while saving as much size as possible.

davidxl: Edge profiling helps size optimization as well -- we recently added OptimizeForSize support at BB level so that cold blocks can be better size optimized.
davidxl:
davidxl: Another plug -- if you are interested in size optimization, the ML based size optimization is also available in LLVM -- it beats -Oz.

kyulee: We understand that if we want an IR level profile (e.g. BB edge profiles) for IR level optimizations, it would be tricky because MIP instruments Machine IR.

davidxl: -fcs-profile-generate is very late in the IR pipeline after inlining transformations, so there is very little information loss when passing to MIR.

kyulee: However, internally we've already experimented with a SamplePGO-like conversion to generate an LLVM IR profile from MIP using symbolication.
kyulee: Certainly this will lose precision, but it's generally good enough in this domain because the majority of speed optimizations will be blocked anyhow under minimum size-opt.

davidxl: Do you mean 'converting MIP' for IR passes?

kyulee: Nonetheless, I think it's also worth revisiting MIP-like implementation at IR level to support full IR profiles including BB edge profile by reusing as much LLVM IR instrumentation code as possible.

davidxl: yes.

kyulee: I do still think refactoring the profile format of the existing LLVM IR instrumentation to integrate MIP would be too disruptive, breaking the existing infra or usage.

davidxl: It is fine to do this under an option and produce profile data with different magic number or flavor bit -- this is well supported.


kyulee: I understand IR instrumentation has value-profiling for indirect call targets at call-sites. I don’t think IR instrumentation covers dynamic dispatch call-sites like msgSend, whose target resolution happens at runtime. Instead of instrumenting each call-site, MIP tracks return address values at function entry via a sort of back-tracking to reconstruct call-edges regardless of call type – direct, indirect, or dynamic.
kyulee: MIP does this instrumentation at post-RA where assembly level coding is relatively straightforward. I think doing this at IR before frame-lowering would need extra overhead/mechanics to ensure this instrumentation happens at the very beginning of the function.

davidxl: Do you mean 'converting MIP' for IR passes?

kyulee: Yes. MIP is not IR-attached, but rather tagged on machine address with which we can easily correlate debug data.
kyulee: So, it’s possible to construct a SamplePGO profile that is consumable for IR passes.

davidxl: Basically the callsite context (counter) is passed to the caller so it can do the profiling. GCC does that too.
davidxl: For IRPGO, we plan to add dynamic type profiling at some point. Once that is ready, the problem of message passing style call profiling will be handled.
davidxl: Also if edge profiling is available, profiling direct calls will be a waste as the information can be deduced.

MaskRay added a comment.EditedFri, Jun 18, 10:06 AM

@ellis

The main challenge is storing the offset to the profile data without using dynamic relocations. This is complicated by the fact that we use comdat sections within the __llvm_mipraw section and that ELF does not seem to directly support section relative addresses. The solution is to use PC relative relocations. __start___llvm_mipraw-.Lref gives us the PC relative offset to the start of the raw section and _Z3foov$RAW-.Lref gives us the offset to the profile data for this function relative to the same PC. After we extract the map section, we can subtract these to get the value we want, the section relative raw profile data offset.

(_Z3foov$RAW-.Lref) - (__start___llvm_mipraw-.Lref) = _Z3foov$RAW - __start___llvm_mipraw

I am unclear about this. Why does the $MAP section need to know its relative position?

Note: if the $RAW symbol has a local linkage or has the hidden visibility, a label difference can indeed avoid a dynamic relative relocation. I have a patch D104556

# ELF: R_X86_64_PC64
.quad .L__profc_foo-.L__profd_foo

# Mach-O: a pair of X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo

Unfortunately this is currently not representable in COFF:

% clang -fprofile-generate a.c -c -target x86_64-windows -o d.o
error: Cannot represent this expression

We can use the same trick to encode the function address, we just need to also add the address of the raw section which can be looked up in the binary. This is useful to lookup debug info and join it with our final profile data.

A __profd_$name variable has a similar field referencing the function. It is used by IPVK_IndirectCallTarget so that indirect call target profile can be translated to function names.

We can save the dynamic relocation with the following scheme:

; Note the sub constexpr
@__profd_main = private global { i64, i64, i64*, i64, i8*, i32, [2 x i16] } { i64 -2624081020897602054, i64 742261418966908927, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__profc_main, i32 0, i32 0), i64 sub (i64 ptrtoint (i32 ()* @alias_main to i64), i64 ptrtoint ({ i64, i64, i64*, i64, i8*, i32, [2 x i16] }* @__profd_main to i64)), i8* null, i32 1, [2 x i16] zeroinitializer }, section "__llvm_prf_data", comdat($__profc_main), align 8

@alias_main = private alias i32 (), i32 ()* @main

define i32 @main() #0 {
  ...

For other fields of __profd_$name, other than the generality and mandatory value profiling (even if you don't use it) issues I mentioned in https://reviews.llvm.org/D104060#2818268 ,
I think the $MAP format is the same.

The $MAP format does not implement these things:

  • using private linkage as much as possible
  • ELF comdat any/noduplicates (this increases object file sizes but can enable --gc-sections for linked images)
  • function name compression
ellis marked an inline comment as done.Fri, Jun 18, 10:50 AM

@davidxl

davidxl: For the dynamic dispatching profiling, does it handle any number of indirect targets or only supports topN?

Our edge profiling patch (not included) was also designed with code size and runtime in mind. We allocate a buffer and sample return addresses, possibly overwriting old values when the buffer is full. Data for common callsites will overwrite data for rare callsites, so I guess it's more like top N, but the size of the buffer can be as large as needed.
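A rough sketch of that sampling scheme. The eviction policy here is random slot overwrite, which is an assumption for illustration — the review does not specify the actual replacement policy:

```python
import random

# Fixed-size buffer of sampled return addresses; recording may
# overwrite an old sample, so frequent callsites statistically
# dominate, approximating a "top N" of incoming edges.

class ReturnAddressBuffer:
    def __init__(self, size: int, seed: int = 0) -> None:
        self.slots = [None] * size
        self._rng = random.Random(seed)

    def record(self, return_address: int) -> None:
        # Constant space, no locks: overwrite an arbitrary slot.
        self.slots[self._rng.randrange(len(self.slots))] = return_address

    def observed_callsites(self) -> set:
        return {a for a in self.slots if a is not None}

buf = ReturnAddressBuffer(size=4)
for _ in range(100):
    buf.record(0x1000)       # hot callsite keeps re-recording itself
buf.record(0x2000)           # rare callsite may later be evicted
assert buf.observed_callsites() <= {0x1000, 0x2000}
```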

davidxl: Basically the callsite context (counter) is passed to the caller so it can do the profiling. GCC does that too.

Our method does not need to touch the callsite code to work. In fact, most of the implementation is done in D104089 so we mostly only change the runtime code.

davidxl: For IRPGO, we plan to add dynamic type profiling at some point. Once that is ready, the problem of message passing style call profiling will be handled.
davidxl: Also if edge profiling is available, profiling direct calls will be a waste as the information can be deduced.

@davidxl

davidxl: For the dynamic dispatching profiling, does it handle any number of indirect targets or only supports topN?

Our edge profiling patch (not included) was also designed with code size and runtime in mind. We allocate a buffer and sample return addresses, possibly overwriting old values when the buffer is full. Data for common callsites will overwrite data for rare callsites, so I guess it's more like top N, but the size of the buffer can be as large as needed.

Neat -- so the value profile (target histogram) of a given callsite is 'distributed' into the callees -- aka each callee function allocating a fixed size buffer to track all incoming edge frequencies? For some small utility functions, they usually have thousands of callsites (incoming edges), thus this approach may significantly reduce the profile precision for them? The simple LRU eviction policy may be bad for some patterns like ping-pong effect.

davidxl: Basically the callsite context (counter) is passed to the caller so it can do the profiling. GCC does that too.

Our method does not need to touch the callsite code to work. In fact, most of the implementation is done in D104089 so we mostly only change the runtime code.

Will take a look.

davidxl: For IRPGO, we plan to add dynamic type profiling at some point. Once that is ready, the problem of message passing style call profiling will be handled.
davidxl: Also if edge profiling is available, profiling direct calls will be a waste as the information can be deduced.

@ellis

The main challenge is storing the offset to the profile data without using dynamic relocations. This is complicated by the fact that we use comdat sections within the __llvm_mipraw section and that ELF does not seem to directly support section relative addresses. The solution is to use PC relative relocations. __start___llvm_mipraw-.Lref gives us the PC relative offset to the start of the raw section and _Z3foov$RAW-.Lref gives us the offset to the profile data for this function relative to the same PC. After we extract the map section, we can subtract these to get the value we want, the section relative raw profile data offset.

(_Z3foov$RAW-.Lref) - (__start___llvm_mipraw-.Lref) = _Z3foov$RAW - __start___llvm_mipraw

I am unclear about this. Why does the $MAP section need to know its relative position?

The map section doesn't care about its relative position, but it is used to compute the value we want. Ideally we would do something simple like this to get the section relative address of the symbol.

_Z3foov$RAW-__start___llvm_mipraw

From my testing this doesn't work because these symbols are in different sections in ELF (because we use comdat sections for the header). Also, IIRC there were other issues due to relocations getting resolved before the final executable was created and ending up with the wrong values. The solution in this patch was the only one that seems to work in all cases.

Note: if the $RAW symbol has a local linkage or has the hidden visibility, a label difference can indeed avoid a dynamic relative relocation.

# ELF: R_X86_64_PC64
.quad .L__profc_foo-.L__profd_foo

# Mach-O: a pair of X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo

Unfortunately this may not be representable in COFF:

% clang -fprofile-generate a.c -c -target x86_64-windows -o d.o
error: Cannot represent this expression

I haven't tested COFF, but I think it might support section relative addresses, which would make this format much simpler.

We can use the same trick to encode the function address, we just need to also add the address of the raw section which can be looked up in the binary. This is useful to lookup debug info and join it with our final profile data.

A __profd_$name variable has a similar field referencing the function. It is used by IPVK_IndirectCallTarget so that indirect call target profile can be translated to function names.

We can save the dynamic relocation with the following scheme:

; Note the sub constexpr
@__profd_main = private global { i64, i64, i64*, i64, i8*, i32, [2 x i16] } { i64 -2624081020897602054, i64 742261418966908927, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__profc_main, i32 0, i32 0), i64 sub (i64 ptrtoint (i32 ()* @alias_main to i64), i64 ptrtoint ({ i64, i64, i64*, i64, i8*, i32, [2 x i16] }* @__profd_main to i64)), i8* null, i32 1, [2 x i16] zeroinitializer }, section "__llvm_prf_data", comdat($__profc_main), align 8

@alias_main = private alias i32 (), i32 ()* @main

define i32 @main() #0 {
  ...

For other fields of __profd_$name, other than the generality and mandatory value profiling (even if you don't use it) issues I mentioned in https://reviews.llvm.org/D104060#2818268 ,
I think the $MAP format is the same.

The format is probably the same. I'm interested to see if we can do something similar to replace our encoded function address.

The $MAP format does not implement these things:

  • using private linkage as much as possible

I think we can use private linkage in more places if we need to.

  • ELF comdat any/noduplicates (this increases object file sizes but can enable --gc-sections for linked images)
  • function name compression

We don't care about the size of the map section since it is extracted out.

I probably sound like a broken record, but we've spent a lot of time making sure the map section can be extracted correctly and that the raw section has no excess info. This is a major feature of MIP: the profile data and the function info are separated so that only the necessary data remains in the binary. The question is whether we should extend an existing pgi to support this feature or whether MIP deserves to be its own framework. I do see the value in extending one of the many existing pgi frameworks to reduce duplicate work. My thoughts are that it would be too invasive to do everything we need: add one or two new sections, create new .profraw and .profmap formats, add a few flags, and extend the tools. By keeping MIP separate, we can make design decisions that align with our code size goal that may not be so easy to make in existing frameworks.

Neat -- so the value profile (target histogram) of a given callsite is 'distributed' into the callees -- aka each callee function allocating a fixed size buffer to track all incoming edge frequencies? For some small utility functions, they usually have thousands of callsites (incoming edges), thus this approach may significantly reduce the profile precision for them? The simple LRU eviction policy may be bad for some patterns like ping-pong effect.

Our current approach is to use a single global buffer for all callees, and we store a value that identifies both a callsite address and the callee. We've considered other options like a buffer for each callee, but we settled on the global buffer approach to avoid locks, extra complexity, and runtime overhead. Yes, there are cases where we oversample some edges. I'd say this patch is still in the experimental stage, so there is still work to be done.
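A hypothetical encoding sketch of the "one value identifies both callsite and callee" idea. The 24-bit callee field and the field order are illustrative assumptions, not the patch's actual format:

```python
# Pack a callee id and a callsite (return address) into one integer so
# a single lock-free global buffer can record call edges.

CALLEE_BITS = 24  # assumed split for illustration

def pack_edge(callee_id: int, return_address: int) -> int:
    assert 0 <= callee_id < (1 << CALLEE_BITS)
    return (return_address << CALLEE_BITS) | callee_id

def unpack_edge(value: int) -> tuple:
    return (value & ((1 << CALLEE_BITS) - 1), value >> CALLEE_BITS)

v = pack_edge(callee_id=7, return_address=0x401234)
assert unpack_edge(v) == (7, 0x401234)
```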

Neat -- so the value profile (target histogram) of a given callsite is 'distributed' into the callees -- aka each callee function allocating a fixed size buffer to track all incoming edge frequencies? For some small utility functions, they usually have thousands of callsites (incoming edges), thus this approach may significantly reduce the profile precision for them? The simple LRU eviction policy may be bad for some patterns like ping-pong effect.

Our current approach is to use a single global buffer for all callees, and we store a value that identifies both a callsite address and the callee. We've considered other options like a buffer for each callee, but we settled on the global buffer approach to avoid locks, extra complexity, and runtime overhead. Yes, there are cases where we oversample some edges. I'd say this patch is still in the experimental stage, so there is still work to be done.

A single global buffer introduces more data races, which can be worse (with or without a lock) compared with split buffers.

I probably sound like a broken record, but we've spent a lot of time making sure the map section can be extracted correctly and that the raw section has no excess info. This is a major feature of MIP: the profile data and the function info are separated so that only the necessary data remains in the binary. The question is whether we should extend an existing pgi to support this feature or whether MIP deserves to be its own framework. I do see the value in extending one of the many existing pgi frameworks to reduce duplicate work. My thoughts are that it would be too invasive to do everything we need: add one or two new sections, create new .profraw and .profmap formats, add a few flags, and extend the tools. By keeping MIP separate, we can make design decisions that align with our code size goal that may not be so easy to make in existing frameworks.

The existing __llvm_prf_cnts/__llvm_prf_data can be extracted easily as well... What functionality do you find missing?

It requires a lot of effort to make linker garbage collection work. How does MIP work better? I don't find comdat specific code or Mach-O S_ATTR_LIVE_SUPPORT.

I have a patch to use label differences (PC-relative relocations on ELF) for CountersPtr: D104556. The tricky part is making COFF work because COFF has some limitations.

Do you have measurement of metadata section size, dynamic relocation size, symbol table entry size for the __llvm_prf_* format and MIP?