This is an archive of the discontinued LLVM Phabricator instance.

Paths

- compiler-rt/test/profile/instrprof-block-coverage.c
- compiler-rt/test/profile/instrprof-coverage.c
- compiler-rt/test/profile/instrprof-entry-coverage.c
- llvm/include/llvm/Transforms/Instrumentation/BlockCoverageInference.h
- llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp
- llvm/lib/Transforms/Instrumentation/CMakeLists.txt
- llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
- llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
- llvm/test/Transforms/PGOProfile/coverage.ll

Differential D124490

[InstrProf] Minimal Block Coverage
ClosedPublic

Authored by ellis on Apr 26 2022, 5:14 PM.


Details

Reviewers

phosek
spupyrev
kyulee
wenlei
davidxl
paquette
gulfem
MaskRay
yozhu

Commits

rG167e8f8b6b11: [InstrProf] Minimal Block Coverage

Summary

This diff implements minimal block coverage instrumentation. When the -pgo-block-coverage option is used, basic blocks are instrumented for block coverage using single-byte booleans. The coverage of some basic blocks can be inferred from others, so not every basic block is instrumented; in fact, we found that only ~60% of basic blocks need to be instrumented. Together, these differences lead to less size overhead than instrumenting block counts. For example, block coverage on the clang binary has a size overhead of 20 Mi (17%), compared to 56 Mi (47%) with block counts.
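As a rough illustration (a toy model with hypothetical `CountProfile` and `CoverageProfile` types, not LLVM's actual lowering or byte encoding), the sketch below contrasts the per-block update in each mode: a counter needs an 8-byte read-modify-write on every execution, while a coverage flag is a single idempotent one-byte store.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counter mode (toy model): one 64-bit slot per instrumented block,
// incremented on every execution (load, add, store).
template <std::size_t NumSlots> struct CountProfile {
  std::array<std::uint64_t, NumSlots> Counts{};
  void hit(std::size_t Slot) {
    assert(Slot < NumSlots);
    ++Counts[Slot];
  }
};

// Coverage mode (toy model): one byte per instrumented block. Executing the
// block stores a constant, so repeated executions perform the same write.
template <std::size_t NumSlots> struct CoverageProfile {
  std::array<std::uint8_t, NumSlots> Covered{};
  void hit(std::size_t Slot) {
    assert(Slot < NumSlots);
    Covered[Slot] = 1;
  }
};
```

Each counter slot is 8 bytes versus 1 byte per coverage flag, which is one reason `__llvm_prf_cnts` shrinks so much in the size comparison below.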

Even though block coverage profiles are less precise than block count profiles, they can still be used to guide optimizations. In PGOUseFunc we use block coverage to populate edge weights such that BFI gives nonzero counts only to covered blocks. We do this by (1) setting the entry count of covered functions to a large value, namely 10000, and (2) populating edge weights using block coverage. In the next diff, https://reviews.llvm.org/D125743, we use BFI to guide the machine outliner to avoid outlining covered blocks. The -pgo-block-coverage option thus trades profile precision for faster and smaller instrumented binaries.
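A minimal sketch of the second step (populating edge weights from block coverage), using a hypothetical helper `coverageBranchWeights`: edges into covered successors get weight 1 and edges into uncovered successors get weight 0, so frequency can only flow along covered paths. Giving an uncovered block uniform weights instead of an all-zero vector is a choice made in this sketch; the summary does not say how the patch handles that case.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the patch's code: derive branch weights for the
// outgoing edges of one block from block coverage alone.
std::vector<std::uint32_t>
coverageBranchWeights(bool BlockCovered, const std::vector<bool> &SuccCovered) {
  std::vector<std::uint32_t> Weights;
  Weights.reserve(SuccCovered.size());
  for (bool Covered : SuccCovered)
    // A covered block sends its frequency only to covered successors. An
    // uncovered block has no frequency to distribute, so this sketch gives
    // it uniform weights rather than an all-zero weight vector.
    Weights.push_back((Covered || !BlockCovered) ? 1 : 0);
  return Weights;
}
```

Combined with a large entry count for covered functions, weights like these let BFI assign nonzero frequency only to covered blocks.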

The BlockCoverageInference class defines the algorithm that finds the minimal set of basic blocks that need to be instrumented for coverage. This differs from the Kirchhoff circuit law optimization used for edge counts, which does not work for block coverage: a missing edge count can be recovered by adding and subtracting neighboring counts, but block coverage values are booleans and cannot be combined this way. So we need a new algorithm to find which blocks must be instrumented.
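To make the distinction concrete, consider a diamond CFG (`Entry -> {Then, Else} -> Exit`). The hypothetical helpers below contrast the two cases: with counts, flow conservation recovers `Else` from `Entry` and `Then`; with booleans, the analogous relation `Entry == (Then || Else)` can leave `Else` undetermined.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Counts: Count(Entry) == Count(Then) + Count(Else), so the Else counter can
// be dropped and recomputed later.
std::uint64_t inferElseCount(std::uint64_t EntryCount, std::uint64_t ThenCount) {
  assert(ThenCount <= EntryCount && "Then cannot run more often than Entry");
  return EntryCount - ThenCount;
}

// Booleans: Entry == (Then || Else). Return every Else value consistent with
// the observed Entry and Then bits; two values means Else is not inferable.
std::vector<bool> possibleElseCoverage(bool EntryCovered, bool ThenCovered) {
  std::vector<bool> Result;
  for (bool Else : {false, true})
    if (EntryCovered == (ThenCovered || Else))
      Result.push_back(Else);
  return Result;
}
```

When both `Entry` and `Then` are covered, `Else` may or may not have run, so some other block has to be instrumented to decide it.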

The details on this algorithm can be found in this paper titled "Minimum Coverage Instrumentation": https://arxiv.org/abs/2208.13907

Special thanks to Julian Mestre for creating this block coverage inference algorithm.

Binary size of clang using -O2:

Base
- .text: 65.8 Mi
- Total: 119 Mi
IRPGO (-fprofile-generate -mllvm -disable-vp -mllvm -debug-info-correlate)
- .text: 93.0 Mi
- __llvm_prf_cnts: 14.5 Mi
- Total: 175 Mi
Minimal Block Coverage (-fprofile-generate -mllvm -disable-vp -mllvm -debug-info-correlate -mllvm -pgo-block-coverage)
- .text: 82.1 Mi
- __llvm_prf_cnts: 1.38 Mi
- Total: 139 Mi
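The percentages quoted in the summary follow from these totals, taking the base build (119 Mi) as the reference: (139 - 119) / 119 is about 17% for minimal block coverage and (175 - 119) / 119 is about 47% for IRPGO. A small helper (hypothetical name) that reproduces the computation:

```cpp
#include <cassert>

// Size overhead of an instrumented build relative to the base build, in
// percent. Both sizes must use the same unit (Mi in the table above).
double overheadPercent(double BaseSize, double InstrumentedSize) {
  assert(BaseSize > 0 && "base size must be positive");
  return (InstrumentedSize - BaseSize) / BaseSize * 100.0;
}
```

With the totals above, `overheadPercent(119, 139)` is about 16.8 and `overheadPercent(119, 175)` is about 47.1.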

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ellis created this revision. Apr 26 2022, 5:14 PM

Herald added a project: Restricted Project. Apr 26 2022, 5:14 PM

Herald added subscribers: wenlei, hiraditya, mgorny.

Harbormaster completed remote builds in B161489: Diff 425350. Apr 26 2022, 6:42 PM

kyulee added a subscriber: kyulee. Apr 26 2022, 6:53 PM

spupyrev added a subscriber: spupyrev. Apr 26 2022, 7:01 PM

ellis edited the summary of this revision. Apr 26 2022, 9:58 PM

ellis published this revision for review. May 16 2022, 11:18 AM

ellis edited the summary of this revision.

ellis added reviewers: phosek, spupyrev, kyulee, wenlei, davidxl.

Herald added projects: Restricted Project, Restricted Project. May 16 2022, 11:19 AM

Herald added subscribers: llvm-commits, Restricted Project.

Special thanks to Julian Mestre for creating this block coverage inference algorithm.

I haven't read the code yet, but can you briefly describe it in the summary?

How does this compare with the Kirchhoff circuit law optimization, which requires integral execution counts?
Note: -fprofile-instr-generate is quite a bit slower because it does not have the optimization.

In D124490#3516747, @MaskRay wrote:

Special thanks to Julian Mestre for creating this block coverage inference algorithm.

I haven't read the code yet, but can you briefly describe it in the summary?

How does this compare with the Kirchhoff circuit law optimization, which requires integral execution counts?
Note: -fprofile-instr-generate is quite a bit slower because it does not have the optimization.

I've updated the summary to explain the algorithm.

ellis added a reviewer: paquette. May 16 2022, 5:37 PM

Do the outliner changes need to be a part of this patch? Or can you split the outliner changes out into a separate patch, and then add that separate patch as a child of this one?

gulfem added a subscriber: gulfem. May 16 2022, 6:32 PM

Rebase and separate out machine outliner code into https://reviews.llvm.org/D125743

ellis edited the summary of this revision. May 16 2022, 9:39 PM

ellis added a child revision: D125743: [outliner] Use profile data to avoid outlining hot blocks. May 16 2022, 9:42 PM

Harbormaster completed remote builds in B164803: Diff 429933. May 16 2022, 10:26 PM

phosek added a reviewer: gulfem. May 17 2022, 11:49 AM

Hi @ellis,

This is exciting for us! We are interested in using single-byte booleans for source-based code coverage, and I am currently working on the implementation.
We are trying to find a good solution to the problem you described: missing counts cannot be inferred by doing arithmetic on other block counts when we use boolean counters.
For example, in an if statement, the else part is not instrumented, and its counter is inferred by subtracting the then counter from the parent counter.
When we use boolean values for counters, this can no longer be done.
A simpler approach is to instrument more basic blocks, but we are exploring approaches that add counters to a minimal set of blocks.
One of the biggest differences between coverage and PGO is that counters are emitted by traversing AST nodes in the front-end for coverage, whereas they are emitted by traversing CFG nodes for PGO.
The algorithm that you introduced is based on CFG traversal. I'm looking at it to see whether it can be repurposed for coverage.

A few questions:

You mentioned that "we found that only ~60% of basic blocks need to be instrumented". With boolean counters, do you know how much that has changed?
Size reduction is great! Do you have any data on the impact of block coverage on compilation time and runtime performance? For coverage, we are also aiming for better runtime performance with boolean counters.
Did you do any verification of the correctness of block coverage, such as comparing it against block counts?

It seems like we are trying to solve similar problems in different contexts, so we would be very interested in collaboration.

smeenai added a subscriber: smeenai. May 17 2022, 6:10 PM

In D124490#3520905, @gulfem wrote:

Hi @ellis,

This is exciting for us! We are interested in using single-byte booleans for source-based code coverage, and I am currently working on the implementation.
We are trying to find a good solution to the problem you described: missing counts cannot be inferred by doing arithmetic on other block counts when we use boolean counters.
For example, in an if statement, the else part is not instrumented, and its counter is inferred by subtracting the then counter from the parent counter.
When we use boolean values for counters, this can no longer be done.
A simpler approach is to instrument more basic blocks, but we are exploring approaches that add counters to a minimal set of blocks.
One of the biggest differences between coverage and PGO is that counters are emitted by traversing AST nodes in the front-end for coverage, whereas they are emitted by traversing CFG nodes for PGO.
The algorithm that you introduced is based on CFG traversal. I'm looking at it to see whether it can be repurposed for coverage.

A few questions:

You mentioned that "we found that only ~60% of basic blocks need to be instrumented". With boolean counters, do you know how much that has changed?

Size reduction is great! Do you have any data on the impact of block coverage on compilation time and runtime performance? For coverage, we are also aiming for better runtime performance with boolean counters.

Did you do any verification of the correctness of block coverage, such as comparing it against block counts?

It seems like we are trying to solve similar problems in different contexts, so we would be very interested in collaboration.

I'm happy to see interest in coverage instrumentation!

Let me clarify. The existing "Kirchhoff circuit law" optimization used for 64-bit counters allows us to instrument a subset of basic blocks. IIRC we need to instrument slightly more than 60% of basic blocks to do this. For the minimal block coverage algorithm I've implemented here, we also only need to instrument ~60% of basic blocks. Of course, the blocks we instrument are not the same in both algorithms, but the number of blocks is roughly the same.
I haven't analyzed compilation time, but the theoretical runtime is O(|E| * |V|). For runtime performance, I also haven't measured this, but my intuition is that this is very close to optimal since we are adding a single store in as few blocks as possible. In some cases there is some choice in which blocks to instrument, so we could try to make better decisions there, e.g., choose to instrument a rarely executed block rather than one in a tight loop.
My colleague Julian Mestre actually proved that the algorithm produces a correct instrumentation and also that it produces a minimal instrumentation, so no valid instrumentation uses fewer blocks. We are planning on publishing these results in a paper, but that might not be for some time.

I've implemented BlockCoverageInference assuming we are instrumenting blocks in a CFG, but there is nothing special about blocks. This algorithm will work on any directed graph with entry and exit nodes. I think we can extend this class to support a more general graph and use it for AST coverage. Let me know if you think this could work.

I'm also wondering if we could use this PGO coverage to infer source-based code coverage by looking at debug info.

ellis mentioned this in D125743: [outliner] Use profile data to avoid outlining hot blocks. May 19 2022, 11:11 AM

Remove isBlockRarelyExecuted() API.
For covered functions, set the entry count to 10000 so that BFI correctly gives non-zero profile counts to only covered blocks.

Harbormaster completed remote builds in B165425: Diff 430827. May 19 2022, 3:50 PM

In D124490#3522697, @ellis wrote:

I'm happy to see interest in coverage instrumentation!

Let me clarify. The existing "Kirchhoff circuit law" optimization used for 64-bit counters allows us to instrument a subset of basic blocks. IIRC we need to instrument slightly more than 60% of basic blocks to do this. For the minimal block coverage algorithm I've implemented here, we also only need to instrument ~60% of basic blocks. Of course, the blocks we instrument are not the same in both algorithms, but the number of blocks is roughly the same.

I haven't analyzed compilation time, but the theoretical runtime is O(|E| * |V|). For runtime performance, I also haven't measured this, but my intuition is that this is very close to optimal since we are adding a single store in as few blocks as possible. In some cases there is some choice in which blocks to instrument, so we could try to make better decisions there, e.g., choose to instrument a rarely executed block rather than one in a tight loop.

My colleague Julian Mestre actually proved that the algorithm produces a correct instrumentation and also that it produces a minimal instrumentation, so no valid instrumentation uses fewer blocks. We are planning on publishing these results in a paper, but that might not be for some time.

That's great!

I've implemented BlockCoverageInference assuming we are instrumenting blocks in a CFG, but there is nothing special about blocks. This algorithm will work on any directed graph with entry and exit nodes. I think we can extend this class to support a more general graph and use it for AST coverage. Let me know if you think this could work.

CoverageMappingGen.cpp adds counters while traversing the AST. For example, this is how counters are added for an if statement.
https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CoverageMappingGen.cpp#L1383
I'm currently building the prototype for using boolean counters in coverage, and I'm planning to upload a WIP review soon.
After that, we can discuss further whether BlockCoverageInference can be generalized.

I'm also wondering if we could use this PGO coverage to infer source-based code coverage by looking at debug info.

How are you planning to use block coverage for PGO?
In some of the tools where we show source-based coverage, we show whether a line is covered by the executed tests and do not show how many times each line is executed. For this specific purpose, we want to use single-byte counters.
Although I'm not that familiar with the PGO implementation, it seems like numeric counters might be more beneficial for making optimization decisions in PGO. Are there specific optimizations for which you are planning to use block coverage?

In D124490#3526752, @gulfem wrote:

CoverageMappingGen.cpp adds counters while traversing the AST. For example, this is how counters are added for an if statement.
https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CoverageMappingGen.cpp#L1383
I'm currently building the prototype for using boolean counters in coverage, and I'm planning to upload a WIP review soon.
After that, we can discuss further whether BlockCoverageInference can be generalized.

In D124490#3522697, @ellis wrote:

I'm also wondering if we could use this PGO coverage to infer source-based code coverage by looking at debug info.

How are you planning to use block coverage for PGO?
In some of the tools where we show source-based coverage, we show whether a line is covered by the executed tests and do not show how many times each line is executed. For this specific purpose, we want to use single-byte counters.
Although I'm not that familiar with the PGO implementation, it seems like numeric counters might be more beneficial for making optimization decisions in PGO. Are there specific optimizations for which you are planning to use block coverage?

The default PGO instruments counters and is useful for optimization. The problem is that this instrumentation is expensive with respect to binary size and runtime. This diff adds a block coverage mode which is much smaller and faster, although the data is not as precise. In the child diff D125743 I've added support for using block coverage to outline blocks that are not covered, and we may use this data to guide more optimizations in the future.

About the source-based coverage, my intuition is that adding coverage instrumentation to the clang AST is less efficient than adding coverage instrumentation to blocks in LLVM IR like this diff does. In theory, I think it's possible to map covered LLVM IR blocks to source code using debug info. Then you would only need to run this block instrumentation and get source coverage out of the profiles. Unfortunately, we don't maintain the binary address or source locations of instrumented blocks in our profiles, so this idea would require some format changes. I'm just throwing out ideas, and I'm now realizing https://discourse.llvm.org/ would be a good place for this discussion. Feel free to reach out there if you'd like to talk about it more.

Refactor BCI a bit and add a more interesting test case in compiler-rt.

Harbormaster completed remote builds in B165582: Diff 431047. May 20 2022, 2:35 PM

We reviewed the code internally prior to sending it upstream, and it still looks good to me.

llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp
180	I have seen in some places assertions of the form assert(condition && "readable text message"); is this optional or the accepted style?
222	Why the predecessor? Is this for determinism?

This revision is now accepted and ready to land. Jun 2 2022, 4:48 PM

Herald added a subscriber: Enna1. Jun 2 2022, 4:48 PM

MaskRay added a reviewer: MaskRay. Jun 2 2022, 4:51 PM

This revision now requires review to proceed. Jun 2 2022, 4:51 PM

Herald added a subscriber: StephenFan. Jun 2 2022, 4:51 PM

ellis added inline comments. Jun 3 2022, 4:08 PM

llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp
180	I don't know if this is optional or not, so I'm open either way. I don't expect to ever hit this assert unless there is a bug, so a user should not encounter this assert. If the assert is hit, then we will need to look at and understand the source anyway, and I'm not sure how much added value we get from a readable message here.

Improve comments, add more statistics, and remove the get() helper function.

(I plan to read this soon, and adding myself as a blocking reviewer.)

In D124490#3557619, @MaskRay wrote:

(I plan to read this soon, and adding myself as a blocking reviewer.)

Ah that makes sense, thanks!

ellis added inline comments. Jun 3 2022, 4:53 PM

llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp
222	The predecessor was just an arbitrary choice. On second thought, this logic is not really needed. The original idea was to have every non-instrumented node depend on either some of its predecessors or some of its successors. This aligns well with the theory, but in practice we don't think this is needed. This may result in a node's coverage depending on more nodes than necessary, but this should be fine. I will remove this logic.

ellis updated this revision to Diff 434207. Jun 3 2022, 4:56 PM

Remove logic that enforced a non-instrumented node to either depend on its predecessors or successors, but not both. This logic was valid, but not necessary to produce a valid coverage instrumentation, so it was removed for simplicity.

Also, copy the interesting test case from compiler-rt to llvm. This allows us to more easily debug the complex test case in LLVM rather than compiler-rt.

ellis marked an inline comment as done. Jun 3 2022, 4:56 PM

Harbormaster completed remote builds in B167858: Diff 434207. Jun 3 2022, 6:13 PM

MaskRay added inline comments. Jun 4 2022, 11:15 AM

llvm/include/llvm/Transforms/Instrumentation/BlockCoverageInference.h
16	`LLVM_TRANSFORMS_INSTRUMENTATION_BLOCKCOVERAGEINFERENCE_H`
llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp
49	remove. see https://llvm.org/docs/CodingStandards.html#use-namespace-qualifiers-to-implement-previously-declared-functions
51	Use a reference for non-null. delete `assert(F)`
58	https://llvm.org/docs/CodingStandards.html#prefer-preincrement
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
275	Delete `, cl::init(false)`

If this new instrumentation mode is for coverage (non-optimization), why modify PGOUseFunc? If just to use PGOVerifyBFI, it doesn't justify modifying PGOInstrumentation. It would be cleaner to create a new file based on my current understanding.

llvm/include/llvm/Transforms/Instrumentation/BlockCoverageInference.h
80
llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp
67
104
114	`return F.size() == Visited.size()`
121
130	BlockList is used only once. Remove `using`
132	This loop can be augmented with more info to include `canFindMinimalCoverage`, then just delete `canFindMinimalCoverage`
138	The time complexity is O(|BB|^2) and can be problematic.
146	llvm::any_of
200	`return Path.count(Neighbors[0]) ? Neighbors[1] : Neighbors[0];`
245	Use `llvm::append_range`
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1656	This can use a flood-fill instead of a fixed-point iteration.
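The `return F.size() == Visited.size()` suggestion (line 114) compares the number of blocks with the number reached by a traversal. A minimal sketch of such a check, assuming (as the later `ExitBlocks`/`TerminalBlocks` naming suggests) that the traversal walks backwards from blocks with no successors; the helper name and adjacency-list representation are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check: true if every block can reach some terminal block
// (a block with no successors). Succs[B] lists the successors of block B.
bool everyBlockReachesTerminal(
    const std::vector<std::vector<std::size_t>> &Succs) {
  const std::size_t N = Succs.size();
  std::vector<std::vector<std::size_t>> Preds(N);
  std::vector<bool> Visited(N, false);
  std::vector<std::size_t> Stack;
  for (std::size_t B = 0; B < N; ++B) {
    for (std::size_t S : Succs[B]) {
      assert(S < N && "successor index out of range");
      Preds[S].push_back(B);
    }
    if (Succs[B].empty()) { // terminal block: seed the backward walk
      Visited[B] = true;
      Stack.push_back(B);
    }
  }
  std::size_t NumVisited = Stack.size();
  while (!Stack.empty()) {
    std::size_t B = Stack.back();
    Stack.pop_back();
    for (std::size_t P : Preds[B])
      if (!Visited[P]) {
        Visited[P] = true;
        ++NumVisited;
        Stack.push_back(P);
      }
  }
  // Same shape as the suggested `return F.size() == Visited.size()`.
  return NumVisited == N;
}
```

An infinite loop with no exit, such as a block that only branches to itself, makes the check fail, which is the kind of function where inference would have to fall back to instrumenting every block.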

MaskRay requested changes to this revision. Jun 4 2022, 12:12 PM

This revision now requires changes to proceed. Jun 4 2022, 12:12 PM

MaskRay added inline comments. Jun 4 2022, 12:44 PM

llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp
150	I have other things to do and haven't read very carefully, but the impl here appears to differ from the algorithm description in the summary. Having a SuperReachable Pred does not mean that `PredecessorDependencies[&BB]` should remain empty.

MaskRay added inline comments. Jun 4 2022, 12:50 PM

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
275	Also delete `cl::ZeroOrMore`. I have changed `cl::opt` so that the option can be specified multiple times without an error.

ellis mentioned this in D116180: [InstrProf] Add single byte coverage mode. Jun 6 2022, 11:19 AM

In D124490#3558438, @MaskRay wrote:

If this new instrumentation mode is for coverage (non-optimization), why modify PGOUseFunc? If just to use PGOVerifyBFI, it doesn't justify modifying PGOInstrumentation. It would be cleaner to create a new file based on my current understanding.

Perhaps I should update the description to better explain how this diff is used for optimization. I will update it shortly.

In PGOUseFunc we populate edge weights such that BlockFrequencyInfo reports uncovered blocks as cold and covered blocks as hot. Obviously this will not be as precise as populating real edge weights from block counts, but this is the trade-off for using extremely lightweight instrumented binaries. In the next diff D125743 we can take advantage of these imprecise profiles by using block coverage to guide the machine outliner. We have found this to work quite well at reducing binary size without harming runtime performance.

ellis edited the summary of this revision. Jun 6 2022, 3:28 PM

Make changes based on comments.

I am planning changes for now. We might have a simpler algorithm that uses dominator trees. It also runs in linear time, so it's more efficient than the current one.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
275	Nice 🙂

Harbormaster completed remote builds in B168728: Diff 435403. Jun 8 2022, 8:00 PM

alanphipps added a subscriber: alanphipps. Jul 29 2022, 7:44 PM

Rebase

Fix warning with Optional<BlockCoverageInference>

ellis edited the summary of this revision. Sep 13 2022, 4:10 PM

ellis added a reviewer: yozhu.

Add link to paper

Harbormaster completed remote builds in B186483: Diff 459914. Sep 13 2022, 4:51 PM

jmestre added a subscriber: jmestre. Oct 10 2022, 2:47 AM

jmestre added inline comments.

llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp
150	As long as BB itself is reachable from EntryBlock, PredecessorDependencies should be non-empty whenever HasSuperReachablePred is true. Consider a simple path (i.e., a path with no repeated nodes) from EntryBlock to BB. Let Pred be the second-to-last node in this path. (Pred is well defined because HasSuperReachablePred is true implies BB != EntryBlock, so the path has at least one edge.) Then Pred belongs in ReachableFromEntry, and so it will be added to PredecessorDependencies. If it is desirable that the implementation handle functions with blocks that are not reachable from EntryBlock, then we can filter them out at the start and set the coverage status of these blocks to false right away.
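The rule debated in this exchange can be made concrete with a simplified sketch (names such as `reachableAvoiding` and `findDependencies` are hypothetical, and this is not the code in BlockCoverageInference.cpp). For each non-entry block `BB`, it computes the blocks reachable from the entry and the blocks that can reach a terminal block, both while avoiding `BB`. If no predecessor lies in both sets, `BB`'s coverage is inferred from its predecessors that are reachable from the entry, and symmetrically for successors. Assumptions of the sketch: every block is reachable from the entry and can reach a terminal block, and the entry block is always instrumented so that whether the function ran is recorded; how the patch handles that last case is not discussed in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Adjacency lists: G[B] holds the neighbors of block B. Block 0 is the entry.
using Graph = std::vector<std::vector<std::size_t>>;
using BlockSet = std::set<std::size_t>;

// Blocks reachable from Start by following the edges of G, never visiting Avoid.
BlockSet reachableAvoiding(const Graph &G, std::size_t Start, std::size_t Avoid) {
  BlockSet Seen;
  std::vector<std::size_t> Stack;
  if (Start != Avoid) {
    Seen.insert(Start);
    Stack.push_back(Start);
  }
  while (!Stack.empty()) {
    std::size_t B = Stack.back();
    Stack.pop_back();
    for (std::size_t N : G[B])
      if (N != Avoid && Seen.insert(N).second)
        Stack.push_back(N);
  }
  return Seen;
}

struct Dependencies {
  std::vector<BlockSet> Pred; // BB is covered iff some block in Pred[BB] is
  std::vector<BlockSet> Succ; // BB is covered iff some block in Succ[BB] is
};

Dependencies findDependencies(const Graph &Succs) {
  assert(!Succs.empty() && "a function has at least an entry block");
  const std::size_t N = Succs.size();
  Graph Preds(N);
  std::vector<std::size_t> Terminals;
  for (std::size_t B = 0; B < N; ++B) {
    if (Succs[B].empty())
      Terminals.push_back(B);
    for (std::size_t S : Succs[B])
      Preds[S].push_back(B);
  }
  Dependencies D{std::vector<BlockSet>(N), std::vector<BlockSet>(N)};
  // Start at 1: the entry keeps no dependencies, so it is always instrumented.
  for (std::size_t BB = 1; BB < N; ++BB) {
    BlockSet FromEntry = reachableAvoiding(Succs, 0, BB);
    BlockSet ToTerminal; // blocks that can reach a terminal without BB
    for (std::size_t T : Terminals) {
      BlockSet R = reachableAvoiding(Preds, T, BB);
      ToTerminal.insert(R.begin(), R.end());
    }
    auto IsSuperReachable = [&](std::size_t X) {
      return FromEntry.count(X) && ToTerminal.count(X);
    };
    if (std::none_of(Preds[BB].begin(), Preds[BB].end(), IsSuperReachable))
      for (std::size_t P : Preds[BB])
        if (FromEntry.count(P))
          D.Pred[BB].insert(P);
    if (std::none_of(Succs[BB].begin(), Succs[BB].end(), IsSuperReachable))
      for (std::size_t S : Succs[BB])
        if (ToTerminal.count(S))
          D.Succ[BB].insert(S);
  }
  return D;
}

bool shouldInstrument(const Dependencies &D, std::size_t BB) {
  return D.Pred[BB].empty() && D.Succ[BB].empty();
}
```

On the diamond `0 -> {1, 2} -> 3`, the exit block depends on `{1, 2}` while both arms must be instrumented. Each of the |V| iterations performs graph traversals costing O(|V| + |E|), which is where the quadratic cost discussed in this thread comes from.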

Rebase and add comment discussing the runtime.

Can we get another review of this? CC @MaskRay

llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp
138	I've added a comment discussing the runtime. We do know of a linear time algorithm that would compute the Dependencies maps, but it is significantly more complicated. We thought it would be best to first add this simpler implementation and in the future we can use the linear algorithm if we find that it gives a significant speedup in practice.

Harbormaster completed remote builds in B191404: Diff 466673. Oct 10 2022, 6:47 PM

Friendly ping :)

Rebase.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1656	I opted to use this algorithm for simplicity. Since most CFGs are relatively small, this function doesn't contribute much to build time. If we discover that this causes a slowdown in some real-world scenario, then we can improve it. In fact, we can use info from the block coverage inference algorithm to infer all blocks in one pass. But again, this would add complexity, and I would rather not do this unless we need to.

Harbormaster completed remote builds in B197596: Diff 475237. Nov 14 2022, 12:54 PM

@MaskRay, were you still planning to review this/do you have any unaddressed concerns?

Rebase, use the std::optional() API, and initialize region counters for cover instructions as well as increment instructions.

Harbormaster completed remote builds in B209035: Diff 490914. Jan 20 2023, 1:22 PM

Ping @MaskRay. Have your blocking concerns been addressed?

smeenai added a comment. Feb 24 2023, 1:59 PM

This comment was removed by smeenai.

In D124490#4146219, @smeenai wrote:

Ping @MaskRay. Have your blocking concerns been addressed?

Very sorry for missing these notifications! I'll read this patch soon...

In D124490#4151517, @MaskRay wrote:

Very sorry for missing these notifications! I'll read this patch soon...

Thank you!

In BlockCoverageInference.cpp, the `for (auto &BB : F) { ... getReachableAvoiding` loop is strictly quadratic, and I think that may be problematic.
There are quite a few programs (e.g. Chisel-generated C++) that contain functions with many basic blocks (say, >1000).
Also see D137184: a workaround for some programs with too many critical edges.

Applying a strictly quadratic algorithm is a compile-time pitfall. I understand that it may be fine for some mobile applications.
There are certainly other quadratic algorithms, such as an O((V+E)*c) algorithm in GCC gcov (V: number of vertices; E: number of edges; c: number of elementary circuits)
and an O(V^2) implementation of the Semi-NCA algorithm.
For them, the quadratic time complexity is a theoretical upper bound which is really difficult to reach in practice (if it is reachable at all, considering that most CFGs in practice are reducible).

Note that the optimality in the number of instrumented basic blocks is not required.
It does not necessarily lead to optimal performance since we have discarded execution count information.
Is it possible to use a faster algorithm which may instrument a few more basic blocks (let's arbitrarily say 1.2-approximation is good enough in practice)?

llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp
112	Rename ExitBlocks: BB may end with an unreachable/resume/cleanupret/etc. I think we don't name them exit blocks.
138	We have quadratic time algorithms like `O((V+E)(c+1))` used in gcov.
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1656	I think it's worth fixing it. A flood-fill algorithm is nearly of the same length and avoids the performance pitfall. That style is used much more than the iterative algorithm here.
1718

Use flood-fill algorithm to infer block coverage and rename some compiler-rt tests.

llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp
112	Is `TerminalBlocks` a better name?
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1656	I've changed this to a flood-fill algorithm, but I needed to first compute `InverseDependencies`, which is done in one pass.
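The flood-fill inference described above (compute `InverseDependencies` in one pass, then propagate) can be sketched as follows. Assumption: dependencies are given as a hypothetical `Deps[B]` set meaning "B is covered iff any block in `Deps[B]` is covered", i.e., the union of predecessor and successor dependencies; `inferCoverage` is an illustrative name, not the patch's API. Each block enters the worklist at most once, so propagation is linear in blocks plus dependency edges, unlike a fixed-point iteration that may rescan every block repeatedly.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Deps[B]: B is covered iff some block in Deps[B] is covered.
// Hits[B]: true for instrumented blocks whose coverage byte was set.
// Returns the inferred coverage of every block.
std::vector<bool> inferCoverage(const std::vector<std::set<std::size_t>> &Deps,
                                const std::vector<bool> &Hits) {
  const std::size_t N = Deps.size();
  assert(Hits.size() == N && "one entry per block");
  // One pass to invert the map: Inverse[D] lists the blocks that depend on D.
  std::vector<std::vector<std::size_t>> Inverse(N);
  for (std::size_t B = 0; B < N; ++B)
    for (std::size_t D : Deps[B])
      Inverse[D].push_back(B);
  std::vector<bool> Covered = Hits;
  std::vector<std::size_t> Worklist;
  for (std::size_t B = 0; B < N; ++B)
    if (Covered[B])
      Worklist.push_back(B);
  // Flood fill: a covered block marks every block that depends on it.
  while (!Worklist.empty()) {
    std::size_t B = Worklist.back();
    Worklist.pop_back();
    for (std::size_t Dependent : Inverse[B])
      if (!Covered[Dependent]) {
        Covered[Dependent] = true;
        Worklist.push_back(Dependent);
      }
  }
  return Covered;
}
```

For a straight-line function where only the entry is instrumented, a single hit on the entry propagates coverage to every block.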

ellis added inline comments. Mar 1 2023, 12:43 PM

llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp
138	I haven't forgotten about this, I just wanted to resolve the other comments first.

Harbormaster completed remote builds in B216813: Diff 501634. Mar 1 2023, 1:55 PM

In D124490#4153293, @MaskRay wrote:

In BlockCoverageInference.cpp, the `for (auto &BB : F) { ... getReachableAvoiding` loop is strictly quadratic, and I think that may be problematic.
There are quite a few programs (e.g. Chisel-generated C++) that contain functions with many basic blocks (say, >1000).
Also see D137184: a workaround for some programs with too many critical edges.

Applying a strictly quadratic algorithm is a compile-time pitfall. I understand that it may be fine for some mobile applications.
There are certainly other quadratic algorithms, such as an O((V+E)*c) algorithm in GCC gcov (V: number of vertices; E: number of edges; c: number of elementary circuits)
and an O(V^2) implementation of the Semi-NCA algorithm.
For them, the quadratic time complexity is a theoretical upper bound which is really difficult to reach in practice (if it is reachable at all, considering that most CFGs in practice are reducible).

Note that the optimality in the number of instrumented basic blocks is not required.
It does not necessarily lead to optimal performance since we have discarded execution count information.
Is it possible to use a faster algorithm which may instrument a few more basic blocks (let's arbitrarily say 1.2-approximation is good enough in practice)?

We have analyzed the runtime on several real-world programs that have large functions with many basic blocks. We found that the total runtime of the pgo-instr-gen pass for functions with fewer than 1.5K blocks is less than 5 seconds. For functions with between 4K and 10K basic blocks, the runtime was between 150 and 850 seconds.

Note that the intended use case is to instrument minimal block coverage on binaries interested in minimizing binary size so that we can better control the machine outliner. Those binaries likely won't have functions this large because inlining is likely restricted to reduce code size. For those special cases where functions are large (say, >1.5K blocks), I'd like to bail on this algorithm and instead instrument every block. This will keep the code complexity down while preventing build time regressions for the pathological cases.

Bail if the function has >1.5K blocks.
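A minimal sketch of this bail-out, using a hypothetical `chooseInstrumentedBlocks` helper: functions with more than 1500 blocks skip the quadratic inference and instrument every block, which is always a valid (if larger) coverage instrumentation. Passing the minimal-set computation as a callable keeps it from running at all above the cutoff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cutoff from the comment above: bail on inference for >1.5K blocks.
constexpr std::size_t MaxBlocksForInference = 1500;

// Returns one flag per block saying whether it is instrumented. For small
// functions, ComputeMinimal() supplies the (expensive) minimal set; larger
// functions fall back to instrumenting every block.
template <typename MinimalFn>
std::vector<bool> chooseInstrumentedBlocks(std::size_t NumBlocks,
                                           MinimalFn ComputeMinimal) {
  if (NumBlocks > MaxBlocksForInference)
    return std::vector<bool>(NumBlocks, true);
  std::vector<bool> Instrumented = ComputeMinimal();
  assert(Instrumented.size() == NumBlocks && "one flag per block");
  return Instrumented;
}
```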

Harbormaster completed remote builds in B219528: Diff 505341. Mar 14 2023, 7:29 PM

Hi @MaskRay, do you have any more concerns?

This looks good to me as this change is effective even for large programs.
This good change has been sitting for a while. I think it would be beneficial for the community to have it, so I approve it.
I also suggest following up on any concerns and feedback after it lands.

kyulee accepted this revision. Mar 29 2023, 7:53 AM

In D124490#4230857, @kyulee wrote:

This looks good to me as this change is effective even for large programs.
This good change has been sitting for a while. I think it would be beneficial for the community to have it, so I approve it.
I also suggest following up on any concerns and feedback after it lands.

Just to clarify, as far as I can tell, all other comments have been addressed, and the one remaining concern is around using a faster algorithm. I believe that concern is mitigated by applying the size cutoff. You could argue that having the linear algorithm is cleaner than having a quadratic one with a cutoff, but it sounds like the linear algorithm would be significantly more complex, which is also a trade-off. For the intended application, where being optimal for size is really important and you're unlikely to have a function large enough to hit the cutoff in practice, going with the simpler algorithm seems like a reasonable trade-off.

Rebase

This revision was not accepted when it landed; it landed in state Needs Review. Mar 29 2023, 4:24 PM

This revision was landed with ongoing or failed builds.

Closed by commit rG167e8f8b6b11: [InstrProf] Minimal Block Coverage (authored by ellis).

This revision was automatically updated to reflect the committed changes.

ellis added a commit: rG167e8f8b6b11: [InstrProf] Minimal Block Coverage.

Harbormaster completed remote builds in B222608: Diff 509494. Mar 29 2023, 5:05 PM

GitHub <noreply@github.com> mentioned this in rGb0154c36d638: [InstrProf] Add pgo use block coverage test (#72443). Mon, Nov 20, 7:25 AM

Revision Contents (path: size)

compiler-rt/test/profile/instrprof-block-coverage.c: 47 lines
compiler-rt/test/profile/instrprof-coverage.c: moved to compiler-rt/test/profile/instrprof-entry-coverage.c
llvm/include/llvm/Transforms/Instrumentation/BlockCoverageInference.h: 86 lines
llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp: 368 lines
llvm/lib/Transforms/Instrumentation/CMakeLists.txt: 1 line
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp: 11 lines
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp: 285 lines
llvm/test/Transforms/PGOProfile/coverage.ll: 148 lines

Diff 509495

compiler-rt/test/profile/instrprof-block-coverage.c

This file was added.

// RUN: %clang_pgogen -mllvm -pgo-block-coverage %s -o %t.out
// RUN: env LLVM_PROFILE_FILE=%t1.profraw %run %t.out 1
// RUN: env LLVM_PROFILE_FILE=%t2.profraw %run %t.out 2
// RUN: llvm-profdata merge -o %t.profdata %t1.profraw %t2.profraw
// RUN: %clang_profuse=%t.profdata -mllvm -pgo-verify-bfi -o - -S -emit-llvm %s 2>%t.errs | FileCheck %s --implicit-check-not="!prof"
// RUN: FileCheck %s < %t.errs --allow-empty --check-prefix=CHECK-ERROR

#include <stdlib.h>

// CHECK: @foo({{.*}})
// CHECK-SAME: !prof ![[PROF0:[0-9]+]]
void foo(int a) {
  // CHECK: br i1 %{{.*}}, label %{{.*}}, label %{{.*}}, !prof ![[PROF1:[0-9]+]]
  if (a % 2 == 0) {
    //
  } else {
    //
  }

  // CHECK: br i1 %{{.*}}, label %{{.*}}, label %{{.*}}, !prof ![[PROF1]]
  for (int i = 1; i < a; i++) {
    // CHECK: br i1 %{{.*}}, label %{{.*}}, label %{{.*}}, !prof ![[PROF2:[0-9]+]]
    if (a % 3 == 0) {
      //
    } else {
      // CHECK: br i1 %{{.*}}, label %{{.*}}, label %{{.*}}, !prof ![[PROF2]]
      if (a % 1001 == 0) {
        return;
      }
    }
  }

  return;
}

// CHECK: @main({{.*}})
// CHECK-SAME: !prof ![[PROF0]]
int main(int argc, char *argv[]) {
  foo(atoi(argv[1]));
  return 0;
}

// CHECK-DAG: ![[PROF0]] = !{!"function_entry_count", i64 10000}
// CHECK-DAG: ![[PROF1]] = !{!"branch_weights", i32 1, i32 1}
// CHECK-DAG: ![[PROF2]] = !{!"branch_weights", i32 0, i32 1}

// CHECK-ERROR-NOT: warning: {{.*}}: Found inconsistent block coverage

compiler-rt/test/profile/instrprof-coverage.c

This file was moved to compiler-rt/test/profile/instrprof-entry-coverage.c.

compiler-rt/test/profile/instrprof-entry-coverage.c

This file was moved from compiler-rt/test/profile/instrprof-coverage.c.

The contents of this file were not changed.

llvm/include/llvm/Transforms/Instrumentation/BlockCoverageInference.h

This file was added.

//===-- BlockCoverageInference.h - Minimal Execution Coverage ---*- C++ -*-===//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//===----------------------------------------------------------------------===//
///
/// \file
/// This file finds the minimum set of blocks on a CFG that must be instrumented
/// to infer execution coverage for the whole graph.
///
//===----------------------------------------------------------------------===//

#ifndef LLVM_TRANSFORMS_INSTRUMENTATION_BLOCKCOVERAGEINFERENCE_H
#define LLVM_TRANSFORMS_INSTRUMENTATION_BLOCKCOVERAGEINFERENCE_H

MaskRay (Done): `LLVM_TRANSFORMS_INSTRUMENTATION_BLOCKCOVERAGEINFERENCE_H`

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/Support/raw_ostream.h"

namespace llvm {

class Function;
class BasicBlock;
class DotFuncBCIInfo;

class BlockCoverageInference {
  friend class DotFuncBCIInfo;

public:
  using BlockSet = SmallSetVector<const BasicBlock *, 4>;

  BlockCoverageInference(const Function &F, bool ForceInstrumentEntry);

  /// \return true if \p BB should be instrumented for coverage.
  bool shouldInstrumentBlock(const BasicBlock &BB) const;

  /// \return the set of blocks \p Deps such that \p BB is covered iff any
  /// blocks in \p Deps are covered.
  BlockSet getDependencies(const BasicBlock &BB) const;

  /// \return a hash that depends on the set of instrumented blocks.
  uint64_t getInstrumentedBlocksHash() const;

  /// Dump the inference graph.
  void dump(raw_ostream &OS) const;

  /// View the inferred block coverage as a dot file.
  /// Filled gray blocks are instrumented, red outlined blocks are found to be
  /// covered, red edges show that a block's coverage can be inferred from its
  /// successors, and blue edges show that a block's coverage can be inferred
  /// from its predecessors.
  void viewBlockCoverageGraph(
      const DenseMap<const BasicBlock *, bool> *Coverage = nullptr) const;

private:
  const Function &F;
  bool ForceInstrumentEntry;

  /// Maps blocks to a minimal list of predecessors that can be used to infer
  /// this block's coverage.
  DenseMap<const BasicBlock *, BlockSet> PredecessorDependencies;

  /// Maps blocks to a minimal list of successors that can be used to infer
  /// this block's coverage.
  DenseMap<const BasicBlock *, BlockSet> SuccessorDependencies;

  /// Compute \p PredecessorDependencies and \p SuccessorDependencies.
  void findDependencies();

  /// Find the set of basic blocks that are reachable from \p Start without the
  /// basic block \p Avoid.
  void getReachableAvoiding(const BasicBlock &Start, const BasicBlock &Avoid,
                            bool IsForward, BlockSet &Reachable) const;

  static std::string getBlockNames(ArrayRef<const BasicBlock *> BBs);
  static std::string getBlockNames(BlockSet BBs) {
    return getBlockNames(ArrayRef<const BasicBlock *>(BBs.begin(), BBs.end()));
  }

MaskRay (Done), suggested edit:
   /// basic block \p Avoid.
-  void getReachableAvoiding(const BasicBlock *Start, const BasicBlock *Avoid,
+  void getReachableAvoiding(const BasicBlock &Start, const BasicBlock &Avoid,
                             bool IsForward, BlockSet &Reachable) const;

};

} // end namespace llvm

#endif // LLVM_TRANSFORMS_INSTRUMENTATION_BLOCKCOVERAGEINFERENCE_H
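The contract documented on `getDependencies` (a block is covered iff any block in its dependency set is covered) is enough to recover coverage for every block from the instrumented ones. A minimal sketch, assuming integer block indices in place of `BasicBlock` pointers and a hypothetical `inferCoverage` helper; it only illustrates the contract and is not the profile-use code in `PGOInstrumentation.cpp`. Because an uninstrumented block may depend on another uninstrumented block, the sketch iterates to a fixed point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy input: Deps[I] is empty for instrumented blocks; otherwise
// block I is covered iff any block in Deps[I] is covered. Measured[I] holds
// the profile bit and is only read for instrumented blocks.
std::vector<bool> inferCoverage(const std::vector<std::vector<int>> &Deps,
                                const std::vector<bool> &Measured) {
  const std::size_t N = Deps.size();
  assert(Measured.size() == N);
  std::vector<bool> Covered(N, false);
  // Instrumented blocks take their measured bit directly.
  for (std::size_t I = 0; I < N; ++I)
    if (Deps[I].empty())
      Covered[I] = Measured[I];

  // Propagate "covered" through the dependency sets until nothing changes.
  // Bits only flip from false to true, so at most N passes are needed.
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < N; ++I) {
      if (Covered[I] || Deps[I].empty())
        continue;
      for (int D : Deps[I])
        if (Covered[D]) {
          Covered[I] = true;
          Changed = true;
          break;
        }
    }
  }
  return Covered;
}
```

For a straight-line CFG entry -> X -> exit where only exit is instrumented, the dependency sets chain (entry depends on X, X on exit), so one measured bit decides all three blocks.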

llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp

This file was added.

//===-- BlockCoverageInference.cpp - Minimal Execution Coverage -*- C++ -*-===//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//===----------------------------------------------------------------------===//
// Our algorithm works by first identifying a subset of nodes that must always
// be instrumented. We call these nodes ambiguous because knowing the coverage
// of all remaining nodes is not enough to infer their coverage status.
// In general a node v is ambiguous if there exists two entry-to-terminal paths
// P_1 and P_2 such that:
// 1. v not in P_1 but P_1 visits a predecessor of v, and
// 2. v not in P_2 but P_2 visits a successor of v.
// If a node v is not ambiguous, then if condition 1 fails, we can infer v’s
// coverage from the coverage of its predecessors, or if condition 2 fails, we
// can infer v’s coverage from the coverage of its successors.
// Sadly, there are example CFGs where it is not possible to infer all nodes
// from the ambiguous nodes alone. Our algorithm selects a minimum number of
// extra nodes to add to the ambiguous nodes to form a valid instrumentation S.
// Details on this algorithm can be found in https://arxiv.org/abs/2208.13907
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Instrumentation/BlockCoverageInference.h"
#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/Support/CRC.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/GraphWriter.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"

using namespace llvm;

#define DEBUG_TYPE "pgo-block-coverage"

STATISTIC(NumFunctions, "Number of total functions that BCI has processed");
STATISTIC(NumIneligibleFunctions,
          "Number of functions for which BCI cannot run on");
STATISTIC(NumBlocks, "Number of total basic blocks that BCI has processed");
STATISTIC(NumInstrumentedBlocks,
          "Number of basic blocks instrumented for coverage");

MaskRay (Done): Remove. See https://llvm.org/docs/CodingStandards.html#use-namespace-qualifiers-to-implement-previously-declared-functions

BlockCoverageInference::BlockCoverageInference(const Function &F,
                                               bool ForceInstrumentEntry)

MaskRay (Done): Use a reference for non-null. Delete `assert(F)`.

    : F(F), ForceInstrumentEntry(ForceInstrumentEntry) {
  findDependencies();
  assert(!ForceInstrumentEntry || shouldInstrumentBlock(F.getEntryBlock()));

  ++NumFunctions;
  for (auto &BB : F) {
    ++NumBlocks;

MaskRay (Done): https://llvm.org/docs/CodingStandards.html#prefer-preincrement

    if (shouldInstrumentBlock(BB))
      ++NumInstrumentedBlocks;
  }
}

BlockCoverageInference::BlockSet
BlockCoverageInference::getDependencies(const BasicBlock &BB) const {
  assert(BB.getParent() == &F);
  BlockSet Dependencies;

MaskRay (Done), suggested edit:
 BlockCoverageInference::BlockSet
-BlockCoverageInference::getDependencies(const BasicBlock *BB) const {
+BlockCoverageInference::getDependencies(const BasicBlock &BB) const {
   assert(BB && BB->getParent() == F);

  auto It = PredecessorDependencies.find(&BB);
  if (It != PredecessorDependencies.end())
    Dependencies.set_union(It->second);
  It = SuccessorDependencies.find(&BB);
  if (It != SuccessorDependencies.end())
    Dependencies.set_union(It->second);
  return Dependencies;
}

uint64_t BlockCoverageInference::getInstrumentedBlocksHash() const {
  JamCRC JC;
  uint64_t Index = 0;
  for (auto &BB : F) {
    if (shouldInstrumentBlock(BB)) {
      uint8_t Data[8];
      support::endian::write64le(Data, Index);
      JC.update(Data);
    }
    Index++;
  }
  return JC.getCRC();

}
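`getInstrumentedBlocksHash` feeds the position of every instrumented block, as 8 little-endian bytes, into a `JamCRC`, so the hash changes whenever the set of instrumented blocks changes. A standalone sketch of the same loop follows. It assumes, to the best of our understanding of `llvm/Support/CRC.h`, that JamCRC is CRC-32 (reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF) without the final XOR; `ToyJamCRC` and `instrumentedBlocksHash` are hypothetical names, and the bitwise update trades the table lookup for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320) with initial value
// 0xFFFFFFFF and no final XOR, i.e. the JamCRC variant (assumption).
class ToyJamCRC {
  std::uint32_t CRC = 0xFFFFFFFFu;

public:
  void update(const std::uint8_t *Data, std::size_t Len) {
    for (std::size_t I = 0; I < Len; ++I) {
      CRC ^= Data[I];
      // Shift out one bit at a time, XOR-ing in the polynomial if it was set.
      for (int Bit = 0; Bit < 8; ++Bit)
        CRC = (CRC >> 1) ^ (0xEDB88320u & (0u - (CRC & 1u)));
    }
  }
  std::uint32_t getCRC() const { return CRC; }
};

// Hash the indices of the instrumented blocks, each written as 8 bytes in
// little-endian order, mirroring the loop in getInstrumentedBlocksHash.
std::uint64_t instrumentedBlocksHash(const std::vector<bool> &Instrumented) {
  ToyJamCRC JC;
  std::uint64_t Index = 0;
  for (bool IsInstrumented : Instrumented) {
    if (IsInstrumented) {
      std::uint8_t Data[8];
      for (int B = 0; B < 8; ++B)
        Data[B] = static_cast<std::uint8_t>(Index >> (8 * B));
      JC.update(Data, sizeof(Data));
    }
    ++Index;
  }
  return JC.getCRC();
}
```

Under that assumption, the standard check string "123456789" yields 0x340BC6D9, the bitwise complement of the well-known CRC-32 check value 0xCBF43926, and a function with no instrumented blocks hashes to the untouched initial value 0xFFFFFFFF.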

bool BlockCoverageInference::shouldInstrumentBlock(const BasicBlock &BB) const {
  assert(BB.getParent() == &F);
  auto It = PredecessorDependencies.find(&BB);
  if (It != PredecessorDependencies.end() && It->second.size())
    return false;
  It = SuccessorDependencies.find(&BB);
  if (It != SuccessorDependencies.end() && It->second.size())
    return false;
  return true;
}

void BlockCoverageInference::findDependencies() {
  assert(PredecessorDependencies.empty() && SuccessorDependencies.empty());
  // Empirical analysis shows that this algorithm finishes within 5 seconds for

MaskRay (Done), suggested edit:
   return true;
 }
-bool BlockCoverageInference::canFindMinimalCoverage(const Function *F) const {
+bool BlockCoverageInference::canFindMinimalCoverage(const Function &F) const {
   if (F->hasFnAttribute(Attribute::NoReturn))

  // functions with fewer than 1.5K blocks.
  if (F.hasFnAttribute(Attribute::NoReturn) || F.size() > 1500) {
    ++NumIneligibleFunctions;
    return;
  }

  SmallVector<const BasicBlock *, 4> TerminalBlocks;
  for (auto &BB : F)

MaskRay (Not Done): Rename ExitBlocks: BB may end with an unreachable/resume/cleanupret/etc. I think we don't name them exit blocks.
ellis (Done): Is `TerminalBlocks` a better name?

    if (succ_empty(&BB))
      TerminalBlocks.push_back(&BB);

MaskRay (Done): `return F.size() == Visited.size()`

  // Traverse the CFG backwards from the terminal blocks to make sure every
  // block can reach some terminal block. Otherwise this algorithm will not work
  // and we must fall back to instrumenting every block.
  df_iterator_default_set<const BasicBlock *> Visited;
  for (auto *BB : TerminalBlocks)
    for (auto *N : inverse_depth_first_ext(BB, Visited))

MaskRay (Done), suggested edit:
 void BlockCoverageInference::findDependencies() {
-  assert(!PredecessorDependencies.size() && !SuccessorDependencies.size());
+  assert(PredecessorDependencies.empty() && SuccessorDependencies.empty());
   // Fallback to instrumenting every block for functions that we cannot run this

      (void)N;
  if (F.size() != Visited.size()) {
    ++NumIneligibleFunctions;
    return;
  }

  // The current implementation for computing `PredecessorDependencies` and
  // `SuccessorDependencies` runs in quadratic time with respect to the number
  // of basic blocks. While we do have a more complicated linear time algorithm

MaskRay (Done): BlockList is used only once. Remove `using`.

  // in https://arxiv.org/abs/2208.13907 we do not know if it will give a
  // significant speedup in practice given that most functions tend to be

MaskRay (Done): This loop can be augmented with more info to include `canFindMinimalCoverage`, then just delete `canFindMinimalCoverage`.

  // relatively small in size for intended use cases.
  auto &EntryBlock = F.getEntryBlock();
  for (auto &BB : F) {
    // The set of blocks that are reachable while avoiding BB.
    BlockSet ReachableFromEntry, ReachableFromTerminal;
    getReachableAvoiding(EntryBlock, BB, /*IsForward=*/true,

MaskRay (Not Done): The time complexity is O(|BB|^2) and can be problematic.
ellis (Done): I've added a comment discussing the runtime. We do know of a linear time algorithm that would compute the Dependencies maps, but it is significantly more complicated. We thought it would be best to first add this simpler implementation and in the future we can use the linear algorithm if we find that it gives a significant speedup in practice.
MaskRay (Not Done): We have quadratic time algorithms like `O((V+E)(c+1))` used in gcov.
ellis (Done): I haven't forgotten about this, I just wanted to resolve the other comments first.

                         ReachableFromEntry);
    for (auto *TerminalBlock : TerminalBlocks)
      getReachableAvoiding(*TerminalBlock, BB, /*IsForward=*/false,
                           ReachableFromTerminal);

    auto Preds = predecessors(&BB);
    bool HasSuperReachablePred = llvm::any_of(Preds, [&](auto *Pred) {
      return ReachableFromEntry.count(Pred) &&

MaskRay (Done): `llvm::any_of`

             ReachableFromTerminal.count(Pred);
    });
    if (!HasSuperReachablePred)
      for (auto *Pred : Preds)

MaskRay (Not Done): I have other things to do and haven't read very carefully, but the impl here appears to differ from the algorithm description in the summary.
Having a SuperReachable Pred does not mean that PredecessorDependencies[&BB] should remain empty.

jmestre (Not Done): As long as BB itself is reachable from EntryBlock, PredecessorDependencies should be non-empty whenever HasSuperReachablePred is true. Consider a simple path (i.e., a path with no repeated nodes) from EntryBlock to BB. Let Pred be the second-to-last node in this path. (Pred is well defined because HasSuperReachablePred being true implies BB != EntryBlock, so the path has at least one edge.) Then Pred belongs to ReachableFromEntry, so it will be added to PredecessorDependencies.
If it is desirable for the implementation to handle functions with blocks that are not reachable from EntryBlock, then we can filter them out at the start and set the coverage status of these blocks to false right away.

        if (ReachableFromEntry.count(Pred))
          PredecessorDependencies[&BB].insert(Pred);

    auto Succs = successors(&BB);
    bool HasSuperReachableSucc = llvm::any_of(Succs, [&](auto *Succ) {
      return ReachableFromEntry.count(Succ) &&
             ReachableFromTerminal.count(Succ);
    });
    if (!HasSuperReachableSucc)
      for (auto *Succ : Succs)
        if (ReachableFromTerminal.count(Succ))
          SuccessorDependencies[&BB].insert(Succ);
  }

  if (ForceInstrumentEntry) {
    // Force the entry block to be instrumented by clearing the blocks it can
    // infer coverage from.
    PredecessorDependencies[&EntryBlock].clear();
    SuccessorDependencies[&EntryBlock].clear();
  }

  // Construct a graph where blocks are connected if there is a mutual
  // dependency between them. This graph has a special property that it contains
  // only paths.
  DenseMap<const BasicBlock *, BlockSet> AdjacencyList;
  for (auto &BB : F) {
    for (auto *Succ : successors(&BB)) {
      if (SuccessorDependencies[&BB].count(Succ) &&
          PredecessorDependencies[Succ].count(&BB)) {
        AdjacencyList[&BB].insert(Succ);

spupyrev (Not Done): I have seen in some places assertions of type
  assert(condition && "readable text message");
Is this optional or the accepted style?
ellis (Done): I don't know if this is optional or not, so I'm open to either way.
I don't expect to ever hit this assert unless there is a bug, so a user should not encounter this assert. If the assert is hit, then we will need to look at and understand the source anyway and I'm not sure how much added value we get from a readable message here.

        AdjacencyList[Succ].insert(&BB);
      }
    }
  }

  // Given a path with at least one node, return the next node on the path.
  auto getNextOnPath = [&](BlockSet &Path) -> const BasicBlock * {
    assert(Path.size());
    auto &Neighbors = AdjacencyList[Path.back()];
    if (Path.size() == 1) {
      // This is the first node on the path, return its neighbor.
      assert(Neighbors.size() == 1);
      return Neighbors.front();
    } else if (Neighbors.size() == 2) {
      // This is the middle of the path, find the neighbor that is not on the
      // path already.
      assert(Path.size() >= 2);
      return Path.count(Neighbors[0]) ? Neighbors[1] : Neighbors[0];
    }
    // This is the end of the path.

MaskRay (Done): `return Path.count(Neighbors[0]) ? Neighbors[1] : Neighbors[0];`

    assert(Neighbors.size() == 1);
    return nullptr;
  };

  // Remove all cycles in the inferencing graph.
  for (auto &BB : F) {
    if (AdjacencyList[&BB].size() == 1) {
      // We found the head of some path.
      BlockSet Path;
      Path.insert(&BB);
      while (const BasicBlock *Next = getNextOnPath(Path))
        Path.insert(Next);
      LLVM_DEBUG(dbgs() << "Found path: " << getBlockNames(Path) << "\n");

      // Remove these nodes from the graph so we don't discover this path again.
      for (auto *BB : Path)
        AdjacencyList[BB].clear();

      // Finally, remove the cycles.
      if (PredecessorDependencies[Path.front()].size()) {
        for (auto *BB : Path)
          if (BB != Path.back())

spupyrev (Done): why predecessor? is this for determinism?
ellis (Done): The predecessor was just an arbitrary choice.
On second thought, this logic is not really needed. The original idea was to have every non-instrumented node depend on either some of its predecessors or some of its successors. This aligns well with the theory, but in practice we don't think this is needed. This may result in a node's coverage depending on more nodes than necessary, but this should be fine. I will remove this logic.

            SuccessorDependencies[BB].clear();
      } else {
        for (auto *BB : Path)
          if (BB != Path.front())
            PredecessorDependencies[BB].clear();
      }
    }
  }
  LLVM_DEBUG(dump(dbgs()));

}
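To make the quadratic cost and the path-breaking step concrete, here is a simplified, standalone sketch of what `findDependencies` computes, on a toy CFG with integer block ids; `Adj`, `reachableAvoiding`, and `findInstrumentedBlocks` are hypothetical names. For every block it runs one forward DFS from the entry and backward DFSs from the terminal blocks, each treating that block as already visited (the same trick `getReachableAvoiding` uses with its pre-seeded `Visited` set), which is where the O(V·(V+E)) bound comes from. It then applies the predecessor/successor rules, links mutually dependent blocks, and breaks each resulting path the way the code above does. It assumes the eligibility checks already passed and ignores `ForceInstrumentEntry`.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

using Adj = std::vector<std::vector<int>>;

// Blocks reachable from Start along Edges without passing through Avoid.
// Marking Avoid as visited up front cuts it out of the search.
std::set<int> reachableAvoiding(const Adj &Edges, int Start, int Avoid) {
  std::vector<bool> Visited(Edges.size(), false);
  Visited[Avoid] = true;
  std::set<int> Reached;
  if (Start == Avoid)
    return Reached; // Nothing is reachable when starting at the avoided block.
  std::vector<int> Stack{Start};
  Visited[Start] = true;
  while (!Stack.empty()) {
    int U = Stack.back();
    Stack.pop_back();
    Reached.insert(U);
    for (int V : Edges[U])
      if (!Visited[V]) {
        Visited[V] = true;
        Stack.push_back(V);
      }
  }
  return Reached;
}

// Returns whether each block must be instrumented. Block 0 is the entry.
std::vector<bool> findInstrumentedBlocks(const Adj &Succs) {
  const int N = static_cast<int>(Succs.size());
  Adj Preds(N);
  std::vector<int> Terminals;
  for (int U = 0; U < N; ++U) {
    for (int V : Succs[U])
      Preds[V].push_back(U);
    if (Succs[U].empty())
      Terminals.push_back(U);
  }

  // One forward DFS plus backward DFSs per block: O(V * (V + E)) overall.
  std::vector<std::set<int>> PredDeps(N), SuccDeps(N);
  for (int BB = 0; BB < N; ++BB) {
    std::set<int> FromEntry = reachableAvoiding(Succs, 0, BB);
    std::set<int> FromTerminal;
    for (int T : Terminals) {
      std::set<int> R = reachableAvoiding(Preds, T, BB);
      FromTerminal.insert(R.begin(), R.end());
    }
    // A "super reachable" neighbor lies on an entry-to-terminal path avoiding BB.
    auto Super = [&](int X) {
      return FromEntry.count(X) > 0 && FromTerminal.count(X) > 0;
    };
    if (!std::any_of(Preds[BB].begin(), Preds[BB].end(), Super))
      for (int P : Preds[BB])
        if (FromEntry.count(P))
          PredDeps[BB].insert(P);
    if (!std::any_of(Succs[BB].begin(), Succs[BB].end(), Super))
      for (int S : Succs[BB])
        if (FromTerminal.count(S))
          SuccDeps[BB].insert(S);
  }

  // Link blocks that infer coverage from each other; this graph has only paths.
  std::vector<std::set<int>> Mutual(N);
  for (int U = 0; U < N; ++U)
    for (int V : Succs[U])
      if (SuccDeps[U].count(V) && PredDeps[V].count(U)) {
        Mutual[U].insert(V);
        Mutual[V].insert(U);
      }

  // Walk each path from one end and keep only one direction of inference.
  for (int Head = 0; Head < N; ++Head) {
    if (Mutual[Head].size() != 1)
      continue;
    std::vector<int> Path{Head};
    while (true) {
      int Next = -1;
      for (int V : Mutual[Path.back()])
        if (std::find(Path.begin(), Path.end(), V) == Path.end())
          Next = V;
      if (Next == -1)
        break;
      Path.push_back(Next);
    }
    for (int B : Path)
      Mutual[B].clear();
    bool KeepPreds = !PredDeps[Path.front()].empty();
    for (int B : Path) {
      if (KeepPreds && B != Path.back())
        SuccDeps[B].clear();
      else if (!KeepPreds && B != Path.front())
        PredDeps[B].clear();
    }
  }

  // A block is instrumented iff no dependencies remain.
  std::vector<bool> Instrument(N);
  for (int B = 0; B < N; ++B)
    Instrument[B] = PredDeps[B].empty() && SuccDeps[B].empty();
  return Instrument;
}
```

On a diamond CFG only the two arms end up instrumented; on a straight line entry -> X -> exit, the three blocks form one mutual-dependency path and only the last block stays instrumented, since its coverage implies that of every block before it.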

void BlockCoverageInference::getReachableAvoiding(const BasicBlock &Start,
                                                  const BasicBlock &Avoid,
                                                  bool IsForward,
                                                  BlockSet &Reachable) const {
  df_iterator_default_set<const BasicBlock *> Visited;
  Visited.insert(&Avoid);
  if (IsForward) {
    auto Range = depth_first_ext(&Start, Visited);
    Reachable.insert(Range.begin(), Range.end());
  } else {
    auto Range = inverse_depth_first_ext(&Start, Visited);
    Reachable.insert(Range.begin(), Range.end());
  }

MaskRay (Done): Use `llvm::append_range`

}

namespace llvm {

class DotFuncBCIInfo {
private:
  const BlockCoverageInference *BCI;
  const DenseMap<const BasicBlock *, bool> *Coverage;

public:
  DotFuncBCIInfo(const BlockCoverageInference *BCI,
                 const DenseMap<const BasicBlock *, bool> *Coverage)
      : BCI(BCI), Coverage(Coverage) {}

  const Function &getFunction() { return BCI->F; }

  bool isInstrumented(const BasicBlock *BB) const {
    return BCI->shouldInstrumentBlock(*BB);
  }

  bool isCovered(const BasicBlock *BB) const {
    return Coverage && Coverage->lookup(BB);
  }

  bool isDependent(const BasicBlock *Src, const BasicBlock *Dest) const {
    return BCI->getDependencies(*Src).count(Dest);
  }
};

template <>
struct GraphTraits<DotFuncBCIInfo *> : public GraphTraits<const BasicBlock *> {
  static NodeRef getEntryNode(DotFuncBCIInfo *Info) {
    return &(Info->getFunction().getEntryBlock());
  }

  // nodes_iterator/begin/end - Allow iteration over all nodes in the graph
  using nodes_iterator = pointer_iterator<Function::const_iterator>;

  static nodes_iterator nodes_begin(DotFuncBCIInfo *Info) {
    return nodes_iterator(Info->getFunction().begin());
  }

  static nodes_iterator nodes_end(DotFuncBCIInfo *Info) {
    return nodes_iterator(Info->getFunction().end());
  }

  static size_t size(DotFuncBCIInfo *Info) {
    return Info->getFunction().size();
  }
};

template <>
struct DOTGraphTraits<DotFuncBCIInfo *> : public DefaultDOTGraphTraits {

  DOTGraphTraits(bool IsSimple = false) : DefaultDOTGraphTraits(IsSimple) {}

  static std::string getGraphName(DotFuncBCIInfo *Info) {
    return "BCI CFG for " + Info->getFunction().getName().str();
  }

  std::string getNodeLabel(const BasicBlock *Node, DotFuncBCIInfo *Info) {
    return Node->getName().str();
  }

  std::string getEdgeAttributes(const BasicBlock *Src, const_succ_iterator I,
                                DotFuncBCIInfo *Info) {
    const BasicBlock *Dest = *I;
    if (Info->isDependent(Src, Dest))
      return "color=red";
    if (Info->isDependent(Dest, Src))
      return "color=blue";
    return "";
  }

  std::string getNodeAttributes(const BasicBlock *Node, DotFuncBCIInfo *Info) {
    std::string Result;
    if (Info->isInstrumented(Node))
      Result += "style=filled,fillcolor=gray";
    if (Info->isCovered(Node))
      Result += std::string(Result.empty() ? "" : ",") + "color=red";
    return Result;
  }
};

} // namespace llvm

void BlockCoverageInference::viewBlockCoverageGraph(
    const DenseMap<const BasicBlock *, bool> *Coverage) const {
  DotFuncBCIInfo Info(this, Coverage);
  WriteGraph(&Info, "BCI", false,
             "Block Coverage Inference for " + F.getName());

}
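The color scheme that `DOTGraphTraits<DotFuncBCIInfo *>` hands to `WriteGraph` can be reproduced without LLVM. A hypothetical sketch (`ToyBCIGraph`, `nodeAttributes`, `edgeAttributes`, and `toDot` are illustrative names, and the output is a bare DOT digraph, not what `GraphWriter` actually emits): a block with no dependencies is instrumented and filled gray, a covered block gets a red outline, and an edge is red when its source infers coverage from that successor and blue when its destination infers coverage from that predecessor.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for what DotFuncBCIInfo exposes about one function.
struct ToyBCIGraph {
  std::vector<std::vector<int>> Succs; // CFG edges; block 0 is the entry
  std::vector<std::set<int>> Deps;     // getDependencies() for each block
  std::vector<bool> Covered;           // empty when no profile is given
};

// Gray fill: instrumented (no dependencies). Red outline: covered.
std::string nodeAttributes(const ToyBCIGraph &G, int N) {
  std::string Result;
  if (G.Deps[N].empty())
    Result += "style=filled,fillcolor=gray";
  if (!G.Covered.empty() && G.Covered[N])
    Result += std::string(Result.empty() ? "" : ",") + "color=red";
  return Result;
}

// Red: Src is inferred from its successor Dest.
// Blue: Dest is inferred from its predecessor Src.
std::string edgeAttributes(const ToyBCIGraph &G, int Src, int Dest) {
  if (G.Deps[Src].count(Dest))
    return "color=red";
  if (G.Deps[Dest].count(Src))
    return "color=blue";
  return "";
}

// Emit a minimal DOT digraph using the attributes above.
std::string toDot(const ToyBCIGraph &G) {
  std::ostringstream OS;
  OS << "digraph BCI {\n";
  for (int N = 0; N < static_cast<int>(G.Succs.size()); ++N) {
    OS << "  b" << N << " [" << nodeAttributes(G, N) << "];\n";
    for (int S : G.Succs[N])
      OS << "  b" << N << " -> b" << S << " [" << edgeAttributes(G, N, S)
         << "];\n";
  }
  OS << "}\n";
  return OS.str();
}
```

For a diamond where the entry and exit both depend on the two arms, the entry's out-edges come out red and the exit's in-edges blue, matching the legend in the header's `viewBlockCoverageGraph` comment.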

void BlockCoverageInference::dump(raw_ostream &OS) const {
  OS << "Minimal block coverage for function \'" << F.getName()
     << "\' (Instrumented=*)\n";
  for (auto &BB : F) {
    OS << (shouldInstrumentBlock(BB) ? "* " : " ") << BB.getName() << "\n";
    auto It = PredecessorDependencies.find(&BB);
    if (It != PredecessorDependencies.end() && It->second.size())
      OS << " PredDeps = " << getBlockNames(It->second) << "\n";
    It = SuccessorDependencies.find(&BB);
    if (It != SuccessorDependencies.end() && It->second.size())
      OS << " SuccDeps = " << getBlockNames(It->second) << "\n";
  }
  OS << " Instrumented Blocks Hash = 0x"
     << Twine::utohexstr(getInstrumentedBlocksHash()) << "\n";
}

std::string
BlockCoverageInference::getBlockNames(ArrayRef<const BasicBlock *> BBs) {
  std::string Result;
  raw_string_ostream OS(Result);
  OS << "[";
  if (!BBs.empty()) {
    OS << BBs.front()->getName();
    BBs = BBs.drop_front();
  }
  for (auto *BB : BBs)
    OS << ", " << BB->getName();
  OS << "]";
  return OS.str();

}

llvm/lib/Transforms/Instrumentation/CMakeLists.txt

 add_llvm_component_library(LLVMInstrumentation
   AddressSanitizer.cpp
   BoundsChecking.cpp
   CGProfile.cpp
   ControlHeightReduction.cpp
   DataFlowSanitizer.cpp
   GCOVProfiling.cpp
+  BlockCoverageInference.cpp
   MemProfiler.cpp
   MemorySanitizer.cpp
   IndirectCallPromotion.cpp
   Instrumentation.cpp
   InstrOrderFile.cpp
   InstrProfiling.cpp
   KCFI.cpp
   PGOInstrumentation.cpp
...

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp

... bool InstrProfiling::run(
   // Improve compile time by avoiding linear scans when there is no work.
   if (!ContainsProfiling && !CoverageNamesVar)
     return MadeChange;

   // We did not know how many value sites there would be inside
   // the instrumented function. This is counting the number of instrumented
   // target value sites to enter it as field in the profile data variable.
   for (Function &F : M) {
-    InstrProfIncrementInst *FirstProfIncInst = nullptr;
+    InstrProfInstBase *FirstProfInst = nullptr;
     for (BasicBlock &BB : F)
       for (auto I = BB.begin(), E = BB.end(); I != E; I++)
         if (auto *Ind = dyn_cast<InstrProfValueProfileInst>(I))
           computeNumValueSiteCounts(Ind);
-        else if (FirstProfIncInst == nullptr)
-          FirstProfIncInst = dyn_cast<InstrProfIncrementInst>(I);
+        else if (FirstProfInst == nullptr &&
+                 (isa<InstrProfIncrementInst>(I) || isa<InstrProfCoverInst>(I)))
+          FirstProfInst = dyn_cast<InstrProfInstBase>(I);

     // Value profiling intrinsic lowering requires per-function profile data
     // variable to be created first.
-    if (FirstProfIncInst != nullptr)
-      static_cast<void>(getOrCreateRegionCounters(FirstProfIncInst));
+    if (FirstProfInst != nullptr)
+      static_cast<void>(getOrCreateRegionCounters(FirstProfInst));
   }

   for (Function &F : M)
     MadeChange |= lowerIntrinsics(&F);

   if (CoverageNamesVar) {
     lowerCoverageData(CoverageNamesVar);
     MadeChange = true;
...

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp

Show First 20 Lines • Show All 107 Lines • ▼ Show 20 Lines

#include "llvm/Support/Error.h" #include "llvm/Support/Error.h"

#include "llvm/Support/ErrorHandling.h" #include "llvm/Support/ErrorHandling.h"

#include "llvm/Support/GraphWriter.h" #include "llvm/Support/GraphWriter.h"

#include "llvm/Support/HashBuilder.h" #include "llvm/Support/HashBuilder.h"

#include "llvm/Support/VirtualFileSystem.h" #include "llvm/Support/VirtualFileSystem.h"

#include "llvm/Support/raw_ostream.h" #include "llvm/Support/raw_ostream.h"

#include "llvm/TargetParser/Triple.h" #include "llvm/TargetParser/Triple.h"

#include "llvm/Transforms/Instrumentation.h" #include "llvm/Transforms/Instrumentation.h"

#include "llvm/Transforms/Instrumentation/BlockCoverageInference.h"

#include "llvm/Transforms/Utils/BasicBlockUtils.h" #include "llvm/Transforms/Utils/BasicBlockUtils.h"

#include "llvm/Transforms/Utils/MisExpect.h" #include "llvm/Transforms/Utils/MisExpect.h"

#include "llvm/Transforms/Utils/ModuleUtils.h" #include "llvm/Transforms/Utils/ModuleUtils.h"

#include <algorithm> #include <algorithm>

#include <cassert> #include <cassert>

#include <cstdint> #include <cstdint>

#include <map> #include <map>

#include <memory> #include <memory>

Show All 31 Lines

 STATISTIC(NumOfCSPGOEdge, "Number of edges in CSPGO.");
 STATISTIC(NumOfCSPGOBB, "Number of basic-blocks in CSPGO.");
 STATISTIC(NumOfCSPGOSplit, "Number of critical edge splits in CSPGO.");
 STATISTIC(NumOfCSPGOFunc,
           "Number of functions having valid profile counts in CSPGO.");
 STATISTIC(NumOfCSPGOMismatch,
           "Number of functions having mismatch profile in CSPGO.");
 STATISTIC(NumOfCSPGOMissing, "Number of functions without profile in CSPGO.");
+STATISTIC(NumCoveredBlocks, "Number of basic blocks that were executed");
 // Command line option to specify the file to read profile from. This is
 // mainly used for testing.
 static cl::opt<std::string>
     PGOTestProfileFile("pgo-test-profile-file", cl::init(""), cl::Hidden,
                        cl::value_desc("filename"),
                        cl::desc("Specify the path of profile data file. This is"
                                 "mainly for test purpose."));
@@ 93 unchanged lines hidden @@ static cl::opt<bool> PGOInstrumentEntry(

     "pgo-instrument-entry", cl::init(false), cl::Hidden,
     cl::desc("Force to instrument function entry basicblock."));
 static cl::opt<bool> PGOFunctionEntryCoverage(
     "pgo-function-entry-coverage", cl::Hidden,
     cl::desc(
         "Use this option to enable function entry coverage instrumentation."));
+static cl::opt<bool> PGOBlockCoverage(
+    "pgo-block-coverage",
MaskRay (Done): Delete `, cl::init(false)`.
MaskRay (Done): Also delete `cl::ZeroOrMore`. I have changed `cl::opt` so that the option can be specified multiple times without an error.
ellis (Author, Done): Nice 🙂
+    cl::desc("Use this option to enable basic block coverage instrumentation"));
+static cl::opt<bool>
+    PGOViewBlockCoverageGraph("pgo-view-block-coverage-graph",
+                              cl::desc("Create a dot file of CFGs with block "
+                                       "coverage inference information"));
 static cl::opt<bool>
     PGOFixEntryCount("pgo-fix-entry-count", cl::init(true), cl::Hidden,
                      cl::desc("Fix function entry count in profile use."));
 static cl::opt<bool> PGOVerifyHotBFI(
     "pgo-verify-hot-bfi", cl::init(false), cl::Hidden,
     cl::desc("Print out the non-match BFI count if a hot raw profile count "
              "becomes non-hot, or a cold raw profile count becomes hot. "
@@ 99 unchanged lines hidden @@ if (IsCS)

     ProfileVersion |= VARIANT_MASK_CSIR_PROF;
   if (PGOInstrumentEntry)
     ProfileVersion |= VARIANT_MASK_INSTR_ENTRY;
   if (DebugInfoCorrelate)
     ProfileVersion |= VARIANT_MASK_DBG_CORRELATE;
   if (PGOFunctionEntryCoverage)
     ProfileVersion |=
         VARIANT_MASK_BYTE_COVERAGE | VARIANT_MASK_FUNCTION_ENTRY_ONLY;
+  if (PGOBlockCoverage)
+    ProfileVersion |= VARIANT_MASK_BYTE_COVERAGE;
   auto IRLevelVersionVariable = new GlobalVariable(
       M, IntTy64, true, GlobalValue::WeakAnyLinkage,
       Constant::getIntegerValue(IntTy64, APInt(64, ProfileVersion)), VarName);
   IRLevelVersionVariable->setVisibility(GlobalValue::HiddenVisibility);
   Triple TT(M.getTargetTriple());
   if (TT.supportsCOMDAT()) {
     IRLevelVersionVariable->setLinkage(GlobalValue::ExternalLinkage);
     IRLevelVersionVariable->setComdat(M.getOrInsertComdat(VarName));
@@ 16 unchanged lines hidden @@ struct SelectInstVisitor : public InstVisitor<SelectInstVisitor> {

   Function &F;
   unsigned NSIs = 0; // Number of select instructions instrumented.
   VisitMode Mode = VM_counting; // Visiting mode.
   unsigned *CurCtrIdx = nullptr; // Pointer to current counter index.
   unsigned TotalNumCtrs = 0; // Total number of counters
   GlobalVariable *FuncNameVar = nullptr;
   uint64_t FuncHash = 0;
   PGOUseFunc *UseFunc = nullptr;
+  bool HasSingleByteCoverage;
-  SelectInstVisitor(Function &Func) : F(Func) {}
+  SelectInstVisitor(Function &Func, bool HasSingleByteCoverage)
+      : F(Func), HasSingleByteCoverage(HasSingleByteCoverage) {}
   void countSelects(Function &Func) {
     NSIs = 0;
     Mode = VM_counting;
     visit(Func);
   }
   // Visit the IR stream and instrument all select instructions. \p
@@ 101 unchanged lines hidden @@ public:

   GlobalVariable *FuncNameVar;
   // CFG hash value for this function.
   uint64_t FunctionHash = 0;
   // The Minimum Spanning Tree of function CFG.
   CFGMST<Edge, BBInfo> MST;
+  const std::optional<BlockCoverageInference> BCI;
+  static std::optional<BlockCoverageInference>
+  constructBCI(Function &Func, bool HasSingleByteCoverage,
+               bool InstrumentFuncEntry) {
+    if (HasSingleByteCoverage)
+      return BlockCoverageInference(Func, InstrumentFuncEntry);
+    return {};
+  }
   // Collect all the BBs that will be instrumented, and store them in
   // InstrumentBBs.
   void getInstrumentBBs(std::vector<BasicBlock *> &InstrumentBBs);
   // Given an edge, find the BB that will be instrumented.
   // Return nullptr if there is no BB to be instrumented.
   BasicBlock *getInstrBB(Edge *E);
@@ 9 unchanged lines hidden @@ MST.dumpEdges(dbgs(), Twine("Dump Function ") + FuncName + " Hash: " +

                   Twine(FunctionHash) + "\t" + Str);
   }
   FuncPGOInstrumentation(
       Function &Func, TargetLibraryInfo &TLI,
       std::unordered_multimap<Comdat *, GlobalValue *> &ComdatMembers,
       bool CreateGlobalVar = false, BranchProbabilityInfo *BPI = nullptr,
       BlockFrequencyInfo *BFI = nullptr, bool IsCS = false,
-      bool InstrumentFuncEntry = true)
+      bool InstrumentFuncEntry = true, bool HasSingleByteCoverage = false)
       : F(Func), IsCS(IsCS), ComdatMembers(ComdatMembers), VPC(Func, TLI),
-        TLI(TLI), ValueSites(IPVK_Last + 1), SIVisitor(Func),
-        MST(F, InstrumentFuncEntry, BPI, BFI) {
+        TLI(TLI), ValueSites(IPVK_Last + 1),
+        SIVisitor(Func, HasSingleByteCoverage),
+        MST(F, InstrumentFuncEntry, BPI, BFI),
+        BCI(constructBCI(Func, HasSingleByteCoverage, InstrumentFuncEntry)) {
+    if (BCI && PGOViewBlockCoverageGraph)
+      BCI->viewBlockCoverageGraph();
     // This should be done before CFG hash computation.
     SIVisitor.countSelects(Func);
     ValueSites[IPVK_MemOPSize] = VPC.get(IPVK_MemOPSize);
     if (!IsCS) {
       NumOfPGOSelectInsts += SIVisitor.getNumOfSelectInsts();
       NumOfPGOMemIntrinsics += ValueSites[IPVK_MemOPSize].size();
       NumOfPGOBB += MST.BBInfos.size();
       ValueSites[IPVK_IndirectCallTarget] = VPC.get(IPVK_IndirectCallTarget);
@@ 58 unchanged lines hidden @@ if (PGOOldCFGHashing) {

     auto updateJCH = [&JCH](uint64_t Num) {
       uint8_t Data[8];
       support::endian::write64le(Data, Num);
       JCH.update(Data);
     };
     updateJCH((uint64_t)SIVisitor.getNumOfSelectInsts());
     updateJCH((uint64_t)ValueSites[IPVK_IndirectCallTarget].size());
     updateJCH((uint64_t)ValueSites[IPVK_MemOPSize].size());
-    updateJCH((uint64_t)MST.AllEdges.size());
+    if (BCI) {
+      updateJCH(BCI->getInstrumentedBlocksHash());
+    } else {
+      updateJCH((uint64_t)MST.AllEdges.size());
+    }
     // Hash format for context sensitive profile. Reserve 4 bits for other
     // information.
     FunctionHash = (((uint64_t)JCH.getCRC()) << 28) + JC.getCRC();
   }
   // Reserve bit 60-63 for other information purpose.
   FunctionHash &= 0x0FFFFFFFFFFFFFFF;
@@ 76 unchanged lines hidden @@ void FuncPGOInstrumentation<Edge, BBInfo>::renameComdatFunction() {
 }
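The hash composition in the hunk above (`FunctionHash = (((uint64_t)JCH.getCRC()) << 28) + JC.getCRC();` followed by `FunctionHash &= 0x0FFFFFFFFFFFFFFF;`) can be checked in isolation. In block coverage mode the only difference is what is fed into `JCH`: `BCI->getInstrumentedBlocksHash()` instead of `MST.AllEdges.size()`, so the bit layout is the same in both modes. The following is a minimal sketch, assuming the two `JamCRC` objects are reduced to the plain 32-bit values their `getCRC()` returns; `composeFunctionHash` is a hypothetical helper, not an LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: JCH and JC stand in for the 32-bit values returned by
// JCH.getCRC() and JC.getCRC() in the pass.
uint64_t composeFunctionHash(uint32_t JCH, uint32_t JC) {
  // The upper CRC is shifted left by 28 bits and the lower CRC is added (not
  // OR-ed), so the two fields overlap in bits 28-31 and the sum can carry.
  uint64_t Hash = (static_cast<uint64_t>(JCH) << 28) + JC;
  // Bits 60-63 are reserved; the pass clears them with the same mask.
  Hash &= 0x0FFFFFFFFFFFFFFFULL;
  assert((Hash >> 60) == 0 && "bits 60-63 are reserved");
  return Hash;
}
```

For example, `composeFunctionHash(1, 2)` is `0x10000002`. With both inputs set to `0xFFFFFFFF` the unmasked sum is `0x10000000EFFFFFFF`; the carry into bit 60 is cleared by the mask, leaving `0xEFFFFFFF`.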

 // Collect all the BBs that will be instrumented and return them in
 // InstrumentBBs and setup InEdges/OutEdge for UseBBInfo.
 template <class Edge, class BBInfo>
 void FuncPGOInstrumentation<Edge, BBInfo>::getInstrumentBBs(
     std::vector<BasicBlock *> &InstrumentBBs) {
+  if (BCI) {
+    for (auto &BB : F)
+      if (BCI->shouldInstrumentBlock(BB))
+        InstrumentBBs.push_back(&BB);
+    return;
+  }
   // Use a worklist as we will update the vector during the iteration.
   std::vector<Edge *> EdgeList;
   EdgeList.reserve(MST.AllEdges.size());
   for (auto &E : MST.AllEdges)
     EdgeList.push_back(E.get());
   for (auto &E : EdgeList) {
     BasicBlock *InstrBB = getInstrBB(E);
@@ 106 unchanged lines hidden @@

 // Visit all edges and instrument the edges not in MST, and do value profiling.
 // Critical edges will be split.
 static void instrumentOneFunc(
     Function &F, Module *M, TargetLibraryInfo &TLI, BranchProbabilityInfo *BPI,
     BlockFrequencyInfo *BFI,
     std::unordered_multimap<Comdat *, GlobalValue *> &ComdatMembers,
     bool IsCS) {
-  // Split indirectbr critical edges here before computing the MST rather than
-  // later in getInstrBB() to avoid invalidating it.
-  SplitIndirectBrCriticalEdges(F, /*IgnoreBlocksWithoutPHI=*/false, BPI, BFI);
+  if (!PGOBlockCoverage) {
+    // Split indirectbr critical edges here before computing the MST rather than
+    // later in getInstrBB() to avoid invalidating it.
+    SplitIndirectBrCriticalEdges(F, /*IgnoreBlocksWithoutPHI=*/false, BPI, BFI);
+  }
   FuncPGOInstrumentation<PGOEdge, BBInfo> FuncInfo(
-      F, TLI, ComdatMembers, true, BPI, BFI, IsCS, PGOInstrumentEntry);
+      F, TLI, ComdatMembers, true, BPI, BFI, IsCS, PGOInstrumentEntry,
+      PGOBlockCoverage);
   Type *I8PtrTy = Type::getInt8PtrTy(M->getContext());
   auto Name = ConstantExpr::getBitCast(FuncInfo.FuncNameVar, I8PtrTy);
   auto CFGHash = ConstantInt::get(Type::getInt64Ty(M->getContext()),
                                   FuncInfo.FunctionHash);
   if (PGOFunctionEntryCoverage) {
     auto &EntryBB = F.getEntryBlock();
     IRBuilder<> Builder(&EntryBB, EntryBB.getFirstInsertionPt());
@@ 13 unchanged lines hidden @@ static void instrumentOneFunc(
   uint32_t I = 0;
   for (auto *InstrBB : InstrumentBBs) {
     IRBuilder<> Builder(InstrBB, InstrBB->getFirstInsertionPt());
     assert(Builder.GetInsertPoint() != InstrBB->end() &&
            "Cannot get the Instrumentation point");
     // llvm.instrprof.increment(i8* <name>, i64 <hash>, i32 <num-counters>,
     //                          i32 <index>)
     Builder.CreateCall(
-        Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment),
+        Intrinsic::getDeclaration(M, PGOBlockCoverage
+                                         ? Intrinsic::instrprof_cover
+                                         : Intrinsic::instrprof_increment),
         {Name, CFGHash, Builder.getInt32(NumCounters), Builder.getInt32(I++)});
   }
   // Now instrument select instructions:
   FuncInfo.SIVisitor.instrumentSelects(F, &I, NumCounters, FuncInfo.FuncNameVar,
                                        FuncInfo.FunctionHash);
   assert(I == NumCounters);
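With `-pgo-block-coverage`, `instrumentOneFunc` above emits `llvm.instrprof.cover` instead of `llvm.instrprof.increment`, and `getInstrumentBBs` only returns blocks for which `BCI->shouldInstrumentBlock` is true. The rough model below illustrates why this shrinks the binary: one byte per probe against one 64-bit counter per probe. `CoverageBytes` and `Counters` are hypothetical types, not the compiler-rt runtime, and the sketch assumes a covered block is stored as 1. That matches how `populateCoverage` treats any nonzero profile value as covered, but it may not be the exact byte encoding the runtime uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of coverage mode (llvm.instrprof.cover):
// one byte per instrumented block. Assumption: "covered" is stored as 1.
struct CoverageBytes {
  std::vector<uint8_t> Bytes;
  explicit CoverageBytes(std::size_t N) : Bytes(N, 0) {}
  // A plain store: running the block again leaves the state unchanged.
  void cover(std::size_t Index) {
    assert(Index < Bytes.size());
    Bytes[Index] = 1;
  }
  std::size_t sizeInBytes() const { return Bytes.size() * sizeof(uint8_t); }
};

// Hypothetical model of count mode (llvm.instrprof.increment):
// one 64-bit counter per probe.
struct Counters {
  std::vector<uint64_t> Counts;
  explicit Counters(std::size_t N) : Counts(N, 0) {}
  // A read-modify-write that accumulates execution counts.
  void increment(std::size_t Index) {
    assert(Index < Counts.size());
    ++Counts[Index];
  }
  std::size_t sizeInBytes() const { return Counts.size() * sizeof(uint64_t); }
};
```

If block 3 is recorded twice, the coverage byte stays at 1 while the counter reaches 2. For ten probes, the byte array takes 10 bytes against 80 for the counters. Coverage mode also instruments fewer blocks than there are probes in count mode, because the coverage of some blocks is inferred rather than recorded.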

@@ 126 unchanged lines hidden @@
 namespace {
 class PGOUseFunc {
 public:
   PGOUseFunc(Function &Func, Module *Modu, TargetLibraryInfo &TLI,
              std::unordered_multimap<Comdat *, GlobalValue *> &ComdatMembers,
              BranchProbabilityInfo *BPI, BlockFrequencyInfo *BFIin,
-             ProfileSummaryInfo *PSI, bool IsCS, bool InstrumentFuncEntry)
+             ProfileSummaryInfo *PSI, bool IsCS, bool InstrumentFuncEntry,
+             bool HasSingleByteCoverage)
       : F(Func), M(Modu), BFI(BFIin), PSI(PSI),
         FuncInfo(Func, TLI, ComdatMembers, false, BPI, BFIin, IsCS,
-                 InstrumentFuncEntry),
+                 InstrumentFuncEntry, HasSingleByteCoverage),
         FreqAttr(FFA_Normal), IsCS(IsCS) {}
+  void handleInstrProfError(Error Err, uint64_t MismatchedFuncSum);
   // Read counts for the instrumented BB from profile.
   bool readCounters(IndexedInstrProfReader *PGOReader, bool &AllZeros,
                     InstrProfRecord::CountPseudoKind &PseudoKind);
   // Read memprof data for the instrumented function from profile.
   bool readMemprof(IndexedInstrProfReader *PGOReader);
   // Populate the counts for all BBs.
   void populateCounters();
+  // Set block coverage based on profile coverage values.
+  void populateCoverage(IndexedInstrProfReader *PGOReader);
   // Set the branch weights based on the count values.
   void setBranchWeights();
   // Annotate the value profile call sites for all value kind.
   void annotateValueSites();
   // Annotate the value profile call sites for one value kind.
   void annotateValueSites(uint32_t Kind);

@@ 432 unchanged lines hidden @@ for (auto &I : BB) {
   }
   return true;
 }
-// Read the profile from ProfileFileName and assign the value to the
-// instrumented BB and the edges. This function also updates ProgramMaxCount.
-// Return true if the profile are successfully read, and false on errors.
-bool PGOUseFunc::readCounters(IndexedInstrProfReader *PGOReader, bool &AllZeros,
-                              InstrProfRecord::CountPseudoKind &PseudoKind) {
-  auto &Ctx = M->getContext();
-  uint64_t MismatchedFuncSum = 0;
-  Expected<InstrProfRecord> Result = PGOReader->getInstrProfRecord(
-      FuncInfo.FuncName, FuncInfo.FunctionHash, &MismatchedFuncSum);
-  if (Error E = Result.takeError()) {
-    handleAllErrors(std::move(E), [&](const InstrProfError &IPE) {
+void PGOUseFunc::handleInstrProfError(Error Err, uint64_t MismatchedFuncSum) {
+  handleAllErrors(std::move(Err), [&](const InstrProfError &IPE) {
+    auto &Ctx = M->getContext();
     auto Err = IPE.get();
     bool SkipWarning = false;
     LLVM_DEBUG(dbgs() << "Error in reading profile for Func "
                       << FuncInfo.FuncName << ": ");
     if (Err == instrprof_error::unknown_function) {
       IsCS ? NumOfCSPGOMissing++ : NumOfPGOMissing++;
       SkipWarning = !PGOWarnMissing;
       LLVM_DEBUG(dbgs() << "unknown function");
     } else if (Err == instrprof_error::hash_mismatch ||
                Err == instrprof_error::malformed) {
       IsCS ? NumOfCSPGOMismatch++ : NumOfPGOMismatch++;
       SkipWarning =
           NoPGOWarnMismatch ||
           (NoPGOWarnMismatchComdatWeak &&
            (F.hasComdat() || F.getLinkage() == GlobalValue::WeakAnyLinkage ||
             F.getLinkage() == GlobalValue::AvailableExternallyLinkage));
       LLVM_DEBUG(dbgs() << "hash mismatch (hash= " << FuncInfo.FunctionHash
                         << " skip=" << SkipWarning << ")");
       // Emit function metadata indicating PGO profile mismatch.
       annotateFunctionWithHashMismatch(F, M->getContext());
     }
     LLVM_DEBUG(dbgs() << " IsCS=" << IsCS << "\n");
     if (SkipWarning)
       return;
     std::string Msg =
         IPE.message() + std::string(" ") + F.getName().str() +
         std::string(" Hash = ") + std::to_string(FuncInfo.FunctionHash) +
         std::string(" up to ") + std::to_string(MismatchedFuncSum) +
         std::string(" count discarded");
     Ctx.diagnose(
         DiagnosticInfoPGOProfile(M->getName().data(), Msg, DS_Warning));
   });
+}
+// Read the profile from ProfileFileName and assign the value to the
+// instrumented BB and the edges. This function also updates ProgramMaxCount.
+// Return true if the profile are successfully read, and false on errors.
+bool PGOUseFunc::readCounters(IndexedInstrProfReader *PGOReader, bool &AllZeros,
+                              InstrProfRecord::CountPseudoKind &PseudoKind) {
+  auto &Ctx = M->getContext();
+  uint64_t MismatchedFuncSum = 0;
+  Expected<InstrProfRecord> Result = PGOReader->getInstrProfRecord(
+      FuncInfo.FuncName, FuncInfo.FunctionHash, &MismatchedFuncSum);
+  if (Error E = Result.takeError()) {
+    handleInstrProfError(std::move(E), MismatchedFuncSum);
     return false;
   }
   ProfileRecord = std::move(Result.get());
   PseudoKind = ProfileRecord.getCountPseudoKind();
   if (PseudoKind != InstrProfRecord::NotPseudo) {
     return true;
   }
   std::vector<uint64_t> &CountFromProfile = ProfileRecord.Counts;
@@ 22 unchanged lines hidden @@ Ctx.diagnose(DiagnosticInfoPGOProfile(
         + Twine(": the profile may be stale or there is a function name collision."),
         DS_Warning));
     return false;
   }
   ProgramMaxCount = PGOReader->getMaximumFunctionCount(IsCS);
   return true;
 }

+void PGOUseFunc::populateCoverage(IndexedInstrProfReader *PGOReader) {
+  uint64_t MismatchedFuncSum = 0;
+  Expected<InstrProfRecord> Result = PGOReader->getInstrProfRecord(
+      FuncInfo.FuncName, FuncInfo.FunctionHash, &MismatchedFuncSum);
+  if (auto Err = Result.takeError()) {
+    handleInstrProfError(std::move(Err), MismatchedFuncSum);
+    return;
+  }
+  std::vector<uint64_t> &CountsFromProfile = Result.get().Counts;
+  DenseMap<const BasicBlock *, bool> Coverage;
+  unsigned Index = 0;
+  for (auto &BB : F)
+    if (FuncInfo.BCI->shouldInstrumentBlock(BB))
+      Coverage[&BB] = (CountsFromProfile[Index++] != 0);
+  assert(Index == CountsFromProfile.size());
+  // For each B in InverseDependencies[A], if A is covered then B is covered.
+  DenseMap<const BasicBlock *, DenseSet<const BasicBlock *>>
MaskRay (Not Done): This can use a flood-fill instead of a fixed-point iteration.
ellis (Author, Done): I opted to use this algorithm for simplicity. Since most CFGs are relatively small, this function doesn't contribute much to build time. If we discover that this causes a slowdown in some real-world scenario, then we can improve it. In fact, we can use info from the block coverage inference algorithm to infer all blocks in one pass. But again, this would add complexity, and I would rather not do this unless we need to.
MaskRay (Not Done): I think it's worth fixing. A flood-fill algorithm is nearly the same length and avoids the performance pitfall. That style is used much more often than the iterative algorithm here.
ellis (Author, Done): I've changed this to a flood-fill algorithm, but I first needed to compute InverseDependencies, which is done in one pass.
+      InverseDependencies;
+  for (auto &BB : F) {
+    for (auto *Dep : FuncInfo.BCI->getDependencies(BB)) {
+      // If Dep is covered then BB is covered.
+      InverseDependencies[Dep].insert(&BB);
+    }
+  }
+  // Infer coverage of the non-instrumented blocks using a flood-fill algorithm.
+  std::stack<const BasicBlock *> CoveredBlocksToProcess;
+  for (auto &[BB, IsCovered] : Coverage)
+    if (IsCovered)
+      CoveredBlocksToProcess.push(BB);
+  while (!CoveredBlocksToProcess.empty()) {
+    auto *CoveredBlock = CoveredBlocksToProcess.top();
+    assert(Coverage[CoveredBlock]);
+    CoveredBlocksToProcess.pop();
+    for (auto *BB : InverseDependencies[CoveredBlock]) {
+      // If CoveredBlock is covered then BB is covered.
+      if (Coverage[BB])
+        continue;
+      Coverage[BB] = true;
+      CoveredBlocksToProcess.push(BB);
+    }
+  }
+  // Annotate block coverage.
+  MDBuilder MDB(F.getContext());
+  // We set the entry count to 10000 if the entry block is covered so that BFI
+  // can propagate a fraction of this count to the other covered blocks.
+  F.setEntryCount(Coverage[&F.getEntryBlock()] ? 10000 : 0);
+  for (auto &BB : F) {
+    // For a block A and its successor B, we set the edge weight as follows:
+    // If A is covered and B is covered, set weight=1.
+    // If A is covered and B is uncovered, set weight=0.
+    // If A is uncovered, set weight=1.
+    // This setup will allow BFI to give nonzero profile counts to only covered
+    // blocks.
+    SmallVector<unsigned, 4> Weights;
+    for (auto *Succ : successors(&BB))
+      Weights.push_back((Coverage[Succ] || !Coverage[&BB]) ? 1 : 0);
+    if (Weights.size() >= 2)
+      BB.getTerminator()->setMetadata(LLVMContext::MD_prof,
+                                      MDB.createBranchWeights(Weights));
+  }
+  unsigned NumCorruptCoverage = 0;
+  DominatorTree DT(F);
+  LoopInfo LI(DT);
+  BranchProbabilityInfo BPI(F, LI);
+  BlockFrequencyInfo BFI(F, BPI, LI);
+  auto IsBlockDead = [&](const BasicBlock &BB) -> std::optional<bool> {
+    if (auto C = BFI.getBlockProfileCount(&BB))
+      return C == 0;
+    return {};
+  };
+  LLVM_DEBUG(dbgs() << "Block Coverage: (Instrumented=*, Covered=X)\n");
+  for (auto &BB : F) {
+    LLVM_DEBUG(dbgs() << (FuncInfo.BCI->shouldInstrumentBlock(BB) ? "* " : " ")
+                      << (Coverage[&BB] ? "X " : " ") << " " << BB.getName()
+                      << "\n");
MaskRay (Done): Suggested edit: change `NumCorruptCoverage++;` to `++NumCorruptCoverage;`.
+    // In some cases it is possible to find a covered block that has no covered
+    // successors, e.g., when a block calls a function that may call exit(). In
+    // those cases, BFI could find its successor to be covered while BCI could
+    // find its successor to be dead.
+    if (Coverage[&BB] == IsBlockDead(BB).value_or(false)) {
+      LLVM_DEBUG(
+          dbgs() << "Found inconsistent block coverage for " << BB.getName()
+                 << ": BCI=" << (Coverage[&BB] ? "Covered" : "Dead") << " BFI="
+                 << (IsBlockDead(BB).value() ? "Dead" : "Covered") << "\n");
+      ++NumCorruptCoverage;
+    }
+    if (Coverage[&BB])
+      ++NumCoveredBlocks;
+  }
+  if (PGOVerifyBFI && NumCorruptCoverage) {
+    auto &Ctx = M->getContext();
+    Ctx.diagnose(DiagnosticInfoPGOProfile(
+        M->getName().data(),
+        Twine("Found inconsistent block coverage for function ") + F.getName() +
+            " in " + Twine(NumCorruptCoverage) + " blocks.",
+        DS_Warning));
+  }
+  if (PGOViewBlockCoverageGraph)
+    FuncInfo.BCI->viewBlockCoverageGraph(&Coverage);
+}
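`populateCoverage` does three things. It infers coverage for non-instrumented blocks with a flood fill over the inverted dependency map. It turns coverage into branch weights. It then relies on BFI to spread the 10000 entry count over the covered blocks. The sketch below reproduces those steps on a four-block diamond (0 → {1, 2} → 3). Everything in it is illustrative. Blocks are plain integers, and the dependency sets are written by hand rather than computed by `BlockCoverageInference`. The choice of instrumented blocks (1 and 2) is an assumption. `propagateCounts` is a simplified acyclic stand-in for BFI, not BFI itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <stack>
#include <vector>

using Block = int; // Illustrative stand-in for a basic block.

// Deps[B] lists blocks D such that "D covered => B covered", playing the role
// of BCI->getDependencies(B). Instrumented holds the bits read from the profile.
std::map<Block, bool>
inferCoverage(const std::vector<Block> &Blocks,
              const std::map<Block, std::vector<Block>> &Deps,
              const std::map<Block, bool> &Instrumented) {
  std::map<Block, bool> Coverage(Instrumented);
  for (Block B : Blocks)
    Coverage.emplace(B, false); // Leaves existing entries unchanged.
  // Invert the relation: Inverse[D] holds every block implied by D.
  std::map<Block, std::set<Block>> Inverse;
  for (const auto &[B, Ds] : Deps)
    for (Block D : Ds)
      Inverse[D].insert(B);
  // Flood fill: each block is pushed at most once, instead of re-scanning all
  // blocks until a fixed point is reached.
  std::stack<Block> Work;
  for (const auto &[B, Covered] : Coverage)
    if (Covered)
      Work.push(B);
  while (!Work.empty()) {
    Block Cur = Work.top();
    Work.pop();
    assert(Coverage.at(Cur));
    for (Block B : Inverse[Cur]) {
      if (Coverage[B])
        continue;
      Coverage[B] = true;
      Work.push(B);
    }
  }
  return Coverage;
}

// The patch's rule: weight 0 only on an edge from a covered block into an
// uncovered successor; every other edge gets weight 1.
std::vector<unsigned> edgeWeights(Block A, const std::vector<Block> &Succs,
                                  const std::map<Block, bool> &Coverage) {
  std::vector<unsigned> W;
  for (Block S : Succs)
    W.push_back((Coverage.at(S) || !Coverage.at(A)) ? 1 : 0);
  return W;
}

// Simplified propagation over an acyclic CFG listed in topological order.
// Blocks with fewer than two successors pass their whole count on, mirroring
// the patch, which only attaches weights when there are two or more.
std::map<Block, uint64_t>
propagateCounts(const std::vector<Block> &TopoOrder,
                const std::map<Block, std::vector<Block>> &Succs,
                const std::map<Block, bool> &Coverage, uint64_t EntryCount) {
  assert(!TopoOrder.empty());
  std::map<Block, uint64_t> Count;
  for (Block B : TopoOrder)
    Count[B] = 0;
  Count[TopoOrder.front()] = EntryCount;
  for (Block A : TopoOrder) {
    auto It = Succs.find(A);
    if (It == Succs.end())
      continue; // Exit block.
    const std::vector<Block> &S = It->second;
    if (S.size() < 2) {
      for (Block B : S)
        Count[B] += Count[A];
      continue;
    }
    std::vector<unsigned> W = edgeWeights(A, S, Coverage);
    uint64_t Total = 0;
    for (unsigned X : W)
      Total += X;
    if (Total == 0)
      continue; // Simplification: drop the count if every weight is zero.
    for (std::size_t I = 0; I < S.size(); ++I)
      Count[S[I]] += Count[A] * W[I] / Total;
  }
  return Count;
}
```

With block 1 reported covered and block 2 not, the flood fill marks blocks 0 and 3 as covered and leaves block 2 uncovered. Block 0 gets weights {1, 0}, and the propagated counts are 10000 for blocks 0, 1 and 3 and 0 for block 2. This matches the stated goal that only covered blocks receive nonzero counts.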

 // Populate the counters from instrumented BBs to all BBs.
 // In the end of this operation, all BBs should have a valid count value.
 void PGOUseFunc::populateCounters() {
   bool Changes = true;
   unsigned NumPasses = 0;
   while (Changes) {
     NumPasses++;
     Changes = false;
@@ 143 unchanged lines hidden @@ if (BFI->isIrrLoopHeader(&BB) || isIndirectBrTarget(&BB)) {
       Instruction *TI = BB.getTerminator();
       const UseBBInfo &BBCountInfo = getBBInfo(&BB);
       setIrrLoopHeaderMetadata(M, TI, BBCountInfo.CountValue);
     }
 void SelectInstVisitor::instrumentOneSelectInst(SelectInst &SI) {
-  if (PGOFunctionEntryCoverage)
-    return;
   Module *M = F.getParent();
   IRBuilder<> Builder(&SI);
   Type *Int64Ty = Builder.getInt64Ty();
   Type *I8PtrTy = Builder.getInt8PtrTy();
   auto *Step = Builder.CreateZExt(SI.getCondition(), Int64Ty);
   Builder.CreateCall(
       Intrinsic::getDeclaration(M, Intrinsic::instrprof_increment_step),
       {ConstantExpr::getBitCast(FuncNameVar, I8PtrTy),
@@ 16 unchanged lines hidden @@ void SelectInstVisitor::annotateOneSelectInst(SelectInst &SI) {
   // False Count
   SCounts[1] = (TotalCount > SCounts[0] ? TotalCount - SCounts[0] : 0);
   uint64_t MaxCount = std::max(SCounts[0], SCounts[1]);
   if (MaxCount)
     setProfMetadata(F.getParent(), &SI, SCounts, MaxCount);
 }
 void SelectInstVisitor::visitSelectInst(SelectInst &SI) {
-  if (!PGOInstrSelect)
+  if (!PGOInstrSelect || PGOFunctionEntryCoverage || HasSingleByteCoverage)
     return;
   // FIXME: do not handle this yet.
   if (SI.getCondition()->getType()->isVectorTy())
     return;
   switch (Mode) {
   case VM_counting:
     NSIs++;
@@ 305 unchanged lines hidden @@ if (!PGOReader->hasCSIRLevelProfile() && IsCS)

     return false;
   // TODO: might need to change the warning once the clang option is finalized.
   if (!PGOReader->isIRLevelProfile() && !PGOReader->hasMemoryProfile()) {
     Ctx.diagnose(DiagnosticInfoPGOProfile(
         ProfileFileName.data(), "Not an IR level instrumentation profile"));
     return false;
   }
-  if (PGOReader->hasSingleByteCoverage()) {
-    Ctx.diagnose(DiagnosticInfoPGOProfile(
-        ProfileFileName.data(),
-        "Cannot use coverage profiles for optimization"));
-    return false;
-  }
   if (PGOReader->functionEntryOnly()) {
     Ctx.diagnose(DiagnosticInfoPGOProfile(
         ProfileFileName.data(),
         "Function entry profiles are not yet supported for optimization"));
     return false;
   }
   // Add the profile summary (read from the header of the indexed summary) here
@@ 9 unchanged lines hidden @@ static bool annotateAllFunctions(
   std::vector<Function *> HotFunctions;
   std::vector<Function *> ColdFunctions;
   // If the profile marked as always instrument the entry BB, do the
   // same. Note this can be overwritten by the internal option in CFGMST.h
   bool InstrumentFuncEntry = PGOReader->instrEntryBBEnabled();
   if (PGOInstrumentEntry.getNumOccurrences() > 0)
     InstrumentFuncEntry = PGOInstrumentEntry;
+  bool HasSingleByteCoverage = PGOReader->hasSingleByteCoverage();
   for (auto &F : M) {
     if (skipPGO(F))
       continue;
     auto &TLI = LookupTLI(F);
     auto *BPI = LookupBPI(F);
     auto *BFI = LookupBFI(F);
-    // Split indirectbr critical edges here before computing the MST rather than
-    // later in getInstrBB() to avoid invalidating it.
-    SplitIndirectBrCriticalEdges(F, /*IgnoreBlocksWithoutPHI=*/false, BPI, BFI);
+    if (!HasSingleByteCoverage) {
+      // Split indirectbr critical edges here before computing the MST rather
+      // than later in getInstrBB() to avoid invalidating it.
+      SplitIndirectBrCriticalEdges(F, /*IgnoreBlocksWithoutPHI=*/false, BPI,
+                                   BFI);
+    }
     PGOUseFunc Func(F, &M, TLI, ComdatMembers, BPI, BFI, PSI, IsCS,
-                    InstrumentFuncEntry);
+                    InstrumentFuncEntry, HasSingleByteCoverage);
     // Read and match memprof first since we do this via debug info and can
     // match even if there is an IR mismatch detected for regular PGO below.
     if (PGOReader->hasMemoryProfile())
       Func.readMemprof(PGOReader.get());
     if (!PGOReader->isIRLevelProfile())
       continue;
+    if (HasSingleByteCoverage) {
+      Func.populateCoverage(PGOReader.get());
+      continue;
+    }
     // When PseudoKind is set to a value other than InstrProfRecord::NotPseudo,

// it means the profile for the function is unrepresentative and this // it means the profile for the function is unrepresentative and this

// function is actually hot / warm. We will reset the function hot / cold // function is actually hot / warm. We will reset the function hot / cold

// attribute and drop all the profile counters. // attribute and drop all the profile counters.

InstrProfRecord::CountPseudoKind PseudoKind = InstrProfRecord::NotPseudo; InstrProfRecord::CountPseudoKind PseudoKind = InstrProfRecord::NotPseudo;

bool AllZeros = false; bool AllZeros = false;

if (!Func.readCounters(PGOReader.get(), AllZeros, PseudoKind)) if (!Func.readCounters(PGOReader.get(), AllZeros, PseudoKind))

continue; continue;

▲ Show 20 Lines • Show All 264 Lines • Show Last 20 Lines
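With a single-byte coverage profile, the updated loop in annotateAllFunctions skips critical-edge splitting and hands each function to Func.populateCoverage(...) instead of readCounters. The revision summary describes the goal: give covered functions a large entry count (10000) and derive edge weights from block coverage so that BFI assigns nonzero frequencies only to covered blocks. The block below is a minimal sketch of that idea under stated assumptions, not the patch's populateCoverage: the CFG alias, entryCountFromCoverage, edgeWeightsFromCoverage, and the 1/0 weighting rule are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CFG type for illustration: Succs[B] lists the successor block
// indices of block B, and block 0 is the entry. The patch itself works on
// llvm::Function and llvm::BasicBlock, not on this type.
using CFG = std::vector<std::vector<std::size_t>>;

// Synthetic entry count for a covered function (the value quoted in the
// revision summary).
constexpr std::uint64_t CoveredEntryCount = 10000;

// A covered function (its entry block ran) gets the large synthetic count;
// an uncovered one gets zero.
std::uint64_t entryCountFromCoverage(const std::vector<bool> &Covered) {
  return (!Covered.empty() && Covered[0]) ? CoveredEntryCount : 0;
}

// Assumed weighting rule: an edge weighs 1 if its destination block is
// covered and 0 otherwise, so block frequency can only flow into covered
// blocks. Weights[B][I] corresponds to the edge B -> Succs[B][I].
std::vector<std::vector<std::uint32_t>>
edgeWeightsFromCoverage(const CFG &Succs, const std::vector<bool> &Covered) {
  assert(Succs.size() == Covered.size() && "one coverage bit per block");
  std::vector<std::vector<std::uint32_t>> Weights(Succs.size());
  for (std::size_t B = 0; B < Succs.size(); ++B)
    for (std::size_t S : Succs[B])
      Weights[B].push_back(Covered[S] ? 1u : 0u);
  return Weights;
}
```

For @foo in coverage.ll (entry -> {if.then, if.else} -> if.end), if entry, if.then and if.end ran but if.else did not, the edge entry -> if.then weighs 1 and entry -> if.else weighs 0. Weights leaving an uncovered block do not matter under this rule, because no frequency reaches that block in the first place.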

llvm/test/Transforms/PGOProfile/coverage.ll

-; RUN: opt < %s -passes=pgo-instr-gen -pgo-function-entry-coverage -S | FileCheck %s
+; RUN: opt < %s -passes=pgo-instr-gen -pgo-function-entry-coverage -S | FileCheck %s --implicit-check-not="instrprof.cover" --check-prefixes=CHECK,ENTRY
+; RUN: opt < %s -passes=pgo-instr-gen -pgo-block-coverage -S | FileCheck %s --implicit-check-not="instrprof.cover" --check-prefixes=CHECK,BLOCK
 target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
 
-define i32 @foo(i32 %i) {
+define void @foo() {
+; CHECK-LABEL: entry:
 entry:
-; CHECK: call void @llvm.instrprof.cover({{.*}})
-  %cmp = icmp sgt i32 %i, 0
-  br i1 %cmp, label %if.then, label %if.else
+; ENTRY: call void @llvm.instrprof.cover({{.*}})
+  %c = call i1 @choice()
+  br i1 %c, label %if.then, label %if.else
 
+; CHECK-LABEL: if.then:
 if.then:
-; CHECK-NOT: llvm.instrprof.cover(
-  %add = add nsw i32 %i, 2
-  %s = select i1 %cmp, i32 %add, i32 0
+; BLOCK: call void @llvm.instrprof.cover({{.*}})
   br label %if.end
 
+; CHECK-LABEL: if.else:
 if.else:
-  %sub = sub nsw i32 %i, 2
+; BLOCK: call void @llvm.instrprof.cover({{.*}})
+  br label %if.end
+
+; CHECK-LABEL: if.end:
+if.end:
+  ret void
+}
+
+define void @bar() {
+; CHECK-LABEL: entry:
+entry:
+; ENTRY: call void @llvm.instrprof.cover({{.*}})
+  %c = call i1 @choice()
+  br i1 %c, label %if.then, label %if.end
+
+; CHECK-LABEL: if.then:
+if.then:
+; BLOCK: call void @llvm.instrprof.cover({{.*}})
   br label %if.end
 
+; CHECK-LABEL: if.end:
 if.end:
-  %retv = phi i32 [ %add, %if.then ], [ %sub, %if.else ]
-  ret i32 %retv
+; BLOCK: call void @llvm.instrprof.cover({{.*}})
+  ret void
+}
+
+define void @goo() {
+; CHECK-LABEL: entry:
+entry:
+; CHECK: call void @llvm.instrprof.cover({{.*}})
+  ret void
+}
+
+define void @loop() {
+; CHECK-LABEL: entry:
+entry:
+; CHECK: call void @llvm.instrprof.cover({{.*}})
+  br label %while
+while:
+; BLOCK: call void @llvm.instrprof.cover({{.*}})
+  br label %while
+}
+
+; Function Attrs: noinline nounwind ssp uwtable
+define void @hoo(i32 %a) #0 {
+; CHECK-LABEL: entry:
+entry:
+; ENTRY: call void @llvm.instrprof.cover({{.*}})
+  %a.addr = alloca i32, align 4
+  %i = alloca i32, align 4
+  store i32 %a, i32* %a.addr, align 4
+  %0 = load i32, i32* %a.addr, align 4
+  %rem = srem i32 %0, 2
+  %cmp = icmp eq i32 %rem, 0
+  br i1 %cmp, label %if.then, label %if.else
+
+; CHECK-LABEL: if.then:
+if.then: ; preds = %entry
+; BLOCK: call void @llvm.instrprof.cover({{.*}})
+  br label %if.end
+
+; CHECK-LABEL: if.else:
+if.else: ; preds = %entry
+; BLOCK: call void @llvm.instrprof.cover({{.*}})
+  br label %if.end
+
+; CHECK-LABEL: if.end:
+if.end: ; preds = %if.else, %if.then
+  store i32 1, i32* %i, align 4
+  br label %for.cond
+
+; CHECK-LABEL: for.cond:
+for.cond: ; preds = %for.inc, %if.end
+  %1 = load i32, i32* %i, align 4
+  %2 = load i32, i32* %a.addr, align 4
+  %cmp1 = icmp slt i32 %1, %2
+  br i1 %cmp1, label %for.body, label %for.end
+
+; CHECK-LABEL: for.body:
+for.body: ; preds = %for.cond
+  %3 = load i32, i32* %a.addr, align 4
+  %rem2 = srem i32 %3, 3
+  %cmp3 = icmp eq i32 %rem2, 0
+  br i1 %cmp3, label %if.then4, label %if.else5
+
+; CHECK-LABEL: if.then4:
+if.then4: ; preds = %for.body
+; BLOCK: call void @llvm.instrprof.cover({{.*}})
+  br label %if.end10
+
+; CHECK-LABEL: if.else5:
+if.else5: ; preds = %for.body
+  %4 = load i32, i32* %a.addr, align 4
+  %rem6 = srem i32 %4, 1001
+  %cmp7 = icmp eq i32 %rem6, 0
+  br i1 %cmp7, label %if.then8, label %if.end9
+
+; CHECK-LABEL: if.then8:
+if.then8: ; preds = %if.else5
+; BLOCK: call void @llvm.instrprof.cover({{.*}})
+  br label %return
+
+; CHECK-LABEL: if.end9:
+if.end9: ; preds = %if.else5
+; BLOCK: call void @llvm.instrprof.cover({{.*}})
+  br label %if.end10
+
+; CHECK-LABEL: if.end10:
+if.end10: ; preds = %if.end9, %if.then4
+  br label %for.inc
+
+; CHECK-LABEL: for.inc:
+for.inc: ; preds = %if.end10
+  %5 = load i32, i32* %i, align 4
+  %inc = add nsw i32 %5, 1
+  store i32 %inc, i32* %i, align 4
+  br label %for.cond
+
+; CHECK-LABEL: for.end:
+for.end: ; preds = %for.cond
+; BLOCK: call void @llvm.instrprof.cover({{.*}})
+  br label %return
+
+; CHECK-LABEL: return:
+return: ; preds = %for.end, %if.then8
+  ret void
 }
 
-; CHECK: declare void @llvm.instrprof.cover(
+declare i1 @choice()
+
+; CHECK: declare void @llvm.instrprof.cover({{.*}})
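The BLOCK check lines show that @foo receives llvm.instrprof.cover calls only in if.then and if.else; with --implicit-check-not="instrprof.cover", no call may appear in entry or if.end, because their coverage can be inferred. The sketch below illustrates that kind of inference with two deliberately simple propagation rules over a hypothetical adjacency-list CFG. It is not the algorithm in BlockCoverageInference.cpp, and the CFG alias and inferCoverage are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG type for illustration: Succs[B] lists the successor block
// indices of block B. Block 0 is the entry and, as in LLVM IR, is assumed to
// have no predecessors.
using CFG = std::vector<std::vector<std::size_t>>;

// Propagates coverage from instrumented blocks to the rest of the CFG with two
// simplified rules (NOT the algorithm in BlockCoverageInference.cpp):
//  1. If B is covered and has exactly one predecessor P, then P is covered.
//  2. If B is covered and has exactly one successor S, then S is covered
//     (assumes a block that starts executing always reaches its terminator).
// Iterates until no new block becomes covered.
std::vector<bool> inferCoverage(const CFG &Succs, std::vector<bool> Covered) {
  assert(Succs.size() == Covered.size() && "one coverage bit per block");
  const std::size_t N = Succs.size();
  std::vector<std::vector<std::size_t>> Preds(N);
  for (std::size_t B = 0; B < N; ++B)
    for (std::size_t S : Succs[B])
      Preds[S].push_back(B);

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 0; B < N; ++B) {
      if (!Covered[B])
        continue;
      // Rule 1: the only way into B is through its single predecessor.
      if (Preds[B].size() == 1 && !Covered[Preds[B][0]]) {
        Covered[Preds[B][0]] = true;
        Changed = true;
      }
      // Rule 2: the only way out of B is to its single successor.
      if (Succs[B].size() == 1 && !Covered[Succs[B][0]]) {
        Covered[Succs[B][0]] = true;
        Changed = true;
      }
    }
  }
  return Covered;
}
```

For @foo (entry -> {if.then, if.else} -> if.end), knowing that either if.then or if.else ran is enough to mark entry and if.end as covered. For @bar (entry -> {if.then, if.end}, if.then -> if.end), these two rules cannot derive entry from if.end alone, since if.end has two predecessors, even though entry dominates it; a real inference pass has to reason about the CFG more globally than this.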


Diff 509495

compiler-rt/test/profile/instrprof-block-coverage.c

compiler-rt/test/profile/instrprof-coverage.c

compiler-rt/test/profile/instrprof-entry-coverage.c

llvm/include/llvm/Transforms/Instrumentation/BlockCoverageInference.h

llvm/lib/Transforms/Instrumentation/BlockCoverageInference.cpp

llvm/lib/Transforms/Instrumentation/CMakeLists.txt

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp

llvm/test/Transforms/PGOProfile/coverage.ll
