This is an archive of the discontinued LLVM Phabricator instance.

[InstrProf] Add single byte coverage mode
ClosedPublic

Authored by ellis on Dec 22 2021, 11:26 AM.

Details

Summary

Use the llvm flag -pgo-function-entry-coverage to create single byte "counters" to track function coverage. This mode has significantly less size overhead in both code and data because

  • We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
  • We use a single byte per function rather than 8 bytes per block

The trade-off, of course, is that this mode only tells you whether a function has been covered. This is useful, for example, to detect dead code.

When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).

[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4

Diff Detail

Event Timeline

ellis created this revision.Dec 22 2021, 11:26 AM
ellis published this revision for review.Dec 22 2021, 11:31 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptDec 22 2021, 11:31 AM
Herald added subscribers: llvm-commits, Restricted Project. · View Herald Transcript

I will come back to review this code later once the dependent patches land.
At a high level, this is really function coverage, because it probes the function header, but the flag and terminology intermix it with plain "coverage".
For code (not just function) coverage, I think we might want an option to inject this probe at block edges as well, so that the default code-coverage use case can be fully functional.
Of course, this could be a follow-up, but just a heads-up on how this can be factored into the design.

compiler-rt/test/profile/instrprof-coverage.c
2

Can you add a test case using the debug info correlation mode?

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
479

Hmm. It appears we keep adding extra parameters here.
I wonder whether we can refactor this, given that more features are coming.

ellis added a comment.Dec 29 2021, 7:51 PM

At a high level, this is really function coverage, because it probes the function header, but the flag and terminology intermix it with plain "coverage".

You're right, I should update the flag name to make it more clear that this is only a function entry coverage.

For code (not just function) coverage, I think we might want an option to inject this probe at block edges as well, so that the default code-coverage use case can be fully functional.
Of course, this could be a follow-up, but just a heads-up on how this can be factored into the design.

I've experimented with instrumenting clang with block coverage vs block counts. It turns out that full block coverage actually requires a larger counters section than regular block counts. This is because, for block counts, we can infer some missing block counts by summing up their parents or children. However, we cannot do this with boolean values in the same way, so every basic block must be probed. (There are some cases where you can infer some missing block coverage, but I haven't thought deeply about when you can do this, and I suspect they are rare.) Also, block counts produced slightly less assembly code than block coverage, which made the overall size difference between the two negligible.

Basically, block coverage has about the same size overhead, but gives significantly less information than block counts. We could allow for counters to be 16 or 8 bits to save space in the block counters mode, but that would be left for a later diff.

ellis updated this revision to Diff 396597.Dec 29 2021, 8:07 PM

Rebase and add test to instrprof-debug-info-correlate.c.

kyulee added inline comments.Dec 30 2021, 10:28 AM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
1025

I think we can just delete this condition.
Without DebugInfoCorrelate, Counters are referenced by Data, which is explicitly held below via CompilerUsedVars.push_back(Data);.
With DebugInfoCorrelate, Counters can now be optimized away, since they are local globals with no explicit use.
I think the reason the Coverage case (write only) breaks while the Counter case (read/write) works is simply that global opt is smart enough to optimize the Coverage case but not the Counter case -- in theory there is no explicit use in the chain of references on Counters (other than from compiler-rt/runtime), so the compiler may optimize them away.

ellis added inline comments.Dec 30 2021, 10:53 AM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
1025

Yep, I agree. I have a separate diff D115981 to fix this and will update this code after that lands.

I know that you have posted many patches in this area, and the regular reviewers (e.g. davidxl, vsk) are not involved.
It is the holiday season for many folks, so some may not be active.
For intrusive/significant changes like this, it is worth getting their attention before committing, even after you get approval from your colleague(?).

ellis added a comment.Dec 30 2021, 2:52 PM

I know that you have posted many patches in this area, and the regular reviewers (e.g. davidxl, vsk) are not involved.
It is the holiday season for many folks, so some may not be active.
For intrusive/significant changes like this, it is worth getting their attention before committing, even after you get approval from your colleague(?).

Makes sense to me because I posted this just before the holidays. I can hold off landing this and the previous patch until around the second week of January (I'm taking most of next week off anyway).

I know that you have posted many patches in this area, and the regular reviewers (e.g. davidxl, vsk) are not involved.
It is the holiday season for many folks, so some may not be active.
For intrusive/significant changes like this, it is worth getting their attention before committing, even after you get approval from your colleague(?).

Makes sense to me because I posted this just before the holidays. I can hold off landing this and the previous patch until around the second week of January (I'm taking most of next week off anyway).

Thanks! They do not subscribe to these changes, so they may not even know about them. (Perhaps you mentioned the Differential links in some thread on llvm-dev, but it is pretty easy to miss that piece of information.)
I know that prudently picking reviewers is not easy: you need to do some git log archaeology to find who contributed or reviewed code in the area and is likely to have an opinion.
I remember that they both commented on a previous proposal of yours, so they likely have opinions.

davidxl added inline comments.Jan 4 2022, 2:58 PM
llvm/docs/LangRef.rst
13150

probably name it llvm.instrprof.entry_cover to reflect the semantics. In theory you could have a block-level coverage mode too.

llvm/include/llvm/IR/IntrinsicInst.h
1229–1230

Split out this in different patch?

llvm/include/llvm/ProfileData/InstrProfReader.h
527

entry coverage is compatible with IR PGO -- so perhaps add an assertion somewhere.

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
959

Split out the change into a helper function to make it cleaner.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
258

make the variable name match the option name

llvm/test/Transforms/PGOProfile/coverage.ll
13

Just check that there is one instance of the intrinsic call per function?

llvm/tools/llvm-profdata/llvm-profdata.cpp
2088

Add a test case for the coverage mode.

2154

should probably bypass the rest of the dump in coverage mode.

kyulee added inline comments.Jan 4 2022, 4:45 PM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
964

Can we just use Constant::getAllOnesValue(), like Constant::getNullValue(), with the array type?

ellis updated this revision to Diff 399832.Jan 13 2022, 4:30 PM
ellis marked 8 inline comments as done.

Rebase and split NFC changes in another diff.

ellis added inline comments.
compiler-rt/test/profile/instrprof-coverage.c
2

I've added this to the above test instrprof-debug-info-correlate.c.

llvm/docs/LangRef.rst
13150

I agree. Renaming to llvm.instrprof.entry_cover.

llvm/include/llvm/IR/IntrinsicInst.h
1229–1230

Created a parent diff

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
964

Unfortunately, ConstantArray::get() takes an array of constants, so we cannot give it just a single Constant.

https://llvm.org/doxygen/classllvm_1_1ConstantArray.html#a0900dacdc7ad8e3ea0cc92993c7fd422

Also, getAllOnesValue() does not appear to accept an array type yet.

llvm/test/Transforms/PGOProfile/coverage.ll
13

Yeah, this is just a sanity check that it isn't called in other blocks.

llvm/tools/llvm-profdata/llvm-profdata.cpp
2088

Added show-covered.test.

ellis updated this revision to Diff 400106.Jan 14 2022, 12:24 PM

Fix class name

kyulee added inline comments.Jan 14 2022, 1:02 PM
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
964

FixedVectorType instead of ArrayType seems semantically equivalent, and it appears to work.

ellis updated this revision to Diff 400153.Jan 14 2022, 2:39 PM

Split change into new function

ellis marked an inline comment as done.Jan 14 2022, 2:39 PM
ellis added inline comments.
llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
964

I would like to avoid using different array types for function entry cover and block counters. From the docs (https://llvm.org/docs/LangRef.html#vector-type) it seems that the vector type is intended for SIMD instructions, and we won't use them on these globals.

I think a better solution would be to expand getAllOnesValue() to take an array type, but I haven't looked into whether that is realistic.

ellis edited the summary of this revision. (Show Details)Jan 14 2022, 3:21 PM
ellis added inline comments.Jan 24 2022, 8:13 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
479

I've created D118097 to refactor this.

This change appears to be implementing func-cov mode proposed in https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4. Do you plan on implementing the support for block-cov and func-cnt modes as well? I'm asking because we're interested in the block-cov mode, but also because I'm somewhat concerned about the orthogonality of these modes in the implementation.

For example, this change introduces the VARIANT_MASK_BYTE_ENTRY_COVERAGE flag to track whether the func-cov mode is used. This means that we're going to need two more bits for block-cov and func-cnt (three in total). Wouldn't it make more sense to instead have VARIANT_MASK_BYTE_COVERAGE and VARIANT_MASK_ENTRY_COVERAGE flags which could be combined to represent all four modes, requiring only two bits in total?

ellis added a comment.Jan 25 2022, 9:49 AM

This change appears to be implementing func-cov mode proposed in https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4. Do you plan on implementing the support for block-cov and func-cnt modes as well? I'm asking because we're interested in the block-cov mode, but also because I'm somewhat concerned about the orthogonality of these modes in the implementation.

For example, this change introduces the VARIANT_MASK_BYTE_ENTRY_COVERAGE flag to track whether the func-cov mode is used. This means that we're going to need two more bits for block-cov and func-cnt (three in total). Wouldn't it make more sense to instead have VARIANT_MASK_BYTE_COVERAGE and VARIANT_MASK_ENTRY_COVERAGE flags which could be combined to represent all four modes, requiring only two bits in total?

I was only planning on adding func-cov. A while back I did a small experiment and found very little binary size difference between block-cov and full block-counts. This is because block-counts can instrument a subset of blocks and infer counts from the rest. With block-cov we cannot do this in the same way so we need to instrument every block. There might be a smarter way to infer some block coverage, but I don't know it.

I decided to use just VARIANT_MASK_BYTE_ENTRY_COVERAGE for simplicity, but I'm open to using two flags like you suggested.

Can I ask why you are interested in block-cov rather than using full block counters? If binary size is a concern then I would suggest inferring coverage from 8 bit counters, assuming overflow isn't a problem. If speed is the main concern then maybe block-cov is the right way to go.

This change appears to be implementing func-cov mode proposed in https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4. Do you plan on implementing the support for block-cov and func-cnt modes as well? I'm asking because we're interested in the block-cov mode, but also because I'm somewhat concerned about the orthogonality of these modes in the implementation.

For example, this change introduces the VARIANT_MASK_BYTE_ENTRY_COVERAGE flag to track whether the func-cov mode is used. This means that we're going to need two more bits for block-cov and func-cnt (three in total). Wouldn't it make more sense to instead have VARIANT_MASK_BYTE_COVERAGE and VARIANT_MASK_ENTRY_COVERAGE flags which could be combined to represent all four modes, requiring only two bits in total?

I was only planning on adding func-cov. A while back I did a small experiment and found very little binary size difference between block-cov and full block-counts. This is because block-counts can instrument a subset of blocks and infer counts from the rest. With block-cov we cannot do this in the same way so we need to instrument every block. There might be a smarter way to infer some block coverage, but I don't know it.

I decided to use just VARIANT_MASK_BYTE_ENTRY_COVERAGE for simplicity, but I'm open to using two flags like you suggested.

Can I ask why you are interested in block-cov rather than using full block counters? If binary size is a concern then I would suggest inferring coverage from 8 bit counters, assuming overflow isn't a problem. If speed is the main concern then maybe block-cov is the right way to go.

We're mostly concerned about performance so block-cov is still of interest to us, but this is a useful insight, thanks! If it's not too much of an overhead, I'd prefer using the two flags so we can introduce the block-cov mode later.

ellis added a comment.Jan 25 2022, 4:22 PM

This change appears to be implementing func-cov mode proposed in https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4. Do you plan on implementing the support for block-cov and func-cnt modes as well? I'm asking because we're interested in the block-cov mode, but also because I'm somewhat concerned about the orthogonality of these modes in the implementation.

For example, this change introduces the VARIANT_MASK_BYTE_ENTRY_COVERAGE flag to track whether the func-cov mode is used. This means that we're going to need two more bits for block-cov and func-cnt (three in total). Wouldn't it make more sense to instead have VARIANT_MASK_BYTE_COVERAGE and VARIANT_MASK_ENTRY_COVERAGE flags which could be combined to represent all four modes, requiring only two bits in total?

I was only planning on adding func-cov. A while back I did a small experiment and found very little binary size difference between block-cov and full block-counts. This is because block-counts can instrument a subset of blocks and infer counts from the rest. With block-cov we cannot do this in the same way so we need to instrument every block. There might be a smarter way to infer some block coverage, but I don't know it.

I decided to use just VARIANT_MASK_BYTE_ENTRY_COVERAGE for simplicity, but I'm open to using two flags like you suggested.

Can I ask why you are interested in block-cov rather than using full block counters? If binary size is a concern then I would suggest inferring coverage from 8 bit counters, assuming overflow isn't a problem. If speed is the main concern then maybe block-cov is the right way to go.

We're mostly concerned about performance so block-cov is still of interest to us, but this is a useful insight, thanks! If it's not too much of an overhead, I'd prefer using the two flags so we can introduce the block-cov mode later.

Sounds good! I'll use two flags but leave the other cases unimplemented for now.

ellis updated this revision to Diff 403106.Jan 25 2022, 6:30 PM

Use two flags for single byte coverage and function entry coverage.

ellis added a comment.Jan 27 2022, 9:55 AM

@davidxl @phosek @kyulee Can I get another look at this?

llvm/docs/LangRef.rst
13150

I renamed it to llvm.instrprof.cover because this would be used for block coverage.

llvm/include/llvm/ProfileData/InstrProfReader.h
527

I added an assertion in PGOInstrumentation.cpp.

LGTM. I'm approving this, but suggest waiting for other reviewers' input. Also, I left a few comments about the naming of counters that may be followed up on.

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
696–726

Strictly speaking, it's not a counter for coverage, because a store instead of an add will be used. The same goes for the description of the intrinsic and all the other API names that use "counters".
A "probe" or something similar might be a better name. However, it seems very invasive to change all the counter names, which may confuse the current (main) semantics of edge counters.
I suggest keeping it as is, but I'm also open to other reviewers' opinions.

kyulee accepted this revision.Jan 27 2022, 11:05 AM
This revision is now accepted and ready to land.Jan 27 2022, 11:05 AM
davidxl added inline comments.Jan 27 2022, 11:10 AM
compiler-rt/lib/profile/InstrProfilingMerge.c
161

is there a test case for this?

ellis updated this revision to Diff 403797.Jan 27 2022, 2:14 PM

Rebase on top of D115393

compiler-rt/lib/profile/InstrProfilingMerge.c
161

Added the test instrprof-merge-entry-cover.c and found a bug in __llvm_profile_reset_counters(). Thanks for the reminder!

llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
696–726

Yeah, I would like to use "probe", but "counters" is used all over the place in the source. I can bring this up on https://llvm.discourse.group/ to see if we are willing to make this change.

snehasish added inline comments.Jan 27 2022, 2:41 PM
llvm/include/llvm/ProfileData/InstrProf.h
289

At a glance, this looks similar to the BB enum value. If this is coverage only, can we make it explicit in the name? Perhaps something like FunctionEntryCoverageOnly. IIUC, the comment implies "only instrument the function entry for coverage" as opposed to whether the entry basic block has instrumentation for profiling.

I like the idea of having better naming for the enum values and I'll send out a separate patch renaming the others once this is submitted.

ellis added inline comments.Jan 27 2022, 3:07 PM
llvm/include/llvm/ProfileData/InstrProf.h
289

This is actually not for coverage. FunctionEntryOnly only instruments the entry basic block, whereas BB guarantees that the entry basic block is instrumented, along with other blocks.

snehasish added inline comments.Jan 27 2022, 3:51 PM
llvm/include/llvm/ProfileData/InstrProf.h
289

I see. So should we allow profiles with BB and FunctionEntryOnly to be merged? If not, can you update the logic in mergeProfileKind in InstrProfWriter.h?

ellis updated this revision to Diff 403844.Jan 27 2022, 5:22 PM

Make BB and FunctionEntryOnly incompatible.

ellis updated this revision to Diff 403847.Jan 27 2022, 5:30 PM

Fix build error

This revision was landed with ongoing or failed builds.Jan 27 2022, 5:39 PM
This revision was automatically updated to reflect the committed changes.

LGTM. I'm approving this, but suggest waiting for other reviewers' input. Also, I left a few comments about the naming of counters that may be followed up on.

Thanks for contributing a new coverage mode. I believe it will be useful to some groups.
That said, process-wise, I think committing this just 6 hours after this comment was probably too quick. 6 hours did not give stakeholders enough time to read through a major new feature.
Some stakeholders may have other commitments and so may not respond in a timely manner. But if you gave them more time with the patch in an "approved" state, they would probably want to comment.

I think my main question is why does a coverage mode need to piggyback on PGOInstrument.cpp which is mainly for optimizations.

From the functionality implemented by -pgo-function-entry-coverage, I don't see how it can be used for optimization.

Herald added a project: Restricted Project. · View Herald TranscriptJun 4 2022, 12:16 PM
ellis added a comment.Jun 6 2022, 11:19 AM

LGTM. I'm approving this, but suggest waiting for other reviewers' input. Also, I left a few comments about the naming of counters that may be followed up on.

Thanks for contributing a new coverage mode. I believe it will be useful to some groups.
That said, process-wise, I think committing this just 6 hours after this comment was probably too quick. 6 hours did not give stakeholders enough time to read through a major new feature.
Some stakeholders may have other commitments and so may not respond in a timely manner. But if you gave them more time with the patch in an "approved" state, they would probably want to comment.

I think my main question is why does a coverage mode need to piggyback on PGOInstrument.cpp which is mainly for optimizations.

From the functionality implemented by -pgo-function-entry-coverage, I don't see how it can be used for optimization.

I apologize for moving too quickly to land. In the future, I'll try to allow more time for others to review and respect their time.

As for your question, I've posted this RFC [0] describing our plans for lightweight instrumentation. Basically, we were planning to create a few instrumentation modes that trade off profile precision for a more lightweight instrumented binary: 1) block coverage, 2) function entry count, and 3) function entry coverage. As you've seen, D124490 implements block coverage, and the profiles are useful for optimization in D125743, which uses block liveness to guide the machine outliner.

The function entry coverage implemented in this diff was an important step towards block coverage instrumentation. The two modes share a lot of code, which is a big reason why I added this to PGOInstrumentation.cpp. Unlike block coverage instrumentation, function entry coverage is not precise enough to properly guide optimizations, so its main use is to detect dead code.

[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4