This is an archive of the discontinued LLVM Phabricator instance.

Add support for collating profiles for use with code coverage
ClosedPublic

Authored by vsk on Jun 3 2016, 4:59 PM.

Details

Summary
  • Add a cmake option to build with clang code coverage.
  • Specify a profile data directory along with -fprofile-instr-generate: this lets us gather profiles from instrumented binaries in a dedicated location.
  • Add a utility script that generates code coverage artifacts.

This patch uses the new in-process raw profile merging functionality in compiler-rt. We no longer index raw profiles after every test invocation, so no lit changes are required.

Running ninja check-llvm-unit generates 682M of raw profile data using a merge pool size of 4. In practice, this turns out to be a lot faster (and also much more compact) than doing llvm-profdata merge -sparse after each test run.

Diff Detail

Event Timeline

vsk updated this revision to Diff 59641.Jun 3 2016, 4:59 PM
vsk retitled this revision from to [lit] Add support for PGO profile and code coverage collection.
vsk updated this object.
vsk added reviewers: beanz, cmatthews, ddunbar.
vsk added a subscriber: llvm-commits.
MatzeB added a subscriber: MatzeB.Jun 3 2016, 8:30 PM

This adds a bigger bunch of code to lit, I wonder if it is necessary:

  • I assume the necessity for the logic inside lit is just the fact that the profile data overwrites each other after running a command. Is setting LLVM_PROFILE_FILE with a '%p' placeholder not enough to avoid this? The final profile data merging can always be done outside of llvm-lit afterwards. Or are there other reasons why you need to intercept the opt/llc/etc. calls?
  • The lit tests don't strike me as the most typical llvm usage (they are often designed to test corner cases of the compiler), wouldn't it be better to gather profiles by compiling the llvm test-suite or something similar?
davidxl added a subscriber: davidxl.Jun 3 2016, 9:24 PM

How many profile data files are generated in a full run? It may take huge amount of space.

I have not gotten time to submit my patch for the runtime to support online/in-process merge yet, but will try to get it done soon. Once that is ready, we can reduce the overhead greatly.

vsk added a comment.Jun 3 2016, 9:24 PM

This adds a bigger bunch of code to lit, I wonder if it is necessary:

  • I assume the necessity for the logic inside lit is just the fact that the profile data overwrites each other after running a command. Is setting LLVM_PROFILE_FILE with a '%p' placeholder not enough to avoid this?

Using PID substitution gets close to solving the problem of having overwritten profiles, but not all the way. On 32-bit systems PID wraparound would pose a real problem. In this patch I include the hash of the test command to minimize loss of profiles.

The final profile data merging can always be done outside of llvm-lit afterwards.

On a practical level, I don't think this is possible. Raw profiles are too large. Turning off the cleanup step and running check-llvm produces over half a terrabyte of data. There are a few ways to address this without touching lit:

  1. Reduce the size of raw profiles. This would make the compiler runtime larger and more complex.
  2. Run a monitor process that does the merging/cleanup. I don't think that approach has any advantages compared to modifying lit.

Alternatively, we could do in-place raw profile merging in the compiler runtime. I don't think that's preferable because we'd have to introduce a lot of complexity to compiler-rt (e.g portable mandatory file locking).

Or are there other reasons why you need to intercept the opt/llc/etc. calls?

Intercepting the calls makes it possible to generate 'unique' hashes for the profiles, see my comment above.

  • The lit tests don't strike me as the most typical llvm usage (they are often designed to test corner cases of the compiler), wouldn't it be better to gather profiles by compiling the llvm test-suite or something similar?

It's true that the profiles gathered from check-llvm aren't ideal for PGO. But, they are perfect for measuring code coverage of the llvm codebase. Moreover because lit is portable, other projects can use these changes to gather coverage reports with their tests.

(IIRC the test-suite also uses lit?)

MatzeB added a comment.Jun 3 2016, 9:32 PM
In D20993#449116, @vsk wrote:

This adds a bigger bunch of code to lit, I wonder if it is necessary:

  • I assume the necessity for the logic inside lit is just the fact that the profile data overwrites each other after running a command. Is setting LLVM_PROFILE_FILE with a '%p' placeholder not enough to avoid this?

Using PID substitution gets close to solving the problem of having overwritten profiles, but not all the way. On 32-bit systems PID wraparound would pose a real problem. In this patch I include the hash of the test command to minimize loss of profiles.

This is a really annoying problem to have... Maybe we should find a solution within the profile infrastructure itself (can we add another flag that adds a unique suffix to the filename if it already exists?) so not every user of the profiling infrastructure has to jump through the same hoops (I've done a similar dance in the test-suite profile support).

The final profile data merging can always be done outside of llvm-lit afterwards.

On a practical level, I don't think this is possible. Raw profiles are too large. Turning off the cleanup step and running check-llvm produces over half a terrabyte of data. There are a few ways to address this without touching lit:

  1. Reduce the size of raw profiles. This would make the compiler runtime larger and more complex.
  2. Run a monitor process that does the merging/cleanup. I don't think that approach has any advantages compared to modifying lit.

Alternatively, we could do in-place raw profile merging in the compiler runtime. I don't think that's preferable because we'd have to introduce a lot of complexity to compiler-rt (e.g portable mandatory file locking).

Ah, so that seems harder to solve without an external driver merging the profile data. I'm still not a big fan of injecting this stuff into the core of lit, but we may have no other choice today.

Or are there other reasons why you need to intercept the opt/llc/etc. calls?

Intercepting the calls makes it possible to generate 'unique' hashes for the profiles, see my comment above.

  • The lit tests don't strike me as the most typical llvm usage (they are often designed to test corner cases of the compiler), wouldn't it be better to gather profiles by compiling the llvm test-suite or something similar?

It's true that the profiles gathered from check-llvm aren't ideal for PGO. But, they are perfect for measuring code coverage of the llvm codebase. Moreover because lit is portable, other projects can use these changes to gather coverage reports with their tests.

Oh indeed I did not think about code coverage.

(IIRC the test-suite also uses lit?)

The test-suite uses lit to execute the benchmarks. I already added profiling support for that (see the TEST_SUITE_PROFILE_GENERATE/TEST_SUITE_PROFILE_USE variants). This is however about running the benchmarks. It does not help you when you want to generate profile data from the clang executable that compiled the benchmarks as that is only done by make (or ninja) and not lit.

vsk added a comment.Jun 6 2016, 10:55 AM

Some of the discussion moved onto the mailing list and Phab couldn't pick up the comments.

The TL;DR: The plan is to add in-process raw profile merging support to compiler-rt. In fact, David has already added the basic mandatory file-locking constructs we'll need (see r271864). Apart from bringing general usability improvements, this will let us trim this patch down significantly.

vsk updated this revision to Diff 60287.Jun 9 2016, 5:33 PM
vsk retitled this revision from [lit] Add support for PGO profile and code coverage collection to Add support for collating profiles for use with code coverage.
vsk updated this object.
vsk edited reviewers, added: davidxl, MatzeB; removed: ddunbar, cmatthews.
vsk removed subscribers: davidxl, MatzeB.
  • Reworked the patch to drop all lit changes. See the new title/summary.
vsk added a reviewer: ributzka.Jun 9 2016, 5:42 PM
davidxl edited edge metadata.Jun 9 2016, 10:34 PM

Why choosing default size of 9? Do you have build time comparison numbers i.e., pool size from 1 to 9?

silvas added a subscriber: silvas.Jun 10 2016, 12:23 AM

Nit about the shell script.

utils/prepare-code-coverage-artifact.sh
1

I mentioned this in one of the reviews of some stuff that Chris Bieneman was adding to clang for PGO (http://reviews.llvm.org/D15462), but these kinds of utility scripts are generally much more readable, robust, and portable when written in Python.

For example, the standard library operations all will throw exceptions nicely in cases of errors (which will stop the program dead, give you a complete stacktrace that is e.g. easy to read in a bot log, etc.). Also you can easily say assert <cond>, "reason" for quick sanity checks / "error handling" (again, you get a full stack trace if it fails, so for this kind of script more elaborate error handling isn't really needed).

It's really hard to get that kind of robustness from a shell script. (also it avoids general shell script problems like what happens when paths contain spaces, which lead to tremendous head scratching)

Also, writing it in Python makes it pretty portable for free and is generally more readable.

vsk added a comment.Jun 10 2016, 4:35 PM

Why choosing default size of 9? Do you have build time comparison numbers i.e., pool size from 1 to 9?

I chose 9 because it's the highest option: a larger pool of raw profiles should theoretically lead to less lock contention. Ideally this would be some factor of the number of threads on the system. However, I don't think it's worth optimizing this default.

Also, writing it in Python makes it pretty portable for free and is generally more readable.

Sure, I'll upload a new diff with this change.

vsk updated this revision to Diff 60430.Jun 10 2016, 6:21 PM
vsk updated this object.
vsk edited edge metadata.
  • Pythonify the artifact prep script.
  • Use a default merge pool size of 4 (see the mailing list discussion for some justification for this).
vsk updated this object.Jun 10 2016, 6:25 PM
davidxl added inline comments.Jun 13 2016, 11:30 AM
utils/prepare-code-coverage-artifact.py
20 ↗(On Diff #60430)

is profiles.manifest file needed here? does it simplify the process?

vsk added inline comments.Jun 13 2016, 11:32 AM
utils/prepare-code-coverage-artifact.py
20 ↗(On Diff #60430)

It's a bit of future-proofing. It was needed with the previous patch because of the sheer number of raw profiles. Now, it helps avoid failures during the check_call step due to too many arguments.

davidxl accepted this revision.Jun 13 2016, 11:33 AM
davidxl edited edge metadata.

lgtm

This revision is now accepted and ready to land.Jun 13 2016, 11:33 AM
silvas accepted this revision.Jun 13 2016, 3:37 PM
silvas added a reviewer: silvas.

Thanks for converting the script to python!

One advisory comment on the script (not a big deal), but this LGTM.

utils/prepare-code-coverage-artifact.py
20 ↗(On Diff #60430)

%m only supports up to 9, which should be fine even on windows. It would make things slightly simpler to pass directly on the command line.

vsk added a comment.Jun 13 2016, 3:42 PM

Thanks for the feedback and reviews. I'm planning on committing after double-checking that this doesn't break any of our PGO bots.

utils/prepare-code-coverage-artifact.py
20 ↗(On Diff #60430)

Fair point, though that does get multiplied by the number of distinct binaries invoked.

This revision was automatically updated to reflect the committed changes.