This is an archive of the discontinued LLVM Phabricator instance.

Add docs+a script for building clang/LLVM with PGO
Closed, Public

Authored by george.burgess.iv on Oct 23 2018, 11:58 AM.

Details

Summary

Depending on who you ask, PGO grants a 15%-25% improvement in build times when using clang. Sadly, hooking everything up properly to generate a profile and apply it to clang isn't always straightforward. This script (and the accompanying docs) aims to make the process easier; ideally, it takes only a single invocation of the given script.

In terms of testing, I've got a cronjob on my Debian box that's meant to run this a few times per week, and I tried manually running it on a puny Gentoo box I have (four whole Atom cores!). Nothing obviously broke. ¯\_(ツ)_/¯

Don't really know who to tag for the review of this; IIRC, I chatted with the two of you about it at the dev conf? Other voices appreciated. :)

FWIW, I don't know if we have a Python style guide, so I just shoved this through yapf with all the defaults on. Happy to paint it any color we like, as long as I can do so with a tool.

Finally, though the focus is clang at the moment, the hope is that this is easily applicable to other LLVM-y tools with minimal effort (e.g. lld, opt, ...). Hence, this lives in llvm/utils and tries to be somewhat ambiguous about naming.

Event Timeline

hans added a comment. Oct 25 2018, 7:58 AM

I didn't have time to look at the script yet, but I read through the instructions doc. For most users I think maybe that's also the most important part, because I think many folks build quite differently. I will try to get to reading the script tomorrow.

docs/HowToBuildWithPGO.rst
41

Maybe s/test suites/lit tests/? Because I assume it's not building test-suite?

43

Including solid coverage for emitting all kinds of C++ diagnostics, whereas in many user builds none would be emitted. This is just a nit-pick, I'm not sure if there's a better way, but that might be one downside of using these kinds of tests for training.

69

3 and 4 kind of go together? I guess the outputs have to be merged, but the profile really is built while running the benchmarks, or that's how I think about it.

71

nit: s/with/using/ maybe?

80

Cool! I didn't know about this one.

92

But the text says it's running the test suite instead?

110

Instead of "freshly-optimized", maybe say "PGO-optimized" or something similar, to help distinguish the different kinds of clang getting built.

aganea added a subscriber: aganea. Oct 25 2018, 8:25 AM
george.burgess.iv marked 7 inline comments as done.

Address feedback

Thank you!

> For most users I think maybe that's also the most important part, because I think many folks build quite differently

Agreed. The script is only really made to cover simple builds/cases where users just want a general profile and are happy to pipe that through their own build logic.

If users can't use the script itself, my hope is that --dry-run will serve as a more easily executable form of documentation, and that the script itself ends up being a gentle reminder to update the docs when it breaks. :)
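
To illustrate the --dry-run idea, here is a minimal sketch of a command runner that prints each step and skips execution when asked. It is not the script's actual code; `run_or_print` and its parameters are hypothetical names chosen for this example.

```python
import subprocess


def run_or_print(cmd, cwd, dry_run):
    """Print the command, then run it in cwd unless dry_run is set.

    Hypothetical helper for illustration only. cmd is a list of
    arguments, e.g. ["ninja", "check-llvm"].
    """
    print("[{}] {}".format(cwd, " ".join(cmd)))
    if dry_run:
        # Dry run: the printed line is the documentation; nothing executes.
        return None
    # Real run: return the command's exit code.
    return subprocess.call(cmd, cwd=cwd)
```

With `dry_run=True`, the printed lines form a transcript that a user could adapt to their own build logic.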

docs/HowToBuildWithPGO.rst
43

Not quite; I think I worded the above badly.

We basically do two things to "train" the instrumented clang/llvm:

  • In the instrumented clang/llvm's build directory, run all lit/unit tests
  • In a new build directory, build everything with the instrumented clang/llvm

The hope is that "build everything" will strongly bias hot paths toward the common-ish "your code is pretty OK" paths. The other tests may bias some colder branches the wrong way, but I'd imagine that:

  • That's not really an issue in practice, hence "colder"
  • For cases that aren't e.g. building code for the host arch, we'll still get some coverage for what's hot/not, as long as we have the relevant backends enabled.

Tried to clarify above a bit. Please let me know if it's still unclear.
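
The two training steps above, plus the later profile merge, could be sketched roughly as follows. This only builds the command lines and runs nothing. The function name, the directory parameters, and the exact ninja targets and CMake arguments are assumptions made for illustration; the real script may differ.

```python
import os


def pgo_training_commands(llvm_src, instrumented_dir, training_dir,
                          raw_profile_dir, merged_profile):
    """Return (cwd, argv) pairs describing the training run.

    Hypothetical sketch: all paths are supplied by the caller, and the
    location of the raw profiles (raw_profile_dir) is an assumption.
    """
    clang = os.path.join(instrumented_dir, "bin", "clang")
    return [
        # Step 1: run the lit/unit tests in the instrumented build directory.
        (instrumented_dir, ["ninja", "check-llvm", "check-clang"]),
        # Step 2: configure a new build that uses the instrumented clang...
        (training_dir, ["cmake", "-G", "Ninja", llvm_src,
                        "-DCMAKE_C_COMPILER=" + clang,
                        "-DCMAKE_CXX_COMPILER=" + clang + "++"]),
        # ...and build everything with it.
        (training_dir, ["ninja"]),
        # Finally, merge the raw profiles into one file for the PGO build.
        (training_dir, ["llvm-profdata", "merge",
                        "-output=" + merged_profile, raw_profile_dir]),
    ]
```

Returning plain (cwd, argv) pairs keeps the sketch easy to print, which fits the dry-run use case mentioned earlier.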

69

*shrug*

I'm not strongly opinionated, so SGTM :)

92

Tried to address this above as part of my response to your second comment

hans accepted this revision. Oct 26 2018, 2:23 AM

lgtm

docs/HowToBuildWithPGO.rst
43

Ah, that makes sense. The new text is very clear, thanks.

utils/collect_and_build_with_pgo.py
167

Since you're quoting the command, maybe quote the dir too for consistency, especially in case someone is brave enough to have spaces in it :-)
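
A small sketch of the suggestion, assuming the message is built from a command list and a working directory (the function name and message format here are hypothetical, not taken from the script):

```python
import shlex


def describe_command(cmd, cwd):
    """Format a log line with both the command and the directory shell-quoted."""
    quoted_cmd = " ".join(shlex.quote(arg) for arg in cmd)
    # Quoting cwd as well keeps the line copy-pasteable when the path has spaces.
    return "Running `{}` in {}".format(quoted_cmd, shlex.quote(cwd))
```

For example, `describe_command(["ninja", "check all"], "/tmp/my build")` yields a line in which both `'check all'` and `'/tmp/my build'` are wrapped in single quotes.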

This revision is now accepted and ready to land. Oct 26 2018, 2:23 AM
george.burgess.iv marked an inline comment as done. Oct 26 2018, 1:58 PM

Thanks again!

This revision was automatically updated to reflect the committed changes.

LLVM's CMake has built-in support for PGO; see https://llvm.org/docs/AdvancedBuilds.html#multi-stage-pgo. I haven't looked at the script in detail, but does it function similarly?

llvm/trunk/docs/HowToBuildWithPGO.rst
21 (On Diff #171345)

"the the"

Thanks for pointing that out!

Yeah, someone forwarded that to me off-list yesterday. Apologies for the sorta-duplication here :)

I imagine the cmake support uses the same configuration flags/etc that this script does. When I have some free time, I hope to look more into it and make this script depend on the PGO cmake targets/etc (or, if the differences are tiny, turn this script into a thin wrapper around the cmake bits + something that can provide larger test-cases, since it isn't obvious to me that the cmake bits have preloaded tests beyond "hello world").

In any case, I'll update the docs with whatever I find/do.

I notice that you're using LLVM_BUILD_INSTRUMENTED=IR, which corresponds to -fprofile-generate (IR-level profiling), instead of -fprofile-instr-generate (clang-level profiling). Did you play around with both and observe that IR-level profiling gave you better results?

Btw, I tried this out and got a 20% improvement on a self-host with PGO, which is pretty handy :)

> I notice that you're using LLVM_BUILD_INSTRUMENTED=IR, which corresponds to -fprofile-generate (IR-level profiling), instead of -fprofile-instr-generate (clang-level profiling). Did you play around with both and observe that IR-level profiling gave you better results?

Not initially: IR-level profiling gave me better results for unrelated projects in the past, so I went with it. Comparing the two here is probably a good idea, though.

A quick experiment shows that ninja opt consumes 2.5% fewer user cycles when clang is optimized with an IR-level profile rather than with a frontend one. Building an arbitrary-but-large cpp file (clang/lib/Sema/SemaOverload.cpp) shows a similar 4% win for the IR profile.
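
For reference, the two instrumentation modes compared here map to different clang flags, as described in the question above. A minimal sketch of that mapping follows; the `"ir"`/`"frontend"` labels and the function name are hypothetical and are not CMake option values.

```python
# Flag names are the ones quoted in this thread; the dictionary keys are
# illustrative labels only.
INSTRUMENTATION_FLAGS = {
    "ir": "-fprofile-generate",              # IR-level profiling
    "frontend": "-fprofile-instr-generate",  # clang-level (frontend) profiling
}


def instrumentation_flag(kind):
    """Return the clang flag for the given profiling kind ('ir' or 'frontend')."""
    try:
        return INSTRUMENTATION_FLAGS[kind.lower()]
    except KeyError:
        raise ValueError("unknown instrumentation kind: {!r}".format(kind))
```

Per the comment above, LLVM_BUILD_INSTRUMENTED=IR selects the first of these.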

> Btw, I tried this out and got a 20% improvement on a self-host with PGO, which is pretty handy :)

Woohoo!