This is an archive of the discontinued LLVM Phabricator instance.

[Clang][ThinLTO] Add a cache for compile phase output.
Needs ReviewPublic

Authored by ychen on Oct 23 2019, 12:49 AM.

Details

Summary

Currently the link phase has an object file cache, whereas the compile phase always
performs optimizations (most costly for large source files at O2 or above),
which can waste time optimizing a file that finally hits the object file cache.
For example, on an Intel W-2133 with 64GB memory, compiling X86ISelLowering.cpp with -flto=thin -O3
takes about 40s (about 10s with the caching implemented by this patch).
The patch ensures that bitcode files which hit the LTO cache also skip IR optimizations.

Add a driver/cc1 flag (-fthinlto-cache-dir, default off) to cache the minimized or regular ThinLTO bitcode file.
Caching is only triggered if the input is larger than -fthinlto-cache-min-filesize=; the default minimum is 1024 IR instructions.
Cache pruning (-fthinlto-cache-policy=) shares its implementation with lld's --thinlto-cache-policy.
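The caching scheme the summary describes can be sketched in Python (the names, and the use of a byte-size threshold rather than the patch's IR-instruction count, are illustrative assumptions, not the patch's actual implementation):

```python
import hashlib
import tempfile
from pathlib import Path

def optimize(bitcode: bytes) -> bytes:
    """Stand-in for the expensive IR optimization pipeline."""
    return bytes(reversed(bitcode))

def compile_with_cache(bitcode: bytes, cache_dir: Path, min_size: int = 1024) -> bytes:
    # Small inputs skip caching: the lookup overhead outweighs the savings.
    if len(bitcode) < min_size:
        return optimize(bitcode)
    key = hashlib.sha256(bitcode).hexdigest()
    entry = cache_dir / key
    if entry.exists():
        # Cache hit: reuse the optimized output and skip IR optimization.
        return entry.read_bytes()
    result = optimize(bitcode)
    entry.write_bytes(result)  # populate the cache for future compiles
    return result
```

On a repeated compile of the same input, the second call returns the cached bytes without invoking the optimization step at all, which is where the 40s-to-10s improvement in the summary comes from.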

Event Timeline

ychen created this revision.Oct 23 2019, 12:49 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 23 2019, 12:49 AM

I haven't read through the patch in detail yet, but from the description it sounds like a cache is being added for the compile step outputs, e.g. the bitcode output object of the "clang -flto=thin -c" step? Typically the build system takes care of incremental build handling for these explicit outputs, just as it would for a non-LTO clang compile output. Can you clarify the motivation?

ychen added a comment.Oct 23 2019, 2:07 PM

> I haven't read through the patch in detail yet, but from the description it sounds like a cache is being added for the compile step outputs, e.g. the bitcode output object of the "clang -flto=thin -c" step? Typically the build system takes care of incremental build handling for these explicit outputs, just as it would for a non-LTO clang compile output. Can you clarify the motivation?

Hi Teresa, thanks for the feedback. I think the motivation is to provide an option that reliably caches ThinLTO compile output regardless of the external compiler-cache setup or the platform in use. I'm not sure whether a compiler cache such as ccache can save -fthin-link-bitcode output, or whether there is a ccache equivalent on Windows. Even if such a tool exists, having link-phase caching managed by the toolchain/linker and compile-phase caching by an external tool feels awkward and fragile. This patch mostly shares its implementation with link-phase caching.

Additionally, this has the small benefit of triggering slightly more cache hits, because it hashes the IR instead of the preprocessor output or the literal source file.

Can you clarify what you mean by 'waste time optimizing a file that finally hit the object file cache'?

No matter what build system is used, it should figure out during an incremental build that the input hasn't changed and not rerun the clang invocation on the same input. If you are looking to achieve what ccache does, I don't think implementing it in the compiler is a good option; that should really be done at the build system level. This solution is strictly worse than ccache, because ccache can fetch the optimized bitcode without even running the clang frontend and IR generation.

I was just typing up a response similar to @steven_wu 's response... I don't think clang should be in the business of caching the outputs of a prior clang invocation, the build system should and usually does avoid re-executing if the inputs have not changed. Note that this is very different from the caching of objects LTO is doing - because those are not otherwise emitted at all, the cache has to be built in.

> I was just typing up a response similar to @steven_wu 's response... I don't think clang should be in the business of caching the outputs of a prior clang invocation, the build system should and usually does avoid re-executing if the inputs have not changed. Note that this is very different from the caching of objects LTO is doing - because those are not otherwise emitted at all, the cache has to be built in.

And also because even if some of the inputs of the link change, not all backend threads necessarily need to be re-executed and not all native (intermediate) objects will change.

ychen added a comment.Oct 23 2019, 9:17 PM

Thanks for the input @steven_wu @tejohnson. I totally agree with the points you brought up. One last thing I'm not quite sure about is the caching of -fthin-link-bitcode. It is a -cc1 option since it is part of the ThinLTO implementation, right? I'm a little hesitant to start writing patches to teach build systems/caching tools (I can think of at least three for our workloads) to recognize this option for that reason: if the option ever changes, the same work needs to be done again. Do you have any thoughts on that? Is the option in use for your workloads, and do you think it is stable enough for build systems to cache its output? (Another option is to produce the -fthin-link-bitcode output after compile time, which I assume would impact total build time to some degree.)

> Thanks for the input @steven_wu @tejohnson. I totally agree with the points you brought up. One last thing I'm not quite sure about is the caching of -fthin-link-bitcode. It is a -cc1 option since it is part of the ThinLTO implementation, right? I'm a little hesitant to start writing patches to teach build systems/caching tools (I can think of at least three for our workloads) to recognize this option for that reason: if the option ever changes, the same work needs to be done again. Do you have any thoughts on that? Is the option in use for your workloads, and do you think it is stable enough for build systems to cache its output? (Another option is to produce the -fthin-link-bitcode output after compile time, which I assume would impact total build time to some degree.)

The -fthin-link-bitcode option is used to run a distributed thin link. The format is not stable, but it is deterministic for a fixed compiler version. You should be able to cache the thin-link bitcode and expect it to be used only by the same compiler version.

For any build system that implements caching, it must take the compiler version into consideration, because different compilers produce different output. I don't think the rule for caching thin-link bitcode is any different from that for any other output during the build.
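The point above about keying the cache on the compiler version can be sketched as follows (a hypothetical helper for illustration, not part of the patch; it also folds in the command-line flags, which any real cache key would need):

```python
import hashlib

def cache_key(compiler_version: str, flags: list, input_bytes: bytes) -> str:
    """Derive a cache key that changes whenever the compiler version,
    the flags, or the input content changes."""
    h = hashlib.sha256()
    for part in [compiler_version] + list(flags):
        h.update(part.encode())
        h.update(b"\x00")  # separator so ("-O", "2") hashes differently from ("-O2",)
    h.update(input_bytes)
    return h.hexdigest()
```

With a key like this, the same input compiled by a different compiler version (or with different flags) simply misses the cache, so the stale entry is never reused.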

> Thanks for the input @steven_wu @tejohnson. I totally agree with the points you brought up. One last thing I'm not quite sure about is the caching of -fthin-link-bitcode. It is a -cc1 option since it is part of the ThinLTO implementation, right? I'm a little hesitant to start writing patches to teach build systems/caching tools (I can think of at least three for our workloads) to recognize this option for that reason: if the option ever changes, the same work needs to be done again. Do you have any thoughts on that? Is the option in use for your workloads, and do you think it is stable enough for build systems to cache its output? (Another option is to produce the -fthin-link-bitcode output after compile time, which I assume would impact total build time to some degree.)

> The -fthin-link-bitcode option is used to run a distributed thin link. The format is not stable, but it is deterministic for a fixed compiler version. You should be able to cache the thin-link bitcode and expect it to be used only by the same compiler version.

> For any build system that implements caching, it must take the compiler version into consideration, because different compilers produce different output. I don't think the rule for caching thin-link bitcode is any different from that for any other output during the build.

I tried ccache, it does not cache -fthin-link-bitcode output.

From this link, it seems ccache only cares about the "-o" output.
https://github.com/ccache/ccache/blob/ac9911b47b8a3777d20a3c481f90f877e8f9a81d/src/ccache.cpp#L2616

ychen added a comment.Oct 24 2019, 1:03 AM

Sorry for the confusion @steven_wu. By stable I mean the probability that the -fthin-link-bitcode option will be replaced with some other thin-link mechanism in a distributed build environment. At least for ccache, compilation output caching depends on the semantics of the options ("-o" is assumed to name the compilation output). In the case of -fthin-link-bitcode, both the "-Xclang -fthin-link-bitcode" output and the "-o" output are outputs. I'm not very familiar with compiler caching tools, but having caching depend on a -cc1 option feels wrong, since it is not an option of any other compiler, so most caching tools won't recognize it.

We rely on the minimized bitcode from that option in our builds, so it won't be going away. We could add it as a driver option, but it doesn't sound like that would solve the ccache issue. I'm not very familiar with that cache support, or whether there is a way to express other output files (in our build system we identify all the outputs of a build action). At least with something like make or ninja, the compile action should not be run again if none of the inputs changed.

ccache does not have support for this; I am just saying that it could easily be implemented in ccache, and that would be much better than the solution proposed here.

If we need to add a clang driver flag so that build systems can detect thin bitcode as an output and cache it, we should just add an official driver flag.

ychen added a comment.Oct 24 2019, 2:19 PM

Thank you @steven_wu @tejohnson. I created D69406 to promote the flag to the driver.