Page MenuHomePhabricator

Add option to instantiate templates already in the PCH
AcceptedPublic

Authored by llunak on Oct 29 2019, 1:58 PM.

Details

Summary

Add -fpch-instantiate-templates which makes template instantiations be performed already in the PCH instead of it being done in every single file that uses the PCH (but every single file will still do it as well in order to handle its own instantiations). I can see 20-30% build time saved with the few tests I've tried.

The change may reorder compiler output and also generated code, but should be generally safe and produce functionally identical code. There are some rare cases that do not compile with it, such as test/PCH/pch-instantiate-templates-forward-decl.cpp. If template instantiation bailed out instead of reporting the error, these instantiations could even be postponed, which would make them work.

Enable this by default for clang-cl. MSVC creates PCHs by compiling them using an empty .cpp file, which means templates are instantiated while building the PCH and so the .h needs to be self-contained, making test/PCH/pch-instantiate-templates-forward-decl.cpp to fail with MSVC anyway. So the option being enables for clang-cl matches this.

Diff Detail

Repository
rC Clang

Event Timeline

llunak created this revision.Oct 29 2019, 1:58 PM
llunak updated this revision to Diff 227575.Nov 2 2019, 4:40 AM

Let's go a different route. This patch fully passes all tests and so should be ready to be committed.

It does so by opting out for OpenMP, which can be handled later whenever somebody who know about OpenMP looks at it.

llunak updated this revision to Diff 229722.Nov 17 2019, 11:22 AM

It turns out that this patch may introduce unwanted changes, specifically it can cause err_specialization_after_instantiation if the specialization is done in a source file but needed already by code in the PCH. But this seems to be rare (I've encountered it only after building the Skia library with put-everything-in-the-PCH), is easy to avoid (PCHs often need little tweaks anyway) and the performance gain is very much worth it.

Still, this means there should be an option to enable this. Done in this version of the patch. I don't know how to handle tests for it, pretty much all the tests pass as said above, but it seems a bit silly to duplicate all the PCH-related ones to also use the flag.

jdoerfert resigned from this revision.Jan 8 2020, 11:29 AM
llunak edited the summary of this revision. (Show Details)Jan 13 2020, 2:02 PM
rnk added a subscriber: rnk.Jan 13 2020, 3:01 PM

I'm interested in making clang do this, but I think this needs significantly more work until this is ready to land. It needs in-tree tests. I assumed we'd want to hook it up to clang-cl /Yc and /Yu.

clang/include/clang/Basic/LangOptions.def
163

I think both sides of a PCH user and creator will need to agree on this, so this isn't a "benign" langopt, is it? This is a regular one, and we should diagnose mismatches between the PCH creation and use.

clang/lib/Sema/Sema.cpp
985–986

This really deserves to be debugged before we land it.

llunak marked 2 inline comments as done.Jan 14 2020, 2:35 AM
In D69585#1818205, @rnk wrote:

I'm interested in making clang do this, but I think this needs significantly more work until this is ready to land. It needs in-tree tests.

What tests specifically? I can add one test that just uses the option and then checks the PCH already contains what it should, but besides that this would need to repeat all PCH tests with this enabled. Should I just do that?

I assumed we'd want to hook it up to clang-cl /Yc and /Yu.

If you mean automatically, then probably not. As said in the description, this leads to an error if a PCH'd template has a specialization that's not at least mentioned in the PCH. That's not a big deal and it's easy to handle, but it's still technically a regression. I myself wouldn't mind and would be fine with making this the default, but I assume (plural) you wouldn't.

clang/include/clang/Basic/LangOptions.def
163

The flag needs to be used only when creating the PCH. Users don't care, they'll run the instantiation too, the flag will just make users do less repeated work.

clang/lib/Sema/Sema.cpp
985–986

I debugged this more than 2 months ago, that's why the OpenMP code owner is added as a reviewer. The initial diff (that I have no idea how to display here) included a suggested fix and the description said this was a WIP and included my findings about it. As far as I can tell the only effect that had was that this patch was sitting here all the time without a single reaction, so if nobody OpenMP related cares then I do not either.

I don't see any crashes in OpenMP tests, other tests just must be updated by the author, if this patch will ever going to be landed. I don't think it is a good idea to have a not fully functional implementation, even guarded by the option.

llunak updated this revision to Diff 238217.Jan 15 2020, 4:20 AM
llunak edited the summary of this revision. (Show Details)
llunak removed a project: Restricted Project.

In order to simplify this, I've updated the patch to remove any reference to OpenMP and I've moved that part to https://reviews.llvm.org/D72759 .

This means that the only thing missing here should be adding tests. Which, as said above, I'm unsure how to do exactly given that it'd probably need duplicating all PCH tests to use this option, so please advise how to do this so that the patch can be accepted.

I don't see any crashes in OpenMP tests, other tests just must be updated by the author, if this patch will ever going to be landed. I don't think it is a good idea to have a not fully functional implementation, even guarded by the option.

Please see https://reviews.llvm.org/D72759 .

aganea added a subscriber: aganea.Jan 15 2020, 7:00 AM

I'm glad you're fixing this, this problem has came up in our profile traces as well.

The only way this breaks things that I've managed to find is if a .cpp file using the PCH adds another template specialization that's not mentioned in the PCH

What is the error?

Could you please add test(s)? We need a test especially for the error you're mentioning. It has to be clear to the user that an error message in the /Yu .obj (because of a specialization) in fact occurs because usage of -pch-instantiate-templates in the pch.obj.

Have you investigated how MSVC handles this? Using a PCH in a OBJ is a lot faster with MSVC, so I'm guessing they have some trick.

clang/lib/Sema/Sema.cpp
987

clang-format please.

988

Why not move this time trace scope inside PerformPendingInstantiations()? (and the other one at L927 too)

llunak updated this revision to Diff 238477.Jan 16 2020, 6:37 AM

What is the error?

I take that part back, actually. I don't quite remember anymore what exactly I did in October, but if I now revert the PCH tweaks in LibreOffice I did to avoid the error, the compilation fails even without the patch or with GCC. So I assume what really happened was that code changes triggered the error and I incorrectly assumed it was because of my patch. So, unless proven otherwise, I take it that my patch is actually technically correct without causing any code regressions (the OpenMP problem has just been fixed).

What remains is those 22 tests which fail because moving the instantiations reorders the code that is expected by FileCheck. This patch handles 2 of them, and it's already rather tedious (the OpenMP tests are large). Is this really the way to handle them, or does somebody have a better idea?

What is the error?

I take that part back, actually. I don't quite remember anymore what exactly I did in October, but if I now revert the PCH tweaks in LibreOffice I did to avoid the error, the compilation fails even without the patch or with GCC. So I assume what really happened was that code changes triggered the error and I incorrectly assumed it was because of my patch. So, unless proven otherwise, I take it that my patch is actually technically correct without causing any code regressions (the OpenMP problem has just been fixed).

What remains is those 22 tests which fail because moving the instantiations reorders the code that is expected by FileCheck. This patch handles 2 of them, and it's already rather tedious (the OpenMP tests are large). Is this really the way to handle them, or does somebody have a better idea?

I thought you were going to add an option or a flag to control the behavior? If so, just provide an option in tests to avoid triggering of the new behavior (except for declare_target... test and those 2 you modified already) and that's it.

I thought you were going to add an option or a flag to control the behavior? If so, just provide an option in tests to avoid triggering of the new behavior (except for declare_target... test and those 2 you modified already) and that's it.

It's not included in the latest version of the patch. As written above, I'm reasonably sure I was mistaken about the need for a flag, and it should be ok to simply do the change unconditionally. I can put the flag back just for the purpose of the tests if you want, that'd certainly make handling of the tests trivial, but then the tests wouldn't really test "normal" PCHs, so do you really want that?

I thought you were going to add an option or a flag to control the behavior? If so, just provide an option in tests to avoid triggering of the new behavior (except for declare_target... test and those 2 you modified already) and that's it.

It's not included in the latest version of the patch. As written above, I'm reasonably sure I was mistaken about the need for a flag, and it should be ok to simply do the change unconditionally. I can put the flag back just for the purpose of the tests if you want, that'd certainly make handling of the tests trivial, but then the tests wouldn't really test "normal" PCHs, so do you really want that?

Of course not. I was just wandering if you still going to use a flag. If you're not going to use it, then there is only one choice - update the tests. The tests are written so to test that emitting/including PCH does not break the codegen. So it should be tested.

aganea added inline comments.Jan 18 2020, 7:55 AM
clang/lib/Sema/Sema.cpp
984

All tests pass if you add if (LangOpts.BuildingPCHWithObjectFile) here. But if a specialization occurs inside a .CPP which includes the .PCH, not sure what would/should happen, and if that'S handled. Definitely needs a test for that. @rsmith WDYT?

llunak added inline comments.Jan 18 2020, 10:46 AM
clang/lib/Sema/Sema.cpp
984

All tests pass if you add if (LangOpts.BuildingPCHWithObjectFile) here.

That is because BuildingPCHWithObjectFile is almost unused by tests, so this just disables the change. But the change should be enabled for everything, it doesn't depend on whether PCH's object file is used.

But if a specialization occurs inside a .CPP which includes the .PCH, not sure what would/should happen, and if that'S handled.

You mean this?

a.h:
#pragma once
template< typename T >
struct A { int foo() const; };
int bar(A<double>* a) { return a->foo(); }
a.cpp:
#include "a.h"
template<> int A<double>::foo() const { return 10; }

That errors out with "explicit specialization of 'foo' after instantiation" with or without PCH used, with or without my patch. I don't know about tests for it though. I can add this as one.

llunak updated this revision to Diff 238956.Jan 18 2020, 10:58 AM
llunak edited the summary of this revision. (Show Details)

Handled all tests which failed because the change moved templates in -emit-llvm output.
Added a test for specialization of a template instantiated in a PCH.

The patch should be ready now.

Adding a ping since it's been a week with no additional feedback for the author.

dblaikie added inline comments.Jan 27 2020, 2:38 PM
clang/lib/Sema/Sema.cpp
985–986

So the original comment said "breaks some Open-MP specific code paths" - what kind of breakage occurs? (& I take it with the patch in its current form that breakage is present in the patch (as opposed to the previous version that conditionalized this feature so it didn't happen when OpenMP was enabled)?)

clang/test/PCH/specialization-after-instantiation.cpp
3

Does this need testing without PCH here? Presumably that's already been tested elsewhere for some time now?

llunak marked 2 inline comments as done.Jan 28 2020, 1:38 AM
llunak added inline comments.
clang/lib/Sema/Sema.cpp
985–986

See https://reviews.llvm.org/D72759 . It's already been handled.

clang/test/PCH/specialization-after-instantiation.cpp
3

Technically not, but it can't hurt (it seems to be the common practice with testing PCHs), and I'm not aware of that elsewhere place (i.e. I've looked and couldn't find it).

llunak updated this revision to Diff 241786.Jan 31 2020, 11:55 AM
llunak edited the summary of this revision. (Show Details)

I've found a corner case, as shown by clang/test/PCH/delayed-pch-instantiate.cpp, where it really is necessary to perform an instantiation in the TU. I've updated the patch to detect this and repeat these instantiations the way they would be without the patch.

Ok, so now :) the patch is ready.

llunak updated this revision to Diff 241868.Feb 1 2020, 12:23 AM

(Removed unintended debug output in tests.)

llunak added a comment.Feb 9 2020, 3:33 AM

Ping again. Is there still something more to do here?

Ping.......

If the current listed reviewers on this patch are not the best people for reviewing this, would one of them please suggest a more appropriate reviewer so we can get some traction on this?

Could you upload this with full context ( https://llvm.org/docs/Phabricator.html#id4 mentions how to do this)?

Does this still have an OpenMP special case? The production code changes don't seem to mention OpenMP anymore, but there's still a lot of test updates for OpenMP - what are they for?

@rsmith could you take a look at this - the pending instantiations stuff isn't something I'm sufficiently familiar with to feel comfortable signing off on this alone.

llunak updated this revision to Diff 250285.Mar 13 2020, 1:02 PM

Does this still have an OpenMP special case? The production code changes don't seem to mention OpenMP anymore, but there's still a lot of test updates for OpenMP - what are they for?

It's been handled by b841b9e96e605bed5a1f9b846a07aae88c65ce02 . The tests need updates because they test compilation both with and without PCH use, and this patch reorders templates in the output, without any functional change.

Ping........

This needs to be done behind a flag. It's an explicit design goal that compilation behavior using a PCH or precompiled preamble behaves identically to compilation not using a PCH / precompiled preamble. The behavior of this patch is also non-conforming: we're only allowed to perform function template instantiations at their points of instantiation (essentially, at points of use + at the end of the translation unit).

I'd be OK with either a (conforming compilation mode) flag that instructs Clang to attempt to perform instantiations at the point of use (assuming the template is defined at that point), or with a (non-conforming) flag that instructs Clang to perform instantiations at the end of the PCH, as it would when compiling a header unit.

This needs to be done behind a flag. It's an explicit design goal that compilation behavior using a PCH or precompiled preamble behaves identically to compilation not using a PCH / precompiled preamble. The behavior of this patch is also non-conforming: we're only allowed to perform function template instantiations at their points of instantiation (essentially, at points of use + at the end of the translation unit).

How is that different from -fmodules? Given that AFAICT this makes PCHs work the same way as modules, this should mean -fmodules also both fails the identical behaviour goal and is non-conforming.

And to make sure I understand correctly, by behaving identically you mean that all the compiler output should be identical without even benign differences?

This needs to be done behind a flag. It's an explicit design goal that compilation behavior using a PCH or precompiled preamble behaves identically to compilation not using a PCH / precompiled preamble. The behavior of this patch is also non-conforming: we're only allowed to perform function template instantiations at their points of instantiation (essentially, at points of use + at the end of the translation unit).

How is that different from -fmodules? Given that AFAICT this makes PCHs work the same way as modules, this should mean -fmodules also both fails the identical behaviour goal and is non-conforming.

Each header module ("header unit" in C++20 terminology) in a modules build is a distinct translation unit, so there's a point of instantiation at the end of it. It's also not a goal of -fmodules nor C++20 modules to produce identical output to a non-modules build; the modules language feature is intended to change the observable language semantics (unlike PCH, which is intended to improve build speed without affecting semantics).

And to make sure I understand correctly, by behaving identically you mean that all the compiler output should be identical without even benign differences?

If by "benign differences" you mean things like different (but still correct) diagnostic output, then yes, that's the goal. I think producing the same diagnostics in a different order might be OK, though. For example, in an IDE, we can automatically precompile a header or a preamble to make interactive use more efficient, without changing the observable behavior of the compiler. We don't want a different diagnostic experience for interactive use versus standalone compilation.

In addition to the "explicit specialization after instantiation" problem you identified, attempting instantiation at the end of a preamble or PCH would also break things like this:

template<typename T> void f();
struct X;
void g() { f<X>(); } // instantiation not performed yet

template<typename T> void f() { T t; };
// -- end of preamble or PCH --
struct X {};

Here, instantiation at the end of TU is valid, but instantiation at the end of preamble / PCH would reject this valid code, because we'd instantiate f<X> while X is incomplete.

Trass3r added a subscriber: Trass3r.Apr 2 2020, 1:38 AM
ychen added a subscriber: ychen.Apr 6 2020, 9:03 PM
niosHD added a subscriber: niosHD.Apr 7 2020, 1:51 AM
llunak updated this revision to Diff 258605.Apr 19 2020, 9:06 AM
llunak retitled this revision from PerformPendingInstatiations() already in the PCH to Add option to instantiate templates already in the PCH.
llunak edited the summary of this revision. (Show Details)

Changed to use -fpch-instantiate-templates to control the feature.

Ping.........

rnk accepted this revision.Wed, May 20, 4:35 PM

lgtm

IMO there is a pretty clear performance use case for this mode of operation, and it seems to me that you have addressed @rsmith's feedback. Please wait a few days to see if he has more to add, but otherwise, feel free to land this after Friday.

clang/lib/Driver/ToolChains/Clang.cpp
5625

Does MSVC default to this behavior? Should this default to true with clang-cl /Yu / /Yc? This can be future work and does not need to be part of this patch.

clang/test/PCH/crash-12631281.cpp
4

I wish we had a better solution for parameterizing tests like this, but thank you for making sure this new mode is covered by the existing tests. It looks like you found some interesting issues.

This revision is now accepted and ready to land.Wed, May 20, 4:35 PM
llunak marked an inline comment as done.Sat, May 23, 8:38 AM
llunak added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
5625

Since MSVC is noticeably faster for an equivalent PCH compile than current Clang, presumably it instantiates templates already in the PCH. But that doesn't really matter for this patch, if it were ok to enable this by default for clang-cl, than it would be ok also for clang itself. That cannot be done now though, https://reviews.llvm.org/D69585#1946765 points out a corner case where this change makes a valid compilation error out, and that's the reason for having this behind a flag. I expect Clang could possibly be adjusted to bail out and delay template instantiantion in such a case until it can be performed successfully, but given the response rate to my PCH patches I first wanted to get the feature in somehow, and I can try to make the flag default/irrelevant later.

rnk added inline comments.Sat, May 23, 9:08 AM
clang/lib/Driver/ToolChains/Clang.cpp
5625

Right, I guess what I mean is, for that example where instantiation at the end of the PCH creates an error, does MSVC emit an error? I just checked, and it looks like the answer is yes, so it seems like we could make this the default behavior for clang-cl /Yc without much further discussion.

llunak marked an inline comment as done.Sun, May 24, 3:30 AM
llunak added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
5625

That's a good point. Yes, since MSVC creates PCHs by basically compiling an empty .cpp, it apparently does instantiate templates as a side-effect of that, and https://reviews.llvm.org/D69585#1946765 indeed doesn't work with MSVC. So no harm in enabling the option for clang-cl.

I'll update the patch.

llunak updated this revision to Diff 265918.Sun, May 24, 3:32 AM
llunak edited the summary of this revision. (Show Details)

Enabled the option by default for clang-cl to match MSVC.