Extend clang's SubstTemplateTypeParm to represent the pack substitution index.
Fixes PR56099.
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential D128113
Clang: fix AST representation of expanded template arguments. mizvekov on Jun 17 2022, 7:27 PM. Authored by
Details Extend clang's SubstTemplateTypeParm to represent the pack substitution index. Fixes PR56099. Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Diff Detail
Event Timeline
Comment Actions This corrects a genuine deficiency in the AST, and the patch LGTM. Can we knock this off Matheus' stack? Comment Actions We have a translation unit, on which we see an increase of compilation time and clang memory allocation from 11GB to 14GB. We are working on an isolated case. Comment Actions Thanks for looking into this! What I can imagine may be happening here is that if we instantiate a template with a very large argument pack consisting of mostly the same template arguments, we will create many more SubstTemplateTypeParmType nodes with this patch, as they won't unique so much anymore, because each will have a different pack index. Comment Actions If this is indeed a major problem, and/or if performance is an issue for other of the patches in the stack, maybe the solution is along the lines of what I raised in https://discourse.llvm.org/t/rfc-improving-diagnostics-with-template-specialization-resugaring/64294/5: introduce an option that determines how much sugar we construct in the AST, and modify ASTContext::get*Type(...) accordingly. For now it could have two possible values, 'medium' and 'maximum', and getSubstTemplateTypeParmType(...) could be modified so it always uses PackIndex=0 whenever it is only set to 'medium'. And, for later patches, resugaring would only be enabled when it is set to 'maximum'. (And maybe later a 'minimum' option could be experimented with which disables all sugar except that needed to keep tests passing.) Comment Actions Adding to the comment @joanahalili posted: this particular translation unit happens to have a large number of reverse dependencies and Clang's memory allocation has pushed it over the limits, which makes this issue quite serious. So far I can't say how widespread are large increases in clang heap allocation, which don't lead to exceeding memory limits used in the build setup and thus are less noticeable. However, I suppose that this should affect various clang-based tools like ciderd. @sammccall @kadircet, fyi. Comment Actions clang -fsyntax-only -std=c++17 -fproc-stat-report -Wno-deprecated-declarations -fsized-deallocation -Werror -Wno-deprecated-declarations -Wno-inconsistent-missing-override -Wno-null-conversion -Wno-ignored-attributes -Wno-defaulted-function-deleted -xc++ reproduction.cpp This is the reproducer we managed to create for the memory increase. As mentioned above we notice both a difference in memory and execution time.
Clang version with this patch:
Comment Actions Thanks. I also added a print of the amount of SubstTemplateTypeParmType nodes that were uniqued: Before: SubstTemplateTypeParmType: 751088 clang.exe: output=a.exe, total=46687.500 ms, user=46031.250 ms, mem=7793868 Kb After: SubstTemplateTypeParmType: 41919586 clang.exe: output=a.exe, total=68828.125 ms, user=67968.750 ms, mem=10511992 Kb This DR is so basic that I don't really see reasonable way to avoid this cost except by disabling it with a flag. Though I wonder how useful a SubstTemplateTypeParmType is without the pack index. If that kind of heavy handed meta-programming is reasonable, a flag for disabling substitution sugar entirely does not seem unreasonable... Comment Actions I wonder what is the practical application of the substitution index in SubstTemplateTypeParmType? Diagnostics? Matching AST? Something else? What would be the cost of calculating the index when necessary? Comment Actions The pack index helps you figure out which argument within an argument pack was used in a substitution, much like the ReplacementType has an index which helps you figure out which argument was used. Calculating it should be possible, if we have the whole type which contains the pattern from where the expansion came from. The problem is that if we want to extract a sub-type from the whole, say the type of a certain function argument, then we would have to rebuild that type to include this pack index in any substitution sugar. And that sounds expensive as well. Comment Actions Do you have a practical example that would use the substitution index? I believe you had something in mind before you implemented this patch? Comment Actions Oh, yes, this is needed for / is going to culminate in the implementation of the project described in https://discourse.llvm.org/t/rfc-improving-diagnostics-with-template-specialization-resugaring/64294/11 Comment Actions The whole project seems like a great improvement in clang diagnostics, but I don't yet see how template parameter pack substitution indices fit into the whole picture. If different type parameters in a parameter pack differ only by some other type of suraring, would they be distinguishable in the current state? How do substitution indices affect this? Comment Actions We need the pack index just as much as the index. When we are resugaring some type and we find a SubstTemplateParmType, first we try to find a template specialization in the current naming context (ie a nested name specifier) which refers to the same template as the Subst node. Then we have to rebuild that Subst node using, as the new underlying type, the template argument which appears in the template specialization we found. To find that argument, first we use the index, which already was available in the AST prior to this patch. But in case that index refers to an argument pack, then we need the pack index to be able to find the argument inside that. Comment Actions I more or less understand the general direction, but it would be interesting to look at the specific code that uses this. The main questions we need to answer here are:
Comment Actions
I would be fine though reverting this, and then doing some more investigation, and also perhaps waiting for a second opinion from @rsmith. Comment Actions For now we've added a workaround for the most problematic use case and got us unblocked. Thus there is no need now in introducing additional flags. However, I'm rather concerned about the cumulative additional overhead our build infrastructure gets as a result of this patch. Measuring this would unfortunately be a project of its own, which I'm not ready to commit to. If reverting this and discussing with Richard Smith is an option for you, I'd suggest doing this now and if you end up not finding a good alternative, we would probably need to find a way to estimate the impact of this patch. WDYT? Comment Actions Thanks, and sorry for the long time to respond to this. I have been thinking about this problem, and I have an idea for a solution which is better on the performance side, but more complex change. We would need to implement some pattern matching during AST traversal, and keep track of the pack index there. When extracting a component of the larger type, such as when deducing non-pack arguments from a pack, we would wrap it over with a new sugar type node which encodes the current pack indexes. I reflected a bit on this, and yeah I think we should try to design something which does not potentially hinder large packs so much. I am not sure how this is going to play out in practice, and it's possible we may need to come back to this solution again, but I think it's prudent to revert it for now if we are not sure, as this is an API change, however obscure for most users. As an aside, it would be really helpful if you could track how the compilation of your codebase performs as clang evolves, regardless if we keep this change or not :) Comment Actions I appreciate your effort on designing an alternative solution. Hopefully, it works out well.
Good point about tracking compile time. We do have compiler performance tests running in a controlled environment with a number of measures in place to reduce errors in results, but they are limited to a rather small sample of build targets. Tracking compiler performance (times, memory used) for all of our code and using this as a regression test for compiler performance is impractical due to the size of the code base, rate of changes, and the design of the build system (high level of parallelism, aggressive caching, shared build cluster consisting of machines with different performance characteristics). Comment Actions Two last thoughts: 2: After thinking about it further I don't think the pack index provides sufficiently useful info in any case, since packs will always be expanded in full, in order: when you find the first SubstTemplateTypeParmType expanded from a pack, the rest are sure to be right behind. IIUC I see how including the pack index makes resugaring more straightforward: the substitution of the sugared info for the non-sugared info occurs via a TreeTransform (see clang/lib/Sema/SemaTemplate.cpp in https://reviews.llvm.org/D127695), and by storing the pack index Matheus can simply override TransformSubstTemplateTypeParmType to make use of the pack index to easily fetch the corresponding sugared info. But since the pack indices are predictable given a bird's eye view of the AST, maybe state info can be stored in the TreeTransform to allow the pack index to be inferred in each call to TransformSubstTemplateTypeParmType? Comment Actions Packs are expanded from patterns, which is an arbitrary type that will reference one or more parameter pack anywhere within. So going back from an expanded pack back to a pattern + number of expansions will take more effort and we may need to add new markings to the AST to help with that. Check this example out: https://godbolt.org/z/rsGsM6GrM template <class ...Ts> struct A { template <class ...Us> struct B { using type1 = void ((void (*...fps)(Ts, Us))); }; }; using type2 = A<int, char>::B<short, bool>::type1; TypeAliasDecl 0x55ffe8b45368 <line:8:1, col:45> col:7 type2 'A<int, char>::B<short, bool>::type1':'void ((void (*)(int, short), void (*)(char, bool)))' `-ElaboratedType 0x55ffe8b452f0 'A<int, char>::B<short, bool>::type1' sugar `-TypedefType 0x55ffe8b452d0 'A<int, char>::B<short, bool>::type1' sugar |-TypeAlias 0x55ffe8b45258 'type1' `-FunctionProtoType 0x55ffe8b451e0 'void ((void (*)(int, short), void (*)(char, bool)))' cdecl |-ParenType 0x55ffe8b12150 'void' sugar | `-BuiltinType 0x55ffe8ac6370 'void' |-PointerType 0x55ffe8b44550 'void (*)(int, short)' | `-ParenType 0x55ffe8b444f0 'void (int, short)' sugar | `-FunctionProtoType 0x55ffe8b444b0 'void (int, short)' cdecl | |-BuiltinType 0x55ffe8ac6370 'void' | |-SubstTemplateTypeParmType 0x55ffe8b44310 'int' sugar | | |-TemplateTypeParmType 0x55ffe8b11440 'Ts' dependent contains_unexpanded_pack depth 0 index 0 pack | | | `-TemplateTypeParm 0x55ffe8b113c0 'Ts' | | `-BuiltinType 0x55ffe8ac6410 'int' | `-SubstTemplateTypeParmType 0x55ffe8b443c0 'short' sugar | |-TemplateTypeParmType 0x55ffe8b11960 'Us' dependent contains_unexpanded_pack depth 1 index 0 pack | | `-TemplateTypeParm 0x55ffe8b118d8 'Us' | `-BuiltinType 0x55ffe8ac63f0 'short' `-PointerType 0x55ffe8b450c0 'void (*)(char, bool)' `-ParenType 0x55ffe8b45060 'void (char, bool)' sugar `-FunctionProtoType 0x55ffe8b447d0 'void (char, bool)' cdecl |-BuiltinType 0x55ffe8ac6370 'void' |-SubstTemplateTypeParmType 0x55ffe8b44630 'char' sugar | |-TemplateTypeParmType 0x55ffe8b11440 'Ts' dependent contains_unexpanded_pack depth 0 index 0 pack | | `-TemplateTypeParm 0x55ffe8b113c0 'Ts' | `-BuiltinType 0x55ffe8ac63b0 'char' `-SubstTemplateTypeParmType 0x55ffe8b446e0 'bool' sugar |-TemplateTypeParmType 0x55ffe8b11960 'Us' dependent contains_unexpanded_pack depth 1 index 0 pack | `-TemplateTypeParm 0x55ffe8b118d8 'Us' `-BuiltinType 0x55ffe8ac6390 'bool' And in fact, a given parameter pack might be referenced more than one time in a given pack expansion pattern: template <class ...Ts> struct A { template <class ...Us> struct B { using type1 = void ((void (*...fps)(Ts, Ts, Us))); }; }; Ie those two Ts are referencing the same argument within the pack, we can't confuse that with an expansion. So think also about a case like the above, but where you are performing the expansion just one time. It doesn't look like to me you can figure that out from what Clang leaves behind in the AST at the moment. Comment Actions Great examples.
If it were as simple as the above case the Resugarer TreeTransform could say keep a map of TemplateTypeParmTypes to current pack indices, and iterate those during each each TransformSubstTemplateTypeParmType call, but...
...the different Ts are the same TemplateTypeParmType, so that wouldn't work. You're right, more info is needed.
Or, maybe TemplateTypeParmType could store a number that would make it unique for each *instance* of it in a given expansion? Or just SubstTemplateTypeParmType could store this number in addition to its TemplateTypeParmType? (E.g. the first Ts in an expansion is 0, the second Ts in the same expansion is 1, etc. - but it resets for the next expansion.) Comment Actions Well that number is just the pack_index, as implemented in this current patch :) I like the idea obviously, this patch is so simple, and simple solutions are the best when they are good enough. I am not completely sold that this solution is not good enough, it's hard to make the case that the reproducer is a reasonable program, but at the same time it's obvious that packs where designed to deal very cheaply with very large number of arguments. And this patch does not make the worst case worse: If all arguments are different types in all expansions, then a different pack_index will not cause an extra uniquing because the underlying types will be different as well anyway. Comment Actions I meant that, in an of expansion (Ts, Ts, Us), either the two Ts would be two unique TemplateTypeParmTypes by adding an instanceIndex or whatever to refer to the two different syntactic instances (this would of course naturally result in two unique STTPTs), or for a bit more savings the TTPTs for the two Ts could continue to be the same but their STTPTs would include the instanceIndex, so there is only one unique TTPT but two unique STTPTs referring to Ts. I.e. if there are 100 types in in Ts there would still only be two unique SubstTemplateTypeParmTypes created, as opposed to only 1 as there currently is, or 100 as this patch creates. Any way it is done, the goal would be to make the STTPTs just as unique as necessary so the Resugarer can keep map from the unique STTPTs to its current pack index, and iterate the relevant index with each call to TransformSubstTemplateTypeParmType. If I'm missing something and it is substantially more complex than that, then you're right that storing the pack index might end up the best solution - we don't want to e.g. have to do considerable Sema logic in the Resugarer just to infer the pack index. Comment Actions Disregard my previous comment, you're right, that would be identical. I'm mixing up my STTPTs and TTPTs :). So the proposed solution would be to make the TTPTs unique, and map from the TTPTs to their current pack indices in the Resugarer. But, that is probably identical in terms of memory usage as your proposal to introduce a new sugar type representing unique expansion instances, and the latter is probably clearer/less disruptive. Thanks for your work on this, and for explaining these complexities. Comment Actions Having a new Subst-like node that wraps the whole expanded pattern seems to simplify the case where we need to take the type apart, such as deductions, leaving you without the 'birds-eye' view. Otherwise we would need to dig into the type to figure out what pack indexes we need to preserve in the resulting type. Comment Actions TBH I still don't know what to do here from the project perspective. I have a new deadline now that runs until November. It seems that sitting down, figuring out a solution and coming up with a review-ready patch would take a considerable portion of that time. And then the review could take a year. Meanwhile there is still considerable work remaining on the resugarer, in terms of addressing performance and getting everything into a reviewable state. So I might try leaving this problem aside and not handle packs, for now. This would make us not able to resugar tuples for example, which is very unfortunate. But in the unlikely case there was someone else interested in solving this problem, this could be parallelized. Comment Actions I finally managed to have a talk to @rsmith about this. He thinks that, even if we can't come up with a better solution in time, the benefits of this patch justify the costs observed, as those substitution nodes are pretty useless without a way to index the pack, and having a rich AST is one of Clang's main goals. But he also came up with this nifty idea that this high cost is perhaps coming because we are reindexing the whole pack every time that repro recurses on that disjunction and we consume the first element in the deduction. I will try that sometime soon. Even if I don't see an improvement, any further objections we just re-merge this? PS: That does not disregard the other ideas to further improve this in the future. Another thing he noted that would be important is to also be able to represent in the resulting AST a pack that expanded 0 times. Comment Actions I support that - re-merge for now, and consider ways to reduce costs going forward. I thought this through every which way and could not avoid @mizvekov's conclusion, that the only other way would be another Subst* type node, from which the pack indices could be inferred, but only via non-straightforward code requiring a bird's eye view of the AST. Taking a step back and reconsidering, it's clear the original way - just storing the pack index - is very much preferable to all that complexity. The benefits justify the costs for most users, and for clang developers who would be burdened by the appearance of an obscure new type class. And as I said previously I think broad flags governing how much sugar/non-semantic info is added to the AST, i.e. letting users balance benefits vs. costs for themselves, is the better way to handle these concerns. Comment Actions The reverse indexing worked great for that problem, the patch has no perf impact now on that reproduction.cpp. That doesn't mean it doesn't have any impact in general, or that it can't be further improved, but other solutions will take more time to develop and we can revisit this later. |
I think we should have a test in ASTImporterTest.cpp to make sure we are importing the pack index correctly.