When printing NTTP template types, ensure that type name of the NTTP is
printed.
This should fix issue reported here: https://github.com/llvm/llvm-project/issues/57562
Differential D134453
Disambiguate type names when printing NTTP types DoDoENT on Sep 22 2022, 9:51 AM. Authored by
Details When printing NTTP template types, ensure that type name of the NTTP is This should fix issue reported here: https://github.com/llvm/llvm-project/issues/57562
Diff Detail
Event Timeline
Comment Actions Moving this to a top level discussion because it's easier to write replies here than in an inline comment.
Sorry, I'm confused - it sounded like you didn't actually want a string, though - you want the type information to make various semantic-aware choices in your tool, yes?
Generally the way to do it, if you want to introspect into the type, its template parameters, etc, then yes, keeping pointers to the AST or reparsing the name with Clang as-needed, would be the way to go - or walking the AST up-front and generating your own data structure (not necessarily a string) with the data you want in some structured format. Relying on the string printing of Clang to produce enough introspective information for a static analysis tool I think would be at odds with other tradeoffs that stringification is trying to achieve.
Again, I'm not advocating for the printing as-is, I think adding the top level name that disambiguates would be a good thing - and I think the GCC and MSVC examples somewhat show why adding all the other layers would be harmful to readability - there's a lot of text in those messages that doesn't directly help the user and gets in the way of identifying the important differences between the type names. For instance, let's take the GCC message: <source>: In function 'void foo()': <source>:25:9: error: no match for 'operator=' (operand types are 'NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>' and 'NDArray<float, Height{DimensionImpl<Height>{Dimension{0}}}>') 25 | a = b; | ^ <source>:2:8: note: candidate: 'constexpr NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>& NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>::operator=(const NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>&)' 2 | struct NDArray {}; | ^~~~~~~ <source>:2:8: note: no known conversion for argument 1 from 'NDArray<float, Height{DimensionImpl<Height>{Dimension{0}}}>' to 'const NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>&' <source>:2:8: note: candidate: 'constexpr NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>& NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>::operator=(NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>&&)' <source>:2:8: note: no known conversion for argument 1 from 'NDArray<float, Height{DimensionImpl<Height>{Dimension{0}}}>' to 'NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>&&' And compare it to: <source>: In function 'void foo()': <source>:25:9: error: no match for 'operator=' (operand types are 'NDArray<float, Width{{{0}}}>' and 'NDArray<float, Height{{{0}}}>') 25 | a = b; | ^ <source>:2:8: note: candidate: 'constexpr NDArray<float, Width{{{0}}}>& NDArray<float, Width{{{0}}}>::operator=(const NDArray<float, Width{{{0}}}>&)' 2 | struct NDArray {}; | ^~~~~~~ <source>:2:8: note: no known conversion for argument 1 from 'NDArray<float, Height{{{0}}}>' to 'const NDArray<float, Width{{{0}}}>&' <source>:2:8: note: candidate: 'constexpr NDArray<float, Width{{{0}}}>& NDArray<float, Width{{{0}}}>::operator=(NDArray<float, Width{{{0}}}>&&)' <source>:2:8: note: no known conversion for argument 1 from 'NDArray<float, Height{{{0}}}>' to 'NDArray<float, Width{{{0}}}>&&' That seems significantly easier to read/easier to identify the relevant difference for the developer. Comment Actions
This is precisely what I'm trying to avoid. Source files that include AST-related headers are extremely slow to compile, so I want to build a facade around the code that interacts with clang to keep my build times as low as possible. Thus, I will pay for the slow compilation only in a couple of files.
At the end of the line, I will need a string representation of the type name, at least in my current approach.
I think this is a matter of taste. In the example that you've shown, I personally prefer the verbosity of GCC and don't see it as "less readable", but as "more informative". However, I do understand that some people may prefer the output you suggested. What about making this configurable, i.e. behind some clang flag, so that developers that prefer verbosity can enable that? Comment Actions Right - I'm trying to understand what your overall goals are and what clang needs to facilitate them, if it's suitable to do so. And it seems to me that the right way to support them, is for your approach to change - not to rely on the type printing to communicate certain information, but to use the AST to get the information you need (& if you want to encode that in a string, that's up to you - I'd suggest encoding it in some more semantic data structure (a struct, with variables, lists/vectors, etc, describing the properties you want - so you don't have to rely on what clang does/doesn't include in the text and have to parse that information out of the text later on with your own home-grown C++ type parsing code, I guess)).
I don't think that's the right choice for clang - though I'm not the only decider here. Be great to hear from @aaron.ballman at some point. My perspective on this issue at the moment is that we should fix any case where we print an ambiguous type (so template<auto A> struct t1; should never produce t1<{}> and should instnead produce t1<t2{}>) - but I don't think extra {} should be added where the language doesn't require it and where it isn't necessary to disambiguate. Comment Actions Whenever possible, interrogating the AST directly should be preferred to using strings to hold the data.
My thinking is along a somewhat different line of approach. Clang has -ast-print which is intended to pretty print the AST back to the original source code without losing information. It is a -cc1 option and not a driver option (though we do document it a little bit here https://releases.llvm.org/15.0.0/tools/clang/docs/HowToSetupToolingForLLVM.html), so it's not something we expose trivially to users. This option has always been aspirational -- we'd love it to pretty print back to the original source, but we know there are plenty of cases where we don't manage it and we typically don't require people to maintain it except superficially. Now that we have clang-format, I think its utility is largely theoretical in some ways. Despite its known issues, I am aware of at least some users who make use of the functionality (for various reasons) and I am not 100% convinced we'll get community agreement to drop support for it as a failed experiment despite my personal feelings that the functionality is more trouble than it's worth, especially because I'm reasonably certain some other functionality relies on it (ARC rewriting maybe?) and we'd have to figure out what to do with that. So my thinking is: if we're going to have -ast-print, we should maintain it, and part of maintaining it means having the ability for it to print this code back to compilable source with the same semantics as the original source. Our current behavior with the example is... not particularly fantastic: https://godbolt.org/z/88eK3q869. Alternatively, we could remove -ast-print as an option and narrow its focus to only the internal uses we need it for. Comment Actions
Hmm - that seems to be to be somewhat of a tangent, though? I think the claim that template<auto> struct t1; struct t2 { }; t1<t2{}> v1; and then rendering t1<{}> in a diagnostic is wrong is solid, and we should fix that for auto parameters to include the type info that'd be necessary in the source code/to disambiguate. (I guess C++20 actually requires the name even in non-ambiguous, non-auto cases - so I'd be OK leaning towards the syntax requirement, even if it's not necessary to disambiguate (maybe there are cases where it is?). You're suggesting that -ast-print should respect the type as-written by the user, even? Not just "enough to be valid C++ code/disambiguate" - and that mode might be enough to support @DoDoENT's use case using strings for type information? (more addressing this feature request, less about the incidental diagnostic quality bug report that's a side issue in this discussion) Comment Actions
I've got a machine-generated c++ function in the form: c++ template< typename Pixel > auto processImage( Image< Pixel > const & input ) { auto result0{ processor.step1( input ) }; auto result1{ processor.step2( result0 ) }; ... auto resultN{ processor.stepNplus1( resultNminus1, resultK, resultM, ... ) }; // for K, M, ... < N - 1 return resultN; } The processor is a graph processor that has steps that perform node computations (imagine it as neural network layers in machine learning inference). Now, this works fine, but produces unnecessarily large binary, as types of most results don't depend on the Pixel template, but get instantiated anyway, thus making both binary larger and source slower to compile. The idea of my tool is to use a clang to obtain concrete types of every result0, result1, ..., resultN and then decide whether its type depends on Pixel or not. Then use that information to cut the function into two functions - one that contains part of processing that depends on Pixel and the one that does not. The one that does not will be a normal function, while the one that does will remain in the function template and will call to the other after performing part of processing that depends on Pixel. So, my idea is to obtain the types of each result as strings and then simply search for the keyword "Pixel"` in those strings to classify each result type. This approach works pretty well with my patch. Also, I find that much easier than navigating the AST. But in general, you both are right - this tool could possibly be implemented differently and probably will in the future. Now, let's keep our focus on verbosity in the diagnostic outputs, as this was the main reason I contributed this patch back to the upstream LLVM. Comment Actions I'm mostly looking at it from the perspective of "is it reasonable to add a new printing policy here and if so, what do we need from it" first and then "when is it reasonable to enable that printing policy". I think the presence of -ast-print answers the first question that we probably do need *some* sort of printing policy here because we believe we have at least two scenarios to support: print for a diagnostic and print for the pretty printer. We might even find we want a verbosity level instead of a boolean switch: print minimal type info, print "disambiguating" type info, print all type info, though it sort of sounds to me like we only need two modes (disambiguating mode for diagnostics and all type info for pretty printing/@DoDoENT). (I think the original goal of -ast-print was to respect the code as-written by the user, but I am not convinced we need that strict of a goal. I think "produces code which compiles and has the same semantics as the original source" is a more reasonable goal and once we can do that much we can think about improving fidelity more if we need to.) Comment Actions Ah, OK - I'm not sure that use case motivates a different printing mode, though - skipping to the bottom to continue this thought.
OK, so it sounds like the printing behavior change (not necessarily with a policy flag) necessary for diagnostics (make it non-ambiguous/not invalid code which is what is currently emitted) would be a good bug fix both for diagnostics and for ast-print? I'm not sure we need anything other than that. So that what is currently printing as t1<{}> (ambiguous if the NTTP is of type auto, non-ambiguous, but invalid C++ if the NTTP has the specific type already) prints out as t1<t2{}> (correct C++ and disambiguates in the case of auto ambiguity). Or would you want the non-auto case to print in diagnostics as t1<{}> since there's no ambiguity, despite that not being valid C++? And then have a separate policy for ast-print that still prints it as t1<t2{}> so it's valid code? Comment Actions Agreed, fixing up diagnostics and ast print to not print invalid types would be a good fix to have. It doesn't necessitate a new printing policy.
Personally (and others are welcome to disagree with me here!), I think we would want to print t1<t2{}> even in the auto case for diagnostics and ast print. For diagnostics, we sometimes print type information reasonably far away from where the declaration of a type is (and use a note to shift the user's gaze to the related type), and so I think having the extra type information is useful for that situation. But even when we print the type information reasonably close to the declaration of the type, it can be helpful because the code context might be easy to lose -- consider using Clang from an IDE where diagnostics are added to a listbox; if you copy the diagnostic from the listbox to paste into an email to a coworker to ask about it, the related code isn't going to come along without extra intervention. Comment Actions
(sorry to nitpick, but this conversation's been a bit confusing so trying to be extra clear) - I guess you meant "even in the *non*-auto" case? (the auto case I think is clearer that it should always have the type - otherwise it's ambiguous) - but yeah, I'm with you - I think the language requires the type even in the non-auto case and it seems reasonable to side with that for explicitness/clarity even if it could arguably be omitted without loss of information, technically.
Ok, so sounds like we're on the same page that t1<{}> is a diagnostic quality bug and we'd love to see a fix for it to include the top level type of the NTTP, as in t1<t2{}>, and that shouldn't need a new printing policy - because we never want to print t1<{}> and anywhere we do is a bug to be fixed. Comment Actions No apologies necessary, sorry for the confusion! Yes, I meant the non-auto case (practically, I mean both cases).
I think we're on the same page, yes, thanks! Comment Actions Implemented changes requested in the review:
Comment Actions Update comment/documentation of the new policy to correctly reflect the change in previous diff. Comment Actions I still don't think "keep full NTTP type printing behind a policy, for those that want/need that" is a policy we should add. String printed names aren't meant to be a tool for type reflection - the AST can be queried for that information.
Comment Actions I agree on that matter. However, I'm building my clang with this policy enabled by default to provide my developers with GCC/MSVC-like verbosity in diagnostic messages. They prefer it that way. Comment Actions @dblaikie -- but I thought we agreed that we would always print the full type, so the policy in this case is to print information *less* suitable for type reflection, right? However, given that we don't think we'll use that policy in the tree, can that policy can be kept in your downstream instead @DoDoENT?
Comment Actions I'm confused - I'll try to say some things that might lead to less confusion, but I'm far from sure. (I'm confused by the question/statement above by itself - "agreed we would always print the full type" + "print information *less* suitable for type reflection". Are those meant to go together? I think printing the "full" type (that might be ambiguous/unclear what full means in this conversation, but best guess) would make something more suitable for type reflection? (type reflection in the sense of "I can look at/analyze/split up the string and see type names that may be useful for some tooling purpose")) My understanding is that there are 3 possible ways of printing under discussion (all demonstrated here: https://godbolt.org/z/1s8Mf6b8K ): struct t1 { int v1; }; struct t2 { t1 v1; }; template<auto> struct t3 { };
Clang produces (3) in some cases, and (2) in others. When you say "always print the full type" - it sounds like (1)? But I thought/my preference has been that we should always use (2) because (1) can be quite verbose. The patch I think currently makes Clang use (2) more consistently, and provides a PrintingPolicy flag to do (1). I think the fix to use (2) more consistently is good, but I don't think we should add functionality for (1).
That would tend to be my conclusion/preference at this point. But open to your perspective/preferences/etc, @aaron.ballman Comment Actions Ah, and I thought/my preference was that we should always use (1) because (2) requires the reader to infer information that may not be obvious from local context.
I was thinking we should use (1) consistently and not add the functionality for (2). Huttah, we found the confusion! :-D
I got my perspective, perhaps incorrectly, from this bit of the conversation: I think my confusion came from <{}> -- I was interpreting that as meaning any braces without a type preceding them, which t2{} would resolve. I think you intended me to read that as empty braces without a preceding type, specifically. I can sympathize with the argument that brevity can be good thing, but I think that biases towards expert users over novice users of the language in terms of making the information understandable, at least in some small way. I tend to come down on the side of thinking verbosity is better than terseness for non-expert users because I think it's easier for humans to ignore information they think is irrelevant compared to correctly inferring information where it's missing, but I can also see the argument that you can overwhelm people with too much information. Comment Actions @aaron.ballman: thanks for tracking down the source of confusion between our perspectives. (& yeah, sorry, could've included less ambiguous examples with the extra nesting so we'd have avoided all that confusion) Do you have particular examples where you find the fully explicit (1) especially helpful (to a novice or otherwise) over (2)? It seems to me once you've got the top level type you have enough info to compare two things, even if you don't know the nested member types - "X<Y{{3}, {5}}>" is distinct from "X<Y{{4}, {5}}>" it doesn't seem to matter what the intermediate type is? but yeah, I guess it raises the question "is it that the length or the height is mismatched here" and more explicit would avoid that question. It does seem unfortunately verbose to my mind - like if we improved the printer to show only the relevant types when diffing template names, for instance, that'd be an improvement (though I guess we already do fancy things for template type diffing that don't show complete copy-paste-into-source usable names anyway, so that's a different situation - and doesn't necessarily say what we should do when the type appears alone in a diagnostic, not as a comparison) *shrug* Sorry to you both for all the talking/back and forth then. Adding the fully explicit mode will address @DoDoENT's string-based reflection needs and @aaron.ballman so there's no tension between Clang's diagnostic goals and @DoDoENT's reflection goals. Comment Actions No worries, text is such a fantastic medium for confusion. :-D
That's largely the kind of situation I was imagining. When you just have a pile of integers and braces, you have to mentally map the braces to how many levels of initialization exist and you have to map the integers to what specifically is being initialized. Those mappings gives very little context as to what types are involved. e.g., it's easy to see that {{3}, {5}} doesn't match {{4}, {5}} in the text, but it's less easy to see if that's because of Width vs Height problems or because there was a surprise specialization involved, etc. Sometimes you don't need that extra information and sometimes it may be helpful, and I'm not certain we can heuristically tell those situations apart for the user.
Yeah, it'd be nice to find a happy medium. CC @cjdb due to the exploratory work he's doing with improving the way we display diagnostics. Maybe Chris has a good idea here (or can help us resolve which direction to go)?
No worries about the back and forth (at least for me) -- I appreciate that we're taking the time to carefully consider the user experience. In terms of next steps, I think we should try to see if there's a "clear" path forward in terms of printing the types vs not printing the types. If we find one, then great, we go that way. But if we don't seem to have a clear path forward (relatively quickly; I don't think we want to drag this discussion on for months trying to find the perfect answer), then I think I'm fine with the patch as-is. It fixes the issue of t<{}> (with empty braces specifically) while retaining the status quo in other areas, but still exposes useful functionality through the additional printing policy. Does that sound reasonable? Comment Actions
If you reckon (1) is better overall anyway - happy enough to defer to your opinion there and go with that, skip/omit the printing policy. I think the policy is like adding off-by-default warnings, and supporting a use case (using the string name for reflection when we'd recommend the AST) I don't think we want to encourage/support. Admittedly going with (1) means that @DoDoENT's use case will then work, until/unless we come back around and make more strategic use of type names in this printing in the future to bring down the verbosity - so I'd still discourage that use case & warn that this isn't a guarantee that all type names will be included going forward. Comment Actions Okay, thanks for that! I'm still happy to consider alternatives if we can think of any, FWIW.
That's a fair point. So if we find no better approach, drop the policy and just always print types.
Absolutely agreed! IMO, whatever strings you get out of something with a printing policy generally have very little/no stability guarantees (it's part of the C++ API surface, so it's exactly as stable as any other C++ API). We'll change printing behavior whenever we feel the need to. Comment Actions Yeah - I don't think I have any better ideas that are reasonably within reach for a newer contributor nor worth holding up fixing the t1<{}> bug for. Yeah, for type comparisons/fancy template type diffing we could trim things down if we aren't already(we probably aren't), but that doesn't address the cases where the name appears alone and not as part of a comparison. And when there's no particular context, I can't think of any cues we could use to decide which nested names "matter". (rabbit hole/tangent: Arguably what might be more useful to the user might be a name that's not even usable in C++ - using the member variable names, as well as the type names, though that'd be even more verbose... like t1<t2{.length=3, .width=7}> - which, I guess, if you had user defined types for some kind of extra type safety, would get as verbose as t1<Shape{Length{.length = 3}, Width{.width = 5}}> though I would expect that would be the minority) Comment Actions Correction, apparently that is valid syntax, which is nice: https://godbolt.org/z/xaYYo3fva Comment Actions
True, but if user defined type is defined as CRTP (very common for strong typedefs): template< typename T, typename U > struct StrongTypedef { T value; }; struct Length : StrongTypedef< unsigned, Length > {}; struct Width : StrongTypedef< unsigned, Width > {}; struct Shape { Length length; Width width; }; template< Shape > struct S {}; you would get something like S<Shape{.length = Length{StrongTypedef<unsigned int, Length>{.value = 50}}, .width = Width{StrongTypedef<unsigned int, Width>{.value = 70}}}> (inspired by this GCC output), which is truly verbose. However, the current way of printing (assuming member names are printed) it would print something like S<{.length = {{.value = 50}}, .width = {{.value = 70}}}>. Ideally, in this scenario, probably the best possible output would be S<Shape{.length = Length{{.value = 50}}, .width = Width{{.value = 70}}}>. however, I'm not exactly sure how to achieve this (my patch would print Shape, but not Length and Width with my policy disabled. Personally, I prefer full verbosity in all cases. Yes, it's a bit more to read, but does not confuse our less experienced c++ developers. This is why our internal build of clang has this enabled by default and I would be actually very happy if this patch gets accepted in a form where full types are always printed without the need for any new policies. If you are OK with that, I can update the patch. Just, please, let me know.
Comment Actions
Yeah, considering this dimension I'd have thought /just/ printing the member names would be more informative than printing the type (but no doubt folks might have different opinions) - and I guess keeping the top level type so the identifier is valid code (not a strict necessity, but seems nice to have). That way a type /without/ strong typedefs is more readable S<Shape{.length = 50, .width = 70}> and the strong typedefs would be S<Shape{.length = {{.value = 50}}, .width = {{.value = 70}}}>. Rather than S<Shape{50, 70}> V S<Shape{Width{StrongTypedef<unsigned int, Width>{50}, Length{StrongTypedef<unsigned int, Length>{70}}>. But yeah, not sure/open to perspectives. @aaron.ballman - member names V type names V both?
Comment Actions FWIW, i find the GCC diagnostics (and the application of this patch) to be much more clear/easy to read. The pile of { and } don't really look useful/readable/meaningful to me, and leaves us ambiguous, so I'm in favor of this. As far as the over-verbosity, I would be interested to see if we can do some additional work to make sure we use type aliases whenever possible, I believe @mizvekov has done some work in that direction, so perhaps he has commetns? Comment Actions I think type names are really the only thing that will disambiguate the expressions in some circumstances. Consider the case where there are two unrelated classes both with the same member name (like Person::name and Vehicle::name) -- having: S<Shape{.obj = {.name = "foo"}}> would still leave the reader guessing as to what object is actually being created without having to go chase the information down themselves. That said, I am also curious to hear if @mizvekov has thoughts on this topic because he's been doing excellent work on template readability. Comment Actions I'm not suggesting we go with anything ambiguous - so including the top level type (especially for auto template parameters where omitting it could be genuinely ambiguous, but even for non-auto where omitting the type would produce an invalid identifier, but not be ambiguous) seems like a good thing to me. It's a question of whether we include the nested types, or the nested variable names, both, neither, or something else. But the top level type is specified - that's an unambiguous identifier (Shape). I think there'd be cases where only using the type name (and not the member name) is ambiguous - probably more often. Like I have a struct with a couple of strings and a few ints. Including the names of the string type twice and the int type three times is probably not very informative - but including the member variable names is more likely to be informative to the user within the domain of the outer type information (this is what debuggers tend to do, for instance - print the member names, not their type names). We can do both, but that seems really verbose. Maybe that's the right thing though?
Comment Actions I think the type printer's purpose is for printing types for diagnostics, not for generating a lossless representation of types as strings, as the ast dumper does. When printing types for diagnostics we roughly want to:
I think that for the motivating example, we completely fail to provide 1), we don't have a TypeLoc, and we only have a canonical type. For the short term, we have canonical types, and we need the type printer to deal with them reasonably. But by fixing 1), we have more options. We can do something like: no known conversion from 'NDArray<[...], W (aka 'Width{}')>' to 'const NDArray<[...], H (aka 'Height{}')>' for 1st argument Or perhaps in addition to the aka, we can generate notes which place a caret in source code for each of W and H. Comment Actions Could you describe this in more detail? I find that to be more verbose than is necessary/helpful and probably the variable name based option to be more informative than the type name based option (if I have Shape{3, 5} (because I have two int members) that's less helpful than Shape{.length = 3, .width = 5} and all the type information seems pretty large/verbose. & given the example: NDArray<float, Height{DimensionImpl<Height>{Dimension{0}}}, Width{DimensionImpl<Width>{Dimension{0}}}, Channels{DimensionImpl<Channels>{Dimension{0}}}> V NDArray<float, Height{{{0}}}, Width{{{0}}}, Channels{{{0}}}> The former seems to not add a lot of value for the user - in this case due to the CRTP and other shenanigans. Admittedly other types might have nested aggregates that are more interesting/less obvious - but variable names still seem to me like they'd be more informative than types. (what I'm suggesting is the outermost has a type (since there's nothing else to use there, and C++ requires a type there) and nested things could use variable names (because they're more contextually relevant - differentiating a width and height, for instance, even though they might be the same type))
That feels liable to be /really/ verbose & make it harder to find the signal in all that noise, but maybe there's ways to make that work. Certainly having the type as written gives us more choices about how to do it/see what can be made to work, for sure. Comment Actions I agree printing all of the types of all of the subobjects is too much. I wouldn't object to what you are suggesting, it seems that if all we had was the canonical type of the NTTP, and no idea of what would be helpful for this diagnostic, then printing it in this designated-initializer style seems more readable, for reasonably small objects. Though if we had the as-written type, I think the best option here is just printing the expression that initializes it, it's type, and perhaps if it was simple declref, pointing to the declaration, but that is probably situational.
Yeah, that definitely looks better than all the other options, if all we had was the NTTP value.
It seems to me that, at least for this diagnostic, and probably most others, just printing the expression that initializes the NTTP and it's type is enough. Comment Actions Ah, OK.
I think we should restrict the discussion in this review to being only about the case where we have the canonical type (acknowledging that there's broader work to reduce the number of cases where that is what we have to deal with). When you say "for reasonably small objects" are you suggesting this patch/an nearish-term solution to the original problem presented here (t1<{}> with no info about the type of the NTTP) should include some heuristic to judge "reasonably small objects" and choose a different rendering in that case?
Given this review's already gotten fairly muddled - hopefully we can defer the nuances of what to do when we have the type as-written for other threads/discussions that are addressing that case.
OK, I'm not sure what to make of this, though - "the expression that initializes the NTTP" is something we don't have, right? Just the value, I think, at this point? So we're trying to pick some kind of rendering for that value - is it just the outer type and then braces and values (Using the "bad"/difficult case of nested subobjects - t1{{3}, {4}}) or with nested types (t1{t2{3}, t3{4]}) or with member names (t1{.v1 = {.v2 = 3}, .v3 = {.v4 = 4}}) or both (t1{.v1 = t2{.v2 = 3}, .v3 = t3{.v4 = 4}}). I guess you meant the first, t1{{3}, {4}}? OK, so I think you're agreeing with what I would prefer/outlined in https://reviews.llvm.org/D134453#3871251 (which isn't where this review was going when @aaron.ballman roped you in, it was going towards fully explicit (t1{t2{3}, t3{4}})) Comment Actions I agree that we can leave that aside for this patch.
I don't think we necessarily have to do something about that from the start, but it's probably a problem that is going to come up at some point.
I am a bit torn actually, something like t1{.name = "Bar"} does look better ideally, but it's probably more prone to bad corner cases. Will probably not work well for tuples. The second option, t1{{3}, {4}}, looks more well-rounded. The fully explicit t1{t2{3}, t3{4}} option seems like it would be both less readable and also even more prone to bad corner cases than the first (field names) option. Comment Actions +1 :-)
I also think this form will get chatty in practice. It's not that uncommon for real world code to have quite a few members in a structure, so I could see t1{.name = "Bar"} being reasonable while t1{.name = "Bar", .age = 183, .height = -12, .weight = 1.2f, .eye_color = Eyes::Chartreuse, .smell = Smells::LikeTeenSpirit } being a far less reasonable thing to print. That said, we could perhaps have a cut point so we print with the member names if there's < N of them and print without member names otherwise. But I'm not certain it's worth it.
I can live with this approach if folks think it's the better way to go. But my concern still remains that the user has no idea what's being constructed by {3} or {4} without tracking down which constructor of t1 is being called. Because of overloads, std::initializer_list, and conversion operators (and the fact that C++ initialization rules are a bit of a disaster), I think this can be unobvious sometimes without including type names. That said, after playing around a bit, I think those times might involve more contorted code than others are worried about, and so maybe it's not worth worrying about those cases. The fact that Clang is the only C++ compiler that doesn't print full type information makes me wary though. My thinking is that it's always more frustrating to have not enough information in a diagnostic than too much information (both are problems in their own ways though).
I can understand it being less readable for deep nesting situations, but I do not see why it would be prone to bad corner cases -- it's the most explicit form we can write (and matches the behavior of all other C++ compilers, from what we can tell). Comment Actions
I agree with @aaron.ballman. There is no ideal solution here, and I would too prefer to have too much information rather than not enough. GCC and MSVC are quite fine with printing full type names and it's up to IDE's and good editors and CI log parsers to parse that and present in a possibly better format. If others would agree with that as well, I'll be happy to remove the policy from my PR completely and simply make clang print full explicit types always, just like GCC and MSVC do. Comment Actions I think the problem here is that since we don't have the as-written argument, we just have the converted argument which is an APValue with the state of the object. So we might print it with a syntax as if the class was a simple pod type, even though that might produce a completely different result if used in practice. That might be a point in favor of the designated-initializer style with the field names, that should never produce the wrong result if copy-pasted back to code.
The types of the fields might be internal, or otherwise too distant from user code. Comment Actions
Lots of folks use the diagnostic info from their compiler without IDEs or log parsers doing anything else to them - we should be trying to design these things for usability directly. (SARIF and such are probably the direction to go when producing diagnostics intended to be massaged by tooling in various ways - could provide collapsed views of template type names or these struct literals - comparitive highlighting views (though we do some of that for template type diffing in clang today - maybe one day we'll do that/expand it to cover the struct literals too))
Oh, yeah, I hadn't considered that. :/ that is awkward & something I would certainly place some value in avoiding (was one of the issues I was motivated by in the earlier discussion, that it'd be useful if we produced an identifier that also happened to be valid code - but I wasn't considering the case where we might produce something valid-but-incorrect, just valid being better than invalid - but yeah, invalid being better than valid-but-incorrect). Comment Actions +1 to the point that we want to design our diagnostics to be usable without requiring another tool, and +1 to using SARIF for when we want to pass our diagnostics to another tool which modifies them somehow.
Yeah, I would place value in avoiding that outcome as well. Also, the point by @mizvekov that the types may be internal types is kind of compelling to me as well; I was thinking that the types involved are most often going to be ones users are aware of, but that's certainly not true for highly generic code like STL implementations. In the interests of getting this review unstuck, I think we should 1) remove the printing policy option (which should address the concerns from @dblaikie about use of that policy for introspection purposes); the policy is something that @DoDoENT can carry in their downstream, 2) keep the changes in TemplateBase.cpp and drop the changes from APValue.cpp. That gives us an incremental improvement over what the status quo is, and we can debate other improvements (if we can think of any we'd like to make) separately. Is everyone comfortable with this? Comment Actions
This would have the effect of implementing (2) from here ( https://reviews.llvm.org/D134453#3869610 )? I'm good with that as an incremental improvement. Comment Actions Removed the policy and full explicit type printing behind it and keep only printing of first type name, as suggested by the review. This is essentially (2) from here: https://reviews.llvm.org/D134453#3869610 Comment Actions Restored trailing whitespace on untouched line that was automatically deleted by editor's save action. Comment Actions Thanks! The changes are generally looking good to me. Can you add a release note about the improvement to our diagnostic behavior (in clang/docs/ReleaseNotes.rst -- there's a section for diagnostic improvements) and can you rebase the patch on main so that precommit CI testing can run? Comment Actions Updated clang release notes about improvement to diagnostics as requested by reviewers and added more vim-specific temp files to .gitignore. Comment Actions LGTM assuming precommit CI comes back green. Do you need someone to land these changes on your behalf? If so, please let me know what name and email address you'd like used for patch attribution.
Comment Actions
Yes. My name is Nenad Mikša, e-mail address: dodoentertainment@gmail.com
Comment Actions updated new tests that appeared after rebasing to main and removed vim-specific files from .gitignore (I'll keep that downstream). |
Looks like some unintended whitespace changes? (removing trailing whitespace from these lines)