Page MenuHomePhabricator

Disambiguate type names when printing NTTP types
ClosedPublic

Authored by DoDoENT on Sep 22 2022, 9:51 AM.

Details

Summary

When printing NTTP template types, ensure that type name of the NTTP is
printed.

This should fix issue reported here: https://github.com/llvm/llvm-project/issues/57562

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
dblaikie added inline comments.
clang/include/clang/AST/PrettyPrinter.h
307 ↗(On Diff #462217)

Perhaps this might be more of a bug in PrintCanonicalTypes than something to add a separate flag for.

@aaron.ballman D55552 for context here...

Hmm, actually, just adding the top level Height{{0}}, Width{{0}}, Channels{{0}} is sufficient to make this code compile (whereas with the {{{0}}} it doesn't form a valid identifier.

So what's your use case for needing more explicitness than that top level?

aaron.ballman added inline comments.Sep 22 2022, 12:07 PM
clang/include/clang/AST/PrettyPrinter.h
307 ↗(On Diff #462217)

Perhaps this might be more of a bug in PrintCanonicalTypes than something to add a separate flag for.

@aaron.ballman D55552 for context here...

I looked over D55552 again and haven't spotted anything with it that seems amiss; the change there is to grab the canonical type before trying to print it which is all the more I'd expect PrintCanonicalTypes to impact.

This looks like the behavior you'd get when you desugar the type. Check out the AST dump for s: https://godbolt.org/z/vxh5j6qWr

`-VarDecl <line:3:1, col:20> col:20 s 'S<Point{1, 2}>':'S<{1, 2}>' callinit

We generate that type information at https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/TextNodeDumper.cpp#L663 for doing the AST dump, note how the second type printed is the desugared type and that matches what we're seeing from the pretty printer.

DoDoENT added inline comments.Sep 23 2022, 2:02 AM
clang/include/clang/AST/PrettyPrinter.h
307 ↗(On Diff #462217)

So what's your use case for needing more explicitness than that top level?

As I described in the github issue, I'm trying to write a clang-based tool that will have different behavior if the printed {{{0}}} is actually Width than if its Height or anything else.

You can see the the issue in the AST dump for bla: https://godbolt.org/z/fMr4f13o3

The type is

`-VarDecl <line:20:1, col:21> col:21 bla 'NDArray<float, W>':'NDArray<float, {{{0}}}>' callinit
  `-CXXConstructExpr <col:21> 'NDArray<float, W>':'NDArray<float, {{{0}}}>' 'void () noexcept'

so it's unknown whether {{{0}}} represent the Width or Height. My patch makes it work exactly like GCC (see the comparison of error message between clang 15 and GCC 12.1.

Perhaps this might be more of a bug in PrintCanonicalTypes than something to add a separate flag for.

This was also my first thought and the first version of my patch (before even submitting it here) was to actually change the behavior of PrintCanonicalTypes. However, that change made some tests fail, as I described in the patch description:

  • CodeGenCXX/debug-info-template.cpp
  • SemaCXX/constexpr-printing.cpp
  • SemaCXX/cxx2a-nttp-printing.cpp
  • SemaTemplate/temp_arg_string_printing.cpp

Of course, it's possible to simply update the tests, but I actually don't fully understand what is the goal of PrintCanonicalTypes and whether its current behavior is actually desired or not, so I played it safe and introduced a new policy that is disabled by default until I get more feedback from more experienced LLVM developers.

The patch does solve my problem and since I'm building LLVM from source anyway, I can have it enabled in my fork.

I just want to see if it would also be beneficial to be introduced into the upstream LLVM.

aaron.ballman added inline comments.Sep 23 2022, 10:50 AM
clang/include/clang/AST/PrettyPrinter.h
307 ↗(On Diff #462217)

Of course, it's possible to simply update the tests, but I actually don't fully understand what is the goal of PrintCanonicalTypes and whether its current behavior is actually desired or not, so I played it safe and introduced a new policy that is disabled by default until I get more feedback from more experienced LLVM developers.

PrintCanonicalTypes is what controls output like whether we print a typedef name (not the canonical type) or the final underlying type all typedefs involved (the canonical type). It won't have an impact on things like whether we print the name of a structure or not.

After looking into this, I think we do want changes here. I'm not 100% convinced we need a new policy member. I tried out printing the type names unconditionally and the results were a bit of a mixed bag (some diagnostics got more clear, other diagnostics didn't become more clear but also didn't become more confusing):

error: 'note' diagnostics expected but not seen:
  File F:\source\llvm-project\clang\test\SemaTemplate\temp_arg_nontype_cxx20.cpp Line 189: in instantiation of template class 'Diags::X<{1, 2}>' requested here
error: 'note' diagnostics seen but not expected:
  File F:\source\llvm-project\clang\test\SemaTemplate\temp_arg_nontype_cxx20.cpp Line 189: in instantiation of template class 'Diags::X<Diags::A{1, 2}>' requested here

seems like a clarifying change, while:

error: 'error' diagnostics expected but not seen:
  File F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp Line 19: no member named 'display' in 'ASCII<{"this nontype template argument is [...]"}>'
  File F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp Line 25: no member named 'display' in 'ASCII<{{119, 97, 105, 116, 32, 97, 32, 115, 27, 99, ...}}>'
  File F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp Line 33: no member named 'display' in 'ASCII<{"what??!"}>'
error: 'error' diagnostics seen but not expected:
  File F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp Line 19: no member named 'display' in 'ASCII<Str<43>{"this nontype template argument is [...]"}>'
  File F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp Line 25: no member named 'display' in 'ASCII<Str<14>{{119, 97, 105, 116, 32, 97, 32, 115, 27, 99, ...}}>'
  File F:\source\llvm-project\clang\test\SemaCXX\cxx2a-nttp-printing.cpp Line 33: no member named 'display' in 'ASCII<Str<8>{"what??!"}>'

seems like it's neither here nor there.

@dblaikie, do you have feelings on how to go with this?

dblaikie added inline comments.Sep 23 2022, 12:22 PM
clang/include/clang/AST/PrettyPrinter.h
307 ↗(On Diff #462217)

Yeah, I'm inclined to think that Diags::X<{1, 2}> is just too simplified - it's unambiguous if the parameter isn't auto, but isn't valid syntax (so the language still expects a type name there) & so maybe we should do the same in diagnostics?

@aaron.ballman - though I'm still confused by the behavior-change when PrintCanonicalTypes = false that causes the top level names to be printed? Maybe that's just a weird case/red herring/bug in PrintCanonicalTypes = true that skips those top level names and we could print them unconditionally?

@DoDoENT - I was/am curious if/why you need more explicitness than would be provided by PrintCanonicalTypes = false - if the output was NDArray<float, Height{{{0}}}, Width{{{0}}}, Channels{{{0}}}> would that be sufficient for your needs? (I think in that case the name would be valid to use in code, which I think is a reasonable basis to make the decision - but I'm not sure how to justify adding all the intermediate names too)

DoDoENT added inline comments.Sep 24 2022, 7:06 AM
clang/include/clang/AST/PrettyPrinter.h
307 ↗(On Diff #462217)

PrintCanonicalTypes is what controls output like whether we print a typedef name (not the canonical type) or the final underlying type all typedefs involved (the canonical type). It won't have an impact on things like whether we print the name of a structure or not.

Thank you for the explanation.

I was/am curious if/why you need more explicitness than would be provided by PrintCanonicalTypes = false - if the output was NDArray<float, Height{{{0}}}, Width{{{0}}}, Channels{{{0}}}> would that be sufficient for your needs?

It would for now (at least until I actually start needing to get the types of inner braces), however, in a real-world scenario I actually get output NDArray<float, {{{0}}}, {{{0}}}, {{{0}}}>, regardless of the value of PrintCanonicalTypes. I've tried to create a minimum reproducer for that case, but, unfortunately, all simple cases resolve to actually display the names Height, Width, etc. The closes thing to the real-world scenario is the diagnostic message shown here.

However, in my real code, where there are multiple levels of type deduction happening, I always get the output without a type name, regardless of the value of PrintCanonicalTypes. This is why I took some time to debug what is actually happening within clang and created this patch which now works for me in all cases.

307 ↗(On Diff #462217)

Maybe that's just a weird case/red herring/bug in PrintCanonicalTypes = true that skips those top level names and we could print them unconditionally?

From my debugging sessions, I saw that the difference happens during printing whether the type in question is deduced type or known type. If it's a known type, then in both cases the type gets printed (in a clang-based tool, not in the diagnostic as shown on Godbolt). However, if it's a deduced type, then the type gets or doesn't get printed depending on whether the canonical type is requested. If canonical type is requested from a deduced type, in AST it's actually represented as an "expression statement", not a "type", and then the type printer tries to print it as an "expression statement". This then leads to printing code in APValue and TemplateBase thinking that it's actually printing a normal C++ expression (as in the case of a clang-tidy or similar refactor tool), and not as a non-type template parameter, yielding printout with omitted types, as that is generally expected when writing code.

However, if you are using the printer to get a full type name in a string for different purposes, as I do, then you get incomplete information. So, there may be cases when you actually need/want full-type info and cases when you don't want it. Therefore, I think some new policy config should be in place to let people choose the behavior.

dblaikie added inline comments.Sep 24 2022, 8:36 AM
clang/include/clang/AST/PrettyPrinter.h
307 ↗(On Diff #462217)

It would for now (at least until I actually start needing to get the types of inner braces),

This ^ is what I was getting at - do you have a use/need for the explicit nested type names? Given that those aren't ambiguous/the syntax is valid C++ - what's the use case/need for them to be explicitly written?

Because that presents some tension between the needs of diagnostic users (where I think adding this extra info that can be inferred from context (by both the user and the C++ language/compiler) and would potentially make diagnostics noisier/less helpful) and your use case (which I'm trying to understand).

If there are multiple levels of deduction/ambiguity, I'm all for having more explicit names as necessary to remove the ambiguity, so maybe small examples of that would be helpful too, to make sure the solution is as deep as it needs to be.

however, in a real-world scenario I actually get output NDArray<float, {{{0}}}, {{{0}}}, {{{0}}}>

Right, I'm happy to/leaning towards/trying to understand if we can consider that a bug in whatever way it shows up.

Hmm, seems this might be related to the FIXME here: https://github.com/llvm/llvm-project/blob/e07ead85a368173a56e96a21d6841aa497ad80f8/clang/lib/AST/TemplateBase.cpp#L424 - which, right, is the code you're changing in this patch - but I think the FIXME hints at this not being something we necessarily want to have as opt-in, but as something that should be fixed in this code possibly without the need for a new printing policy.

Seems reasonable that this produces at least unambiguous names (so it should print out the name if it's in a deduced context) and I think it's probably good if it prints out valid names (so even includes the name in non-deduced contexts, because C++ requires that too when writing the name) to simplify things for users of diagnostics, and in your case.

DoDoENT added inline comments.Sep 25 2022, 8:38 AM
clang/include/clang/AST/PrettyPrinter.h
307 ↗(On Diff #462217)

Given that those aren't ambiguous/the syntax is valid C++ - what's the use case/need for them to be explicitly written?

I'm writing a clang-based tool for internal use in our company that will need to parse some c++ code from a machine-generated function, representing the execution of computation graph (like those used in machine learning inference), then extract the exact types of all intermediate results and feed that into an algorithm that will optimize and possibly reorder the execution of graph nodes, depending on the types of the intermediate results (after optimization the types may change, but we know exactly how). Since we are encoding a lot of compile-time information into the type names of the intermediates, we need full type names here. As I'm a rookie with LLVM and don't know better, I implemented the FrontendAction and RecursiveVisitor to visit variables representing the variables I'm interested in and used getAsString to obtain type names. Unfortunately, I was not getting full type names and therefore I investigated further and that resulted with this patch.

Because that presents some tension between the needs of diagnostic users (where I think adding this extra info that can be inferred from context (by both the user and the C++ language/compiler) and would potentially make diagnostics noisier/less helpful) and your use case (which I'm trying to understand).

In that case, it makes sense to introduce a new policy. Maybe only a better name for it should be invented...

Hmm, seems this might be related to the FIXME here: https://github.com/llvm/llvm-project/blob/e07ead85a368173a56e96a21d6841aa497ad80f8/clang/lib/AST/TemplateBase.cpp#L424 - which, right, is the code you're changing in this patch - but I think the FIXME hints at this not being something we necessarily want to have as opt-in, but as something that should be fixed in this code possibly without the need for a new printing policy.

Seems reasonable that this produces at least unambiguous names (so it should print out the name if it's in a deduced context) and I think it's probably good if it prints out valid names (so even includes the name in non-deduced contexts, because C++ requires that too when writing the name) to simplify things for users of diagnostics, and in your case.

If only a change in TemplateBase.cpp is applied, but not also in APValue.cpp, then at least I get consistently printed out Height{{{0}}}, regardless of the value of PrintCanonicalTypes, so change in this line may indeed be a bugfix.

However, a change in APValue.cpp indeed adds noise, as you call it, to the diagnostic messages, as it ensures that the full type name is written. However, the same noise level is always generated by GCC and, in my opinion, although noisier, the GCC's error message is more thorough. But I'm OK with keeping that change behind a new policy.

dblaikie added inline comments.Sep 26 2022, 9:10 AM
clang/include/clang/AST/PrettyPrinter.h
307 ↗(On Diff #462217)

Given that those aren't ambiguous/the syntax is valid C++ - what's the use case/need for them to be explicitly written?

I'm writing a clang-based tool for internal use in our company that will need to parse some c++ code from a machine-generated function, representing the execution of computation graph (like those used in machine learning inference), then extract the exact types of all intermediate results and feed that into an algorithm that will optimize and possibly reorder the execution of graph nodes, depending on the types of the intermediate results (after optimization the types may change, but we know exactly how). Since we are encoding a lot of compile-time information into the type names of the intermediates, we need full type names here. As I'm a rookie with LLVM and don't know better, I implemented the FrontendAction and RecursiveVisitor to visit variables representing the variables I'm interested in and used getAsString to obtain type names. Unfortunately, I was not getting full type names and therefore I investigated further and that resulted with this patch.

But if the type as a string is correct as far as C++ is concerned, is that not enough information (necessarily - the C++ compiler can determine all the type information from the type as written with only the top level types in the NTTP names)? If it's that the tool is trying to get the type information purely from the string representation, maybe the tool should be doing things differently - relying on Clang's AST/type APIs, rather than on inspecting the resulting string?

DoDoENT added inline comments.Sep 28 2022, 3:03 AM
clang/include/clang/AST/PrettyPrinter.h
307 ↗(On Diff #462217)

If it's that the tool is trying to get the type information purely from the string representation, maybe the tool should be doing things differently - relying on Clang's AST/type APIs, rather than on inspecting the resulting string?

I completely agree, however, the documentation on type API is not very good, and couldn't find a better way to get a string representation of the type using this API, besides manually traversing the AST and concatenating bits and pieces into a string which is essentially the same thing that TypePrinter does...

On the other hand, inventing new data structures to hold the type information or directly saving pointers to parts of the AST does not sound appealing to me, at least not for the current level of complexity I'm aiming for.

But if the type as a string is correct as far as C++ is concerned, is that not enough information (necessarily - the C++ compiler can determine all the type information from the type as written with only the top level types in the NTTP names)?

I think that the essential problem here is the as far as C++ is concerned? What about the developer? Developers may have more benefit from getting a full type name in their diagnostic than just something that is correct as far as C++ is concerned.

Consider this error message, which is completely useless and compare it to the much more informative errors by GCC and MSVC that immediately tell you what's wrong with your code.

So, why shouldn't Clang behave the same way as well?

Moving this to a top level discussion because it's easier to write replies here than in an inline comment.

If it's that the tool is trying to get the type information purely from the string representation, maybe the tool should be doing things differently - relying on Clang's AST/type APIs, rather than on inspecting the resulting string?

I completely agree, however, the documentation on type API is not very good, and couldn't find a better way to get a string representation of the type using this API, besides manually traversing the AST and concatenating bits and pieces into a string which is essentially the same thing that TypePrinter does...

Sorry, I'm confused - it sounded like you didn't actually want a string, though - you want the type information to make various semantic-aware choices in your tool, yes?

On the other hand, inventing new data structures to hold the type information or directly saving pointers to parts of the AST does not sound appealing to me, at least not for the current level of complexity I'm aiming for.

Generally the way to do it, if you want to introspect into the type, its template parameters, etc, then yes, keeping pointers to the AST or reparsing the name with Clang as-needed, would be the way to go - or walking the AST up-front and generating your own data structure (not necessarily a string) with the data you want in some structured format. Relying on the string printing of Clang to produce enough introspective information for a static analysis tool I think would be at odds with other tradeoffs that stringification is trying to achieve.

But if the type as a string is correct as far as C++ is concerned, is that not enough information (necessarily - the C++ compiler can determine all the type information from the type as written with only the top level types in the NTTP names)?

I think that the essential problem here is the as far as C++ is concerned? What about the developer? Developers may have more benefit from getting a full type name in their diagnostic than just something that is correct as far as C++ is concerned.

Consider this error message, which is completely useless and compare it to the much more informative errors by GCC and MSVC that immediately tell you what's wrong with your code.

So, why shouldn't Clang behave the same way as well?

Again, I'm not advocating for the printing as-is, I think adding the top level name that disambiguates would be a good thing - and I think the GCC and MSVC examples somewhat show why adding all the other layers would be harmful to readability - there's a lot of text in those messages that doesn't directly help the user and gets in the way of identifying the important differences between the type names.

For instance, let's take the GCC message:

<source>: In function 'void foo()':
<source>:25:9: error: no match for 'operator=' (operand types are 'NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>' and 'NDArray<float, Height{DimensionImpl<Height>{Dimension{0}}}>')
   25 |     a = b;
      |         ^
<source>:2:8: note: candidate: 'constexpr NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>& NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>::operator=(const NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>&)'
    2 | struct NDArray {};
      |        ^~~~~~~
<source>:2:8: note:   no known conversion for argument 1 from 'NDArray<float, Height{DimensionImpl<Height>{Dimension{0}}}>' to 'const NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>&'
<source>:2:8: note: candidate: 'constexpr NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>& NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>::operator=(NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>&&)'
<source>:2:8: note:   no known conversion for argument 1 from 'NDArray<float, Height{DimensionImpl<Height>{Dimension{0}}}>' to 'NDArray<float, Width{DimensionImpl<Width>{Dimension{0}}}>&&'

And compare it to:

<source>: In function 'void foo()':
<source>:25:9: error: no match for 'operator=' (operand types are 'NDArray<float, Width{{{0}}}>' and 'NDArray<float, Height{{{0}}}>')
   25 |     a = b;
      |         ^
<source>:2:8: note: candidate: 'constexpr NDArray<float, Width{{{0}}}>& NDArray<float, Width{{{0}}}>::operator=(const NDArray<float, Width{{{0}}}>&)'
    2 | struct NDArray {};
      |        ^~~~~~~
<source>:2:8: note:   no known conversion for argument 1 from 'NDArray<float, Height{{{0}}}>' to 'const NDArray<float, Width{{{0}}}>&'
<source>:2:8: note: candidate: 'constexpr NDArray<float, Width{{{0}}}>& NDArray<float, Width{{{0}}}>::operator=(NDArray<float, Width{{{0}}}>&&)'
<source>:2:8: note:   no known conversion for argument 1 from 'NDArray<float, Height{{{0}}}>' to 'NDArray<float, Width{{{0}}}>&&'

That seems significantly easier to read/easier to identify the relevant difference for the developer.

Generally the way to do it, if you want to introspect into the type, its template parameters, etc, then yes, keeping pointers to the AST or reparsing the name with Clang as-needed, would be the way to go - or walking the AST up-front and generating your own data structure (not necessarily a string) with the data you want in some structured format.

This is precisely what I'm trying to avoid. Source files that include AST-related headers are extremely slow to compile, so I want to build a facade around the code that interacts with clang to keep my build times as low as possible. Thus, I will pay for the slow compilation only in a couple of files.

Sorry, I'm confused - it sounded like you didn't actually want a string, though - you want the type information to make various semantic-aware choices in your tool, yes?

At the end of the line, I will need a string representation of the type name, at least in my current approach.

Again, I'm not advocating for the printing as-is, I think adding the top level name that disambiguates would be a good thing - and I think the GCC and MSVC examples somewhat show why adding all the other layers would be harmful to readability - there's a lot of text in those messages that doesn't directly help the user and gets in the way of identifying the important differences between the type names.

I think this is a matter of taste. In the example that you've shown, I personally prefer the verbosity of GCC and don't see it as "less readable", but as "more informative". However, I do understand that some people may prefer the output you suggested. What about making this configurable, i.e. behind some clang flag, so that developers that prefer verbosity can enable that?

Generally the way to do it, if you want to introspect into the type, its template parameters, etc, then yes, keeping pointers to the AST or reparsing the name with Clang as-needed, would be the way to go - or walking the AST up-front and generating your own data structure (not necessarily a string) with the data you want in some structured format.

This is precisely what I'm trying to avoid. Source files that include AST-related headers are extremely slow to compile, so I want to build a facade around the code that interacts with clang to keep my build times as low as possible. Thus, I will pay for the slow compilation only in a couple of files.

Sorry, I'm confused - it sounded like you didn't actually want a string, though - you want the type information to make various semantic-aware choices in your tool, yes?

At the end of the line, I will need a string representation of the type name, at least in my current approach.

Right - I'm trying to understand what your overall goals are and what clang needs to facilitate them, if it's suitable to do so. And it seems to me that the right way to support them, is for your approach to change - not to rely on the type printing to communicate certain information, but to use the AST to get the information you need (& if you want to encode that in a string, that's up to you - I'd suggest encoding it in some more semantic data structure (a struct, with variables, lists/vectors, etc, describing the properties you want - so you don't have to rely on what clang does/doesn't include in the text and have to parse that information out of the text later on with your own home-grown C++ type parsing code, I guess)).

Again, I'm not advocating for the printing as-is, I think adding the top level name that disambiguates would be a good thing - and I think the GCC and MSVC examples somewhat show why adding all the other layers would be harmful to readability - there's a lot of text in those messages that doesn't directly help the user and gets in the way of identifying the important differences between the type names.

I think this is a matter of taste. In the example that you've shown, I personally prefer the verbosity of GCC and don't see it as "less readable", but as "more informative". However, I do understand that some people may prefer the output you suggested. What about making this configurable, i.e. behind some clang flag, so that developers that prefer verbosity can enable that?

I don't think that's the right choice for clang - though I'm not the only decider here. Be great to hear from @aaron.ballman at some point.

My perspective on this issue at the moment is that we should fix any case where we print an ambiguous type (so template<auto A> struct t1; should never produce t1<{}> and should instnead produce t1<t2{}>) - but I don't think extra {} should be added where the language doesn't require it and where it isn't necessary to disambiguate.

Generally the way to do it, if you want to introspect into the type, its template parameters, etc, then yes, keeping pointers to the AST or reparsing the name with Clang as-needed, would be the way to go - or walking the AST up-front and generating your own data structure (not necessarily a string) with the data you want in some structured format.

This is precisely what I'm trying to avoid. Source files that include AST-related headers are extremely slow to compile, so I want to build a facade around the code that interacts with clang to keep my build times as low as possible. Thus, I will pay for the slow compilation only in a couple of files.

Sorry, I'm confused - it sounded like you didn't actually want a string, though - you want the type information to make various semantic-aware choices in your tool, yes?

At the end of the line, I will need a string representation of the type name, at least in my current approach.

Right - I'm trying to understand what your overall goals are and what clang needs to facilitate them, if it's suitable to do so. And it seems to me that the right way to support them, is for your approach to change - not to rely on the type printing to communicate certain information, but to use the AST to get the information you need (& if you want to encode that in a string, that's up to you - I'd suggest encoding it in some more semantic data structure (a struct, with variables, lists/vectors, etc, describing the properties you want - so you don't have to rely on what clang does/doesn't include in the text and have to parse that information out of the text later on with your own home-grown C++ type parsing code, I guess)).

Whenever possible, interrogating the AST directly should be preferred to using strings to hold the data.

Again, I'm not advocating for the printing as-is, I think adding the top level name that disambiguates would be a good thing - and I think the GCC and MSVC examples somewhat show why adding all the other layers would be harmful to readability - there's a lot of text in those messages that doesn't directly help the user and gets in the way of identifying the important differences between the type names.

I think this is a matter of taste. In the example that you've shown, I personally prefer the verbosity of GCC and don't see it as "less readable", but as "more informative". However, I do understand that some people may prefer the output you suggested. What about making this configurable, i.e. behind some clang flag, so that developers that prefer verbosity can enable that?

I don't think that's the right choice for clang - though I'm not the only decider here. Be great to hear from @aaron.ballman at some point.

My perspective on this issue at the moment is that we should fix any case where we print an ambiguous type (so template<auto A> struct t1; should never produce t1<{}> and should instnead produce t1<t2{}>) - but I don't think extra {} should be added where the language doesn't require it and where it isn't necessary to disambiguate.

My thinking is along a somewhat different line of approach. Clang has -ast-print which is intended to pretty print the AST back to the original source code without losing information. It is a -cc1 option and not a driver option (though we do document it a little bit here https://releases.llvm.org/15.0.0/tools/clang/docs/HowToSetupToolingForLLVM.html), so it's not something we expose trivially to users. This option has always been aspirational -- we'd love it to pretty print back to the original source, but we know there are plenty of cases where we don't manage it and we typically don't require people to maintain it except superficially. Now that we have clang-format, I think its utility is largely theoretical in some ways. Despite its known issues, I am aware of at least some users who make use of the functionality (for various reasons) and I am not 100% convinced we'll get community agreement to drop support for it as a failed experiment despite my personal feelings that the functionality is more trouble than it's worth, especially because I'm reasonably certain some other functionality relies on it (ARC rewriting maybe?) and we'd have to figure out what to do with that.

So my thinking is: if we're going to have -ast-print, we should maintain it, and part of maintaining it means having the ability for it to print this code back to compilable source with the same semantics as the original source. Our current behavior with the example is... not particularly fantastic: https://godbolt.org/z/88eK3q869. Alternatively, we could remove -ast-print as an option and narrow its focus to only the internal uses we need it for.

Again, I'm not advocating for the printing as-is, I think adding the top level name that disambiguates would be a good thing - and I think the GCC and MSVC examples somewhat show why adding all the other layers would be harmful to readability - there's a lot of text in those messages that doesn't directly help the user and gets in the way of identifying the important differences between the type names.

I think this is a matter of taste. In the example that you've shown, I personally prefer the verbosity of GCC and don't see it as "less readable", but as "more informative". However, I do understand that some people may prefer the output you suggested. What about making this configurable, i.e. behind some clang flag, so that developers that prefer verbosity can enable that?

I don't think that's the right choice for clang - though I'm not the only decider here. Be great to hear from @aaron.ballman at some point.

My perspective on this issue at the moment is that we should fix any case where we print an ambiguous type (so template<auto A> struct t1; should never produce t1<{}> and should instnead produce t1<t2{}>) - but I don't think extra {} should be added where the language doesn't require it and where it isn't necessary to disambiguate.

My thinking is along a somewhat different line of approach. Clang has -ast-print which is intended to pretty print the AST back to the original source code without losing information. It is a -cc1 option and not a driver option (though we do document it a little bit here https://releases.llvm.org/15.0.0/tools/clang/docs/HowToSetupToolingForLLVM.html), so it's not something we expose trivially to users. This option has always been aspirational -- we'd love it to pretty print back to the original source, but we know there are plenty of cases where we don't manage it and we typically don't require people to maintain it except superficially. Now that we have clang-format, I think its utility is largely theoretical in some ways. Despite its known issues, I am aware of at least some users who make use of the functionality (for various reasons) and I am not 100% convinced we'll get community agreement to drop support for it as a failed experiment despite my personal feelings that the functionality is more trouble than it's worth, especially because I'm reasonably certain some other functionality relies on it (ARC rewriting maybe?) and we'd have to figure out what to do with that.

So my thinking is: if we're going to have -ast-print, we should maintain it, and part of maintaining it means having the ability for it to print this code back to compilable source with the same semantics as the original source. Our current behavior with the example is... not particularly fantastic: https://godbolt.org/z/88eK3q869. Alternatively, we could remove -ast-print as an option and narrow its focus to only the internal uses we need it for.

Hmm - that seems to be to be somewhat of a tangent, though? I think the claim that template<auto> struct t1; struct t2 { }; t1<t2{}> v1; and then rendering t1<{}> in a diagnostic is wrong is solid, and we should fix that for auto parameters to include the type info that'd be necessary in the source code/to disambiguate. (I guess C++20 actually requires the name even in non-ambiguous, non-auto cases - so I'd be OK leaning towards the syntax requirement, even if it's not necessary to disambiguate (maybe there are cases where it is?).

You're suggesting that -ast-print should respect the type as-written by the user, even? Not just "enough to be valid C++ code/disambiguate" - and that mode might be enough to support @DoDoENT's use case using strings for type information? (more addressing this feature request, less about the incidental diagnostic quality bug report that's a side issue in this discussion)

Right - I'm trying to understand what your overall goals are and what clang needs to facilitate them, if it's suitable to do so. And it seems to me that the right way to support them, is for your approach to change - not to rely on the type printing to communicate certain information, but to use the AST to get the information you need (& if you want to encode that in a string, that's up to you - I'd suggest encoding it in some more semantic data structure (a struct, with variables, lists/vectors, etc, describing the properties you want - so you don't have to rely on what clang does/doesn't include in the text and have to parse that information out of the text later on with your own home-grown C++ type parsing code, I guess)).

I've got a machine-generated c++ function in the form:

c++
template< typename Pixel >
auto processImage( Image< Pixel > const & input )
{
    auto result0{ processor.step1( input ) };
    auto result1{ processor.step2( result0 ) };
    ...
    auto resultN{ processor.stepNplus1( resultNminus1, resultK, resultM, ... ) }; // for K, M, ... < N - 1
    return resultN;
}

The processor is a graph processor that has steps that perform node computations (imagine it as neural network layers in machine learning inference).

Now, this works fine, but produces unnecessarily large binary, as types of most results don't depend on the Pixel template, but get instantiated anyway, thus making both binary larger and source slower to compile.

The idea of my tool is to use a clang to obtain concrete types of every result0, result1, ..., resultN and then decide whether its type depends on Pixel or not. Then use that information to cut the function into two functions - one that contains part of processing that depends on Pixel and the one that does not. The one that does not will be a normal function, while the one that does will remain in the function template and will call to the other after performing part of processing that depends on Pixel.

So, my idea is to obtain the types of each result as strings and then simply search for the keyword "Pixel"` in those strings to classify each result type. This approach works pretty well with my patch. Also, I find that much easier than navigating the AST.

But in general, you both are right - this tool could possibly be implemented differently and probably will in the future.

Now, let's keep our focus on verbosity in the diagnostic outputs, as this was the main reason I contributed this patch back to the upstream LLVM.

Again, I'm not advocating for the printing as-is, I think adding the top level name that disambiguates would be a good thing - and I think the GCC and MSVC examples somewhat show why adding all the other layers would be harmful to readability - there's a lot of text in those messages that doesn't directly help the user and gets in the way of identifying the important differences between the type names.

I think this is a matter of taste. In the example that you've shown, I personally prefer the verbosity of GCC and don't see it as "less readable", but as "more informative". However, I do understand that some people may prefer the output you suggested. What about making this configurable, i.e. behind some clang flag, so that developers that prefer verbosity can enable that?

I don't think that's the right choice for clang - though I'm not the only decider here. Be great to hear from @aaron.ballman at some point.

My perspective on this issue at the moment is that we should fix any case where we print an ambiguous type (so template<auto A> struct t1; should never produce t1<{}> and should instnead produce t1<t2{}>) - but I don't think extra {} should be added where the language doesn't require it and where it isn't necessary to disambiguate.

My thinking is along a somewhat different line of approach. Clang has -ast-print which is intended to pretty print the AST back to the original source code without losing information. It is a -cc1 option and not a driver option (though we do document it a little bit here https://releases.llvm.org/15.0.0/tools/clang/docs/HowToSetupToolingForLLVM.html), so it's not something we expose trivially to users. This option has always been aspirational -- we'd love it to pretty print back to the original source, but we know there are plenty of cases where we don't manage it and we typically don't require people to maintain it except superficially. Now that we have clang-format, I think its utility is largely theoretical in some ways. Despite its known issues, I am aware of at least some users who make use of the functionality (for various reasons) and I am not 100% convinced we'll get community agreement to drop support for it as a failed experiment despite my personal feelings that the functionality is more trouble than it's worth, especially because I'm reasonably certain some other functionality relies on it (ARC rewriting maybe?) and we'd have to figure out what to do with that.

So my thinking is: if we're going to have -ast-print, we should maintain it, and part of maintaining it means having the ability for it to print this code back to compilable source with the same semantics as the original source. Our current behavior with the example is... not particularly fantastic: https://godbolt.org/z/88eK3q869. Alternatively, we could remove -ast-print as an option and narrow its focus to only the internal uses we need it for.

Hmm - that seems to be to be somewhat of a tangent, though? I think the claim that template<auto> struct t1; struct t2 { }; t1<t2{}> v1; and then rendering t1<{}> in a diagnostic is wrong is solid, and we should fix that for auto parameters to include the type info that'd be necessary in the source code/to disambiguate. (I guess C++20 actually requires the name even in non-ambiguous, non-auto cases - so I'd be OK leaning towards the syntax requirement, even if it's not necessary to disambiguate (maybe there are cases where it is?).

You're suggesting that -ast-print should respect the type as-written by the user, even? Not just "enough to be valid C++ code/disambiguate" - and that mode might be enough to support @DoDoENT's use case using strings for type information? (more addressing this feature request, less about the incidental diagnostic quality bug report that's a side issue in this discussion)

I'm mostly looking at it from the perspective of "is it reasonable to add a new printing policy here and if so, what do we need from it" first and then "when is it reasonable to enable that printing policy". I think the presence of -ast-print answers the first question that we probably do need *some* sort of printing policy here because we believe we have at least two scenarios to support: print for a diagnostic and print for the pretty printer. We might even find we want a verbosity level instead of a boolean switch: print minimal type info, print "disambiguating" type info, print all type info, though it sort of sounds to me like we only need two modes (disambiguating mode for diagnostics and all type info for pretty printing/@DoDoENT).

(I think the original goal of -ast-print was to respect the code as-written by the user, but I am not convinced we need that strict of a goal. I think "produces code which compiles and has the same semantics as the original source" is a more reasonable goal and once we can do that much we can think about improving fidelity more if we need to.)

Again, I'm not advocating for the printing as-is, I think adding the top level name that disambiguates would be a good thing - and I think the GCC and MSVC examples somewhat show why adding all the other layers would be harmful to readability - there's a lot of text in those messages that doesn't directly help the user and gets in the way of identifying the important differences between the type names.

I think this is a matter of taste. In the example that you've shown, I personally prefer the verbosity of GCC and don't see it as "less readable", but as "more informative". However, I do understand that some people may prefer the output you suggested. What about making this configurable, i.e. behind some clang flag, so that developers that prefer verbosity can enable that?

I don't think that's the right choice for clang - though I'm not the only decider here. Be great to hear from @aaron.ballman at some point.

My perspective on this issue at the moment is that we should fix any case where we print an ambiguous type (so template<auto A> struct t1; should never produce t1<{}> and should instnead produce t1<t2{}>) - but I don't think extra {} should be added where the language doesn't require it and where it isn't necessary to disambiguate.

My thinking is along a somewhat different line of approach. Clang has -ast-print which is intended to pretty print the AST back to the original source code without losing information. It is a -cc1 option and not a driver option (though we do document it a little bit here https://releases.llvm.org/15.0.0/tools/clang/docs/HowToSetupToolingForLLVM.html), so it's not something we expose trivially to users. This option has always been aspirational -- we'd love it to pretty print back to the original source, but we know there are plenty of cases where we don't manage it and we typically don't require people to maintain it except superficially. Now that we have clang-format, I think its utility is largely theoretical in some ways. Despite its known issues, I am aware of at least some users who make use of the functionality (for various reasons) and I am not 100% convinced we'll get community agreement to drop support for it as a failed experiment despite my personal feelings that the functionality is more trouble than it's worth, especially because I'm reasonably certain some other functionality relies on it (ARC rewriting maybe?) and we'd have to figure out what to do with that.

So my thinking is: if we're going to have -ast-print, we should maintain it, and part of maintaining it means having the ability for it to print this code back to compilable source with the same semantics as the original source. Our current behavior with the example is... not particularly fantastic: https://godbolt.org/z/88eK3q869. Alternatively, we could remove -ast-print as an option and narrow its focus to only the internal uses we need it for.

Hmm - that seems to be to be somewhat of a tangent, though? I think the claim that template<auto> struct t1; struct t2 { }; t1<t2{}> v1; and then rendering t1<{}> in a diagnostic is wrong is solid, and we should fix that for auto parameters to include the type info that'd be necessary in the source code/to disambiguate. (I guess C++20 actually requires the name even in non-ambiguous, non-auto cases - so I'd be OK leaning towards the syntax requirement, even if it's not necessary to disambiguate (maybe there are cases where it is?).

You're suggesting that -ast-print should respect the type as-written by the user, even? Not just "enough to be valid C++ code/disambiguate" - and that mode might be enough to support @DoDoENT's use case using strings for type information? (more addressing this feature request, less about the incidental diagnostic quality bug report that's a side issue in this discussion)

I'm mostly looking at it from the perspective of "is it reasonable to add a new printing policy here and if so, what do we need from it" first and then "when is it reasonable to enable that printing policy". I think the presence of -ast-print answers the first question that we probably do need *some* sort of printing policy here because we believe we have at least two scenarios to support: print for a diagnostic and print for the pretty printer.

Ah, OK - I'm not sure that use case motivates a different printing mode, though - skipping to the bottom to continue this thought.

We might even find we want a verbosity level instead of a boolean switch: print minimal type info, print "disambiguating" type info, print all type info, though it sort of sounds to me like we only need two modes (disambiguating mode for diagnostics and all type info for pretty printing/@DoDoENT).

(I think the original goal of -ast-print was to respect the code as-written by the user, but I am not convinced we need that strict of a goal. I think "produces code which compiles and has the same semantics as the original source" is a more reasonable goal and once we can do that much we can think about improving fidelity more if we need to.)

OK, so it sounds like the printing behavior change (not necessarily with a policy flag) necessary for diagnostics (make it non-ambiguous/not invalid code which is what is currently emitted) would be a good bug fix both for diagnostics and for ast-print? I'm not sure we need anything other than that.

So that what is currently printing as t1<{}> (ambiguous if the NTTP is of type auto, non-ambiguous, but invalid C++ if the NTTP has the specific type already) prints out as t1<t2{}> (correct C++ and disambiguates in the case of auto ambiguity).

Or would you want the non-auto case to print in diagnostics as t1<{}> since there's no ambiguity, despite that not being valid C++? And then have a separate policy for ast-print that still prints it as t1<t2{}> so it's valid code?

Again, I'm not advocating for the printing as-is, I think adding the top level name that disambiguates would be a good thing - and I think the GCC and MSVC examples somewhat show why adding all the other layers would be harmful to readability - there's a lot of text in those messages that doesn't directly help the user and gets in the way of identifying the important differences between the type names.

I think this is a matter of taste. In the example that you've shown, I personally prefer the verbosity of GCC and don't see it as "less readable", but as "more informative". However, I do understand that some people may prefer the output you suggested. What about making this configurable, i.e. behind some clang flag, so that developers that prefer verbosity can enable that?

I don't think that's the right choice for clang - though I'm not the only decider here. Be great to hear from @aaron.ballman at some point.

My perspective on this issue at the moment is that we should fix any case where we print an ambiguous type (so template<auto A> struct t1; should never produce t1<{}> and should instnead produce t1<t2{}>) - but I don't think extra {} should be added where the language doesn't require it and where it isn't necessary to disambiguate.

My thinking is along a somewhat different line of approach. Clang has -ast-print which is intended to pretty print the AST back to the original source code without losing information. It is a -cc1 option and not a driver option (though we do document it a little bit here https://releases.llvm.org/15.0.0/tools/clang/docs/HowToSetupToolingForLLVM.html), so it's not something we expose trivially to users. This option has always been aspirational -- we'd love it to pretty print back to the original source, but we know there are plenty of cases where we don't manage it and we typically don't require people to maintain it except superficially. Now that we have clang-format, I think its utility is largely theoretical in some ways. Despite its known issues, I am aware of at least some users who make use of the functionality (for various reasons) and I am not 100% convinced we'll get community agreement to drop support for it as a failed experiment despite my personal feelings that the functionality is more trouble than it's worth, especially because I'm reasonably certain some other functionality relies on it (ARC rewriting maybe?) and we'd have to figure out what to do with that.

So my thinking is: if we're going to have -ast-print, we should maintain it, and part of maintaining it means having the ability for it to print this code back to compilable source with the same semantics as the original source. Our current behavior with the example is... not particularly fantastic: https://godbolt.org/z/88eK3q869. Alternatively, we could remove -ast-print as an option and narrow its focus to only the internal uses we need it for.

Hmm - that seems to be to be somewhat of a tangent, though? I think the claim that template<auto> struct t1; struct t2 { }; t1<t2{}> v1; and then rendering t1<{}> in a diagnostic is wrong is solid, and we should fix that for auto parameters to include the type info that'd be necessary in the source code/to disambiguate. (I guess C++20 actually requires the name even in non-ambiguous, non-auto cases - so I'd be OK leaning towards the syntax requirement, even if it's not necessary to disambiguate (maybe there are cases where it is?).

You're suggesting that -ast-print should respect the type as-written by the user, even? Not just "enough to be valid C++ code/disambiguate" - and that mode might be enough to support @DoDoENT's use case using strings for type information? (more addressing this feature request, less about the incidental diagnostic quality bug report that's a side issue in this discussion)

I'm mostly looking at it from the perspective of "is it reasonable to add a new printing policy here and if so, what do we need from it" first and then "when is it reasonable to enable that printing policy". I think the presence of -ast-print answers the first question that we probably do need *some* sort of printing policy here because we believe we have at least two scenarios to support: print for a diagnostic and print for the pretty printer.

Ah, OK - I'm not sure that use case motivates a different printing mode, though - skipping to the bottom to continue this thought.

We might even find we want a verbosity level instead of a boolean switch: print minimal type info, print "disambiguating" type info, print all type info, though it sort of sounds to me like we only need two modes (disambiguating mode for diagnostics and all type info for pretty printing/@DoDoENT).

(I think the original goal of -ast-print was to respect the code as-written by the user, but I am not convinced we need that strict of a goal. I think "produces code which compiles and has the same semantics as the original source" is a more reasonable goal and once we can do that much we can think about improving fidelity more if we need to.)

OK, so it sounds like the printing behavior change (not necessarily with a policy flag) necessary for diagnostics (make it non-ambiguous/not invalid code which is what is currently emitted) would be a good bug fix both for diagnostics and for ast-print? I'm not sure we need anything other than that.

Agreed, fixing up diagnostics and ast print to not print invalid types would be a good fix to have. It doesn't necessitate a new printing policy.

So that what is currently printing as t1<{}> (ambiguous if the NTTP is of type auto, non-ambiguous, but invalid C++ if the NTTP has the specific type already) prints out as t1<t2{}> (correct C++ and disambiguates in the case of auto ambiguity).

Or would you want the non-auto case to print in diagnostics as t1<{}> since there's no ambiguity, despite that not being valid C++? And then have a separate policy for ast-print that still prints it as t1<t2{}> so it's valid code?

Personally (and others are welcome to disagree with me here!), I think we would want to print t1<t2{}> even in the auto case for diagnostics and ast print. For diagnostics, we sometimes print type information reasonably far away from where the declaration of a type is (and use a note to shift the user's gaze to the related type), and so I think having the extra type information is useful for that situation. But even when we print the type information reasonably close to the declaration of the type, it can be helpful because the code context might be easy to lose -- consider using Clang from an IDE where diagnostics are added to a listbox; if you copy the diagnostic from the listbox to paste into an email to a coworker to ask about it, the related code isn't going to come along without extra intervention.

CC @cjdb for awareness that we're talking about display of diagnostics

OK, so it sounds like the printing behavior change (not necessarily with a policy flag) necessary for diagnostics (make it non-ambiguous/not invalid code which is what is currently emitted) would be a good bug fix both for diagnostics and for ast-print? I'm not sure we need anything other than that.

Agreed, fixing up diagnostics and ast print to not print invalid types would be a good fix to have. It doesn't necessitate a new printing policy.

So that what is currently printing as t1<{}> (ambiguous if the NTTP is of type auto, non-ambiguous, but invalid C++ if the NTTP has the specific type already) prints out as t1<t2{}> (correct C++ and disambiguates in the case of auto ambiguity).

Or would you want the non-auto case to print in diagnostics as t1<{}> since there's no ambiguity, despite that not being valid C++? And then have a separate policy for ast-print that still prints it as t1<t2{}> so it's valid code?

Personally (and others are welcome to disagree with me here!), I think we would want to print t1<t2{}> even in the auto case for diagnostics and ast print.

(sorry to nitpick, but this conversation's been a bit confusing so trying to be extra clear) - I guess you meant "even in the *non*-auto" case? (the auto case I think is clearer that it should always have the type - otherwise it's ambiguous) - but yeah, I'm with you - I think the language requires the type even in the non-auto case and it seems reasonable to side with that for explicitness/clarity even if it could arguably be omitted without loss of information, technically.

For diagnostics, we sometimes print type information reasonably far away from where the declaration of a type is (and use a note to shift the user's gaze to the related type), and so I think having the extra type information is useful for that situation. But even when we print the type information reasonably close to the declaration of the type, it can be helpful because the code context might be easy to lose -- consider using Clang from an IDE where diagnostics are added to a listbox; if you copy the diagnostic from the listbox to paste into an email to a coworker to ask about it, the related code isn't going to come along without extra intervention.

Ok, so sounds like we're on the same page that t1<{}> is a diagnostic quality bug and we'd love to see a fix for it to include the top level type of the NTTP, as in t1<t2{}>, and that shouldn't need a new printing policy - because we never want to print t1<{}> and anywhere we do is a bug to be fixed.

OK, so it sounds like the printing behavior change (not necessarily with a policy flag) necessary for diagnostics (make it non-ambiguous/not invalid code which is what is currently emitted) would be a good bug fix both for diagnostics and for ast-print? I'm not sure we need anything other than that.

Agreed, fixing up diagnostics and ast print to not print invalid types would be a good fix to have. It doesn't necessitate a new printing policy.

So that what is currently printing as t1<{}> (ambiguous if the NTTP is of type auto, non-ambiguous, but invalid C++ if the NTTP has the specific type already) prints out as t1<t2{}> (correct C++ and disambiguates in the case of auto ambiguity).

Or would you want the non-auto case to print in diagnostics as t1<{}> since there's no ambiguity, despite that not being valid C++? And then have a separate policy for ast-print that still prints it as t1<t2{}> so it's valid code?

Personally (and others are welcome to disagree with me here!), I think we would want to print t1<t2{}> even in the auto case for diagnostics and ast print.

(sorry to nitpick, but this conversation's been a bit confusing so trying to be extra clear) - I guess you meant "even in the *non*-auto" case? (the auto case I think is clearer that it should always have the type - otherwise it's ambiguous) - but yeah, I'm with you - I think the language requires the type even in the non-auto case and it seems reasonable to side with that for explicitness/clarity even if it could arguably be omitted without loss of information, technically.

No apologies necessary, sorry for the confusion! Yes, I meant the non-auto case (practically, I mean both cases).

For diagnostics, we sometimes print type information reasonably far away from where the declaration of a type is (and use a note to shift the user's gaze to the related type), and so I think having the extra type information is useful for that situation. But even when we print the type information reasonably close to the declaration of the type, it can be helpful because the code context might be easy to lose -- consider using Clang from an IDE where diagnostics are added to a listbox; if you copy the diagnostic from the listbox to paste into an email to a coworker to ask about it, the related code isn't going to come along without extra intervention.

Ok, so sounds like we're on the same page that t1<{}> is a diagnostic quality bug and we'd love to see a fix for it to include the top level type of the NTTP, as in t1<t2{}>, and that shouldn't need a new printing policy - because we never want to print t1<{}> and anywhere we do is a bug to be fixed.

I think we're on the same page, yes, thanks!

DoDoENT updated this revision to Diff 468881.Oct 19 2022, 5:33 AM
DoDoENT retitled this revision from Introduce the `AlwaysIncludeTypeForNonTypeTemplateArgument` into printing policy to Disambiguate type names when printing NTTP types.
DoDoENT edited the summary of this revision. (Show Details)

Implemented changes requested in the review:

  • type name of NTTP is now always printed, as C++ requires
  • keep full NTTP type printing behind a policy, for those that want/need that
DoDoENT updated this revision to Diff 468883.Oct 19 2022, 5:37 AM

Update comment/documentation of the new policy to correctly reflect the change in previous diff.

I still don't think "keep full NTTP type printing behind a policy, for those that want/need that" is a policy we should add. String printed names aren't meant to be a tool for type reflection - the AST can be queried for that information.

clang/test/CodeGenCXX/debug-info-template.cpp
224

Looks like some unintended whitespace changes? (removing trailing whitespace from these lines)

I still don't think "keep full NTTP type printing behind a policy, for those that want/need that" is a policy we should add. String printed names aren't meant to be a tool for type reflection - the AST can be queried for that information.

I agree on that matter.

However, I'm building my clang with this policy enabled by default to provide my developers with GCC/MSVC-like verbosity in diagnostic messages. They prefer it that way.

I still don't think "keep full NTTP type printing behind a policy, for those that want/need that" is a policy we should add. String printed names aren't meant to be a tool for type reflection - the AST can be queried for that information.

I agree on that matter.

However, I'm building my clang with this policy enabled by default to provide my developers with GCC/MSVC-like verbosity in diagnostic messages. They prefer it that way.

@dblaikie -- but I thought we agreed that we would always print the full type, so the policy in this case is to print information *less* suitable for type reflection, right?

However, given that we don't think we'll use that policy in the tree, can that policy can be kept in your downstream instead @DoDoENT?

clang/include/clang/AST/PrettyPrinter.h
78 ↗(On Diff #468883)

Should this be defaulting to false? I thought we wanted to always include the type for NTTP printing?

I still don't think "keep full NTTP type printing behind a policy, for those that want/need that" is a policy we should add. String printed names aren't meant to be a tool for type reflection - the AST can be queried for that information.

I agree on that matter.

However, I'm building my clang with this policy enabled by default to provide my developers with GCC/MSVC-like verbosity in diagnostic messages. They prefer it that way.

@dblaikie -- but I thought we agreed that we would always print the full type, so the policy in this case is to print information *less* suitable for type reflection, right?

I'm confused - I'll try to say some things that might lead to less confusion, but I'm far from sure.

(I'm confused by the question/statement above by itself - "agreed we would always print the full type" + "print information *less* suitable for type reflection". Are those meant to go together? I think printing the "full" type (that might be ambiguous/unclear what full means in this conversation, but best guess) would make something more suitable for type reflection? (type reflection in the sense of "I can look at/analyze/split up the string and see type names that may be useful for some tooling purpose"))

My understanding is that there are 3 possible ways of printing under discussion (all demonstrated here: https://godbolt.org/z/1s8Mf6b8K ):

struct t1 { int v1; };
struct t2 { t1 v1; };
template<auto>
struct t3 { };
  1. t3<t2{t1{42}}>: Fully explicit - all types provided. Valid code, though not strictly necessary.
  2. t3<t2{{42}}>: Semi-explicit. Necessary for validity, even if the NTTP is not auto, but instead explicitly t2 (C++ won't deduce it - requiring it to be explicit).
  3. t3<{{3}}>: Not actually valid code (even if the NTTP is t2)

Clang produces (3) in some cases, and (2) in others.
MSVC and GCC always use (1).

When you say "always print the full type" - it sounds like (1)? But I thought/my preference has been that we should always use (2) because (1) can be quite verbose.

The patch I think currently makes Clang use (2) more consistently, and provides a PrintingPolicy flag to do (1).

I think the fix to use (2) more consistently is good, but I don't think we should add functionality for (1).

However, given that we don't think we'll use that policy in the tree, can that policy can be kept in your downstream instead @DoDoENT?

That would tend to be my conclusion/preference at this point.

But open to your perspective/preferences/etc, @aaron.ballman

I still don't think "keep full NTTP type printing behind a policy, for those that want/need that" is a policy we should add. String printed names aren't meant to be a tool for type reflection - the AST can be queried for that information.

I agree on that matter.

However, I'm building my clang with this policy enabled by default to provide my developers with GCC/MSVC-like verbosity in diagnostic messages. They prefer it that way.

@dblaikie -- but I thought we agreed that we would always print the full type, so the policy in this case is to print information *less* suitable for type reflection, right?

I'm confused - I'll try to say some things that might lead to less confusion, but I'm far from sure.

(I'm confused by the question/statement above by itself - "agreed we would always print the full type" + "print information *less* suitable for type reflection". Are those meant to go together? I think printing the "full" type (that might be ambiguous/unclear what full means in this conversation, but best guess) would make something more suitable for type reflection? (type reflection in the sense of "I can look at/analyze/split up the string and see type names that may be useful for some tooling purpose"))

My understanding is that there are 3 possible ways of printing under discussion (all demonstrated here: https://godbolt.org/z/1s8Mf6b8K ):

struct t1 { int v1; };
struct t2 { t1 v1; };
template<auto>
struct t3 { };
  1. t3<t2{t1{42}}>: Fully explicit - all types provided. Valid code, though not strictly necessary.
  2. t3<t2{{42}}>: Semi-explicit. Necessary for validity, even if the NTTP is not auto, but instead explicitly t2 (C++ won't deduce it - requiring it to be explicit).
  3. t3<{{3}}>: Not actually valid code (even if the NTTP is t2)

Clang produces (3) in some cases, and (2) in others.
MSVC and GCC always use (1).

When you say "always print the full type" - it sounds like (1)? But I thought/my preference has been that we should always use (2) because (1) can be quite verbose.

Ah, and I thought/my preference was that we should always use (1) because (2) requires the reader to infer information that may not be obvious from local context.

The patch I think currently makes Clang use (2) more consistently, and provides a PrintingPolicy flag to do (1).

I think the fix to use (2) more consistently is good, but I don't think we should add functionality for (1).

I was thinking we should use (1) consistently and not add the functionality for (2). Huttah, we found the confusion! :-D

However, given that we don't think we'll use that policy in the tree, can that policy can be kept in your downstream instead @DoDoENT?

That would tend to be my conclusion/preference at this point.

But open to your perspective/preferences/etc, @aaron.ballman

I got my perspective, perhaps incorrectly, from this bit of the conversation:

Ok, so sounds like we're on the same page that t1<{}> is a diagnostic quality bug and we'd love to see a fix for it to include the top level type of the NTTP, as in t1<t2{}>, and that shouldn't need a new printing policy - because we never want to print t1<{}> and anywhere we do is a bug to be fixed.

I think my confusion came from <{}> -- I was interpreting that as meaning any braces without a type preceding them, which t2{} would resolve. I think you intended me to read that as empty braces without a preceding type, specifically.

I can sympathize with the argument that brevity can be good thing, but I think that biases towards expert users over novice users of the language in terms of making the information understandable, at least in some small way. I tend to come down on the side of thinking verbosity is better than terseness for non-expert users because I think it's easier for humans to ignore information they think is irrelevant compared to correctly inferring information where it's missing, but I can also see the argument that you can overwhelm people with too much information.

@aaron.ballman: thanks for tracking down the source of confusion between our perspectives. (& yeah, sorry, could've included less ambiguous examples with the extra nesting so we'd have avoided all that confusion)

Do you have particular examples where you find the fully explicit (1) especially helpful (to a novice or otherwise) over (2)?

It seems to me once you've got the top level type you have enough info to compare two things, even if you don't know the nested member types - "X<Y{{3}, {5}}>" is distinct from "X<Y{{4}, {5}}>" it doesn't seem to matter what the intermediate type is? but yeah, I guess it raises the question "is it that the length or the height is mismatched here" and more explicit would avoid that question. It does seem unfortunately verbose to my mind - like if we improved the printer to show only the relevant types when diffing template names, for instance, that'd be an improvement (though I guess we already do fancy things for template type diffing that don't show complete copy-paste-into-source usable names anyway, so that's a different situation - and doesn't necessarily say what we should do when the type appears alone in a diagnostic, not as a comparison)

*shrug* Sorry to you both for all the talking/back and forth then. Adding the fully explicit mode will address @DoDoENT's string-based reflection needs and @aaron.ballman so there's no tension between Clang's diagnostic goals and @DoDoENT's reflection goals.

@aaron.ballman: thanks for tracking down the source of confusion between our perspectives. (& yeah, sorry, could've included less ambiguous examples with the extra nesting so we'd have avoided all that confusion)

No worries, text is such a fantastic medium for confusion. :-D

Do you have particular examples where you find the fully explicit (1) especially helpful (to a novice or otherwise) over (2)?

It seems to me once you've got the top level type you have enough info to compare two things, even if you don't know the nested member types - "X<Y{{3}, {5}}>" is distinct from "X<Y{{4}, {5}}>" it doesn't seem to matter what the intermediate type is? but yeah, I guess it raises the question "is it that the length or the height is mismatched here" and more explicit would avoid that question.

That's largely the kind of situation I was imagining. When you just have a pile of integers and braces, you have to mentally map the braces to how many levels of initialization exist and you have to map the integers to what specifically is being initialized. Those mappings gives very little context as to what types are involved. e.g., it's easy to see that {{3}, {5}} doesn't match {{4}, {5}} in the text, but it's less easy to see if that's because of Width vs Height problems or because there was a surprise specialization involved, etc.

Sometimes you don't need that extra information and sometimes it may be helpful, and I'm not certain we can heuristically tell those situations apart for the user.

It does seem unfortunately verbose to my mind - like if we improved the printer to show only the relevant types when diffing template names, for instance, that'd be an improvement (though I guess we already do fancy things for template type diffing that don't show complete copy-paste-into-source usable names anyway, so that's a different situation - and doesn't necessarily say what we should do when the type appears alone in a diagnostic, not as a comparison)

Yeah, it'd be nice to find a happy medium. CC @cjdb due to the exploratory work he's doing with improving the way we display diagnostics. Maybe Chris has a good idea here (or can help us resolve which direction to go)?

*shrug* Sorry to you both for all the talking/back and forth then. Adding the fully explicit mode will address @DoDoENT's string-based reflection needs and @aaron.ballman so there's no tension between Clang's diagnostic goals and @DoDoENT's reflection goals.

No worries about the back and forth (at least for me) -- I appreciate that we're taking the time to carefully consider the user experience.

In terms of next steps, I think we should try to see if there's a "clear" path forward in terms of printing the types vs not printing the types. If we find one, then great, we go that way. But if we don't seem to have a clear path forward (relatively quickly; I don't think we want to drag this discussion on for months trying to find the perfect answer), then I think I'm fine with the patch as-is. It fixes the issue of t<{}> (with empty braces specifically) while retaining the status quo in other areas, but still exposes useful functionality through the additional printing policy. Does that sound reasonable?

In terms of next steps, I think we should try to see if there's a "clear" path forward in terms of printing the types vs not printing the types. If we find one, then great, we go that way. But if we don't seem to have a clear path forward (relatively quickly; I don't think we want to drag this discussion on for months trying to find the perfect answer), then I think I'm fine with the patch as-is. It fixes the issue of t<{}> (with empty braces specifically) while retaining the status quo in other areas, but still exposes useful functionality through the additional printing policy. Does that sound reasonable?

If you reckon (1) is better overall anyway - happy enough to defer to your opinion there and go with that, skip/omit the printing policy.

I think the policy is like adding off-by-default warnings, and supporting a use case (using the string name for reflection when we'd recommend the AST) I don't think we want to encourage/support.

Admittedly going with (1) means that @DoDoENT's use case will then work, until/unless we come back around and make more strategic use of type names in this printing in the future to bring down the verbosity - so I'd still discourage that use case & warn that this isn't a guarantee that all type names will be included going forward.

In terms of next steps, I think we should try to see if there's a "clear" path forward in terms of printing the types vs not printing the types. If we find one, then great, we go that way. But if we don't seem to have a clear path forward (relatively quickly; I don't think we want to drag this discussion on for months trying to find the perfect answer), then I think I'm fine with the patch as-is. It fixes the issue of t<{}> (with empty braces specifically) while retaining the status quo in other areas, but still exposes useful functionality through the additional printing policy. Does that sound reasonable?

If you reckon (1) is better overall anyway - happy enough to defer to your opinion there and go with that, skip/omit the printing policy.

Okay, thanks for that! I'm still happy to consider alternatives if we can think of any, FWIW.

I think the policy is like adding off-by-default warnings, and supporting a use case (using the string name for reflection when we'd recommend the AST) I don't think we want to encourage/support.

That's a fair point. So if we find no better approach, drop the policy and just always print types.

Admittedly going with (1) means that @DoDoENT's use case will then work, until/unless we come back around and make more strategic use of type names in this printing in the future to bring down the verbosity - so I'd still discourage that use case & warn that this isn't a guarantee that all type names will be included going forward.

Absolutely agreed! IMO, whatever strings you get out of something with a printing policy generally have very little/no stability guarantees (it's part of the C++ API surface, so it's exactly as stable as any other C++ API). We'll change printing behavior whenever we feel the need to.

In terms of next steps, I think we should try to see if there's a "clear" path forward in terms of printing the types vs not printing the types. If we find one, then great, we go that way. But if we don't seem to have a clear path forward (relatively quickly; I don't think we want to drag this discussion on for months trying to find the perfect answer), then I think I'm fine with the patch as-is. It fixes the issue of t<{}> (with empty braces specifically) while retaining the status quo in other areas, but still exposes useful functionality through the additional printing policy. Does that sound reasonable?

If you reckon (1) is better overall anyway - happy enough to defer to your opinion there and go with that, skip/omit the printing policy.

Okay, thanks for that! I'm still happy to consider alternatives if we can think of any, FWIW.

I think the policy is like adding off-by-default warnings, and supporting a use case (using the string name for reflection when we'd recommend the AST) I don't think we want to encourage/support.

That's a fair point. So if we find no better approach, drop the policy and just always print types.

Yeah - I don't think I have any better ideas that are reasonably within reach for a newer contributor nor worth holding up fixing the t1<{}> bug for.

Yeah, for type comparisons/fancy template type diffing we could trim things down if we aren't already(we probably aren't), but that doesn't address the cases where the name appears alone and not as part of a comparison. And when there's no particular context, I can't think of any cues we could use to decide which nested names "matter".

(rabbit hole/tangent: Arguably what might be more useful to the user might be a name that's not even usable in C++ - using the member variable names, as well as the type names, though that'd be even more verbose... like t1<t2{.length=3, .width=7}> - which, I guess, if you had user defined types for some kind of extra type safety, would get as verbose as t1<Shape{Length{.length = 3}, Width{.width = 5}}> though I would expect that would be the minority)

In terms of next steps, I think we should try to see if there's a "clear" path forward in terms of printing the types vs not printing the types. If we find one, then great, we go that way. But if we don't seem to have a clear path forward (relatively quickly; I don't think we want to drag this discussion on for months trying to find the perfect answer), then I think I'm fine with the patch as-is. It fixes the issue of t<{}> (with empty braces specifically) while retaining the status quo in other areas, but still exposes useful functionality through the additional printing policy. Does that sound reasonable?

If you reckon (1) is better overall anyway - happy enough to defer to your opinion there and go with that, skip/omit the printing policy.

Okay, thanks for that! I'm still happy to consider alternatives if we can think of any, FWIW.

I think the policy is like adding off-by-default warnings, and supporting a use case (using the string name for reflection when we'd recommend the AST) I don't think we want to encourage/support.

That's a fair point. So if we find no better approach, drop the policy and just always print types.

Yeah - I don't think I have any better ideas that are reasonably within reach for a newer contributor nor worth holding up fixing the t1<{}> bug for.

Yeah, for type comparisons/fancy template type diffing we could trim things down if we aren't already(we probably aren't), but that doesn't address the cases where the name appears alone and not as part of a comparison. And when there's no particular context, I can't think of any cues we could use to decide which nested names "matter".

(rabbit hole/tangent: Arguably what might be more useful to the user might be a name that's not even usable in C++ - using the member variable names, as well as the type names, though that'd be even more verbose... like t1<t2{.length=3, .width=7}> - which, I guess, if you had user defined types for some kind of extra type safety, would get as verbose as t1<Shape{Length{.length = 3}, Width{.width = 5}}> though I would expect that would be the minority)

Correction, apparently that is valid syntax, which is nice: https://godbolt.org/z/xaYYo3fva

(rabbit hole/tangent: Arguably what might be more useful to the user might be a name that's not even usable in C++ - using the member variable names, as well as the type names, though that'd be even more verbose... like t1<t2{.length=3, .width=7}> - which, I guess, if you had user defined types for some kind of extra type safety, would get as verbose as t1<Shape{Length{.length = 3}, Width{.width = 5}}> though I would expect that would be the minority)

True, but if user defined type is defined as CRTP (very common for strong typedefs):

template< typename T, typename U >
struct StrongTypedef { T value; };

struct Length : StrongTypedef< unsigned, Length > {};
struct Width : StrongTypedef< unsigned, Width > {};

struct Shape
{
    Length length;
    Width  width;
};

template< Shape > struct S {};

you would get something like S<Shape{.length = Length{StrongTypedef<unsigned int, Length>{.value = 50}}, .width = Width{StrongTypedef<unsigned int, Width>{.value = 70}}}> (inspired by this GCC output), which is truly verbose. However, the current way of printing (assuming member names are printed) it would print something like S<{.length = {{.value = 50}}, .width = {{.value = 70}}}>. Ideally, in this scenario, probably the best possible output would be S<Shape{.length = Length{{.value = 50}}, .width = Width{{.value = 70}}}>. however, I'm not exactly sure how to achieve this (my patch would print Shape, but not Length and Width with my policy disabled.

Personally, I prefer full verbosity in all cases. Yes, it's a bit more to read, but does not confuse our less experienced c++ developers. This is why our internal build of clang has this enabled by default and I would be actually very happy if this patch gets accepted in a form where full types are always printed without the need for any new policies. If you are OK with that, I can update the patch. Just, please, let me know.

clang/include/clang/AST/PrettyPrinter.h
78 ↗(On Diff #468883)

I've set this to false by default to not interfere with the current behavior of the clang.

However, after today's discussion, I think I can completely remove the policy and simply always print the full type. That would be option (1) from this comment and would make clang behave the same as GCC and MSVC.

If you are OK with that, I'll be happy to update the patch.

clang/test/CodeGenCXX/debug-info-template.cpp
224

Sorry about that. I've configured my editor to always remove trailing whitespace (we have this policy in the company) on save action. I can return the trailing whitespace if you want, although I would like to understand the reasoning for keeping the trailing whitespace...

you would get something like S<Shape{.length = Length{StrongTypedef<unsigned int, Length>{.value = 50}}, .width = Width{StrongTypedef<unsigned int, Width>{.value = 70}}}> (inspired by this GCC output), which is truly verbose. However, the current way of printing (assuming member names are printed) it would print something like S<{.length = {{.value = 50}}, .width = {{.value = 70}}}>. Ideally, in this scenario, probably the best possible output would be S<Shape{.length = Length{{.value = 50}}, .width = Width{{.value = 70}}}>. however, I'm not exactly sure how to achieve this (my patch would print Shape, but not Length and Width with my policy disabled.

Yeah, considering this dimension I'd have thought /just/ printing the member names would be more informative than printing the type (but no doubt folks might have different opinions) - and I guess keeping the top level type so the identifier is valid code (not a strict necessity, but seems nice to have). That way a type /without/ strong typedefs is more readable S<Shape{.length = 50, .width = 70}> and the strong typedefs would be S<Shape{.length = {{.value = 50}}, .width = {{.value = 70}}}>. Rather than S<Shape{50, 70}> V S<Shape{Width{StrongTypedef<unsigned int, Width>{50}, Length{StrongTypedef<unsigned int, Length>{70}}>.

But yeah, not sure/open to perspectives.

@aaron.ballman - member names V type names V both?

clang/test/CodeGenCXX/debug-info-template.cpp
224

Oh, no problem with fixing them when you're touching that line of code (& I guess that's what your editor does - and then when these lines were undone, the whitespace removal was leftover) - but we tend not to do trailing whitespace cleanup by itself because it isn't worth the cost/disadvantage when it comes to doing source archaeology/blame/etc - and even if we are doing the cleanup it's usually important to separate it into independent commits so it's easier to focus on the important parts of a change under review.

FWIW, i find the GCC diagnostics (and the application of this patch) to be much more clear/easy to read. The pile of { and } don't really look useful/readable/meaningful to me, and leaves us ambiguous, so I'm in favor of this.

As far as the over-verbosity, I would be interested to see if we can do some additional work to make sure we use type aliases whenever possible, I believe @mizvekov has done some work in that direction, so perhaps he has commetns?

But yeah, not sure/open to perspectives.

@aaron.ballman - member names V type names V both?

I think type names are really the only thing that will disambiguate the expressions in some circumstances. Consider the case where there are two unrelated classes both with the same member name (like Person::name and Vehicle::name) -- having: S<Shape{.obj = {.name = "foo"}}> would still leave the reader guessing as to what object is actually being created without having to go chase the information down themselves.

That said, I am also curious to hear if @mizvekov has thoughts on this topic because he's been doing excellent work on template readability.

FWIW, i find the GCC diagnostics (and the application of this patch) to be much more clear/easy to read. The pile of { and } don't really look useful/readable/meaningful to me, and leaves us ambiguous, so I'm in favor of this.

I'm not suggesting we go with anything ambiguous - so including the top level type (especially for auto template parameters where omitting it could be genuinely ambiguous, but even for non-auto where omitting the type would produce an invalid identifier, but not be ambiguous) seems like a good thing to me.

It's a question of whether we include the nested types, or the nested variable names, both, neither, or something else.

But yeah, not sure/open to perspectives.

@aaron.ballman - member names V type names V both?

I think type names are really the only thing that will disambiguate the expressions in some circumstances. Consider the case where there are two unrelated classes both with the same member name (like Person::name and Vehicle::name) -- having: S<Shape{.obj = {.name = "foo"}}> would still leave the reader guessing as to what object is actually being created without having to go chase the information down themselves.

But the top level type is specified - that's an unambiguous identifier (Shape). I think there'd be cases where only using the type name (and not the member name) is ambiguous - probably more often. Like I have a struct with a couple of strings and a few ints. Including the names of the string type twice and the int type three times is probably not very informative - but including the member variable names is more likely to be informative to the user within the domain of the outer type information (this is what debuggers tend to do, for instance - print the member names, not their type names). We can do both, but that seems really verbose. Maybe that's the right thing though?

That said, I am also curious to hear if @mizvekov has thoughts on this topic because he's been doing excellent work on template readability.

mizvekov added a comment.EditedOct 22 2022, 9:29 AM

I think the type printer's purpose is for printing types for diagnostics, not for generating a lossless representation of types as strings, as the ast dumper does.
So packing so much information in them that hurts readability would be against that purpose.
I do agree with everyone here that the provided example is one where we are unhelpful and provide too little information.

When printing types for diagnostics we roughly want to:

  1. Point the user to the relevant type which is being diagnosed, as in the source code. When we have a TypeLoc, and that would otherwise be more presentable for the given diagnostic, we can point to the type with a caret. Otherwise, we try to print the type as closely as we can to what was written, so that the user can connect the diagnostic with something that was done in source code.
  2. Give some sort of explanatory analysis, revealing implicit / context sensitive information which might be relevant.

I think that for the motivating example, we completely fail to provide 1), we don't have a TypeLoc, and we only have a canonical type.
I think fixing 1) is the most pressing issue, and is not addressed by this patch.
Ideally, long term, we can remove all instances where we try to print canonical types for diagnostic purposes, and not have the type printer deal with that, simplifying things.

For the short term, we have canonical types, and we need the type printer to deal with them reasonably.
I agree that the way MSVC is presenting these canonicalized NTTPs of class type looks the most sensible here.

But by fixing 1), we have more options. We can do something like:

no known conversion from 'NDArray<[...], W (aka 'Width{}')>' to 'const NDArray<[...], H (aka 'Height{}')>' for 1st argument

Or perhaps in addition to the aka, we can generate notes which place a caret in source code for each of W and H.

I think the type printer's purpose is for printing types for diagnostics, not for generating a lossless representation of types as strings, as the ast dumper does.
So packing so much information in them that hurts readability would be against that purpose.
I do agree with everyone here that the provided example is one where we are unhelpful and provide too little information.

When printing types for diagnostics we roughly want to:

  1. Point the user to the relevant type which is being diagnosed, as in the source code. When we have a TypeLoc, and that would otherwise be more presentable for the given diagnostic, we can point to the type with a caret. Otherwise, we try to print the type as closely as we can to what was written, so that the user can connect the diagnostic with something that was done in source code.
  2. Give some sort of explanatory analysis, revealing implicit / context sensitive information which might be relevant.

I think that for the motivating example, we completely fail to provide 1), we don't have a TypeLoc, and we only have a canonical type.
I think fixing 1) is the most pressing issue, and is not addressed by this patch.
Ideally, long term, we can remove all instances where we try to print canonical types for diagnostic purposes, and not have the type printer deal with that, simplifying things.

For the short term, we have canonical types, and we need the type printer to deal with them reasonably.
I agree that the way MSVC is presenting these canonicalized NTTPs of class type looks the most sensible here.

Could you describe this in more detail? I find that to be more verbose than is necessary/helpful and probably the variable name based option to be more informative than the type name based option (if I have Shape{3, 5} (because I have two int members) that's less helpful than Shape{.length = 3, .width = 5} and all the type information seems pretty large/verbose.

& given the example:

NDArray<float, Height{DimensionImpl<Height>{Dimension{0}}}, Width{DimensionImpl<Width>{Dimension{0}}}, Channels{DimensionImpl<Channels>{Dimension{0}}}>

V

NDArray<float, Height{{{0}}}, Width{{{0}}}, Channels{{{0}}}>

The former seems to not add a lot of value for the user - in this case due to the CRTP and other shenanigans. Admittedly other types might have nested aggregates that are more interesting/less obvious - but variable names still seem to me like they'd be more informative than types. (what I'm suggesting is the outermost has a type (since there's nothing else to use there, and C++ requires a type there) and nested things could use variable names (because they're more contextually relevant - differentiating a width and height, for instance, even though they might be the same type))

But by fixing 1), we have more options. We can do something like:

no known conversion from 'NDArray<[...], W (aka 'Width{}')>' to 'const NDArray<[...], H (aka 'Height{}')>' for 1st argument

Or perhaps in addition to the aka, we can generate notes which place a caret in source code for each of W and H.

That feels liable to be /really/ verbose & make it harder to find the signal in all that noise, but maybe there's ways to make that work. Certainly having the type as written gives us more choices about how to do it/see what can be made to work, for sure.

Could you describe this in more detail? I find that to be more verbose than is necessary/helpful and probably the variable name based option to be more informative than the type name based option (if I have Shape{3, 5} (because I have two int members) that's less helpful than Shape{.length = 3, .width = 5} and all the type information seems pretty large/verbose.

I agree printing all of the types of all of the subobjects is too much.

I wouldn't object to what you are suggesting, it seems that if all we had was the canonical type of the NTTP, and no idea of what would be helpful for this diagnostic, then printing it in this designated-initializer style seems more readable, for reasonably small objects.

Though if we had the as-written type, I think the best option here is just printing the expression that initializes it, it's type, and perhaps if it was simple declref, pointing to the declaration, but that is probably situational.

The former seems to not add a lot of value for the user - in this case due to the CRTP and other shenanigans. Admittedly other types might have nested aggregates that are more interesting/less obvious - but variable names still seem to me like they'd be more informative than types. (what I'm suggesting is the outermost has a type (since there's nothing else to use there, and C++ requires a type there) and nested things could use variable names (because they're more contextually relevant - differentiating a width and height, for instance, even though they might be the same type))

Yeah, that definitely looks better than all the other options, if all we had was the NTTP value.

That feels liable to be /really/ verbose & make it harder to find the signal in all that noise, but maybe there's ways to make that work. Certainly having the type as written gives us more choices about how to do it/see what can be made to work, for sure.

It seems to me that, at least for this diagnostic, and probably most others, just printing the expression that initializes the NTTP and it's type is enough.

Could you describe this in more detail? I find that to be more verbose than is necessary/helpful and probably the variable name based option to be more informative than the type name based option (if I have Shape{3, 5} (because I have two int members) that's less helpful than Shape{.length = 3, .width = 5} and all the type information seems pretty large/verbose.

I agree printing all of the types of all of the subobjects is too much.

Ah, OK.

I wouldn't object to what you are suggesting, it seems that if all we had was the canonical type of the NTTP, and no idea of what would be helpful for this diagnostic, then printing it in this designated-initializer style seems more readable, for reasonably small objects.

I think we should restrict the discussion in this review to being only about the case where we have the canonical type (acknowledging that there's broader work to reduce the number of cases where that is what we have to deal with).

When you say "for reasonably small objects" are you suggesting this patch/an nearish-term solution to the original problem presented here (t1<{}> with no info about the type of the NTTP) should include some heuristic to judge "reasonably small objects" and choose a different rendering in that case?

Though if we had the as-written type, I think the best option here is just printing the expression that initializes it, it's type, and perhaps if it was simple declref, pointing to the declaration, but that is probably situational.

Given this review's already gotten fairly muddled - hopefully we can defer the nuances of what to do when we have the type as-written for other threads/discussions that are addressing that case.

The former seems to not add a lot of value for the user - in this case due to the CRTP and other shenanigans. Admittedly other types might have nested aggregates that are more interesting/less obvious - but variable names still seem to me like they'd be more informative than types. (what I'm suggesting is the outermost has a type (since there's nothing else to use there, and C++ requires a type there) and nested things could use variable names (because they're more contextually relevant - differentiating a width and height, for instance, even though they might be the same type))

Yeah, that definitely looks better than all the other options, if all we had was the NTTP value.

That feels liable to be /really/ verbose & make it harder to find the signal in all that noise, but maybe there's ways to make that work. Certainly having the type as written gives us more choices about how to do it/see what can be made to work, for sure.

It seems to me that, at least for this diagnostic, and probably most others, just printing the expression that initializes the NTTP and it's type is enough.

OK, I'm not sure what to make of this, though - "the expression that initializes the NTTP" is something we don't have, right? Just the value, I think, at this point? So we're trying to pick some kind of rendering for that value - is it just the outer type and then braces and values (Using the "bad"/difficult case of nested subobjects - t1{{3}, {4}}) or with nested types (t1{t2{3}, t3{4]}) or with member names (t1{.v1 = {.v2 = 3}, .v3 = {.v4 = 4}}) or both (t1{.v1 = t2{.v2 = 3}, .v3 = t3{.v4 = 4}}).

I guess you meant the first, t1{{3}, {4}}? OK, so I think you're agreeing with what I would prefer/outlined in https://reviews.llvm.org/D134453#3871251 (which isn't where this review was going when @aaron.ballman roped you in, it was going towards fully explicit (t1{t2{3}, t3{4}}))

I think we should restrict the discussion in this review to being only about the case where we have the canonical type (acknowledging that there's broader work to reduce the number of cases where that is what we have to deal with).

I agree that we can leave that aside for this patch.

When you say "for reasonably small objects" are you suggesting this patch/an nearish-term solution to the original problem presented here (t1<{}> with no info about the type of the NTTP) should include some heuristic to judge "reasonably small objects" and choose a different rendering in that case?

I don't think we necessarily have to do something about that from the start, but it's probably a problem that is going to come up at some point.
We are trying to render the whole structure of a class type in one line of diagnostic, seems like this breaks new ground.

OK, I'm not sure what to make of this, though - "the expression that initializes the NTTP" is something we don't have, right? Just the value, I think, at this point? So we're trying to pick some kind of rendering for that value - is it just the outer type and then braces and values (Using the "bad"/difficult case of nested subobjects - t1{{3}, {4}}) or with nested types (t1{t2{3}, t3{4]}) or with member names (t1{.v1 = {.v2 = 3}, .v3 = {.v4 = 4}}) or both (t1{.v1 = t2{.v2 = 3}, .v3 = t3{.v4 = 4}}).

I guess you meant the first, t1{{3}, {4}}? OK, so I think you're agreeing with what I would prefer/outlined in https://reviews.llvm.org/D134453#3871251 (which isn't where this review was going when @aaron.ballman roped you in, it was going towards fully explicit (t1{t2{3}, t3{4}}))

I am a bit torn actually, something like t1{.name = "Bar"} does look better ideally, but it's probably more prone to bad corner cases. Will probably not work well for tuples.

The second option, t1{{3}, {4}}, looks more well-rounded.

The fully explicit t1{t2{3}, t3{4}} option seems like it would be both less readable and also even more prone to bad corner cases than the first (field names) option.

I think we should restrict the discussion in this review to being only about the case where we have the canonical type (acknowledging that there's broader work to reduce the number of cases where that is what we have to deal with).

I agree that we can leave that aside for this patch.

+1 :-)

When you say "for reasonably small objects" are you suggesting this patch/an nearish-term solution to the original problem presented here (t1<{}> with no info about the type of the NTTP) should include some heuristic to judge "reasonably small objects" and choose a different rendering in that case?

I don't think we necessarily have to do something about that from the start, but it's probably a problem that is going to come up at some point.
We are trying to render the whole structure of a class type in one line of diagnostic, seems like this breaks new ground.

OK, I'm not sure what to make of this, though - "the expression that initializes the NTTP" is something we don't have, right? Just the value, I think, at this point? So we're trying to pick some kind of rendering for that value - is it just the outer type and then braces and values (Using the "bad"/difficult case of nested subobjects - t1{{3}, {4}}) or with nested types (t1{t2{3}, t3{4]}) or with member names (t1{.v1 = {.v2 = 3}, .v3 = {.v4 = 4}}) or both (t1{.v1 = t2{.v2 = 3}, .v3 = t3{.v4 = 4}}).

I guess you meant the first, t1{{3}, {4}}? OK, so I think you're agreeing with what I would prefer/outlined in https://reviews.llvm.org/D134453#3871251 (which isn't where this review was going when @aaron.ballman roped you in, it was going towards fully explicit (t1{t2{3}, t3{4}}))

I am a bit torn actually, something like t1{.name = "Bar"} does look better ideally, but it's probably more prone to bad corner cases. Will probably not work well for tuples.

I also think this form will get chatty in practice. It's not that uncommon for real world code to have quite a few members in a structure, so I could see t1{.name = "Bar"} being reasonable while t1{.name = "Bar", .age = 183, .height = -12, .weight = 1.2f, .eye_color = Eyes::Chartreuse, .smell = Smells::LikeTeenSpirit } being a far less reasonable thing to print. That said, we could perhaps have a cut point so we print with the member names if there's < N of them and print without member names otherwise. But I'm not certain it's worth it.

The second option, t1{{3}, {4}}, looks more well-rounded.

I can live with this approach if folks think it's the better way to go. But my concern still remains that the user has no idea what's being constructed by {3} or {4} without tracking down which constructor of t1 is being called. Because of overloads, std::initializer_list, and conversion operators (and the fact that C++ initialization rules are a bit of a disaster), I think this can be unobvious sometimes without including type names. That said, after playing around a bit, I think those times might involve more contorted code than others are worried about, and so maybe it's not worth worrying about those cases. The fact that Clang is the only C++ compiler that doesn't print full type information makes me wary though. My thinking is that it's always more frustrating to have not enough information in a diagnostic than too much information (both are problems in their own ways though).

The fully explicit t1{t2{3}, t3{4}} option seems like it would be both less readable and also even more prone to bad corner cases than the first (field names) option.

I can understand it being less readable for deep nesting situations, but I do not see why it would be prone to bad corner cases -- it's the most explicit form we can write (and matches the behavior of all other C++ compilers, from what we can tell).

My thinking is that it's always more frustrating to have not enough information in a diagnostic than too much information (both are problems in their own ways though).
...
I can understand it being less readable for deep nesting situations, but I do not see why it would be prone to bad corner cases -- it's the most explicit form we can write (and matches the behavior of all other C++ compilers, from what we can tell).

I agree with @aaron.ballman. There is no ideal solution here, and I would too prefer to have too much information rather than not enough. GCC and MSVC are quite fine with printing full type names and it's up to IDE's and good editors and CI log parsers to parse that and present in a possibly better format.

If others would agree with that as well, I'll be happy to remove the policy from my PR completely and simply make clang print full explicit types always, just like GCC and MSVC do.

I can live with this approach if folks think it's the better way to go. But my concern still remains that the user has no idea what's being constructed by {3} or {4} without tracking down which constructor of t1 is being called. Because of overloads, std::initializer_list, and conversion operators (and the fact that C++ initialization rules are a bit of a disaster), I think this can be unobvious sometimes without including type names. That said, after playing around a bit, I think those times might involve more contorted code than others are worried about, and so maybe it's not worth worrying about those cases. The fact that Clang is the only C++ compiler that doesn't print full type information makes me wary though. My thinking is that it's always more frustrating to have not enough information in a diagnostic than too much information (both are problems in their own ways though).

I think the problem here is that since we don't have the as-written argument, we just have the converted argument which is an APValue with the state of the object.
I don't think we can in general, starting from the state, figure out which constructor would produce that state.

So we might print it with a syntax as if the class was a simple pod type, even though that might produce a completely different result if used in practice.
I don't think we can do much better without having the actual expression used.

That might be a point in favor of the designated-initializer style with the field names, that should never produce the wrong result if copy-pasted back to code.

I can understand it being less readable for deep nesting situations, but I do not see why it would be prone to bad corner cases -- it's the most explicit form we can write (and matches the behavior of all other C++ compilers, from what we can tell).

The types of the fields might be internal, or otherwise too distant from user code.

it's up to IDE's and good editors and CI log parsers to parse that and present in a possibly better format.

Lots of folks use the diagnostic info from their compiler without IDEs or log parsers doing anything else to them - we should be trying to design these things for usability directly. (SARIF and such are probably the direction to go when producing diagnostics intended to be massaged by tooling in various ways - could provide collapsed views of template type names or these struct literals - comparitive highlighting views (though we do some of that for template type diffing in clang today - maybe one day we'll do that/expand it to cover the struct literals too))

So we might print it with a syntax as if the class was a simple pod type, even though that might produce a completely different result if used in practice.
I don't think we can do much better without having the actual expression used.

That might be a point in favor of the designated-initializer style with the field names, that should never produce the wrong result if copy-pasted back to code.

Oh, yeah, I hadn't considered that. :/ that is awkward & something I would certainly place some value in avoiding (was one of the issues I was motivated by in the earlier discussion, that it'd be useful if we produced an identifier that also happened to be valid code - but I wasn't considering the case where we might produce something valid-but-incorrect, just valid being better than invalid - but yeah, invalid being better than valid-but-incorrect).

it's up to IDE's and good editors and CI log parsers to parse that and present in a possibly better format.

Lots of folks use the diagnostic info from their compiler without IDEs or log parsers doing anything else to them - we should be trying to design these things for usability directly. (SARIF and such are probably the direction to go when producing diagnostics intended to be massaged by tooling in various ways - could provide collapsed views of template type names or these struct literals - comparitive highlighting views (though we do some of that for template type diffing in clang today - maybe one day we'll do that/expand it to cover the struct literals too))

+1 to the point that we want to design our diagnostics to be usable without requiring another tool, and +1 to using SARIF for when we want to pass our diagnostics to another tool which modifies them somehow.

So we might print it with a syntax as if the class was a simple pod type, even though that might produce a completely different result if used in practice.
I don't think we can do much better without having the actual expression used.

That might be a point in favor of the designated-initializer style with the field names, that should never produce the wrong result if copy-pasted back to code.

Oh, yeah, I hadn't considered that. :/ that is awkward & something I would certainly place some value in avoiding (was one of the issues I was motivated by in the earlier discussion, that it'd be useful if we produced an identifier that also happened to be valid code - but I wasn't considering the case where we might produce something valid-but-incorrect, just valid being better than invalid - but yeah, invalid being better than valid-but-incorrect).

Yeah, I would place value in avoiding that outcome as well. Also, the point by @mizvekov that the types may be internal types is kind of compelling to me as well; I was thinking that the types involved are most often going to be ones users are aware of, but that's certainly not true for highly generic code like STL implementations.

In the interests of getting this review unstuck, I think we should 1) remove the printing policy option (which should address the concerns from @dblaikie about use of that policy for introspection purposes); the policy is something that @DoDoENT can carry in their downstream, 2) keep the changes in TemplateBase.cpp and drop the changes from APValue.cpp.

That gives us an incremental improvement over what the status quo is, and we can debate other improvements (if we can think of any we'd like to make) separately. Is everyone comfortable with this?

So we might print it with a syntax as if the class was a simple pod type, even though that might produce a completely different result if used in practice.
I don't think we can do much better without having the actual expression used.

That might be a point in favor of the designated-initializer style with the field names, that should never produce the wrong result if copy-pasted back to code.

Oh, yeah, I hadn't considered that. :/ that is awkward & something I would certainly place some value in avoiding (was one of the issues I was motivated by in the earlier discussion, that it'd be useful if we produced an identifier that also happened to be valid code - but I wasn't considering the case where we might produce something valid-but-incorrect, just valid being better than invalid - but yeah, invalid being better than valid-but-incorrect).

Yeah, I would place value in avoiding that outcome as well. Also, the point by @mizvekov that the types may be internal types is kind of compelling to me as well; I was thinking that the types involved are most often going to be ones users are aware of, but that's certainly not true for highly generic code like STL implementations.

In the interests of getting this review unstuck, I think we should 1) remove the printing policy option (which should address the concerns from @dblaikie about use of that policy for introspection purposes); the policy is something that @DoDoENT can carry in their downstream, 2) keep the changes in TemplateBase.cpp and drop the changes from APValue.cpp.

That gives us an incremental improvement over what the status quo is, and we can debate other improvements (if we can think of any we'd like to make) separately. Is everyone comfortable with this?

This would have the effect of implementing (2) from here ( https://reviews.llvm.org/D134453#3869610 )? I'm good with that as an incremental improvement.

DoDoENT updated this revision to Diff 471081.Oct 27 2022, 2:11 AM
DoDoENT edited the summary of this revision. (Show Details)

Removed the policy and full explicit type printing behind it and keep only printing of first type name, as suggested by the review.

This is essentially (2) from here: https://reviews.llvm.org/D134453#3869610

DoDoENT updated this revision to Diff 471083.Oct 27 2022, 2:14 AM

Restored trailing whitespace on untouched line that was automatically deleted by editor's save action.

Thanks! The changes are generally looking good to me. Can you add a release note about the improvement to our diagnostic behavior (in clang/docs/ReleaseNotes.rst -- there's a section for diagnostic improvements) and can you rebase the patch on main so that precommit CI testing can run?

DoDoENT updated this revision to Diff 471145.Oct 27 2022, 6:02 AM

Updated clang release notes about improvement to diagnostics as requested by reviewers and added more vim-specific temp files to .gitignore.

aaron.ballman accepted this revision.Oct 27 2022, 6:23 AM

LGTM assuming precommit CI comes back green. Do you need someone to land these changes on your behalf? If so, please let me know what name and email address you'd like used for patch attribution.

.gitignore
23–26 ↗(On Diff #471145)

I don't have any particular problems with this change, but it should be separate from this patch (it's logically distinct from the other changes here). I can handle fixing that up when landing the changes if we go that route (I'll split it into two commits).

This revision is now accepted and ready to land.Oct 27 2022, 6:23 AM

Do you need someone to land these changes on your behalf? If so, please let me know what name and email address you'd like used for patch attribution.

Yes. My name is Nenad Mikša, e-mail address: dodoentertainment@gmail.com

.gitignore
23–26 ↗(On Diff #471145)

Should I put that in the separate PR then?

If you can handle this by splitting it into two commits, please do so.

DoDoENT updated this revision to Diff 471174.Oct 27 2022, 8:29 AM

updated new tests that appeared after rebasing to main and removed vim-specific files from .gitignore (I'll keep that downstream).

DoDoENT updated this revision to Diff 471204.Oct 27 2022, 10:03 AM

apply git-clang-format to changed files

This revision was automatically updated to reflect the committed changes.