This is an archive of the discontinued LLVM Phabricator instance.

[Index/USRGeneration] Use NamedDecl::getDeclName() instead of NamedDecl::printName in USRGenerator::EmitDeclName
Needs ReviewPublic

Authored by riccibruno on Jul 26 2020, 9:49 AM.

Details

Summary

NamedDecl::printName is used as a customisation point for getNameForDiagnostic. The default implementation is just sending the DeclarationName into the output stream, but it can be overloaded to display a more user-friendly name for some entities.

Currently only DecompositionDecl and MSGuidDecl have an overload (and they were both added after this code was written) but I will be adding overloads to VarDecl, FieldDecl, EnumDecl and RecordDecl to better handle unnamed entities.

To preserve the current behaviour (that is except for DecompositionDecl and MSGuidDecl which I don't think are relevant here?), just send the DeclarationName into the output instead of going through NamedDecl::printName.

Diff Detail

Event Timeline

riccibruno created this revision.Jul 26 2020, 9:49 AM

Update a comment I originally missed.

The overloads of NamedDecl::printName are in D84658.

I can see how this can cause problems with current implementation, but I don't think the proposed solution in here is also solving these problems.

For example, MSGuidDecl's DeclName is actually empty (it is the printName overload that makes it meaningful). Hence when generating the USR
for two guid decls coming from similar contexts (there are other factors like namespace etc.) the code might end up producing the same output for both.

Generating the same USR for two different symbols might mess up lots of functionality in clangd's (and possibly other tools') index. I am not aware of
all the signals going into the USR generation and what is the importance of Decl name itself, maybe there should be a getUniqueName which defaults to
getDeclName, and can be overloaded for cases like MSGuidDecl. But as mentioned I don't understand this code well enough to make any concrete
suggestions so I would wait for an input from other reviewers.

(Disclaimer: I am not at all familiar with this code)

A few notes in no particular order:
Some entities with special names should presumably be mangled, but are currently not. Example:

namespace special_names {
struct S { operator int(); };
int operator""_udl(const char *);
template <typename T> struct ST {};
template <typename T> ST(T) -> ST<int>;
}

results in:

// CHECK: usrs.cpp c:@N@special_names Extent=[120:1 - 127:2]
// CHECK: usrs.cpp c:@N@special_names@S@S Extent=[121:1 - 123:2]
// CHECK: usrs.cpp c:@N@special_names@S@S@F@operator int# Extent=[122:3 - 122:17]
// CHECK: usrs.cpp c:@N@special_names@F@operator""_udl#*1C# Extent=[124:1 - 124:33]
// CHECK: usrs.cpp c:@N@special_names@ST>1#T@ST Extent=[125:1 - 125:35]
// CHECK: usrs.cpp c:usrs.cpp@2114 Extent=[125:11 - 125:21]
// CHECK: usrs.cpp c:@N@special_names@FT@>1#T<deduction guide for ST>#t0.0#$@N@special_names@S@ST>#I# Extent=[126:1 - 126:38]
// CHECK: usrs.cpp c:usrs.cpp@2149 Extent=[126:10 - 126:20]

Some unnamed entities are already handled. In particular:

  • Unnamed bit-fields are ignored (VisitFieldDecl). But note that the comment is wrong; the fields in a lambda class are unnamed. Should they be mangled?
  • Anonymous namespaces are handled (VisitNamespaceDecl).
  • The name of template parameters is not included in the mangling (VisitTemplateParameterList) so unnamed template parameters are already indirectly handled.
  • For anonymous records, no USR is generated for the unnamed variable but an USR is generated for the anonymous record. However it can ambiguous in some cases:
namespace anonymous_records {
static union { int i0; };
static union { int i1; };
struct S { union { int j; }; };
struct T { struct { int k; }; };
}

results in:

// CHECK: usrs.cpp c:@N@anonymous_records Extent=[136:1 - 141:2]
// CHECK: usrs.cpp c:usrs.cpp@N@anonymous_records@Ua Extent=[137:8 - 137:25]
// CHECK: usrs.cpp c:usrs.cpp@N@anonymous_records@Ua@FI@i0 Extent=[137:16 - 137:22]
// CHECK: usrs.cpp c:usrs.cpp@N@anonymous_records@Ua Extent=[138:8 - 138:25]
// CHECK: usrs.cpp c:usrs.cpp@N@anonymous_records@Ua@FI@i1 Extent=[138:16 - 138:22]
// CHECK: usrs.cpp c:@N@anonymous_records@S@S Extent=[139:1 - 139:31]
// CHECK: usrs.cpp c:@N@anonymous_records@S@S@Ua Extent=[139:12 - 139:28]
// CHECK: usrs.cpp c:@N@anonymous_records@S@S@Ua@FI@j Extent=[139:20 - 139:25]
// CHECK: usrs.cpp c:@N@anonymous_records@S@T Extent=[140:1 - 140:32]
// CHECK: usrs.cpp c:@N@anonymous_records@S@T@Sa Extent=[140:12 - 140:29]
// CHECK: usrs.cpp c:@N@anonymous_records@S@T@Sa@FI@k Extent=[140:21 - 140:26]
  • For unnamed enumerations, the name of the first enumerator is used.
  • No USRs are currently generated for MSGuidDecl and DecompositionDecl anyway since they are not visited:
namespace decomposition_decl {
struct S { int i = 0; };
void Test() { auto [x] = S(); } 
}

results in:

// CHECK: usrs.cpp c:@N@decomposition_decl Extent=[150:1 - 155:2]
// CHECK: usrs.cpp c:@N@decomposition_decl@S@S Extent=[151:1 - 151:24]
// CHECK: usrs.cpp c:@N@decomposition_decl@S@S@FI@i Extent=[151:12 - 151:21]
// CHECK: usrs.cpp c:@N@decomposition_decl@F@Test# Extent=[152:1 - 154:2]
// CHECK: usrs.cpp c:usrs.cpp@2610@N@decomposition_decl@F@Test#@x Extent=[153:9 - 153:10]

Conclusion: I think that USR generation should not depend in general on the pretty-printed name of entities. Therefore something like the following should be done:

  • Instead of using NamedDecl::printName in EmitDeclName, use NamedDecl::getName for simple names and handle separately each kind of special names.
  • For unnamed entities for which an USR should be generated, implement the corresponding Visit*. The Visit* function should not depend on the pretty-printed name.

Thoughts?

riccibruno retitled this revision from [clang-index] Use NamedDecl::getDeclName() instead of NamedDecl::printName in USRGenerator::EmitDeclName to [Index/USRGeneration] Use NamedDecl::getDeclName() instead of NamedDecl::printName in USRGenerator::EmitDeclName.Aug 7 2020, 12:37 PM

... or if a string such as (unnamed struct at /path/to/input.cc:1:3) is suitable in an USR then printName can be used.

Friendly ping on this patch; the patches depending on this patch (D84658 and D85033 on Phab + others not uploaded yet) significantly improve the handling of unnamed entities in diagnostics.

To make this patch more acceptable I could also add a Visit function for DecompositionDecl and MSGuidDecl such that the current behaviour is preserved (I won't be able to test it though since these implicit AST nodes are not visited).

jkorous resigned from this revision.Jan 21 2022, 3:55 PM