This is an archive of the discontinued LLVM Phabricator instance.

Fix type name generation in DWARF for template instantiations with enum types and template specializations
Needs ReviewPublic

Authored by xgsa on Nov 3 2017, 4:15 PM.

Details

Summary

Currently, there are a few cases, when clang generates type names differently for template instantiations in DWARF and during symbols mangling. Thus debugger (lldb, in particular) is unable to resolve to resolve the real type using RTTI. Consider an example:

enum class TagType : bool
{
    Tag1
};

struct I
{
    virtual ~I() = default;
};

template <TagType Tag>
struct Impl : public I
{
private:
    int v = 123;    
};

int main(int argc, const char * argv[]) {
    Impl<TagType::Tag1> impl;
    I& i = impl;
    return 0;  // [*]
}

For such code clang generates type name "Impl<TagType::Tag1>" in DWARF and "Impl<(TagType)0>" when mangling symbols.
This leads to the following issue in debugger (let's say the debugger is stopped at point [*]):

  1. The "i" variable is of reference type and "I" is of polymorphic type;
  2. The debugger tries to resolve RTTI record for "I" and get the real type name from it;
  3. The real type of the "i" variable is "Impl<(TagType)0>" (because it was generated with clang mangling mechanism);
  4. The debugger tries to find the "Impl<(TagType)0>" type in DWARF and fails, because in DWARF it is named "Impl<TagType::Tag1>";

Thus the "v" member of "i" is not shown and "i" has the "I&" type, but not "Impl<TagType::Tag1>&".

There is one more case even without enums:

struct I 
{
  virtual ~I(){}
};

template <int Tag>
struct Impl : public I
{
        int v = 123;
};

template <>
struct Impl<1+1+1> : public I  // Note the expression used for this specialization
{
        int v = 124;
};

template <class T>
struct TT {
  I* i = new T();
};

int main(int argc, const char * argv[]) {
    TT<Impl<3>> tt;
    return 0;  // [*]
}

For such code clang generates type name "Impl<1+1+1>" into DWARF and "Impl<3>" when mangling symbols, so similarly at point [*] the "tt.i" won't be shown properly because of the same reasons. BTW, "Impl<1+1+1>" is generated in CGDebugInfo::getClassName(), where RD->getNameForDiagnostic() is called for template specializations.

This patch fixes the described issues, but has the following drawback: after the fix, the template instantiations with enums are generated in format "TemplateType<EnumType(value)>", that is less native than previously "TemplateType<EnumType::Item>". This could be visible for user when working with the debugger (the type in format "TemplateType<EnumType::Item>" won't be resolved to a known type). Is it OK or are there other ideas how to fix the issues, described above?

Diff Detail

Repository
rL LLVM

Event Timeline

xgsa created this revision.Nov 3 2017, 4:15 PM
aprantl edited edge metadata.

Can you add a testcase?

aprantl added inline comments.Nov 3 2017, 4:22 PM
include/clang/AST/PrettyPrinter.h
68

this change looks like it has the potential to break existing code.

227

Te \brief is redundant and can be omitted.

xgsa added a comment.Nov 4 2017, 2:04 AM

Can you add a testcase?

Definitely, I just wanted to know if such fix is correct at all. Moreover, could you please help me with the correct place for a test case? Possibly, there are some similar test case I can look at, it would be very helpful.

xgsa added inline comments.Nov 4 2017, 2:15 AM
include/clang/AST/PrettyPrinter.h
68

If not to change the size of this field, the overall size of the PrintingPolicy will exceed 32 bits, so it won't fit a CPU register on 32-bit systems and will be less lightweight. Is it OK, should this line to be reverted?

227

I reviewed the other descriptions once again and I suppose it would be more consistent to have "\brief Use formatting compatible with ABI specification." and the rest of the description as a separate paragraph. Don't you mind against such fix?

aprantl added inline comments.Nov 4 2017, 10:07 AM
include/clang/AST/PrettyPrinter.h
227

This is obviously not super important, but since you asked: our policy for how we use Doxygen has evolved over time, and this file represents an older version of that policy. The right thing to do would be to first commit a cleanup patch that removes all existing \brief occurrences from the file and then land the patch.

xgsa added inline comments.Nov 4 2017, 3:10 PM
include/clang/AST/PrettyPrinter.h
227

Thank you for clarification, I don't mind doing the right thing: https://reviews.llvm.org/D39633

Still, I don't have rights to commit the patch, so somebody should commit it anyway.

For clarification: what is the "symbols table" you are referring to in the description?

xgsa added a comment.Nov 12 2017, 12:52 PM

For clarification: what is the "symbols table" you are referring to in the description?

I meant the data dumped with "nm -an ./test".

By the way, I haven't abandoned the patch, I have found one more case when my fix doesn't work and I am working on improvement. Nevertheless, it would be helpful to get answers to the questions in this review (about changing the "Indentation" field and about the test).

xgsa updated this revision to Diff 126826.Dec 13 2017, 2:14 PM
xgsa retitled this revision from Fix type debug information generation for enum-based template specialization to Fix type name generation in DWARF for template instantiations with enum types and template specializations.
xgsa edited the summary of this revision. (Show Details)

One more case was handled, review comments were applied, but no tests though, because I still not sure if the approach I have chosen is correct.

xgsa marked 6 inline comments as done.Dec 13 2017, 2:15 PM
xgsa edited the summary of this revision. (Show Details)Dec 13 2017, 2:23 PM

Philosophically, mangled names and DWARF information serve different purposes, and I don't think you will find one true solution where both of them can yield the same name that everyone will be happy with. Mangled names exist to provide unique and reproducible identifiers for the "same" entity across compilation units. They are carefully specified (for example) to allow a linker to associate a reference in one object file to a definition in a different object file, and be guaranteed that the association is correct. A demangled name is a necessarily context-free translation of the mangled name into something that has a closer relationship to how a human would think of or write the name of the thing, but isn't necessarily the only way to write the name of the thing.

DWARF names are (deliberately not carefully specified) strings that ought to bear some relationship to how source code would name the thing, but you probably don't want to attach semantic significance to those names. This is rather emphatically true for names containing template parameters. Typedefs (and their recent offspring, 'using' aliases) are your sworn enemy here. Enums, as you have found, are also a problem.

Basically, the type of an entity does not have a unique name, and trying to coerce different representations of the type into having the same unique name is a losing battle.

xgsa added a comment.EditedDec 14 2017, 2:21 AM

Philosophically, mangled names and DWARF information serve different purposes, and I don't think you will find one true solution where both of them can yield the same name that everyone will be happy with. Mangled names exist to provide unique and reproducible identifiers for the "same" entity across compilation units. They are carefully specified (for example) to allow a linker to associate a reference in one object file to a definition in a different object file, and be guaranteed that the association is correct. A demangled name is a necessarily context-free translation of the mangled name into something that has a closer relationship to how a human would think of or write the name of the thing, but isn't necessarily the only way to write the name of the thing.

DWARF names are (deliberately not carefully specified) strings that ought to bear some relationship to how source code would name the thing, but you probably don't want to attach semantic significance to those names. This is rather emphatically true for names containing template parameters. Typedefs (and their recent offspring, 'using' aliases) are your sworn enemy here. Enums, as you have found, are also a problem.

Basically, the type of an entity does not have a unique name, and trying to coerce different representations of the type into having the same unique name is a losing battle.

Thank you for clarification, Paul! Nevertheless, I suppose, showing actual type of a dynamic variable is very important for the projects, where RTTI is used. Moreover, it works properly in gcc+gdb pair, so I am extremely interested in fixing it in clang+lldb.

I understand that the suggested solution possibly does not cover all the cases, but it improves the situation and actually covers all the cases found by me (I have just rechecked -- typedefs/usings seems to work fine when displaying the real type of variable). If more cases are found in future, they could be fixed similarly too. Moreover, the debuggers already rely on the fact that the type name looks the same in RTTI and DWARF, and I suppose they have no choice, because there is no other source of information for them (or am I missing something?). Another advantage of this solution is that it doesn't require any format extension and will probably work out of the box in gdb and other debuggers. Moreover, I have just rechecked, gcc generates exactly the same type names in DWARF for examples in the description. Furthermore, DWARF Best Practices recommend "For C++, the string should match that produced by the target platform's canonical demangler" [1].

On the other hand, I understand the idea you have described, but I am not sure how to implement this lookup in another way. I suppose, we cannot extend RTTI with the debug type name (is it correct?). Thus, the only way I see is to add additional information about the mangled type name into DWARF. It could be either a separate section (like apple_types) or a special node for TAG_structure_type/TAG_class_type, which should be indexed into map for fast lookup. Anyway, this will be an extension to DWARF and will require special support in a debugger. Furthermore, such solution will be much complicated (still I don't mind working on it).

So what do you think? Is the suggested solution not full or not acceptable? Do you have other ideas how this feature should be implemented?

P.S. Should this question be raised in mailing list? And if yes, actually, in which ones (clang or lldb?), because it seems related to both clang and lldb?

[1] - http://wiki.dwarfstd.org/index.php?title=Best_Practices#Names_of_Program_Entities