This is an archive of the discontinued LLVM Phabricator instance.

Differential D39622

Fix type name generation in DWARF for template instantiations with enum types and template specializations
Needs Review · Public

Authored by xgsa on Nov 3 2017, 4:15 PM.

Details

Reviewers

echristo
aprantl
dblaikie

Summary

Currently, there are a few cases where clang generates type names for template instantiations differently in DWARF than during symbol mangling. As a result, the debugger (lldb, in particular) is unable to resolve the real type using RTTI. Consider an example:

enum class TagType : bool
{
    Tag1
};

struct I
{
    virtual ~I() = default;
};

template <TagType Tag>
struct Impl : public I
{
private:
    int v = 123;    
};

int main(int argc, const char * argv[]) {
    Impl<TagType::Tag1> impl;
    I& i = impl;
    return 0;  // [*]
}

For such code, clang generates the type name "Impl<TagType::Tag1>" in DWARF but "Impl<(TagType)0>" when mangling symbols.
This leads to the following issue in the debugger (say the debugger is stopped at point [*]):

1. The "i" variable is of reference type, and "I" is a polymorphic type.
2. The debugger tries to resolve the RTTI record for "I" and get the real type name from it.
3. The real type of the "i" variable is "Impl<(TagType)0>" (because the RTTI name is produced by clang's mangling mechanism).
4. The debugger tries to find the "Impl<(TagType)0>" type in DWARF and fails, because in DWARF it is named "Impl<TagType::Tag1>".

As a result, the "v" member of "i" is not shown, and "i" is displayed with the type "I&" instead of "Impl<TagType::Tag1>&".
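The mismatch can be observed without a debugger. The following is a minimal sketch that assumes the Itanium C++ ABI (GCC or Clang on Linux or macOS) and its <cxxabi.h> header. It prints the demangled RTTI name of the object behind a reference to I, which is the string the debugger starts from. The helper dynamicTypeName is illustrative and is not part of the patch.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // Itanium C++ ABI demangler (assumed available)
#include <memory>
#include <string>
#include <typeinfo>

enum class TagType : bool { Tag1 };

struct I {
  virtual ~I() = default;
};

template <TagType Tag>
struct Impl : public I {
  int v = 123;
};

// Releases the buffer allocated by abi::__cxa_demangle.
struct FreeDeleter {
  void operator()(char *p) const { std::free(p); }
};

// Hypothetical helper: returns the demangled RTTI name of the dynamic type
// of `ref`. Falls back to the raw mangled name if demangling fails.
std::string dynamicTypeName(const I &ref) {
  const char *mangled = typeid(ref).name();  // I is polymorphic: dynamic type
  int status = 0;
  std::unique_ptr<char, FreeDeleter> demangled(
      abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
  return status == 0 ? std::string(demangled.get()) : std::string(mangled);
}
```

For the first example, dynamicTypeName(i) yields "Impl<(TagType)0>". This is the string that fails to match the DWARF name "Impl<TagType::Tag1>".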

There is one more case, which does not involve enums at all:

struct I 
{
  virtual ~I(){}
};

template <int Tag>
struct Impl : public I
{
        int v = 123;
};

template <>
struct Impl<1+1+1> : public I  // Note the expression used for this specialization
{
        int v = 124;
};

template <class T>
struct TT {
  I* i = new T();
};

int main(int argc, const char * argv[]) {
    TT<Impl<3>> tt;
    return 0;  // [*]
}

For such code, clang emits the type name "Impl<1+1+1>" into DWARF but "Impl<3>" when mangling symbols. At point [*], "tt.i" is therefore likewise not shown properly, for the same reason. Incidentally, "Impl<1+1+1>" is generated in CGDebugInfo::getClassName(), where RD->getNameForDiagnostic() is called for template specializations.
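A similar sketch for the second example makes the same Itanium C++ ABI assumption. The RTTI name records only the evaluated template argument, so the specialization spelled Impl<1+1+1> demangles as Impl<3>. The demangle helper is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // Itanium C++ ABI demangler (assumed available)
#include <memory>
#include <string>
#include <typeinfo>

struct I {
  virtual ~I() {}
};

template <int Tag>
struct Impl : public I {
  int v = 123;
};

// Written with an expression; only its value (3) becomes part of the type
// and therefore of the mangled RTTI name.
template <>
struct Impl<1 + 1 + 1> : public I {
  int v = 124;
};

struct FreeDeleter {
  void operator()(char *p) const { std::free(p); }
};

// Hypothetical helper: demangles a type_info::name() string and returns it
// unchanged if demangling fails.
std::string demangle(const char *mangled) {
  int status = 0;
  std::unique_ptr<char, FreeDeleter> out(
      abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
  return status == 0 ? std::string(out.get()) : std::string(mangled);
}
```

With I *i = new Impl<3>();, the call demangle(typeid(*i).name()) yields "Impl<3>". The pre-patch DWARF name, by contrast, is "Impl<1+1+1>".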

This patch fixes the described issues but has a drawback. After the fix, template instantiations with enum arguments are emitted in the format "TemplateType<(EnumType)value>", which is less natural than the previous "TemplateType<EnumType::Item>". Users may notice this when working with the debugger: a type written as "TemplateType<EnumType::Item>" will no longer resolve to a known type. Is this acceptable, or are there other ideas for fixing the issues described above?
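To make the trade-off concrete, here is a simplified model of the two output modes that the patch gives printIntegral() in lib/AST/TemplateBase.cpp. EnumInfo, formatEnumArg and formatTemplateName are hypothetical stand-ins, not clang APIs. The real code works on EnumDecl and llvm::APSInt.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an enum declaration (not a clang API):
// the enum's name plus its (enumerator, value) pairs.
struct EnumInfo {
  std::string name;
  std::vector<std::pair<std::string, long long>> enumerators;
};

// Models the two branches for an enum-typed template argument.
std::string formatEnumArg(const EnumInfo &e, long long value,
                          bool abiCompatible) {
  if (!abiCompatible) {
    // Previous behavior: name the enumerator that has this value.
    for (const auto &entry : e.enumerators)
      if (entry.second == value)
        return e.name + "::" + entry.first;
  }
  // ABI-compatible branch: "(Enum)value", the demangler's spelling.
  // This sketch also uses it when no enumerator matches.
  return "(" + e.name + ")" + std::to_string(value);
}

std::string formatTemplateName(const std::string &tmpl, const EnumInfo &e,
                               long long value, bool abiCompatible) {
  return tmpl + "<" + formatEnumArg(e, value, abiCompatible) + ">";
}
```

For TagType with Tag1 = 0, formatTemplateName("Impl", tag, 0, false) gives "Impl<TagType::Tag1>", while passing true gives "Impl<(TagType)0>". This is the less natural but RTTI-matching spelling.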

Event Timeline

xgsa created this revision. Nov 3 2017, 4:15 PM

Herald added a subscriber: aprantl. Nov 3 2017, 4:15 PM

xgsa added a reviewer: echristo. Nov 3 2017, 4:19 PM

Can you add a testcase?

aprantl added inline comments. Nov 3 2017, 4:22 PM

include/clang/AST/PrettyPrinter.h
68	This change looks like it has the potential to break existing code.
227	The \brief is redundant and can be omitted.

In D39622#915722, @aprantl wrote:

Can you add a testcase?

Definitely; I just wanted to know first whether such a fix is correct at all. Could you also point me to the right place for a test case? If there is a similar existing test I could look at, that would be very helpful.

xgsa added inline comments. Nov 4 2017, 2:15 AM

include/clang/AST/PrettyPrinter.h
68	If the size of this field is not changed, the overall size of PrintingPolicy will exceed 32 bits, so it won't fit in a CPU register on 32-bit systems and will be less lightweight. Is that OK, or should this line be reverted?
227	I reviewed the other descriptions once again, and I think it would be more consistent to have "\brief Use formatting compatible with ABI specification." with the rest of the description as a separate paragraph. Would you mind such a fix?
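The size argument can be illustrated with toy structs; these are not the real PrintingPolicy, and the field widths are assumptions chosen for illustration. Bit-field layout is implementation-defined. On the common Itanium and MSVC ABIs, the first and third structs below typically occupy 4 bytes and the second 8. So one more flag stays within 32 bits only if another field, such as Indentation, gives up a bit.

```cpp
#include <cassert>

// Toy layouts (hypothetical). A single unsigned bit-field stands in for a
// run of one-bit flags.
struct Policy32 {
  unsigned Indentation : 8;
  unsigned Flags : 24;  // 8 + 24 = 32 bits
};

struct Policy33 {
  unsigned Indentation : 8;
  unsigned Flags : 25;  // one more flag: 33 bits no longer fit in one unit
};

struct Policy32Narrow {
  unsigned Indentation : 7;  // one bit narrower...
  unsigned Flags : 25;       // ...leaves room for the extra flag: 32 bits
};
```

Only the relative sizes are meaningful here. The absolute values depend on the target ABI.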

aprantl added inline comments. Nov 4 2017, 10:07 AM

include/clang/AST/PrettyPrinter.h
227	This is obviously not super important, but since you asked: our policy for how we use Doxygen has evolved over time, and this file represents an older version of that policy. The right thing to do would be to first commit a cleanup patch that removes all existing `\brief` occurrences from the file and then land the patch.

xgsa added inline comments. Nov 4 2017, 3:10 PM

include/clang/AST/PrettyPrinter.h
227	Thank you for the clarification; I don't mind doing the right thing (https://reviews.llvm.org/D39633). Still, I don't have commit rights, so someone else will need to commit it.

For clarification: what is the "symbols table" you are referring to in the description?

In D39622#919579, @aprantl wrote:

For clarification: what is the "symbols table" you are referring to in the description?

I meant the data dumped with "nm -an ./test".

By the way, I haven't abandoned the patch. I have found one more case where my fix doesn't work, and I am working on an improvement. In the meantime, it would be helpful to get answers to the open questions in this review (about changing the "Indentation" field and about the test).

One more case is now handled and the review comments have been applied, but there are still no tests, because I am still not sure whether the approach I have chosen is correct.

Herald added a subscriber: JDevlieghere. Dec 13 2017, 2:14 PM

xgsa marked 6 inline comments as done. Dec 13 2017, 2:15 PM

xgsa edited the summary of this revision. Dec 13 2017, 2:23 PM

probinson added a project: debug-info. Dec 13 2017, 2:37 PM

Philosophically, mangled names and DWARF information serve different purposes, and I don't think you will find one true solution where both of them can yield the same name that everyone will be happy with. Mangled names exist to provide unique and reproducible identifiers for the "same" entity across compilation units. They are carefully specified (for example) to allow a linker to associate a reference in one object file to a definition in a different object file, and be guaranteed that the association is correct. A demangled name is a necessarily context-free translation of the mangled name into something that has a closer relationship to how a human would think of or write the name of the thing, but isn't necessarily the only way to write the name of the thing.

DWARF names are (deliberately not carefully specified) strings that ought to bear some relationship to how source code would name the thing, but you probably don't want to attach semantic significance to those names. This is rather emphatically true for names containing template parameters. Typedefs (and their recent offspring, 'using' aliases) are your sworn enemy here. Enums, as you have found, are also a problem.

Basically, the type of an entity does not have a unique name, and trying to coerce different representations of the type into having the same unique name is a losing battle.
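The point about multiple spellings can be shown in a few lines. Below, three source spellings name one specialization, and the runtime sees a single type and therefore a single RTTI name. Nothing, however, constrains which spelling a DWARF producer writes. Alias and OtherAlias are illustrative names.

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>

enum class TagType : bool { Tag1 };

template <TagType Tag>
struct Impl {
  int v = 123;
};

// Different source spellings of one and the same specialization.
using Alias = Impl<TagType::Tag1>;
typedef Impl<static_cast<TagType>(0)> OtherAlias;

static_assert(std::is_same<Alias, OtherAlias>::value,
              "both spellings denote the same type");
```

A name-based lookup in the debugger has to pick one of these spellings. The mangled name is the only one that is guaranteed to be unique.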

In D39622#954585, @probinson wrote:

Basically, the type of an entity does not have a unique name, and trying to coerce different representations of the type into having the same unique name is a losing battle.

Thank you for the clarification, Paul! Nevertheless, I think showing the actual (dynamic) type of a variable is very important for projects that use RTTI. Moreover, it works properly with the gcc+gdb pair, so I am very interested in fixing it for clang+lldb.

I understand that the suggested solution may not cover all cases. However, it improves the situation and covers all the cases I have found. I have just rechecked: typedefs and using aliases seem to work fine when displaying the real type of a variable. If more cases are found in the future, they could be fixed in a similar way. Moreover, debuggers already rely on the type name looking the same in RTTI and DWARF, and I suppose they have no choice, because there is no other source of information for them (or am I missing something?). Another advantage of this solution is that it doesn't require any format extension and will probably work out of the box in gdb and other debuggers. I have also rechecked that gcc generates exactly the same type names in DWARF for the examples in the description. Furthermore, the DWARF Best Practices page recommends: "For C++, the string should match that produced by the target platform's canonical demangler" [1].
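The lookup that debuggers are described as relying on can be modeled as an exact string match between the demangled RTTI name and the DWARF type names. The sketch below is a hypothetical model: DwarfIndex, TypeDesc and resolveDynamicType are not lldb APIs. It shows why the lookup misses before the patch and hits after it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of a debugger's type index (not an lldb API): type
// descriptions keyed by the name string found in DWARF.
struct TypeDesc {
  std::string dwarfName;
  unsigned byteSize;  // illustrative payload only
};

using DwarfIndex = std::map<std::string, TypeDesc>;

// Resolves the dynamic type by exact match on the demangled RTTI name.
// Returns nullptr when no DWARF type has exactly that spelling.
const TypeDesc *resolveDynamicType(const DwarfIndex &index,
                                   const std::string &demangledRttiName) {
  DwarfIndex::const_iterator it = index.find(demangledRttiName);
  return it == index.end() ? nullptr : &it->second;
}
```

Consider an index that contains only "Impl<TagType::Tag1>": resolving "Impl<(TagType)0>" returns nullptr. Once the DWARF name is emitted as "Impl<(TagType)0>", the same lookup succeeds.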

On the other hand, I understand the idea you have described, but I am not sure how to implement this lookup any other way. I suppose we cannot extend RTTI with the debug type name (is that correct?). So the only way I see is to add information about the mangled type name to DWARF. This could be either a separate section (like apple_types) or a special node for TAG_structure_type/TAG_class_type, indexed into a map for fast lookup. Either way, this would be an extension to DWARF and would require special support in the debugger. Such a solution would also be much more complicated, though I don't mind working on it.
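The alternative described here keys the lookup on the mangled name instead of a display string. It might look like the following sketch. The mangledName field and the index are assumptions: they stand in for a hypothetical DWARF extension and the matching debugger support, not for an existing attribute or lldb structure. The example strings are the Itanium ABI typeinfo names of the two examples in the summary.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical DWARF type entry extended with the mangled type name.
struct TypeEntry {
  std::string displayName;  // free-form, e.g. "Impl<TagType::Tag1>"
  std::string mangledName;  // e.g. "4ImplIL7TagType0EE"
};

using MangledIndex = std::map<std::string, const TypeEntry *>;

// Builds the lookup map once, keyed by the mangled name. The entries must
// outlive the index, since it stores pointers into the vector.
MangledIndex buildMangledIndex(const std::vector<TypeEntry> &entries) {
  MangledIndex index;
  for (const TypeEntry &e : entries)
    index[e.mangledName] = &e;
  return index;
}

// Looks up the dynamic type by the raw type_info::name() string, so the
// display spelling in DWARF no longer matters.
const TypeEntry *lookupByRtti(const MangledIndex &index,
                              const std::string &rttiName) {
  MangledIndex::const_iterator it = index.find(rttiName);
  return it == index.end() ? nullptr : it->second;
}
```

With this scheme, both "Impl<TagType::Tag1>" and "Impl<1+1+1>" could keep their source-like display names. The price is the format extension and the debugger changes mentioned above.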

So what do you think? Is the suggested solution incomplete, or is it unacceptable? Do you have other ideas on how this feature should be implemented?

P.S. Should this question be raised on a mailing list? If so, which one (clang or lldb), given that it seems related to both?

[1] - http://wiki.dwarfstd.org/index.php?title=Best_Practices#Names_of_Program_Entities

Revision Contents

Path                                   Size
include/clang/AST/PrettyPrinter.h      9 lines
lib/AST/TemplateBase.cpp               24 lines
lib/AST/TypePrinter.cpp                4 lines
lib/CodeGen/CGDebugInfo.cpp            1 line

Diff 126826

include/clang/AST/PrettyPrinter.h

@@ ... @@ : Indentation(2), SuppressSpecifiers(false),
       SuppressTemplateArgsInCXXConstructors(false),
       Bool(LO.Bool), Restrict(LO.C99),
       Alignof(LO.CPlusPlus11), UnderscoreAlignof(LO.C11),
       UseVoidForZeroParams(!LO.CPlusPlus),
       TerseOutput(false), PolishForDeclaration(false),
       Half(LO.Half), MSWChar(LO.MicrosoftExt && !LO.WChar),
       IncludeNewlines(true), MSVCFormatting(false),
       ConstantsAsWritten(false), SuppressImplicitBase(false),
-      FullyQualifiedName(false) { }
+      FullyQualifiedName(false), ABICompatibleFormatting(false) { }

   /// Adjust this printing policy for cases where it's known that we're
   /// printing C++ code (for instance, if AST dumping reaches a C++-only
   /// construct). This should not be used if a real LangOptions object is
   /// available.
   void adjustForCPlusPlus() {
     SuppressTagKeyword = true;
     Bool = true;
     UseVoidForZeroParams = false;
   }

   /// The number of spaces to use to indent each line.
   unsigned Indentation : 8;

   /// Whether we should suppress printing of the actual specifiers for
   /// the given type or declaration.
   ///
   /// This flag is only used when we are printing declarators beyond
   /// the first declarator within a declaration group. For example, given:
   ///
   /// \code
@@ ... @@ struct PrintingPolicy {
   /// \endcode
   bool ConstantsAsWritten : 1;

   /// When true, don't print the implicit 'self' or 'this' expressions.
   bool SuppressImplicitBase : 1;

   /// When true, print the fully qualified name of function declarations.
   /// This is the opposite of SuppressScope and thus overrules it.
   bool FullyQualifiedName : 1;
+
+  /// Use formatting compatible with ABI specification. It is necessary for
+  /// saving entities into debug tables which have to be compatible with
+  /// the representation, described in ABI specification. In particular, this forces
+  /// templates parametrized with enums to be represented as "T<(Enum)0>" instead of
+  /// "T<Enum::Item0>" and template specializations to be written in canonical form.
+  bool ABICompatibleFormatting : 1;
 };

 } // end namespace clang

 #endif

lib/AST/TemplateBase.cpp

@@ ... @@
 ///
 /// \param Policy the printing policy for EnumConstantDecl printing.
 static void printIntegral(const TemplateArgument &TemplArg,
                           raw_ostream &Out, const PrintingPolicy& Policy) {
   const Type *T = TemplArg.getIntegralType().getTypePtr();
   const llvm::APSInt &Val = TemplArg.getAsIntegral();

   if (const EnumType *ET = T->getAs<EnumType>()) {
+    if (Policy.ABICompatibleFormatting) {
+      Out << "(";
+      ET->getDecl()->getNameForDiagnostic(Out, Policy, true);
+      Out << ")";
+      Out << Val;
+      return;
+    } else {
     for (const EnumConstantDecl* ECD : ET->getDecl()->enumerators()) {
       // In Sema::CheckTemplateArugment, enum template arguments value are
       // extended to the size of the integer underlying the enum type. This
       // may create a size difference between the enum value and template
       // argument value, requiring isSameValue here instead of operator==.
       if (llvm::APSInt::isSameValue(ECD->getInitVal(), Val)) {
         ECD->printQualifiedName(Out, Policy);
         return;
       }
     }
   }
+  }

   if (T->isBooleanType() && !Policy.MSVCFormatting) {
     Out << (Val.getBoolValue() ? "true" : "false");
   } else if (T->isCharType()) {
     const char Ch = Val.getZExtValue();
     Out << ((Ch == '\'') ? "'\\" : "'");
     Out.write_escaped(StringRef(&Ch, 1), /*UseHexEscapes=*/ true);
     Out << "'";

lib/AST/TypePrinter.cpp

@@ ... @@ else if (TypedefNameDecl *Typedef = D->getTypedefNameForAnonDecl()) {
     OS << (Policy.MSVCFormatting ? '\'' : ')');
   }

   // If this is a class template specialization, print the template
   // arguments.
   if (ClassTemplateSpecializationDecl *Spec
         = dyn_cast<ClassTemplateSpecializationDecl>(D)) {
     ArrayRef<TemplateArgument> Args;
-    if (TypeSourceInfo *TAW = Spec->getTypeAsWritten()) {
+    if (TypeSourceInfo *TAW = !Policy.ABICompatibleFormatting
+                                  ? Spec->getTypeAsWritten()
+                                  : nullptr) {
       const TemplateSpecializationType *TST =
         cast<TemplateSpecializationType>(TAW->getType());
       Args = TST->template_arguments();
     } else {
       const TemplateArgumentList &TemplateArgs = Spec->getTemplateArgs();
       Args = TemplateArgs.asArray();
     }
     IncludeStrongLifetimeRAII Strong(Policy);

lib/CodeGen/CGDebugInfo.cpp

@@ ... @@ PrintingPolicy CGDebugInfo::getPrintingPolicy() const {

   // If we're emitting codeview, it's important to try to match MSVC's naming so
   // that visualizers written for MSVC will trigger for our class names. In
   // particular, we can't have spaces between arguments of standard templates
   // like basic_string and vector.
   if (CGM.getCodeGenOpts().EmitCodeView)
     PP.MSVCFormatting = true;

+  PP.ABICompatibleFormatting = true;
   return PP;
 }

 StringRef CGDebugInfo::getFunctionName(const FunctionDecl *FD) {
   assert(FD && "Invalid FunctionDecl!");
   IdentifierInfo *FII = FD->getIdentifier();
   FunctionTemplateSpecializationInfo *Info =
       FD->getTemplateSpecializationInfo();