This is an archive of the discontinued LLVM Phabricator instance.

[lldb/DWARF] Fix linking direction in CopyUniqueClassMethodTypes
ClosedPublic

Authored by labath on Apr 25 2022, 2:52 AM.

Details

Summary

IIUC, the purpose of CopyUniqueClassMethodTypes is to link together
class definitions in two compile units so that we only have a single
definition of a class. It does this by adding entries to the die_to_type
and die_to_decl_ctx maps.

However, the direction of the linking seems to be reversed. It is taking
entries from the class that has not yet been parsed, and copying them to
the class which has been parsed already -- i.e., it is a very
complicated no-op.

Changing the linking order allows us to revert the changes in D13224
(while keeping the associated test case passing), and is sufficient to
fix PR54761, which was caused by an undesired interaction with that
patch.

Diff Detail

Event Timeline

labath created this revision.Apr 25 2022, 2:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 2:52 AM
labath requested review of this revision.Apr 25 2022, 2:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 2:52 AM

This mostly makes sense, the purpose of the || alternate_defn was not clear to me either.

So I believe the reason we need to be able to add methods to a class is for templates. Templates in DWARF are really poorly emitted all in the name of saving space. There is no full definition of template classes that get emitted, just a class definition with _only_ the methods that the user used in the current compile unit. DWARF doesn't really support emitting a templatized definition of a class in terms of a type T, it will always emit instantiations of the type T directly. So in LLDB we must create a template class like "std::vector<int>" and say it has no member functions, and then add each member function as a type specific specialization due to how the types must be created in the clang::ASTContext.

in one DWARF compile unit you end up having say "std::vector<int>::clear()" and in another you would have "std::vector<int>::erase()". In order to make the most complete definition of template types we need to find all "std::vector<int>" definitions and manually add any method that we haven't already seen, otherwise we end up not being able to call methods that might exist. Of course calling template functions is already fraught with danger since most are inlined if they come from the STL, but if the user asks for the class definition for "std::vector<int>", we should show all that we can. Can you make sure this patch doesn't hurt that functionality? We must have a test for it.

So we can't just say we will accept just one class definition if we have template types. Given this information, let me know what you think of your current patch

I take it no tests fail upon removing this condition? Any hints in version history about its purpose?

labath planned changes to this revision.Apr 26 2022, 8:36 AM

So I believe the reason we need to be able to add methods to a class is for templates. Templates in DWARF are really poorly emitted all in the name of saving space. There is no full definition of template classes that get emitted, just a class definition with _only_ the methods that the user used in the current compile unit. DWARF doesn't really support emitting a templatized definition of a class in terms of a type T, it will always emit instantiations of the type T directly. So in LLDB we must create a template class like "std::vector<int>" and say it has no member functions, and then add each member function as a type specific specialization due to how the types must be created in the clang::ASTContext.

in one DWARF compile unit you end up having say "std::vector<int>::clear()" and in another you would have "std::vector<int>::erase()". In order to make the most complete definition of template types we need to find all "std::vector<int>" definitions and manually add any method that we haven't already seen, otherwise we end up not being able to call methods that might exist. Of course calling template functions is already fraught with danger since most are inlined if they come from the STL, but if the user asks for the class definition for "std::vector<int>", we should show all that we can. Can you make sure this patch doesn't hurt that functionality? We must have a test for it.

So we can't just say we will accept just one class definition if we have template types. Given this information, let me know what you think of your current patch

I think I'm confused. I know we're doing some funny stuff with class templates, and I'm certain I don't fully understand how it works. However, I am also pretty sure your description is not correct either. A simple snippet like std::vector<int> v; will produce a definition for the vector<int> class, and that definition will include the declarations for erase, clear, and any other non-template member function, even though the code definitely does not call them. And this makes perfect sense to me given how DWARF (and clang) describes only template instantiations -- so the template instantiation vector<int> is treated the same way as a class vector_int with the same interface.

What you're describing resembles the treatment of template member functions -- those are only emitted in the compile units that they are used. This is very unfortunate for us, though I think it still makes sense from the DWARF perspective. However:
a) In line with the previous paragraph -- this behavior is the same for template class instantiations and regular classes. IOW, the important question is whether the method is a template, not whether its enclosing class is
b) (precisely for this reason) we currently completely ignore member function templates when parsing classes
Given all of that, I don't see how templates are relevant for this patch. *However*, please read on.

I take it no tests fail upon removing this condition? Any hints in version history about its purpose?

No tests broke with that. My first archaeology expedition did not find anything, but now I've tried it once more, and I found D13224. That diff is adding this condition (to support -flimit-debug-info, no less) and it even comes with a test. That test still exists, but it was broken over time (in several ways). Once I fix it so that it tests what it was supposed to test, I see that my patch breaks it.

I'm now going to try to figure out how does that code actually work...

labath updated this revision to Diff 425727.Apr 28 2022, 2:34 AM

Here's the updated version. I'll place my new findings into the patch
description.

labath retitled this revision from [lldb] Fix PR54761 to [lldb/DWARF] Fix linking direction in CopyUniqueClassMethodTypes.Apr 28 2022, 2:35 AM
labath edited the summary of this revision. (Show Details)

So I believe the reason we need to be able to add methods to a class is for templates. Templates in DWARF are really poorly emitted all in the name of saving space. There is no full definition of template classes that get emitted, just a class definition with _only_ the methods that the user used in the current compile unit. DWARF doesn't really support emitting a templatized definition of a class in terms of a type T, it will always emit instantiations of the type T directly. So in LLDB we must create a template class like "std::vector<int>" and say it has no member functions, and then add each member function as a type specific specialization due to how the types must be created in the clang::ASTContext.

in one DWARF compile unit you end up having say "std::vector<int>::clear()" and in another you would have "std::vector<int>::erase()". In order to make the most complete definition of template types we need to find all "std::vector<int>" definitions and manually add any method that we haven't already seen, otherwise we end up not being able to call methods that might exist. Of course calling template functions is already fraught with danger since most are inlined if they come from the STL, but if the user asks for the class definition for "std::vector<int>", we should show all that we can. Can you make sure this patch doesn't hurt that functionality? We must have a test for it.

So we can't just say we will accept just one class definition if we have template types. Given this information, let me know what you think of your current patch

I think I'm confused. I know we're doing some funny stuff with class templates, and I'm certain I don't fully understand how it works. However, I am also pretty sure your description is not correct either. A simple snippet like std::vector<int> v; will produce a definition for the vector<int> class, and that definition will include the declarations for erase, clear, and any other non-template member function, even though the code definitely does not call them. And this makes perfect sense to me given how DWARF (and clang) describes only template instantiations -- so the template instantiation vector<int> is treated the same way as a class vector_int with the same interface.

What you're describing resembles the treatment of template member functions -- those are only emitted in the compile units that they are used. This is very unfortunate for us, though I think it still makes sense from the DWARF perspective. However:
a) In line with the previous paragraph -- this behavior is the same for template class instantiations and regular classes. IOW, the important question is whether the method is a template, not whether its enclosing class is
b) (precisely for this reason) we currently completely ignore member function templates when parsing classes
Given all of that, I don't see how templates are relevant for this patch. *However*, please read on.

Yep, that mostly lines up with my understand.

Re: member function templates: These aren't the only things that vary between copies of a type in the DWARF. The other things that vary are implicit special member functions ({default,move,copy} ctor, dtor, {copy,move} assignment operator}) - these will only be present when instantiated - and nested types (though that's perhaps less of an issue for LLDB, not sure). I have made an argument that functions with "auto" return type should be treated this way (omitted unless the definition is instantiated and the return type is deduced/resolved) too - but for now they are not (they are always included, albeit with the "auto" return type on the declaration, and then the definition DIE has the real/deduced return type).

I take it no tests fail upon removing this condition? Any hints in version history about its purpose?

No tests broke with that. My first archaeology expedition did not find anything, but now I've tried it once more, and I found D13224. That diff is adding this condition (to support -flimit-debug-info, no less) and it even comes with a test. That test still exists, but it was broken over time (in several ways). Once I fix it so that it tests what it was supposed to test, I see that my patch breaks it.

I'm now going to try to figure out how does that code actually work...

Should the cleanups you found for the previous test case be included in this patch (or a separate pre/post-commit?)? The description in the updated patch sounds plausible - though delves into parts of lldb I'm not especially qualified to make a firm assessment on.

labath added a comment.May 4 2022, 8:53 AM

@clayborg, @shafik, what do you make of this version?

Should the cleanups you found for the previous test case be included in this patch (or a separate pre/post-commit?)? The description in the updated patch sounds plausible - though delves into parts of lldb I'm not especially qualified to make a firm assessment on.

I already checked those in --> rG089a1d9deba5

clayborg accepted this revision.May 5 2022, 12:35 PM
This revision is now accepted and ready to land.May 5 2022, 12:35 PM