This is an archive of the discontinued LLVM Phabricator instance.

Extend FindTypes w/ CompilerContext to allow filtering by language
ClosedPublic

Authored by aprantl on Aug 21 2019, 11:16 AM.

Details

Summary

This patch is also motivated by the Swift branch and is effectively NFC for the single-TypeSystem llvm.org branch.

In multi-language projects it is extremely common to have, e.g., a Clang type and a similarly-named rendition of that same type in another language. When searching for a type It is much cheaper to pass a set of supported languages to the SymbolFile than having it materialize every result and then rejecting the materialized types that have the wrong language.

Diff Detail

Event Timeline

aprantl created this revision.Aug 21 2019, 11:16 AM

I think the functionality is fine.

I know this is a little ironic, given that we just had this discussion, but I would propose wrapping the SmallBitVector here. That way you can abstracting away the details about the underlying enum values, the size, and make the whole thing type safe. Although I agree with Pavel's earlier comment, I think subclassing might even be a good fit, so that you don't have to wrap the operator and set methods, but either having the SmallBitVector as a member is fine with me too.

lldb/source/Symbol/ClangASTContext.cpp
801

Let's add a comment why you choose 64-8 and why we can't use those last 8 bits.

I also just discovered void ClangASTContext::EnumerateSupportedLanguages() and will incorporate that into the next revision, too.

aprantl updated this revision to Diff 216508.Aug 21 2019, 3:42 PM

Address review feedback.

(This was quite a trip down the rabbit hole, but on the plus side I got to remove a few (now) completely useless callback APIs).

This looks like a great improvement. Thanks for working on this!

lldb/include/lldb/Core/PluginManager.h
400

🤔

423

🙄

lldb/source/Core/Debugger.cpp
1628–1630

Let's wrap this check and the find_first call into a method that returns an llvm::Optional< LanguageType>?

aprantl updated this revision to Diff 216518.Aug 21 2019, 4:21 PM

More feedback from Jonas.

This LGTM, unless Pavel has comments.

The functionality seems fine, and I like how you got rid of the extra callbacks in the plugin manager in favour of passing the sets explicitly. I'm not really a fan of inheriting from standard containers. And though the motivation here is stronger than in the previous case, it is not without its problems -- for instance half of this patch uses the new LanguageSet type, whereas the other (SymbolFile::FindTypes) still uses llvm::SmallBitVector. This also nicely demonstrates the main problem with inheriting from classes which aren't meant to be inherited (that they can be accidentally sliced) and means that this isn't as type safe as one might hope..

Now, I am not sure what all of that means here. This functionality is not so widespread that it makes sense to develop a full-blown class to do what you need here, so it may be best to just go with what you have here, and possibly rework this if it becomes more widespread in the future.... It's a pitty that std::bitset does not have a richer interface for accessing the bits. Otherwise, you could just do typedef std::bitset<eNumLanguages> LanguageSet and be done with it..

lldb/source/Symbol/TypeSystem.cpp
36–37

Will this work correctly on big endian too?

aprantl updated this revision to Diff 216644.Aug 22 2019, 9:30 AM

Here's an improved version:

  • got rid of the completely unnecessary optimization that use memcpy
  • made the struct have a SmallBitVector member instead of inheriting from it.

The new struct has an LLVM-like interface for many use-cases, and still exposes the bitvector where necessary, but is explicit about it.

Looks good overall. Just a question of it we want to return "const LanguageSet &" to avoid copies. Also switch static functions that currently return "LanguageSet" over to use static variables and llvm::once to init them and then return "const LanguageSet &".

lldb/include/lldb/Target/Language.h
269–272

return a "const LanguageSet &" to avoid copies?

lldb/source/Core/Debugger.cpp
1626

"const LanguageSet &" to avoid copies?

lldb/source/Core/PluginManager.cpp
2172

Add a LanguageSet method to do this? We are playing with internals here. Maybe:

all.Merge(instances[i]);
2265–2267

This function is static. Should we return a "const LanguageSet &" here? Also use std::once/llvm::once?:

static LanguageSet g_langs;
std::once once(...) {
  ... do work to populate
});
return g_langs;
lldb/source/Interpreter/OptionValueLanguage.cpp
43

"const LanguageSet &"?

lldb/source/Symbol/ClangASTContext.cpp
737–755

return "const LanguageSet &"? Then make "static LanguageSet g_languages;" and use llvm::once to control one time init?

754

return "const LanguageSet &"? Then make "static LanguageSet g_languages;" and use llvm::once to control one time init?

Looks good overall. Just a question of it we want to return "const LanguageSet &" to avoid copies. Also switch static functions that currently return "LanguageSet" over to use static variables and llvm::once to init them and then return "const LanguageSet &".

The point of the static_assert in TypeSystem.cpp is to make sure that a LanguageSet is exactly 64 bits in size so it is cheap to pass it by value.

aprantl added a comment.EditedAug 22 2019, 10:47 AM

Looks good overall. Just a question of it we want to return "const LanguageSet &" to avoid copies. Also switch static functions that currently return "LanguageSet" over to use static variables and llvm::once to init them and then return "const LanguageSet &".

The point of the static_assert in TypeSystem.cpp is to make sure that a LanguageSet is exactly 64 bits in size so it is cheap to pass it by value.

Actually, on a 64-bit system it is *always* 64 bits in size, but the static_assert guarantees that no memory is allocated.

Sounds good about LanguageSet being cheap to pass by value. Are there any paths that will call this over and over where we still will be calculating the LangaugeSet over and over in a type system? We might benefit from using llvm::once and a static LanguageSet in the static functions that return LanguageSets if that is the case?

Sounds good about LanguageSet being cheap to pass by value. Are there any paths that will call this over and over where we still will be calculating the LangaugeSet over and over in a type system? We might benefit from using llvm::once and a static LanguageSet in the static functions that return LanguageSets if that is the case?

No, the most "expensive" function is ClangASTContext::GetSupportedLanguagesForTypes() which is called once by the ClangASTContext constructor.

No, the most "expensive" function is ClangASTContext::GetSupportedLanguagesForTypes() which is called once by the ClangASTContext constructor.

... "expensive" in quotes, because a sufficiently smart compiler should optimize it into returning a 64-bit constant.

labath accepted this revision.Aug 22 2019, 11:34 AM

This sounds like a good compromise.

lldb/include/lldb/Symbol/SymbolFile.h
197

Is there a reason for using a SmallBitVector instead of the LanguageSet type here?

This revision is now accepted and ready to land.Aug 22 2019, 11:34 AM
aprantl marked an inline comment as done.Aug 22 2019, 12:11 PM
aprantl added inline comments.
lldb/include/lldb/Symbol/SymbolFile.h
197

No! Thanks. That's a leftover.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2019, 12:23 PM

Yeah, I'm trying to get behind what happened there, but I may need some help later on.

Looks like this also broke other platforms, which is good because I can debug it! I reverted the patch for now.

Turns out I accidentally deleted the constructor of LanguageSet in my very last refactoring.