This is an archive of the discontinued LLVM Phabricator instance.

[Core] Generalize ValueObject::IsRuntimeSupportValue
ClosedPublic

Authored by xiaobai on Jun 12 2019, 5:59 PM.

Details

Summary

Instead of falling back to ObjCLanguageRuntime, we should be falling
back to every loaded language runtime. This makes ValueObject more
language agnostic.

Event Timeline

xiaobai created this revision.Jun 12 2019, 5:59 PM

+ Adrian, as I believe we touched this code recently...

include/lldb/Target/LanguageRuntime.h
159–161

Could we make this a non-public extension point, and make it so that it is automatically invoked from IsRuntimeSupportValue ?

That would simplify the code on the caller side..

source/Target/CPPLanguageRuntime.cpp
47–59

I don't think you're preserving the semantics here. Previously If GetObjectRuntimeLanguage() returned "C++", we would still end up consulting the objc white list. However, now you don't do that..

The new code actually makes more sense to me, but I'm guessing there was a reason things were implemented this way. However, I'm not that familiar with this code, so I'm not sure what to do about that, or whether it even matters...

clayborg requested changes to this revision.Jun 13 2019, 8:22 AM
clayborg added a subscriber: clayborg.
clayborg added inline comments.
source/Core/ValueObject.cpp
1697–1708

Seems like this should still just make one call. The two calls to IsRuntimeSupportValue followed by IsWhitelistedRuntimeValue seems like a bunch of hard to read code. We should make just one call to each runtime and do all the work needed from within there. So I vote to revert this change and do the work in each IsRuntimeSupportValue method. See additional comments in CPPLanguageRuntime::IsRuntimeSupportValue() that was removed.

source/Target/CPPLanguageRuntime.cpp
59

Seems like we should restore this function and just get the ObjC language runtime from the process, and if it exists, call IsWhitelistedRuntimeValue on it?

auto objc_runtime = ObjCLanguageRuntime::Get(m_process);
if (objc_runtime)
  return objc_runtime->IsWhitelistedRuntimeValue(name);
return false;
This revision now requires changes to proceed.Jun 13 2019, 8:22 AM
xiaobai marked 3 inline comments as done.Jun 13 2019, 11:51 AM
xiaobai added inline comments.
include/lldb/Target/LanguageRuntime.h
159–161

See my comment on ValueObject::IsRuntimeSupportValue to see why I think this isn't going to be enough.

source/Core/ValueObject.cpp
1697–1708

I'm not convinced that doing all the work in LanguageRuntime::IsRuntimeSupportValue is the right thing to do. The interface doesn't make a distinction between "Isn't a runtime support value" and "Is a whitelisted runtime support value".
There's not a good way to know. The implementation I have here keeps track of whether or not any language runtime considers this a runtime support value. If a runtime considers it a runtime support value and it's a whitelisted one, you immediately return false. If you have just IsRuntimeSupportValue and one runtime says "this is a runtime support value" but the other says "this isn't a runtime support value", how do you know if the second runtime is trying to say it's a whitelisted value?

However, getting rid of the initial GetLanguageRuntime+check in favor of just having one big loop seems like it'll be better so I'll change that.

source/Target/CPPLanguageRuntime.cpp
59

Indeed, I did not preserve the semantics here because they are technically wrong. It means that, regardless of whether or not you have an ObjC runtime loaded, you still make sure it's being handled.

Greg's answer is the correct answer while trying to preserve the intention. I think that this is unnecessary though, because ValueObject::IsRuntimeSupportValue after this patch should check every available runtime. If you care about ObjC++, the ObjC language runtime should handle the ObjC whitelisted runtime values.

xiaobai updated this revision to Diff 204615.Jun 13 2019, 1:06 PM

Simplify ValueObject::IsRuntimeSupportValue

labath resigned from this revision.Jun 20 2019, 11:54 PM

I'm not familiar with the code enough to make the calls here.

clayborg added inline comments.Jun 24 2019, 8:55 AM
source/Core/ValueObject.cpp
1698–1708

Things brings the questions: do we really need to filter these variables? I wouldn't mind seeing "_cmd" and any other defaulted objective C variables in the IDE. This seems like a lot of work to go through to just stop showing "_cmd". Any variables for the language object ("self" for ObjC or "this" for C++) are marked as artificial so we want to see these. If that is the case, we can remove this all together.

jingham added inline comments.Jun 24 2019, 10:18 AM
source/Core/ValueObject.cpp
1698–1708

The reason for having a whitelist as well as suppressing artificial variables is so that we can show self & _cmd while still suppressing the really artificial variables like the ones used to track dynamic array sizes and swift metadata symbols. I don't think we want to show the latter.

clayborg added inline comments.Jun 24 2019, 10:24 AM
source/Core/ValueObject.cpp
1698–1708

Sounds good, my objection is removed. Still kind of weird that we runtime->IsRuntimeSupportValue() from one language and runtime->IsWhitelistedRuntimeValue() from another. I know why we do it, but it still seems a bit weird.

aprantl added inline comments.Jun 24 2019, 4:17 PM
include/lldb/Target/LanguageRuntime.h
156

I think this function should not be part of LanguageRuntime any more since there is nothing runtime-specific about it any more.Instead, it should probably be a function implemented by ValueObjectVariable. The Whitelist make still sense in the runtime of course.

xiaobai marked an inline comment as done.Jun 24 2019, 5:44 PM
xiaobai added inline comments.
include/lldb/Target/LanguageRuntime.h
156

I think you're right. No LanguageRuntime currently overrides this and ValueObject uses this anyway as a fallback. If any language runtime needs to implement this kind of behavior, we can add it back at that time.

xiaobai updated this revision to Diff 206351.Jun 24 2019, 6:45 PM

Address feedback

aprantl added inline comments.Jun 25 2019, 10:33 AM
source/Core/ValueObject.cpp
1698–1708
if (!process)
  return false;
if (!marked_as_runtime_support_val)
  return false;
1705

then this condition becomes

if (runtime->IsWhitelistedRuntimeValue(GetName()))
  return false;
1712

return true;

xiaobai updated this revision to Diff 206506.Jun 25 2019, 12:30 PM

Address feedback

clayborg added inline comments.Jun 26 2019, 9:37 AM
source/Core/ValueObject.cpp
1706–1708

Still seems weird that any language can white list a variable by name. Say swift has a variable named "this" which is truly should be hidden and is marked as artificial, we will always show it...

davide added inline comments.Jun 26 2019, 9:50 AM
source/Core/ValueObject.cpp
1706–1708

Swift has no this, it has self. And yes, there are many places in the debugger where self is mentioned explicitly by name and has special handling. Also, I think we want to show it, most of the times.

clayborg added inline comments.Jun 26 2019, 10:26 AM
source/Core/ValueObject.cpp
1706–1708

Exactly Davide. Any language can whitelist any variable they want for any other language regardless of the language of origin of the current ValueObject.

jingham requested changes to this revision.Jun 26 2019, 10:28 AM
jingham added inline comments.
source/Core/ValueObject.cpp
1706–1708

I agree with Greg. Otherwise various languages are going to fight about their respective white lists. We should really get the ValueObject's runtime language with ValueObject::GetObjectRuntimeLanguage() and then asking that runtime.

This revision now requires changes to proceed.Jun 26 2019, 10:28 AM
aprantl added inline comments.Jun 26 2019, 11:39 AM
source/Core/ValueObject.cpp
1706–1708

That's true, but in practice not likely a big problem, since you'd, e.g., need to have an *artificial* variable called self in C++ for this to surface.

clayborg added inline comments.Jun 26 2019, 2:08 PM
source/Core/ValueObject.cpp
1706–1708

What variables are we actually trying to not show here? Seems like we mostly want to see the variables, even artificial ones. Does anyone know what the variables that we don't want to see are? And how big of a problem are they?

aprantl added inline comments.Jun 26 2019, 2:23 PM
source/Core/ValueObject.cpp
1706–1708

The ones Jim mentioned earlier: Clang injects helper variables for VLA size, Swift injects various kinds of type metadata. We definitely do want to hide them.

xiaobai marked an inline comment as done.Jun 26 2019, 2:33 PM
xiaobai added inline comments.
source/Core/ValueObject.cpp
1706–1708

We should really get the ValueObject's runtime language with ValueObject::GetObjectRuntimeLanguage() and then asking that runtime.

This isn't going to work though. If you want to know whether you should hide something like _cmd, then GetObjectRuntimeLanguage is going to give you eLanguagetypeC, which has no runtime. If it's marked as artificial, then you'll not see it even though ObjCRuntime whitelists it.

Jim: do you have any other solutions that could make this work?

source/Core/ValueObject.cpp
1706–1708

What Alex is saying is the ValueObject figures out its language solely from the TypeSystem type, not from the compile unit language. So we can have C variables in C++, ObjC or ObjC++. That does make it harder.

What we really want to know is if a variable is artificial _and_ a language or language runtime related variable. We seem to want to omit compiler helper variables, but we can't tell the difference. I am find with the current solution for now since there is not easy solution. We can revisit if we need to later.

Shouldn't ValueObjectVariables figure out their runtime language from their defining frame, not their CompilerType? For a ValueObject you get from an expression, you probably can't do that. But we're always talking about hiding locals or args here - i.e. they are all ValueObjectVariables. And it seems to me that in that case getting the RuntimeLanguage from the containing frame is much more useful than from the CompilerType.

Shouldn't ValueObjectVariables figure out their runtime language from their defining frame, not their CompilerType? For a ValueObject you get from an expression, you probably can't do that. But we're always talking about hiding locals or args here - i.e. they are all ValueObjectVariables. And it seems to me that in that case getting the RuntimeLanguage from the containing frame is much more useful than from the CompilerType.

Does every variable have an associated frame? I imagine things like global variables wouldn't. I'm not sure if any language or implementation of a language has global variables as compiler/runtime helper variables, but I'm not comfortable making that assumption since one of my goals is more generalized language support.

Shouldn't ValueObjectVariables figure out their runtime language from their defining frame, not their CompilerType? For a ValueObject you get from an expression, you probably can't do that. But we're always talking about hiding locals or args here - i.e. they are all ValueObjectVariables. And it seems to me that in that case getting the RuntimeLanguage from the containing frame is much more useful than from the CompilerType.

Does every variable have an associated frame? I imagine things like global variables wouldn't. I'm not sure if any language or implementation of a language has global variables as compiler/runtime helper variables, but I'm not comfortable making that assumption since one of my goals is more generalized language support.

To be more precise, "frame" is the wrong word to use, Variables have "scopes"... All Variables have a scope, though not every scope is contained in a Function.

But just to back up a little bit.

We're getting into problems, it seems to me, because we're asking ValueObjects questions that should be asked of their containing scope. For instance, if you are in a Function, and want to list the locals and arguments, you want to know what the rules for the Function are for which artificial variables should and should not be printed, not any of its value objects.

For instance, ObjC could decide that it wants to reserve the word "this" in methods of ObjC classes in ObjC++ files, and use it as a should-be-hidden artificial variable that will always store a pointer to C++ object for some evil purposes. Asking the runtime language of this artificial "this" variable whether the word "this" is runtime whitelisted seems like the wrong thing to do. After all, the Variable has a CompilerType that's clearly a C++ object, so you might reasonably expect its runtime language to be C++. But then you would get the wrong answer. You really need to ask the Function "what are your rules for hiding artificial variables".

Getting to the right actor to ask about "which artificial variables to show" was the reason for suggesting that the runtime language of a Variable should be that of its containing scope. But then you get into the weird situation where a C++ Variable is claiming a runtime language of ObjC++, which seems a little odd. Instead, it makes more sense to get the RuntimeLanguage of the current frame, and then iterate over the variables and ask the frame's runtime whether they should be shown or not.

I'm not sure what to do about global variables. Reasoning by analogy, that's something that the language of the CompileUnit which contains the global variables should answer.

To be more precise, "frame" is the wrong word to use, Variables have "scopes"... All Variables have a scope, though not every scope is contained in a Function.

But just to back up a little bit.

We're getting into problems, it seems to me, because we're asking ValueObjects questions that should be asked of their containing scope. For instance, if you are in a Function, and want to list the locals and arguments, you want to know what the rules for the Function are for which artificial variables should and should not be printed, not any of its value objects.

For instance, ObjC could decide that it wants to reserve the word "this" in methods of ObjC classes in ObjC++ files, and use it as a should-be-hidden artificial variable that will always store a pointer to C++ object for some evil purposes. Asking the runtime language of this artificial "this" variable whether the word "this" is runtime whitelisted seems like the wrong thing to do. After all, the Variable has a CompilerType that's clearly a C++ object, so you might reasonably expect its runtime language to be C++. But then you would get the wrong answer. You really need to ask the Function "what are your rules for hiding artificial variables".

Getting to the right actor to ask about "which artificial variables to show" was the reason for suggesting that the runtime language of a Variable should be that of its containing scope. But then you get into the weird situation where a C++ Variable is claiming a runtime language of ObjC++, which seems a little odd. Instead, it makes more sense to get the RuntimeLanguage of the current frame, and then iterate over the variables and ask the frame's runtime whether they should be shown or not.

I'm not sure what to do about global variables. Reasoning by analogy, that's something that the language of the CompileUnit which contains the global variables should answer.

If I understood you correctly, your conclusion is that ValueObject::IsRuntimeSupportValue shouldn't really exist and that you should be asking a frame or a compilation unit if a value should be displayed to the user. I think that this is reasonable. That being said, I think that this idea will have some interesting challenges as well. One problem that I'm aware of at the moment is that SBValue has IsRuntimeSupportValue. If I remember correctly, removing from the SBAPI is generally not permitted (something I disagree with in general, but I digress).

In the short term, I'd like to get this patch (or some variation of it) into LLDB so that we can remove ValueObject's dependence on ObjC (after this there's one reference left but I have a good idea of how to remove it). Would you mind if I committed this, or do you feel strongly that we should first refactor further?

jingham added a comment.EditedJun 26 2019, 5:04 PM

To be more precise, "frame" is the wrong word to use, Variables have "scopes"... All Variables have a scope, though not every scope is contained in a Function.

But just to back up a little bit.

We're getting into problems, it seems to me, because we're asking ValueObjects questions that should be asked of their containing scope. For instance, if you are in a Function, and want to list the locals and arguments, you want to know what the rules for the Function are for which artificial variables should and should not be printed, not any of its value objects.

For instance, ObjC could decide that it wants to reserve the word "this" in methods of ObjC classes in ObjC++ files, and use it as a should-be-hidden artificial variable that will always store a pointer to C++ object for some evil purposes. Asking the runtime language of this artificial "this" variable whether the word "this" is runtime whitelisted seems like the wrong thing to do. After all, the Variable has a CompilerType that's clearly a C++ object, so you might reasonably expect its runtime language to be C++. But then you would get the wrong answer. You really need to ask the Function "what are your rules for hiding artificial variables".

Getting to the right actor to ask about "which artificial variables to show" was the reason for suggesting that the runtime language of a Variable should be that of its containing scope. But then you get into the weird situation where a C++ Variable is claiming a runtime language of ObjC++, which seems a little odd. Instead, it makes more sense to get the RuntimeLanguage of the current frame, and then iterate over the variables and ask the frame's runtime whether they should be shown or not.

I'm not sure what to do about global variables. Reasoning by analogy, that's something that the language of the CompileUnit which contains the global variables should answer.

If I understood you correctly, your conclusion is that ValueObject::IsRuntimeSupportValue shouldn't really exist and that you should be asking a frame or a compilation unit if a value should be displayed to the user. I think that this is reasonable. That being said, I think that this idea will have some interesting challenges as well. One problem that I'm aware of at the moment is that SBValue has IsRuntimeSupportValue. If I remember correctly, removing from the SBAPI is generally not permitted (something I disagree with in general, but I digress).

In the short term, I'd like to get this patch (or some variation of it) into LLDB so that we can remove ValueObject's dependence on ObjC (after this there's one reference left but I have a good idea of how to remove it). Would you mind if I committed this, or do you feel strongly that we should first refactor further?

I was talking specifically about which runtime should be asked the question. It seemed to me most straightforward to invert the order in which the question was asked, but another way to get the same effect without rejiggering the API's at this point is to have ValueObject::IsRuntimeSupportValue do:

SymbolContextScope *sym_ctx_scope = GetSymbolContextScope();
LanguageType languageToAsk = eLanguageTypeUnknown;
if (sym_ctx_scope) {
   Function *func = sym_ctx_scope->CalculateSymbolContextFunction();
   if (func)
    languageToAsk = func->GetLanguage();
  else  //Do something similar for the CU.

Then get the languageRuntime for languageToAsk and ask it. That way you don't have to monkey with the runtime language of the ValueObject, but you are still asking the right actor.

I was talking specifically about which runtime should be asked the question. It seemed to me most straightforward to invert the order in which the question was asked, but another way to get the same effect without rejiggering the API's at this point is to have ValueObject::IsRuntimeSupportValue do:

SymbolContextScope *sym_ctx_scope = GetSymbolContextScope();
LanguageType languageToAsk = eLanguageTypeUnknown;
if (sym_ctx_scope) {
   Function *func = sym_ctx_scope->CalculateSymbolContextFunction();
   if (func)
    languageToAsk = func->GetLanguage();
  else  //Do something similar for the CU.

Then get the languageRuntime for languageToAsk and ask it. That way you don't have to monkey with the runtime language of the ValueObject, but you are still asking the right actor.

Right, I see. I'll try to implement something along the lines of what you're suggesting and report back. Thanks for the suggestion!

xiaobai added a comment.EditedJun 27 2019, 2:45 PM

@jingham: Okay, so I tried to do what you suggested and that solution doesn't work because of ObjC++. After looking into it, it looks like Variable and Function just ask the CompileUnit for the language type instead of determining it themselves meaning that determining the Variable's language boils down to asking the CompileUnit for its language. That can probably be improved, but I think that just relying on the SymbolContextScope alone isn't yet sufficient. I think it would be worth it to ask the SymbolContextScope before trying all the loaded language runtimes though. Would you be okay with that solution?

@jingham: Okay, so I tried to do what you suggested and that solution doesn't work because of ObjC++. After looking into it, it looks like Variable and Function just ask the CompileUnit for the language type instead of determining it themselves meaning that we determining the Variable's language boils down to asking the CompileUnit for its language. That can probably be improved, but I think that just relying on the SymbolContextScope alone isn't yet sufficient. I think it would be worth it to ask the SymbolContextScope before trying all the loaded language runtimes though. Would you be okay with that solution?

That's unfortunate. For ObjC++ CompUnits we should get the language from the function name: it's either a C++ mangled name (language => , or an ObjC name...

One way to solve this would be an ObjC++ language runtime which dispatches to the ObjC and C++ ones as is appropriate. I'm not sure whether that would always be correct, but it would provide a more explicit way to get this right. OTOH, that's a lot more work...

Is your suggestion to say that if the value IS whitelisted for the SymbolContextScope language, then we're done, otherwise we ask all the runtimes?

Asking C, C++ & ObjC in an ObjC++ frame is not really right - it would fail my contrived example above - but seems like it has a low probability of getting us into trouble. But if you are in a ObjC++ file, you certainly don't want to ask the Swift LanguageRuntime whether it whitelists the symbol. Those are entirely incompatible languages and you shouldn't be asking Swift questions about any C-family language frame. Perhaps a more precise version of your suggestion would be that if your SymbolContextScope language is a C-family language, we ask all the other C Family language runtimes, but not the orthogonal languages.

We don't have this problem of intermixable "language's" with Swift (and probably not with Rust...) or really anything else sensible. There's just one language and you can't intermix it with code from another language... But if we did we could add the notion of language families more generally.

@jingham: Okay, so I tried to do what you suggested and that solution doesn't work because of ObjC++. After looking into it, it looks like Variable and Function just ask the CompileUnit for the language type instead of determining it themselves meaning that we determining the Variable's language boils down to asking the CompileUnit for its language. That can probably be improved, but I think that just relying on the SymbolContextScope alone isn't yet sufficient. I think it would be worth it to ask the SymbolContextScope before trying all the loaded language runtimes though. Would you be okay with that solution?

That's unfortunate. For ObjC++ CompUnits we should get the language from the function name: it's either a C++ mangled name (language => , or an ObjC name...

I'm going to try to modify Function::GetLanguage to see if it can figure out the language based on the mangled name before it asks the compilation unit. I think that could potentially solve the issue with ObjC++

One way to solve this would be an ObjC++ language runtime which dispatches to the ObjC and C++ ones as is appropriate. I'm not sure whether that would always be correct, but it would provide a more explicit way to get this right. OTOH, that's a lot more work...

This could possibly work, but it feels like the wrong abstraction. Creating a LanguageRuntime for a language with no runtime library to work around the fact that it uses two runtimes for different languages is just masking the issue.

Is your suggestion to say that if the value IS whitelisted for the SymbolContextScope language, then we're done, otherwise we ask all the runtimes?

Yup! I personally don't think that asking every language runtime is that big of a deal, but I recognize that I could be wrong about that.

Asking C, C++ & ObjC in an ObjC++ frame is not really right - it would fail my contrived example above - but seems like it has a low probability of getting us into trouble. But if you are in a ObjC++ file, you certainly don't want to ask the Swift LanguageRuntime whether it whitelists the symbol. Those are entirely incompatible languages and you shouldn't be asking Swift questions about any C-family language frame. Perhaps a more precise version of your suggestion would be that if your SymbolContextScope language is a C-family language, we ask all the other C Family language runtimes, but not the orthogonal languages.

This idea isn't super terrible imo, but you're right here: Asking runtimes of other languages in the same family isn't right even if it has a lower probability of getting us into trouble. It will work until it doesn't.

xiaobai updated this revision to Diff 206994.Jun 27 2019, 10:02 PM
  • Implement @jingham's suggestion
  • Change Function::GetLanguage to first guess the language from the name of the function you're in.
  • Fix a bug in DWARFASTParserClang::ParseFunctionFromDWARF that qualifies ObjC methods as if they were C++ methods when parsing a function from an ObjC++ CU

This looks good to me. I wonder if the SymbolContextScope -> Language calculation that you do in IsRuntimeSupportValue should really be done in SymbolContextScope? If that's going to be the policy for going from SymbolContextScope, maybe centralize it there?

This looks good to me. I wonder if the SymbolContextScope -> Language calculation that you do in IsRuntimeSupportValue should really be done in SymbolContextScope? If that's going to be the policy for going from SymbolContextScope, maybe centralize it there?

I agree, centralize this in SymbolContextScope and this should be good to go. Thanks for sticking with this Alex.

Centralizing it means changing the classes that inherit from SymbolContextScope, so I will refactor that in a seaparate change and stack this change on top of that one.

Jim, you ok with doing the symbol context scope refactoring as a separate diff?

jingham accepted this revision.Jul 1 2019, 11:30 AM

Sure, NP.

This revision was not accepted when it landed; it landed in state Needs Review.Jul 1 2019, 1:36 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2019, 1:36 PM