This is an archive of the discontinued LLVM Phabricator instance.

Be more permissive in what we consider a variable name.
Needs ReviewPublic

Authored by zturner on Nov 12 2018, 3:40 PM.

Details

Summary

When we evaluate a variable name as part of an expression, we run a regex against it so break it apart. The intent seems to be that we want to get the main variable name (i.e. the part of the user input which is both necessary and sufficient to find the record in debug info), but leave the rest of the expression alone (for example the variable could be an instance of a class, and you have written variable.member.

But I believe the current regex to be too restrictive. For example, it disallows variable templates, so for example if the user writes variable<int> we would strip off the <int>, but this is absolutely necessary to find the proper record in the debug info. It also doesn't allow things like 'anonymous namespace'::variable which under the Microsoft ABI is a valid thing. Nor does it permit spaces, so we couldn't have something like foo<long double> (assuming we first fixed the template issue).

Rather than try to accurately construct a regex for the set of all possible things that constitute a variable name, it seems easier to construct a regex to match all the things that do not constitute a variable name. Specifically, an occurrence of the . operator or -> operator, since that's what ultimately defines the beginning of a sub-expression.

So this changes the regex accordingly.

Diff Detail

Event Timeline

zturner created this revision.Nov 12 2018, 3:40 PM

You were probably speaking loosely, but to be clear, the code you are changing doesn't get used for expressions - i.e. the expression command - unless I'm missing something.

This little mini-parser is for doing things like:

(lldb) frame variable foo.bar

We don't use clang or the expression parser to comprehend foo.bar, we use GetValuesForVariableExpressionPath instead. But 'frame variable' 'target variable' etc. which use this are fairly limited in what they need to support. They just needs to handle variable references - either local variables or static variables, and references to children of those variables. No calls, no casts, no types, etc.

So I don't see that the ability to handle template definition markers like <int> is important. I don't see how that would show up in a variable name or the name of one of its children.

You were probably speaking loosely, but to be clear, the code you are changing doesn't get used for expressions - i.e. the expression command - unless I'm missing something.

This little mini-parser is for doing things like:

(lldb) frame variable foo.bar

We don't use clang or the expression parser to comprehend foo.bar, we use GetValuesForVariableExpressionPath instead. But 'frame variable' 'target variable' etc. which use this are fairly limited in what they need to support. They just needs to handle variable references - either local variables or static variables, and references to children of those variables. No calls, no casts, no types, etc.

So I don't see that the ability to handle template definition markers like <int> is important. I don't see how that would show up in a variable name or the name of one of its children.

C++14 variable templates.

template<typename T> T SomeVariable;

void foo() {
  SomeVariable<int> IntVariable = x;
}

Now in this case you can simply say frame variable IntVariable, and it will work fine. But what about this?

template<typename T>
constexpr T Pi = 3.1415;

In this case you might say target variable Pi<double> and this will currently fail.

A slightly less contrived example:

template<typename T>
struct Foo {
  static T SomeGlobal;
};

int Foo<int>::SomeGlobal = 42;
long double Foo<long double>::SomeGlobal = 42.0;

Now you might write target variable "Foo<long double>::SomeGlobal"

For a totally real world example, this came up for me when I tried to write target variable std::numeric_limits<int>::is_signed.

If namespace::non_template_class::some_constant works, then there's no reason this shouldn't also work.

Those seem legit things to try to capture, though a little esoteric. Since "frame variable" and "target variable" didn't support these constructs before you should certainly add some tests for that.

The frame variable parser also supports:

(lldb) frame variable foo[0]

where foo is anything that can produce "vector" like children (e.g. std::vector's). Will your change work with that?

Since we only handle . and -> this general idea seems ok to me. Do we even need a regex then? Maybe just search for first of ".-"?

We do also handle [], though it isn't obvious to me after a quick glance where that gets done. This is a little cheesy because the name of the child that we are finding with [0] is actually "[0]", so you just have to be careful not to consume that when you consume the variable name.

I don't remember any other special aggregate names like this.

Those seem legit things to try to capture, though a little esoteric. Since "frame variable" and "target variable" didn't support these constructs before you should certainly add some tests for that.

The frame variable parser also supports:

(lldb) frame variable foo[0]

where foo is anything that can produce "vector" like children (e.g. std::vector's). Will your change work with that?

Might need to modify the regex to stop at [, but then it should. Might as well make target variable work with that syntax too, or at least there's no reason to add special code to frame variable that's not in target variable. I think the regex should just also stop at an open brace, that way everything should "just work". So perhaps Greg's suggestion of not using a regex at all, but just find_first_of(".-[") is sufficient. (There are still some even more obscure cases where [ can appear in a template argument, but it's so obscure that I think it's better to optimize for the common case).

BTW, I will have to see if it's possible to write a test for this. Even when I compiled and built a program with DWARF on Linux, the target variable Pi<double> example didn't "just work" for me, because FindGlobalVariables wasn't returning the variable. So I think this part actually needs to be fixed in the DWARF plugin, which I'm not equipped to fix. I can try something that is not a variable template, such as the Foo<int>::StaticMember example, but if that also doesn't work, then there might not be a good way to write a general purpose test for this until this is fixed in the DWARF plugin.

I can definitely add a test in the native pdb plugin though, and technically that runs everywhere (although it would only test target variable and not frame variable, maybe that's ok though?).

A dotest test can be xfailed if the debug format is not PDB, right? At least we can xfail them for all the DWARF variants so it should be possible to do that for PDB as well. So you should be able to write a test for this and then just xfail it till the DWARF parser can be fixed.

About the parsing rules... The name of the children of arrays or array like entities is, for example, "[0]". So the virtue of the current approach is that we grab a "legal" identifier, and then the next bit we grab will be the name of the child or synthetic child - if there are any. But we don't actually have to know what the name of the child is, it can be anything so long as it doesn't start with a legal identifier first character.

Your suggest approach will bless useable aggregate child names. That's probably okay. These child names should look like the way you would access the element in the languages we support - and this code is really C-ish right now, so just allowing the C/C++ style element accessors isn't a huge restriction.

BTW, I was just using "frame variable" as a example, target variable also supports []:

(lldb) source list -l 1
   1   	int g_vec[3] = {10, 20, 30};
   2   	
   3   	int main() {
   4   	  return g_vec[0] + g_vec[1]; 
   5   	}
(lldb) target var g_vec[0]
(int) g_vec[0] = 10

BTW, I will have to see if it's possible to write a test for this. Even when I compiled and built a program with DWARF on Linux, the target variable Pi<double> example didn't "just work" for me, because FindGlobalVariables wasn't returning the variable. So I think this part actually needs to be fixed in the DWARF plugin, which I'm not equipped to fix. I can try something that is not a variable template, such as the Foo<int>::StaticMember example, but if that also doesn't work, then there might not be a good way to write a general purpose test for this until this is fixed in the DWARF plugin.

I can definitely add a test in the native pdb plugin though, and technically that runs everywhere (although it would only test target variable and not frame variable, maybe that's ok though?).

It is the template stuff that throws things off. Can't remember if there is an accelerator table entry for those. Global variables in general work. It would be good to figure out what this doesn't work for DWARF. I will take a look.

I know when I stepped through it with the Pi example, it was returning all
matches, but not filtering down the results based on the template parameter
first, so you’d get back every instantiation but the template parameter
would be treated as a subexpression.

I think the SymbolFile plugin should just get the whole string though and
do this filtering itself. Might be hard for complicated template arguments.
But if it’s too hard the plugin can always just give up and do what it
currently does.

For PDB it’s the other way around, without this information lookup is
actually impossible, because you have to hash the record name, and the
template parameters are part of the hash