This is an archive of the discontinued LLVM Phabricator instance.

Specify a language to use when parsing breakpoint identifiers.
ClosedPublic

Authored by dawn on Jul 11 2015, 2:20 AM.

Details

Summary

The language must be known in order to correctly set the name_type_mask for languages like Pascal and Java. For example, to set a breakpoint on 'ns.foo' for a Pascal function 'foo' in namespace 'ns', we need to know that the language is Pascal in order to set the lookup masks to (eFunctionNameTypeMethod | eFunctionNameTypeBase) and the basename to "foo", just as we would in C++ for 'ns::foo'.

Previous patch (rejected):
This patch solves this problem by using the language of the selected frame (similar to gdb). Getting the language from user settings is also planned. This patch (and others) paves the way for the PascalLanguageRuntime support to come, which will be called if the language is Pascal.

New patch:
Target and breakpoints options were added:

breakpoint set --language lang --name func
settings set target.language pascal

These specify the Language to use when interpreting the breakpoint's expression (note: currently only implemented for breakpoints on identifiers). If the breakpoint language is not set, the target.language setting is used. These are needed This support is required by Pascal, for example, to set breakpoint at 'ns.foo' for function 'foo' in namespace 'ns'.
Tests on the language were also added to Module::PrepareForFunctionNameLookup for efficiency.

Diff Detail

Repository
rL LLVM

Event Timeline

dawn updated this revision to Diff 29511.Jul 11 2015, 2:20 AM
dawn retitled this revision from to Parse breakpoint expressions using the language of the frame's CU..
dawn updated this object.
dawn added reviewers: clayborg, spyffe, zturner.
dawn set the repository for this revision to rL LLVM.
dawn added a subscriber: lldb-commits.
dawn added a comment.Jul 11 2015, 2:49 AM

I expect this patch may draw up some controversy; if not acceptable, please advise on how to rework?
Greg, as the expert in this area, I would especially appreciate your feedback. Any others I could add to this review?

source/Core/Module.cpp
1745

tests for language will be added along with PascalLanguageRuntime support.

clayborg requested changes to this revision.Jul 13 2015, 10:11 AM
clayborg edited edge metadata.

The current frame really isn't a good indicator of the language that one should use for a breakpoint you are want to set by name.

I would rather see smarts go into Module::PrepareForFunctionNameLookup() and teach it to pull apart names like "ns.foo" and do the right thing. If a string comes into Module::PrepareForFunctionNameLookup() that contains <indentifier>.<indentifier> repeated as many times as needed (<indentifier>.<indentifier>.<indentifier>.<indentifier>) then we should correctly pull those apart and search for "foo" and then match against "ns.foo" (just like Module::PrepareForFunctionNameLookup() takes "ns::foo" and looks up "foo" and then matches agains "ns::foo"). I don't think we need a language indicator to tell us, we should be able to just break it up without that hint. We could provide a language hint to Module::PrepareForFunctionNameLookup() if we want to and then have the user specify "--language pascal" when doing the "breakpoint set --name ns.foo --language pascal", but we shouldn't be using any frame's compile unit language to do so.

This revision now requires changes to proceed.Jul 13 2015, 10:11 AM
dawn added a comment.Jul 14 2015, 2:44 PM

Many thanks for taking the time to review! This new language support is painful... I'm grateful you're willing to work with me on it!!

I would rather see smarts go into Module::PrepareForFunctionNameLookup() and teach it to pull apart names like "ns.foo" and do the right thing. If a string comes into Module::PrepareForFunctionNameLookup() that contains <indentifier>.<indentifier> repeated as many times as needed (<indentifier>.<indentifier>.<indentifier>.<indentifier>) then we should correctly pull those apart and search for "foo" and then match against "ns.foo" (just like Module::PrepareForFunctionNameLookup() takes "ns::foo" and looks up "foo" and then matches agains "ns::foo"). I don't think we need a language indicator to tell us, we should be able to just break it up without that hint.

Currently this "pulling apart" happens via CPPLanguageRuntime. Do you want a version of
CPPLanguageRuntime that handle Pascals as well; a LanguageRuntime::ExtractContextAndIdentifier that is a superset and iterate through that? The problem is that the type mask can come in as eFunctionNameTypeMethod or eFunctionNameTypeBase. What C++ does with this (how it would split it apart) is different from how Pascal would split it. How would you workaround that?

We could provide a language hint to Module::PrepareForFunctionNameLookup() if we want to and then have the user specify "--language pascal" when doing the "breakpoint set --name ns.foo --language pascal", but we shouldn't be using any frame's compile unit language to do so.

Yes, I plan to add options like this, but wanted to use the frame's CU as the default, just as gdb does. It's a pain for a Pascal user to always have to specify "--language pascal" every time they want to set a breakpoint or evaluate a string. Agreed - using the frame's CU isn't perfect either, but it's the option gdb adopted, so has precedence and years of acceptance.

But I hear you, and will revise this patch to work off an option instead.

I had looked at adding language options settings... getting those options down to the code that needs to check them will change quite a few APIs. I also wasn't sure how best to specify the language - should it be a global setting that breakpoints and evaluation both use, or should each command have a "--language" option?

dawn added a comment.Jul 15 2015, 2:17 AM

I will add breakpoint option:

breakpoint set --expr-language <lang> <expr>

and default option:

settings set target.expr-language <lang>

and submit revised patch. Sound good?

That sounds fine, except you shouldn't call the option "expr-language" because we expr is the common abbreviation for the "expression" command, but the breakpoint specifications are not in general expressions. In fact, the only breakpoint specifier that is an expression: --address would NOT take this option... So that's a confusing name.

Since you are only going to add the language options to versions of breakpoint set that take some form of identifier (--name, --method --base-name) maybe --identifier-language would be better?

Jim

I would like to see the following:

  • Add a new class named PascalLanguageRuntime and have it be able to figure out how to chop pascal names up into a basename and the larger string to then compare against.
  • Add a --language <language> to "breakpoint set" and pass it along if needed
  • Modify the Module::PrepareForFunctionNameLookup() to look at the language (which should default to "eLanguageTypeUnknown") and skip C++ if language it not eLanguageTypeUnknown and not related to C++. Have it call functions in your new PascalLanguageRuntime like a new (at least make PascalLanguageRuntime::ExtractContextAndIdentifier(...)).
dawn added a comment.Jul 15 2015, 11:43 AM

That sounds fine, except you shouldn't call the option "expr-language" because [...]
Since you are only going to add the language options to versions of breakpoint set that take some form of identifier (--name, --method --base-name) maybe --identifier-language would be better?

Actually, I do plan for this option to apply to the expression in '--address <expr>' as well. The

settings set target.expr-language <lang>

is intended to affect how expressions used in all commands are parsed and interpreted. This patch is just part of that bigger plan, by addressing how the name_type_masks are set.

We want the command options to match the target's setting for consistency. For example:

expr --expr-language <lang> <expr>
breakpoint set --expr-language <lang> <expr>

So '--identifier-language' isn't quite right, and my concern with plain '--language' is that

settings set target.language <lang>

might have users thinking it affects everything from unmangling to how the DWARF is read.

'--expr-language' seemed to to fit best to me. Would you agree?

I think I would prefer --language and have the help text for this option specify what it means (it means expression language for --address and source language for file + line breakpoints).

dawn added a comment.Jul 16 2015, 2:13 PM

I think I would prefer --language and have the help text for this option specify what it means (it means expression language for --address and source language for file + line breakpoints).

Ok, will do. At some point we may want a language option which affects everything from unmangling to how the DWARF is read, and it would make sense to reserve --language for that, but we can always change the names then.

I'll leave this review open until I have a revised one, just in case anyone else wants to comment on this...

dawn updated this revision to Diff 30277.Jul 21 2015, 12:25 PM
dawn retitled this revision from Parse breakpoint expressions using the language of the frame's CU. to Specify a language to use when parsing breakpoint identifiers..
dawn updated this object.
dawn edited edge metadata.

This revised patch uses target and breakpoint options for determining the language instead of the selected frame's CU.

clayborg accepted this revision.Jul 21 2015, 1:40 PM
clayborg edited edge metadata.

Looks good as long as the test suite is happy. I worry about MacOSX tests.

This revision is now accepted and ready to land.Jul 21 2015, 1:40 PM
This revision was automatically updated to reflect the committed changes.
dawn added a comment.Jul 21 2015, 3:06 PM

Looks good as long as the test suite is happy. I worry about MacOSX tests.

All tests passed locally. Will watch for build failures...