This is an archive of the discontinued LLVM Phabricator instance.

[lldb/Target] Support more than 2 symbols in StackFrameRecognizer
ClosedPublic

Authored by mib on Mar 14 2020, 6:32 PM.

Details

Summary

This patch changes the way the StackFrame Recognizers match a certain
frame.

Until now, recognizers could be registered with a function
name but also an alternate symbol.
This change is motivated by a test failure for the Assert frame
recognizer on Linux. Depending the version of the libc, the abort
function (triggered by an assertion), could have more than two
signatures (i.e. raise, __GI_raise and gsignal).

Instead of only checking the default symbol name and the alternate one,
lldb will iterate over a list of symbols to match against.

rdar://60386577

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

Diff Detail

Event Timeline

mib created this revision.Mar 14 2020, 6:32 PM
friss added a subscriber: friss.Mar 14 2020, 7:57 PM
friss added inline comments.
lldb/include/lldb/Target/StackFrameRecognizer.h
113–117

can this be a const vector, or even better an ArrayRef? We certainly don't want the callbacks to modify the list?

lldb/source/Commands/CommandObjectFrame.cpp
749

Does the command actually accept multiple symbols names now? If yes, it should be tested and a warning should be added when you use the regex option and you pass multiple -n options. If it doesn't accept multiple names, it shouldn't store a vector.

859

*at least

lldb/source/Target/AssertFrameRecognizer.cpp
145

This reads a little weird to me. Should we call it MatchesSymbol instead?

lldb/source/Target/StackFrameRecognizer.cpp
65–67

Is this really the clang-format formatting?

83–84

It's kinda weird to do this here, and I also think it's wrong as it will leave the recognizer with an entry in its symbols list which changes its semantics. If the callback was taking an ArrayRef, it would be easy (and cheap) to create one for a single element as well as from a vector.

mib marked 7 inline comments as done.Mar 15 2020, 12:24 AM
mib added inline comments.
lldb/source/Target/StackFrameRecognizer.cpp
65–67

It is but I'll change it back to be more compact.

mib updated this revision to Diff 250405.Mar 15 2020, 12:28 AM
mib marked an inline comment as done.
mib added a reviewer: friss.

Addressed Fred's comments.
Added testing for multi-symbol frame recognizers.

mib updated this revision to Diff 250412.Mar 15 2020, 4:35 AM

Add missing test annotation.

friss added inline comments.Mar 15 2020, 9:45 AM
lldb/test/API/commands/frame/recognizer/TestFrameRecognizer.py
162–167

Please rewrite using lldbutil.run_to_name_breakpoint, otherwise this LGTM

mib updated this revision to Diff 250446.Mar 15 2020, 3:53 PM

Refactored test to use lldbutil.run_to_name_breakpoint.

mib marked an inline comment as done.Mar 15 2020, 3:53 PM

Being able to match multiple symbols sounds useful in its own right, but the motivating case is a bit shady. raise, __GI_raise and gsignal are all aliases to one another (they have the same address). The reason you're sometimes getting gsignal here is not because some glibcs really call this function from their assert macro. It's because we happen to pick that symbol (maybe because it comes first in the symtab, depending on how the library was linked) when doing address resolution.

I'm wondering if that doesn't signal a flaw in the recognizer infrastructure. If we changed the matching logic so that it resolves the name it is supposed to search for, and then does a match by address, then only one name would be sufficient here.

lldb/include/lldb/Target/StackFrameRecognizer.h
105

Using std::vector might be good here as the function "consumes" the argument (by assigning it to the internal recognizer list. But in that case you ought to use std::move, as appropriate to prevent needless copying. If you don't want to optimize things that much, maybe just use ArrayRef here too?

lldb/source/Commands/CommandObjectFrame.cpp
880

Is there something which ensure that m_symbols contains at least one element here? (i.e., that we do not silently drop the extra symbols arguments)

lldb/source/Target/AssertFrameRecognizer.cpp
27

llvm::is_contained(symbols, symbol)

lldb/source/Target/StackFrameRecognizer.cpp
65–67

instead of arguing with clang-format, you could change this to m_recognizers.emplace_front(arguments, without, braces), which will be more efficient, and probably result in better formatting.

125–126

is_contained

mib marked 6 inline comments as done.Mar 16 2020, 7:30 AM

I'll investigate matching the symbol by address instead of name, but I still think having the ability to register a recognizer with multiple symbols can be useful.

Being able to match multiple symbols sounds useful in its own right, but the motivating case is a bit shady. raise, __GI_raise and gsignal are all aliases to one another (they have the same address). The reason you're sometimes getting gsignal here is not because some glibcs really call this function from their assert macro. It's because we happen to pick that symbol (maybe because it comes first in the symtab, depending on how the library was linked) when doing address resolution.

I'm wondering if that doesn't signal a flaw in the recognizer infrastructure. If we changed the matching logic so that it resolves the name it is supposed to search for, and then does a match by address, then only one name would be sufficient here.

lldb/source/Commands/CommandObjectFrame.cpp
880

Few lines above, we make sure the symbols list is not empty.

mib updated this revision to Diff 250557.Mar 16 2020, 7:30 AM
mib marked an inline comment as done.
mib updated this revision to Diff 250577.Mar 16 2020, 8:54 AM

Addressed Pavel's comment.

Being able to match multiple symbols sounds useful in its own right, but the motivating case is a bit shady. raise, __GI_raise and gsignal are all aliases to one another (they have the same address). The reason you're sometimes getting gsignal here is not because some glibcs really call this function from their assert macro. It's because we happen to pick that symbol (maybe because it comes first in the symtab, depending on how the library was linked) when doing address resolution.

I'm wondering if that doesn't signal a flaw in the recognizer infrastructure. If we changed the matching logic so that it resolves the name it is supposed to search for, and then does a match by address, then only one name would be sufficient here.

Note that a frame recognizer should trigger any time the you stop in a recognized frame, not just at the beginning. So if you wanted to do address matching, you would have to get the symbol address and its extent, and then match the incoming address against that. This seems like a nice optimization, with the side effect that it handles aliased symbols transparently.

mib added a comment.Mar 16 2020, 10:14 AM

@labath Do you object if I land this patch to revert our CIs to green and work on matching the symbol by address instead of name on a separate one ?

Also do you know what's the "default" symbol I should look for with the assert frame recognizer ?

In D76188#1924840, @mib wrote:

@labath Do you object if I land this patch to revert our CIs to green and work on matching the symbol by address instead of name on a separate one ?

Can we temporarily skip the test?

In D76188#1924840, @mib wrote:

@labath Do you object if I land this patch to revert our CIs to green and work on matching the symbol by address instead of name on a separate one ?

I think this is fine -- a vector is definitely cleaner than two member variables, and I can imagine there being a situation where you'd need to match multiple symbols that aren't just aliases of one another.

I just have one question about the regex handling.

Also do you know what's the "default" symbol I should look for with the assert frame recognizer ?

raise is the canonical name. __GI_raise is an internal name you'll only get if you have debug info (.symtab) for libc installed. gsignal is an alias to raise only due to a "historical mistake". I am not sure if anyone is actually using it (in fact, I didn't know the thing exists until yesterday).

lldb/source/Commands/CommandObjectFrame.cpp
880

Not empty is one thing, but what about the case when it contains 2 or more elements (frame recognizer add --name foo --name bar --regex)? Will that produce some kind of an error, or will it just ignore the second regex ?

mib marked 2 inline comments as done.Mar 17 2020, 4:02 AM
mib added inline comments.
lldb/source/Commands/CommandObjectFrame.cpp
880

It takes only the first symbol and ignores the other ones.

labath added inline comments.Mar 17 2020, 8:22 AM
lldb/source/Commands/CommandObjectFrame.cpp
880

Hmm.. that doesn't sound very useful. I think that should be an error.

jingham added inline comments.Mar 17 2020, 9:59 AM
lldb/source/Commands/CommandObjectFrame.cpp
880

Just put --name and --regex in different option groups when you define the command. Then the command machinery won't allow you to specify both at the same time (and the help will show they are mutually incompatible).

I agree with Pavel, an error is what you want in that case.

mib marked 2 inline comments as done.Mar 17 2020, 10:57 AM
mib added inline comments.
lldb/source/Commands/CommandObjectFrame.cpp
880

@jingham I think this is wrong because --regex is just a boolean flag, not the actual regular expression.

When this flag is enabled, the overloading for AddRecognizerthat is called is the one that holds RegularExpressionSPs instead of the overloading with a ConstString and llvm::ArrayRef<ConstString> ...

I'll add the check above to make sure m_symbols has only 1 entry when the --regex flag is enable and return an error otherwise.

mib updated this revision to Diff 250840.Mar 17 2020, 11:33 AM

Added error when passing multiple symbols with --regex flag.

jingham added inline comments.Mar 17 2020, 11:36 AM
lldb/source/Commands/CommandObjectFrame.cpp
880

Oh, that's unfortunate. It's weird that you can pass --name twice w/o the --regex but only once with. But it's probably too late to change that now.

You solution sounds fine, though you should also amend the help string for --name to say "can be specified more than once except if --regex is provided" or this will be really confusing. BTW, if there any reason why this can't take more than one regex? I allowed only one for breakpoints and things like the "no step into source regex" on the theory that you can always just use "|" to concatenate two regex's. That is in the end probably more efficient but for maintenance it's easier if your regex's don't get too long, so I have wished on occasion since then that I had allowed more than one regex for these options.

Note, the reason I don't like "--regex" as a flag is it makes the job of providing completions for --name harder. Note, it's still possible if the user typed:

(lldb) whatever --regex --name A<TAB>

because the option completion for --name can actually look at the whole command line. For instance,

--shlib libBar.dylib --name Foo<TAB>

will limit the completions to names in libBar. But that makes it trickier to do.

For folks designing commands, IMO would be better to have --name <NAME> and --regex <REGEX> and not have the --regex be just a flag.

mib marked 4 inline comments as done.Mar 17 2020, 11:52 AM
mib added inline comments.
lldb/source/Commands/CommandObjectFrame.cpp
880

I didn't know that completion could be restricted by other flags, this sounds like a neat "optimization".

The issue with making the --regex as the actual regular expression is that it will prevent us to distinguish the shared library regex from the symbol regex.

I'm not against changing that behavior to have better completion and more homogeneous commands but this would need to be done on a separate patch.

jingham added inline comments.Mar 17 2020, 12:28 PM
lldb/source/Commands/CommandObjectFrame.cpp
880

It's fine to do that as a separate patch, or leave it for later as your time dictates. This feature doesn't demand it.

But you should document the current behavior in the option string for --name.

mib updated this revision to Diff 250864.Mar 17 2020, 12:38 PM
mib marked an inline comment as done.
mib added a reviewer: jingham.

Updated frame recognizer add help description.

mib updated this revision to Diff 250868.Mar 17 2020, 12:44 PM
labath accepted this revision.Mar 18 2020, 5:53 AM

looks good to me

This revision is now accepted and ready to land.Mar 18 2020, 5:53 AM
This revision was automatically updated to reflect the committed changes.