This is an archive of the discontinued LLVM Phabricator instance.

Add the ability to pass extra args to a Python breakpoint command function
ClosedPublic

Authored by jingham on Oct 8 2019, 4:21 PM.

Details

Summary

For example, it is pretty easy to write a breakpoint command that implements "stop when my caller is Foo", and it is pretty easy to write a breakpoint command that implements "stop when my caller is Bar". But there's no way to write a generic "stop when my caller is..." function, and then specify the caller when you add the command. With this patch, you can pass this data in a SBStructuredData dictionary. That will get stored in the PythonCommandBaton for the breakpoint, and passed to the implementation function (if it has the right signature) when the breakpoint is hit. Then in lldb, you can say:

(lldb) break com add -F caller_is -k caller_name -v Foo

More generally this will allow us to write reusable Python breakpoint commands.

Diff Detail

Repository
rLLDB LLDB

Event Timeline

jingham created this revision.Oct 8 2019, 4:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 8 2019, 4:21 PM
shafik added a subscriber: shafik.Oct 8 2019, 4:55 PM
shafik added inline comments.
lldb/include/lldb/API/SBBreakpointLocation.h
59

Could this be a const &

lldb/include/lldb/API/SBBreakpointName.h
89

Could this be a const &

Looks good overall. A few questions:

  • should we do a global "s/extra_args/args/" edit in this patch? Not sure what "extra_" is adding? Can we remove?
  • Can the structured data not be a dictionary? Valid StructuredData could be "malloc" or "[0x1000, 0x2000]" or "true", etc. If the data is required to be a dictionary we should provide lldb::SBError support for the new API calls to ensure we can let the user know what they did wrong. Be good to test this as well.
  • is it possible to update the args for a callback function? That would seem like a good thing you might want to do? And if so, to test?
    • If it is possible to update the structured data, and if the structured data can be anything (array, string, boolean, dictionary), we might want an option like: --json "..." to be able to set or update from the command line? The current -k -v options seem to imply it must be a dict and only one level deep?
lldb/include/lldb/API/SBBreakpointLocation.h
59

const doesn't mean anything for lldb::SB arguments. Since they contain shared pointers or unique pointers, so just leave as is.

lldb/include/lldb/API/SBBreakpointName.h
89

const doesn't mean anything for lldb::SB arguments. Since they contain shared pointers or unique pointers, so just leave as is.

lldb/include/lldb/Interpreter/OptionGroupPythonClassWithDict.h
26–30

clang-format? Indent seems off (it was before too I see).

Looks good overall. A few questions:

  • should we do a global "s/extra_args/args/" edit in this patch? Not sure what "extra_" is adding? Can we remove?

Since we still support the form that doesn't take extra_args, they do feel "extra" to me. I also like the "extra" because it helps make it clear these are just free bits of data that the callback system just passes along. After all, there are some fixed arguments to the call back that do have specific meanings.

Note, I called the equivalent feature "extra_args" for the scripted breakpoint resolvers (added last year) and scripted thread plans, so if we change it here, we need to change it in those places as well. If we're going to do that, I'll do it as a follow on patch because I don't want to mix those changes in with this patch. But as I said, I like the "extra".

  • Can the structured data not be a dictionary? Valid StructuredData could be "malloc" or "[0x1000, 0x2000]" or "true", etc. If the data is required to be a dictionary we should provide lldb::SBError support for the new API calls to ensure we can let the user know what they did wrong. Be good to test this as well.

The only place where I treat the extra_args as a Dictionary is when I use as a way to tell whether the extra_args is empty, which I need to write the right wrapper function in GenerateBreakpointCommandCallbackData. That function is little annoying because it doesn't know whether it is calling a function or some text gotten from the I/O handler, so I can't count the function arguments there to decide which signature to use.

Let me see if I can make this cleaner so I don't have to rely on that. I really don't intend to limit fancier uses of the extra args.

Note, the command-line doesn't allow you to make anything but a dictionary (through the -k -v option pairs). I'm actually happy with that, it enforces some kind of structure on the breakpoint command callback that you write. You'll thank me when you want to add a second or third parameter...

Plus once you write something you are going to use frequently, you can do:

(lldb) command alias caller_break breakpoint command add -F MyCommands.break_when_parent -k func_name -v %1
(lldb) caller_break Caller -n Callee

That's how you would really want to use it.

  • is it possible to update the args for a callback function? That would seem like a good thing you might want to do? And if so, to test?

There's no way to do this. "breakpoint command add" is a bit poorly named, since it really overwrites the current command rather than adding to it. But that's the way it has always worked. So only giving you the option to provide a new pair {python function, extra_args} is in keeping with its behavior.

People have asked for a "breakpoint command modify" that could chain python callbacks, or incrementally add command-line commands. That would also naturally allow you to change/add to the extra args in place. But that's a whole other piece of work that I don't want to take on here.

  • If it is possible to update the structured data, and if the structured data can be anything (array, string, boolean, dictionary), we might want an option like: --json "..." to be able to set or update from the command line? The current -k -v options seem to imply it must be a dict and only one level deep?

I thought about adding --json when making the OptionGroupPythonClassWithDict. The completionist side of me was sorely tempted, especially since it would be quite easy to do. But the options for "break set" are already several pages long, so I don't want to add more options to it just "because I can".

For all the use cases I can think of, a simple one level dictionary will suffice. If you want to do something fancier, you can make an arbitrarily complicated SBStructuredData in Python, and then use SBBreakpoint.SetScriptCallbackFunction to pass it in. And if you want to do that something fancier a lot, you can make this into a python command.

One useful enhancement that wouldn't overburden "break set" would be to call ParseJSON on the -v option value, and if that succeeded add the resultant StructuredData::ObjectSP as the value. That would allow you to pass lists in for the value, which does seem useful. You can of course do this yourself by getting the string value out in your callback and passing it to SBStructuredData.ParseFromJSON(). So I don't think it's important to add this now. I'd also need to assure myself that this wouldn't surprise people by munging up legitimate string values.

But anyway, I'd prefer to do that as a follow on, since I'm running out of time to play with this this time round the wheel...

The only place where I treat the extra_args as a Dictionary is when I use as a way to tell whether the extra_args is empty, which I need to write the right wrapper function in GenerateBreakpointCommandCallbackData. That function is little annoying because it doesn't know whether it is calling a function or some text gotten from the I/O handler, so I can't count the function arguments there to decide which signature to use.

Let me see if I can make this cleaner so I don't have to rely on that. I really don't intend to limit fancier uses of the extra args.

Let me know if you are going to look into making this cleaner for this patch. If so, I will hold off, else I can accept if you are out of time

jingham updated this revision to Diff 224719.Oct 11 2019, 6:40 PM

I added ScriptInterpreter::GetNumArgumentsForCallable, so I can base all the decisions on how to call the function on whether I was passed a function with 3 or 4 arguments. That's cleaner. I also plumbed an error through the callback setting, so I can report an error if you pass me a function with the wrong number of arguments, or provide extra_args when you are passing a function that doesn't take 4 arguments.

I added a check in the extant test to ensure that a function that takes extra_args will work with empty extra_args, since there's no reason to disallow that.

I also added a test for the error handling when you pass a function with the wrong number of arguments.

Now we don't depend on the SBStructuredData handed across the SB API's being any particular structure.

labath added inline comments.
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

In light of varargs functions (*args, **kwargs), which are fairly popular in python, the concept of "number of arguments of a callable" does not seem that well defined. The current implementation seems to return the number of fixed arguments, which might be fine, but I think this behavior should be documented. Also, it would be good to modernize this function signature -- have it take a StringRef, and return a Expected<unsigned (?)> -- ongoing work by @lawrence_danna will make it possible to return errors from the python interpreter, and this will make it possible to display those, instead of just guessing that this is because the callable was not found (it could in fact be because the named thing is not a callable, of because resolving the name produced an exception, ...).

jingham updated this revision to Diff 224929.Oct 14 2019, 4:38 PM

Addressed Pavel's comments.

I explicitly want the number of fixed arguments, so I changed the function name to GetNumFixedArguments. The docs say explicitly that you have to make the function take either three or four fixed arguments, and that's what I have to check against. I don't want to know about varargs and optional named arguments.

I added the Expected, though I didn't plumb the change into PythonCallable::GetNumArguments. That's of marginal extra benefit, and orthogonal to the purposes of this patch.

Note also, the fact that I was now checking argument signatures when adding the command means you can no longer do:

(lldb) break command add -F myfunc.function
(lldb) command script import myfunc.py

which TestBreakpointCommand.py was doing. You have to make the Python function available BEFORE adding it to the breakpoint. I don't think this was an feature, and I don't see any harm in adding that requirement. But it is a change in behavior so I thought I'd bring it up.

lawrence_danna marked an inline comment as done.Oct 14 2019, 5:25 PM
lawrence_danna added inline comments.
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

I just took a look at PythonCallable::GetNumArguments() and it's horribly broken.

It doesn't even work for the simplest test case I could think of.

auto builtins = PythonModule::Import("builtins");
ASSERT_THAT_EXPECTED(builtins, llvm::Succeeded());
auto hex = As<PythonCallable>(builtins.get().GetAttribute("hex"));
ASSERT_THAT_EXPECTED(hex, llvm::Succeeded());  
EXPECT_EQ(hex.get().GetNumArguments().count, 1u);

we should really re-write it to use inspect.signature.

labath added inline comments.Oct 15 2019, 4:09 AM
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

lol :)

I would actually say that we should try not to use this function(ality) wherever possible. Making decisions based on the number of arguments the thing you're about to call takes sounds weird. I don't want to get too involved in this, but I was designing this, I'd just say that if one tries to pass arguments to the callback then the callback MUST take three arguments (or we'll abort processing the breakpoint command). If he wants his function to be called both with arguments and without, he can just add a default value to the third argument. (And if his function takes two arguments, but he still tries to pass something... well, it's his own fault).

Anyway, feel free to ignore this comment, but I felt like I had to say something. :)

lawrence_danna added inline comments.Oct 15 2019, 5:23 AM
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

I completely agree with Pavel. Inspecting a function signature before calling it is a big code smell in python. If there's a way to avoid doing that introspection, that would be better.

jingham marked 2 inline comments as done.Oct 15 2019, 10:12 AM
jingham added inline comments.
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

Interesting. We use it for free functions (what you pass to the -F option of breakpoint command add) and for the init and call method in the little classes you can make up for scripted thread plans and for the class version of Python implemented command-line commands. We have tests for telling 3 vrs. 4 vrs. not enough or too many, and they all pass. So it seems to work in the cases we currently need it to work for...

"inspect.signature" is python 3 only, and the python 2 equivalent is deprecated. So it will take a little fancy footwork to use it in the transition period.

469

Unfortunately, we originally designed this interface to take three arguments, the frame pointer, the breakpoint location and the Python session dict. Then it became clear that it would be better to add this extra args argument (and in the case of Python based commands the ExecutionContext pointer). At that point we had three choices, abandon the improvement; switch to unconditionally passing the extra arguments, and break everybody's extant uses; or switch off of the number of arguments to decide whether the user had provided the old or new forms.

My feeling about lldb Python scripts/commands etc. that people use in the debugger is that a lot of users don't know how they work at all, they just got them from somebody else; and many more figured out how to write them for some specific purpose, and then pretty much forgot how they worked. So suddenly breaking all these bits of functionality will result in folks just deciding that this affordance is not reliable and not worth their time, which would be a shame.

So instead we accommodate both forms, which requires that we know which one the user provided. If you see a better way to do this, (and are willing to implement it because so far as I can see this method is going to work just fine) dig in, I'm not wedded to the particular approach. But I am not excited about penalizing our users because we didn't get the API design right the first time through.

lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

makes sense.

The only other way I can think of to solve it would be to have some indication in the break com add command of what signature it expects from the function. But that's really ugly too because now you're asking users to understand yet another option.

I put up https://reviews.llvm.org/D68995 this morning which adds inspect.signature support for pythons that have it.

Currently we have really gnarly ArgInfo tests like this:

if (argc.count == 5 || argc.is_bound_method || argc.has_varargs)
    pfunc(debugger_arg, PythonString(args), exe_ctx_arg, cmd_retobj_arg, dict);
else
    pfunc(debugger_arg, PythonString(args), cmd_retobj_arg, dict);

😖

I think if we replace count with max_positional_args we should be able to replace that kindof test with just

if (argc.max_positional_args < 5)
   old_version
else
   new_version
labath added inline comments.Oct 15 2019, 10:45 AM
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

So, how about this: Put extra_args as the last argument, instead of inserting it in the middle (so the new signature becomes frame, bp_loc, dict, extra_args instead of frame, bp_loc, extra_args, dict. Then instead of

if (arg_info.count == 3)
    result = pfunc(frame_arg, bp_loc_arg, dict);
  else if (arg_info.count == 4) {
      lldb::SBStructuredData *args_value = new lldb::SBStructuredData(args_impl);
      PythonObject args_arg(PyRefType::Owned, SBTypeToSWIGWrapper(args_value));
      result = pfunc(frame_arg, bp_loc_arg, args_arg, dict);

we do:

if (args_impl.was_specified())
  pfunc(frame_arg, bp_loc_arg, dict, args_arg)
else
  pfunc(frame_arg, bp_loc_arg, dict);

All existing scripts will not specify the extra arguments, so they will work as usual. New scripts which do pass extra arguments will have to use the new signature. New scripts can also put args = None in their python signature, so that they are callable both with and without arguments, should they so desire. (If we don't want to support the =None use case then we can even keep the arguments in the same order as in this patch.)

Is there some reason why that would not work?

labath added inline comments.Oct 15 2019, 10:48 AM
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

The only other way I can think of to solve it would be to have some indication in the break com add command of what signature it expects from the function.

That's kind of what I'm getting at. I am hoping that the presence of the --key, --value options can be used as an indicator of that signature. (Though maybe I am misunderstanding something and I should shut up.)

jingham marked an inline comment as done.Oct 15 2019, 10:55 AM
jingham added inline comments.
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

The is_bound_method test is cheesy. We didn't offer a "class with call" for Python based commands until after we added the ExecutionContext argument, so this check knows that if you are providing a class method, then you are probably also providing the correct number of arguments. max_positional_args seems a more explicit approach.

I think the varargs check is to allow you to write a command that takes the old three arguments and the ExecutionContext as a vararg, so the same Python function could work with an older lldb that didn't send the exe_ctx but take advantage of the better interface if it was present. After all, the fallback of using the currently selected "target/process/thread/frame" inside the function will mostly work.

For the affordances taking the "extra_args", like breakpoint commands and scripted breakpoints and the like, it's hard to see how you could have a reasonable fallback to "you didn't tell me what function to look for..." So for these I want to count the fixed arguments, I don't think it is necessary to allow them to be passed as varargs.

Thanks for fixing up the PythonObjects code, BTW!

jingham marked an inline comment as done.Oct 15 2019, 11:03 AM
jingham added inline comments.
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

I did consider switching off of whether the user provided key & value arguments. But that would mean that you could not write Python breakpoint command functions that have default behaviors, but will do special things if the user provided --key and --value arguments. That seemed like a reasonable thing to support, so I switched to doing everything based on the actual signature we were provided.

labath added inline comments.Oct 15 2019, 11:18 AM
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

Why not? If the last argument has a default value (def my_fancy_callback(frame, bp_loc, dict, extra_args = None):), then it should be callable both with three and four arguments, should it not? I think the =None part actually nicely documents that the callback is usable both with extra arguments, but also has a reasonable default behavior.

jingham marked an inline comment as done.Oct 15 2019, 11:33 AM
jingham added inline comments.
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

It's already weird that we have to tell people to pass the session dict. Now we're saying you have to pass a defaulted arg after the dict; that seems even weirder. I don't actually think lots of people are going to want to write commands that take advantage of being passed extra arguments but want them to work with older lldb's. The case I want to support is that you are always going to get this extra args SBStructuredData, but it might be empty depending on whether the user added --key and --value, and your command can do what it wants based on the contents of the extra_args. That seems straightforwardly supported by requiring the arguments.

I'm not sure we are solving any actual problem here?

labath added inline comments.Oct 15 2019, 12:10 PM
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

What I find weird is that the meaning of a positional argument changes depending on what comes after it. So if I define a function like def foo(a,b,c), then c will be a session dictionary. But if I define it as def foo(a,b,c,d): then the dict becomes d. I don't find the =None confusing, particularly as that means we could just have one function signature we advertise, and if the users write it that way then it will "happen" to work on older lldb's too. But then again, I've never written any breakpoint callbacks, so if you think that these concerns are unfounded, then feel free to go ahead.

jingham marked an inline comment as done.Oct 16 2019, 10:43 AM
jingham added inline comments.
lldb/include/lldb/Interpreter/ScriptInterpreter.h
469

Yeah, I don't think that is what you are told to do. For breakpoint commands, you are told to write:

def foo(frame, bp_no, dict):

in the old form, or if you want to receive extra args, you would write:

def foo(frame, bp_no, extra_args, dict)

and in either case the last - "dict" - argument is an internal detail. That seems straightforward.

Also, it is the way command implementation functions (with the added exe_ctx argument), and scripted thread plans (with a similar extra_args argument) work. It is also how the scripted breakpoint resolvers work, except there was never a version w/o the extra_args, because that's when I realized that was a much better way to go... So I do think I'm going to leave it this way.

FYI: This broke the build for me.

lldb/include/lldb/Interpreter/ScriptInterpreter.h
323

This breaks some builds because it doesn't return a value.

wallace added inline comments.
lldb/packages/Python/lldbsuite/test/functionalities/breakpoint/breakpoint_command/TestBreakpointCommand.py
21

I've been trying to track a regression in source maps between Android Studio versions and it seems that this test shouldn't have been disabled. :(

jingham added inline comments.Jul 29 2021, 5:46 PM
lldb/packages/Python/lldbsuite/test/functionalities/breakpoint/breakpoint_command/TestBreakpointCommand.py
21

Yup, my bad. I sometimes disable tests this way when I'm only trying to run one test in the set a bunch of times. I must have forgotten to set it back. Thanks for catching this!

wallace added inline comments.Jul 29 2021, 6:19 PM
lldb/packages/Python/lldbsuite/test/functionalities/breakpoint/breakpoint_command/TestBreakpointCommand.py
21

No worries!! It was fun, though

Just for history's sake, this change was committed (738af7a6241c9). I put in the Differential Revision line in the changelog, so I'm not sure why this didn't get marked as closed...

Herald added a project: Restricted Project. · View Herald TranscriptMay 26 2022, 10:51 AM
JDevlieghere accepted this revision.May 26 2022, 12:20 PM

Apparently closing a revision requires it to be accepted first.

This revision is now accepted and ready to land.May 26 2022, 12:20 PM
JDevlieghere closed this revision.May 26 2022, 12:20 PM