Page MenuHomePhabricator

[LLDB/Lua] add support for one-liner breakpoint callback
ClosedPublic

Authored by tammela on Nov 15 2020, 1:01 PM.

Details

Summary

These callbacks are set using the following:

breakpoint command add -s lua -o "print('hello world!')"

The user supplied script is executed as:

function (frame, bp_loc, ...)
   <body>
end

So the local variables 'frame', 'bp_loc' and vararg are all accessible.
Any global variables declared will persist in the Lua interpreter.
A user should never hold 'frame' and 'bp_loc' in a global variable as
these userdatas are context dependent.

Diff Detail

Event Timeline

tammela created this revision.Nov 15 2020, 1:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 15 2020, 1:01 PM
tammela requested review of this revision.Nov 15 2020, 1:01 PM

@JDevlieghere

When writing this patch I noticed that there is no mechanism in-place to remove the Python/Lua function when the breakpoint is removed or when the callback function is replaced.
The class that sets the callback provides ClearCallback() but it never calls into the ScriptInterpreter to clean up.
Although the real world impact is not that big, it's crucial for a small memory footprint. Any thoughts on this?

tammela updated this revision to Diff 305427.Nov 16 2020, 1:11 AM

Clean up includes

labath added a subscriber: labath.Nov 16 2020, 1:48 AM
labath added inline comments.
lldb/source/Plugins/ScriptInterpreter/Lua/Lua.cpp
83–101

I'm confused. Why use lua to call a c callback, when you could just do calllback()?

lldb/source/Plugins/ScriptInterpreter/Lua/ScriptInterpreterLua.cpp
195

How is lua_interpreter different from this?

206
if (llvm::Error E = lua.Run(callback)) {
  debugger.GetErrorStream() << toString(std::move(E));
  stop = true;
}

would be less callback-y.

lldb/test/Shell/ScriptInterpreter/Lua/breakpoint_oneline_callback.test
4

Print something more unique so this doesn't match accidentally?

tammela added inline comments.Nov 17 2020, 1:30 AM
lldb/source/Plugins/ScriptInterpreter/Lua/Lua.cpp
83–101

Some Lua API functions throw errors, if there's any it will abort() the program if no panic handler or protected call is provided.
To guarantee that the callback always runs in a protected environment and it has error handling, we do the above.
Delegating this to the callback itself makes it cumbersome to write.

@JDevlieghere

When writing this patch I noticed that there is no mechanism in-place to remove the Python/Lua function when the breakpoint is removed or when the callback function is replaced.
The class that sets the callback provides ClearCallback() but it never calls into the ScriptInterpreter to clean up.
Although the real world impact is not that big, it's crucial for a small memory footprint. Any thoughts on this?

I see, it looks like we just clean up the LLDB side but never tell the runtimes about it. What exactly are we leaking in that case? Do you think it's worth it to wire that up?

This looks good with Pavel's comments addressed and an integration test added for the breakpoint callback. (Watchpoints are not supported everywhere and it's hard to determine whether they are from a Shell test).

lldb/bindings/lua/lua-wrapper.swig
30

Might be nice to add a comment to explain the "magic values" here.

lldb/source/Plugins/ScriptInterpreter/Lua/Lua.h
28

Let's move this into the Lua class (and rename it to Callback) so we don't taint every CU that includes this file.

labath requested changes to this revision.Nov 17 2020, 9:34 AM

I'll have more questions about this, but probably won't get around to asking them today.

This revision now requires changes to proceed.Nov 17 2020, 9:34 AM
tammela updated this revision to Diff 305874.Nov 17 2020, 12:15 PM
tammela marked 4 inline comments as done and an inline comment as not done.

Adressing comments

lldb/source/Plugins/ScriptInterpreter/Lua/ScriptInterpreterLua.cpp
195

Are you asking why I didn't go for m_script_interpreter?

This function is static (on the class declaration), perhaps it's clearer a static keyword here as well?

@JDevlieghere

When writing this patch I noticed that there is no mechanism in-place to remove the Python/Lua function when the breakpoint is removed or when the callback function is replaced.
The class that sets the callback provides ClearCallback() but it never calls into the ScriptInterpreter to clean up.
Although the real world impact is not that big, it's crucial for a small memory footprint. Any thoughts on this?

I see, it looks like we just clean up the LLDB side but never tell the runtimes about it. What exactly are we leaking in that case? Do you think it's worth it to wire that up?

For both interpreters, the function objects itself. I think it's worth it, specially for long running debugging sessions.

labath added inline comments.Nov 18 2020, 2:05 AM
lldb/bindings/lua/lua-wrapper.swig
27–28

This name made sense for python, as the functions actually returned the wrappers. But here, the name makes it pretty unobvious that these functions actually push the object on the lua stack. It'd be better if this was called something like PushSBType or something

lldb/source/Plugins/ScriptInterpreter/Lua/Lua.cpp
83–101

Aha, I see.

So, if I remember my lua correctly (I wrote a c++ lua wrapper once, but that was years ago..), whenever there's a lua exception inside this (c++) callback, lua will longjmp(3) back to the lua_pcall call on line 68, skipping any destructors that should normally be invoked. Is that so?

If that's the case, then I think this is a dangerous api, that should at the very least get a big fat warning, but that ideally shouldn't exist at all.

What's the part that makes delegating this to the callback "cumersome to write"? And why can't that be overcome by a suitable utility function which wraps lua_pcall or whatever else is needed?

The approach that we've chosen in python is to have very little code interacting with the python C API directly. Instead, code generally works with our C++ wrappers defined in PythonDataObject.h. These functions try to hide the python exception magic as much as possible, and present a c++-y version of the interface.

Now, since lua is stack-based, something like LuaDataObjects.h would probably not work. However, that doesn't meant we should give up on the c++ wrapper idea altogether. During the intitial reviews, my intention was for the Lua class to serve this purpose. I still think this can be achieved if we make the callback functions take Lua& instead of lua_State* as an argument, and then ensure the class contains whatever is needed to make the callbacks not cumerbsome to write.

lldb/source/Plugins/ScriptInterpreter/Lua/ScriptInterpreterLua.cpp
195

I just missed the fact that this is a static function (I hate these static baton functions in lldb). Putting static on an out-of-line member function definition is not valid c++.

However, this does raise a different issue. Debugger::GetScriptInterpreter will return the default script interpreter, and that may not be the lua one. I think you should pass eScriptLanguageLua here explicitly.

tammela planned changes to this revision.Nov 18 2020, 11:41 AM
tammela added inline comments.Nov 18 2020, 12:16 PM
lldb/source/Plugins/ScriptInterpreter/Lua/Lua.cpp
83–101

So, if I remember my lua correctly (I wrote a c++ lua wrapper once, but that was years ago..), whenever there's a lua exception inside this (c++) callback, lua will longjmp(3) back to the lua_pcall call on line 68, skipping any destructors that should normally be invoked. Is that so?

If that's the case, then I think this is a dangerous api, that should at the very least get a big fat warning, but that ideally shouldn't exist at all.

You are right. This escaped me completely.
Lua can be compiled to use try/catch instead of longjmp, but that is an exception (no pun intended). Since we rely on the host's library, we need to assume that it's using longjmp.

What's the part that makes delegating this to the callback "cumersome to write"? And why can't that be overcome by a suitable utility function which wraps lua_pcall or whatever else is needed?

The approach that we've chosen in python is to have very little code interacting with the python C API directly. Instead, code generally works with our C++ wrappers defined in PythonDataObject.h. These functions try to hide the python exception magic as much as possible, and present a c++-y version of the interface.

Now, since lua is stack-based, something like LuaDataObjects.h would probably not work. However, that doesn't meant we should give up on the c++ wrapper idea altogether. During the intitial reviews, my intention was for the Lua class to serve this purpose. I still think this can be achieved if we make the callback functions take Lua& instead of lua_State* as an argument, and then ensure the class contains whatever is needed to make the callbacks not cumerbsome to write.

Any C++ code that runs in a lua_pcall seems to face this issue. I checked a very complete C++ Lua wrapper called sol2 and they seem to have the same issue. I don't think there's a way out of it except code discipline.

Lua provides lua_atpanic as a way to recover from unprotected throws. Perhaps we can leverage this to throw an exception, guaranteeing stack unwinding and avoiding a call to abort(). We would then let the callback run unprotected and delegate any calls to lua_pcall to the callback.

Changes proposed:

  • Register the panic function on Lua ctor
  • Let the Callback run unprotected
  • Wrap the Callback call in a try catch block
  • Change the Callback signature to receive Lua& and return llvm::Error

It looks like it ticks all goals.

  • Unprotected Lua errors do not cause lldb to abort.
  • Stack unwinding is guaranteed for C++ code interacting with the Lua state.
  • Errors are propagated

Continuing the discussion from the inline comment:

Lua can be compiled to use try/catch instead of longjmp, but that is an exception (no pun intended). Since we rely on the host's library, we need to assume that it's using longjmp.

Right. In fact even if it had exceptions enabled, that still wouldn't help as llvm itself is also compiled without exceptions (it can be, but it's an exception :P).

Any C++ code that runs in a lua_pcall seems to face this issue. I checked a very complete C++ Lua wrapper called sol2 and they seem to have the same issue. I don't think there's a way out of it except code discipline.

I'm not sure what you mean by that. Can you elaborate?

The way I see it, the problem is not about which environment is the c++ code running in, as it can't throw an exception (in llvm). The problem is what happens when we call lua from c++, and the lua code throws. And this is a problem regardless of the environment the C(++) code is in -- the difference is "only" in whether lua will panic or jump over the C code, but something bad will happen anyway.

I think that could be solved by ensuring we always call lua in a protected environment. That way lua will never unwind through the C code (regardless of what environment it is in) because it would always stop right at the boundary.

That would essentially mean, lua_call is banned (except maybe as an optimization, if you can be sure that the code you're about to call will not throw), and all C->lua transitions should go through lua_pcall. And we can provide a wrapper that handles catches those errors and converts them to llvm::Error/Expected, similar to how the python things do it.

Lua provides lua_atpanic as a way to recover from unprotected throws. Perhaps we can leverage this to throw an exception, guaranteeing stack unwinding and avoiding a call to abort(). We would then let the callback run unprotected and delegate any calls to lua_pcall to the callback.

I'm afraid that won't work, because exceptions are not allowed in llvm. But I think something similar to what I propose above should do the trick.

lldb/source/Plugins/ScriptInterpreter/Lua/Lua.cpp
83–101

I'll move this out-of-line, as it's getting rather long.

tammela updated this revision to Diff 307202.Nov 23 2020, 2:37 PM

Redesign of the API

I'm not sure what you mean by that. Can you elaborate?

Sure, it's was just a nod to your previous comment about that running callbacks (C++ lambdas) inside a pcall is a dangerous API. Although possible, it requires that the developer be very cautious about implicit throws.

The way I see it, the problem is not about which environment is the c++ code running in, as it can't throw an exception (in llvm). The problem is what happens when we call lua from c++, and the lua code throws. And this is a problem regardless of the environment the C(++) code is in -- the difference is "only" in whether lua will panic or jump over the C code, but something bad will happen anyway.

I agree 100% with your analysis of the problem.

I think that could be solved by ensuring we always call lua in a protected environment. That way lua will never unwind through the C code (regardless of what environment it is in) because it would always stop right at the boundary.

That would essentially mean, lua_call is banned (except maybe as an optimization, if you can be sure that the code you're about to call will not throw), and all C->lua transitions should go through lua_pcall. And we can provide a wrapper that handles catches those errors and converts them to llvm::Error/Expected, similar to how the python things do it.

I have posted a redesign based on your comments. The lua_push* functions are all safe-ish. The Lua implementation guarantees at least 20 stack slots when the lua_State is created so I've added the stack checks for sanity rather than necessity.

I'm not sure what you mean by that. Can you elaborate?

Sure, it's was just a nod to your previous comment about that running callbacks (C++ lambdas) inside a pcall is a dangerous API. Although possible, it requires that the developer be very cautious about implicit throws.

Right. That's why I'd like to have good wrappers, which make it easy to do the right thing, and hard to do the wrong one.

I don't think we're quite there yet, but before I comment on the API, I want to understand one other thing.

I am puzzled by all the wrapping that's happening inside the PushSBClass functions. What is that protecting us from? I would hope that pushing a swig wrapper on the stack is a safe operation...

The Lua implementation guarantees at least 20 stack slots when the lua_State is created so I've added the stack checks for sanity rather than necessity.

So, IIUC, this can only fail if we are running out of memory? If that's the case, then I would remove these checks, as (for better or worse) llvm is not robust against memory allocation errors, and they add a fair amount of cruft to the code.

tammela marked an inline comment as done.Nov 24 2020, 6:59 AM

Right. That's why I'd like to have good wrappers, which make it easy to do the right thing, and hard to do the wrong one.

I don't think we're quite there yet, but before I comment on the API, I want to understand one other thing.

I am puzzled by all the wrapping that's happening inside the PushSBClass functions. What is that protecting us from? I would hope that pushing a swig wrapper on the stack is a safe operation...

I thought that too, but internally it's a naked call to lua_newuserdata() which might throw in case of a memory error.

So, IIUC, this can only fail if we are running out of memory? If that's the case, then I would remove these checks, as (for better or worse) llvm is not robust against memory allocation errors, and they add a fair amount of cruft to the code.

Fair enough. Will remove those.
Since this seems to be a fact of life for LLVM, perhaps wrapping potential memory errors turns out to be just bloat. If that's the case, then the wrapping in PushSBClass is not needed and the abort() call that Lua does is honest.

tammela updated this revision to Diff 307440.Nov 24 2020, 1:15 PM

Remove unnecessary wrappers/checks

I think we're converging on something. Now to iron out the details...

I am puzzled by all the wrapping that's happening inside the PushSBClass functions. What is that protecting us from? I would hope that pushing a swig wrapper on the stack is a safe operation...

I thought that too, but internally it's a naked call to lua_newuserdata() which might throw in case of a memory error.

So, IIUC, this can only fail if we are running out of memory? If that's the case, then I would remove these checks, as (for better or worse) llvm is not robust against memory allocation errors, and they add a fair amount of cruft to the code.

Fair enough. Will remove those.
Since this seems to be a fact of life for LLVM, perhaps wrapping potential memory errors turns out to be just bloat. If that's the case, then the wrapping in PushSBClass is not needed and the abort() call that Lua does is honest.

Right. I'm not worried about OOM errors (well.. I am slightly, but I don't think it's worthwhile to do something about them without addressing the bigger problem).

lldb/bindings/lua/lua-wrapper.swig
22–23

What's up with the copying? Is it to ensure the callback does not modify the caller's values? If so, you could just drop the & from the signature, and have the compiler copy these for you...

26–27

Is there a reason this function has to be in this file? It would be nice all the baton/registry handling was happening in the same place...

lldb/source/Plugins/ScriptInterpreter/Lua/Lua.cpp
92

Shouldn't this also pop the baton ?

lldb/test/Shell/ScriptInterpreter/Lua/breakpoint_oneline_callback.test
6

Could we also have a test where calling the callback raises an error? And for the different values of the "stop" argument that can be returned from the callback? And when compiling the callback expression results in an error?

tammela updated this revision to Diff 308029.Nov 27 2020, 5:51 AM
tammela marked 3 inline comments as done.

Addressing review comments

lldb/bindings/lua/lua-wrapper.swig
22–23

SBFrame and SBBreakpointLocation only provide copy constructors. I can't see the difference between the two strategies, could you elaborate?

26–27

My reasoning was to have the lua_pcall on the same function that pushes the Lua callback but I changed to your version as it's clearer who is handling the baton (Lua class)

lldb/test/Shell/ScriptInterpreter/Lua/breakpoint_oneline_callback.test
6

Apparently when the user sets the breakpoint and it fails to compile the code lldb will not report any errors and it will fail silently. I will address this particular test case in another patch since it affects Python as well.

labath added inline comments.Nov 30 2020, 12:42 AM
lldb/bindings/lua/lua-wrapper.swig
22–23

Ok, let's try this.

My main point was that a function which receives an argument as a (non-const) reference, and then does not modify the referenced object is weird. One just doesn't do that. It should either take a const reference or a plain value. Using a plain value is simpler, particularly in cases where you need to locally modify the object you received, but you don't want the modifications to affect the callers value. That's because in this case the compiler will do the copying for you:

void foo(SBFrame foo, SBFrame &bar, const SBFrame &baz) {
  // I don't have to worry about modifying foo, because it's my local copy
  foo = ...;
  
  // Modifications to bar will be visible in the caller
  bar = ...;

  // I can't modify baz. If I really need to do that, I need to make a local copy
  SBFrame baz_copy = baz;
  baz_copy = ...;
}
lldb/test/Shell/ScriptInterpreter/Lua/breakpoint_oneline_callback.test
6

Sounds good.

11

"frame #0" will also get printed if the process stops for any reason. I'd suggest printing a completely random string instead (print("bacon"), for example)

tammela updated this revision to Diff 308319.Nov 30 2020, 4:16 AM

Addressing comments

labath accepted this revision.Nov 30 2020, 4:17 AM

I think this is ok now. Thanks for your patience.

This revision is now accepted and ready to land.Nov 30 2020, 4:17 AM
This revision was landed with ongoing or failed builds.Nov 30 2020, 6:12 AM
This revision was automatically updated to reflect the committed changes.

I think this might have broken the Windows build as it seems the LLDBSwigLuaBreakpointCallbackFunction return type is an error for MSVC (and not a warning which we seem to ignore locally).

To quote someone from the Discord server:

I'm getting the following build error when attempting to build 12.0.0 using Microsoft CL 19.28.29913 for x64:
C:\llvm-build\llvm-project-12.0.0\lldb\source\Plugins\ScriptInterpreter\Lua\Lua.cpp(29): error C2526: 'LLDBSwigLuaBreakpointCallbackFunction': C linkage function cannot return C++ class 'llvm::Expected<bool>'
C:\llvm-build\llvm-project-12.0.0\lldb\source\Plugins\ScriptInterpreter\Lua\Lua.h(36): note: see declaration of 'llvm::Expected<bool>'
C:\llvm-build\llvm-project-12.0.0\lldb\source\Plugins\ScriptInterpreter\Lua\Lua.cpp(112): error C2440: 'return': cannot convert from 'void' to 'llvm::Expected<bool>'

I think this might have broken the Windows build as it seems the LLDBSwigLuaBreakpointCallbackFunction return type is an error for MSVC (and not a warning which we seem to ignore locally).

To quote someone from the Discord server:

I'm getting the following build error when attempting to build 12.0.0 using Microsoft CL 19.28.29913 for x64:
C:\llvm-build\llvm-project-12.0.0\lldb\source\Plugins\ScriptInterpreter\Lua\Lua.cpp(29): error C2526: 'LLDBSwigLuaBreakpointCallbackFunction': C linkage function cannot return C++ class 'llvm::Expected<bool>'
C:\llvm-build\llvm-project-12.0.0\lldb\source\Plugins\ScriptInterpreter\Lua\Lua.h(36): note: see declaration of 'llvm::Expected<bool>'
C:\llvm-build\llvm-project-12.0.0\lldb\source\Plugins\ScriptInterpreter\Lua\Lua.cpp(112): error C2440: 'return': cannot convert from 'void' to 'llvm::Expected<bool>'

@kastiglione isn't this the error you fixed for Python? Should we do the same for Lua?

jbsolomon added a subscriber: jbsolomon.EditedApr 20 2021, 9:26 AM

@kastiglione isn't this the error you fixed for Python? Should we do the same for Lua?

Correct me if I'm wrong, but I think the Python integration uses the same pragma approach and return type. I have it disabled, since I don't have a reliable Python environment on this machine, so I don't know for sure that it suffers from the same compilation error, but it appears to use the same syntax.