This is an archive of the discontinued LLVM Phabricator instance.

Add LLDB C Bindings
Needs Review | Public

Authored by mjsabby on Apr 15 2015, 12:40 AM.

Details

Summary

For a different project (http://github.com/mjsabby/LLDBSharp), C bindings are needed because they are easier to analyze and import from than the C++ API. The ultimate use case of the project is to interact with LLDB from C# on Windows.

This review is a mostly automated C++ to C conversion of the API definitions and declarations, with clang-format run on the generated code.

Where several functions share a name (such as constructors), a monotonically increasing integer is appended to the function name.
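The numbering scheme described above could look roughly like the following sketch. It is only an illustration: the thread does not say where the real tool starts counting or whether the first overload keeps its plain name, so both choices here are assumptions, and `assign_c_names` is a hypothetical helper. The example names follow the `LLDBCreateSBDebugger` / `LLDBDisposeSBDebugger` pattern mentioned later in the thread.

```python
from collections import Counter


def assign_c_names(cpp_names):
    """Give each function a unique C name.

    Assumption: the first occurrence keeps its plain name and later
    overloads get 2, 3, ... appended. The real tool may count differently.
    """
    seen = Counter()
    c_names = []
    for name in cpp_names:
        seen[name] += 1
        suffix = "" if seen[name] == 1 else str(seen[name])
        c_names.append(name + suffix)
    return c_names


# Hypothetical input: three constructor overloads and one destructor.
overloads = ["LLDBCreateSBDebugger"] * 3 + ["LLDBDisposeSBDebugger"]
for c_name in assign_c_names(overloads):
    print(c_name)
```

A scheme like this keeps names stable only as long as overloads are always visited in the same order, which is one reason the reviewers below ask about maintenance.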

I understand this is a very large review that is going to be difficult to scan through. I'm open to ideas on how to introduce this piecemeal if there is a concern. I prefer to upstream as a whole so I don't have to play catch up on new API additions, but I understand if that is seen as code dumping and that the community wants to avoid that.

In any case, please let me know if this is something we'd like to have -- I think it's particularly useful for other languages being able to talk to LLDB besides Python.

Diff Detail

Repository
rL LLVM

Event Timeline

mjsabby updated this revision to Diff 23752. Apr 15 2015, 12:40 AM
mjsabby retitled this revision from to Add LLDB C Bindings.
mjsabby updated this object.
mjsabby edited the test plan for this revision.
mjsabby added reviewers: domipheus, zturner.
mjsabby set the repository for this revision to rL LLVM.
mjsabby added a subscriber: Unknown Object (MLST).
zturner edited edge metadata. Apr 15 2015, 10:11 AM

I'm not sure if this is the right way to go about things. Who is going to maintain these files? When someone adds a method to the public API, now they have to add it in two different places. I support being able to have bindings for multiple languages, but I feel like it needs to be automated.

You say your ultimate goal is to interact with LLDB from C#. Have you considered using swig to generate C# bindings directly?

Automating it seems like an OK thing to do. This was a start, to see whether
the community feels it would benefit immediately from having a C interface.

I'm using a different tool, ClangSharp (http://clangsharp.org), to generate
this, followed by a little manual fix-up.

I can fully automate it given some time, so if that's the recommended way
I'll get it in shape for that.

If you can look over a few files to see whether the output conforms to naming
conventions (like LLDBCreateSBDebugger, LLDBDisposeSBDebugger, etc.) that'd
be great, or we can defer it until the automated setup is working.

domipheus edited edge metadata. Apr 15 2015, 12:09 PM

With Zachary on this - I think it would be better to ensure the configuration/input files to the tool that produces these bindings are committed rather than the output itself.

I'm actually still not convinced that even that is the best solution. Hopefully Mukul can expand on why using SWIG directly to generate C# bindings doesn't work. We already have all these swig interface files built up, and the CMake is mostly set up to just work. Why not just use it and generate C# bindings directly?

Ohh, yes - of course. That can certainly be done.

The SWIG approach requires LLDB changes every time a new language binding needs to be added. I see that SWIG generates LLDBWrapPython.cpp; now we'll have one for C#, and as you add more languages it continues to grow. Furthermore, if someone wants Python, C#, and LanguageX bindings in their liblldb.dll/so, the exported function list is now effectively larger (doubled/tripled?) to accommodate that need.

I therefore see value in exposing a C interface that can facilitate bindings for other languages through each language's own process. I would imagine this would also help remove Python as a core requirement for LLDB. I'm not sure whether that's the only thing blocking it, or even whether the community wants it, but to me the separation of concerns seems obvious. Then again, I'm a Windows user, for whom LLDB build and setup is unnecessarily complicated because of the Python dependency.

I'm not particularly attached to my conversion tool, so if we instead want to use SWIG to generate C bindings I'm still on board.

If I've sufficiently conveyed my point, perhaps the discussion can be changed to: Should we generate C bindings as part of the LLDB build instead of a Python-specific one? Or in addition to a Python one (because SWIG may lose context to generate good Python-style bindings)?

Thoughts?

brucem added a subscriber: brucem. Apr 15 2015, 8:39 PM

I have been working towards 2 different bindings for LLDB as well.

One of the bindings is for JavaScript and for that one, I had been using SWIG (3.x), although in the end, I'm going to need to do some serious patching of SWIG to add a backend to use the NAN library instead of always using the V8 API directly (since the V8 API changes frequently).

In that case, I was building the binding shared library and linking it to the underlying LLDB shared library, so there wasn't an entirely duplicated binary.

For the other language, we don't currently have SWIG support for it and while we do have a very powerful C-FFI, we don't have a C++ FFI. I haven't decided yet how I'm going to approach this, but having a good C11 API would be very useful. (We have a tool that lets us parse C headers and auto-generate our C-FFI bindings.)

I'm not sure where I stand on having a C API to LLDB as part of the upstream due to the maintenance and long term support issues that others are concerned about in the other comments. I'm just wanting to express that I've had some pain in this area as well and haven't yet decided how I want to deal with it. (This might be what kicks me into starting to develop a C++ FFI for our language?)

Greg should probably weigh in. I can see your point, but at the same time I'm not sure how big of an issue it would be in practice for people who want multiple bindings to have a large exported function list. If it does become a concern, the bindings could always be compiled into their own DLL, like lldbpython.dll, lldbcsharp.dll, etc.

Python is actually not a core requirement for LLDB even right now. You can compile with LLDB_DISABLE_PYTHON=1 and python won't be linked in. There's not a strong enough layer of separation right now though (there's Python stuff littered around in core LLDB code that needs to be abstracted out), and I've done some work recently to make this better. But I need to go back and finish this off when I get some spare cycles.

I kind of feel like what most people will want is to have "native" bindings generated. By going from C++ to C and then using your language's FFI to interface with C code, what you end up with is something that isn't as pleasant to work with as a native interface. So there's cost to increasing the number of configurations we support, even if everything is auto-generated, because we have to ask if someone is using native bindings, or C bindings through FFI, etc. It seems easier for everyone if everyone is using the same thing.

With that in mind, please consider whether using SWIG to generate C# bindings directly would satisfy your needs. If it solves your actual problem, then I would rather just do that, because the exported function list problem is kind of hypothetical and I think there are still ways to address it even if it does become a problem for people (like the DLL suggestion mentioned previously).

clayborg edited edge metadata. Apr 16 2015, 1:15 PM

I would love to see everyone stick to using SWIG.

No symbols need to be exported from the LLDB shared library except stuff that starts with "lldb::" or "init_lldb*". We use this as our exports list:

% cat resources/lldb-framework-exports
ZN4lldb*
ZNK4lldb*
_init_lld*

So adding extra languages will add a bit of bulk to the size of the LLDB shared library, but it shouldn't affect the public interface. So I vote to use SWIG.

Also, you can use the C++ mangled names like you would use a C interface. We don't use any inheritance and we use no virtual functions so every C++ function is available by looking up the mangled name and then you can learn how to call it correctly.
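As a rough illustration of why the two `ZN4lldb*` / `ZNK4lldb*` patterns cover the whole `lldb::` namespace, here is a sketch of Itanium-style mangling for the simplest case only: a nested name with an empty parameter list. `mangle_nested` and `is_exported` are hypothetical helpers, not LLDB code. The patterns are copied as they appear in the thread; ignoring leading underscores on both sides is an assumption made here because platforms differ in how many they prepend.

```python
from fnmatch import fnmatchcase

# Patterns as they appear in the exports list quoted in the thread.
EXPORT_PATTERNS = ["ZN4lldb*", "ZNK4lldb*", "_init_lld*"]


def mangle_nested(parts, const=False):
    """Mangle a nested name that takes no parameters (Itanium C++ ABI).

    Each component is encoded as <length><name> inside N...E; 'K' marks a
    const member function and the trailing 'v' is the empty parameter list.
    Anything beyond this simple case (arguments, templates) is not handled.
    """
    qualifier = "K" if const else ""
    body = "".join(f"{len(part)}{part}" for part in parts)
    return f"_ZN{qualifier}{body}Ev"


def is_exported(symbol):
    # Assumption: leading underscores are ignored on both sides.
    bare = symbol.lstrip("_")
    return any(fnmatchcase(bare, pat.lstrip("_")) for pat in EXPORT_PATTERNS)


create = mangle_nested(["lldb", "SBDebugger", "Create"])
is_valid = mangle_nested(["lldb", "SBDebugger", "IsValid"], const=True)
print(create, is_exported(create))
print(is_valid, is_exported(is_valid))
print(is_exported("_PyInit_example"))  # hypothetical non-LLDB symbol
```

A caller that wanted to use such a symbol directly would still need to know the calling convention and object layout, which is the "learn how to call it correctly" part of the comment above.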

Ok. So is it reasonable to say that upstream will accept changes to the CMake build to add a switch to have LLDB_ENABLE_CSHARP or something like that? And possibly any changes to the SWIG files (I'm not sure if it is needed, just thinking aloud)?

I'm fine with this approach since C# is what I care about.

Maybe something like LLDB_SWIG_LANGUAGES="csharp,python" would be even better; that way people could easily choose the subset of languages they want. In any case, to answer your question, at a high level I think that yes, as long as we're using SWIG there's no reason to limit the set of languages that it can generate bindings for. So feel free to add support to CMake for generating other languages from SWIG. You'll probably need to change the lldb/scripts folder (in particular buildSwigWrapperClasses.py and finishSwigWrapperClasses.py) as necessary to produce C# bindings.

Also, we need to make sure that the SWIG interface files don't change in a way that impacts the generated Python bindings. Basically, existing Python code written against the current set of bindings can't break, so be careful if you need to change a .i file, although I don't think this should be necessary in theory.
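A build script could interpret a value like the proposed LLDB_SWIG_LANGUAGES roughly as follows. This is a sketch under stated assumptions, not the contents of buildSwigWrapperClasses.py: the option does not exist yet, `parse_swig_languages` and `swig_command` are hypothetical helpers, and the file names are placeholders. The only parts taken from SWIG itself are the `-c++`, `-python`, `-csharp`, and `-o` command-line flags.

```python
# Hypothetical mapping from option values to SWIG target-language flags.
SWIG_FLAGS = {"python": "-python", "csharp": "-csharp"}


def parse_swig_languages(value):
    """Split a value such as "csharp,python" into a validated list."""
    languages = []
    for item in value.split(","):
        name = item.strip().lower()
        if not name:
            continue  # tolerate empty entries such as a trailing comma
        if name not in SWIG_FLAGS:
            raise ValueError(f"unsupported SWIG language: {name!r}")
        if name not in languages:
            languages.append(name)
    return languages


def swig_command(language, interface_file, output_file):
    """Build an argv list for one SWIG run; nothing is executed here."""
    return ["swig", "-c++", SWIG_FLAGS[language], "-o", output_file, interface_file]


# Placeholder file names, chosen only for illustration.
for lang in parse_swig_languages("csharp,python"):
    print(swig_command(lang, "lldb.swig", f"LLDBWrap_{lang}.cpp"))
```

Running one SWIG invocation per language from the same interface file is what keeps the Python output unaffected when another language is added, provided the .i files themselves stay compatible.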

Changes to the .i files will be needed, but not ones that would change things for Python.

  • Should move them into a shared location outside of a Python specific directory.
  • SWIG doesn't support the various language extensions when building for a different language, so you'll have to #ifdef around things some (or see if you can separate it into different files but keep the built output the same).
  • Each language will need a bunch of type mappings established.
  • Not everything that needs a type mapping in Python actually has one yet. (There is at least one thing that can't be easily invoked from Python in current HEAD due to missing typemaps.)

Also any changes made to the .i files need to be backward compatible with SWIG Version 1.3.40.

This is the one and only swig that we can use at Apple.

Greg

I've opened http://reviews.llvm.org/D9212 which is a start on the points that I listed for supporting multiple languages via the SWIG stuff.

zturner resigned from this revision. Oct 15 2015, 1:46 PM
zturner removed a reviewer: zturner.