This is an archive of the discontinued LLVM Phabricator instance.

allow arbitrary python streams to be converted to SBFile
ClosedPublic

Authored by lawrence_danna on Sep 29 2019, 12:23 AM.

Details

Summary

This patch adds SWIG typemaps that can convert arbitrary python
file objects into lldb_private::File.

A SBFile may be initialized from a python file using the
constructor. There are also alternate, tagged constructors
that allow python files to be borrowed, and for the caller
to control whether or not the python I/O methods will be
called even when a file descriptor is available.I

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2019, 12:23 AM

I haven't looked at the patch in detail, but I have one high-level question. It seems to be that instead of subclassing a fully-functional File class, it would be better to create an abstract class, that both File and the new python thingy could implement. That would also create a natural place to write down the FILE* conversion semantics and other stuff we talked about at lldb-dev. I don't have any opinion as to whether the name File should denote the new abstract class, or the real thing, but it should most likely be done as a separate patch.

What do you think about that?

lldb/include/lldb/API/SBFile.h
16–21

I don't think we have anything similar to this in any of our SB classes. It might be better to avoid it. Could the same thing be achieved e.g. with a static factory function (static SBFile SBFile::Create(FileSP file_sp, bool borrow, bool force_scripting_io)) ?

lawrence_danna marked 2 inline comments as done.Sep 30 2019, 5:01 PM
lawrence_danna added inline comments.
lldb/include/lldb/API/SBFile.h
16–21

I don't think it can be done that way, because we want those bools to control how the python object is translated into a C++ object, so we need a different swig wrapper for each possibility, which means we need a different c++ function to wrap for each possibility.

One way around this would be to have SWIG translate the python object into an intermediate object that just takes a reference, and then perform further translation inside the SB layer. But we can't do that either, because scripting support is not part of the base LLDB library, it's a plugin. And in fact Xcode ships two versions of the plugin, one for python2 and one for python3. The SB needs to be agnostic about different versions of python, so it can't do the translation itself.

lawrence_danna marked an inline comment as done.Sep 30 2019, 7:27 PM

It seems to be that instead of subclassing a fully-functional File class, it would be better to create an abstract class, that both File and the new python thingy could implement. That would also create a natural place to write down the FILE* conversion semantics and other stuff we talked about at lldb-dev. I don't have any opinion as to whether the name File should denote the new abstract class, or the real thing, but it should most likely be done as a separate patch.

What do you think about that?

I could do it that way, but I have some reservations about it. In this version the class hierarchy looks like this:

          File
           |
 SimplePythonFile --------+
  |                       |
TextPythonFile     BinaryPythonFile

If we added an abstract base class, I guess it would look like this:

          AbstractFile
           |        |                
          File      PythonFile
           |         |     | |
         SimplePythonFile  | |
                           | |
  + -----------------------+ |  
  |                          |
TextPythonFile     BinaryPythonFile

Does a more complicated class hierarchy like that really solve a problem or make the code easier to understand?

labath added a comment.Oct 1 2019, 4:36 AM

It seems to be that instead of subclassing a fully-functional File class, it would be better to create an abstract class, that both File and the new python thingy could implement. That would also create a natural place to write down the FILE* conversion semantics and other stuff we talked about at lldb-dev. I don't have any opinion as to whether the name File should denote the new abstract class, or the real thing, but it should most likely be done as a separate patch.

What do you think about that?

I could do it that way, but I have some reservations about it. In this version the class hierarchy looks like this:

          File
           |
 SimplePythonFile --------+
  |                       |
TextPythonFile     BinaryPythonFile

If we added an abstract base class, I guess it would look like this:

          AbstractFile
           |        |                
          File      PythonFile
           |         |     | |
         SimplePythonFile  | |
                           | |
  + -----------------------+ |  
  |                          |
TextPythonFile     BinaryPythonFile

Does a more complicated class hierarchy like that really solve a problem or make the code easier to understand?

No that wouldn't make things more understandable, but I think it has helped me understand what are you doing here. The usual OO answer to attempts to introduce complex class hierarchies is to try to use object composition instead. Trying to apply that here gives me the following set of classes.

  • File (or AbstractFile ?) - defines just the abstract interface, documents what the interface is
  • NativeFile (or File ?) - implements the File interface via the usual FILE* and file descriptor routines (one day these may be split into two classes too)
  • TextPythonFile - implements the File interface via Python TextIOBase interface. Does not handle ownership/closing.
  • BinaryPythonFile - implements the File interface via Python RawIOBase interface. Does not handle ownership/closing.
  • OwnedPythonFile - implements the File interface by delegating to another File object. Has special handling for the close method.

This way the decision whether to "own" something is decoupled from the decision what api do you use to actually write to the file, and I _think_ it would make things clearer. What do you make of that? The thing I don't like about the current solution is that it leaves you with unused FILE* members in the Text/BinaryPythonFile classes, and forces the introduction of weird-looking OverridesIO methods...

(BTW, as this is C++, it allows you to "cheat" and instead of creating another File object the delegation can be incorporated into the type system by making OwnedPythonFile a template parameterized on the base class. I am fine with both solutions though I think I'd prefer the regular class variant.)

lldb/include/lldb/API/SBFile.h
16–21

Are you sure about that? Swig allows you to define typemaps for sequences of values. This example is taken from the swig documentation:

%typemap(in) (char *str, int len) {
    $1 = PyString_AsString($input);   /* char *str */
    $2 = PyString_Size($input);       /* int len   */
}

It seems to me that your typemap could switch on the value of the flag at runtime rather than having compile time switching by swig. This way you wouldn't even need a static function, you could just have an additional constructor with two extra bool arguments.

However, if this is not the preferred syntax for some reason, then we could also have a set of static functions, one for each flag combination, do the conversion that way.

(BTW, the main reason I am trying to avoid the tag structs is because they are wading into uncharted territory in terms of our SB coding guidelines https://lldb.llvm.org/resources/sbapi.html)

lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.h
500 ↗(On Diff #222313)

no else after return.

updated based on an abstract base clase

This way the decision whether to "own" something is decoupled from the decision what api do you use to actually write to the file, and I _think_ it would make things clearer. What do you make of that?

Yea I think that works. Updated. I decided to go with the template thingy because there's kind of a lot of virtual methods to delegate and it's nice to have a list of them in the header.

rename LLDBPythonFile -> OwnedPythonFile

lawrence_danna marked an inline comment as done.Oct 1 2019, 9:11 PM

use a python helper instead of funny tag structs

lawrence_danna marked 2 inline comments as done.Oct 1 2019, 10:10 PM
lawrence_danna added inline comments.
lldb/include/lldb/API/SBFile.h
16–21

I really do think it can't be done with swig. You can group multiple arguments together on the C++ side, but not on the python side.

I think I did find a reasonable way to deal with it though without uglying up the SB headers with those tags. I added some statics with %extend, and then wrapped them with a python-friendly classmethod called Create, with both flags as keyword args.

How's that look?

lawrence_danna marked an inline comment as done.Oct 1 2019, 10:13 PM
labath added a comment.Oct 2 2019, 4:27 AM

Hmm... I like your solution of the typemap problem, but this CRTP class seems to be way more complicated than needed. There shouldn't be any need for CRTP as we already have regular dynamic dispatch via virtual methods. Also, I don't think the forwarding should be needed if you're going the template route. How about something like this instead?

template<typename Base>
class OwningPythonFile : public Base {
  Status Close() { /* close magic here*/; }
  // no need to forward anything -- inheritance handles that
};

class TextPythonFile: public OwningPythonFile<File> {
  // override methods as usual
};
// same for BinaryPythonFile

Then depending on what you need, you create either a NativeFile, OwningPythonFile<NativeFile>, or a Text/BinaryPythonFile. How does that sound?

Hmm... I like your solution of the typemap problem, but this CRTP class seems to be way more complicated than needed. There shouldn't be any need for CRTP as we already have regular dynamic dispatch via virtual methods. Also, I don't think the forwarding should be needed if you're going the template route. How about something like this instead?

template<typename Base>
class OwningPythonFile : public Base {
  Status Close() { /* close magic here*/; }
  // no need to forward anything -- inheritance handles that
};

class TextPythonFile: public OwningPythonFile<File> {
  // override methods as usual
};
// same for BinaryPythonFile

Then depending on what you need, you create either a NativeFile, OwningPythonFile<NativeFile>, or a Text/BinaryPythonFile. How does that sound?

Next step is add type maps to convert these things back to python. Without the CRTP it can’t just check for a single OwnedPythonfile base class — it’ll have to check for all three. Is that worse than using CRTP?

labath added a comment.Oct 2 2019, 7:24 AM

Then depending on what you need, you create either a NativeFile, OwningPythonFile<NativeFile>, or a Text/BinaryPythonFile. How does that sound?

Next step is add type maps to convert these things back to python. Without the CRTP it can’t just check for a single OwnedPythonfile base class — it’ll have to check for all three. Is that worse than using CRTP?

Hmm... hard to say without knowing more details, but I would say that being worse than CRTP is quite hard. :) Also, Text and Binary python file will have OwningPythonFile<File> as their base, so you can check for those together. I'd say we go with this first, and then revisit if needed.

get rid of CRTP gobbledygook

labath added a comment.Oct 3 2019, 3:46 AM

Ok, I think that the major issues are out of the way now. I've done a more careful pass over the file, and I think this is looking pretty good. I am particularly happy with how you've implemented (and tested) the interaction with python exceptions. A bunch of additional comments inline, but none really that important.

lldb/packages/Python/lldbsuite/test/python_api/file_handle/TestFileHandle.py
342–343

Actually, I'm pretty sure that this category annotation is unnecessary, as the python_api contains a .categories file specifying this category. Also, the @no_debug_info_test thingy might be better achieved by setting NO_DEBUG_INFO_TESTCASE = True class property.

472–473

self.assertTrue(status.Succes()). If you're doing this because you want a better error message or something, feel free to implement something like assertSuccess or the like. I'd be happy to use that, as I want better error messages too.

lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1099

All of these file-local classes (unless you choose to move them to a separate file) should go into an anonymous namespace. See http://llvm.org/docs/CodingStandards.html#anonymous-namespaces on the recommended way to do that.

1129–1131

We're using LLDB_LOG or LLDB_LOGF these days. The first one is prefereed if you have complex values that can't be displayed by printf natively (e.g. StringRef).

1236

s/Simply/Simple

1240–1242

This reordering of arguments is confusing. Can we make the two orders match?

1298–1305

How about if OwnedPythonFile defines IsValid() as IsPythonObjectValid() || Base::IsValid(). Then PythonIOFile could redefine it to be simply IsPythonObjectValid() without the need for the extra class?

1311

no semicolon

1315–1324

Hmm... how exactly does this differ from the implementation in OwnedPythonFile? Is it only that it does not chain into the base class? Could we use the same trick as for IsValid to reduce the duplication?

1338

What is this needed for? I don't see the PythonFile class doing anything funny (and it probably shouldn't).

1457

I was struggling to find this method, as I was looking it next to the other PythonFile methods. I get why it's here, but I think it's confusing to have PythonFile be split this way. Could you move the other classes so that they are declared before all of PythonFile methods. Or, given that this file is starting to become largish, maybe move some of this stuff into a separate file?

We need to obsessively check PyErr_Occurred() before calling
into the next python API, even when we'd just catch it a
little later. Python APIs will generally work if you don't,
but it's undefined behavior to do it and debug builds of python
will assert.

no_debug_info_test -> NO_DEBUG_INFO_TESTCASE

lawrence_danna marked an inline comment as done.

anonymous namespace, and a windows fix

lawrence_danna marked an inline comment as done.Oct 3 2019, 11:02 AM

assert instead of raise

lawrence_danna marked an inline comment as done.Oct 3 2019, 11:09 AM

use LLDB_LOGF and add a logging test

lawrence_danna marked an inline comment as done.Oct 3 2019, 12:03 PM
lawrence_danna marked 2 inline comments as done.

cleanup

lawrence_danna marked 6 inline comments as done.Oct 3 2019, 12:12 PM
lawrence_danna added inline comments.
lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1298–1305

If I did that then a SimplePythonFile would still be valid if the file was closed on the python side... seems like the wrong behavior.

1311

huh?

1338

python API functions expect you to check for an exception with PyErr_Occured() and clear it before calling into them again. If you don't things usually work out OK, except on a debug build in which an assert will trip.

lawrence_danna marked 3 inline comments as done.

group all the PythonFile support together in the source file

lawrence_danna marked an inline comment as done.Oct 3 2019, 12:17 PM
labath added inline comments.Oct 3 2019, 12:23 PM
lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1298–1305

Sorry, I meant &&: IsPythonObjectValid() && Base::IsValid(). Basically, I'm trying to see if there's a reasonable way to reduce the number of classes floating around, and this PresumptivelyValidFile seems like it could be avoided...

1311

There's no need to put a semicolon after the body of the constructor.

1338

Are you sure you're looking at the right place (the comments tend to move around as you reorganize code)? I was asking why is PythonFile a friend of the BinaryPythonFile class. That doesn't seem relevant to the PyErr_Occured checks...

lawrence_danna marked 2 inline comments as done.Oct 3 2019, 12:23 PM
lawrence_danna added inline comments.
lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1315–1324

they actually have different behavior. The OwnedPythonFile version chains into the base class, but the PythonIOFile version also flushes the stream on the python side if the file is borrowed. The idea is that closing any file should flush the buffer you've been writing too even if the file is borrowed. That way you don't get any weird surprises where you "close" a file and nothing gets written.

lawrence_danna marked 5 inline comments as done.

remove unnecessary friends

lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1298–1305

If i did it that way, with && and no PresumptivelyValidFile then the python IO files would chain into the base File class and say they're invalid.

That's how I coded it up at first, I didn't notice I needed PresumptivelyValidFile until I saw the python IO files coming up as invalid.

1338

oh, yea it got moved around. fixed.

lawrence_danna marked an inline comment as done.

semicolons

labath added inline comments.Oct 3 2019, 1:07 PM
lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1298–1305

Yeah, but that's where the second part of this idea comes in. The python IO files can re-override IsValid() so that it does *not* chain into the base class, and just calls IsPythonObjectValid() from OwnedPythonFile. That arrangement seems to make sense to me, though I am willing to be convinced otherwise.

rm class PresumptivelyValidFile

lawrence_danna marked 2 inline comments as done.Oct 3 2019, 1:14 PM
lawrence_danna added inline comments.
lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1298–1305

oh, yea that works.

lawrence_danna marked an inline comment as done.Oct 3 2019, 1:16 PM

Thanks. I was just about to hit approve, but then I noticed one other thing... :/ It seems that somebody (I guess it was @zturner) spent a lot of time in creating the whole PythonObject hierarchy, and it's (worthwhile) goal seems to be to enable the rest of the to code to avoid dealing with the python APIs directly. However, here you're ignoring all of that, and jumping to the C python interface directly.

I'd like to know what's your take on that? Have you considered basing these file classes on the PythonDataObject stuff?

The main reason I mention that is your comment about PyErr_Occurred, and the resulting boiler plate it produced. This seems like something that would be nicely solved by some c++-like wrapper, which is exactly what the PythonObject stuff is AFAICT.

Most of your interactions seem to be about calling methods. Would it be possible to add a PythonDataObject wrapper for this (and any other frequently used python API)? I'm hoping that we could have something like:

if (Expected<PythonInteger> i = m_py_obj.CallMethod<PythonInteger>(pybuffer))
  l_bytes_written = i->GetInteger();
else
  return Status(i.takeError());

instead of stuff like

auto bytes_written = Take<PythonObject>(
    PyObject_CallMethod(m_py_obj, "write", "(O)", pybuffer.get()));
if (PyErr_Occurred())
  return Status(llvm::make_error<PythonException>("Write"));
long l_bytes_written = PyLong_AsLong(bytes_written.get());
if (PyErr_Occurred())
  return Status(llvm::make_error<PythonException>("Write"));

It doesn't look like adding this should be too hard (including the template-based computation of the python signature), but it could go a long way towards improving the readability, and it might actually fix other issues too -- it seems that the PythonDataObjects currently completely ignore the PyErr_Occurred stuff, which seems to be wrong, and it could actually be the reason why @amccarth was unable to run the test suite with a debug python on windows. Improving that, even if it's just for the stuff that's needed for your work would be really helpful, as it would provide a pattern on how to fix the remaining issues.

I'm sorry that I'm bringing this up so late in the review, but I am also learning this stuff as we go along -- unfortunately, I don't think we have anyone really knowledgeable about the python stuff still active in the lldb community..

lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1222

Maybe this GIL-talking can be done inside IsPythonSideValid (instead of two IsValid methods)?

1254

extra semicolon here too

1426

This is an example where _I think_ you may be using PyErr_Occurred incorrectly. If python really asserts that PyErr_Occurred is always called, then this is not right, as you will not call it if the result is null. There are a couple of other examples like that in the code too.

Most of your interactions seem to be about calling methods. Would it be possible to add a PythonDataObject wrapper for this (and any other frequently used python API)? I'm hoping that we could have something like...

I think this is a good idea, but there's a lot of places in PythonDataObjects that need to be fixed to check for exceptions and return Expected<>. I really think it ought to be done in a separate patch.

lawrence_danna marked 4 inline comments as done.

minor fixes

lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1426

I think it is correct.

Python doesn't assert that PyErr_Occured is called, it asserts that the error is cleared before you call back in to another thing that can cause an error. PythonException will clear the error if there is one, or say "unknown error" if there isn't one (which should never happen if utf8==NULL, but i'm checking out of abundance of caution)

I agree about the separate patch stuff, but it seems to be that this should be done before this one. After all, all (most?) of the existing code has already been DataObject-ized and this patch is the thing that's deviating from that practice. I don't think you should rewrite all of the existing code -- hopefully we can find a way to fit this in there. For instance, the CallMethod thingy would be a new api, so we can have that return expected independently of the other APIs. I guess the main part you will need is (and that already existing in the non-Expected form) is the type conversions, but maybe there we can introduce a new kind of conversion, which is more strict in checking for errors. Or possibly have the current form return Expected<T>, but then have a helper function to suppress those errors and return a default-constructed T() for legacy uses (so legacy code would be sth like suppressError(object.AsType<PythonInteger>()))

lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1426

Ok, thanks for the explanation.

lawrence_danna marked an inline comment as done.Oct 4 2019, 3:09 PM

better integration with PythonObject

I agree about the separate patch stuff, but it seems to be that this should be done before this one. After all, all (most?) of the existing code has already been DataObject-ized and this patch is the thing that's deviating from that practice. I don't think you should rewrite all of the existing code -- hopefully we can find a way to fit this in there.

Yea... I tried updating all the PythonDataObjects code to use Errors and Expected<> and it wound up being just way too big of a patch.

I've updated this patch to add a giant warning at the top of PythonDataObjects.h about the dangers, as well adding just enough Expected<> support to do what I need for this patch.

How's it look now?

I changed my mind about splitting this into another patch. When I first said that I had in mind a much more extensive rewrite of the PythonDataObjects. I think it's fine now as one patch -- though we should still go back later add Expected<> everywhere in PythonDataObjects.

python2 const fix

fix typemaps to deal with None correctly

add safe PythonModule functions and use them

split off generic exception handling stuff into a separate patch

@labath anything else for this one?

labath added a comment.Oct 9 2019, 6:52 AM

I think this is looking pretty good. I like how the read/write methods look now. Just one more question about the PythonBuffer class.

lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1259–1266

Storing the Error as a member is quite unusual. It would be better to have a factory function which returns Expected<PythonBuffer>.

lawrence_danna marked an inline comment as done.Oct 9 2019, 9:33 AM
lawrence_danna added inline comments.
lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1259–1266

Oh yea I’ll fix that

lawrence_danna marked an inline comment as done.

fixed pybuffer error handling weirdness

labath accepted this revision.Oct 9 2019, 11:41 AM
labath added inline comments.
lldb/source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
1285

I would slightly prefer if the PyObject_GetBuffer happened in the Create function (and the constructor took an already-constructed Py_buffer). That way, the call is next to the exception check and you avoid creating an "invalid" PythonBuffer object, even as an implementation detail.

This revision is now accepted and ready to land.Oct 9 2019, 11:41 AM

Pavel would slightly prefer if the PyObject_GetBuffer happened in the Create function

lawrence_danna marked an inline comment as done.Oct 9 2019, 12:14 PM
This revision was automatically updated to reflect the committed changes.
jankratochvil added a subscriber: jankratochvil.EditedOct 10 2019, 5:06 AM

Fixed now by rL374322, there was a regression - Linux Fedora 30 x86_64:

======================================================================
ERROR: test_exceptions (TestFileHandle.FileHandleTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jkratoch/redhat/llvm-monorepo2/lldb/packages/Python/lldbsuite/test/python_api/file_handle/TestFileHandle.py", line 483, in test_exceptions
    self.assertRaises(TypeError, lldb.SBFile, None)
  File "/home/jkratoch/redhat/llvm-monorepo2/lldb/third_party/Python/module/unittest2/unittest2/case.py", line 534, in assertRaises
    callableObj(*args, **kwargs)
  File "/home/jkratoch/redhat/llvm-monorepo2-clangassert/lib/python2.7/site-packages/lldb/__init__.py", line 5334, in __init__
    this = _lldb.new_SBFile(*args)
NotImplementedError: Wrong number or type of arguments for overloaded function 'new_SBFile'.
  Possible C/C++ prototypes are:
    lldb::SBFile::SBFile()
    lldb::SBFile::SBFile(int,char const *,bool)
    lldb::SBFile::SBFile(lldb::FileSP)

Yeah, as Jan mentions, I've had to make some tweaks (r374331, r374322, r374299) in order to get this passing on all python and swig versions. Let me know if you have questions/issues/etc about any of those.