This is an archive of the discontinued LLVM Phabricator instance.

new api class: SBFile
ClosedPublic

Authored by lawrence_danna on Sep 19 2019, 9:27 PM.

Details

Summary

SBFile is a scripting API wrapper for lldb_private::File

This is the first step in a project to enable arbitrary python
io.IOBase file objects -- including those that override the read()
and write() methods -- to be used as the main debugger IOStreams.

Currently this is impossible because python file objects must first
be converted into FILE* streams by SWIG in order to be passed into
the debugger.

full prototype: https://github.com/smoofra/llvm-project/tree/files

Event Timeline

lawrence_danna created this revision.Sep 19 2019, 9:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2019, 9:27 PM
Herald added a subscriber: mgorny. · View Herald Transcript
lawrence_danna edited the summary of this revision. (Show Details)Sep 19 2019, 9:30 PM
labath added a subscriber: labath.Sep 20 2019, 6:01 AM

This is an interesting undertaking. There's a lot of confusion about FILE* in lldb, and it would be great to clean some of that stuff up.

The api seems reasonably straight-forward. The one thing that struck me is that you're using unique_ptrs for the pimpled objects. That would mean the SBFile objects are non-copyable, which is generally not a good thing, as python really likes to copy things. Looking ahead at your patchset, it looks like I am onto something, as you're then trying to implement shared semantics inside lldb_private::File. That doesn't sound like the most fortunate solution to me, as most of the lldb_private::File users don't need that functionality. I am wondering if it wouldn't be better to use a shared_ptr in SBFile, and implement any extra sharing semantics you need at the SB level. Also, this needs a bunch of tests.

From a higher level perspective, it would be good to have a couple of paragraphs saying what exactly is the end goal here (what new apis will you introduce, how will they work, etc.). I can sort of guess some of that from looking at the patchset, but it would be nice to have that written down, so that we know we're on the same page. This looks like it is a sufficiently big of a change that it would deserve a quick RFC email to lldb-dev.

lldb/source/API/SBFile.cpp
21–27

I think it would be better if these were constructors instead of member functions. That way you might be able to get rid of the all the if (!m_opaque_up) { checks as the File member will always be initialized.

22

It looks like you need to reformat this file.

The api seems reasonably straight-forward. The one thing that struck me is that you're using unique_ptrs for the pimpled objects. That would mean the SBFile objects are non-copyable, which is generally not a good thing, as python really likes to copy things. Looking ahead at your patchset, it looks like I am onto something, as you're then trying to implement shared semantics inside lldb_private::File. That doesn't sound like the most fortunate solution to me, as most of the lldb_private::File users don't need that functionality. I am wondering if it wouldn't be better to use a shared_ptr in SBFile, and implement any extra sharing semantics you need at the SB level. Also, this needs a bunch of tests.

That was my initial thought as well, to use a shared_ptr here, and not have any sharing at the lldb_private::File level. However, I found I couldn't do it that way. lldb_private::File doesn't really have the semantics of a file object, it has the semantics of a variable, which can point to a file object. Users of lldb_private::File expect to be able to Close() a file and re-open it as something else. They expect to be able to embed File immediately in other objects. They expect to be able to make an empty File and check it with IsValid(). I thought it would be best to let them keep using File in that way, so later in the queue I make File copyable and add FileOps class to manage the sharing between copied files. The other way to do it would be to track down every use of lldb_private::File that isn't through a pointer and convert it to shared_ptr<File> instead. Then SBFile could just be another shared_ptr<File>. Should I do it that way?

As far as python copying things goes, that's not a problem. SWIG will always allocate a SBFile on the heap and copy the pointers to it, it doesn't need an actual C++ copy constructor.

added some tests, rebased

@labath So what do you think?
Update all the lldb_private::File clients to use shared_ptr<File> instead?
Or do it this way, with sharing handled inside of File?

@labath

I wrote a patch for the shared_ptr approach. It's simple, but it touches a lot of lines.

https://reviews.llvm.org/D67891

Now that I've gone and done it, I kind of like it better that way. If you approve the other patch, I'll update this one accordingly.

@labath

I wrote a patch for the shared_ptr approach. It's simple, but it touches a lot of lines.

https://reviews.llvm.org/D67891

Now that I've gone and done it, I kind of like it better that way. If you approve the other patch, I'll update this one accordingly.

Thanks for the quick turnaround. /I think/ the approach in the other patch looks better (I am super happy about the removal of the SetXXX methods), but I'd still like to understand whether we need to introduce that many shared_ptrs, so I'll need to think about this some more (I just got back from vacation and I am still working through my backlog).

changed unique_ptr to shared_ptr

lawrence_danna marked 2 inline comments as done.Sep 24 2019, 5:51 PM
lawrence_danna added inline comments.
lldb/source/API/SBFile.cpp
21–27

Unfortunately, SWIG does not allow return types that don't have default constructors.

https://github.com/swig/swig/issues/1062

labath added inline comments.Sep 25 2019, 5:35 AM
lldb/source/API/SBFile.cpp
21–27

Hmm.. that is unfortunate. I guess that means we still need the validity checks, but you should still be able to replace these with constructors, right? (you've guessed by now that I am not a fan of Set methods :) ).

lawrence_danna edited the summary of this revision. (Show Details)

rebased, and converted Set* methods into constructors.

lawrence_danna marked 2 inline comments as done.Sep 27 2019, 9:35 AM

@labath, OK sets are now constructors.

rebased, removed duplicate test

Thanks. This looks good to me, but given that this is going to become a stable api that well need to maintain for a looong time, I'd like to get signoff from someone else as well. @jingham maybe ?

lldb/include/lldb/API/SBDefines.h
95

Sort this list ?

lldb/packages/Python/lldbsuite/test/python_api/file_handle/TestFileHandle.py
28

Please add a comment to explain why is this needed (I guess that has something to do with needing more control over the command output stream), because superficially it looks like you're reimplementing TestBase.runCmd here.

38–39

Maybe instead of raise_on_fail this should be called check (in analogy to runCmd function), and instead of raising an exception you could assert that ret.Succeeded() is true?

53

I believe all of these will end up writing to the source tree. Could you try replacing that with self.getBuildArtifact("output") ?

128–133

delete

lldb/scripts/interface/SBFile.i
37

add operator bool here. I'm not sure if swig handles operator!, but if it does, then we should add that too.

lldb/source/API/SBFile.cpp
2

I think you'll need to add the reproducer boilerplate to this file. I think it should be possible to do that automatically via lldb-instr, but I've never done that. @JDevlieghere should be able to help here.

lldb/source/Host/common/File.cpp
72–74

I know you're just copying, but could you remove the .str() call the `.empty() check, as neither of them are really needed here.

This looks fine to me, it doesn't look like you are leaving out anything useful.

Note, I think just to enforce the "default constructor" discipline, there's a test in the test suite (python_api/default-constructor/TestDefaultConstructorForAPIObjects.py) that has a file per class that just makes a default constructed object and calls all of its methods to make sure they don't crash. According to that pattern your test_sbfile_invalid test should actually go there.

lawrence_danna marked 7 inline comments as done.

fixed according to reviewer comments

addressed comments

flush fix for python2

add reproducer instrumentation.

lawrence_danna marked an inline comment as done.Sep 30 2019, 5:53 PM
lawrence_danna marked an inline comment as done.Sep 30 2019, 5:56 PM
lawrence_danna added inline comments.
lldb/scripts/Python/python-typemaps.swig
464 ↗(On Diff #222528)

swig/LICENSE says the following, so it should be fine to copy and paste these.

The SWIG library and examples, under the Lib and Examples top level 
directories, are distributed under the following terms:

  You may copy, modify, distribute, and make derivative works based on
  this software, in source code or object code form, without
  restriction. If you distribute the software to others, you may do
  so according to the terms of your choice. This software is offered as
  is, without warranty of any kind.

no inlines in the API

@labath any more comments on this one?

labath accepted this revision.Oct 2 2019, 7:26 AM

Nah, this looks fine to me now.

This revision is now accepted and ready to land.Oct 2 2019, 7:26 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptOct 2 2019, 8:59 PM
labath added a comment.Oct 3 2019, 1:49 AM

BTW, I've had to add (or rather, extend) a fairly ugly hack in r373573 in order to get the new tests running on non-darwin platforms. The reason for that is that the FILE* out typemap needs to get the "mode" of the object in order to construct the python wrapper.

My hope is that we can be improved once you're finished your work here. For instance by reimplementing GetOutputFileHandle on top of GetOutputFile in swig, as the latter will (hopefully) be able to get the mode of a file correctly. Does that sound reasonable, or am I totally off-track here?

BTW, I've had to add (or rather, extend) a fairly ugly hack in r373573 in order to get the new tests running on non-darwin platforms. The reason for that is that the FILE* out typemap needs to get the "mode" of the object in order to construct the python wrapper.

My hope is that we can be improved once you're finished your work here. For instance by reimplementing GetOutputFileHandle on top of GetOutputFile in swig, as the latter will (hopefully) be able to get the mode of a file correctly. Does that sound reasonable, or am I totally off-track here?

oops, sorry I missed this. Are you happy with how it all turned out?

BTW, I've had to add (or rather, extend) a fairly ugly hack in r373573 in order to get the new tests running on non-darwin platforms. The reason for that is that the FILE* out typemap needs to get the "mode" of the object in order to construct the python wrapper.

My hope is that we can be improved once you're finished your work here. For instance by reimplementing GetOutputFileHandle on top of GetOutputFile in swig, as the latter will (hopefully) be able to get the mode of a file correctly. Does that sound reasonable, or am I totally off-track here?

oops, sorry I missed this. Are you happy with how it all turned out?

Yes, very much. Thank you for cleaning that up.

While looking at the other bug, I noticed something suspicious here.

lldb/trunk/source/Host/common/File.cpp
74 ↗(On Diff #222960)

Shouldn't this also include File::eOpenOptionCanCreate | File::eOpenOptionTruncate?

labath added inline comments.Oct 24 2019, 9:41 AM
lldb/trunk/source/Host/common/File.cpp
74 ↗(On Diff #222960)

Yeah, I guess it should, though it probably does not matter in practice right now as we're only using this on already opened FILE* objects.

vadimcn added inline comments.
lldb/trunk/scripts/Python/python-typemaps.swig
481 ↗(On Diff #222960)

Sorry for being late to the party, but I just stumbled upon this code...

It seems to return a pointer from a view that had just been released. Isn't this kind of risky? While most of the time buffer views point into object's internal memory, buffer protocol does not prohibit allocating memory just to fulfill the buffer request. In which case PyBuffer_Release would be expected to release that memory, leaving the caller with a dangling pointer.

labath added inline comments.Jan 13 2020, 2:54 AM
lldb/trunk/scripts/Python/python-typemaps.swig
481 ↗(On Diff #222960)

Yes, that does seem worrying, though the swig typemap also does the same thing (but it could have the same bug too). @lawrence_danna, do you know anything about this? Should the buffer be released after the function returns (in the "out" typemap, I guess)?

lawrence_danna marked an inline comment as done.Jan 13 2020, 11:07 AM
lawrence_danna added inline comments.
lldb/trunk/scripts/Python/python-typemaps.swig
481 ↗(On Diff #222960)

@vadimcn is correct. If GetBuffer allocates to fulfill the request, rather than returning a pre-existing buffer, then this will access a dangling pointer.

@labath "in" and "out" type-maps aren't paired up like that.

I believe there is a way you can instruct SWIG to create a C++ helper variable for a typemap. The solution in this case is to use such a temporary to carry the PyBuffer reference in RAII-style.