This is an archive of the discontinued LLVM Phabricator instance.

remove FILE* usage from ReportEventState() and HandleProcessEvent()
ClosedPublic

Authored by lawrence_danna on Oct 5 2019, 4:02 PM.

Details

Summary

This patch adds FileSP and SBFile versions of the API methods
ReportEventState and HandleProcessEvent. It points the SWIG
wrappers at these instead of the ones that use FILE* streams.

Event Timeline

lawrence_danna created this revision.Oct 5 2019, 4:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2019, 4:02 PM

oops, remove unnecessary FileSP reference from interface file

Adding Jim for the API design aspects.

Two things come to mind here:

  • It's unfortunate that we have to add the FileSP overload (and the associated cruft) to every API that used to take a FILE*. I take it that is needed to get rid of the swig typemaps? Is there any way to avoid it?
  • shouldn't these APIs be taking an (SB)Stream instead? I am not sure if we have ever specified the difference between (SB)Stream and (SB)File, but if I had to spell that out, I'd say that this it is that a (SB)Stream supports write-only append-only output, while an (SB)File offers a richer set of APIs (combining reads and writes, random access, etc.). In this sense, it looks like both of these APIs should be perfectly fine (now and for the forseeable future) with a (SB)Stream. Should we make a new overload taking a SBStream instead? Jim, do you know why these functions are taking a FILE*. Could it be that they predate SBStream?

SBProcess::ReportEventState was introduced in r112221, and SBStream was added in r114188, so Pavel's speculation seems like a reasonable one, though that was 9 years ago...

But in the SB API's we use SBStream in a bunch of places to be more like an SBString, something you can pass to a client and they will add some data to it, and then you call GetData to get the result. That's a not very stream-like usage.

In the case of HandleProcessEvent, SBDebugger uses it for HandleCommand, when it wants to dump all the stop event data to the Debugger's STDIO. That would be less convenient with an SBStream, you'd have to make one, then Redirect it to the debugger's stdout & stderr. That seems kind of awkward.

Adding Jim for the API design aspects.

It's unfortunate that we have to add the FileSP overload (and the associated cruft) to every API that used to take a FILE*. I take it that is needed to get rid of the swig typemaps? Is there any way to avoid it?

I think at the swig level there really does need to be two, because they're doing two different things. The FileSP version takes a python file object and converts it, and then passes it in. This should have the same behavior as the old FILE* binding. The SBFile takes a file that has already been converted -- possibly with the borrow or force_io_methods flags set, and passes that in directly, without doing further conversion.

I could take the duplicates out of the .h and .cpp files, and use %extend to add them back in in the .i files. Is that cleaner?

Personally, I think %extend is a bit ugly as it increases the difference between what's in the headers and what's actually in the python bindings, so I'd rather not do it that way. But I'm not dogmatically opposed to it.

labath added a comment.Oct 8 2019, 4:50 AM

Thanks for weighing in, Jim. My responses are inline.

SBProcess::ReportEventState was introduced in r112221, and SBStream was added in r114188, so Pavel's speculation seems like a reasonable one, though that was 9 years ago...

But in the SB API's we use SBStream in a bunch of places to be more like an SBString, something you can pass to a client and they will add some data to it, and then you call GetData to get the result. That's a not very stream-like usage.

Why not? I think that's fairly similar to how one uses std::ostringstream.

std::ostringstream stream;
write_some_data_to(stream); // or stream << stuff;
std::string data = stream.str();

I think the main non-stream like feature of SBStream is that it behaves like std::ostringstream _by default_, but then it can be later converted to something like std::ofstream by calling RedirectToFile. However, I think that is something that can still be fixed, mostly..

In the case of HandleProcessEvent, SBDebugger uses it for HandleCommand, when it wants to dump all the stop event data to the Debugger's STDIO. That would be less convenient with an SBStream, you'd have to make one, then Redirect it to the debugger's stdout & stderr. That seems kind of awkward.

So, what would you say if we added a SBStream constructor taking an SBFile (which would do equivalent of SBStream::RedirectToFile). Then all HandleCommand would have to do is call HandleProcessEvent(process, event, SBStream(GetOutputFile()), SBStream(GetErrorFile())). I think that would be fairly familiar to anyone who has worked with stream, as creating a stream which uses some other object (file, fd, string, ...) as a backing store is a pretty common pattern.

lldb/source/API/SBProcess.cpp
359–364

For instance, this code already kind of mocks a "stream wrapping a file object" by using snprintf. If we had a real stream object around (which properly supported formatted IO) then this could be simplified to stream->Printf(...)

labath added a comment.Oct 8 2019, 5:08 AM

Personally, I think %extend is a bit ugly as it increases the difference between what's in the headers and what's actually in the python bindings, so I'd rather not do it that way. But I'm not dogmatically opposed to it.

Yeah, the %extends are somewhat ugly, but I also find the FileSP overloads in the "public" sections of the SB classes unnerving too, because they are not really public -- no user can actually call them directly in a meaningful way. They are really only meant for the python interface, which sort does sit on the other side of the SB api, but also kind of doesn't. That's the weird part about lldb python support, where python can be embedded *into* lldb, but it also can be sitting *above* it (or both at the same time). I am not really sure what to do about that. I don't suppose there's any way to make the FileSP overloads private?

Or maybe they actually should be public, and the actual problem is that there is no way for a c++ user to make use of this functionality, even though it should be able to? Right now, the c++ api is sort of disadvantaged in that one cannot create an SBFile that would redirect to some custom fancy c++ output object. I am not saying you should implement that, but if we assume that such functionality were available do you think it would be possible to implement the python stuff on top of that? In that sense, maybe the FileSP overloads are actually fine, and we can treat them as a placeholder for this hypothetical future functionality?

I am sorry if this doesn't make much sense -- that's because I myself am not clear on this matter. But I hope that I at least managed to give you an idea of the things going through my head right now. If you have any thoughts on any of this, I'd like to hear them.

Thanks for weighing in, Jim. My responses are inline.

SBProcess::ReportEventState was introduced in r112221, and SBStream was added in r114188, so Pavel's speculation seems like a reasonable one, though that was 9 years ago...

But in the SB API's we use SBStream in a bunch of places to be more like an SBString, something you can pass to a client and they will add some data to it, and then you call GetData to get the result. That's a not very stream-like usage.

Why not? I think that's fairly similar to how one uses std::ostringstream.

std::ostringstream stream;
write_some_data_to(stream); // or stream << stuff;
std::string data = stream.str();

I think the main non-stream like feature of SBStream is that it behaves like std::ostringstream _by default_, but then it can be later converted to something like std::ofstream by calling RedirectToFile. However, I think that is something that can still be fixed, mostly..

In most of the uses of SBStream (and all the ones' I've done for the various scripts I've written), I would be very surprised if GetData returned nothing because the data had gone elsewhere. I find myself always wanting that output. That's what I meant by saying it was more string-like that stream-like. But that behavior is really at the boundary between lldb & scripts, and can be controlled by the person providing the stream. It is a little weird however that this behavior is non-explicit in SBStream. W.R.T. below, I wonder if it wouldn't have been better to make the Stream output -> File behavior be something that has to be set at construction time and not modifiable. It would be really surprising if I made an SBStream to gather a Description, and have somebody under the covers decide to drain the output off to a file.

In the case of HandleProcessEvent, SBDebugger uses it for HandleCommand, when it wants to dump all the stop event data to the Debugger's STDIO. That would be less convenient with an SBStream, you'd have to make one, then Redirect it to the debugger's stdout & stderr. That seems kind of awkward.

So, what would you say if we added a SBStream constructor taking an SBFile (which would do equivalent of SBStream::RedirectToFile). Then all HandleCommand would have to do is call HandleProcessEvent(process, event, SBStream(GetOutputFile()), SBStream(GetErrorFile())). I think that would be fairly familiar to anyone who has worked with stream, as creating a stream which uses some other object (file, fd, string, ...) as a backing store is a pretty common pattern.

This would be fine.

lawrence_danna added a comment.EditedOct 8 2019, 12:29 PM

Yeah, the %extends are somewhat ugly, but I also find the FileSP overloads in the "public" sections of the SB classes unnerving too, because they are not really public -- no user can actually call them directly in a meaningful way. They are really only meant for the python interface, which sort does sit on the other side of the SB api, but also kind of doesn't. That's the weird part about lldb python support, where python can be embedded *into* lldb, but it also can be sitting *above* it (or both at the same time). I am not really sure what to do about that. I don't suppose there's any way to make the FileSP overloads private?

I don't think so, because the SWIG wrappers need to see them. They are de-facto private though, because they require an argument of type std::shared_ptr<lldb_private::File>.
All external API clients can see is a forward declaration that lldb_private::File is a class. They can make a NULL pointer to one, but that's it.

Or maybe they actually should be public, and the actual problem is that there is no way for a c++ user to make use of this functionality, even though it should be able to? Right now, the c++ api is sort of disadvantaged in that one cannot create an SBFile that would redirect to some custom fancy c++ output object. I am not saying you should implement that, but if we assume that such functionality were available do you think it would be possible to implement the python stuff on top of that? In that sense, maybe the FileSP overloads are actually fine, and we can treat them as a placeholder for this hypothetical future functionality?

I am sorry if this doesn't make much sense -- that's because I myself am not clear on this matter. But I hope that I at least managed to give you an idea of the things going through my head right now. If you have any thoughts on any of this, I'd like to hear them.

No, it does make sense. I too thought it was kind of odd that there's a python API which is inaccessible from C++. I can't really think of any ultimate use case where a C++ binding for File I/O callbacks would be needed, but if it were, I can think of three ways to do it.

The first and simplest way to support it is to just use funopen() or whatever the equivalent of that is on your platform to make a FILE*. I put the FILE* constructor in for C++ even though it will never have a SWIG wrapper so this would be possible. The disadvantage is that funopen() isn't portable, so clients that go this route will need platform specific code to do it.

The intermediate method would be to make our own funopen that would act as a factory function for FileSP. File would remain private, but you could make one with callbacks if you wanted to. Disadvantage here is that C++ clients would be calling though a chain of two function pointers to actually get to the Read and Write methods. This is still better from a performance perspective than calling into a Python method, but if we wanted to avoid that, we could go one step further.

The most complicated way we could handle it would be to expose an ABI-stable abstract base class of File in the API headers. This should be at the base of the File class hierarchy, where IOObject is now. It would only have virtual methods, and it would also have a bunch of unused dummy virtual so we can add new virtual methods later if we need too, without changing the ABI. C++ clients could then just subclass their own File types and use FileSP directly. I think to do this properly we'd need to do some more refactoring, because I'm not sure what should happen to IOObject and Socket under this scenario, or what kinds of error codes the virtual methods should be returning.

lawrence_danna added a comment.EditedOct 8 2019, 12:40 PM

So what's the conclusion here? Should HandleProcessEvent get a SBStream variant as well? I say "as well", because the FileSP variant is required in order to remove the python binding to the FILE* variant, and once we have the FileSP it seems like it would be really strange to leave out the SBFile.

So what's the conclusion here? Should HandleProcessEvent get a SBStream variant as well? I say "as well", because the FileSP variant is required in order to remove the python binding to the FILE* variant, and once we have the FileSP it seems like it would be really strange to leave out the SBFile.

Well... if we agree that SBStream is the future for APIs like this, then I don't think that would be too strange. The FileSP and FILE* variants would both be "legacy/deprecated" and present only to support legacy c++/python uses, and the SBStream would be the thing which we expect new users to use.

That said, I don't think that having an SBFile-based API is that bad either (though I would still like if it is used via an SBStream internally).. The main advantage of the "higher level" stream interface I see is that it is easier to provide your own implementation of it (less methods to override). However, given that we've just went through the exercise of making the file API overridable externally, I don't think we'll want to create an overridable stream abstraction any time soon.

Well... if we agree that SBStream is the future for APIs like this, then I don't think that would be too strange. The FileSP and FILE* variants would both be "legacy/deprecated" and present only to support legacy c++/python uses, and the SBStream would be the thing which we expect new users to use.

That said, I don't think that having an SBFile-based API is that bad either (though I would still like if it is used via an SBStream internally).. The main advantage of the "higher level" stream interface I see is that it is easier to provide your own implementation of it (less methods to override). However, given that we've just went through the exercise of making the file API overridable externally, I don't think we'll want to create an overridable stream abstraction any time soon.

Ok I'll just update it to go though StreamFile internally and leave the decision of whether a SBStream version should be added to a later patch and/or someone else to decide.

Harbormaster completed remote builds in B39452: Diff 224668.
labath accepted this revision.Oct 14 2019, 12:56 AM

Ok I'll just update it to go though StreamFile internally and leave the decision of whether a SBStream version should be added to a later patch and/or someone else to decide.

SGTM

This revision is now accepted and ready to land.Oct 14 2019, 12:56 AM
This revision was automatically updated to reflect the committed changes.