This is an archive of the discontinued LLVM Phabricator instance.

[LLDB][PythonFile] fix dangerous borrow semantics on python2
ClosedPublic

Authored by lawrence_danna on Oct 28 2019, 2:06 PM.

Details

Summary

It is inherently unsafe to allow a python program to manipulate borrowed
memory from a python object's destructor. It would be nice to
flush a borrowed file when python is finished with it, but it's not safe
to do on python 2.

Python 3 does not suffer from this issue.

Event Timeline

lawrence_danna created this revision.Oct 28 2019, 2:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2019, 2:06 PM

So, if I understand correctly. The problem here is the final call to fflush, which can end up referencing a closed FILE*. Can we just not call fflush then? It shouldn't be really needed -- if everything goes through the same FILE* object, the C library will make sure the writes are available to anyone who tries to read through that object.

I don't buy the argument that holding onto a FD-backed object after the FD has been closed is somehow "safer" than holding onto a FILE*. They both produce undefined behavior, and given how FDs are recycled, it's very likely that this dangling object will end up writing to a random other open file -- that may be even worse than crashing. If we're not happy with that requirement then we can make GetFile() return a real python object, which will hold onto the lldb_private::File instance and handle these use-after-free cases in a reliable manner.

So, if I understand correctly. The problem here is the final call to fflush, which can end up referencing a closed FILE*. Can we just not call fflush then?

Hrm, maybe. I'll try it and see if anything breaks.

It shouldn't be really needed -- if everything goes through the same FILE* object, the C library will make sure the writes are available to anyone who tries to read through that object.
I don't buy the argument that holding onto a FD-backed object after the FD has been closed is somehow "safer" than holding onto a FILE*. They both produce undefined behavior,

But this is a fundamental difference between python and C++. In C++ there's only one level of UB. If you do something illegal, your program can crash, demons come out of your nose, whatever. In python, unless you're using something like ctypes, UB means your program misbehaves, but should never mean that the interpreter crashes. An interpreter crash in python is not the moral equivalent of a segfault in C++, it's the equivalent of a kernel panic. If unprivileged C++ code crashes the kernel, that's a bug in the kernel, whether UB was involved or not. If a non-ctypes python program crashes the interpreter, that's a bug in Python or some loadable module, no matter how incorrect the python program was.

The other aspect here is that it is reasonable to ask a python programmer not to use a closed file. I don't think it is ever safe to ask a python programmer to ensure that reference counts get decremented in a particular order.

just don't even flush

fix the test too

lawrence_danna edited the summary of this revision. (Show Details)Oct 29 2019, 6:17 PM

@labath looks like you were right, just not flushing seems to work fine.

labath accepted this revision.Oct 30 2019, 3:58 AM

It shouldn't be really needed -- if everything goes through the same FILE* object, the C library will make sure the writes are available to anyone who tries to read through that object.
I don't buy the argument that holding onto a FD-backed object after the FD has been closed is somehow "safer" than holding onto a FILE*. They both produce undefined behavior,

But this is a fundamental difference between python and C++. In C++ there's only one level of UB. If you do something illegal, your program can crash, demons come out of your nose, whatever. In python, unless you're using something like ctypes, UB means your program misbehaves, but should never mean that the interpreter crashes. An interpreter crash in python is not the moral equivalent of a segfault in C++, it's the equivalent of a kernel panic. If unprivileged C++ code crashes the kernel, that's a bug in the kernel, whether UB was involved or not. If a non-ctypes python program crashes the interpreter, that's a bug in Python or some loadable module, no matter how incorrect the python program was.

Yes, I get that. What I was trying to say that using a fd-based python file after it has been closed introduces exactly the same level of UB as for a FILE*-based one. I mean, for all you know, the fd might be recycled to point to /dev/sda, and writing to it can make your whole system unbootable.

The difference was that in the FILE* case the mere act of closing constituted a "use", while for a fd file we just happily did nothing. Not fflushing makes this aspect of the two implementations equal. It is true that this introduces some potential difference in behavior as a FILE* involves userspace buffers, whereas for an fd everything would go to the kernel straight away (at least on posix systems). However, cloning a bunch of FILE*s would introduce other kinds of inconsistencies, so I think this is a better tradeoff.

This revision is now accepted and ready to land.Oct 30 2019, 3:58 AM
This revision was automatically updated to reflect the committed changes.