This is an archive of the discontinued LLVM Phabricator instance.

Fix ref-counting of Python objects
ClosedPublic

Authored by zturner on Oct 9 2015, 7:41 PM.

Details

Summary

PythonObjects were being incorrectly ref-counted. This problem was pervasive throughout the codebase, leading to an unknown number of memory leaks and potentially use-after-free.

The issue stems from the fact that Python native methods can either return "borrowed" references or "owned" references. For the former category, you *must* incref it prior to decrefing it. And for the latter category, you should not incref it before decrefing it. This is mostly an issue when a Python C API method returns a PyObject to you, but it can also happen with a method accepts a PyObject. Notably, this happens in PyList_SetItem, which is documented to "steal" the reference that you give it. So if you pass something to PyList_SetItem, you cannot hold onto it unless you incref it first. But since this is one of only two exceptions in the entire API, it's confusing and difficult to remember.

Our PythonObject class was indiscriminantely increfing every object it received, which means that if you passed it an owned reference, you now have a dangling reference since owned references should not be increfed. We were doing this in quite a few places.

There was also a fair amount of manual increfing and decrefing prevalent throughout the codebase, which is easy to get wrong.

This patch solves the problem by first changing the name of PythonObject to PyRef (so that it is clear from the name what this actually is), and then making any construction of a PyRef from a PyObject take a flag which indicates whether it is an owned reference or a borrowed reference. There is no way to construct a PyRef without this flag, and it does not offer a default value, forcing the user to make an explicit decision every time.

All manual uses of PyObject have been cleaned up throughout the codebase and replaced with PyRef.

Diff Detail

Event Timeline

zturner updated this revision to Diff 37018.Oct 9 2015, 7:41 PM
zturner retitled this revision from to Fix ref-counting of Python objects.
zturner updated this object.
zturner added a reviewer: clayborg.
zturner added a subscriber: lldb-commits.
zturner added inline comments.Oct 9 2015, 7:49 PM
source/Plugins/ScriptInterpreter/Python/PythonDataObjects.cpp
337 ↗(On Diff #37018)

Just to provide one example, here is an instance of a fixed memory leak. PyLong_FromLongLong returns a new reference, not a borrowed one, but PythonObject::Reset would indiscriminately call Py_INCREF.

clayborg requested changes to this revision.Oct 12 2015, 10:18 AM
clayborg edited edge metadata.

See inlined comments.

source/Plugins/ScriptInterpreter/Python/PythonDataObjects.h
73–79 ↗(On Diff #37018)

It seems like we don't need an enum for this, this can just be a bool named "dont_incref" or some other name?

73–85 ↗(On Diff #37018)

Might be nicer to move into the PyRef class and just name these "Type" and "InitialValue"?

87 ↗(On Diff #37018)

Why did we need to rename PythonObject to PyRef? It can be done, but seems different from the the rest of the classes here. Everything else is spelled out, I would prefer this stay as PythonObject.

90–91 ↗(On Diff #37018)

indent is wrong here

192 ↗(On Diff #37018)

I really don't like these implicit conversions as they make for bad things that can happen. We already see issues with Reset as you mention above. I would rather just have a "PyObject *get() const;" method that we use to explicitly do this. It gets event worse when you already have an implicit bool operator...

206–211 ↗(On Diff #37018)

indent is wrong here

234–238 ↗(On Diff #37018)

indent is wrong here

258–261 ↗(On Diff #37018)

indent is wrong here

284–287 ↗(On Diff #37018)

indent is wrong here

304 ↗(On Diff #37018)

We should use "const PyRef &object" here instead of "PyObject *value". I know the old code used PyObject, but it is probably better to switch over.

306 ↗(On Diff #37018)

We should use "const PyRef &object" here instead of "PyObject *value". I know the old code used PyObject, but it is probably better to switch over.

source/Plugins/ScriptInterpreter/Python/ScriptInterpreterPython.cpp
183–201 ↗(On Diff #37018)

please move : to previous line and commas to the end.

This revision now requires changes to proceed.Oct 12 2015, 10:18 AM

Let me know what you think about my responses. Anything I didn't specifically call out I'll fix in the next version of the patch.

source/Plugins/ScriptInterpreter/Python/PythonDataObjects.h
73–79 ↗(On Diff #37018)

In theory. I used an enum because if you read the Python documentation, they explicitly use the terminology of Borrowed and New references. I don't want people to have to think about how to translate the Python documentation into a boolean value. This way the code speaks the same language as the documentation, and it's hard to make a mistake.

73–85 ↗(On Diff #37018)

Possible, but it only really applies for Dictionary and List, or at least the way we use it it does.

87 ↗(On Diff #37018)

Less to type for one thing, and I thought it made it clearer that it's a very thin wrapper that can be trivially copied without any overhead.

192 ↗(On Diff #37018)

Fair enough, I can fix that. Do you want me to remove the implicit bool conversion operator too? The bool conversion operator is weird, because it only checks for null, and not Py_None, so it's different than calling IsNULLOrNone. In practice I think it would be fine to call IsNULLOrNone, and at one point while this CL is in progress I had changed the client code to do that, but it ended up being really ugly so I put the bool conversion back.

In any case, should I just leave the bool operator and remove the PyObject operator?

I would like to see PythonObject used in place of PyRef, feel free to remove the operator bool and PyObject, and then the rest of the mostly indent and style cleanups.

source/Plugins/ScriptInterpreter/Python/PythonDataObjects.h
73–79 ↗(On Diff #37018)

Ok, no problem then.

192 ↗(On Diff #37018)

Yeah, feel free to pull the operator bool as well. It can do bad things as well. IsValid() is what we use elsewhere, thought we may want more python specific things. We might want IsNULL(), IsNone() or IsNULLOrNone() depending on who calls it. Feel free to clean this up.

zturner accepted this revision.Oct 15 2015, 1:50 PM
zturner added a reviewer: zturner.
granata.enrico resigned from this revision.Nov 29 2016, 10:05 AM
granata.enrico removed a reviewer: granata.enrico.

I am not an Apple employee working on LLDB anymore

This revision was not accepted when it landed; it landed in state Needs Revision.Oct 7 2019, 7:35 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2019, 7:35 AM