This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Make the NSSet formatter faster and less prone to infinite recursion
ClosedPublic

Authored by teemperor on Apr 29 2021, 7:39 AM.

Details

Summary

Right now to get the 'NSSet *` pointer value we first derefence it and then take the address of the result.

Beside being inefficient this potentially can cause an infinite recursion if the pointer value we get is a pointer
of a type that the TypeSystem can't derefence. If the pointer is for example some form of void * that the
dynamic type resolution can't resolve to an actual type, then the Derefence call goes back to asking
the formatters how to reference it. If the NSSet formatter then checks if it's an NSSet variation under the
hood then we just end infinitely often recursion.

In practice this seems to happen with some form of Builtin.RawPointer we get from a NSDictionary in Swift.

FWIW, no other formatter is doing the same deref->addressOf as here and there doesn't seem to be any
specific reason to do so in the git history (it's just part of the initial formatter commit)

Diff Detail

Event Timeline

teemperor created this revision.Apr 29 2021, 7:39 AM
teemperor requested review of this revision.Apr 29 2021, 7:39 AM
JDevlieghere accepted this revision.Apr 29 2021, 9:36 AM
JDevlieghere added a subscriber: JDevlieghere.

Looks reasonable to me. Barring a good explanation for the dereference -> address of roundtrip, this LGTM.

This revision is now accepted and ready to land.Apr 29 2021, 9:36 AM
kastiglione added inline comments.
lldb/source/Plugins/Language/ObjC/NSSet.cpp
729–730

why GetValueAsUnsigned(0) here and GetPointerValue() above?

shafik added a subscriber: shafik.Apr 29 2021, 9:50 AM

Seems reasonable to me, I wonder why it was done that way originally?

teemperor added inline comments.Apr 29 2021, 9:58 AM
lldb/source/Plugins/Language/ObjC/NSSet.cpp
729–730

Huh, wrong diff... GetValueAsUnsigned is apparently (according to the docs next to the function) how we are supposed to get pointer values. Not sure why, but it seems that's just the way it's done (I originally thought PointerValue is the right way but that seems actually quite uncommon. FWIW, I think they do the same thing?)

  • Made both ValueAsUnsigned which is what the other formatters use.
This revision was landed with ongoing or failed builds.Apr 29 2021, 10:14 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2021, 10:14 AM

Functionally this LGTM. Would be curious to do git archeology to understand why the old code used this pattern.

GetPointerValue strips sign bits, so formally it should be used whenever you are getting a pointer that might be signed. OTOH, GetPointerValue is pretty new and since plain data pointers aren't going to be signed, there was no reason to go convert all the uses. We probably should use GetPointerValue. Also, using GetValueAsUnsigned with a 0 fallback value is not a great idea, though it seems to have been pretty common practice. That means if we are unable to read the pointer value for some reason, we'll say it is NULL. I think it would be better to use LLDB_INVALID_ADDRESS in these cases.