This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Add a new class OwnedOrBorrowed<T>
AbandonedPublic

Authored by zturner on Aug 30 2018, 9:04 PM.

Details

Summary

This might be controversial, but this class is essentially a variant<unique_ptr<T>, T&>. I can go into great detail about why I need this, but it's a huge wall of text, so unless anyone asks, I'll spare you the details. The idea, however, is simple. You can construct this class with a unique_ptr<T> or with a T&. In the former case, the class owns the object, in the latter it doesn't. Aside from that, it's a simple smart pointer with all the overloaded operators, structors, and other functionality you'd expect.

Diff Detail

Event Timeline

zturner created this revision.Aug 30 2018, 9:04 PM

Yep, I'll ask for the explanation - at least at Google there's been some
pushback on adding such a general purpose construct to the core library on
the premise that there's often/usually a better design. Equally I've
certainly seen some places it felt kind of right, for sure.

zturner added a comment.EditedAug 31 2018, 8:23 AM

So this goes back to the PDB reading code. IIRC you helped me design the interface of that library, so maybe you still remember some of it. But here's a refresher.

The idea is to provide a platform-agnostic interface over DIA SDK. Then we have one concrete implementation of this interface that is implemented via DIA, and we have another concrete implementation that we implement natively. Furthermore, the way DIA works is that there is really only one main interface, called IDiaSymbol, that provides a monolithic interface to every possible operation for every possible symbol type, but a large portion of the methods simply return an error if you try to use them on the wrong symbol type. We want to provide a rich class hierarchy though, so that you can't call the wrong method on an object. If you've got a function object, we only want to expose methods that are valid on functions. And we want this to be invisible to the user so that they get the same object / class library / interface regardless of which implementation they're using.

So our solution was to provide two class hierarchies. The first is rooted at IPDBRawSymbol and provides this same monolithic interface. Then we have, for example, DIARawSymbol which forwards calls to DIA, and wraps the results in a class that hides the implementation so there is no user dependency on DIA. Then we could provide a native implementation that works the same way.

The second hierarchy is what the user actually operates on. This hierarchy is rooted at PDBSymbol and there are concrete implementations for the different types of symbols. There's PDBSymbolFunc, PDBSymbolTypeEnum, and many others. The purpose of these is to expose only the set of the monolithic interface that makes sense for a given symbol type. But there is not a parallel hierarchy for each implementation (DIA/Native). There is just one hierarchy, rooted at PDBSymbol, and this PDBSymbol holds some kind of polymorphic instance of an IPDBRawSymbol so that it can forward calls.

To draw it out, it's something like this:

IPDBRawSymbol
|_ NativeRawSymbol
   |+ class member: NativePDBFile
|_ DIARawSymbol
   |+ class member: CComPtr<IDiaSymbol>

PDBSymbol
|+ class member: unique_ptr<IPDBRawSymbol>
|_ PDBSymbolFunc
|_ PDBSymbolExe
|_ PDBSymbolTypeEnum
|_ PDBSymbolFunctionSignature
|_ etc

Now all of the methods of PDBSymbol and subclasses return unique_ptr<T> where T is some class from the second hierarchy. So if you say "give me the function signature of this function", you're going to get back a unique_ptr<PDBSymbolFunctionSignature>. When it's destroyed, the class member of type unique_ptr<IPDBRawSymbol> will get destroyed, and in the case of DIA this means the CComPtr<IDiaSymbol> will get released and properly cleaned up. All is good.

In the native implementation, I want to cache the NativeRawSymbol instances though. It's expensive to create them, because it is actually required by the API in a way that they be "stable". What I mean by this is that each symbol has a method called getUniqueId(). In DIA, if you have an object, call getUniqueId(), then destroy the object, then call getSymbolById(n) where n is the previous id, you will get back the same object again. Not the same pointer value obviously, since it was destroyed, but the COM interface is really just an opaque wrapper around something that *is* cached, and that's what you're getting back. We need to do the same thing (both for efficiency and correctness).

So, when someone calls, for example, getAllCompilands, we return an enumerator for the compilands, and as you enumerate them, a unique id is assigned to each compiland returned. The next time someone calls getAllCompilands, the enumerator should not assign new unique ids for any compilands which have already been enumerated. It should simply return items from the cache.

so herein lies the problem, and the reason i want this class. I want to change PDBSymbol to hold a MaybeOwned<IPDBRawSymbol>. On the native implementation side, I can construct instances of them with references to the cached objects, and on the DIA side, I can construct instances of them with unique_ptr<DIARawSymbol>.

FWIW, if we don't think this is general purpose enough, I can just hack up the implementation of PDBSymbol to do this directly in the class. This just seemed like a cleaner solution to me.

zturner updated this revision to Diff 163545.Aug 31 2018, 9:05 AM
  1. Changed the name to MaybeOwned.
  2. Made constructors implicit (It's nice to be able to write a class with a constructor like Foo(MaybeOwne<T>) and be able to create it as Foo F(std::move(UniquePtr));.
  3. Added an enable_if clause to the constructor so that unique_ptr<U> can be used whenever it's convertible to unique_ptr<T>

FWIW, thanks for the fantastic explanation Zach.

I'm still interested in Dave's take on this, but here is mine.

I don't think we should add this general of a construct. While I understand your use case, I *do* think it represents a really awkward design that would be best avoided. That doesn't mean you can reasonably avoid it, but it makes me very hesitant to build very general purpose abstractions that are likely to encourage the same pattern in other bits of code.

I also suspect (but I'm not sure) that there is a better approach when handling this directly within your library. My suggestion would be to (wait for it!) add a layer of abstraction. Fundamentally, you have two implementation strategies that want different ownership semantics. That's fine, but exposing a std::unique_ptr in the API that sits *above* that implementation leaks an ownership strategy out of the implementation and into the interface. Instead, I think you should build an abstraction that more thoroughly hides the ownership semantics. This may be effectively the same type as you have here, but now rather than merely being "maybe we own it, maybe we don't", we have a more principled distinction: we're abstracting between implementation A and B where the implementations have different ownership strategies. And if you use some handle type, you may be able to have it "always be destroyed" but only trigger the deallocation when necessary based on the implemnetation details so that the user of the interface cannot observe this "maybe owned" semantic -- they see it as *always* having ownership, its just that in one case the ownership is a trivial wrapper because of a long-lived cache.

Not sure if this makes tons of sense, happy to try nand chat about how to more cleanly figure this out if helpful, but I also don't want to slow down or add more work to the design of the underlying debug info library -- it may just not be worth the time/effort in that localized part of the code to build this more complex construct....

zturner abandoned this revision.Sep 4 2018, 9:51 AM

I can just abandon it. It's no big deal really, because I can work around it in my project with like a few lines of code:

struct Foo {
  Foo(std::unique_ptr<Bar> B) : Borrowed(B.get()), Owned(std::move(B)) {}
  Foo(Bar &B) : Borrowed(&B) {}
  Bar *Borrowed;
  std::unique_ptr<Bar> Owned;
};

Another idea to fix it "for real" is to use a shared_ptr<>, and then in my cache instead of storing unique_ptr store shared_ptr, so it's properly ref-counted. Yet another way would be in the borrowed case to construct the shared_ptr<> with a custom deleter that does nothing. So there are several approaches. I mainly just wanted to throw this out there in case it was like "yea, even though it's lame we've needed this enough times that we might as well generalize it". But if we're not there, I'll just abandon it.

More or less agreed with @chandlerc (thanks for covering that all in detail) - though I'm marginally less concerned about the precedent setting in LLVM (small enough project we can rely a bit more on code review to discuss these things - and less likely to end up baked in so heavily that refactoring designs that are later discovered to be problematic to remove the use of this sort of thing).

I'd be interested to see the real code when it's out for review/committed & see what the options look like.