This is an archive of the discontinued LLVM Phabricator instance.

RFC: specialized Optional<T> for T that can represent its own invalid state
AbandonedPublic

Authored by lawrence_danna on Oct 20 2019, 6:29 PM.

Details

Summary

Some types have a representation that would allow them to serve as their own
optional. For example PythonObject in LLDB is just a PyObject * under
the hood. An Optional<PythonObject> could be represented by that one pointer,
with the value of NULL in the None case.

Right now, instead of using Optional<PythonObject>, we just allow them to be in
an invalid state, with lots of checks to PythonObject::IsValid everywhere.
I think it would be better to actually mark which PythonObjects are allowed
to be invalid using optionals. But it seems like a shame to add an extra runtime
bool when we can just use a NULL pointer instead.

I'm posting this patch as a straw-man, as I strongly suspect there's a better way
to do it.

It creates a type trait llvm::optional_detail::has_is_valid. If a class
is marked with this trait, it should implement MakeInvalid() and IsValid().

What I like: it's easy to opt a class in without a bunch of boilerplate.

What I don't like: MakeInvalid() isn't private, even though you're not supposed to,
you can still create invalid objects instead of using optionals.

Another way to do it would be to require the invalid state is all zeros represent
it as

union {
    T value;
    uint8_t bytes[sizeof(T)];
}

What do you think? Is this a good idea in general? If so, what's the right way
to implement it?

Also who else should be added to reviewers?

Event Timeline

lawrence_danna created this revision.Oct 20 2019, 6:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2019, 6:29 PM
Herald added a subscriber: dexonsmith. · View Herald Transcript

moved MakeInvalid and IsValid into the trait struct.

I think this is a lot more type correct, as someone would really
have to go out of their way to misuse the trait struct and make an
invalid T instead of using Optional.

remove empty lines

Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2019, 3:39 PM
labath added a subscriber: dblaikie.

Lets loop in @dblaikie for start, and see how it goes from there. You can also check who modified/reviewed changes to this file lately..

I am a very big opponent of IsValid() methods, and so I am very much in favour of this idea. However, I expect it to be moderately controversial (which is part of the reason why I haven't tried proposing anything like that yet).

The thing I would like to see here is to have this behaviour be configurable via a traits argument of the Optional class, similarly to how DenseMap allows the type to specify default traits by specialising DenseMapInfo (I recommend taking a look at that for inspiration/consistency), but then a specific user can also override that and use a custom traits class. That would be particularly useful for builtin types -- one cannot pick any uint32_t value as an "invalid value" in general, but for a particular application, there usually is one free value. This would allow us to replace all the pid == LLDB_INVALID_PROCESS_ID checks in lldb with this new optional class.

Another thing to consider is that llvm::Optional tries to follow the interface of the (upcoming) std::optional, which does not have this configurability. This would put it in the same category as DenseMap (DenseOptional :P ?), which matches the interface (mostly) but has our own specific optimizations.

All that said, I don't think this needs to block your work on the python classes in lldb. IIUC, all/most of them are just temporaries anyway, and so their size does not matter that much (and can be often optimized away).

ugh I edited the wrong revision, reverted. Sorry for the noise.

fix some silly bugs

call the traits struct OptionalInfo for consistency with DenseMap

The thing I would like to see here is to have this behaviour be configurable via a traits argument of the Optional class, similarly to how DenseMap allows the type to specify default traits by specialising DenseMapInfo (I recommend taking a look at that for inspiration/consistency),

So that would mean actually adding the Info class as a template parameter to Optional, right? It looks like that would be quite disruptive, as there's other headers that predeclare Optional as taking only a single template parameter.

I'm not sure I understand what the advantage of having that extra level of customization would be. Are there situations where OptionalInfo<Foo> is specialized in Foo.h, but some other code is going to want to use its own Info struct to customize Foo optionals? I can see how you'd want that for maps, because different code has different requirements for their maps, even with the same key types. But it seems like for Optionals there's probably one right way to make an Optional<T> for each T.

add a comment explaining OptionalInfo

Harbormaster completed remote builds in B39893: Diff 225984.

(keeping the ball rolling) lYou want to exploit the presence of a sentinel in the domain of values of a given type. Maybe make that explicit through the method naming instead of using the isValid theme.

Some bibliography on the subject: https://groups.google.com/a/isocpp.org/forum/#!topic/std-proposals/46J1onhWJ-s/discussion.

On a personal note, as much as I like the idea (and I really do like it), I'd rather have llvm::optional sticking to std::optional interface.

On a personal note, as much as I like the idea (and I really do like it), I'd rather have llvm::optional sticking to std::optional interface.

(short-ish reply, while busy with the dev meeting)
Yep, that's about where I am - I think llvm::Optional should move towards the std::optional API. And I don't think maintaining a separate/additional type for this functionality is worth the complexity of having a distinct type compared to something more and more people will be readily familiar with (std::optional).

But, like serge, I really do like the idea of exposing one or more "secret" States from types to use spare bits, padding-ish bits, etc, for DenseMap, PointerIntPair/PointerUnion, etc. (be nice to unify these extra state bits, even, if possible, across all these types so you could potentially compose them - if you've got 3 spare bits in your underlying representation (say, an aligned pointer) perhaps you could have a DenseSet<Optional<T*>> with elements that are only sizeof(T*) for instance)

llvm/include/llvm/ADT/Optional.h
216

Slight note (borrowed from this thread.

This implementation, compared to the generic one, is not type-safe and relies on an assert on y to check whether it's the sentinel or not. This is a slight change that may matter (somehow trading type-safety for performance).

Possible exploration path: if the sentinel has its own type (e.g. std::nullptr_t) we somehow improve type safety, but still don't have the same guarantee as the generic implementation.

Seems like there's a consensus that if we have something like this it should be called DenseOptional, and changes to Optional should only make it more like std::optional

Seems like there's a consensus that if we have something like this it should be called DenseOptional, and changes to Optional should only make it more like std::optional

I am not sure I would call that a "consensus" -- my interpretation of @dblaikie's comment is that he does not want an extra dense optional type either. Which I disagree with, because I think it would be useful to have some form of a compressed optional storage, as that's something that is currently often implemented via bitfields, magic invalid values, etc. That said, I think you have convinced me that having different optional representations for a single type is not a good idea. It's probably better to use some form of a "strong" typedef to achieve that instead.

Seems like there's a consensus that if we have something like this it should be called DenseOptional, and changes to Optional should only make it more like std::optional

I am not sure I would call that a "consensus" -- my interpretation of @dblaikie's comment is that he does not want an extra dense optional type either. Which I disagree with, because I think it would be useful to have some form of a compressed optional storage, as that's something that is currently often implemented via bitfields, magic invalid values, etc.

Open to disagreement - I don't feel super strongly. Though if we're going to diverge from std::optional & have functionality like this - maybe, yeah, we just rename what we have to "DenseOptional" and use it everywhere. Maybe not even rename it? Maybe we just accept our Optional will be different/better (for us) than the standard one indefinitely?

That said, I think you have convinced me that having different optional representations for a single type is not a good idea. It's probably better to use some form of a "strong" typedef to achieve that instead.

I've not followed this part of the thread properly - could you/someone rephrase the concerns here?

That said, I think you have convinced me that having different optional representations for a single type is not a good idea. It's probably better to use some form of a "strong" typedef to achieve that instead.

I've not followed this part of the thread properly - could you/someone rephrase the concerns here?

Sure. lldb currently has types like lldb::pid_t, and uses identifiers like LLDB_INVALID_PID for the "None" value. I wanted to define pid_t as something like Optional<some_integer_type, PIDInfo>, where PIDInfo would specify what is the representation of an "invalid" pid. The reason I abandoned this idea was that this arrangement makes it impossible to represent (in the type system) a pid which is always valid. I now think that it would be better (at least for this use case) to have `pid_t be a "strong" typedef of the integer type. At that point you can specify an "invalid" value for this type even without a separate Optional template argument...

That said, I think you have convinced me that having different optional representations for a single type is not a good idea. It's probably better to use some form of a "strong" typedef to achieve that instead.

I've not followed this part of the thread properly - could you/someone rephrase the concerns here?

Sure. lldb currently has types like lldb::pid_t, and uses identifiers like LLDB_INVALID_PID for the "None" value. I wanted to define pid_t as something like Optional<some_integer_type, PIDInfo>, where PIDInfo would specify what is the representation of an "invalid" pid. The reason I abandoned this idea was that this arrangement makes it impossible to represent (in the type system) a pid which is always valid. I now think that it would be better (at least for this use case) to have `pid_t be a "strong" typedef of the integer type. At that point you can specify an "invalid" value for this type even without a separate Optional template argument...

Ah, thanks - yeah, agreed. The invalid state should always be hidden/non-public - though I'm not sure that's quite true for DenseMap/Set/etc - yeah, the DenseMapInfo traits use ~0 and ~0-1 as the empty and tombstone values for integer types.

So I think we should generalize DenseMapInfo traits (or perhaps build DenseMapInfo traits, that need two states, on top of the OptionalInfo trait that only needs one) into some generalized "I'd like this many bits of state" & yeah, probably a default trait argument to Optional - and, yes, I agree that it's generally better to provide a wrapper type that excludes the invalid states from being publicly accessible, but I guess folks have found it useful to be able to have a DenseSet<int> without worrying too much about the fact they can't store the max int in that set (because it's the empty value).

lawrence_danna abandoned this revision.Oct 29 2019, 8:16 PM

thanks for the feedback, I'll come back with a "Dense" version of this at some point

BTW, I was bored on the plane, so I created this proof of concept

. It needs a lot of cleanup of course, but it demonstrates one way to generically store some extra data with some type. It gives each type the option to define a bunch of unused (the patch calls them "free") values (which can be useful for for Optional<>s, DenseMaps, etc.). Also each type can say that it is able to store some extra unrelated bits (like PointerIntPair does), and with a bit of care, one can even nest these types arbitrarily). the main thing I haven't yet figured out is the relationship with DenseMapInfo. Theoretically DenseMapInfo could be implemented on top of DenseInfo, but I am still wondering whether we could not make that a single type (trait) somehow...

BTW, I was bored on the plane, so I created this proof of concept

. It needs a lot of cleanup of course, but it demonstrates one way to generically store some extra data with some type. It gives each type the option to define a bunch of unused (the patch calls them "free") values (which can be useful for for Optional<>s, DenseMaps, etc.). Also each type can say that it is able to store some extra unrelated bits (like PointerIntPair does), and with a bit of care, one can even nest these types arbitrarily). the main thing I haven't yet figured out is the relationship with DenseMapInfo. Theoretically DenseMapInfo could be implemented on top of DenseInfo, but I am still wondering whether we could not make that a single type (trait) somehow...

Neat - yeah, I'd certainly like to see something along those lines, including unification with/replacement of DenseMapInfo (though DenseMapInfo needs other things like hashing, so I'm not sure we'd necessarily want to replace them - I guess it could be a trait class with "optional" features - no reason an implementation necessarily has to have all 4 functions of DenseMapInfo - but maybe separating it would be better to express intent (DenseMapInfo builds on DenseInfo and adds the requirement for hashing and equality)).

I guess there's an open question to me as to whether DenseOptional should just be the only optional we use, and it could fallback to non-dense when the type didn't provide some extra states - or should it be like DenseMap where you just can't use it and have to use std::map or similar once your type doesn't have extra states? I don't know that I feel super strongyl either way - I don't think we have lots of hypergeneric code that wants to DenseOptional<T> over some unknown T that might have spare bits of state and T that might not.