This is an archive of the discontinued LLVM Phabricator instance.

Clang extensions yolo, woot & kaboom
Needs ReviewPublic

Authored by beanz on Jul 18 2022, 6:15 PM.

Details

Summary

Sometimes a default-initialized value isn't really properly
initialized. In most cases these are handled with assertions, but...
what if we could annotate our code to allow the compiler to detect
cases where "uninitialized" user-defined types get used.

Enter yolo, woot and kaboom!
(proper names pending)

The yolo attribute denotes a constructor as creating an
"uninitialized" variable.

The woot attribute appliest to member functions noting that they
initialize the base object.

The kaboom attribute applies to member functions noting that they
must have an initialized base object.

These additional annotations ensure that at least one woot function
must be called on an object initialized with a yolo constructor
before any kaboom functions can be called otherwise a diagnostic will
be issued via the standard -Wuninitialized diagnostics.

Diff Detail

Event Timeline

beanz created this revision.Jul 18 2022, 6:15 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: StephenFan. · View Herald Transcript
beanz requested review of this revision.Jul 18 2022, 6:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2022, 6:15 PM
NoQ added a subscriber: Szelethus.Jul 18 2022, 10:59 PM

@Szelethus this patch is closely related to your prior work!

Ok yeah we'll need names. I recommend at least explaining it a bit in the review head so that people had at least some idea what is this about.

Yeah, I'm afraid no fun is allowed on this block. On another note, kaboom is interesting, shouldn't we assume all functions to be kaboom unless proven to be woot?

@balazske's work on checking the return value of certain function calls is related to this as well!

beanz added a comment.Jul 19 2022, 6:52 AM

On another note, kaboom is interesting, shouldn't we assume all functions to be kaboom unless proven to be woot?

I won’t claim to have spent a whole lot of time thinking on this. The idea came to me yesterday and I just wrote it up. My thought process was this:

The existing analysis errs to the side of not diagnosing anything it isn’t sure is an issue, so by default operations are ignored. Ignored operations are important in some use cases where initialization might be more than one call, like a builder pattern. They could instead be their own annotation, I propose ‘schroedinger’.

jkorous added a subscriber: jkorous.EditedJul 19 2022, 10:42 AM

Hi Chris! This is a very interesting idea!

I do have couple thoughts - mostly that this could lead to something great and I would love it to apply to as many relevant cases as possible.

It looks like there is a possibility that a free function, static method or a method of another class (a friend?) should be woot for a specific pointer/reference parameter.

struct Foo {int a};
void init_Foo(Foo& f) { f.a = 42; }

In the same spirit as the above I think that kaboom should be applicable to functions in general.

BTW - if we generalize woot and kaboom - won't we get support for built-in types as a side-effect and won't that give us C support as a side-effect?

Last but not least - it would be really interesting to have the ability to check and diagnose improper use of these attributes.
One specific scenario that seems interesting to me is the ability to verify also absence of an attribute. (That could in-turn potentially unlock more aggressive analysis at call-sites.)
Example:

struct Bar {
  int a, b;
  Bar() : a(0) { } // <--- warning: constructor should be [[yolo]]
};

...and if we had that - we might potentially even collapse woot and !yolo together?
I can imagine that some parts of C++ would be tricky - e. g. templates so not sure how far could this idea be stretched.

It looks like there is a possibility that a free function, static method or a method of another class (a friend?) should be woot for a specific pointer/reference parameter.

Oh! That's a fascinating idea I hadn't considered. As implemented I only applied the woot-iness of a function to it's this parameter.

In the same spirit as the above I think that kaboom should be applicable to functions in general.

I'd have to double check, but I'm pretty sure non-member functions get treated as Use in the analysis today, so they _should_ implicitly be kaboom.

BTW - if we generalize woot and kaboom - won't we get support for built-in types as a side-effect and won't that give us C support as a side-effect?

I think generally we have support for built-in types through this analysis already. This just piggybacks off the existing support.

Last but not least - it would be really interesting to have the ability to check and diagnose improper use of these attributes.

This would be super cool!

beanz added a comment.Jul 27 2022, 8:32 AM

Any chance this could be a generalization of https://clang.llvm.org/doxygen/Consumed_8cpp_source.html ?

I'm not really familiar with that code, but at glance there seems to be some similarities between the cases covered by Consumed and UninitializedValues. I built off UninitializedValues mostly just because the existing analysis for builtin types seemed consistent with the analysis I needed, so the changes to the analysis are pretty minimal.

IIUC from skimming the code, Consumed also does IPC to track usage across call boundaries, which is really cool, but I don't know if it fully covers the use cases without annotations. For example with a tagged union the default object might be initialized to some known state (Uninitialized and zero'd), but the semantic state of the object is still "uninitialized".

I can imagine a more general implementation of these annotations to basically emulate the states of a state machine. Constructors setting to different default states, methods being annotated as to which states they are valid to call in, and the compiler tracking values through their possible states... that is probably a bit much, and certainly overkill for the problem I'm trying to solve.

Any chance this could be a generalization of https://clang.llvm.org/doxygen/Consumed_8cpp_source.html ?

I'm not really familiar with that code, but at glance there seems to be some similarities between the cases covered by Consumed and UninitializedValues. I built off UninitializedValues mostly just because the existing analysis for builtin types seemed consistent with the analysis I needed, so the changes to the analysis are pretty minimal.

IIUC from skimming the code, Consumed also does IPC to track usage across call boundaries, which is really cool, but I don't know if it fully covers the use cases without annotations. For example with a tagged union the default object might be initialized to some known state (Uninitialized and zero'd), but the semantic state of the object is still "uninitialized".

I can imagine a more general implementation of these annotations to basically emulate the states of a state machine. Constructors setting to different default states, methods being annotated as to which states they are valid to call in, and the compiler tracking values through their possible states... that is probably a bit much, and certainly overkill for the problem I'm trying to solve.

Fair enough - figured it was at least worth mentioning/checking. Thanks!

Thank you for this! I'm starting to get dug out from standards meetings and able to think about this a bit more, and I had some questions.

The yolo attribute denotes a constructor as creating an "uninitialized" variable.

Are there circumstances where we cannot "simply" infer this from the constructor itself? (After instantiation) we know the class hierarchy, we know the data members, and we know the ctor init list/in-class initializers, so it seems like we should be able to recursively mark a constructor as yolo for the user rather than making them write it themselves. It seems like inferring this would reduce false positives and false negatives (in the case the user marks the constructor incorrectly or the code gets out of sync with the marking), and we might only need this attribute in very rare circumstances (or not at all, as a user-facing attribute). Is that an approach you're considering?

The woot attribute appliest to member functions noting that they initialize the base object.

Fully? Partially? I'm trying to decide whether this is only useful for functions named init() and clear() and the like, or whether this is useful for functions like setWhatever() where you can have a partially constructed object that is not fully initialized by any one member function.

The kaboom attribute applies to member functions noting that they must have an initialized base object.

Same question here about fully vs partially initialized. e.g., where you have a partially constructed object and you can only call getWhatever() after setWhatever() has been called.

beanz added a comment.Jul 27 2022, 1:02 PM

Are there circumstances where we cannot "simply" infer this from the constructor itself?

I think this gets tricky in cases where objects may have valid "uninitialized" state. Like a tagged union that hasn't been set might set its tag value to "uninitialized" and zero its storage, which would make it look like all the members are initialized to the compiler, they just happen to be initialized to known sentinel values.

(After instantiation) we know the class hierarchy, we know the data members, and we know the ctor init list/in-class initializers, so it seems like we should be able to recursively mark a constructor as yolo for the user rather than making them write it themselves. It seems like inferring this would reduce false positives and false negatives (in the case the user marks the constructor incorrectly or the code gets out of sync with the marking), and we might only need this attribute in very rare circumstances (or not at all, as a user-facing attribute). Is that an approach you're considering?

I hadn't thought at all about automated propagation. One of the other use cases I was thinking about was std::optional, where you could specify that initialization to std::nullopt is yolo, and then get becomes kaboom.

With these annotations we can conservatively diagnose cases where an optional's value is accessed when uninitialized.

Fully? Partially? I'm trying to decide whether this is only useful for functions named init() and clear() and the like, or whether this is useful for functions like setWhatever() where you can have a partially constructed object that is not fully initialized by any one member function.

I had also thought about this too. As implemented in this patch woot implies fully initialized, but there could be some interesting complex initializations. I could imagine builder patterns where given code like Builder::Create().X().Y().Z(), you might want Y to only be callable after X and Z might be the required final initializer.

Same question here about fully vs partially initialized. e.g., where you have a partially constructed object and you can only call getWhatever() after setWhatever() has been called.

I'm wondering if there could be a generalization of the attribute like:

class TaggedValue {
  enum Kind {
    Uninitialized = 0,
    Integer,
    Float
  };
  Kind VK = Uninitialized;

  union {
    int I;
    float F;
  };
public:
  
  [[clang::yolo]]
  TaggedValue() = default;
  
  TaggedValue(TaggedValue&) = default;

  void hasValue() { return VK == Uninitialized; } // always safe

  [[clang::woot("FloatSet"]] // Marks as safe for functions with matching kaboom arguments
  void set(float V) {
    VK= Float;
    F = V;
  }

  [[clang::woot("IntSet")]] // Marks as safe for functions with matching kaboom arguments
  void set(int V) {
    VK= Integer;
    I = V;
  }

    [[clang::woot]] // Marks as safe for all kaboom functions (because I'm sad)
   void zero() {
     VK= Integer;
     I = 0;
   }

  [[clang::kaboom("FloatSet"]]
  operator float() {
    return F;
  }

  [[clang::kaboom("IntSet")]]
  operator int() {
    return I;
  }
};

This does get into some more complex questions of whether woot would change or append status, and I can see arguments for both so we might need appending_woot too...

Are there circumstances where we cannot "simply" infer this from the constructor itself? (After instantiation) we know the class hierarchy, we know the data members, and we know the ctor init list/in-class initializers, so it seems like we should be able to recursively mark a constructor as yolo for the user rather than making them write it themselves.

The init list can be in a different translation unit (TU) which means we can't compute all properties just from the code in current TU. (Unless we decide to pessimistically consider such unanalyzable constructors as non-initializing.)
Some variation of proposed attributes could work as the cross-TU communication channel that would provide the missing information based on analysis of other TUs.

Example - 1 header, 1 implementation file, 1 file with client code:

  • foo.hpp
  • foo.cpp - implements interface in foo.hpp
  • client_of_foo.cpp - uses the interface from foo.hpp.

We can have per-TU analysis that results in correct attributes in the header:

  • foo.cpp - Analysis of the available implementation can verify correctness of [[yolo]] attributes on related declarations in foo.hpp.
  • client_of_foo.cpp - Analysis of the implementation can take into the account attributes in foo.hpp. (And inductively we should also verify correctness of attributes on related declarations.)

That way we could infer properties based on code in current implementation file and attributes on declarations it uses from headers.
I believe we should be able to have one-TU-at-a-time analysis that could guarantee correctness of the attributes on declarations (or warn otherwise) and that could iteratively give us full-project (global) correctness.
The order in which we analyze TUs shouldn't matter as long as we analyze all of them.

BTW Even for the constructor analysis - the base class initializer list can be in another TU again.

It seems like inferring this would reduce false positives and false negatives

I strongly support the idea that we should not put the burden of not making mistakes on the user if we can make a tool that guarantees or verifies correctness.

... we might only need this attribute in very rare circumstances ...

I think this is orthogonal to the above but agree that changing the default for constructors to be implicitly [[yolo]] could be a win for ergonomics.

I'm wondering if there could be a generalization of the attribute like:

I can't imagine it is possible to design a reasonable and practically usable system of attributes to model anything but very limited subset of interface contracts.
FWIW I would consider the state space of {initialized, uninitialized} as a great first step. Generalizing this idea to per-abstract-property should be entirely possible as a subsequent improvement.

One more thing - the "attributes as a cross-TU metadata" idea might be (possibly with some squinting) conceptually similar to enforce_tcb attribute that @NoQ made me aware of some time ago.
https://clang.llvm.org/docs/AttributeReference.html#enforce-tcb

Are there circumstances where we cannot "simply" infer this from the constructor itself?

I think this gets tricky in cases where objects may have valid "uninitialized" state. Like a tagged union that hasn't been set might set its tag value to "uninitialized" and zero its storage, which would make it look like all the members are initialized to the compiler, they just happen to be initialized to known sentinel values.

Ah, sentinel values in general all have that same general property, good point!

(After instantiation) we know the class hierarchy, we know the data members, and we know the ctor init list/in-class initializers, so it seems like we should be able to recursively mark a constructor as yolo for the user rather than making them write it themselves. It seems like inferring this would reduce false positives and false negatives (in the case the user marks the constructor incorrectly or the code gets out of sync with the marking), and we might only need this attribute in very rare circumstances (or not at all, as a user-facing attribute). Is that an approach you're considering?

I hadn't thought at all about automated propagation.

My experiences with thread safety analysis annotations was that inference (or even a tool to automatically slap the right annotations explicitly on to declarations) was a huge usability win for users. This particular kind of diagnostic tends to be somewhat "viral" in that starting to use the annotations only helps if everything in some subset of your project is annotated. You can start from the leaf nodes in your types to get some benefit, but the most benefit tends to come from composition of these leaf types into more complex types. Reducing the number of annotations the user has to write makes this transition significantly easier if possible to accomplish with good fidelity.

One of the other use cases I was thinking about was std::optional, where you could specify that initialization to std::nullopt is yolo, and then get becomes kaboom.

With these annotations we can conservatively diagnose cases where an optional's value is accessed when uninitialized.

Fully? Partially? I'm trying to decide whether this is only useful for functions named init() and clear() and the like, or whether this is useful for functions like setWhatever() where you can have a partially constructed object that is not fully initialized by any one member function.

I had also thought about this too. As implemented in this patch woot implies fully initialized, but there could be some interesting complex initializations. I could imagine builder patterns where given code like Builder::Create().X().Y().Z(), you might want Y to only be callable after X and Z might be the required final initializer.

I was thinking about our own AST nodes where, for example, you deserialize an AST by creating an empty node and then filling out its structure piecemeal. I think partial construction is a pretty important concept to consider early in the design given its prevalence as a reasonable implementation strategy.

Same question here about fully vs partially initialized. e.g., where you have a partially constructed object and you can only call getWhatever() after setWhatever() has been called.

I'm wondering if there could be a generalization of the attribute like:

class TaggedValue {
  enum Kind {
    Uninitialized = 0,
    Integer,
    Float
  };
  Kind VK = Uninitialized;

  union {
    int I;
    float F;
  };
public:
  
  [[clang::yolo]]
  TaggedValue() = default;
  
  TaggedValue(TaggedValue&) = default;

  void hasValue() { return VK == Uninitialized; } // always safe

  [[clang::woot("FloatSet"]] // Marks as safe for functions with matching kaboom arguments
  void set(float V) {
    VK= Float;
    F = V;
  }

  [[clang::woot("IntSet")]] // Marks as safe for functions with matching kaboom arguments
  void set(int V) {
    VK= Integer;
    I = V;
  }

    [[clang::woot]] // Marks as safe for all kaboom functions (because I'm sad)
   void zero() {
     VK= Integer;
     I = 0;
   }

  [[clang::kaboom("FloatSet"]]
  operator float() {
    return F;
  }

  [[clang::kaboom("IntSet")]]
  operator int() {
    return I;
  }
};

This does get into some more complex questions of whether woot would change or append status, and I can see arguments for both so we might need appending_woot too...

The more I look at this, the more closely it seems to resemble attributes and an analysis pass we already have: capability attributes: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#basic-concepts-capabilities We implemented those for thread safety analysis initially, but realized there's a more general principle hiding under there in terms of resources and capabilities for using those resources. yolo is acquiring an "uninitialized" capability, woot is releasing the "uninitialized" capability, and kaboom is requiring there not be an "uninitialized" capability. I'm CC'ing in @aaronpuchert and @delesley to see if they also agree with this assessment (both Aaron and DeLesley have been instrumental in this work), but I'm curious if @beanz had seen these and considered them or not.

Are there circumstances where we cannot "simply" infer this from the constructor itself? (After instantiation) we know the class hierarchy, we know the data members, and we know the ctor init list/in-class initializers, so it seems like we should be able to recursively mark a constructor as yolo for the user rather than making them write it themselves.

The init list can be in a different translation unit (TU) which means we can't compute all properties just from the code in current TU. (Unless we decide to pessimistically consider such unanalyzable constructors as non-initializing.)

I was imagining that this analysis is most powerful when done as a cross-TU static analysis pass. However, having markings for when there's less context is definitely a benefit, so that's a great point.