Page MenuHomePhabricator

Add lifetime categories attributes
ClosedPublic

Authored by mgehre on Jun 28 2019, 1:25 PM.

Details

Summary

Joint work with @xazax.hun.

This is the first part of work announced in
"[RFC] Adding lifetime analysis to clang" [0],
i.e. the addition of the [[gsl::Owner(T)]] and
[[gsl::Pointer(T)]] attributes, which
will enable user-defined types to participate in
the lifetime analysis (which will be part of the
next PR).
The type T here is called "DerefType" in the paper,
and denotes the type that an Owner owns and a Pointer
points to. E.g. std::vector<int> should be annotated
with [[gsl::Owner(int)]] and
a std::vector<int>::iterator with [[gsl::Pointer(int)]].

We explicitly allow to add an annotation after
the definition of the class to allow adding annotations
to external source from by the user, e.g.

#include <vector>

namespace std {
template<typename T, typename Alloc>
class [[gsl::Owner(T)]] vector;
}

until we can ship a standard library with annotations.

[0] http://lists.llvm.org/pipermail/cfe-dev/2018-November/060355.html

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
aaron.ballman added inline comments.Jul 1 2019, 7:08 AM
clang/include/clang/Basic/AttrDocs.td
4186 ↗(On Diff #207235)

Please keep the newline at the end of the file.

clang/lib/Sema/SemaDeclAttr.cpp
4537 ↗(On Diff #207235)

Under what circumstances is this check needed? I believe this is already automatically handled for you.

4546–4547 ↗(On Diff #207235)

Is void really the only problematic type? What about incomplete types, for instance?

4553 ↗(On Diff #207235)

Will this work? What happens if we see a forward declaration with this attribute first and then see the canonical declaration later? I suspect this work needs to be done when merging declaration attributes instead.

4554 ↗(On Diff #207235)

Formatting.

4557 ↗(On Diff #207235)

Please use a name other than Attr (as that's a type name).

clang/test/SemaCXX/attr-gsl-owner-pointer.cpp
61 ↗(On Diff #207235)

Please add a newline to the end of the file.

We explicitly allow to add an annotation after the definition of the class to allow adding annotations to external source from by the user

Asking users to forward-declare anything from the standard library is a very bad idea -- in fact it is undefined behavior. https://stackoverflow.com/questions/307343/forward-declare-an-stl-container

The compiler should just know about standard library types and attach attributes automatically.

We explicitly allow to add an annotation after the definition of the class to allow adding annotations to external source from by the user

Asking users to forward-declare anything from the standard library is a very bad idea -- in fact it is undefined behavior. https://stackoverflow.com/questions/307343/forward-declare-an-stl-container

The compiler should just know about standard library types and attach attributes automatically.

I agree. In a follow-up patch we will add the attributes to STL types during parsing. Do you think it is OK to always add the attributes or should we only do it if a flag, e.g. -wlifetime is added to the compiler invocation?

On the other hand I still think it is useful to give the users the option to maintain a header with forward declarations to add the annotations to other (non-standard) 3rd party types. These headers might be fragile to 3rd party changes but could still be a better option than maintaining patches on top of 3rd party libraries. Having API notes upstream would be a superior solution here and I might look into upstreaming it at some point, but I think this is something that could be addressed later on. What do you think?

clang/include/clang/Basic/Attr.td
2770 ↗(On Diff #207235)

I will double check this. Her Sutter's paper have this spelling and I believe that the Microsoft implementation is following the same spelling. The main difference is that the MSVC version does not have a type argument at this point but they do plan to add it as an optional argument. (They want to infer the pointee type when it is not provided.) The clang community do not want the inference, hence we made the type argument required. Always providing the type argument should be compatible with future versions of the MSVC implementation.

clang/lib/Sema/SemaDeclAttr.cpp
4564 ↗(On Diff #207138)

We actually allow duplicated attributes as long as they are not conflicting. The reason why we introduced a new diagnostic is that the source of the problem is the conflict, i.e.: the two attributes are contradicting each other. Not the duplication.

4537 ↗(On Diff #207235)

It is handled automatically indeed, thanks!

4546–4547 ↗(On Diff #207235)

I think incomplete types are fine. The only information we actually need here is the name the pointee types. E.g. for a SomeOwner<Incomplete> which is annotated Owner(Incomplete), we will assume that a method returning Incomplete*, std::reference_wrapper<Incomplete> etc will return an object pointing to the memory owned by SomeOwner.

4553 ↗(On Diff #207235)

For TagDecls the canonical decl is the first declaration:

TagDecl *TagDecl::getCanonicalDecl() { return getFirstDecl(); }

So I suspect we can never see a non-canonical declaration first. And once we see the canonical declaration it remains the same no matter how many new declaration do we see.

aaron.ballman added inline comments.Jul 1 2019, 9:08 AM
clang/lib/Sema/SemaDeclAttr.cpp
4553 ↗(On Diff #207235)

This is the scenario I am worried about:

struct [[whatever]] Foo;
struct Foo {};
xazax.hun added inline comments.Jul 1 2019, 9:33 AM
clang/lib/Sema/SemaDeclAttr.cpp
4553 ↗(On Diff #207235)

This works well as the forward declaration is the canonical declaration since it is the first declaration in the redecl chain. A declaration being canonical or being a definition are two independent properties.

The only caveat here is that the users of these attributes always need to query the attribute of the canonical decl.

I agree. In a follow-up patch we will add the attributes to STL types during parsing. Do you think it is OK to always add the attributes or should we only do it if a flag, e.g. -wlifetime is added to the compiler invocation?

Warning flags should not change the AST -- compiler engineers don't expect that. So if Clang is going to perform inference, it should either always do it, or it should be guarded by an -f flag, not a -W flag.

On the other hand I still think it is useful to give the users the option to maintain a header with forward declarations to add the annotations to other (non-standard) 3rd party types. These headers might be fragile to 3rd party changes but could still be a better option than maintaining patches on top of 3rd party libraries. Having API notes upstream would be a superior solution here and I might look into upstreaming it at some point, but I think this is something that could be addressed later on. What do you think?

I think it is acceptable for 3rd party libraries, but there's already a solution for 3rd party libraries -- API notes in Swift's fork of Clang. It has been successfully used by Apple for 5 years for this exact purpose (adding attributes to existing libraries without changing headers), and only needs to be upstreamed to Clang.

I agree. In a follow-up patch we will add the attributes to STL types during parsing. Do you think it is OK to always add the attributes or should we only do it if a flag, e.g. -wlifetime is added to the compiler invocation?

Warning flags should not change the AST -- compiler engineers don't expect that. So if Clang is going to perform inference, it should either always do it, or it should be guarded by an -f flag, not a -W flag.

Thanks, this make sense. My only concern would be that a -W flag without the "inference" for STL types would be useless. Is it ok to make the driver add a -f flag automatically if a warning is turned on or would you find that confusing as well?

On the other hand I still think it is useful to give the users the option to maintain a header with forward declarations to add the annotations to other (non-standard) 3rd party types. These headers might be fragile to 3rd party changes but could still be a better option than maintaining patches on top of 3rd party libraries. Having API notes upstream would be a superior solution here and I might look into upstreaming it at some point, but I think this is something that could be addressed later on. What do you think?

I think it is acceptable for 3rd party libraries, but there's already a solution for 3rd party libraries -- API notes in Swift's fork of Clang. It has been successfully used by Apple for 5 years for this exact purpose (adding attributes to existing libraries without changing headers), and only needs to be upstreamed to Clang.

Supporting the forward declarations way is only a few lines of code now, supporting API Notes is a much larger effort. I would also prefer the latter but I think having this work not blocked on upstreaming API notes is essential to get this upstreamed. Or would you prefer not supporting the 3rd party library use-case until API Notes are upstreamed?

mgehre updated this revision to Diff 207381.Jul 1 2019, 11:23 AM
mgehre marked 5 inline comments as done.
  • Address more review comments.
  • git-clang-format
mgehre marked an inline comment as done.Jul 1 2019, 11:33 AM

I agree. In a follow-up patch we will add the attributes to STL types during parsing. Do you think it is OK to always add the attributes or should we only do it if a flag, e.g. -wlifetime is added to the compiler invocation?

Warning flags should not change the AST -- compiler engineers don't expect that. So if Clang is going to perform inference, it should either always do it, or it should be guarded by an -f flag, not a -W flag.

Thanks, this make sense. My only concern would be that a -W flag without the "inference" for STL types would be useless. Is it ok to make the driver add a -f flag automatically if a warning is turned on or would you find that confusing as well?

Why not always attach attributes to standard library types? I only suggested the -f flag in case it is necessary to gate this functionality for some reason (e.g., performance).

On the other hand I still think it is useful to give the users the option to maintain a header with forward declarations to add the annotations to other (non-standard) 3rd party types. These headers might be fragile to 3rd party changes but could still be a better option than maintaining patches on top of 3rd party libraries. Having API notes upstream would be a superior solution here and I might look into upstreaming it at some point, but I think this is something that could be addressed later on. What do you think?

I think it is acceptable for 3rd party libraries, but there's already a solution for 3rd party libraries -- API notes in Swift's fork of Clang. It has been successfully used by Apple for 5 years for this exact purpose (adding attributes to existing libraries without changing headers), and only needs to be upstreamed to Clang.

Supporting the forward declarations way is only a few lines of code now, supporting API Notes is a much larger effort. I would also prefer the latter but I think having this work not blocked on upstreaming API notes is essential to get this upstreamed. Or would you prefer not supporting the 3rd party library use-case until API Notes are upstreamed?

I would prefer an API Notes-based approach. And I would suggest that you, or someone else working on these attributes, help with upstreaming.

The reason why I prefer API notes is because users forward declaring third party APIs will put those libraries in a difficult situation, where those libraries can't use normal evolution processes to change their APIs -- the forward declarations that users have will stop compiling, or will start introducing additional overloads which don't have definitions. Of course severity of the problem depends on the nature of such forward declarations -- would it only be for owner and pointer types, or for some other APIs? Just the types -- and rare types like pointers -- is not that bad, however forward declaring functions is bad.

With API notes users have all the extra stuff out of line, and it is easy to disable when things go sideways.

When forward declarations are sprinkled in users' code, they are impossible to change.

mgehre added a comment.Jul 1 2019, 1:19 PM

I understand that putting types on forward declared header (standard libraries / third party libraries) is not a approach we should favor.
Thank you for your good arguments.
API notes seems like a really good approach for this (I would like to help with the upstreaming), and in the meanwhile
I will work on hard-coding attributes for standard library types in clang for a follow-up PR. I think we could have them always attached (I will measure performance),
so that -Wlifetime will not change the AST.

mgehre updated this revision to Diff 207400.Jul 1 2019, 1:43 PM
  • Add newline at end of file
mgehre updated this revision to Diff 207613.Jul 2 2019, 1:08 PM
  • Make the list of attribute properties more consistent.
xazax.hun added inline comments.Jul 9 2019, 10:52 AM
clang/include/clang/Basic/AttrDocs.td
4177 ↗(On Diff #207235)

I think it is referenced implicitly after the comma by it. Would you prefer to get rid of the name P and only say a class?

gribozavr added inline comments.Jul 10 2019, 2:12 AM
clang/include/clang/Basic/AttrDocs.td
4164 ↗(On Diff #207613)

Slightly better: "In a class annotated with ..., each function that returns ... is assumed to ..."

However, the "is assumed" part throws me off. There's no assumption here -- this is what this attribute means, intentionally, by design, no guessing involved.

So how about this?

The attribute [[gsl::Owner(T)]] applies to structs and classes that own an object of type T:

[[gsl::Owner(int)]] class IntOwner {
private:
  int value;
public:
  int *getInt() { return &value; }
  long *getLong();
};

In a class annotated like this each member function that returns T& or T* returns a pointer/reference to the data owned by the class instance:

IntOwner int_owner;
int *x = int_owner.getInt();
// `x` points to memory owned by `int_owner`.

Therefore, that the lifetime of that data ends when ... For example:

int *x = IntOwner().getInt();
// `x` points to memory owned by the temporary `IntOwner()`, which has been destroyed now.

long *y = IntOwner().getLong();
// Can't make any conclusion about validity of `y`.
4177 ↗(On Diff #207613)

Ditto, "A class annotated with ... is assumed to be ..."

However, again, I feel like "assume" is the wrong word to use.

"The attribute [[gsl::Pointer(T)]] applies to structs and classes that behave like pointers to an object of type T:

class [[gsl::Pointer(int)]] IntPointer {
private:
  int *valuePointer;
public:
  // fill it in with appropriate APIs
};

Classes annotated like this behave like regular pointers: they can either point to objects of type T, or dangle (<<<explain what it means>>>).

For example: ... sample code and a static analysis message...

clang/lib/Sema/SemaDeclAttr.cpp
4546–4547 ↗(On Diff #207235)

What about array types? References?

gsl::Owner(int[])
gsl::Owner(int&)

aaron.ballman added inline comments.Jul 10 2019, 5:54 AM
clang/include/clang/Basic/Attr.td
2770 ↗(On Diff #207235)

I take this to mean that we're not implementing the same attribute as MSVC is, and that worries me.

Always providing the type argument should be compatible with future versions of the MSVC implementation.

The problem is that code written for MSVC (where the type argument is not required) will not compile with Clang (where the type argument is required), if I understand properly.

Generally speaking, when we implement an attribute from another vendor namespace, we expect the attribute to have the same semantics as whoever "owns" that vendor namespace. It's a bit trickier here because gsl isn't really a vendor so much as a collective, so I don't know who is the authority to turn to.

On a completely different note: would this attribute also make sense within C with a C2x spelling?

clang/lib/Parse/ParseDecl.cpp
335–341 ↗(On Diff #207613)

I don't like how this short-circuits the remainder of the logic in this function. I'd rather see this one-off logic handled in ParseClangAttributeArgs() if we have to treat it as a one-off. A better approach would be to implement type arguments within ParseAttributeArgsCommon().

clang/lib/Sema/SemaDecl.cpp
2618–2619 ↗(On Diff #207613)

I'd like to see some tests for this.

I'm a bit worried about whether this will do good things in practice or not. I'm concerned about situations like:

struct S;

void func(S *s) {
  // Interesting things happen here.
}

struct [[gsl::Pointer]] S {};

and whether the "interesting things" will be properly [un]diagnosed.

xazax.hun added inline comments.Jul 11 2019, 12:13 PM
clang/include/clang/Basic/Attr.td
2770 ↗(On Diff #207235)

I see, that is a valid concern. A followup patch makes the type argument optional so it will be fully compatible with MSVC. I don't think it makes sense with C or at least I am not sure what would I annotate using these attributes in C code as the subjects are classes. I know that it is possible to simulate classes in C but I am not sure how well the analysis would work in such cases.

Are you OK with making the argument optional in a followup patch or do you want us to cherry-pick that modification into this one?

aaron.ballman added inline comments.Jul 11 2019, 1:51 PM
clang/include/clang/Basic/Attr.td
2770 ↗(On Diff #207235)

I don't think it makes sense with C or at least I am not sure what would I annotate using these attributes in C code as the subjects are classes. I know that it is possible to simulate classes in C but I am not sure how well the analysis would work in such cases.

I couldn't think of a reason to expose this in C either, so I think the current spelling is good as-is.

Are you OK with making the argument optional in a followup patch or do you want us to cherry-pick that modification into this one?

I'd prefer the argument be optional in this patch.

mgehre updated this revision to Diff 209750.Jul 14 2019, 3:31 PM
mgehre marked 23 inline comments as done.
  • Don't crash when pretty-printing an Attribute with optional type argument
  • Don't crash AST dump when Owner/Pointer has no Type argument
  • gsl::Owner/gsl::Pointer: Make DerefType parameter optional
  • Unify type and non-type attribute parsing; don't add OwnerAttr/PointerAttr twice
  • Improve documentation
clang/include/clang/Basic/Attr.td
2770 ↗(On Diff #207235)

The argument is now optional.

clang/include/clang/Basic/AttrDocs.td
4164 ↗(On Diff #207613)

What I meant to express by "is assumed to" is that this attribute does not change what member functions do. E.g. if I had int* f() { static int i; return &i; }, then f would not return data owned by the class instance, even when annotating the class with [[gsl::Owner(T)]]. The point is that when member functions with return type T*/T& don't return data owned by the class instance, one should not add the [[gsl::Owner(T)]] attribute - or the analysis will produce wrong results. Here, we would warn if code uses the returned value of f after the class instances has been destroyed even though it would be safe.

There is a chicken and egg problem here. With this PR, we don't get any analysis yet, so providing examples of analysis results like "x points to memory owned by the temporary IntOwner(), which has been destroyed now." can be misleading. On the other hand, if we would just document the current meaning of the attribute, it would be nothing.
The implication "member function that returns T& or T* returns a pointer/reference to the data owned by the class instance" is just one part of being an Owner. Herb's paper
uses this in more ways (e.g. output arguments, handling of move semantics, etc.), which we would implement incrementally together with the respective documentation update.

So I propose to keep the documentation basic for this PR and update the documentation with the upcoming PRs that add to the analysis. Would that be acceptable?
I added a note that the attribute is currently experimental and subject to change.

clang/lib/Parse/ParseDecl.cpp
335–341 ↗(On Diff #207613)

I integrated parsing of type attributes now into the main logic of ParseAttributeArgsCommon()

clang/lib/Sema/SemaDecl.cpp
2618–2619 ↗(On Diff #207613)

I have removed this part of the PR. Implementing implicit attributes for std types went well without requiring to allow "late" additions of attributes.

gribozavr added inline comments.Jul 15 2019, 10:12 PM
clang/include/clang/Basic/AttrDocs.td
4164 ↗(On Diff #207613)

What I meant to express by "is assumed to" is that this attribute does not change what member functions do.

I don't think we should be making allowances for code that uses an attribute to promise something and then does some other thing. In that model, there's alignment between what code does and what attribute says.

And I mean of course the attribute does not change what the code does... that must be obvious but you can mention it explicitly if you want.

With this PR, we don't get any analysis yet

I don't think the analysis must be automatic in order to show what analysis can be done. Attribute documentation needs enough examples to give people a flavor of what the attribute does, and what possible benefits it can bring.

Documentation for the specific analysis that is actually done should go elsewhere into Clang's documentation (if necessary).

mgehre updated this revision to Diff 211008.Jul 21 2019, 1:49 PM
mgehre marked 6 inline comments as done.
  • Disallow reference and array types as DerefType
  • Add example to documentation.

I think that I have addressed all open comments.

clang/include/clang/Basic/AttrDocs.td
4164 ↗(On Diff #207613)

I added an example to the documentation to show how the initial analysis will look like to give a feeling for what the attribute does.

clang/lib/Sema/SemaDeclAttr.cpp
4546–4547 ↗(On Diff #207235)

Good catch! I disallow them now and added appropriate tests.

aaron.ballman added inline comments.Jul 23 2019, 8:14 AM
clang/include/clang/Basic/AttrDocs.td
4167 ↗(On Diff #211008)

Do either of these attributes make sense on a union type? If so, might be worth mentioning unions here and below. If not, would it be worth diagnosing on a union?

clang/include/clang/Basic/DiagnosticSemaKinds.td
2516 ↗(On Diff #211008)

Can you combine this one with err_attribute_argument_vec_type_hint?

clang/lib/Parse/ParseDecl.cpp
385 ↗(On Diff #211008)

Can you add a FIXME prefix to the comment?

clang/lib/Sema/SemaDeclAttr.cpp
4550 ↗(On Diff #211008)

This should say a reference type, but I prefer seeing this done in a %select{} within the diagnostic rather than manually adding string literals. This will also simplify the logic down to:

unsigned SelectIdx = ~0U;
if (ParamType->isVoidType())
  SelectIdx = 0;
else if (ParamType->isReferenceType())
  SelectIdx = 1;
else if (ParamType->isArrayType())
  SelectIdx = 2;

if (SelectIdx != ~0U) {
  S.Diag(...) << SelectIdx << AL;
  return;
}
mgehre updated this revision to Diff 211370.Jul 23 2019, 3:54 PM
mgehre marked 4 inline comments as done.
  • Diagnose unions; improve other diagnostics; fix all comments
mgehre marked 2 inline comments as done.Jul 23 2019, 3:54 PM
mgehre added inline comments.
clang/include/clang/Basic/AttrDocs.td
4167 ↗(On Diff #211008)

Good catch! I added diagnostics and a test.

clang/include/clang/Basic/DiagnosticSemaKinds.td
2516 ↗(On Diff #211008)

I'm not sure: vec_type_hint reads "invalid attribute argument %0 - expecting a vector or vectorizable scalar type" and here ""%0 is an invalid argument to attribute %1", i.e. one is positive ("expecting ...") and the other is negative ("%0 is an invalid argument").

I don't know how to describe "not void, not reference, not array type" in terms of "expecting ...", and I think that we should keep "expecting a vector or vectorizable scalar type" on the VecTypeHint attribute diagnostic.

aaron.ballman added inline comments.Jul 24 2019, 5:13 AM
clang/include/clang/Basic/Attr.td
2771 ↗(On Diff #211370)

This subject should be Struct and same below.

clang/include/clang/Basic/DiagnosticSemaKinds.td
2516 ↗(On Diff #211008)

I had figured it would be something like %select{'void'|a reference type|an array type|a non-vector or non-vectorizable scalar type}0 is an invalid argument to attribute %1 but I am not strongly opposed if you want to keep the diagnostics separate.

2933 ↗(On Diff #211370)

You can drop this change when the Subjects field is fixed above.

clang/include/clang/Sema/ParsedAttr.h
1037 ↗(On Diff #211370)

Same

clang/lib/Sema/SemaDeclAttr.cpp
4538–4542 ↗(On Diff #211370)

You can drop this bit when the Subjects field is fixed above.

clang/test/Misc/pragma-attribute-supported-attributes-list.test
119 ↗(On Diff #211370)

These will also wind up needing to be updated when switching the Subjects field.

clang/test/SemaCXX/attr-gsl-owner-pointer-parsing.cpp
2 ↗(On Diff #211370)

This file tests part of parsing and part of Sema somewhat interchangeably. I'm not strictly opposed to it being that way, but it would be a bit cleaner to have the parsing tests in the Parser directory and the rest in SemaCXX (that also reduces the confusion from the file name).

mgehre marked 9 inline comments as done.Jul 24 2019, 2:19 PM
mgehre added inline comments.
clang/test/SemaCXX/attr-gsl-owner-pointer-parsing.cpp
2 ↗(On Diff #211370)

I had split the tests into two files due to a misconception that the ast-dump would not work when errors are emitted. It actually does work, and so I recombined the test cases.

mgehre updated this revision to Diff 211600.Jul 24 2019, 2:19 PM
mgehre marked an inline comment as done.
  • Fix all comments
aaron.ballman accepted this revision.Jul 24 2019, 3:31 PM

LGTM aside from a minor formatting nit. Thank you for working on these attributes!

clang/include/clang/Basic/DiagnosticSemaKinds.td
2517 ↗(On Diff #211600)

80-col limit

This revision is now accepted and ready to land.Jul 24 2019, 3:31 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2019, 10:51 AM

Thank you very much for dedicating your time for the review. It improved the code a lot!

gribozavr added inline comments.Jul 29 2019, 7:00 AM
cfe/trunk/include/clang/Basic/AttrDocs.td
4252

"and is currently ignored"

even better:

"and is ignored"

It does not matter for the user that we might change it in future. We might change *anything* in future, and yet we don't hedge everywhere.

4278

ditto

cfe/trunk/test/SemaCXX/attr-gsl-owner-pointer.cpp
34

Technically correct, but not very helpful for the user.

It would be better if the message was tailored for the attribute, for example,

owner types can't own an object of type 'void'

I will include your latest comments into D64448