Page MenuHomePhabricator

[LifetimeAnalysis] Add support for lifetime annotations on functions
Needs ReviewPublic

Authored by xazax.hun on Jan 15 2020, 2:12 PM.

Details

Summary

This patch corresponds to this RFC: http://lists.llvm.org/pipermail/cfe-dev/2019-December/064067.html

This does not cover everything yet, but I wanted to keep the patch small to be incremental.

Diff Detail

Event Timeline

xazax.hun created this revision.Jan 15 2020, 2:12 PM

I'd also appreciate if you updated the docs for the changes done in this patch.

clang/include/clang/AST/LifetimeAttr.h
23

"/// An abstract memory location..."

It would also really help to show some examples of the sorts of locations it can express (especially the nested field accesses).

25

I feel like the name is too broad, maybe LifetimeContractVariable?

32

I'd suggest to make the above two constructors into named factories as well. It took me a while to realize that converting a RecordDecl* to ContractVariable actually models a "this" pointer.

72

I'd suggest to add all such is* methods in this commit, and group them next to each other in the source code.

144

These names are not descriptive...

147

I'm sorry, but this comment is missing the most important piece of information -- what this function does.

This comment says "matching the AST that describes the contracts", which does give me a feeling of the compiler component, but I have no idea about how exactly it uses the expression that is passed in, and what happens with the map, and why a SourceRange is returned.

clang/include/clang/Basic/Attr.td
2929

I'd suggest to call it something other than "Expr" (to avoid confusion with or shadowing the type name) -- "ContractExpr" maybe?

clang/include/clang/Basic/DiagnosticSemaKinds.td
3470

I think a warning that is a debugging aid is new territory.

clang/lib/AST/LifetimeAttr.cpp
17

I don't think this comment adds much.

19

I'm not sure what you mean by "return values".

ignoreWrapperASTNodes?

56

The lifetime paper says it is called "global" now:

Renamed static to global to reduce confusion with the keyword.

Also, the paper spells these special names in all lowercase letters.

clang/lib/Sema/SemaDeclAttr.cpp
4538

I'm assuming higher quality error messages are going to come later? Right now it is a catch-all message that is shown for unsupported expressions, expressions that don't type check, and expressions that will never be supported.

clang/test/Sema/attr-psets-annotation.cpp
24 ↗(On Diff #238367)

If this code imitates an existing library, please ignore.

Declaring PointerTraits::deref to be a function and then invoking it and using decltype on the return value seems unnecessarily complex.

You could declare a type alias within PointerTraits instead.

template<typename T>
struct PointerTraits {
  using DerefType = decltype(*declvar<T>());
};

template<typename T>
struct PointerTraits<T*> {
  using DerefType = T;
}; // Is this specialization needed? Wouldn't the primary template work?

template<typename T>
struct PointerTraits<T&> {
  using DerefType = T;
};

Also, a specialization for rvalue references?

However, I'm not sure if the specialization even for regular references will be used -- the free function deref() strips the reference from T (also strips constness from it!) Due to reference collapse rules, T will never be a reference.

95 ↗(On Diff #238367)

I'm not sure what causes this unqualified call to deref to call gsl::deref (in a different namespace). Does not seem like ADL would do it, since the type of out is also not in gsl.

xazax.hun marked 10 inline comments as done.
  • Address most of the review comments.
xazax.hun marked 6 inline comments as done.Jan 22 2020, 10:38 AM
xazax.hun added inline comments.
clang/include/clang/AST/LifetimeAttr.h
144

I changed them slightly and added some comments. Let me know if they are better or if you have some alternatives.

147

You are completely right. This comment was mainly an implementation related one. Now I have a more user facing comment in the header and moved the implementation related part to the cpp file.

clang/include/clang/Basic/DiagnosticSemaKinds.td
3470

Do you think a cc1 flag would be sufficient or do you have other ideas/preferences?

clang/lib/AST/LifetimeAttr.cpp
56

In this case the implementation was out of date, sorry for that. Now all the names should reflect the paper except for the Return. The function name to express return values did not work out well due to how name lookup works (and overloads). Herb already updated his working draft to reflect this change. I will ask him if he is willing to publish the working draft somewhere.

Also, he argued that Return is capitalized because it has a slightly different meaning compared to null, global, and invalid. In the lifetime annotations, Return will be on the 'left hand side', while the others will be on the 'right hand side'.

clang/lib/Sema/SemaDeclAttr.cpp
4538

Yes, I would love to improve the error message in the future. Note that, the current message is not that bad as it will highlight the unsupported expression. Expressions that does not type check will produce a warning earlier while parsing the argument of the attribute. Those warnings would be identical to warnings from expressions within the functions.

As for invalid lifetime annotations that are valid C++ expressions, ideally we want to make this impossible. (This patch does not push this direction yet.) Basically, we would like to make a DSL where every expression that type-checks using C++ type checking rules is a valid lifetime annotation.

clang/test/Sema/attr-psets-annotation.cpp
24 ↗(On Diff #238367)

I agree with you.

If this code imitates an existing library, please ignore.

Yes and no. This imitates an experiment (not a well established library). In that experiment I was looking into generating some runtime checks from static lifetime annotations. It turns out most of the time it is not possible to generate appropriate runtime checks from these annotations except for some corner cases like nullability. So the reason why I had this as a function because it carried out some null checks.

Since null pointers are easy to catch dynamically using other methods I do not have strong feelings about the current API. Do you think it is worth to keep it as a function or do you prefer the simpler way?

However, I'm not sure if the specialization even for regular references will be used...

That is correct, this is a leftover from the implementation which was using a trick to trigger these specializations like:

#define deref(arg) deref<decltype(arg)>(arg)
95 ↗(On Diff #238367)

I have a using namespace above to keep tests more concise.

gribozavr2 marked an inline comment as done.Jan 24 2020, 4:23 AM
gribozavr2 added inline comments.
clang/include/clang/AST/LifetimeAttr.h
23

Drop the "this represents", it does not add anything (you can say so about *every* class, "this represents an expression", "this represents a person", "this represents an postal address").

"An abstract memory location that participates in defining a lifetime contract. A lifetime contract constrains lifetime of a LifetimeContractVariable to be at least as big as the lifetime of other LifetimeContractVariables.

The memory locations that we can describe are: return values of a function, this pointer, any function parameter, a "drilldown" expression based on function parameters, null etc."

30

Also a factory, please.

162

WDYT about s/LifetimeSet/ObjectLifetimeSet/ ?

Comment:

"A lifetime of a pointee of a specific pointer-like C++ object. This lifetime is represented as a disjunction of different lifetime possibilities (set elements). Each lifetime possibility is specified by naming another object that the pointee can point at."

Also, three slashes for doc comments.

163

"Lifetime constraints for multiple objects. The key of the map is the pointer-like object, the value is the lifetime of the pointee.

Can be used to describe all lifetime constraints required by a given function, or all lifetimes inferred at a specific program point etc."

164

Generally, DenseMap and DenseSet are faster. Any reason to not use them instead?

167

Cut the fluff.

"Converts an AST of a lifetime contract (that is, the `gtl::lifetime(...) call expression) to a LifetimeContracts object that is used throughout the lifetime analysis.

If the AST does not describe a valid contract, the source range of the erroneous part is returned.

When the function succeeds, the returned SourceRange is invalid."

Having written this out, I'm contemplating whether the return value should be an optional of a source range.

clang/include/clang/Basic/DiagnosticSemaKinds.td
3470

Yes, I think that would be quite standard (similar to dumping the AST, dumping the CFG etc.)

clang/lib/AST/LifetimeAttr.cpp
56

I don't think the left-hand-side/right-hand-side difference helps me as a reader much. However, we would have a huge implementation problem if return was not capitalized (because it would be a keyword, and our expression parser really would not like that). So... whatever works, as long as we are consistent with the spec.

clang/lib/Sema/SemaDeclAttr.cpp
4538

Expressions that does not type check will produce a warning earlier while parsing the argument of the attribute.

I'd love some basic tests to show that.

clang/test/Sema/attr-psets-annotation.cpp
24 ↗(On Diff #238367)

Yes, I'd prefer a simpler implementation.

95 ↗(On Diff #238367)

I'd strongly suggest to un-use that namespace to make tests more realistic. Or even add basic tests for both cases, and keep the bulk of the tests close to what you expect users to write. There will be some differences in the AST between the two cases, I believe.

35 ↗(On Diff #239644)

Using a C variadic here gives me pause. I can't definitely say it would be a problem, however, C variadics are weird, can only pass certain types correctly, require passing them by value etc. So there might be some issues down the line, for example, with static analysis that warns about passing non-C-compatible types through a variadic, or with noncopyable types failing to typecheck in the call to PSet's constructor.

I think it would be better to do a variadic template that takes everything by const reference.

xazax.hun marked 14 inline comments as done.Jan 24 2020, 5:02 PM
xazax.hun added inline comments.
clang/include/clang/AST/LifetimeAttr.h
164

No specific reason other than I am always insecure about the choice. Dense data structures are great for small objects and I do not know where the break even point is and never really did any benchmarks. Is there some guidelines somewhere for what object size should we prefer one over the other?

clang/include/clang/Basic/DiagnosticSemaKinds.td
3470

I started to look into a cc1 flag, but it looks like it needs a bit more plumbing I expected as some does not have access to the CompilerInvocation object or the FrontendOptions. While it is not too hard to pass an additional boolean to Sema I also started to think about the user/developer experience (ok, regular users are not expected to do debugging). The advantage of warnings are that we get a source location for free, so it is really easy to see where the message is originated from. And while it is true that I cannot think of any examples of using warnings for debugging the Clang Static Analyzer is full of checks that are only dumping debug data as warnings.

xazax.hun updated this revision to Diff 240335.Jan 24 2020, 5:06 PM
  • Address some review comments.

Thank you for the work on this!

clang/include/clang/AST/LifetimeAttr.h
25

constrains lifetime -> constrains the lifetime

26

big -> long

30

What is a drilldown expression? That's not a term I'm familiar with.

71

I think it would be easier to read if the pattern was: if (Foo < Bar) return true; as opposed to checking inequality before returning the comparison.

80

This will have non deterministic behavior between program executions -- are you sure that's what we want? Similar below for fields.

154–155

These constructors should probably be explicit?

164

I usually figure that if it's less than two pointers in size, it's sufficiently small, but maybe others have a different opinion. This class is probably a bit large for using dense containers.

I do worry a little bit about the hoops we're jumping through to make the class orderable -- is there a reason to avoid an unordered container instead?

clang/include/clang/Basic/Attr.td
2924

I think the attribute should probably only be supported in C++ for now, and should have let LangOpts = [CPlusPlus]; in the declaration. WDYT?

2963

I agree with the FIXME comment. ;-)

clang/include/clang/Basic/DiagnosticSemaKinds.td
3470

I also prefer a cc1 flag -- using warnings to control debugging behavior is novel and doesn't seem like something we should encourage. I don't think you would need to pass any additional flags to Sema however; I would expect this to show up in a language option so it can be queried through LangOpts. This makes it available to other things, like clang-tidy checks as well.

clang/lib/AST/LifetimeAttr.cpp
24

This need some explanation as it sort of comes out of thin air. What is special about a class named PSet? Does it need to be in the global namespace, or a class named PSet in any namespace is special?

124

Same question about identifier namespaces here as above.

clang/lib/Sema/SemaDeclAttr.cpp
4518

I suspect you want this to handle redeclarations where you may be working with an inherited attribute? If so, you probably should be following the merge pattern used by other attributes.

clang/lib/Sema/SemaType.cpp
7731

This is not an okay thing to do for C++ attributes. They have a specific meaning as to what they appertain to and do not move around to better subjects like GNU-style attributes.

clang/test/Sema/attr-psets-annotation.cpp
1 ↗(On Diff #240335)

I think this test should be in SemaCXX instead of Sema.

44 ↗(On Diff #240335)

Please spell out the full diagnostic the first time it is used in the test file.

130 ↗(On Diff #240335)

This is missing all of the tests which verify the sanity of the attribute (appertains to the correct subject, accepts the correct number/type of arguments, etc).

xazax.hun marked 14 inline comments as done.Mar 20 2020, 1:28 AM
xazax.hun added inline comments.
clang/include/clang/AST/LifetimeAttr.h
71

I think this might not be equivalent.

The code below will always early return if the Tag is not the same.

if (Tag != O.Tag)
  return Tag < O.Tag;

The code below will only early return if the condition is true.

if (Tag < O.Tag)
  return true;

So I think the pattern above is an optimization.

80

Good point. As far as the analysis is concerned the end result should always be the same. I see potential problems in the tests where the contents of some ordered data structures using this ordering is dumped. I do not see any natural (and preformant) way to order RecordDecls. (I can order FieldDecls using their indices.) Are you ok with sorting the string representation before outputting them as strings? This should solve the potential non-deterministic behavior of tests.

164

I am not insisting on using ordered containers but note that since I have no idea how to get a deterministic and efficient hash value of a RecordDecl I would likely just include the address of the canonical decl. So the order of the elements in the container would be non-deterministic both ways. But the algorithm (other than the debug dumps which are fixable) will not depend on the order of the elements.

clang/include/clang/Basic/Attr.td
2924

Makes sense, thanks!

clang/include/clang/Basic/DiagnosticSemaKinds.td
3470

Yeah, plumbing through LangOpts will definitely work. The main reason I was reluctant to do so because I do not see anything in the language options that serves the purpose of debugging and I was wondering it this would really belong there.

clang/lib/AST/LifetimeAttr.cpp
24

I agree, I will make this more strict. Basically, there is a need for a small support library for these contracts to work. That support library is not defined anywhere yet, but it will be expected to live in the gslnamespace.

clang/lib/Sema/SemaDeclAttr.cpp
4518

It is not completely the same. I did not give much testing for redeclarations yet. The main purpuse of this code is to handle the following use case:

void multiple_annotations(int *a, int *b, int *c)
    [[gsl::pre(gsl::lifetime(b, {a}))]]
    [[gsl::pre(gsl::lifetime(c, {a}))]];

So the same declaration might have multiple instances of the annotation (similar to contracts) and we need to merge them.

clang/lib/Sema/SemaType.cpp
7731

Yeah, I know that. But this is problematic in the standard. See the contracts proposal which also kind of violates this rule by adding an exception. Basically, this is the poor man's version of emulating the contracts proposal. In the long term we plan to piggyback on contracts rather than hacking the C++ attributes.

xazax.hun updated this revision to Diff 251569.Mar 20 2020, 1:30 AM
xazax.hun marked an inline comment as done.
  • Rebase.
  • Address some review comments.
aaron.ballman added inline comments.Mar 20 2020, 12:04 PM
clang/include/clang/AST/LifetimeAttr.h
71

Good point about ordering requirements, but this is still extremely hard to read. Can you use std::tie() to simplify the logic somewhat?

80

Would it make sense to order the records by their source location when the pointer values are not equal? Or ordering on the fully-qualified name of the record?

164

I guess I was more speaking to the point that, if this is going to be unordered anyway, why not just use unordered containers and not attempt to impose an ordering in the first place? If the nondeterminism is not user-observable, then I'm not certain it matters.

clang/lib/Sema/SemaType.cpp
7731

The contracts proposal was not adopted and I am opposed to adding this change for any standard attribute. We've done a lot of work to ensure that those attributes attach to the correct thing based on lexical placement and I don't want to start to undo that effort.

xazax.hun marked an inline comment as done.Mar 22 2020, 10:23 AM
xazax.hun added inline comments.
clang/lib/Sema/SemaType.cpp
7731

I am not sure how to proceed in this case. As far as I understand this wasn't the main reason why didn't we adopt the contracts proposal.

While I do understand why is it good to make the lexical placement matter, but the current placement hinders non-trivial use-cases. Currently, we either have a type attribute or we cannot mention any of the formal parameters. This is why the contracts cannot work without changing attributes in the standard, and the same reason why this patch will never work without doing this. Or do you have an alternative in mind?

Do you think introducing a frontend flag for this attribute would help? This would make it clear that we are currently using a non-standard feature that needs to be enabled explicitly. Again, this is a temporary solution, that will go away as soon as we have contracts. Since contracts need to solve the same problem, whatever the solution will be, this patch can piggyback on that.

What do you think?

aaron.ballman added inline comments.Mar 23 2020, 1:18 PM
clang/lib/Sema/SemaType.cpp
7731

I am not sure how to proceed in this case. As far as I understand this wasn't the main reason why didn't we adopt the contracts proposal.

There were a lot of flaws, and this was one of them. I don't think you could point to any one flaw and say "this is why we didn't get contracts" though.

Or do you have an alternative in mind?

I agree that this is an existing deficiency in both the C and C++ standards with the attribute syntax for functions. There simply is no way to make a function declaration attribute that can refer to the parameters currently. The only alternative that comes to mind is to use only a GNU-style spelling rather than trying to use a [[]] spelling.

Do you think introducing a frontend flag for this attribute would help? This would make it clear that we are currently using a non-standard feature that needs to be enabled explicitly. Again, this is a temporary solution, that will go away as soon as we have contracts. Since contracts need to solve the same problem, whatever the solution will be, this patch can piggyback on that.

There's no guarantee that contracts will come back with the same syntax used previously, so I wouldn't want to rely on that to solve the issue until we have a better idea of what contracts will look like coming out of the new study group. My preference is to use the GNU-style spelling and not provide a [[]] spelling. We've done that for other attributes (like enable_if and diagnose_if.