[Attr] Fix parameter indexing for attributes
ClosedPublic

Authored by jdenny on Feb 13 2018, 10:57 AM.

Details

Summary

The patch fixes a number of bugs related to parameter indexing in
attributes:

  • Parameter indices in some attributes (argument_with_type_tag, pointer_with_type_tag, nonnull, ownership_takes, ownership_holds, and ownership_returns) are specified in source as one-origin including any C++ implicit this parameter, were stored as zero-origin excluding any this parameter, and were erroneously printing (-ast-print) and confusingly dumping (-ast-dump) as the stored values.
  • For alloc_size, the C++ implicit this parameter was not subtracted correctly in Sema, leading to assert failures or to silent failures of __builtin_object_size to compute a value.
  • For argument_with_type_tag, pointer_with_type_tag, and ownership_returns, the C++ implicit this parameter was not added back to parameter indices in some diagnostics.

This patch fixes the above bugs and aims to prevent similar bugs in
the future by introducing careful mechanisms for handling parameter
indices in attributes. ParamIdx stores a parameter index and is
designed to hide the stored encoding while providing accessors that
require each use (such as printing) to make explicit the encoding that
is needed. Attribute declarations declare parameter index arguments
as [Variadic]ParamIdxArgument, which are exposed as ParamIdx[*]. This
patch rewrites all attribute arguments that are processed by
checkFunctionOrMethodParameterIndex in SemaDeclAttr.cpp to be declared
as [Variadic]ParamIdxArgument. The only exception is xray_log_args's
argument, which is encoded as a count not an index.

An alternative approach is to attempt to hide the issue of parameter
index encoding at each use by somehow implicitly exposing exactly the
encoding needed. That approach is essentially the approach that
already existed, and I made several attempts to improve it, but I have
concluded that it is both futile and harmful. The trouble is that the
needed encoding varies among attribute arguments and even among uses
of a single attribute argument. Making the exposed encoding implicit
then requires developers to think about not only what encoding is
needed at every use but also what encoding is implicitly being
exposed. Thus, making the encoding explicit at every use should
actually reduce the cognitive burden and the potential for mistakes.

Diff Detail

Repository
rC Clang
There are a very large number of changes, so older changes are hidden. Show Older Changes
aaron.ballman added inline comments.Feb 25 2018, 1:00 PM
include/clang/AST/Attr.h
263

I'd prefer more clear names, like "getSourceIndex()" and "getASTIndex()".

281

Perhaps getLLVMIndex() for this one.

290–291

std::iterator was deprecated in C++17, so you should manually expose these fields.

jdenny updated this revision to Diff 136019.Feb 26 2018, 6:16 PM
jdenny marked 8 inline comments as done.
jdenny edited the summary of this revision. (Show Details)

This revision should address all issues raised.

include/clang/AST/Attr.h
209–211

Good point. Thanks.

237–242

Oops. Thanks.

290–291

Now that sizeof(ParamIdx) == sizeof(unsigned), I can no longer find a reason to avoid ParamIdx arrays, and so I can no longer find a good justification for the iterator classes.

jdenny marked 3 inline comments as done.Feb 26 2018, 6:18 PM
jdenny marked 8 inline comments as done.Mar 1 2018, 7:10 AM
jdenny added a comment.Mar 1 2018, 7:41 AM

Hi Aaron. It occurs to me now that this patch has grown rather large and, in some places, a little subtle. Would it help the review if I were to break it up into a patch series that introduces ParamIdx to each attribute, one at a time? I'm not trying to rush you, but I hate for the review to be painful for you if it doesn't have to be.

aaron.ballman added inline comments.Mar 1 2018, 11:09 AM
include/clang/AST/Attr.h
206

The name here can be improved. How about checkInvariants()? Might as well make this inline while you're at it.

236

Is this constructor used anywhere? I didn't see it being used, but I could have missed something. If it's not used, go ahead and remove it.

267

Please assert that Idx won't wrap before doing the return.

276

Likewise here.

281–284

Are all of these operations required for the class?

include/clang/Basic/Attr.td
175–177

I still find the AllowThis parts to be hard to follow, so I want to be sure I understand your design properly. Everything that uses this new argument type sets AllowsThis to 0. As written, it sounds like setting that to 0 means that the parameter index cannot be used on a C++ instance method, and I know that's not the correct interpretation. Under what circumstances would this be set to 1 instead?

Looking at the existing code, the only circumstances under which checkFunctionOrMethodParameterIndex() was being passed true for "allow this" was for XRayLogArgs, which doesn't even use ParamIdx. Perhaps this member isn't necessary any longer?

lib/Sema/SemaChecking.cpp
2622–2624

const ParamIdx &

10084–10085

const ParamIdx &

12246

Hoist the AST index so you don't have to call for it twice. (Same applies elsewhere.)

lib/Sema/SemaDeclAttr.cpp
786–787

Good catch about this not being needed any longer!

1644–1645

This change looks to be unrelated to the patch?

4485–4488

Is this formatting produced by clang-format?

lib/StaticAnalyzer/Checkers/NonNullParamChecker.cpp
57–59

const ParamIdx &

Hi Aaron. It occurs to me now that this patch has grown rather large and, in some places, a little subtle. Would it help the review if I were to break it up into a patch series that introduces ParamIdx to each attribute, one at a time? I'm not trying to rush you, but I hate for the review to be painful for you if it doesn't have to be.

No need to do that -- this review just takes a bit more time for me to complete, but it's reasonably well-factored. Thank you, though!

Hi Aaron. It occurs to me now that this patch has grown rather large and, in some places, a little subtle. Would it help the review if I were to break it up into a patch series that introduces ParamIdx to each attribute, one at a time? I'm not trying to rush you, but I hate for the review to be painful for you if it doesn't have to be.

No need to do that -- this review just takes a bit more time for me to complete, but it's reasonably well-factored. Thank you, though!

Sure! Thanks for the review.

include/clang/AST/Attr.h
206

Sure, I can change the name.

It's inside the class, so specifying inline is against the LLVM coding standards, right?

236

It's used by the deserialization code generated by ClangAttrEmitter.cpp.

281–284

operator== and operator< are needed for sorting and finding. It seems strange to me not to finish the set.

include/clang/Basic/Attr.td
175–177

I still find the AllowThis parts to be hard to follow, so I want to be sure I understand your design properly. Everything that uses this new argument type sets AllowsThis to 0. As written, it sounds like setting that to 0 means that the parameter index cannot be used on a C++ instance method, and I know that's not the correct interpretation.

Right. AllowsThis=0 means it is an error for the attribute in the source code to specify the C++ implicit this parameter (index 1).

Under what circumstances would this be set to 1 instead?

Looking at the existing code, the only circumstances under which checkFunctionOrMethodParameterIndex() was being passed true for "allow this" was for XRayLogArgs, which doesn't even use ParamIdx. Perhaps this member isn't necessary any longer?

Right. I also noticed this issue, but I probably should have mentioned that in a comment in the source instead of just rewording the last paragraph of the patch summary. Sorry.

I thought about removing AllowsThis, but I hesitated because I had already implemented it by the time I noticed this issue and because I assumed there must be some reason why attributes for C++ have index 1 mean the this parameter, so there might be some attribute that could eventually take advantage of AllowsThis=1. Moreover, it makes the related argument to checkFunctionOrMethodParameterIndex in SemaDeclAttr.cpp clearer.

I don't feel strongly either way, so I'm happy to remove it or keep it.

lib/Sema/SemaDeclAttr.cpp
1644–1645

Sorry, I think clang-format suggested that at some point in history, and I overlooked it when reviewing the latest version.

4485–4488

Yes.

aaron.ballman added inline comments.Mar 1 2018, 12:51 PM
include/clang/AST/Attr.h
206

Derp, you're correct, it's already implicitly inline. Ignore that part of the suggestion.

236

That'd explain why I hadn't seen it.

281–284

I don't think it's actively harmful to do so, but I also don't think it's really needed either. Your call.

include/clang/Basic/Attr.td
175–177

Right. AllowsThis=0 means it is an error for the attribute in the source code to specify the C++ implicit this parameter (index 1).

Then if we keep this functionality, I think a better identifier would be something like CanIndexImplicitThis and the comments could be updated to more clearly state what is allowed/disallowed. Then the other uses of "allow this" can be updated to use similar terminology for clarity.

so there might be some attribute that could eventually take advantage of AllowsThis=1. Moreover, it makes the related argument to checkFunctionOrMethodParameterIndex in SemaDeclAttr.cpp clearer.

My gut feeling is that this functionality isn't going to be needed all that frequently. If we get a second case where we need it, then I'd say it might make more sense to add it.

Attributes that use positional arguments should hopefully be the exception, not the rule, because those indexes are horribly fragile.

What do you think?

lib/Sema/SemaDeclAttr.cpp
4485–4488

How funky. :-)

jdenny marked 3 inline comments as done.Mar 1 2018, 1:41 PM
jdenny added inline comments.
include/clang/Basic/Attr.td
175–177

My gut feeling is that this functionality isn't going to be needed all that frequently. If we get a second case where we need it, then I'd say it might make more sense to add it.

Attributes that use positional arguments should hopefully be the exception, not the rule, because those indexes are horribly fragile.

What do you think?

I'm guessing you have more experience with attributes in general, so let's go with your gut. :-)

lib/Sema/SemaDeclAttr.cpp
4485–4488

Agreed.

jdenny updated this revision to Diff 136624.Mar 1 2018, 4:03 PM
jdenny marked 23 inline comments as done.
jdenny edited the summary of this revision. (Show Details)

This update should address all outstanding comments.

include/clang/AST/Attr.h
206

The name here can be improved. How about checkInvariants()?

I went with assertComparable because, in my view, this is not so much a class invariant as it is an assertion about correct usage of comparison operators. But I'm not married to it.

jdenny marked an inline comment as done.Mar 1 2018, 4:10 PM
aaron.ballman accepted this revision.Mar 2 2018, 5:00 AM

Aside from two minor nits, this LGTM. Thank you for working on it!

test/Sema/attr-ownership.cpp
2

Please pass -fsyntax-only as well.

utils/TableGen/ClangAttrEmitter.cpp
793

Given how oddly this is wrapped, might as well make this a stream argument rather than a string literal concatenation.

This revision is now accepted and ready to land.Mar 2 2018, 5:00 AM
jdenny added inline comments.Mar 2 2018, 8:34 AM
test/Sema/attr-ownership.cpp
2

Sure. Would you like me to change test/Sema/attr-ownership.c while we're thinking about it?

aaron.ballman added inline comments.Mar 2 2018, 8:37 AM
test/Sema/attr-ownership.cpp
2

Because it's unrelated to this patch, that can be done in a separate commit (no review required, you can just make the change and commit it).

jdenny added a comment.Mar 2 2018, 9:17 AM

Aaron, thanks for the review. I've applied your suggestions and am ready to commit.

I've noticed a variety of styles in commit logs, I've read the coding standards, and it seems there's a lot of freedom here. Below are the two commit logs I'm planning to use. Please let me know if you have any objections. Thanks.

[Attr] Fix parameter indexing for several attributes

The patch fixes a number of bugs related to parameter indexing in
attributes:

* Parameter indices in some attributes (argument_with_type_tag,
  pointer_with_type_tag, nonnull, ownership_takes, ownership_holds,
  and ownership_returns) are specified in source as one-origin
  including any C++ implicit this parameter, were stored as
  zero-origin excluding any this parameter, and were erroneously
  printing (-ast-print) and confusingly dumping (-ast-dump) as the
  stored values.

* For alloc_size, the C++ implicit this parameter was not subtracted
  correctly in Sema, leading to assert failures or to silent failures
  of __builtin_object_size to compute a value.

* For argument_with_type_tag, pointer_with_type_tag, and
  ownership_returns, the C++ implicit this parameter was not added
  back to parameter indices in some diagnostics.

This patch fixes the above bugs and aims to prevent similar bugs in
the future by introducing careful mechanisms for handling parameter
indices in attributes.  ParamIdx stores a parameter index and is
designed to hide the stored encoding while providing accessors that
require each use (such as printing) to make explicit the encoding that
is needed.  Attribute declarations declare parameter index arguments
as [Variadic]ParamIdxArgument, which are exposed as ParamIdx[*].  This
patch rewrites all attribute arguments that are processed by
checkFunctionOrMethodParameterIndex in SemaDeclAttr.cpp to be declared
as [Variadic]ParamIdxArgument.  The only exception is xray_log_args's
argument, which is encoded as a count not an index.

Differential Revision: https://reviews.llvm.org/D43248

For the test/Sema/attr-ownership.c change:

[Attr] Use -fsyntax-only in test

Suggested at: https://reviews.llvm.org/D43248

Aaron, thanks for the review. I've applied your suggestions and am ready to commit.

You're correct that we have a lot of freedom with the commit message, but the important piece is in describing what's changed and why (if it's not immediately obvious). Both commit messages look good to me.

This revision was automatically updated to reflect the committed changes.
jdenny marked 4 inline comments as done.Mar 2 2018, 11:08 AM
jdenny updated this revision to Diff 137964.Mar 11 2018, 5:42 PM

This commit was reverted by r326862 due to:

https://bugs.llvm.org/show_bug.cgi?id=36620

This revision includes a new test case and a fix.

While the difference from the last revision is small, it's not trivial, so another review is probably worthwhile. The main difference is two new ParamIdx member functions: serialize and deserialize. In ClangAttrEmitter.cpp, these functions enable significant simplifications and facilitate the bug fix.

jdenny reopened this revision.Mar 11 2018, 5:42 PM
This revision is now accepted and ready to land.Mar 11 2018, 5:42 PM
jdenny requested review of this revision.Mar 11 2018, 5:43 PM

Can you regenerate the patch using the same paths as https://reviews.llvm.org/D43248?id=136811? When I try to do a diff between what was accepted & committed and the current patch, Phabricator gets confused because the paths are too different from one another.

jdenny updated this revision to Diff 138111.Mar 12 2018, 5:15 PM

OK, this diff has the svn paths, and I've rebased to a more recent master.

jdenny updated this revision to Diff 138113.Mar 12 2018, 5:20 PM

Well, that didn't work. Here's another attempt at getting the paths right.

It seems like there are some other changes than just the serialize and deserialize that I'm not opposed to, but am wondering why they're needed. It seems some functions are now getFoo() calls and it seems like some declarations moved around. Are those intended as part of this patch?

cfe/trunk/test/Frontend/ast-attr.cpp
1–2 ↗(On Diff #138113)

Can you move this below the RUN line?

cfe/trunk/test/Sema/attr-print.cpp
1 ↗(On Diff #138113)

Can you move this comment below the RUN line?

It seems like there are some other changes than just the serialize and deserialize that I'm not opposed to, but am wondering why they're needed. It seems some functions are now getFoo() calls

These were originally named getFoo, and my previous patch changed them to foo. I believe I did that to make ParamIdxArgument accessors more like VariadicParamIdxArgument accessors (which inherits accessors from VariadicArgument), but I probably shouldn't have done that. In any case, this new revision implements ParamIdxArgument using SimpleArgument, and that names accessors like getFoo.

and it seems like some declarations moved around. Are those intended as part of this patch?

Are you referring to the changes in SemaDeclAttr.cpp? Those changes are needed because the ParamIdx constructor now asserts that Idx is one-origin, but that requires validating that it's actually one-origin beforehand. Sorry, I should've mentioned the new asserts.

jdenny added inline comments.Mar 12 2018, 6:09 PM
cfe/trunk/test/Frontend/ast-attr.cpp
1–2 ↗(On Diff #138113)

Sure. I'm still trying to learn the LLVM coding standards. Is this specified there?

It seems like there are some other changes than just the serialize and deserialize that I'm not opposed to, but am wondering why they're needed. It seems some functions are now getFoo() calls

These were originally named getFoo, and my previous patch changed them to foo. I believe I did that to make ParamIdxArgument accessors more like VariadicParamIdxArgument accessors (which inherits accessors from VariadicArgument), but I probably shouldn't have done that. In any case, this new revision implements ParamIdxArgument using SimpleArgument, and that names accessors like getFoo.

Ahhh, thank you for the explanation, that makes sense.

and it seems like some declarations moved around. Are those intended as part of this patch?

Are you referring to the changes in SemaDeclAttr.cpp? Those changes are needed because the ParamIdx constructor now asserts that Idx is one-origin, but that requires validating that it's actually one-origin beforehand. Sorry, I should've mentioned the new asserts.

Ah, okay, thank you!

cfe/trunk/test/Frontend/ast-attr.cpp
1–2 ↗(On Diff #138113)

Nope! I'm just used to looking at the very first line of the test to know what it's running, and that seems consistent with other tests.

jdenny added inline comments.Mar 12 2018, 6:39 PM
cfe/trunk/test/Frontend/ast-attr.cpp
1–2 ↗(On Diff #138113)

OK, that makes sense. I'll be sure to change both of those.

aaron.ballman added inline comments.Mar 12 2018, 6:46 PM
cfe/trunk/test/Frontend/ast-attr.cpp
5 ↗(On Diff #138113)

Just to verify my understanding, this test is checking the serialization and deserialization?

jdenny added inline comments.Mar 12 2018, 6:59 PM
cfe/trunk/test/Frontend/ast-attr.cpp
5 ↗(On Diff #138113)

That's right, and it failed without the other changes in this revision because alloc_size's second argument serialized/deserialized as 0 when it was actually invalid (that is, unspecified). Of course, this test is more thorough than just exercising alloc_size.

aaron.ballman accepted this revision.Mar 12 2018, 7:02 PM

LGTM with the test comments fixed up.

cfe/trunk/test/Frontend/ast-attr.cpp
5 ↗(On Diff #138113)

Perfect, thank you!

This revision is now accepted and ready to land.Mar 12 2018, 7:02 PM

LGTM with the test comments fixed up.

Thanks! I'll commit tomorrow.

This revision was automatically updated to reflect the committed changes.
jdenny marked 8 inline comments as done.
echristo added inline comments.
test/Sema/attr-print.cpp
3

Relatedly I don't think we use files as input files to other directories and I don't think we should really. What are you trying to test here? This breaks the hermeticness of each particular test directory.

jdenny added inline comments.Mar 13 2018, 1:25 PM
test/Sema/attr-print.cpp
3

Grep for "%S/.." and you'll see that a few other tests do something like this as well.

In test/Sema/attr-print.cpp, I am testing printing of attributes. I chose to put that there because of the existing attr-print.c there.

In test/Frontend/ast-attr.cpp, I am testing serialization of attributes. I chose to put that there because I see other -emit-ast tests there and because, if I put it in test/Sema instead, I get the complaint "Do not use the driver in Sema tests".

The same C++ code makes sense in both of these, but replicating it in two files would worsen maintainability.

I could try to combine into one file in, say, tests/Misc if that works.

I have no strong opinions on where these live. Just for my own education, is something real breaking right now because of their current locations?

Thanks.

echristo added inline comments.Mar 13 2018, 1:56 PM
test/Sema/attr-print.cpp
3

Basically it's breaking an external build system (bazel) that has fairly distinct and specific dependency requirements and so layering and other dependencies are pretty entertaining.

Right now we avoid testing those particular tests and have TODOs of fixing them and the requirements to use a single directory and I was trying to avoid one more here.

All of that said I totally understand the desire to keep the maintenance burden minimized and an external build system shouldn't affect whether or not we do one particular thing or another - I was trying to get it written so that we could enable it without much undue burden. I'm uncertain if a comment of:

// If you change this file you should also change blah file.

or moving it to another directory where you can do both tests at the same time.

I'd prefer to move it than to expect people to obey such a comment. Let's see what Aaron says.

I'd prefer to move it than to expect people to obey such a comment. Let's see what Aaron says.

I have a slight preference for moving the tests now that I know they're causing problems, unless that turns out to be overly onerous for some reason.

Thank you, @echristo for pointing out the issues with the tests, I hadn't considered that.

I'd prefer to move it than to expect people to obey such a comment. Let's see what Aaron says.

I have a slight preference for moving the tests now that I know they're causing problems, unless that turns out to be overly onerous for some reason.

Thank you, @echristo for pointing out the issues with the tests, I hadn't considered that.

No worries :)

That said, please don't revert and reapply to move. Just a followup commit is just fine - and whenever you get a chance. (Though if you let me know when you do I'd appreciate it :)

I'd prefer to move it than to expect people to obey such a comment. Let's see what Aaron says.

I have a slight preference for moving the tests now that I know they're causing problems, unless that turns out to be overly onerous for some reason.

Thank you, @echristo for pointing out the issues with the tests, I hadn't considered that.

No worries :)

That said, please don't revert and reapply to move. Just a followup commit is just fine - and whenever you get a chance. (Though if you let me know when you do I'd appreciate it :)

Sure.

So, I'm planning to remove test/Frontend/ast-attr.cpp, rename test/Sema/attr-print.cpp to test/Misc/attr-print-emit.cpp, and change its run lines to:

// RUN: %clang_cc1 %s -ast-print | FileCheck %s
// RUN: %clang -emit-ast -o %t.ast %s
// RUN: %clang_cc1 %t.ast -ast-print | FileCheck %s

That seems to work fine locally. I'll just leave the old test/Sema/attr-print.c alone as it's not part of this problem. Any objections?

So, I'm planning to remove test/Frontend/ast-attr.cpp, rename test/Sema/attr-print.cpp to test/Misc/attr-print-emit.cpp, and change its run lines to:

// RUN: %clang_cc1 %s -ast-print | FileCheck %s
// RUN: %clang -emit-ast -o %t.ast %s
// RUN: %clang_cc1 %t.ast -ast-print | FileCheck %s

That seems to work fine locally. I'll just leave the old test/Sema/attr-print.c alone as it's not part of this problem. Any objections?

I went ahead and committed. It's in r327456.

jdenny marked 3 inline comments as done.Mar 13 2018, 3:26 PM