Page MenuHomePhabricator

[Sema/Attribute] Check for noderef attribute
ClosedPublic

Authored by leonardchan on Jul 18 2018, 2:48 PM.

Details

Summary

This patch adds the noderef attribute in clang and checks for dereferences of types that have this attribute. This attribute is currently used by sparse and would like to be ported to clang.

The following are examples of when a warning would be raised on dereferencing a noderef type.

int __attribute__((noderef)) *p;
int x = *p;  // warning: dereference of noderef expression [-Wignored-attributes]

int __attribute__((noderef)) **p2;
x = **p2;  // warning: dereference of noderef expression [-Wignored-attributes]

int * __attribute__((noderef)) *p3;
p = *p3;  // warning: dereference of noderef expression [-Wignored-attributes]

struct S {
  int a;
};
struct S __attribute__((noderef)) *s;
x = s->a;    // warning: dereference of noderef expression [-Wignored-attributes]
x = (*s).a;   // warning: dereference of noderef expression [-Wignored-attributes]

Not all accesses may raise a warning if the value directed by the pointer may not be accessed. The following are examples where no warning may be raised:

int *q;
int __attribute__((noderef)) *p;
q = &*p;
q = *&p;

struct S {
  int a;
};
struct S __attribute__((noderef)) *s;
p = &s->a;
p = &(*s).a;

More examples of existing usage of noderef in sparse can be found in https://git.kernel.org/pub/scm/devel/sparse/sparse.git/tree/validation/noderef.c

Diff Detail

Repository
rC Clang

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Your current counter-based approach doesn't work very well in the case where we switch to another context while processing an expression (for example, during template instantiation): you'll defer all the diagnostics for the inner construct until the outer construct is complete. Generally global Sema state doesn't work very well for this reason.

Alternatively, I could move the set, counter, and logic using them into the ExpressionEvaluationContextRecord. Instead of tracking relative to a global Sema state, it would be relative to the latest record on the stack.

  • Moved the counter and set into ExpressionEvaluationContextRecord
aaron.ballman added inline comments.Aug 2 2018, 1:14 PM
include/clang/Basic/AttrDocs.td
3639

I would drop the "C style" and just say it's only supported for pointers and arrays.

3640

I'd clarify this a little bit to Objective-C object pointers.

include/clang/Basic/DiagnosticSemaKinds.td
9553

Please use single quotes instead of backticks in the quoting around noderef in these diagnostics.

leonardchan marked 3 inline comments as done.
  • Changed tick to single quote in diagnostic

@rsmith any more feedback on this current version? If it still looks incorrect to use the record this way, I don't mind simplifying it to work on lvalue to rvalue conversions without checking for a leading address space operation.

aaron.ballman added inline comments.Aug 10 2018, 12:07 PM
lib/Sema/SemaExpr.cpp
14440–14442

I don't think this strips off sugar from the type, so I suspect it won't properly handle a typedef to a pointer type, for instance, or a type including parens. You should add tests for these cases.

Oops, I hit "Send" too soon. I was going to say that you should also keep an eye on D50526 because that may impact this patch (or vice versa if this one lands first).

leonardchan marked an inline comment as done.
  • Checks for sugared types and expressions wrapped in parenthesis
aaron.ballman added inline comments.Aug 13 2018, 11:37 AM
lib/Sema/SemaExpr.cpp
14447

The sugar was stripped off at the pointer level, but not at the pointee level. e.g.,

typedef int (bobble);
typedef bobble * (frobble);
typedef frobble * yobble;

yobble gobble;

I think you can handle this within TypeHasNoDeref() and that should fix up all the callers.

lib/Sema/SemaExpr.cpp
14447

So TypeHasNoDeref() checks for the attribute on the base type already and is called after the pointer is stripped off. Attempting to desugar via getDesugaredType() here also removes the address_space attribute from the type I'm checking.

Do you know another method that essentially "expands" the typedefs without stripping the qualifiers? Otherwise I think I do need do the desugaring at the pointer level. Alternatively I could also change this method such that it accepts pointers instead of pointees since it appears I already call getDesugaredType() for almost every pointer who's pointee I'm passing to TypeHasNoDeref().

Also I tested with your example and the warning still seems to be thrown appropriately.

aaron.ballman added inline comments.Aug 13 2018, 1:10 PM
lib/Sema/SemaExpr.cpp
14447

I think you have to do the desugaring manually in a loop with getSingleStepDesugaredType() so that you don't strip off attributed type information along with the rest of the type sugar.

Also I tested with your example and the warning still seems to be thrown appropriately.

The example may depend on where you put the attribute (inside the parens vs outside the parens, for instance); it was an off-the-cuff example, so it may need some tweaking.

leonardchan marked 3 inline comments as done.
  • Remove sugar from pointee types
lib/Sema/SemaExpr.cpp
14447

Done. Also added different variations of your example to the tests.

aaron.ballman accepted this revision.Aug 14 2018, 12:47 PM

A few more minor nits to be cleared up, but otherwise LGTM. You should wait for @rsmith to sign off before committing in case he has further comments, however.

lib/Sema/SemaExpr.cpp
4219

Missing a full stop at the end of the comment.

4221

I'd prefer to use !isa<AttributedType>(Ty); I think that's more clear than looking at the enumeration.

test/Frontend/noderef.c
154

Typo: Parentheses

This revision is now accepted and ready to land.Aug 14 2018, 12:47 PM
leonardchan marked 6 inline comments as done.
rsmith added inline comments.Aug 20 2018, 10:06 AM
lib/Parse/ParseExpr.cpp
1126 ↗(On Diff #160892)

This parser-driven start/stop mechanism will not work in C++ templates. Instead, can you remove the "start" part and check the noderef exprs as part of popping the ExprEvaluationContextRecord?

lib/Sema/SemaExpr.cpp
4306

You need to loop/recurse here; there could be a sequence of . member accesses to strip.

4330

Did you mean to use ElemTy somewhere below? (I'd guess you intended to check whether it's marked noderef?)

14428–14440

This comment is marked done but has not been addressed.

lib/Sema/SemaType.cpp
4297

This is not correct (there could be multiple attributes on the type). See below.

4909

Move this after the check below (directly setting ExpectNoDerefChunk instead), and remove the unnecessary IsNoDeref variable.

4909

This is not correct: there could be multiple attributes at this level. You could fix this locally either by looping through the attributes on the type or iterating through the ParsedAttrs on the chunk. But I think my preference would be to store state indicating that we have an unvalidated noderef attribute on the TypeProcessingState at the point where you create the AttributedType, and check and clear that flag here when creating a type chunk.

@rsmith any more feedback on this current version? If it still looks incorrect to use the record this way, I don't mind simplifying it to work on lvalue to rvalue conversions without checking for a leading address space operation.

I've been thinking more about cleaner ways to implement this (and in particular, approaches that will provide more reasonable semantics in C++ -- allowing references to noderef, for example). We want to disallow operands of type noderef T to all operations by default, and only allow very specific operations on lvalues of type noderef T -- taking the address, performing member accesses, lvalue casts. The most natural way to get that effect would be to add a new form of placeholder type for a "dereferenced noderef" expression, that CheckPlaceholderExpr rejects, and that we add explicit support for in the contexts where such a construct is valid. (This is similar to how we handle overloaded function names and bound member function expressions in C++, for example.) (When we reach a context that "consumes" a dereferenced noderef expression, we'd need to go back and patch up its type, but I think that can be handled in a straightforward way.)

I think we should also treat noderef more like a type qualifier: as an important example, if we have a pointer or reference to noderef struct X, then member access for a member of type T should give an lvalue of type noderef T.

@rsmith any more feedback on this current version? If it still looks incorrect to use the record this way, I don't mind simplifying it to work on lvalue to rvalue conversions without checking for a leading address space operation.

I've been thinking more about cleaner ways to implement this (and in particular, approaches that will provide more reasonable semantics in C++ -- allowing references to noderef, for example). We want to disallow operands of type noderef T to all operations by default, and only allow very specific operations on lvalues of type noderef T -- taking the address, performing member accesses, lvalue casts. The most natural way to get that effect would be to add a new form of placeholder type for a "dereferenced noderef" expression, that CheckPlaceholderExpr rejects, and that we add explicit support for in the contexts where such a construct is valid. (This is similar to how we handle overloaded function names and bound member function expressions in C++, for example.) (When we reach a context that "consumes" a dereferenced noderef expression, we'd need to go back and patch up its type, but I think that can be handled in a straightforward way.)

I think we should also treat noderef more like a type qualifier: as an important example, if we have a pointer or reference to noderef struct X, then member access for a member of type T should give an lvalue of type noderef T.

Sorry, I forgot to say: I do not think we need to do this for the initial version of this functionality. This is mostly about cleanly extending the functionality to cover more C++ constructs.

leonardchan marked 6 inline comments as done and an inline comment as not done.
leonardchan added inline comments.
lib/Parse/ParseExpr.cpp
1126 ↗(On Diff #160892)

I'm not sure if I should remove the the start and stop methods because for a regular C program, the Push/PopExprEvaluationContextRecord functions don't seem to be called, and even when they are called in C++, the initial record that exists on the stack isn't popped at all.

Since pending noderef expressions are still parsed and added to the last record during template instantiation, doing another check when popping covers all noderef exprs added during instantiation.

lib/Sema/SemaExpr.cpp
4330

Forgot to remove this.

14428–14440

My bad. My response to this was adding a check in StopCheckingNoDerefAndWarn where we check if the DeclRefExpr is nullptr and throw the appropriate error if it is found or not.

Although I can't come up with an example where this doesn't get the declref we want, since the expressions we pass to this visitor are all our marked noderef expressions (which we check via the attribute on the type). So even if we have a complex expression that has multiple declrefs, all noderef declrefs should be checked by this.

lib/Sema/SemaType.cpp
4909

After https://reviews.llvm.org/D50526, it seems that the attribute can also be applied in ConvertDeclSpecToType, so I added another check for the attributes there.

@rsmith any more feedback on this current version? If it still looks incorrect to use the record this way, I don't mind simplifying it to work on lvalue to rvalue conversions without checking for a leading address space operation.

I've been thinking more about cleaner ways to implement this (and in particular, approaches that will provide more reasonable semantics in C++ -- allowing references to noderef, for example). We want to disallow operands of type noderef T to all operations by default, and only allow very specific operations on lvalues of type noderef T -- taking the address, performing member accesses, lvalue casts. The most natural way to get that effect would be to add a new form of placeholder type for a "dereferenced noderef" expression, that CheckPlaceholderExpr rejects, and that we add explicit support for in the contexts where such a construct is valid. (This is similar to how we handle overloaded function names and bound member function expressions in C++, for example.) (When we reach a context that "consumes" a dereferenced noderef expression, we'd need to go back and patch up its type, but I think that can be handled in a straightforward way.)

I think we should also treat noderef more like a type qualifier: as an important example, if we have a pointer or reference to noderef struct X, then member access for a member of type T should give an lvalue of type noderef T.

Sorry, I forgot to say: I do not think we need to do this for the initial version of this functionality. This is mostly about cleanly extending the functionality to cover more C++ constructs.

No problem. After this, I will make another patch that does this.

rsmith added inline comments.Aug 21 2018, 5:38 PM
lib/Parse/ParseExpr.cpp
1126 ↗(On Diff #160892)

PushExpressionEvaluationContext / PopExpressionEvaluationContext are called regardless of which language we're parsing. If we're missing ExpressionEvaluationContext records around some expression parsing, we should fix that. We should never be creating expressions within the initial ExpressionEvaluationContext record (except perhaps during error recovery).

Since pending noderef expressions are still parsed and added to the last record during template instantiation, doing another check when popping covers all noderef exprs added during instantiation.

That's not how template instantiation works in Clang. We don't re-parse, we perform a recursive tree transformation that does not involve the parser.

lib/Sema/SemaExpr.cpp
4218–4231

You can just use Ty->hasAttr(attr::NoDeref) for this now.

4302–4305

Do you need to do this both here and after walking into the . expressions? It seems to me that it would be sufficient to first walk into the LHS of any . expressions, and then remove the result from PossibleDerefs.

4308

You should check the MemberExpr is a . rather than a -> before stepping into it.

4317

This type check seems incorrect, because you don't propagate the type through -> operators. In particular, given

`&noderef_ptr->x.y`

you will step back to noderef_ptr->x, which will not have a noderef type, and not remove it from the set as a result. It seems best to drop this type check.

4328

Do you ensure that the noderef attribute will be on the innermost level of the array type? I think you'll need to do so in order to warn on:

typedef int A[32];
typedef A __attribute__((noderef)) *B;
int f(B b) { return (*B)[1]; }

(Here, we have a pointer to a noderef annotated array of non-noderef-annotated int. So I think we will not emit a warning from the first dereference, because we have a pointer to an array, and we will not emit a warning from the second dereference in the array indexing, because the result type does not have the noderef attribute.)

4336

QualTypes should be held by value, not by reference.

4341–4347

There could be a sequence of MemberExprs here, and watch out for . vs ->.

13056–13057

You should probably not add the expression to the list if it's of array type. Given:

typedef int A[32];
typedef A __attribute__((noderef)) *B;
B b;
int __attribute__((noderef)) *p = *b;

... you should not warn on the "dereference" here. That should either be handled here or by treating an array-to-pointer decay operation the same as an "address of" operation and removing pending derefs from the list.

14428–14440

How about this:

int __attribute__((noderef)) *a;
int __attribute__((noderef)) *b;
int __attribute__((noderef)) *c;
*(a = b, c);

The first noderef pointer the visitor finds will be a, but that's not the one that was dereferenced.

14440–14442

Calling getDesugaredType is not the right way to address this: use Ty->getAs<PointerType>() and Context.getAsArrayType(Ty) instead.

leonardchan marked 13 inline comments as done.
leonardchan added inline comments.
lib/Parse/ParseExpr.cpp
1126 ↗(On Diff #160892)

So when should a new ExpressionEvaluationContext be pushed or popped?

For the following code:

#define NODEREF __attribute__((noderef))

void func(){
  int NODEREF *x;
  *x;
}

int main(){
  func();
}

A new context is pushed then popped in C++ but not for C. From what I can tell based off my observations and looking for where Push/Pop get called in code, new ones would get added when we enter a new GNU style statement expression, captured region after a pragma, or different error blocks.

lib/Sema/SemaExpr.cpp
4328

Hmmmm, so normally in order to check what's correct, I usually run these examples through sparse since that's the tool that actually checks noderef for gcc, but it seems that sparse instead diagnoses a warning on the array indexing instead and nothing for the first dereference.

This shouldn't be though since, as you pointed out, the array does not have noderef types. For a simple array,

int __attribute__((noderef)) x[10];
x[0];

sparse diagnoses the appropriate warning for the array index. Personally though, I would chalk this up to an edge case that wasn't thought of before in sparse, since something like this isn't handled on their existing validation tests.

Currently, this diagnoses a warning on the first dereference, but I can also understand why it shouldn't warn because accessing noderef structs shouldn't warn if the member accessed is an array. The only case though this applies in sparse's tests are with structs and they don't provide a test for dereferencing a pointer to an array.

I think the reason for why the struct example still passes is that if the member is an array, none of the actual data in the struct is being accessed. Instead, the value returned is just a pointer to the start of the array within the struct and equivalent to just adding an offset to the struct pointer. I think that no warning should also be thrown for this example if it is guaranteed that array A can similarly be found from simple pointer arithmetic on pointer B. I don't think it can unless B were an array or struct also, but I could be wrong or overthinking this.

  • Push and pop a new ExpressionEvaluationContext when we enter and exit a compound statement.
  • Remove Start/StopCheckingNoderef functions since we can now warn whenever we pop
  • Push and pop contexts for every parsed statement inside ParseStatementOrDeclaration instead of at the start and end of compound statements
rsmith added inline comments.Oct 3 2018, 4:52 PM
include/clang/Basic/AttrDocs.td
3645

Should noderef appear somewhere in this example? :)

include/clang/Basic/DiagnosticSemaKinds.td
9556–9557

Putting this under -Wonderef doesn't really seem appropriate to me -- you're using -Wnoderef above to mean "warn about cases where a noderef pointer is dereferenced", whereas this warning is about invalid use of an attribute. I think -Wignored-attributes (InGroup<IgnoredAttributes>) would be a better choice.

lib/Parse/ParseExpr.cpp
1126 ↗(On Diff #160892)

Hmm, it looks like we don't push/pop expression evaluation context records for function definitions or variable initializers currently (instead we have a single global "potentially evaluated" context which wraps all such contexts and appears to never be popped).

The fact that we share a single record for all functions / variables is likely simply because it never previously mattered that we were merging notionally-distinct contexts in this way. I suspect actually fixing that will uncover more problems (most likely, pre-existing bugs, but barriers to your progress nonetheless).

lib/Parse/ParseStmt.cpp
102–104 ↗(On Diff #164706)

Hmm, we call ParseStatementOrDeclaration rather a lot. Can we pull this up into its ultimate callers instead:

  • Parser::ParseFunctionDefinition
  • Parser::ParseLateTemplatedFuncDef
  • Parser::ParseLexedMethodDef
  • Parser::ParseLexedObjCMethodDefs

? Actually, we can do better than that: those functions all call either ActOnStartOfFunctionDef or ActOnStartOfObjCMethodDef first, and ActOnFinishFunctionBody when they're done. We should put the PushExpressionEvaluationContext and PopExpressionEvaluationContext calls in those functions instead.

We're missing at least one other place: when parsing the initializer for a global variable, in C++ we call Sema::ActOnCXXEnterDeclInitializer, which pushes a potentially-evaluated context. But in C, the InitializerScopeRAII object (in Parser::ParseDeclaratoinAfterDeclaratorAndAttributes) doesn't call into Sema and we don't ever push a suitable expression evaluation context.

You can find if any other places are missing by removing Sema's global ExpressionEvaluationContext and adding an assert that fires if we try to parse an expression with no ExpressionEvaluationContext, and then running the unit tests. (And we should check in that change to ensure this doesn't regress.)

lib/Sema/SemaExpr.cpp
4218

If it makes sense for this to exist at all, it should be a member of Type instead.

4309–4311

Replace these three lines with just

LastRecord.PossibleDerefs.erase(StrippedExpr);
4328

Arrays-of-arrays and structs-containing-arrays are the same in this regard: in both cases, accessing the first level (array element or struct member) gives you an array lvalue, and only accessing that second array will actually dereference memory. So I think the treatment of the two cases should be consistent. For member access, you defer the check if the element type is an array, and I think you should do the same thing here.

There are two consequences of this:

  1. If the array element type is an array, you should bail out of here and not add E to PossibleDerefs.
  2. If the array type (not the element type) has the noderef attribute, you should propagate it down to the array element type so that we can catch a later pointer dereference / array indexing operation.

For point 2, what this means in my example is that the expression *B should have type "array of 32 noderef ints" so that further noderef checks apply to it. The easiest way to accomplish this would be to check, when applying noderef to a type, whether the type is an array type, and if so also apply noderef to the array's element type.

test/Frontend/noderef_on_non_pointers.cpp
4 ↗(On Diff #164706)

I'd like to see a test that we get a wraning for attempting to bind a reference to a dereferenced noderef pointer.

In C++, should it be valid to assign a noderef pointer to a dereferenceable pointer without a cast?

The motivating use case always pairs noderef with an address_space attribute, and it's already invalid to convert between pointer types in different address spaces without a cast.
So I don't think we have a strong opinion one way or the other. In the abstract, I'd say noderef feels like a "constraining" qualifier a la const/volatile so that going from unconstrained to constrained implicitly is OK but not vice versa.

leonardchan marked 2 inline comments as done.Oct 31 2018, 12:28 PM
leonardchan added inline comments.
lib/Parse/ParseStmt.cpp
102–104 ↗(On Diff #164706)

@rsmith Would it be simpler to instead push and pop at the start and end of Parser::ParseExternalDeclaration? I'm currently working on your suggestion of removing the Sema global ExpressionEvaluationContext and find that a lot of accesses to the ExprEvalContexts stack stem from this method.

ActOnStartOfFunctionDef and ActOnFinishFunctionBody still work on anything in a function body, but I keep running into many cases for declarations declared globally that could be easily caught if instead I push and pop contexts at the start and end of Parser::ParseExternalDeclaration. Do you think this is a good idea or is there something that I may be missing from pushing and popping here?

This still accomplishes the goal of not reusing the global Sema context and I will still be able to check for noderef on every push and pop this way.

leonardchan marked 2 inline comments as not done.Oct 31 2018, 12:28 PM
leonardchan marked 5 inline comments as done.
leonardchan added inline comments.
lib/Parse/ParseStmt.cpp
102–104 ↗(On Diff #164706)

Made a separate patch (https://reviews.llvm.org/D54014) that does the push and pop for ActOnStartOfFunctionDef and ActOnFinishFunctionBody.

I can continue doing this for the remaining places that depend on the global Sema context in other patches since I ran into a couple of issues running into fixing everything in one big patch and think fixing each individual place in smaller separate patches would be simpler.

lib/Sema/SemaExpr.cpp
4328

There currently doesn't seem to be a simple way, that I could find, for editing the element type of an array type. I could alternatively just recreate the array type when wrapping one with the noderef AttributedType, but this would be difficult if the array type was surrounded by type sugar since the sugar would need to somehow be recreated on the new AttributedType.

Is it possible to do something like a tree transformation where I can change the element type of an array type itself without having to recreate the whole type with sugar?

test/Frontend/noderef_on_non_pointers.cpp
4 ↗(On Diff #164706)

Ah, I forgot about this case. Sparse normally has a different warning for casting involving it's attributes (which includes a warning when casting from a noderef pointer to regular pointer) , but I didn't focus on that since it seemed like a whole other feature.

I can see though how it would be problematic if a user unknowingly did something like cast to a dereferenceable pointer. Added a warning and tests for this.

*ping* @rsmith Any more comments on this patch or the one before it (https://reviews.llvm.org/D54014)? That one has the fix for pushing and popping another ExprEvalContext before acting on a function body in this patch.

rsmith accepted this revision.Dec 5 2018, 1:58 PM

Thanks for your patience.

clang/lib/Sema/SemaInit.cpp
7838–7854 ↗(On Diff #172695)

CheckSingleAssignmentConstraints also warns in this case; is the explicit warning emission above necessary?

leonardchan marked an inline comment as done.
This revision was automatically updated to reflect the committed changes.
dblaikie added inline comments.
lib/Sema/SemaType.cpp
4913–4916

Pointed out in PR41774 there's a dead store to ExpectNoDerefChunk here.

Should line 4913 be removed (& then the two if conditions can be collapsed together)?

leonardchan marked 2 inline comments as done.Mon, May 6, 3:07 PM
leonardchan added inline comments.
lib/Sema/SemaType.cpp
4913–4916

Yup, fixed in r360089