Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[Clang] Implement the 'counted_by' attribute
Needs ReviewPublic

Authored by void on Apr 14 2023, 2:27 PM.

Details

Summary

The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member in the same
structure holding the count of elements in the flexible array. This
information can be used to improve the results of the array bound sanitizer
and the '__builtin_dynamic_object_size' builtin.

This example specifies the that the flexible array member 'array' has the
number of elements allocated for it in 'count':

struct bar;
struct foo {
  size_t count;
   /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};

This establishes a relationship between 'array' and 'count', specifically
that 'p->array' must have *at least* 'p->count' number of elements available.
It's the user's responsibility to ensure that this relationship is maintained
through changes to the structure.

In the following, the allocated array erroneously has fewer elements than
what's specified by 'p->count'. This would result in an out-of-bounds access not
not being detected:

struct foo *p;

void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count *
                     sizeof(struct bar *)));
  p->count = count + 42;
}

The next example updates 'p->count', breaking the relationship requirement that
'p->array' must have at least 'p->count' number of elements available:

struct foo *p;

void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count *
                     sizeof(struct bar *)));
  p->count = count + 42;
}

void use_foo(int index) {
  p->count += 42;
  p->array[index] = 0; /* The sanitizer cannot properly check this access */
}

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
void updated this revision to Diff 556324.Fri, Sep 8, 4:26 PM

Add test output when the attribute isn't used.

void updated this revision to Diff 556619.Tue, Sep 12, 5:18 PM

This is ready for review. Please take a look.

nickdesaulniers accepted this revision.Wed, Sep 13, 9:33 AM

Thanks for the patch!

This revision is now accepted and ready to land.Wed, Sep 13, 9:33 AM

Also, I encourage you to test a build of the linux kernel with __counted_by BEFORE merging this!

Perhaps worth updating the release notes, too.

aaron.ballman requested changes to this revision.Wed, Sep 13, 11:25 AM

Thank you for working on this! Please be sure to also add a release note to clang/docs/ReleaseNotes.rst so users know about this. Also adding the codegen and attribute code owners for awareness.

clang/include/clang/AST/Decl.h
4276–4279

Could this be implemented as: return !field_empty() ? *std::prev(field_end()) : nullptr; ? Then maybe someday we'll get better iterator support that isn't forward-only and this code will automagically get faster.

4281–4286

We use this pattern fairly often to avoid repeating the implementation.

clang/include/clang/Basic/Attr.td
4237

Clang supports both as extensions in C++, as does GCC: https://godbolt.org/z/5xP7ncoha -- I think we should consider enabling this attribute in C++ mode, but I'm fine considering that in a follow-up.

4246–4249

Teeny tiniest of coding style nits :-D

clang/include/clang/Basic/AttrDocs.td
7248

If we land the changes with the attribute as a C-only attribute, we should document that explicitly here (unless we're immediately enabling it C++ mode; not asking for busywork).

clang/include/clang/Basic/DiagnosticSemaKinds.td
6382–6383

Why is this a warning and not a typical err_no_member error? I think we want that behavior so that we get typo correction for free. e.g.,

struct S {
  int Count;
  int FAM[] __attribute__((counted_by(Clount))); // Should get a "did you mean 'Count'" fix-it
};
6385–6389

We try to wrap syntax elements in single quotes in our diagnostics so it's more visually distinct

clang/lib/AST/ASTImporter.cpp
8979–8980

Should these be castAttrAs so it's more clear that type mismatches don't result in a null pointer, but an assertion?

clang/lib/AST/DeclBase.cpp
443–444

Can you explain a bit about what this code is for? I didn't spot any test coverage for it, but perhaps I missed something.

clang/lib/CodeGen/CGBuiltin.cpp
857

We don't generally use top-level const on local variables.

886
clang/lib/Sema/SemaDecl.cpp
17997–17999

Ask and you shall receive! :-D

18009

This logic seems like it should live in SemaDeclAttr.cpp when handling the attribute. We're given the declaration the attribute is applied to, so we can do everything we need from that (the FieldDecl has a getParent() so we don't need the RecordDecl to be passed in. Is there a reason we need it to live here instead?

18032

Errr any integer type?

struct S {
  bool HerpADerp;
  int FAM[] __attribute__((counted_by(HerpADerp)));
};

seems like something we don't want, but I also wonder about enumeration types and more interestingly, plain char where it could be treated as signed or unsigned depending on compiler flags.

clang/test/Misc/warning-flags.c
21 ↗(On Diff #556619)

One line up: "The list of warnings below should NEVER grow. It should gradually shrink to 0." ;-)

That said, I don't think this warning needs to be added because I think we have an existing error that is more appropriate.

clang/test/Sema/attr-counted-by.c
2

Is the triple necessary? Also, do we have to enable -fsanitize=array-bounds to test this?

22

Please add test with the other strange FAM-like members, as well as one that is just a trailing constant array of size 2.

Some more tests or situations to consider:

struct S {
  struct {
    int NotInThisStruct;
  };
  int FAM[] __counted_by(NotInThisStruct); // Should this work?
};

int Global;
struct T {
  int FAM[] __counted_by(Global); // Should fail per your design but is that the behavior you want?
};

struct U {
  struct {
    int Count;
  } data;
  int FAM[] __counted_by(data.Count); // Thoughts?
};

struct V {
  int Sizes[2];
  int FAM[] __counted_by(Sizes[0]); // Thoughts?
};

(I'm not suggesting any of this needs to be accepted in order to land the patch.)

You should also have tests showing the attribute is diagnosed when applied to something other than a field, given something other than an identifier, is given zero arguments, and is given two arguments.

This revision now requires changes to proceed.Wed, Sep 13, 11:25 AM
nickdesaulniers added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
857

Do we explicitly document this in the style guide? This came up for me recently in a code review (cc @jyknight ).
https://llvm.org/docs/CodingStandards.html

aaron.ballman added inline comments.Wed, Sep 13, 11:38 AM
clang/lib/CodeGen/CGBuiltin.cpp
857

It's the prevailing style in our files rather than a hard rule. In Clang, we seem to have organically landed on "const members are fine, const locals are not fine, do not use globals". Our const-correctness story is terrible and if we had better const correct interfaces, I think we'd have landed somewhere different.

clang/lib/CodeGen/CGBuiltin.cpp
857

prevailing styles should be documented explicitly in the style guide. That settles all discussions, now and in the future.

Or perhaps an RFC on discourse.

erichkeane added inline comments.Wed, Sep 13, 11:56 AM
clang/include/clang/AST/DeclBase.h
482

Why isn't this a member function?

clang/include/clang/Basic/Attr.td
4246

Should we instead be capturing the field itself, rather than its location? It seems to me that would be more useful?

clang/include/clang/Basic/AttrDocs.td
7251

As this is the purpose of this attribute, it seems to me that the documentation should headline/more obviously highlight that the purpose here is to improve the result of the sanitizer when it comes to flexible array members.

clang/include/clang/Basic/DiagnosticSemaKinds.td
6385

This one isn't clear... something more specifically about 'cannot point to itself' is perhaps more useful/explainatory.

Also, the beginning of all of these is really quite awkward as well, perhaps something like:

field %0 referenced by 'counted_by' attribute ... ?

clang/lib/AST/DeclBase.cpp
482

Whats going on here? Is this simply an attempt to see if this is the last one? Can we combine this effort with the 'getLastFieldDecl' above? Alternatively, can we get a good comment here?

void updated this revision to Diff 556734.Wed, Sep 13, 4:44 PM
void marked 36 inline comments as done.

Added more error messages. Changed some code around to align with coding practices. Added some more test cases.

clang/include/clang/AST/Decl.h
4276–4279

Using std::prev on a forward iterator won't work:

https://stackoverflow.com/questions/23868378/why-stdprev-does-not-fire-an-error-with-an-iterator-of-stdunordered-set

std::prev itself is defined only for bidirectional iterators:

 template<typename _BidirectionalIterator>
    _GLIBCXX_NODISCARD
    inline _GLIBCXX17_CONSTEXPR _BidirectionalIterator
    prev(_BidirectionalIterator __x, typename
         iterator_traits<_BidirectionalIterator>::difference_type __n = 1)
    {
...
4281–4286

Done. I just hate the use of const_cast...

clang/include/clang/AST/DeclBase.h
482

Because I want to be able to call it with a nullptr. I'll document this.

clang/include/clang/Basic/Attr.td
4237

I understand we can accept it as an extension, but yeah let's burn that bridge when we come to it.

4246

I tried that before and it didn't work. The issue is that at the time of parsing we don't yet have the full definition of the structure / union. So the Decl's aren't really available to us yet. There may be a way to massage the parsing code to allow this to happen, but I'd like to do it as a separate patch. I'll document it with a FIXME.

clang/include/clang/Basic/AttrDocs.td
7248

That's not something normally done for other COnly attributes. (However, the documentation shows those attributes being available for C++11, which is probably a mistake...) I went ahead and added that this is for C though.

7251

I'm not sure how much more I could do to highlight that. :-)

7261

This was a suggestion by a GCC developer as well. I personally hate that syntax. Regardless, it's something to be done in the future. :-)

clang/include/clang/Basic/DiagnosticSemaKinds.td
6382–6383

I thought it would be too harsh to make this an error. But I suppose there's no harm in doing this. I'm not sure how to implement a fixit for a closely matching identifier. Do you have an example of how to find the closest matching Decl?

6385

I reworded them a bit.

As for referring to itself, it could be a bit confusing to say that it 'cannot point to itself' because the 'itself' in that is the attribute, not the flexible array member. I think it's more-or-less clear what the error's referring to. The user can use this attribute only with a flexible array member. So they should know what what's meant by the message.

clang/lib/AST/ASTImporter.cpp
8979–8980

Sure.

clang/lib/AST/DeclBase.cpp
443–444

Short answer is I don't know. :-) This is copied from the original implementation of this method.

482

It's not really the same as the getLastField method. That one will return a non-null pointer whether or not the last field is a FAM. This code is just shorthand for getting the iterator for the FAM and then quickly testing if it's the final one. Again, this is because of the forward iterator that RecordDecl uses, for some reason. If we had bidirectional iterators, this would be much clearer.

clang/lib/CodeGen/CGBuiltin.cpp
857

Easy enough to do. Done.

clang/lib/Sema/SemaDecl.cpp
17997–17999

Nice!! :-D

18009

I put it here only because it's close to where it's called. I'm ambivalent on where it should live. Am I performing the check in the correct place?

Changed to use getParent().

18032

Yeah...unless they want just 1 element. ;-) I can rule a bool out. (Done)

A char should be fine. Dunno about signed vs. unsigned...

clang/test/Misc/warning-flags.c
21 ↗(On Diff #556619)

Awww....gatekeeping. ;-) Reverted.

clang/test/Sema/attr-counted-by.c
2

I suppose their not 100% necessary. I'll remove them.

22

I added a "this isn't a flexible array member" error message.

struct S {
  struct {
    int NotInThisStruct;
  };
  int FAM[] __counted_by(NotInThisStruct); // Should this work?
};

Yes, I believe it should. Kees had a similar example:

struct S {
  int x;
  struct {
    int count;
    int FAM[] __counted_by(count);
  };
};

I made changes to support this. It may be controversial, so please let me know if you or anyone has an issue with it.

int Global;
struct T {
  int FAM[] __counted_by(Global); // Should fail per your design but is that the behavior you want?
};

Yes, it should fail. I'll add a test case.

struct U {
  struct {
    int Count;
  } data;
  int FAM[] __counted_by(data.Count); // Thoughts?
};

I don't think we should support that at the moment. Referencing arbitrary fields in a nested structure is very complicated and the syntax for it is not at all settled on.

struct V {
  int Sizes[2];
  int FAM[] __counted_by(Sizes[0]); // Thoughts?
};

Hmm...Not sure. I'll ask the GCC person who's implementing this on her side about this.

void updated this revision to Diff 556749.Thu, Sep 14, 1:48 AM

Add release note.

aaron.ballman added inline comments.Thu, Sep 14, 6:29 AM
clang/include/clang/AST/Decl.h
4276–4279

Drat! Oh well, it was a lovely thought.

4281–4286

Yeah, I usually go take a shower after that sort of thing. :-D

clang/include/clang/Basic/Attr.td
4246

I think at some point we're going to want this to take an Expr argument instead of an identifier argument, so we can support foo.bar and baz[0], etc as arguments for more complicated situations. But I believe the current form is reasonable as-is (we can start taking an expression and all of these identifiers should become a MemberExpr, so users don't have to change their code).

clang/include/clang/Basic/AttrDocs.td
7248

Heh, our attribute documentation has been slowly getting better over time but is still pretty inconsistent; thank you for adding it!

clang/include/clang/Basic/DiagnosticSemaKinds.td
6382–6383

I'll leave a comment below with what I'm thinking.

clang/lib/AST/DeclBase.cpp
443–444

Good enough for me, thank you. :-) (I made that comment before I realized this was a refactor, I forgot to come back and delete it.)

clang/lib/Sema/SemaDecl.cpp
18029

When the lookup result is not found, I think you should call Sema::DiagnoseEmptyLookup(). This will handle giving you the error about the member not being found, but it should also give you typo correction for free, and if typo correction happens, you should then already know what new declaration was found via the correction, so you can continue to check for the other cases from there.

https://github.com/llvm/llvm-project/blob/b57df9fe9a1a230f277d671bfa0884bbda9fc1c5/clang/lib/Sema/SemaLambda.cpp#L1151 is an example of this being called elsewhere.

18032

The char situation I'm thinking about is morally: https://godbolt.org/z/377WPYojz

With -fsigned-char, accessing Idx means you get the value -1 and with -funsigned-char, you get the value 255. If that was used to index into FAM, the signed char version would become a bounds access issue whereas the unsigned char version works fine, so it's hard to write a portable attribute that refers to a char member.

Then again, the counted_by attribute is intended to catch exactly this kind of issue, so perhaps it's fine on char?

void added inline comments.Thu, Sep 14, 1:34 PM
clang/include/clang/Basic/Attr.td
4246

I think that GCC may not be able to use expressions (apart from simple, resolvable integer arithmetic). But yeah, those types of expansions would be nice. There are many issues to resolve with them first---such as how to reference a field in a sub-struct when the fam is in another substruct:

struct S {
  struct A {
    int count;
  } a;
  struct C {
    int count;
    struct X {
      int count;
    } x;
  } c;
  struct F {
    int fam[] __counted_by(/* how to refer to p->c.x.count? */);
  } f;
};
clang/lib/Sema/SemaDecl.cpp
18029

GAAAA! It's just like the rest of Clang: There are a million ways to do something, but none of them are documented and only one of them works for your particular case...kind of...and it takes a ton of arguments which seem counter-intuitive (CXXScopeSpec instead of DeclSpec?). All of Clang has been over-engineered from the beginning and it shows in how hard it is to do even the most basic things. So hard that people write the same things over and over with only small changes (see the two designated initializers implementations) leading to a nightmare.

The example you pointed to doesn't work. And none of the other forms seem to fit well enough. There are apparently many ways to "correct" a typo, but none of them seemed appropriate or like they'd do what I want. Is it really necessary to have this?

18032

True. But we technically have that issue with every type. Maybe it'd be best to make sure we don't have a negative result?

kees added a comment.Fri, Sep 15, 2:37 PM

Added more error messages. Changed some code around to align with coding practices. Added some more test cases.

Doing a full "allmodconfig" Linux kernel build with my 200ish __counted_by annotations is building cleanly. Yay! :)

clang/test/Sema/attr-counted-by.c
22

Yes, I believe it should. Kees had a similar example:

struct S {
  int x;
  struct {
    int count;
    int FAM[] __counted_by(count);
  };
};

I made changes to support this. It may be controversial, so please let me know if you or anyone has an issue with it.

FWIW, the Linux kernel has a LOT of this style (in an anonymous struct), which is why I wanted to make sure it was supported in this first version of the feature.

void added a comment.Fri, Sep 15, 3:37 PM

Added more error messages. Changed some code around to align with coding practices. Added some more test cases.

Doing a full "allmodconfig" Linux kernel build with my 200ish __counted_by annotations is building cleanly. Yay! :)

Great! That means I did something wrong.

;-)

struct V {
  int Sizes[2];
  int FAM[] __counted_by(Sizes[0]); // Thoughts?
};

-fbounds-safety doesn't allow this. In our internal adoption experience, we haven't encountered such use cases yet. So, I think it's best to make the model restrictive to avoid surprises. If we were to support it, I think it should at least be limited to cases where the array subscript expression is known to be in bounds at compile time, to avoid an OOB access when the counted_by argument is evaluated.

-fbounds-safety doesn't allow this. In our internal adoption experience, we haven't encountered such use cases yet. So, I think it's best to make the model restrictive to avoid surprises. If we were to support it, I think it should at least be limited to cases where the array subscript expression is known to be in bounds at compile time, to avoid an OOB access when the counted_by argument is evaluated.

Additionally: it is probably safe from an aliasing perspective (or at least not worse than using any other field) to use an array subscript in a count expression, provided the array's storage exists within the struct. However, we certainly wouldn't want people to go towards array[variable], pointer[anything], or (worse!) FAM[anything], and constant array subscripts are confusingly adjacent to the boundary we need to close. If we're just entertaining the possibility without motivating use cases at this time, I'd advise to leave it be.

I've got no further comments/concerns.

clang/include/clang/AST/Decl.h
4276–4279

something like !field_empty() ? *std::advance(field_begin(), std::distance(field_begin(), field_end() - 1) : nullptr

would do what Aaron would like, though perhaps with an extra run through the fields list until then.

clang/include/clang/Basic/DiagnosticSemaKinds.td
6385

I think the reword helps, I am less confident that referring to the 'flexible array member' is clear. One of the things I've noticed in the past is users 'trying' attributes without looking them up to see what they do. Frankly, I think that should be a supported strategy, and one that we manage by making clear diagnostics.

I'm up for suggestions/hopeful someone else will come up with something better, but I can hold my nose at this if we don't have anything better (and in fact, my attempts to put a strawman up were at least as bad).

void added inline comments.Mon, Sep 18, 1:32 PM
clang/include/clang/AST/Decl.h
4276–4279

Yeah, I want to avoid unnecessary iterations through the list.

clang/include/clang/Basic/DiagnosticSemaKinds.td
6385

I added a diagnostic (the first one) that indicates it must apply to a FAM. So maybe it'll be okay? I'm open to other wording...

void added a comment.Thu, Sep 21, 12:31 PM

Friendly Ping.

Friendly Ping.

Aaron is away at a conference this week, so hopefully he'll do another pass when he gets back next week.

pirama added a subscriber: pirama.Mon, Sep 25, 1:47 PM
void updated this revision to Diff 557415.Wed, Sep 27, 11:43 AM

Use CBA as an acronym of CountedByAttribute instead of EBA, which is oldspeak.