This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Implement the 'counted_by' attribute
ClosedPublic

Authored by void on Apr 14 2023, 2:27 PM.

Details

Summary

The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member in the same
structure holding the count of elements in the flexible array. This
information can be used to improve the results of the array bound sanitizer
and the '__builtin_dynamic_object_size' builtin.

This example specifies the that the flexible array member 'array' has the
number of elements allocated for it in 'count':

struct bar;
struct foo {
  size_t count;
   /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};

This establishes a relationship between 'array' and 'count', specifically
that 'p->array' must have *at least* 'p->count' number of elements available.
It's the user's responsibility to ensure that this relationship is maintained
through changes to the structure.

In the following, the allocated array erroneously has fewer elements than
what's specified by 'p->count'. This would result in an out-of-bounds access not
not being detected:

struct foo *p;

void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count *
                     sizeof(struct bar *)));
  p->count = count + 42;
}

The next example updates 'p->count', breaking the relationship requirement that
'p->array' must have at least 'p->count' number of elements available:

struct foo *p;

void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count *
                     sizeof(struct bar *)));
  p->count = count + 42;
}

void use_foo(int index) {
  p->count += 42;
  p->array[index] = 0; /* The sanitizer cannot properly check this access */
}

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Jumping a bit late in the thread, apologize in advance if I missed something.

The GCC version of the attributes seems to be element_count (as found in the link https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896 mentionned by @kees) I guess we could align (I personally prefer the GCC name, FWIW)

I may be wrong , but it seems to me that there is nothing specific to FAM there, and this would work as well for

struct foo {
  size_t num_elements;
   // ...
  struct bar ** data __attribute__((counted_by(num_elements)));
};

struct foo make_foo(size_t num_elements) {
struct foo f;
  f.data = malloc(num_elements *
                         sizeof(struct bar *));

  f.num_elements = num_elements;
  return f;
}

Which makes us very close to:

void stuff(int n, int *data __attribute__((counted_by(n)))) {
    for(int i = 0; i < n; ++i)
      other_stuff(data[i]);
}

which is very close to VLA, right?

clang/lib/AST/Expr.cpp
286 ↗(On Diff #554083)

I really like this cleanup!

void added a comment.Aug 29 2023, 4:54 PM

Jumping a bit late in the thread, apologize in advance if I missed something.

The GCC version of the attributes seems to be element_count (as found in the link https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896 mentionned by @kees) I guess we could align (I personally prefer the GCC name, FWIW)

Qing Zhao has been sending out patches to gcc-patches@, so I've been trying to follow those as much as possible. I plan to go through my implementation to ensure that I copy hers as much as possible. See the thread here: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626376.html

I may be wrong , but it seems to me that there is nothing specific to FAM there, and this would work as well for

struct foo {
  size_t num_elements;
   // ...
  struct bar ** data __attribute__((counted_by(num_elements)));
};

struct foo make_foo(size_t num_elements) {
struct foo f;
  f.data = malloc(num_elements *
                         sizeof(struct bar *));

  f.num_elements = num_elements;
  return f;
}

Which makes us very close to:

void stuff(int n, int *data __attribute__((counted_by(n)))) {
    for(int i = 0; i < n; ++i)
      other_stuff(data[i]);
}

which is very close to VLA, right?

Yeah. It could definitely be extended to the Apple RFC (which is similar to your example).

void marked 4 inline comments as done.Aug 29 2023, 5:23 PM

I assume you plan to add some clang CodeGen tests at some point?

Yes. :-)

clang/lib/CodeGen/CodeGenFunction.h
531–532 ↗(On Diff #554083)

Ah! Thanks for catching this. I had trouble with this scoping mechanism and for a while used a reference counting method to get around it, but put it back to its previous impl, except for this. I still think the SanitizerScope needs work, but that's beyond this project.

clang/lib/Sema/SemaDecl.cpp
17903

I prefer it this way in this case (actually, I wish C++ had a way to combine those two if's into one). But it's not a big deal.

17922

That should be an error. I'll add that as a check.

void updated this revision to Diff 554533.Aug 29 2023, 5:23 PM

Fix some issues Nick pointed out.

void updated this revision to Diff 554769.Aug 30 2023, 10:12 AM

Add more diagnostics.

Coming next: testcases!!!

void updated this revision to Diff 554792.Aug 30 2023, 11:32 AM

Add testcases.

void updated this revision to Diff 555169.Aug 31 2023, 2:42 PM

Add a few more tests to the codegen testcase.

void updated this revision to Diff 555180.Aug 31 2023, 3:25 PM

Remove un-needed change.

void updated this revision to Diff 555219.Aug 31 2023, 6:28 PM

Fix testcase.

void retitled this revision from [WIP][Clang] Add counted_by attribute to [Clang] Add counted_by attribute.Sep 5 2023, 1:13 PM
void updated this revision to Diff 555946.Sep 5 2023, 3:10 PM

Update documentation. Cribbed off of Qing Zhao's GCC implementation.

void added a comment.Sep 6 2023, 3:18 AM

This is now ready for a non-WIP review. PTAL. :-)

Will gcc use counted_by or element_count ?

kees added a comment.Sep 6 2023, 9:43 AM

Will gcc use counted_by or element_count ?

GCC is using __counted_by: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628459.html

kees added a comment.Sep 6 2023, 11:00 AM

I can generate warnings for anonymous structs were the __counted_by member is reported as "not found". For example:

little.c:7:28: warning: counted_by field 'count' not found
    7 |                 int array[] __counted_by(count);
      |                                          ^~~~~

For this:

#define __counted_by(member)   __attribute__((__counted_by__(member)))

struct anon {
        unsigned long flags;
        struct {
                unsigned char count;
                int array[] __counted_by(count);
        };
};

extern void bar(int input);

void foo(struct anon *p, int index)
{
        bar(p->array[index]);
}

Otherwise, things are looking good on test builds so far...

void added a comment.Sep 6 2023, 1:14 PM

I can generate warnings for anonymous structs were the __counted_by member is reported as "not found". For example:

little.c:7:28: warning: counted_by field 'count' not found
    7 |                 int array[] __counted_by(count);
      |                                          ^~~~~

For this:

#define __counted_by(member)   __attribute__((__counted_by__(member)))

struct anon {
        unsigned long flags;
        struct {
                unsigned char count;
                int array[] __counted_by(count);
        };
};

extern void bar(int input);

void foo(struct anon *p, int index)
{
        bar(p->array[index]);
}

I'll look into this.

Patch LGTM; just want moar tests.


Mind adding a test for

#if !__has_attribute(counted_by)
#error "has attribute broken"
#endif
clang/test/CodeGen/attr-counted-by.c
2 ↗(On Diff #555946)

Can you add another run line without the -fsanitize flags set, and use 2 different --check-prefixes for the two RUN lines? I'd be curious to see the differences in codegen between those set or not. I assume this attribute should affect codegen even with all of those disabled (maybe trapping instead of libcalling into ubsan runtime). I think update_cc_test_checks should be able to handle that.

39–51 ↗(On Diff #555946)

Is the call to func is unnecessary for this test or can we just return __builtin_dynamic_object_size(p->array, 1)?

67–73 ↗(On Diff #555946)

What is the intent of this test? Mind adding a comment?

In particular, it's not clear to my why this case should conditionally trap, seeing as the dynamically allocated object's array field is yet to be accessed in this example. What am I missing?

void updated this revision to Diff 556310.Sep 8 2023, 3:20 PM
  • Expand test to show the unsanitized version,
  • Unify the ways of finding a specific field based on a predicate,
  • Make sure we're using the correct types in a couple of key places.
void updated this revision to Diff 556312.Sep 8 2023, 3:22 PM
void marked an inline comment as done.

Remove unneded extern decl from test.

void added a comment.Sep 8 2023, 3:29 PM

I changed more than just the testcase (not a lot, but non-trivial nonetheless). PTAL.

clang/test/CodeGen/attr-counted-by.c
2 ↗(On Diff #555946)

Done.

39–51 ↗(On Diff #555946)

Nope. Not sure why I had it. :-/ Removed.

67–73 ↗(On Diff #555946)

This is a bit weird. I don't understand why it would trap either. It might have something to do with the 'alloc_size' attribute? (The code that this patch generates doesn't put that trap in there.) That's probably a bug.

However, when I modify the test to include an actual store:

void test3(int index) {
  /* ... */
  p->array[index] = 42;
}

I get two checks: one for the 'counted_by' attribute, and this one, probably for the bounds check. Again, it's probably because of the 'alloc_size', because when I remove the call to 'malloc' it generates only the 'counted_by' check...

nickdesaulniers added inline comments.Sep 8 2023, 3:51 PM
clang/test/CodeGen/attr-counted-by.c
2 ↗(On Diff #555946)

I guess that's what I was curious about; how come the attribute doesn't affect codegen unless the sanitizers are enabled?

void retitled this revision from [Clang] Add counted_by attribute to [Clang] Implement the counted_by attribute.Sep 8 2023, 3:58 PM
void edited the summary of this revision. (Show Details)
void retitled this revision from [Clang] Implement the counted_by attribute to [Clang] Implement the 'counted_by' attribute.Sep 8 2023, 4:10 PM
void edited the summary of this revision. (Show Details)
void edited the summary of this revision. (Show Details)Sep 8 2023, 4:12 PM
void updated this revision to Diff 556323.Sep 8 2023, 4:15 PM
void edited the summary of this revision. (Show Details)

Fix docs to use the correct code examples throughout.

void added inline comments.Sep 8 2023, 4:16 PM
clang/test/CodeGen/attr-counted-by.c
2 ↗(On Diff #555946)

I don't think we would want the bounds checking unless explicitly told so. I could be wrong?

void added inline comments.Sep 8 2023, 4:19 PM
clang/test/CodeGen/attr-counted-by.c
2 ↗(On Diff #555946)

To clarify. It does generate the code for the __bdos calculation. It just doesn't generate a trap.

void updated this revision to Diff 556324.Sep 8 2023, 4:26 PM

Add test output when the attribute isn't used.

void updated this revision to Diff 556619.Sep 12 2023, 5:18 PM

This is ready for review. Please take a look.

nickdesaulniers accepted this revision.Sep 13 2023, 9:33 AM

Thanks for the patch!

This revision is now accepted and ready to land.Sep 13 2023, 9:33 AM

Also, I encourage you to test a build of the linux kernel with __counted_by BEFORE merging this!

Perhaps worth updating the release notes, too.

aaron.ballman requested changes to this revision.Sep 13 2023, 11:25 AM

Thank you for working on this! Please be sure to also add a release note to clang/docs/ReleaseNotes.rst so users know about this. Also adding the codegen and attribute code owners for awareness.

clang/include/clang/AST/Decl.h
4272–4275 ↗(On Diff #556619)

Could this be implemented as: return !field_empty() ? *std::prev(field_end()) : nullptr; ? Then maybe someday we'll get better iterator support that isn't forward-only and this code will automagically get faster.

4277–4282 ↗(On Diff #556619)

We use this pattern fairly often to avoid repeating the implementation.

clang/include/clang/Basic/Attr.td
4235

Clang supports both as extensions in C++, as does GCC: https://godbolt.org/z/5xP7ncoha -- I think we should consider enabling this attribute in C++ mode, but I'm fine considering that in a follow-up.

4244–4247

Teeny tiniest of coding style nits :-D

clang/include/clang/Basic/AttrDocs.td
7196

If we land the changes with the attribute as a C-only attribute, we should document that explicitly here (unless we're immediately enabling it C++ mode; not asking for busywork).

clang/include/clang/Basic/DiagnosticSemaKinds.td
6361–6362

Why is this a warning and not a typical err_no_member error? I think we want that behavior so that we get typo correction for free. e.g.,

struct S {
  int Count;
  int FAM[] __attribute__((counted_by(Clount))); // Should get a "did you mean 'Count'" fix-it
};
6364–6368

We try to wrap syntax elements in single quotes in our diagnostics so it's more visually distinct

clang/lib/AST/ASTImporter.cpp
8891–8892

Should these be castAttrAs so it's more clear that type mismatches don't result in a null pointer, but an assertion?

clang/lib/AST/DeclBase.cpp
443–444 ↗(On Diff #556619)

Can you explain a bit about what this code is for? I didn't spot any test coverage for it, but perhaps I missed something.

clang/lib/CodeGen/CGBuiltin.cpp
857 ↗(On Diff #556619)

We don't generally use top-level const on local variables.

886 ↗(On Diff #556619)
clang/lib/Sema/SemaDecl.cpp
17901–17903

Ask and you shall receive! :-D

17913

This logic seems like it should live in SemaDeclAttr.cpp when handling the attribute. We're given the declaration the attribute is applied to, so we can do everything we need from that (the FieldDecl has a getParent() so we don't need the RecordDecl to be passed in. Is there a reason we need it to live here instead?

17936

Errr any integer type?

struct S {
  bool HerpADerp;
  int FAM[] __attribute__((counted_by(HerpADerp)));
};

seems like something we don't want, but I also wonder about enumeration types and more interestingly, plain char where it could be treated as signed or unsigned depending on compiler flags.

clang/test/Misc/warning-flags.c
21 ↗(On Diff #556619)

One line up: "The list of warnings below should NEVER grow. It should gradually shrink to 0." ;-)

That said, I don't think this warning needs to be added because I think we have an existing error that is more appropriate.

clang/test/Sema/attr-counted-by.c
1 ↗(On Diff #556619)

Is the triple necessary? Also, do we have to enable -fsanitize=array-bounds to test this?

21 ↗(On Diff #556619)

Please add test with the other strange FAM-like members, as well as one that is just a trailing constant array of size 2.

Some more tests or situations to consider:

struct S {
  struct {
    int NotInThisStruct;
  };
  int FAM[] __counted_by(NotInThisStruct); // Should this work?
};

int Global;
struct T {
  int FAM[] __counted_by(Global); // Should fail per your design but is that the behavior you want?
};

struct U {
  struct {
    int Count;
  } data;
  int FAM[] __counted_by(data.Count); // Thoughts?
};

struct V {
  int Sizes[2];
  int FAM[] __counted_by(Sizes[0]); // Thoughts?
};

(I'm not suggesting any of this needs to be accepted in order to land the patch.)

You should also have tests showing the attribute is diagnosed when applied to something other than a field, given something other than an identifier, is given zero arguments, and is given two arguments.

This revision now requires changes to proceed.Sep 13 2023, 11:25 AM
nickdesaulniers added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
857 ↗(On Diff #556619)

Do we explicitly document this in the style guide? This came up for me recently in a code review (cc @jyknight ).
https://llvm.org/docs/CodingStandards.html

aaron.ballman added inline comments.Sep 13 2023, 11:38 AM
clang/lib/CodeGen/CGBuiltin.cpp
857 ↗(On Diff #556619)

It's the prevailing style in our files rather than a hard rule. In Clang, we seem to have organically landed on "const members are fine, const locals are not fine, do not use globals". Our const-correctness story is terrible and if we had better const correct interfaces, I think we'd have landed somewhere different.

clang/lib/CodeGen/CGBuiltin.cpp
857 ↗(On Diff #556619)

prevailing styles should be documented explicitly in the style guide. That settles all discussions, now and in the future.

Or perhaps an RFC on discourse.

erichkeane added inline comments.Sep 13 2023, 11:56 AM
clang/include/clang/AST/DeclBase.h
482 ↗(On Diff #556619)

Why isn't this a member function?

clang/include/clang/Basic/Attr.td
4244

Should we instead be capturing the field itself, rather than its location? It seems to me that would be more useful?

clang/include/clang/Basic/AttrDocs.td
7199

As this is the purpose of this attribute, it seems to me that the documentation should headline/more obviously highlight that the purpose here is to improve the result of the sanitizer when it comes to flexible array members.

clang/include/clang/Basic/DiagnosticSemaKinds.td
6364

This one isn't clear... something more specifically about 'cannot point to itself' is perhaps more useful/explainatory.

Also, the beginning of all of these is really quite awkward as well, perhaps something like:

field %0 referenced by 'counted_by' attribute ... ?

clang/lib/AST/DeclBase.cpp
482 ↗(On Diff #556619)

Whats going on here? Is this simply an attempt to see if this is the last one? Can we combine this effort with the 'getLastFieldDecl' above? Alternatively, can we get a good comment here?

void updated this revision to Diff 556734.Sep 13 2023, 4:44 PM
void marked 36 inline comments as done.

Added more error messages. Changed some code around to align with coding practices. Added some more test cases.

clang/include/clang/AST/Decl.h
4272–4275 ↗(On Diff #556619)

Using std::prev on a forward iterator won't work:

https://stackoverflow.com/questions/23868378/why-stdprev-does-not-fire-an-error-with-an-iterator-of-stdunordered-set

std::prev itself is defined only for bidirectional iterators:

 template<typename _BidirectionalIterator>
    _GLIBCXX_NODISCARD
    inline _GLIBCXX17_CONSTEXPR _BidirectionalIterator
    prev(_BidirectionalIterator __x, typename
         iterator_traits<_BidirectionalIterator>::difference_type __n = 1)
    {
...
4277–4282 ↗(On Diff #556619)

Done. I just hate the use of const_cast...

clang/include/clang/AST/DeclBase.h
482 ↗(On Diff #556619)

Because I want to be able to call it with a nullptr. I'll document this.

clang/include/clang/Basic/Attr.td
4235

I understand we can accept it as an extension, but yeah let's burn that bridge when we come to it.

4244

I tried that before and it didn't work. The issue is that at the time of parsing we don't yet have the full definition of the structure / union. So the Decl's aren't really available to us yet. There may be a way to massage the parsing code to allow this to happen, but I'd like to do it as a separate patch. I'll document it with a FIXME.

clang/include/clang/Basic/AttrDocs.td
7196

That's not something normally done for other COnly attributes. (However, the documentation shows those attributes being available for C++11, which is probably a mistake...) I went ahead and added that this is for C though.

7199

I'm not sure how much more I could do to highlight that. :-)

7209

This was a suggestion by a GCC developer as well. I personally hate that syntax. Regardless, it's something to be done in the future. :-)

clang/include/clang/Basic/DiagnosticSemaKinds.td
6361–6362

I thought it would be too harsh to make this an error. But I suppose there's no harm in doing this. I'm not sure how to implement a fixit for a closely matching identifier. Do you have an example of how to find the closest matching Decl?

6364

I reworded them a bit.

As for referring to itself, it could be a bit confusing to say that it 'cannot point to itself' because the 'itself' in that is the attribute, not the flexible array member. I think it's more-or-less clear what the error's referring to. The user can use this attribute only with a flexible array member. So they should know what what's meant by the message.

clang/lib/AST/ASTImporter.cpp
8891–8892

Sure.

clang/lib/AST/DeclBase.cpp
443–444 ↗(On Diff #556619)

Short answer is I don't know. :-) This is copied from the original implementation of this method.

482 ↗(On Diff #556619)

It's not really the same as the getLastField method. That one will return a non-null pointer whether or not the last field is a FAM. This code is just shorthand for getting the iterator for the FAM and then quickly testing if it's the final one. Again, this is because of the forward iterator that RecordDecl uses, for some reason. If we had bidirectional iterators, this would be much clearer.

clang/lib/CodeGen/CGBuiltin.cpp
857 ↗(On Diff #556619)

Easy enough to do. Done.

clang/lib/Sema/SemaDecl.cpp
17901–17903

Nice!! :-D

17913

I put it here only because it's close to where it's called. I'm ambivalent on where it should live. Am I performing the check in the correct place?

Changed to use getParent().

17936

Yeah...unless they want just 1 element. ;-) I can rule a bool out. (Done)

A char should be fine. Dunno about signed vs. unsigned...

clang/test/Misc/warning-flags.c
21 ↗(On Diff #556619)

Awww....gatekeeping. ;-) Reverted.

clang/test/Sema/attr-counted-by.c
1 ↗(On Diff #556619)

I suppose their not 100% necessary. I'll remove them.

21 ↗(On Diff #556619)

I added a "this isn't a flexible array member" error message.

struct S {
  struct {
    int NotInThisStruct;
  };
  int FAM[] __counted_by(NotInThisStruct); // Should this work?
};

Yes, I believe it should. Kees had a similar example:

struct S {
  int x;
  struct {
    int count;
    int FAM[] __counted_by(count);
  };
};

I made changes to support this. It may be controversial, so please let me know if you or anyone has an issue with it.

int Global;
struct T {
  int FAM[] __counted_by(Global); // Should fail per your design but is that the behavior you want?
};

Yes, it should fail. I'll add a test case.

struct U {
  struct {
    int Count;
  } data;
  int FAM[] __counted_by(data.Count); // Thoughts?
};

I don't think we should support that at the moment. Referencing arbitrary fields in a nested structure is very complicated and the syntax for it is not at all settled on.

struct V {
  int Sizes[2];
  int FAM[] __counted_by(Sizes[0]); // Thoughts?
};

Hmm...Not sure. I'll ask the GCC person who's implementing this on her side about this.

void updated this revision to Diff 556749.Sep 14 2023, 1:48 AM

Add release note.

aaron.ballman added inline comments.Sep 14 2023, 6:29 AM
clang/include/clang/AST/Decl.h
4272–4275 ↗(On Diff #556619)

Drat! Oh well, it was a lovely thought.

4277–4282 ↗(On Diff #556619)

Yeah, I usually go take a shower after that sort of thing. :-D

clang/include/clang/Basic/Attr.td
4244

I think at some point we're going to want this to take an Expr argument instead of an identifier argument, so we can support foo.bar and baz[0], etc as arguments for more complicated situations. But I believe the current form is reasonable as-is (we can start taking an expression and all of these identifiers should become a MemberExpr, so users don't have to change their code).

clang/include/clang/Basic/AttrDocs.td
7196

Heh, our attribute documentation has been slowly getting better over time but is still pretty inconsistent; thank you for adding it!

clang/include/clang/Basic/DiagnosticSemaKinds.td
6361–6362

I'll leave a comment below with what I'm thinking.

clang/lib/AST/DeclBase.cpp
443–444 ↗(On Diff #556619)

Good enough for me, thank you. :-) (I made that comment before I realized this was a refactor, I forgot to come back and delete it.)

clang/lib/Sema/SemaDecl.cpp
17933

When the lookup result is not found, I think you should call Sema::DiagnoseEmptyLookup(). This will handle giving you the error about the member not being found, but it should also give you typo correction for free, and if typo correction happens, you should then already know what new declaration was found via the correction, so you can continue to check for the other cases from there.

https://github.com/llvm/llvm-project/blob/b57df9fe9a1a230f277d671bfa0884bbda9fc1c5/clang/lib/Sema/SemaLambda.cpp#L1151 is an example of this being called elsewhere.

17936

The char situation I'm thinking about is morally: https://godbolt.org/z/377WPYojz

With -fsigned-char, accessing Idx means you get the value -1 and with -funsigned-char, you get the value 255. If that was used to index into FAM, the signed char version would become a bounds access issue whereas the unsigned char version works fine, so it's hard to write a portable attribute that refers to a char member.

Then again, the counted_by attribute is intended to catch exactly this kind of issue, so perhaps it's fine on char?

void added inline comments.Sep 14 2023, 1:34 PM
clang/include/clang/Basic/Attr.td
4244

I think that GCC may not be able to use expressions (apart from simple, resolvable integer arithmetic). But yeah, those types of expansions would be nice. There are many issues to resolve with them first---such as how to reference a field in a sub-struct when the fam is in another substruct:

struct S {
  struct A {
    int count;
  } a;
  struct C {
    int count;
    struct X {
      int count;
    } x;
  } c;
  struct F {
    int fam[] __counted_by(/* how to refer to p->c.x.count? */);
  } f;
};
clang/lib/Sema/SemaDecl.cpp
17933

GAAAA! It's just like the rest of Clang: There are a million ways to do something, but none of them are documented and only one of them works for your particular case...kind of...and it takes a ton of arguments which seem counter-intuitive (CXXScopeSpec instead of DeclSpec?). All of Clang has been over-engineered from the beginning and it shows in how hard it is to do even the most basic things. So hard that people write the same things over and over with only small changes (see the two designated initializers implementations) leading to a nightmare.

The example you pointed to doesn't work. And none of the other forms seem to fit well enough. There are apparently many ways to "correct" a typo, but none of them seemed appropriate or like they'd do what I want. Is it really necessary to have this?

17936

True. But we technically have that issue with every type. Maybe it'd be best to make sure we don't have a negative result?

kees added a comment.Sep 15 2023, 2:37 PM

Added more error messages. Changed some code around to align with coding practices. Added some more test cases.

Doing a full "allmodconfig" Linux kernel build with my 200ish __counted_by annotations is building cleanly. Yay! :)

clang/test/Sema/attr-counted-by.c
21 ↗(On Diff #556619)

Yes, I believe it should. Kees had a similar example:

struct S {
  int x;
  struct {
    int count;
    int FAM[] __counted_by(count);
  };
};

I made changes to support this. It may be controversial, so please let me know if you or anyone has an issue with it.

FWIW, the Linux kernel has a LOT of this style (in an anonymous struct), which is why I wanted to make sure it was supported in this first version of the feature.

void added a comment.Sep 15 2023, 3:37 PM

Added more error messages. Changed some code around to align with coding practices. Added some more test cases.

Doing a full "allmodconfig" Linux kernel build with my 200ish __counted_by annotations is building cleanly. Yay! :)

Great! That means I did something wrong.

;-)

struct V {
  int Sizes[2];
  int FAM[] __counted_by(Sizes[0]); // Thoughts?
};

-fbounds-safety doesn't allow this. In our internal adoption experience, we haven't encountered such use cases yet. So, I think it's best to make the model restrictive to avoid surprises. If we were to support it, I think it should at least be limited to cases where the array subscript expression is known to be in bounds at compile time, to avoid an OOB access when the counted_by argument is evaluated.

-fbounds-safety doesn't allow this. In our internal adoption experience, we haven't encountered such use cases yet. So, I think it's best to make the model restrictive to avoid surprises. If we were to support it, I think it should at least be limited to cases where the array subscript expression is known to be in bounds at compile time, to avoid an OOB access when the counted_by argument is evaluated.

Additionally: it is probably safe from an aliasing perspective (or at least not worse than using any other field) to use an array subscript in a count expression, provided the array's storage exists within the struct. However, we certainly wouldn't want people to go towards array[variable], pointer[anything], or (worse!) FAM[anything], and constant array subscripts are confusingly adjacent to the boundary we need to close. If we're just entertaining the possibility without motivating use cases at this time, I'd advise to leave it be.

I've got no further comments/concerns.

clang/include/clang/AST/Decl.h
4272–4275 ↗(On Diff #556619)

something like !field_empty() ? *std::advance(field_begin(), std::distance(field_begin(), field_end() - 1) : nullptr

would do what Aaron would like, though perhaps with an extra run through the fields list until then.

clang/include/clang/Basic/DiagnosticSemaKinds.td
6364

I think the reword helps, I am less confident that referring to the 'flexible array member' is clear. One of the things I've noticed in the past is users 'trying' attributes without looking them up to see what they do. Frankly, I think that should be a supported strategy, and one that we manage by making clear diagnostics.

I'm up for suggestions/hopeful someone else will come up with something better, but I can hold my nose at this if we don't have anything better (and in fact, my attempts to put a strawman up were at least as bad).

void added inline comments.Sep 18 2023, 1:32 PM
clang/include/clang/AST/Decl.h
4272–4275 ↗(On Diff #556619)

Yeah, I want to avoid unnecessary iterations through the list.

clang/include/clang/Basic/DiagnosticSemaKinds.td
6364

I added a diagnostic (the first one) that indicates it must apply to a FAM. So maybe it'll be okay? I'm open to other wording...

void added a comment.Sep 21 2023, 12:31 PM

Friendly Ping.

Friendly Ping.

Aaron is away at a conference this week, so hopefully he'll do another pass when he gets back next week.

pirama added a subscriber: pirama.Sep 25 2023, 1:47 PM
void updated this revision to Diff 557415.Sep 27 2023, 11:43 AM

Use CBA as an acronym of CountedByAttribute instead of EBA, which is oldspeak.

aaron.ballman added inline comments.Sep 28 2023, 9:05 AM
clang/include/clang/Basic/Attr.td
4244

FWIW, my intuition is that we'd want something close to:

struct S {
  struct A {
    int count;
  } a;
  struct C {
    int count;
    struct X {
      int count;
    } x;
  } c;
  struct F {
    int fam[] __counted_by(c.x.count);
  } f;
};

but this is in follow-up territory.

clang/lib/Sema/SemaDecl.cpp
17913

I prefer it to live in SemaDeclAttr.cpp as that's where the other attribute checking logic generally lives; it makes it slightly easier for us to find the checking logic in the future.

17933

Necessary? No. A better implementation of the feature? I think so; we want the user experience to be consistent whenever possible and it's pretty unfortunate that users would get typo correction elsewhere but not here.

That said, I don't think this patch needs to be gated on that; we can add a FIXME to use a more general mechanism instead of even more custom logic.

17936

So at CodeGen time we see if we know the value within the field, and if it's known to be negative, we diagnose? Or are you thinking of something else?

void updated this revision to Diff 557462.Sep 28 2023, 1:02 PM
void marked an inline comment as done.

Move attribute checking over to SemaDeclAttr.cpp

void added inline comments.Sep 28 2023, 4:19 PM
clang/include/clang/Basic/Attr.td
4244

I'm similarly inclined, though I'd like to adopt the "designator syntax" and add a dot (.) before it: .c.x.count. But yeah, that's something for future us to decide. :-)

clang/lib/Sema/SemaDecl.cpp
17913

Okay. I moved it over there.

17933

I'm still not able to get it to work for me. Is there any documentation on how to work with that feature?

17936

No, because we generally won't know the value in this field at codegen time. I meant it as a runtime check. I'm going to see what GCC does.

void updated this revision to Diff 557473.Sep 28 2023, 5:28 PM

Add a "FIXME" to improve the diagnostics with a typo hint.

void updated this revision to Diff 557496.Sep 29 2023, 12:29 PM

FINALLY! found out how to do suggestions for typos.

void added a comment.Sep 29 2023, 12:29 PM

Okay, I now have a suggested fix hint. PTAL.

FINALLY! found out how to do suggestions for typos.

<3 Thank you for your perseverance!

clang/lib/Sema/SemaDeclAttr.cpp
8429–8437

The logic here still seems incorrect. I was expecting the code to look more like this:

bool Sema::CheckCountedByAttr(Scope *S, const FieldDecl *FD) {
  const RecordDecl *RD = FD->getParent();
  const auto *CBA = FD->getAttr<CountedByAttr>();
  const IdentifierInfo *FieldName = CBA->getCountedByField();
  DeclarationNameInfo NameInfo(FieldName,
                               CBA->getCountedByFieldLoc().getBegin());
  LookupResult Result(*this, NameInfo, Sema::LookupMemberName);

  LookupName(Result, S);
  if (Result.empty()) {
    CXXScopeSpec SS;
    DeclFilterCCC<FieldDecl> Filter(const_cast<IdentifierInfo *>(FieldName));
    if (DiagnoseEmptyLookup(S, SS, Result, Filter, nullptr, std::nullopt,
                            const_cast<DeclContext *>(FD->getDeclContext())))
      return true;
  }

  const FieldDecl *Field = Result.getAsSingle<FieldDecl>();
  LangOptions::StrictFlexArraysLevelKind StrictFlexArraysLevel =
      Context.getLangOpts().getStrictFlexArraysLevel();
  ...

I tested this locally on code like:

struct not_a_fam {
  int foo;
  int fam[] __attribute__((counted_by(fob)));
};

and get a diagnostic like:

C:\Users\aballman\OneDrive - Intel Corporation\Desktop\test.c:3:39: error: use of undeclared identifier 'fob'; did you
      mean 'foo'?
    3 |   int fam[] __attribute__((counted_by(fob)));
      |                                       ^~~
      |                                       foo
C:\Users\aballman\OneDrive - Intel Corporation\Desktop\test.c:2:7: note: 'foo' declared here
    2 |   int foo;
      |       ^
1 error generated.

Note, I had to add a constructor to DeclFilterCCC to expose the base class constructor, and modify DiagnoseEmptyLookup() like this:

diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 2ed31a90c5dc..3c4ade391a5e 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -2458,7 +2458,8 @@ bool Sema::DiagnoseDependentMemberLookup(const LookupResult &R) {
 bool Sema::DiagnoseEmptyLookup(Scope *S, CXXScopeSpec &SS, LookupResult &R,
                                CorrectionCandidateCallback &CCC,
                                TemplateArgumentListInfo *ExplicitTemplateArgs,
-                               ArrayRef<Expr *> Args, TypoExpr **Out) {
+                               ArrayRef<Expr *> Args, DeclContext *LookupCtx,
+                               TypoExpr **Out) {
   DeclarationName Name = R.getLookupName();

   unsigned diagnostic = diag::err_undeclared_var_use;
@@ -2474,7 +2475,9 @@ bool Sema::DiagnoseEmptyLookup(Scope *S, CXXScopeSpec &SS, LookupResult &R,
   // unqualified lookup.  This is useful when (for example) the
   // original lookup would not have found something because it was a
   // dependent name.
-  DeclContext *DC = SS.isEmpty() ? CurContext : nullptr;
+  DeclContext *DC =
+      LookupCtx ? LookupCtx : (SS.isEmpty() ? CurContext : nullptr);
+  DeclContext *OrigLookupCtx = DC;
   while (DC) {
     if (isa<CXXRecordDecl>(DC)) {
       LookupQualifiedName(R, DC);
@@ -2517,12 +2520,12 @@ bool Sema::DiagnoseEmptyLookup(Scope *S, CXXScopeSpec &SS, LookupResult &R,
           emitEmptyLookupTypoDiagnostic(TC, *this, SS, Name, TypoLoc, Args,
                                         diagnostic, diagnostic_suggest);
         },
-        nullptr, CTK_ErrorRecovery);
+        nullptr, CTK_ErrorRecovery, OrigLookupCtx);
     if (*Out)
       return true;
-  } else if (S &&
-             (Corrected = CorrectTypo(R.getLookupNameInfo(), R.getLookupKind(),
-                                      S, &SS, CCC, CTK_ErrorRecovery))) {
+  } else if (S && (Corrected = CorrectTypo(R.getLookupNameInfo(),
+                                           R.getLookupKind(), S, &SS, CCC,
+                                           CTK_ErrorRecovery, OrigLookupCtx))) {
     std::string CorrectedStr(Corrected.getAsString(getLangOpts()));
     bool DroppedSpecifier =
         Corrected.WillReplaceSpecifier() && Name.getAsString() == CorrectedStr;
@@ -2812,7 +2815,7 @@ Sema::ActOnIdExpression(Scope *S, CXXScopeSpec &SS,
     // a template name, but we happen to have always already looked up the name
     // before we get here if it must be a template name.
     if (DiagnoseEmptyLookup(S, SS, R, CCC ? *CCC : DefaultValidator, nullptr,
-                            std::nullopt, &TE)) {
+                            std::nullopt, nullptr, &TE)) {
       if (TE && KeywordReplacement) {
         auto &State = getTypoExprState(TE);
         auto BestTC = State.Consumer->getNextCorrection();

Can you give something like this a shot and see how well it works for you?

void added inline comments.Oct 3 2023, 2:02 AM
clang/lib/Sema/SemaDeclAttr.cpp
8429–8437

I wanted to check for two different issues when the field isn't found:

  1. The field is outside of the struct, or
  2. The field isn't found (or is misspelled).

If I do your version, that distinction goes away.

But if you want me to work on this more, would you mind if I did it in a follow-up commit? I'm not at all happy about adding yet another parameter to a method that already has a million parameters. It would be nice if that was cleaned up.

aaron.ballman added inline comments.Oct 3 2023, 6:31 AM
clang/lib/Sema/SemaDeclAttr.cpp
8429–8437

I wanted to check for two different issues when the field isn't found:

  1. The field is outside of the struct, or
  2. The field isn't found (or is misspelled).

If I do your version, that distinction goes away.

Correct, my version is only looking for the field as a member name because we never want it to find a variable outside the scope of the structure. We could augment it to do what you want though:

bool Sema::CheckCountedByAttr(Scope *S, const FieldDecl *FD) {
  const RecordDecl *RD = FD->getParent();
  const auto *CBA = FD->getAttr<CountedByAttr>();
  const IdentifierInfo *FieldName = CBA->getCountedByField();
  DeclarationNameInfo NameInfo(FieldName,
                               CBA->getCountedByFieldLoc().getBegin());
  LookupResult Result(*this, NameInfo, Sema::LookupMemberName);

  LookupName(Result, S);
  if (Result.empty()) {
    // Look for the name outside of the scope of the structure so we can issue a different
    // diagnostic in that case.
    LookupResult BackupResult(*this, NameInfo, Sema::LookupOrdinaryName);
    LookupName(BackupResult, S);
    if (!BackupResult.empty()) {
      return Diag(Loc, diag::whatever);
    } else {
      // Otherwise, diagnose the empty lookup.
      CXXScopeSpec SS;
      DeclFilterCCC<FieldDecl> Filter(const_cast<IdentifierInfo *>(FieldName));
      if (DiagnoseEmptyLookup(S, SS, Result, Filter, nullptr, std::nullopt,
                              const_cast<DeclContext *>(FD->getDeclContext())))
        return true;
    }
  }

  const FieldDecl *Field = Result.getAsSingle<FieldDecl>();
  LangOptions::StrictFlexArraysLevelKind StrictFlexArraysLevel =
      Context.getLangOpts().getStrictFlexArraysLevel();
  ...

But if you want me to work on this more, would you mind if I did it in a follow-up commit? I'm not at all happy about adding yet another parameter to a method that already has a million parameters. It would be nice if that was cleaned up.

I don't insist on doing it as part of this patch but I do strongly prefer doing it now. Waiting runs the risk of not actually doing that work, but also, I worry that the lookups will be subtly different and we might risk breaking code by changing it later. That said, if you mean "(almost) immediately after landing this" kind of follow-up, I'm not worried about the breakage and that's fine.

Yes, I mean to do it as a direct follow-up. 😊

-bw

aaron.ballman accepted this revision.Oct 4 2023, 5:38 AM

Yes, I mean to do it as a direct follow-up. 😊

Okay, let's go that route then. This gets a fairly significant review off Phab with incremental progress, so LGTM!

This revision is now accepted and ready to land.Oct 4 2023, 5:38 AM
This revision was automatically updated to reflect the committed changes.