Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[clang] Introduce -fstrict-flex-arrays=<n> for stricter handling of flexible arrays
ClosedPublic

Authored by serge-sans-paille on Jun 2 2022, 2:34 AM.

Details

Summary

Some code [0] consider that trailing arrays are flexible, whatever their size.
Support for these legacy code has been introduced in
f8f632498307d22e10fab0704548b270b15f1e1e but it prevents evaluation of
builtin_object_size and builtin_dynamic_object_size in some legit cases.

Introduce -fstrict-flex-arrays=<n> to have stricter conformance when it is
desirable.

n = 0: current behavior, any trailing array member is a flexible array. The default.
n = 1: any trailing array member of undefined, 0 or 1 size is a flexible array member
n = 2: any trailing array member of undefined or 0 size is a flexible array member

Similar patch for gcc discuss here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836

[0] https://docs.freebsd.org/en/books/developers-handbook/sockets/#sockets-essential-functions

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

restore blank line

MaskRay added inline comments.Jun 27 2022, 5:08 PM
clang/include/clang/Driver/Options.td
1143
clang/test/CodeGen/bounds-checking-fma.c
1 ↗(On Diff #440396)

I realized that I made a typo. It should be fam instead of fma....

I precomitted a test for the default case. You may add -fstrict-flex-arrays={1,2} lines on top of it.

clang/test/CodeGen/object-size-flex-array.c
100

see https://reviews.llvm.org/D128643 "[clang] -fsanitize=array-bounds: treat all trailing size-1 arrays as flexible" for another regression

For the record (now that the original commit has been reverted anyway for now):

Besides the above regression for -fsanitize=array-bounds and macro'ized array size in HarfBuzz, I also ran into yet another regression when Firebird is built with -fsanitize=array-bounds: That project, in https://github.com/FirebirdSQL/firebird/blob/master/src/lock/lock_proto.h uses a trailing member

	srq lhb_hash[1];			// Hash table

as a flexible array, but declared in a

struct lhb : public Firebird::MemoryHeader

that is not a standard-layout class (because the Firebird::MemoryHeader base class also declares non-static data members). And isFlexibleArrayMember returns false if the surrounding class is not a standard-layout class.

In the end, what brought back a successful -fsanitize=array-bounds LibreOffice (which bundles those HarfBuzz and Firebird, among others) for me was to exclude the whole set of blocks

// Don't consider sizes resulting from macro expansions or template argument
// substitution to form C89 tail-padded arrays.

TypeSourceInfo *TInfo = FD->getTypeSourceInfo();
while (TInfo) {
  TypeLoc TL = TInfo->getTypeLoc();
  // Look through typedefs.
  if (TypedefTypeLoc TTL = TL.getAs<TypedefTypeLoc>()) {
    const TypedefNameDecl *TDL = TTL.getTypedefNameDecl();
    TInfo = TDL->getTypeSourceInfo();
    continue;
  }
  if (ConstantArrayTypeLoc CTL = TL.getAs<ConstantArrayTypeLoc>()) {
    const Expr *SizeExpr = dyn_cast<IntegerLiteral>(CTL.getSizeExpr());
    if (!SizeExpr || SizeExpr->getExprLoc().isMacroID())
      return false;
  }
  break;
}

const ObjCInterfaceDecl *ID =
    dyn_cast<ObjCInterfaceDecl>(FD->getDeclContext());
const RecordDecl *RD = dyn_cast<RecordDecl>(FD->getDeclContext());
if (RD) {
  if (RD->isUnion())
    return false;
  if (const CXXRecordDecl *CRD = dyn_cast<CXXRecordDecl>(RD)) {
    if (!CRD->isStandardLayout())
      return false;
  }
} else if (!ID)
  return false;

(by means of an additional bool parameter) when isFlexibleArrayMember is called from getArrayIndexingBound (in clang/lib/CodeGen/CGExpr.cpp).

see https://reviews.llvm.org/D128643 "[clang] -fsanitize=array-bounds: treat all trailing size-1 arrays as flexible" for another regression

For the record (now that the original commit has been reverted anyway for now):

Besides the above regression for -fsanitize=array-bounds and macro'ized array size in HarfBuzz, I also ran into yet another regression when Firebird is built with -fsanitize=array-bounds: That project, in https://github.com/FirebirdSQL/firebird/blob/master/src/lock/lock_proto.h uses a trailing member

	srq lhb_hash[1];			// Hash table

as a flexible array, but declared in a

struct lhb : public Firebird::MemoryHeader

that is not a standard-layout class (because the Firebird::MemoryHeader base class also declares non-static data members). And isFlexibleArrayMember returns false if the surrounding class is not a standard-layout class.

In the end, what brought back a successful -fsanitize=array-bounds LibreOffice (which bundles those HarfBuzz and Firebird, among others) for me was to exclude the whole set of blocks

// Don't consider sizes resulting from macro expansions or template argument
// substitution to form C89 tail-padded arrays.

TypeSourceInfo *TInfo = FD->getTypeSourceInfo();
while (TInfo) {
  TypeLoc TL = TInfo->getTypeLoc();
  // Look through typedefs.
  if (TypedefTypeLoc TTL = TL.getAs<TypedefTypeLoc>()) {
    const TypedefNameDecl *TDL = TTL.getTypedefNameDecl();
    TInfo = TDL->getTypeSourceInfo();
    continue;
  }
  if (ConstantArrayTypeLoc CTL = TL.getAs<ConstantArrayTypeLoc>()) {
    const Expr *SizeExpr = dyn_cast<IntegerLiteral>(CTL.getSizeExpr());
    if (!SizeExpr || SizeExpr->getExprLoc().isMacroID())
      return false;
  }
  break;
}

const ObjCInterfaceDecl *ID =
    dyn_cast<ObjCInterfaceDecl>(FD->getDeclContext());
const RecordDecl *RD = dyn_cast<RecordDecl>(FD->getDeclContext());
if (RD) {
  if (RD->isUnion())
    return false;
  if (const CXXRecordDecl *CRD = dyn_cast<CXXRecordDecl>(RD)) {
    if (!CRD->isStandardLayout())
      return false;
  }
} else if (!ID)
  return false;

(by means of an additional bool parameter) when isFlexibleArrayMember is called from getArrayIndexingBound (in clang/lib/CodeGen/CGExpr.cpp).

Oh, I did not see this when pushing a test efd90ffbfc427ad4c4675ac1fcae9d53cc7f1322 . Consider adding additional tests in clang/test/CodeGen/bounds-checking-fam.c.
It's worth bringing up such issue to the project. GCC and Clang have been working around them so far but they should eventually be fixed.
For CPython I try to be loud on https://github.com/python/cpython/issues/84301

GCC and Clang don't have the same behavior wrt. macros-as-abound and standard-layout-requirement, see https://godbolt.org/z/3vc4TcTYz
I'm fine with keeping the CLang behavior, but do we want to keep it only for level=0, and drop it for higher level (this would look odd to me).

I'd say that we keep it for level 0,1,2, but then I think we should *not* make it an option of the isFlexibleArrayMember method. That would allow for inconsistent behavior wrt. FAM across the codebase and I'm not a fan of it.

Thoughts?

Oh, I did not see this when pushing a test efd90ffbfc427ad4c4675ac1fcae9d53cc7f1322 . Consider adding additional tests in clang/test/CodeGen/bounds-checking-fam.c.

see https://reviews.llvm.org/D128783 "Check for more -fsanitize=array-bounds regressions"

It's worth bringing up such issue to the project. GCC and Clang have been working around them so far but they should eventually be fixed.

All the issues I encountered were in C++ code, and I'm not sure replacing one non-standard behavior (treating trailing length-one arrays as flexible) with another (C flexible array members) is something I feel confident to suggest to those projects.

serge-sans-paille added a subscriber: chandlerc.

Code updated to take into account two situations:

  • size resulting from macro expansion. Previous behavior was inconsistent in that situation. I chose to consider that int a[1] and int a[N] where N is a macro definition both are FAM. That's GCC behavior too. CCing @chandlerc because he introduced that feature, probably because that could make an array a FAM just based on preprocessor flag (?)
  • FAM within non-standard layout C++ object. GCC allows that behavior and clang used to have an inconsistent approach here. I'd like to stick to GCC behavior.

Note that we still diverge from GCC behavior for int a[N] when N results from template expansion. I could revert that too.

GCC and Clang don't have the same behavior wrt. macros-as-abound and standard-layout-requirement, see https://godbolt.org/z/3vc4TcTYz
I'm fine with keeping the CLang behavior, but do we want to keep it only for level=0, and drop it for higher level (this would look odd to me).

If you're referring to the warning, GCC needs -O2 to issue most instances of -Warray-bounds. The warning also treats one-element trailing arrays as flexible array members, so it's intentionally silent about the test case. GCC should update the warning to reflect the -fstrict-flex-arrays=N level. (Macros are long expanded by the time the code reaches the middle end so they don't come into play, except perhaps when they come from system headers.)

Do *not* try to syndicate code across different location. Alhtough that would be benefitial to the project, it also makes this patch too complex and controversial because it would imply regression on some behaviors.

Instead, focus on compatibility with existing tests, and refine existing behavior wrt. bound size.

As I commented on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836#c32,

It doesn't make sense to have a mode in which int array[0] is accepted but is not a flex array.
Either that should be a compilation error (as the standard specifies), or it should be a flex array. Accepting it as an extension but having it do the wrong thing is not useful or helpful.
Note that Clang has a dedicated warning flag for zero-length arrays: -Wzero-length-array, so anyone who wants to prohibit them may use -Werror=zero-length-array. It would be helpful for GCC could follow suit there.

The -fstrict-flex-arrays=3 option should be removed.

jyknight added inline comments.Jul 6 2022, 9:55 AM
clang/docs/ClangCommandLineReference.rst
2648

Docs should also mention what the default -fno-strict-flex-arrays means -- that ALL sizes of trailing arrays are considered flexible array members. (I'm amazed that's the rule, and I never knew it. I always thought the special casing for FAMs was restricted to sizes 0 and 1!)

Also, since apparently different parts of the compiler have been (and will now continue to) use different default behaviors, may want to document that as well. I'm sure I don't know what the rules actually are intended to be here. E.g. that a macro-expansion of the size arg disables the special-behavior for [1] is extremely surprising!

clang/lib/Sema/SemaChecking.cpp
15804

Presumably the size-zero/unsized cases are already being taken care of elsewhere in the code? I find it hard to believe we are currently emitting diagnostics for size-0 FAM which we don't emit for size-1 FAM?

clang/test/SemaCXX/array-bounds-strict-flex-arrays.cpp
10

Except we actually _do_ know the bounds of the full-object and ought to be able to warn on this code anyhow...

Better to have the test function accept a pointer, so that's not a conflating issue?

clang/docs/ClangCommandLineReference.rst
2648

it is worse than that: for some checks, any size is valid for FAM, but not for alls. For some checks, macro expansion prohibits FAM, but not for all, etc, etc. I don't want to document that behavior, because it is too specific to each pass. My plan is

  1. land -fstrict-flex-array support
  2. launch a thread on the ugly situation we put ourselves in, and extract a decision for each case
  3. support and document an homogeneous behavior across passes.
  4. syndicate code across passes
clang/lib/Sema/SemaChecking.cpp
15804

correct

15969

And here

15975

Handled here

clang/test/SemaCXX/array-bounds-strict-flex-arrays.cpp
10

Correct

serge-sans-paille edited the summary of this revision. (Show Details)

Take into account @jyknight review.

serge-sans-paille edited the summary of this revision. (Show Details)

Remove debug code

jyknight added inline comments.Jul 7 2022, 2:34 PM
clang/lib/CodeGen/CGExpr.cpp
895

Similar to SemaChecking below, could use a comment like:

FIXME: While the default -fstrict-flex-arrays=0 permits Size>1 trailing arrays to be treated as flexible-array-members, we still emit ubsan checks as if they are not. Pending further discussion...

909

I believe this bit is incorrect -- it should just go back to 'return true;'. The StrictFlexArraysLevel check above already eliminates the cases we want to eliminate (size==1 in strictness-level 2.)

clang/lib/Sema/SemaChecking.cpp
15804

The FIXME comment isn't correct, since only nonzero sizes ever reach this function. The code can be simplified too. Also -- there is another FIXME that should be here, regarding the behavior of size>1 FAMs.

I suggest (with rewrapped comment of course):

if (!ND) return false;
// FIXME: While the default -fstrict-flex-arrays=0 permits Size>1 trailing arrays to be treated as flexible-array-members, we still emit diagnostics as if they are not. Pending further discussion...
if (StrictFlexArraysLevel >= 2 || Size != 1) return false;`
15969

Looks like actually the int x[] case is handled with the large "IsUnboundedArray" condition above not here...

And, actually, all of that code to generate warnings for larger-than-addrspace offsets OUGHT to be getting used for int x[0] and int x[1] flexible arrays, too. Needs another FIXME for that...

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
792

Yuk, another place that is weird and doesn't handle StrictFlexArrays as expected...

serge-sans-paille marked 4 inline comments as done.

Take reviews into account, basically spreading a lot of FIXME across the codebase :'(

@kees are you ok with current state?

clang/lib/CodeGen/CGExpr.cpp
909

Well, if we are in strictness-level 2, with an undefined size or size = 0, we can still reach that path, and don't want to return 'true' because FAM in union are in invalid per the standard.

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
792

I've been able to slightly move that check below, without impacting test suite. Also added a FIXME because Opts.ShouldConsiderSingleElementArraysAsFlexibleArrayMembers looks redundant with -fstrict-flex-arrays

jyknight added inline comments.Jul 11 2022, 11:01 AM
clang/lib/CodeGen/CGExpr.cpp
909

Yes, we can reach this path, which is why the change is incorrect. We should not be changing the FAMness of undefined size, or size == 0, in any of the modes. To be more specific --

union X { int x[0]; }; should still be a FAM in all strictness modes. (if you don't want zero-length-arrays, use -Werror=zero-length-array).

For union X { int x[]; };: this ought to be a compiler error. It's likely just an oversight that we currently accept it; I'd be OK with a (separate) patch to fix that. (GCC emits an error, so there's unlikely to be compatibility issues with such a change.)

clang/lib/Sema/SemaChecking.cpp
15804

You didn't like the code change I suggested there?

15969

Is it redundant? This check is for an _incomplete_ type as the array element type, which doesn't seem directly related to unbounded array sizes. (though it also prevents knowing the size of the array)

15972

What I meant to say is that both the IsTailPaddedMemberArray call, and this code, should be moved up above, in order to set IsUnboundedArray to true, so we will get the checks from that block.

(But doesn't need to be in this patch, fine to just leave a comment to that effect).

kees added inline comments.Jul 11 2022, 11:49 AM
clang/lib/CodeGen/CGExpr.cpp
909

union X { int x[0]; }; should still be a FAM in all strictness modes. (if you don't want zero-length-arrays, use -Werror=zero-length-array).

The Linux kernel cannot use -Wzero-length-array because we have cases of userspace APIs being stuck with them. (i.e. they are part of the struct declaration, even though the kernel code doesn't use them.) For example:

In file included from ../kernel/bounds.c:13:
In file included from ../include/linux/log2.h:12:
In file included from ../include/linux/bitops.h:9:
In file included from ../include/uapi/linux/kernel.h:5:
../include/uapi/linux/sysinfo.h:22:10: error: zero size arrays are an extension [-Werror,-Wzero-length-array]
        char _f[20-2*sizeof(__kernel_ulong_t)-sizeof(__u32)];   /* Padding: libc5 uses this.. */
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kees added a comment.Jul 12 2022, 10:58 AM

@kees are you ok with current state?

Build tests are still looking correct on my end. Thanks!

kees added a comment.Jul 12 2022, 11:01 AM

I should clarify: I still need the =3 mode. Since sizeof([0]) == 0 and sizeof([]) == error, they are being treated differently already by the compiler causing bugs in Linux. The kernel must still have a way to reject the _use_ of a [0] array. We cannot reject _declaration_ of them due to userspace API.

I should clarify: I still need the =3 mode. Since sizeof([0]) == 0 and sizeof([]) == error, they are being treated differently already by the compiler causing bugs in Linux. The kernel must still have a way to reject the _use_ of a [0] array. We cannot reject _declaration_ of them due to userspace API.

Looks like the linux kernel is currently chock-full of zero-length-arrays which are actually intended/required to work as flexible array members. Do you have a pending patch to convert all of them to [] which should be? If you're saying you absolutely need this mode -- which I still believe would be nonsensical to provide -- I'd like to see the set of concrete examples you cannot otherwise deal with.

clang/lib/CodeGen/CGExpr.cpp
909

ISTM you can simply remove this field on 64-bit platforms, without changing the ABI. If you're worried about API breakage, for anything that might use the field-name _f (?), you could make it visible only to user-space or

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wzero-length-array"
...
#pragma GCC diagnostic pop

if really needed.

I'd like to see the set of concrete examples you cannot otherwise deal with.

I'm also very curious, I fail to understand the need of not considering [0] as a FAM, esp. given that to my understanding, this is the only use of that syntax.

kees added a comment.EditedJul 13 2022, 4:34 PM

I should clarify: I still need the =3 mode. Since sizeof([0]) == 0 and sizeof([]) == error, they are being treated differently already by the compiler causing bugs in Linux. The kernel must still have a way to reject the _use_ of a [0] array. We cannot reject _declaration_ of them due to userspace API.

Looks like the linux kernel is currently chock-full of zero-length-arrays which are actually intended/required to work as flexible array members. Do you have a pending patch to convert all of them to [] which should be?

Well, yes and no. It's been a multi-year on-going effort. Though it's actually complete for the kernel now as of v5.18:
https://github.com/KSPP/linux/issues/78

What remains is the userspace headers. The _real_ pain has been the 1-element arrays. Ugh. The master bug is here:
https://github.com/KSPP/linux/issues/21

If you're saying you absolutely need this mode -- which I still believe would be nonsensical to provide -- I'd like to see the set of concrete examples you cannot otherwise deal with.

I have several goals. The most important is making sure all the fixed-sized trailing arrays are NOT treated as flexible arrays so that __builtin_object_size will work as expected for Linux's FORTIFY implementation and -fstrict-flex-arrays=2 certainly solves that. However, then we're left with all the 0-element arrays, which need to be cleaned up because they don't behave the same as true flexible arrays. For example, with sizeof; the kernel has dealt with allocation size bugs relating to sizeof being used against 0-element arrays (which _should_ have caused a build failure if it were a true flexible array and the bug would have been immediately found), and I would like to make sure those can never happen again.

If this were any other code base, I could just use -Wzero-length-array, but the scattered ancient userspace APIs need to live on until $forever.

And if this were any other code base, I could use #pragma, but it's frowned on in general, and is going to be met with even greater resistance in userspace API headers, especially given that it will need to be #ifdefed for clang vs gcc, and then tweaked again once GCC gains the option, which would then need to be version checked. (This is why #pragma is strongly discouraged: Linux has a whole system for doing compiler behavior detection for its internal headers and build system, but such things don't exist for the userspace headers.)

So given the behavioral differences that already existed between true flexible arrays and 0-element arrays, I'd wanted to have a way to turn off _all_ the flexible array "extensions". I never wanted these different levels for -fstrict-flex-arrays. :) I want the complete disabling of all the extensions that were creating inconsistent behaviors. To lose the ability to disable the 0-is-flexible would be disappointing, but if I absolutely have no choice, I can live with it. Given Linux will be the first user of this option, and the reason for this option being created, I'd _much_ rather have it doing what's needed, though. :)

kees added a comment.Jul 13 2022, 4:53 PM

Example of the bug I want to block:

struct foo {
    int stuff;
    u32 data[0];
};

struct foo *deserialize(u8 *str, int len)
{
    struct foo *instance;
    size_t bytes;

    bytes = sizeof(*instance) + sizeof(instance->data) * (len / sizeof(u32));
    instance = kmalloc(bytes, GFP_KERNEL);
    if (!instance)
        return NULL;
    memcpy(instance->data, str, len)
}

This contains a catastrophic 1 character bug (should be sizeof(*instance->data)) that will only be encountered at runtime when the memcpy runs past the end of the the allocation. It could have been caught at build-time if the flex-array extensions were disabled; without -fstrict-flex-arrays=3 I have no way to block these (or similar) sneaking back into the kernel by way of old (or new) userspace APIs. :( So actually, even with #pragma, we could still trip over this. Please leave the =3 mode.

https://godbolt.org/z/dexd3a4Y8

Just to move things forward here: I propose to continue and land this patch without mode 3 (there were only a couple comments left to address for that), and continue the discussion about whether to add mode 3 elsewhere. (I don't mind where, KSPP, gcc, or llvm issue trackers would all be fine by me.) Perhaps we add it later, perhaps we don't, but that doesn't need to hold up the rest.

Sound good to land the current form (we have sufficient tests for Sema and -fsanitize=array-bounds now) and have =3 as a separate discussion.

clang/test/CodeGen/bounds-checking-fam.c
4

84301 is the official CPython bug number. 94250 has been marked as a duplicate.

Forgot the commit tag: just landed it as f764dc99b37e1e6428724a61f36bcb49c015dc70

Sorry if I was unclear -- I did intend that the remaining minor open comments be addressed before commit. I've quoted them again here for clarity. Please address in a follow-up, thanks!

clang/include/clang/Driver/Options.td
1143

Minor whitespace issue.

clang/lib/CodeGen/CGExpr.cpp
909

Yes, we can reach this path, which is why the change is incorrect. We should not be changing the FAMness of undefined size, or size == 0, in any of the modes.

Please address.

clang/lib/Sema/SemaChecking.cpp
15804

You didn't like the code change I suggested there?

Open question

15969

Is it redundant? This check is for an _incomplete_ type as the array element type, which doesn't seem directly related to unbounded array sizes. (though it also prevents knowing the size of the array)

Please address (comment issue only).

15972

What I meant to say is that both the IsTailPaddedMemberArray call, and this code, should be moved up above, in order to set IsUnboundedArray to true, so we will get the checks from that block.

(But doesn't need to be in this patch, fine to just leave a comment to that effect).

Please address (comment issue only).

clang/test/CodeGen/object-size-flex-array.c
100

Please update all CHECK-LABELs to have open-paren, as per maskray's suggestion.

I've added a few suggestions for more rigorous testing, and raised a concern about a bug in this patch that is hiding expected -Warray-bounds warnings in certain cases.

clang/lib/CodeGen/CGExpr.cpp
888–889

I was exploring this difference and during my testing, I ran across what I think is a bug in mode 1 behavior of this patch:

https://godbolt.org/z/hj9T4MY61 -fsfa=0, no-warnings: f[0] f[ZERO] f[1] warnings: f[ONE] f[5]
https://godbolt.org/z/Ea9MY9jjh -fsfa=1, no-warnings: f[0] f[ZERO] f[5] warnings: f[1] f[ONE] <-- Incorrect behavior?
https://godbolt.org/z/aerMvxs6q -fsfa=2, no-warnings: f[0] f[ZERO] warnings: f[1] f[ONE] f[5]

I would think that -Warray-bounds should give a warning for accesses beyond the declared size of an array of 5 elements no matter what the -fstrict-flex-arrays=<n> mode is. Have I misunderstood?

Testing with compiler prior to this patch (using 14.0.0, e.g., and not giving -fsfa option) gives behavior consistent with -fsfa=0 on trunk. So it seems -Warray-bounds has always treated size>1 trailing arrays as concrete/fixed arrays, just as it does with non-trailing arrays. But I may be missing something, of course....

clang/lib/Sema/SemaChecking.cpp
15801–15803

Under what circumstances (what compiler options, what example code) would size>1 trailing arrays be treated as flexible array members? I'm not seeing that. -Warray-bounds warns about OOB access on size>1 trailing arrays, and always has (with the notable exception of the comment I made above, which I think is a bug).

clang/test/CodeGen/bounds-checking-fam.c
23

Should there be a "test_zero" case? Such as:

struct Zero {
   int a[0];
};

int test_zero(struct Zero *p, int i) {
   return p->a[i] + (p->a)[i];
}

Likewise, for C99 compilation, should there also be a case with a trailing incomplete array?

struct Inc {
   int x;
   int a[];
};

int test_inc(struct Inc *p, int i) {
   return p->a[i] + (p->a)[i];
}
clang/test/SemaCXX/array-bounds.cpp
211–213

This may be a bit beyond the scope of this change, but I think there are some problems with -Warray-bounds behavior that should be considered here as well, having to do with *nesting* of structs that have trailing size=1 arrays.

Consider https://godbolt.org/z/ex6P8bqY9, in which this code:

#define LEN 1

typedef struct S1 {
    int x;
    double y;
    int tail[3];
} S1;

typedef struct S {
    int   x;
    S1    s1;
    double y;
    int   tail[1];
} S;

void fooS(S *s) {
    s->tail[2] = 10;
    s->tail[5] = 20;
    (s->tail)[10] = 200;
    s->s1.tail[10] = (s->s1.tail)[10] + 1;
}

produces a warning (as you'd expect). But if you change the size of the trailing array in the nested struct to 1, you no longer get a warning: https://godbolt.org/z/dWET3E9d6

Note also that testing these examples with trunk build repeats the bug I mentioned in an earlier comment, where -fsfa=1 causes an expected -Warray-bounds warning to be dropped.

sberg added a comment.Aug 4 2022, 12:57 AM

I'm surprised that

$ cat test.c
struct S {
    int m1;
    int m2[1];
};
void f(struct S * s) {
    s->m2[1] = 0;
}

$ clang -fsyntax-only -fstrict-flex-arrays=1 test.c
test.c:6:5: warning: array index 1 is past the end of the array (which contains 1 element) [-Warray-bounds]
    s->m2[1] = 0;
    ^     ~
test.c:3:5: note: array 'm2' declared here
    int m2[1];
    ^
1 warning generated.

causes a warning? I would have expected it to be suppressed in this case, as with the lax -fstrict-flex-arrays=0 default, and only to hit with the stricter -fstrict-flex-arrays=2.

I'm surprised that

[...]

causes a warning? I would have expected it to be suppressed in this case, as with the lax -fstrict-flex-arrays=0 default, and only to hit with the stricter -fstrict-flex-arrays=2.

appears to be addressed with https://reviews.llvm.org/D132853 "[clang] Fix -Warray-bound interaction with -fstrict-flex-arrays=1", thanks

aaron.ballman added inline comments.
clang/docs/ClangCommandLineReference.rst
2646

From the top of this file:

-------------------------------------------------------------------
NOTE: This file is automatically generated by running clang-tblgen
-gen-opt-docs. Do not edit this file by hand!!
-------------------------------------------------------------------

so none of these changes are reflected at: https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fstrict-flex-arrays

This documentation needs to live alongside the option.

2648

This phrasing is problematic because it uses the term "flexible array members" which is a term of art in the standard when it's really talking about something else. We don't want int trailing[1]; to be considered a flexible array member in terms of the language rules.

msebor added inline comments.Sep 12 2022, 8:34 AM
clang/docs/ClangCommandLineReference.rst
2648

A pedantic reading of the text would suggest that the type of s.trailing is incomplete at level 1 and sizeof s.trailing is invalid (requiring an error). Another minor issue is that the text refers to a size of the array when it actually means the number of elements.

steakhal added a comment.EditedAug 13 2023, 3:18 AM

I noticed some inconsistency between this -fstrict-flex-arrays=N flag and what the RecordDecl::hasFlexibleArrayMember() returns for an example like this:

typedef unsigned long size_t;
void *malloc(size_t);
void free(void *);

void field(void) {
  struct vec { size_t len; int data[0]; };
  struct vec *a = (struct vec*) malloc(sizeof(struct vec) + 10*sizeof(int));
  free(a);
}

In the example, I use don't specify the -fstrict-flex-arrays flag, thus it should default to 0, which means that any trailing arrays (let it be incomplete or of any concrete size), to be considered as a flexible-array-member.
In the AST, for the RecordDecl, I'd expect that the hasFlexibleArrayMember() member function would reflect this, and return true, however, it returns false instead.
It does so because in SemaDecl.cpp Sema::ActOnFields() before doing some FAM-related checks and diagnostics it performs this check, guarding that branch:

bool IsLastField = (i + 1 == Fields.end());
if (FDTy->isFunctionType()) {
  // ...
} else if (FDTy->isIncompleteArrayType() &&
           (Record || isa<ObjCContainerDecl>(EnclosingDecl))) {
  //This is the FAM handling branch.
  // ...
}

Consequently, Sema will only set the bit for hasFlexibleArrayMember if the array is incomplete.
So my question is, if the RecordDecl::hasFlexibleArrayMember() should be consistent with the -fstrict-flex-arrays flag, or not?
@serge-sans-paille @aaron.ballman

RecordDecl::hasFlexibleArrayMember() is supposed to reflect the standard's definition of a flexible array member, which only includes incomplete arrays. The places that care about other array members use separate checks. We wouldn't want to accidentally extend the non-standard treatment of arrays with fixed bound to other places.

I think there was an effort to try to refactor uses of StrictFlexArrays to a centralized helper, but I don't remember what happened to it.

RecordDecl::hasFlexibleArrayMember() is supposed to reflect the standard's definition of a flexible array member, which only includes incomplete arrays. The places that care about other array members use separate checks. We wouldn't want to accidentally extend the non-standard treatment of arrays with fixed bound to other places.

I think there was an effort to try to refactor uses of StrictFlexArrays to a centralized helper, but I don't remember what happened to it.

Are you thinking of https://reviews.llvm.org/D134791 perchance?

I was probably thinking of that, yes.

Thanks for the xrefs! It turns out I've already seen that one :D
I'll look into it once more, with a fresh set of eyes. Thanks again!