This is an archive of the discontinued LLVM Phabricator instance.

Fix an assertion failure regression in isDesignatorAtObjectEnd for __builtin_object_size with incomplete array type in struct
ClosedPublic

Authored by arphaman on Dec 19 2017, 11:01 AM.

Details

Summary

The commit r316245 introduced a regression that causes an assertion failure when Clang tries to cast an IncompleteArrayType to a PointerType when evaluating __builtin_object_size in this sample:

typedef struct 	 {
  char string[512];
} NestedArrayStruct;

typedef struct 	 {
    int x;
    NestedArrayStruct	session[];
} IncompleteArrayStruct;

void func(IncompleteArrayStruct* p) {
   __builtin___strlcpy_chk (p->session[0].string, "ab", 2, __builtin_object_size(p->session[0].string, 1));
}

Interestingly enough gcc seems to produce a different output for the above code (when 1 is the last parameter to __builtin_object_size). It evaluates __builtin_object_size to 512 instead of -1 like Clang:

https://godbolt.org/g/vD9F9T

I'm still not sure what's the right behavior after reading GCC's description of __builtin_object_size (https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html). Maybe someone who's more familiar with this builtin could point to the cause of this discrepancy.

rdar://36094951

Diff Detail

Repository
rC Clang

Event Timeline

arphaman created this revision.Dec 19 2017, 11:01 AM

Note that even though there is a discrepancy between GCC and Clang, this patch does not change Clang's behavior in this instance as it emitted -1 previously as well

LGTM assuming my nit gets addressed.

Thank you!

Maybe someone who's more familiar with this builtin could point to the cause of this discrepancy

Yeah, the documentation for this builtin is pretty sparse. My guess: clang assumes that the array's size is unknown because it's both an array and at the end of a struct. This exists because code will do clever things like

struct string {
  size_t len;
  char buf[1]; // actual string is accessed from here; `string` just gets overallocated to hold it all.
};

in FreeBSD-land, they also recommend overallocation with sockaddrs, which have a 14-length trailing element: https://www.freebsd.org/doc/en/books/developers-handbook/sockets-essential-functions.html

...So, the best compatible heuristic we've been able to settle on here, sadly, is "are we touching the final element in a struct, and is that element an array?" On the bright side, clang failing just means LLVM gets to try to figure it out, so only some hope of getting a useful answer is lost. :)

It's interesting that GCC tries to do better here, since AIUI it has a similar heuristic meant to cope with code like the above.

test/Sema/builtin-object-size.c
95

Please clang-format the additions to this file.

This revision is now accepted and ready to land.Dec 19 2017, 1:28 PM
vsapsai added inline comments.Dec 19 2017, 3:10 PM
test/Sema/builtin-object-size.c
105

Do we execute significantly different code paths when the second __builtin_object_size argument is 0, 2, 3? I think it is worth checking locally if these values aren't causing an assertion. Not sure about having such tests permanently, I'll leave it to you as you are more familiar with the code.

test/Sema/builtin-object-size.c
105

In this case, only 1 and 3 should be touching the buggy codepath, and they should execute it identically. If 0 and 2 reach there, we have bigger problems, since they shouldn't really be poking around in the designator of the given LValue.

Since it's presumably only ~10 seconds of copy-pasting, I'd be happy if we added sanity checks for other modes, as well. :)

LGTM assuming my nit gets addressed.

Thank you!

Maybe someone who's more familiar with this builtin could point to the cause of this discrepancy

Yeah, the documentation for this builtin is pretty sparse. My guess: clang assumes that the array's size is unknown because it's both an array and at the end of a struct. This exists because code will do clever things like

struct string {
  size_t len;
  char buf[1]; // actual string is accessed from here; `string` just gets overallocated to hold it all.
};

in FreeBSD-land, they also recommend overallocation with sockaddrs, which have a 14-length trailing element: https://www.freebsd.org/doc/en/books/developers-handbook/sockets-essential-functions.html

...So, the best compatible heuristic we've been able to settle on here, sadly, is "are we touching the final element in a struct, and is that element an array?" On the bright side, clang failing just means LLVM gets to try to figure it out, so only some hope of getting a useful answer is lost. :)

It's interesting that GCC tries to do better here, since AIUI it has a similar heuristic meant to cope with code like the above.

Thanks!

test/Sema/builtin-object-size.c
105

Sure, I'll test the other modes as well.

This revision was automatically updated to reflect the committed changes.
arphaman marked 2 inline comments as done.