This is an archive of the discontinued LLVM Phabricator instance.

Limit types of builtins that can be redeclared.
ClosedPublic

Authored by erichkeane on Apr 6 2018, 11:37 AM.

Details

Summary

As reported here: https://bugs.llvm.org/show_bug.cgi?id=37033
Any usage of a builtin function that uses a va_list by reference
will cause an assertion when redeclaring it.

After discussion in the review, it was concluded that the correct
way of accomplishing this fix is to make attempts to redeclare certain
builtins an error. Unfortunately, doing this limitation for all builtins
is likely a breaking change, so this commit simply limits it to
types with custom type checking and those that take a reference.

Two tests needed to be updated to make this work.

Diff Detail

Repository
rL LLVM

Event Timeline

erichkeane created this revision.Apr 6 2018, 11:37 AM

Rather than adding weird hacks to merging, can we just reject any attempt to redeclare a builtin?

Rather than adding weird hacks to merging, can we just reject any attempt to redeclare a builtin?

I don't believe we can. I think things often re-declare them, particularly when it comes to va_end. I'd love to move the 'builtinID' check higher and reject any attempts to redeclare in C, but I'd like some further guidance. Do you know of anyone who can help us make that decision?

One big issue with just disallowing redecls of builtins is we have is that a number of these are actually expected to be in the system headers/libraries (such as vprintf). A ton of the math.h functions are also builtins.

By "builtins" I meant specifically the stuff that that user code isn't allowed to declare, like __builtin_*, since they often have weird/incomplete signatures.

By "builtins" I meant specifically the stuff that that user code isn't allowed to declare, like __builtin_*, since they often have weird/incomplete signatures.

Unfortunately, the problem shows in vprintf as well, not just __builtin functions. So that would solve the original bug report, but not the rest of the similar issues.

How could this patch possibly affect vprintf?

How could this patch possibly affect vprintf?

vprintf is defined in the builtins.def file, and other than our messages considering it a 'library function', it is otherwise identical.

Identical to what? __builtin_va_start and __builtin_va_end specifically are weird because they're builtins which have a signature which can't be expressed in C. vprintf doesn't have that problem.

Builtins.def makes the relevant distinction already: a "BUILTIN" is a reserved identifier the user isn't allowed to redeclare, and a "LIBBUILTIN" is a library function which the compiler has special knowledge of.

Identical to what? __builtin_va_start and __builtin_va_end specifically are weird because they're builtins which have a signature which can't be expressed in C. vprintf doesn't have that problem.

Builtins.def makes the relevant distinction already: a "BUILTIN" is a reserved identifier the user isn't allowed to redeclare, and a "LIBBUILTIN" is a library function which the compiler has special knowledge of.

In the context of this patch, vprintf is handled using the exact same code-paths. SO, it'll have its 2nd parameter created with type 'char*&', which can cause the same crash that I observed with overloading va_end. Currently, you ARE permitted to redeclare a BUILTIN (see my 1st test above), but a conflicting type is an error. A LIBBUILTIN is a warning to have a conflicting redeclaration.

vprintf is handled using the exact same code-paths. SO, it'll have its 2nd parameter created with type 'char*&'

vprintf is defined in the C standard as "int vprintf(const char *format, va_list arg);"; on Windows, that's equivalent to "int vprintf(const char *format, char* arg);". If a reference is showing up anywhere, something has gone seriously wrong before we get anywhere near the type merging code.

vprintf is handled using the exact same code-paths. SO, it'll have its 2nd parameter created with type 'char*&'

vprintf is defined in the C standard as "int vprintf(const char *format, va_list arg);"; on Windows, that's equivalent to "int vprintf(const char *format, char* arg);". If a reference is showing up anywhere, something has gone seriously wrong before we get anywhere near the type merging code.

Hrm... looking back again, I believe you're right. I did a more careful grepping and found that only 4 builtins and 3 C99 library functions have this issue. Basically, anything with a 'A' in the parameter spec in Builtins.def. I don't know how I got attached to vprintf, I apologize for the wasted time.

The functions with this problem are:
builtins_va_start, builtins_va_end, builtins_va_copy, and builtin_stdarg_start.
va_start, va_end, va_copy.

I don't see redeclaring the latter 3 as being disallowed, so I presume that we'd not be able to prevent that. I'm really open to suggestions, as thinking over the weekend has convinced me that this patch is a little bit hacky. I'd like some solution for the assertion, but I'm not sure I see one. I note that godbolt (https://godbolt.org/g/qkryPR) doesn't show the assert as it is a non-assert build. Would we be OK with simply relaxing the '!isreferencetype' assert to something like, "!isReferenceType || isVaListTypeRef"?

I don't see redeclaring the latter 3 as being disallowed,

It is in fact disallowed by the standard, in the section describing va_start/va_arg/va_end/va_copy: "If a macro definition is suppressed in order to access an actual function, or a program defines an external identifier with the same name, the behavior is undefined."

I don't see redeclaring the latter 3 as being disallowed,

It is in fact disallowed by the standard, in the section describing va_start/va_arg/va_end/va_copy: "If a macro definition is suppressed in order to access an actual function, or a program defines an external identifier with the same name, the behavior is undefined."

The C spec seems to overload the usage of 'macro' a few times there. I believe it is permissible to implement va_copy and va_end as a function, isn't it? The first sentence of that section says va_start/va_arg must be implemented as "macros, not functions", but va_copy/va_end can be identifiers with external linkage.

I believe it is permissible to implement va_copy and va_end as a function, isn't it?

I guess you could, in theory, but LLVM doesn't support any targets which do that.

I believe it is permissible to implement va_copy and va_end as a function, isn't it?

I guess you could, in theory, but LLVM doesn't support any targets which do that.

Ok, fair enough. So is your guidance to identify these 7 builtins (4 builtins + 3 lib builtins) and disallow redeclaring? Or am I missing a more subtle solution here?

Basically, yes, although I was thinking you could be a bit more aggressive. At least, anything marked "t" for custom typechecking is probably not something which should be redeclared.

Basically, yes, although I was thinking you could be a bit more aggressive. At least, anything marked "t" for custom typechecking is probably not something which should be redeclared.

Ok, thank you very much for your help! I'll see if I can accomplish soon!

erichkeane retitled this revision from Strip reference from a va_list object in C when merging parameter types. to Limit types of builtins that can be redeclared..
erichkeane edited the summary of this revision. (Show Details)

Eli's suggestion for implementation.

efriedma added inline comments.Apr 10 2018, 11:08 AM
test/Sema/builtin-redecl.cpp
9 ↗(On Diff #141876)

We don't want to allow this; in particular, for builtins with custom type-checking, we have no way to correctly resolve which overload to call. Probably Sema::CheckOverload should be checking canBeRedeclared.

erichkeane marked an inline comment as done.

Restrict overloads as well.

I know that the Windows SDK definitely declares the __va_start function. Did you try building something like swift against the Windows SDK with this change?

I know that the Windows SDK definitely declares the __va_start function. Did you try building something like swift against the Windows SDK with this change?

Are we sure it declares it, and not just #define's it? If it actually declares it, there is no way it can currently build in certain C modes due to the issue I'd mentioned previously.

Snipping bits from va_defs.h:

#elif defined _M_ARM64

    void __cdecl __va_start(va_list*, ...);

    #define __crt_va_start_a(ap,v) ((void)(__va_start(&ap, _ADDRESSOF(v), _SLOTSIZEOF(v), __alignof(v), _ADDRESSOF(v))))
...

#elif defined _M_X64

    void __cdecl __va_start(va_list* , ...);

    #define __crt_va_start_a(ap, x) ((void)(__va_start(&ap, x)))

...

This looks like a declaration to me. Although, this is in system headers, so maybe we can ignore it by means of the system header suppression? The minor problem with that is that clang-cl (and the clang driver with the windows triple) do not support -isystem or -isysroot or --sysroot arguments. I suppose that as long as we expose the cc1 option (I imagine that clang-cl will pass the system paths appropriately), that is one option.

Snipping bits from va_defs.h:

#elif defined _M_ARM64

    void __cdecl __va_start(va_list*, ...);

    #define __crt_va_start_a(ap,v) ((void)(__va_start(&ap, _ADDRESSOF(v), _SLOTSIZEOF(v), __alignof(v), _ADDRESSOF(v))))
...

#elif defined _M_X64

    void __cdecl __va_start(va_list* , ...);

    #define __crt_va_start_a(ap, x) ((void)(__va_start(&ap, x)))

...

This looks like a declaration to me. Although, this is in system headers, so maybe we can ignore it by means of the system header suppression? The minor problem with that is that clang-cl (and the clang driver with the windows triple) do not support -isystem or -isysroot or --sysroot arguments. I suppose that as long as we expose the cc1 option (I imagine that clang-cl will pass the system paths appropriately), that is one option.

Ah, thank you! I couldn't find that when I looked after your last message, perhaps my grep skills are lacking. Unfortunately, allowing this in headers won't fix the actual bug, it'll just let it continue to crash on these headers.
I'm kind of out of ideas here... @efriedma : Ideas?

We can could add an exception to the "don't allow redeclarations with custom type-checking" rule to specifically allow redeclaring __va_start. The general rule is that we don't want user code redeclaring them because they don't have a specific signature, so our internal representation could change between releases. But __va_start has a specific fixed signature from the Microsoft SDK headers, so it should be fine to allow redeclaring it without a warning.

We can could add an exception to the "don't allow redeclarations with custom type-checking" rule to specifically allow redeclaring __va_start. The general rule is that we don't want user code redeclaring them because they don't have a specific signature, so our internal representation could change between releases. But __va_start has a specific fixed signature from the Microsoft SDK headers, so it should be fine to allow redeclaring it without a warning.

That still doesn't fix the bug, since redeclaring __va_start in 'C' mode when va_list is a reference type causes the assert.

I don't see an A in LANGBUILTIN(__va_start, "vc**.", "nt", ALL_MS_LANGUAGES)

I don't see an A in LANGBUILTIN(__va_start, "vc**.", "nt", ALL_MS_LANGUAGES)

Ah, you're right! I was confusing it with LIBBUILTIN(va_start, "vA.", "fn", "stdarg.h", ALL_LANGUAGES) and BUILTIN(__builtin_va_start, "vA.", "nt").

It seems that the MS builtins dont' actually use 'A'. I'll implement your suggestion, thanks for your patience :)

Added the exception for MSVC's va_start. va_start is a normal function on Linux, so it isn't an error on non-windows either.

It looks like you didn't fix the tests?

Revert unnecessary test changes... :/

This revision is now accepted and ready to land.Apr 16 2018, 2:29 PM
This revision was automatically updated to reflect the committed changes.

This broke the build of FreeBSD for me due to the declaration of __builtin_return_address(unsigned int) in https://github.com/tianocore/edk2/blob/master/MdePkg/Include/Base.h#L1281:

In file included from /exports/users/alr48/sources/freebsd-x86/sys/contrib/edk2/Include/Uefi.h:23:
In file included from /exports/users/alr48/sources/freebsd-x86/sys/contrib/edk2/Include/Uefi/UefiBaseType.h:20:
/exports/users/alr48/sources/freebsd-x86/sys/contrib/edk2/Include/Base.h:1231:10: error: cannot redeclare builtin function '__builtin_return_address'
  void * __builtin_return_address (unsigned int level);
         ^
/exports/users/alr48/sources/freebsd-x86/sys/contrib/edk2/Include/Base.h:1231:10: note: '__builtin_return_address' is a builtin with type 'void *(unsigned int)'

I'm curious why that file is declaring the builtin like that? The declaration doesn't seem like it is ever useful, builtins are available without any includes/etc as far back as I could go on all 3 compilers that I validated against.

Additionally, I believe that the declaration is actually prohibited by the standard, since it is declaring an identifier that is reserved for the implementation.

Is this a situation where the target project cannot be corrected? That line simply needs to be removed as far as I can tell.