Page MenuHomePhabricator

[Sema] Introduce BuiltinAttr, per-declaration builtin-ness
ClosedPublic

Authored by tambre on Apr 5 2020, 5:08 AM.

Details

Summary

Instead of relying on whether a certain identifier is a builtin, introduce BuiltinAttr to specify a declaration as having builtin semantics.

This fixes incompatible redeclarations of builtins, as reverting the identifier as being builtin due to one incompatible redeclaration would have broken rest of the builtin calls.
Mostly-compatible redeclarations of builtins also no longer have builtin semantics. They don't call the builtin nor inherit their attributes.
A long-standing FIXME regarding builtins inside a namespace enclosed in extern "C" not being recognized is also addressed.

Due to the more correct handling attributes for builtin functions are added in more places, resulting in more useful warnings.
Tests are updated to reflect that.

Intrinsics without an inline definition in intrin.h had inline and static removed as they had no effect and caused them to no longer be recognized as builtins otherwise.

A pthread_create() related test is XFAIL-ed, as it relied on it being recognized as a builtin based on its name.
The builtin declaration syntax is too restrictive and doesn't allow custom structs, function pointers, etc.
It seems to be the only case and fixing this would require reworking the current builtin syntax, so this seems acceptable.

Fixes PR45410.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
tambre added inline comments.May 23 2020, 9:58 AM
clang/test/Sema/warn-fortify-source.c
13–14

Agreed.
I've removed this test, as there doesn't seem to be an easy way to replicate this behaviour.

aaron.ballman added inline comments.May 25 2020, 10:42 AM
clang/lib/AST/Decl.cpp
3169–3171

I think this is a bit more clear:

} else if (const auto *A = getAttr<BuiltinAttr>()) {
  BuiltinID = A->getID();
}

and initialize BuiltinID to zero above.

clang/test/Sema/implicit-builtin-decl.c
64

It looks like we're losing test coverage with this change?

tambre updated this revision to Diff 267450.May 30 2020, 3:20 AM
tambre marked 4 inline comments as done.

Fix some intrinsics not being marked as builtin due to being static in the headers.
Make some code easier to read, fix test for sigsetjmp in Sema/implicit-builtin-decl.c to reflect the original intent.

tambre marked an inline comment as done.May 30 2020, 3:27 AM
tambre added inline comments.
clang/lib/AST/Decl.cpp
3169–3171

Done.

clang/test/CodeGen/callback_pthread_create.c
20

As many others prior to this patch, pthread_create was recognized as a builtin due to its name and thus had attributes applied.
Unlike others however, pthread_create is the only builtin in Builtins.def that doesn't have its arguments specified. Doing that would require implementing support for function pointers in the builtin database and adding yet another special case for pthread_t and pthread_attr_t.
That'd be quite a bit of work, which I'm not interested in doing.

How about simply removing the hack that is the pthread_create builtin entry and disabling/removing this test?

clang/test/Sema/implicit-builtin-decl.c
64

Indeed. I've reverted this change and changed the test to instead not test for it being recognized as a builtin, since it shouldn't be.

tambre updated this revision to Diff 288792.Aug 29 2020, 11:20 AM

Rebase, fix new MS builtins and other tests, XFAIL pthread_create() test

tambre edited the summary of this revision. (Show Details)Aug 29 2020, 11:25 AM

Please review. I would like to get this in a mergeable state.

FYI: PR45410, which this fixes, has been marked as a release blocker for 11.0.0.

I'd still like @rsmith to sign off here as code owner.

clang/include/clang/Basic/IdentifierTable.h
227–228

Do we need to support reverting builtins anymore?

clang/test/CodeGen/callback_pthread_create.c
4

I guess the problem with pthread_create is that the types are not really reasonable to synthesize. I wonder if we can use an approach more like what we do with C++, where we don't magically synthesize a declaration but where we do recognize that a particular declaration is compatible with the builtin signature.

tambre updated this revision to Diff 288911.Aug 31 2020, 3:16 AM
tambre marked 2 inline comments as done.

Remove builtin reverting.

clang/include/clang/Basic/IdentifierTable.h
227–228

We don't. It'd be possible to set the builtin ID of identifiers to 0 as an optimization to avoid declaration compatibility checking if there's a declaration that would hide the actual builtin. But I doubt it's worth it.

I've removed code related to this, as nothing was actually checking the reverted status anymore.

clang/test/CodeGen/callback_pthread_create.c
4

Seems like a good idea. Would also reduce code duplication between C and C++. I would be willing to look into that as a followup.

rjmccall added inline comments.Aug 31 2020, 2:43 PM
clang/lib/Serialization/ASTWriter.cpp
3385–3386

Clang bitcode isn't really a stable format, it's fine to just drop this bit.

I'd still like @rsmith to sign off here as code owner.

Generally, I'm happy with this direction.

What happens for builtins with the "t" (custom typechecking) flag, for which the signature is intended to have no meaning? Do we always give them builtin semantics, or never, or ... something else? I think it might be reasonable to require them to always be declared as taking unspecified arguments -- () in C and (...) in C++, or to simply say that the user cannot declare such functions themselves.

clang/lib/Headers/intrin.h
134

Does the __inline__ here do anything for a builtin function? Can we remove it along with the static?

182

Why is static being removed from some of the functions in this header but not others?

clang/lib/Sema/SemaDecl.cpp
9661–9687

I think this needs more refinement:

  • You appear to be creating and throwing away a new builtin function declaration (plus parameter declarations etc) each time you see a declaration with a matching name, even if one was already created. Given that you don't actually use D for anything other than its type, creating the declaration seems redundant and using ASTContext::GetBuiltinType would be more appropriate.
  • There are no checks of which scope the new function is declared in; this appears to apply in all scopes, but some builtin names are only reserved in the global scope (those beginning with an underscore followed by a lowercase letter such as _bittest), so that doesn't seem appropriate. The old code in FunctionDecl::getBuiltinID checked that the declaration is given C language linkage (except for _GetExceptionInfo, which was permitted to have C++ language linkage), and that check still seems appropriate to me.
  • The special case permitting _GetExceptionInfo to be declared with *any* type seems suspect; the old code permitted it to have different language linkage, not the wrong type.
  • Using typesAreCompatible in C++-specific code is weird, since C++ doesn't have a notion of "compatible types".
clang/lib/Serialization/ASTReader.cpp
973–974

We don't have any stability guarantees for our AST bitcode format yet; you can just remove this bit rather than retaining a placeholder.

clang/test/Analysis/bstring.cpp
106 ↗(On Diff #288911)

This should not be recognized as a builtin, because the memset function is not extern "C".

tambre updated this revision to Diff 289631.EditedSep 2 2020, 9:15 PM
tambre marked 6 inline comments as done.

Remove __inline__ and static from all builtins in intrin.h.
Remove ASTReader/ASTWriter placeholders.
During builtin recognition:

  • Use ASTContext::GetBuiltinType() instead of creating a new function declaration every time.
  • Check for C linkage.
  • Handle builtins with custom typechecking.
tambre added a comment.Sep 2 2020, 9:17 PM

Thanks for the review!
I believe I've managed to address your comments.

What happens for builtins with the "t" (custom typechecking) flag, for which the signature is intended to have no meaning? Do we always give them builtin semantics, or never, or ... something else? I think it might be reasonable to require them to always be declared as taking unspecified arguments -- () in C and (...) in C++, or to simply say that the user cannot declare such functions themselves.

The options I see are:

  1. Disallow.
  2. Run the custom typechecking.
  3. Require as taking unspecified arguments.
  4. Simply check the name.

Implementing running custom typechecking seems it'd be tricky and not worth it.
Disallowing isn't an option as a cursory search shows that this is used in the wild.
Requiring unspecified arguments isn't feasible either as the real-world usages seem to include arguments in their declarations. E.g. __va_start from MSVC's vadefs.h, which is also tested by clang/test/SemaCXX/microsoft-varargs.cpp.

I've thus gone with simply marking as them builtins if the name matches and they're extern "C". The C linkage requirement didn't exist before, but seems a safe improvement to make.

clang/lib/Headers/intrin.h
134

It does not. Removed from all.

182

I removed static from builtins that showed up as problematic in tests. But like inline, there's really no effect. I've removed both from all builtins in this header.

clang/lib/Sema/SemaDecl.cpp
9661–9687
  • You appear to be creating and throwing away a new builtin function declaration (plus parameter declarations etc) each time you see a declaration with a matching name, even if one was already created. Given that you don't actually use D for anything other than its type, creating the declaration seems redundant and using ASTContext::GetBuiltinType would be more appropriate.

Fixed.

  • There are no checks of which scope the new function is declared in; this appears to apply in all scopes, but some builtin names are only reserved in the global scope (those beginning with an underscore followed by a lowercase letter such as _bittest), so that doesn't seem appropriate. The old code in FunctionDecl::getBuiltinID checked that the declaration is given C language linkage (except for _GetExceptionInfo, which was permitted to have C++ language linkage), and that check still seems appropriate to me.

Fixed.

  • The special case permitting _GetExceptionInfo to be declared with *any* type seems suspect; the old code permitted it to have different language linkage, not the wrong type.

I've fixed the linkage part, but the old code didn't have any type checking at all so I've kept it this way.

  • Using typesAreCompatible in C++-specific code is weird, since C++ doesn't have a notion of "compatible types".

Agreed, however I was unable to come up with a better way to do this. Let me know if you have any ideas.
That said, since I've now fixed it to check for C linkage I think this is less of a problem.

clang/test/Analysis/bstring.cpp
106 ↗(On Diff #288911)

Fixed by proper extern "C" handling.

tambre edited the summary of this revision. (Show Details)Sep 2 2020, 9:57 PM

The builtins with custom type-checking are all true intrinsics like __builtin_operator_new and so on. They really can't be validly declared by the user program. The thing that seems most likely to avoid random compiler crashes would be to either forbid explicit declarations of them or treat those as no longer being builtins. If we need to maintain compatibility with people making custom declarations, we would need to always treat them as builtins and run the risk of crashing if someone declares one with a bad signature. But I don't think it's unfair of us to break those people; that is seriously not reasonable user behavior.

It's possible that custom declarations are people trying to create compatibility shims for compilers that don't provide these as builtins. Those people should be guarding their custom declarations, preferably with __has_builtin.

tambre added a comment.EditedSep 3 2020, 9:12 AM

The builtins with custom type-checking are all true intrinsics like __builtin_operator_new and so on. They really can't be validly declared by the user program. The thing that seems most likely to avoid random compiler crashes would be to either forbid explicit declarations of them or treat those as no longer being builtins. If we need to maintain compatibility with people making custom declarations, we would need to always treat them as builtins and run the risk of crashing if someone declares one with a bad signature. But I don't think it's unfair of us to break those people; that is seriously not reasonable user behavior.

It's possible that custom declarations are people trying to create compatibility shims for compilers that don't provide these as builtins. Those people should be guarding their custom declarations, preferably with __has_builtin.

I fully agree.

However, I believe you forget to account for the example that I brought up.
In particular, MSVC's header vadefs.h includes a declaration of __va_start(), which would cause almost any code including standard headers to fail to compile on Windows.
This issue isn't isolated to some old MSVC version, as the declaration is still there in the latest Visual Studio Preview version 14.28.29213.

How about turning it into an error only for non-Microsoft builtins?
Though keeping that as a followup might be even better, as this will probably be merged into 11.0.

I didn't see the specific example, sorry. I agree that my description is more applicable to builtins in the __builtin namespace, which describes most of the builtins with custom typechecking. Is the problem just __va_start?

If we have to treat all declarations as builtins for the custom-typechecking builtins just to land this patch, I don't think that's the worst result in the world, and we can incrementally go from there. __va_start actually has a signature, it just effectively has optional arguments, which is something we could definitely teach the builtins database and signature-matcher about.

tambre updated this revision to Diff 289986.Sep 4 2020, 9:43 AM

Improve comments.

tambre added a comment.EditedSep 4 2020, 9:44 AM

I didn't see the specific example, sorry. I agree that my description is more applicable to builtins in the __builtin namespace, which describes most of the builtins with custom typechecking. Is the problem just __va_start?

I grepped for all builtins with custom typechecking.
I found two occurences in MSVC's headers: __va_start and __GetExceptionInfo, but none on my Debian Unstable machine.

If we have to treat all declarations as builtins for the custom-typechecking builtins just to land this patch, I don't think that's the worst result in the world, and we can incrementally go from there. __va_start actually has a signature, it just effectively has optional arguments, which is something we could definitely teach the builtins database and signature-matcher about.

I'd still prefer we go incrementally.

It would be nice to receive another round of code review as I have fixed the issues pointed out by Richard.

rsmith added inline comments.Sep 7 2020, 1:04 AM
clang/lib/Headers/intrin.h
435–437

The functions with inline definitions should still be static inline so that we don't emit them as strong external defintiions.

clang/lib/Sema/SemaDecl.cpp
9666–9667

This will give the wrong answer for

extern "C" {
namespace X {
void __builtin_foo();
}
}

... which does have C language linkage. Instead, please call FunctionDecl::getLanguageLinkage(), which knows how to handle these cases.

9684–9685

Please use ASTContext::hasSameTypeIgnoringExceptionSpec instead.

clang/lib/Serialization/ASTReader.cpp
916

We now consider getObjCOrBuiltinID() here for the IsModule case, where we didn't before. Is that an intentional change?

tambre updated this revision to Diff 290229.Sep 7 2020, 2:59 AM
tambre marked 3 inline comments as done.

Address review comments.

tambre edited the summary of this revision. (Show Details)Sep 7 2020, 3:02 AM
tambre updated this revision to Diff 290232.Sep 7 2020, 3:11 AM
tambre marked an inline comment as done.

Remove now obsolete FIXME.

tambre added a comment.Sep 7 2020, 3:11 AM

Thanks for the review. All tests still pass, should be good for another round.

clang/lib/Sema/SemaDecl.cpp
9666–9667

Good suggestion. This fixes the long-standing FIXME inherited from getBuiltinID(). I've added a test for this.

clang/lib/Serialization/ASTReader.cpp
916

Unintentional, fixed.

aaron.ballman added inline comments.Sep 14 2020, 8:19 AM
clang/lib/Sema/SemaDecl.cpp
2088

Should we do this check *before* we create the C linkage decl spec above?

rsmith accepted this revision.Sep 14 2020, 3:10 PM

Thanks! This looks good to me (subject to Aaron's comment being addressed). Please wait a couple of days for any more comments from the other reviewers.

clang/lib/Sema/SemaDecl.cpp
9683

Please can you add a // FIXME here that we should probably only recognize this as a builtin in the scope where the MS headers actually declare it, rather than in every scope.

tambre updated this revision to Diff 291778.Sep 14 2020, 10:15 PM
tambre marked 2 inline comments as done.

Address comments.

If there are no further comments I'll commit this in a few days.

clang/lib/Sema/SemaDecl.cpp
2088

We should. However, since now only LazilyCreateBuiltin() calls this and checks these before, we can simply removed them here and pass in the QualType.

9683

Added.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 17 2020, 9:29 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
jrtc27 added a subscriber: jrtc27.Sep 17 2020, 12:46 PM

If someone cares about pthread_create they might wish to follow up on my https://reviews.llvm.org/D58531, which I filed early last year to permit pthread_create to have a proper type in the syntax. It will likely need rebasing, but probably isn't that much work.

If someone cares about pthread_create they might wish to follow up on my https://reviews.llvm.org/D58531, which I filed early last year to permit pthread_create to have a proper type in the syntax. It will likely need rebasing, but probably isn't that much work.

Sorry I missed this before! Please feel free to ping patches every ~week if they're not getting attention.

dmajor added a subscriber: dmajor.Sep 18 2020, 9:40 AM

This commit broke Firefox builds on Mac with an error in the SDK headers. Could you please revert if a fix is not readily available?

Reproducer:

struct objc_super {};
extern "C" id objc_msgSendSuper(struct objc_super *super, SEL op, ...);
extern "C" void objc_msgSendSuper_stret(struct objc_super *super, SEL op, ...);

Result:

$ clang -c test.mm
test.mm:3:48: error: reference to 'objc_super' is ambiguous
extern "C" void objc_msgSendSuper_stret(struct objc_super *super, SEL op, ...);
                                               ^
test.mm:1:8: note: candidate found by name lookup is 'objc_super'
struct objc_super {};
       ^
note: candidate found by name lookup is 'objc_super'
1 error generated.

This commit broke Firefox builds on Mac with an error in the SDK headers. Could you please revert if a fix is not readily available?

Reproducer:

struct objc_super {};
extern "C" id objc_msgSendSuper(struct objc_super *super, SEL op, ...);
extern "C" void objc_msgSendSuper_stret(struct objc_super *super, SEL op, ...);

Result:

$ clang -c test.mm
test.mm:3:48: error: reference to 'objc_super' is ambiguous
extern "C" void objc_msgSendSuper_stret(struct objc_super *super, SEL op, ...);
                                               ^
test.mm:1:8: note: candidate found by name lookup is 'objc_super'
struct objc_super {};
       ^
note: candidate found by name lookup is 'objc_super'
1 error generated.

Looking into this, got an idea what might be wrong. Gimme an hour or two. :)

This commit broke Firefox builds on Mac with an error in the SDK headers. Could you please revert if a fix is not readily available?

Reproducer:

struct objc_super {};
extern "C" id objc_msgSendSuper(struct objc_super *super, SEL op, ...);
extern "C" void objc_msgSendSuper_stret(struct objc_super *super, SEL op, ...);

Result:

$ clang -c test.mm
test.mm:3:48: error: reference to 'objc_super' is ambiguous
extern "C" void objc_msgSendSuper_stret(struct objc_super *super, SEL op, ...);
                                               ^
test.mm:1:8: note: candidate found by name lookup is 'objc_super'
struct objc_super {};
       ^
note: candidate found by name lookup is 'objc_super'
1 error generated.

Looking into this, got an idea what might be wrong. Gimme an hour or two. :)

Fix in D87917.

We've hit a fairly subtle miscompile caused by this patch.

glibc's setjmp.h looks like this (irrelevant parts removed):

struct __jmp_buf_tag { /*...*/ };
extern int __sigsetjmp(struct __jmp_buf_tag __env[1], int);
typedef struct __jmp_buf_tag sigjmp_buf[1];
#define sigsetjmp __sigsetjmp

This worked fine with the old approach. But with the new approach, we decide the declaration of __sigsetjmp is not a builtin, because at its point of declaration, we can't compute the "proper" type because sigjmp_buf has not been declared yet. As a result, we don't add a BuiltinAttr to __sigsetjmp, but much more critically, we don't add a ReturnsTwiceAttr, which results in miscompiles in calls to this function. (I think sigsetjmp is the only affected function with glibc. jmp_buf is declared prior to __setjmp and friends.)

I suppose we don't actually care what the parameter types for __sigsetjmp are, and it would be fine (and much safer) to treat any function with that name as a builtin, like we used to. Perhaps we should have a way of marking builtins as "the given type is what we expect / what we will implicitly declare, but it's OK if it doesn't actually match"?

We've hit a fairly subtle miscompile caused by this patch.

glibc's setjmp.h looks like this (irrelevant parts removed):

struct __jmp_buf_tag { /*...*/ };
extern int __sigsetjmp(struct __jmp_buf_tag __env[1], int);
typedef struct __jmp_buf_tag sigjmp_buf[1];
#define sigsetjmp __sigsetjmp

This worked fine with the old approach. But with the new approach, we decide the declaration of __sigsetjmp is not a builtin, because at its point of declaration, we can't compute the "proper" type because sigjmp_buf has not been declared yet. As a result, we don't add a BuiltinAttr to __sigsetjmp, but much more critically, we don't add a ReturnsTwiceAttr, which results in miscompiles in calls to this function. (I think sigsetjmp is the only affected function with glibc. jmp_buf is declared prior to __setjmp and friends.)

I suppose we don't actually care what the parameter types for __sigsetjmp are, and it would be fine (and much safer) to treat any function with that name as a builtin, like we used to. Perhaps we should have a way of marking builtins as "the given type is what we expect / what we will implicitly declare, but it's OK if it doesn't actually match"?

Marking __sigsetjmp as having custom typechecking should suffice (t attribute in Builtins.def), no? Though a case in Sema::CheckBuiltinFunctionCall() might also then be necessary.

Being permissive about recognizing builtins when the expected signature requires a type that lookup can't find seems completely reasonable. We don't really want to force library functions to take the custom-typechecking path just because we want to infer an attribute for them.

Being permissive about recognizing builtins when the expected signature requires a type that lookup can't find seems completely reasonable. We don't really want to force library functions to take the custom-typechecking path just because we want to infer an attribute for them.

Proposed fix in https://reviews.llvm.org/D88518.