This is an archive of the discontinued LLVM Phabricator instance.

[clang] Add a namespace for interesting identifiers.
ClosedPublic

Authored by zahiraam on Mar 15 2023, 9:07 AM.

Diff Detail

Event Timeline

zahiraam created this revision.Mar 15 2023, 9:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2023, 9:07 AM
zahiraam requested review of this revision.Mar 15 2023, 9:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2023, 9:07 AM

These changes will also need a release note at some point.

clang/include/clang/Basic/DiagnosticParseKinds.td
662–665 ↗(On Diff #505520)

I think we should add some documentation as to why this error happens. Naively, users expect the pragma to override the command line (the command line is the "default" and the pragma overrides that in the cases the default is wrong), so I'd imagine a user getting this error would ask, "But why?"

clang/lib/Parse/ParseStmt.cpp
1122–1123 ↗(On Diff #505520)

Does something assert that we never try to call this with FEM_Indeterminable?

1198–1210 ↗(On Diff #505520)

This is not correct -- it's only going to catch the case where the user literally writes float_t as the first token in a statement, so it's going to miss const float_t t and typedef float_t frobble; frobble t; etc. Also, this seems like it will impact compile time overhead because that's going to be performing a lot of useless string comparisons.

This cannot be handled at the level of the parser, it needs to be handled within Sema. I think you'll want it to live somewhere around GetFullTypeForDeclarator() in SemaType.cpp.

There are other situations that should be tested:

char buffer[sizeof(float_t)]; // This seems like something we'd want to prevent?
typedef float_t foo; // This seems like something that's harmless to accept?
using quux = float_t; // Same here as above
foo bar; // This, however, is a problem

extern float_t blah(); // This also seems bad

_Generic(1, float_t : 0); // Maybe a bad thing?

auto lam = [](float_t f) { return f; }; // Bad?

float f = (float_t)12; // What about this? Note, the float_t is only used in the cast...
clang/test/CodeGen/abi-check-1.c
1–2 ↗(On Diff #505520)

Why are these codegen tests when the diagnostic (should be) in Sema?

zahiraam updated this revision to Diff 507097.Mar 21 2023, 1:01 PM
zahiraam marked 3 inline comments as done.
zahiraam added inline comments.
clang/include/clang/Basic/DiagnosticParseKinds.td
662–665 ↗(On Diff #505520)

Where do you suggest I add the documentation?

Where do you suggest I add the documentation?

Somewhere around https://clang.llvm.org/docs/LanguageExtensions.html#extensions-to-specify-floating-point-flags where we document clang fp eval_method.

zahiraam updated this revision to Diff 507360.Mar 22 2023, 7:33 AM
zahiraam marked an inline comment as done.

The user isn't modifying the float_t type definition, they're using it. I think the diagnostic should say something like cannot use type 'float_t' within '#pragma clang fp eval_method'; type is set according to the default eval method for the translation unit.

Is there any way we can infer an attribute for these typedefs when they're declared, then diagnose it in DiagnoseUseOfDecl? Some sort of "available_only_in_default_eval_method" attribute?

zahiraam updated this revision to Diff 507829.Mar 23 2023, 11:19 AM
zahiraam added a comment.EditedMar 23 2023, 11:22 AM

The user isn't modifying the float_t type definition, they're using it. I think the diagnostic should say something like cannot use type 'float_t' within '#pragma clang fp eval_method'; type is set according to the default eval method for the translation unit.

Is there any way we can infer an attribute for these typedefs when they're declared, then diagnose it in DiagnoseUseOfDecl? Some sort of "available_only_in_default_eval_method" attribute?

@rjmccall Thanks for the review.
I might be able to call the DiagnoseUseOfDecl here https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaType.cpp#L1609 but I don't seem to have access to the NameDecl (first argument of the function).
Would the attribute be on the Decl?
Thanks.

The user isn't modifying the float_t type definition, they're using it. I think the diagnostic should say something like cannot use type 'float_t' within '#pragma clang fp eval_method'; type is set according to the default eval method for the translation unit.

Is there any way we can infer an attribute for these typedefs when they're declared, then diagnose it in DiagnoseUseOfDecl? Some sort of "available_only_in_default_eval_method" attribute?

@rjmccall Thanks for the review.
I might be able to call the DiagnoseUseOfDecl here https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaType.cpp#L1609 but I don't seem to have access to the NameDecl (first argument of the function).

We already call DiagnoseUseOfDecl whenever you use a declaration. You wouldn't need to inject any extra code into the core type-lookup operation.

Would the attribute be on the Decl?

I'm suggesting that you could add a check in DiagnoseUseOfDecl for some new attribute that you would synthesize when we build a typedef with this name. (We do something similar with library declarations of builtins.) Or, better yet, we could just modify the header to explicitly use the attribute, or maybe add the attribute in a redeclaration in our own math.h; I don't know if those are feasible options.

zahiraam updated this revision to Diff 508114.Mar 24 2023, 8:53 AM

Okay, so modifying math.h to use this attribute is acceptable? That's great, that's definitely the best outcome for the compiler.

Just a minor request about the diagnostic name, but otherwise LGTM.

clang/include/clang/Basic/DiagnosticSemaKinds.td
11503
zahiraam updated this revision to Diff 508176.Mar 24 2023, 11:35 AM
zahiraam marked an inline comment as done.

Okay, so modifying math.h to use this attribute is acceptable? That's great, that's definitely the best outcome for the compiler.

Just a minor request about the diagnostic name, but otherwise LGTM.

Yes. Added the attribute inside the included math.h header file.

Okay, so modifying math.h to use this attribute is acceptable? That's great, that's definitely the best outcome for the compiler.

Just a minor request about the diagnostic name, but otherwise LGTM.

Yes. Added the attribute inside the included math.h header file.

How does this work for, say, glibc, musl, or MSVC CRT? Won't those math.h headers lack the attribute and thus run into problems when used with Clang?

Okay, so modifying math.h to use this attribute is acceptable? That's great, that's definitely the best outcome for the compiler.

Just a minor request about the diagnostic name, but otherwise LGTM.

Yes. Added the attribute inside the included math.h header file.

How does this work for, say, glibc, musl, or MSVC CRT? Won't those math.h headers lack the attribute and thus run into problems when used with Clang?

Good point! @rjmccall are you thinking of something in particular with the attribute?
If not I guess we will have to rely on string comparison for all the typedef in the TU. Aaron pointed out the compile time overhead.

Okay, so modifying math.h to use this attribute is acceptable? That's great, that's definitely the best outcome for the compiler.

Just a minor request about the diagnostic name, but otherwise LGTM.

Yes. Added the attribute inside the included math.h header file.

How does this work for, say, glibc, musl, or MSVC CRT? Won't those math.h headers lack the attribute and thus run into problems when used with Clang?

Good point! @rjmccall are you thinking of something in particular with the attribute?

Zahira, this is what I was asking you when I asked whether modifying the math.h header was acceptable: whether you were prepared to accept that the warning would only fire on system math.h headers that we'd modified, or whether you cared about making it work with non-cooperative headers. I wasn't asking if you were willing to change the test code.

If not I guess we will have to rely on string comparison for all the typedef in the TU. Aaron pointed out the compile time overhead.

Well, the compile-time overhead of doing this on every typedef *declaration* is way better than the overhead of doing this on every type lookup, at least.

Anyway, there are three possibilities I can see:

  • We accept that this needs cooperative system headers.
  • We add a math.h compiler header that #include_nexts the system math.h and then adds the attribute. I believe you can just add an attribute to a typedef retroactively with something like typedef float_t float_t __attribute__((whatever)).
  • We do checks on every typedef declaration. There's a builtin-identifier trick that we do with library functions that we should be able to generalize to typedefs, so you wouldn't need to actually do string comparisons, you'd just check whether the IdentifierInfo* was storing a special ID. We'd set that up in initializeBuiltins at the start of the translation unit, so the overhead would just be that we'd create two extra IdentifierInfo*s in every TU.

The builtin ID logic is currently specific to builtin *functions*, and I don't think we'd want to hack typedef names into that. But the same storage in IdentifierInfo* is also used for identifying the ObjC context-sensitive keywords, by just offsetting the builtin IDs by the NUM_OBJC_KEYWORD. You should be able to generalize that by also introducing a concept of a builtin *type* name, so that e.g. IDs in [0,NUM_OBJC_KEYWORDS) are ObjCKeywordKinds, IDs in [NUM_OBJC_KEYWORDS, NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES) are BuiltinTypeKinds (when you subtract NUM_OBJC_KEYWORDS), and IDs in [NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES,∞) are builtin function IDs (when you subtract NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES).

Okay, so modifying math.h to use this attribute is acceptable? That's great, that's definitely the best outcome for the compiler.

Just a minor request about the diagnostic name, but otherwise LGTM.

Yes. Added the attribute inside the included math.h header file.

How does this work for, say, glibc, musl, or MSVC CRT? Won't those math.h headers lack the attribute and thus run into problems when used with Clang?

Good point! @rjmccall are you thinking of something in particular with the attribute?

Zahira, this is what I was asking you when I asked whether modifying the math.h header was acceptable: whether you were prepared to accept that the warning would only fire on system math.h headers that we'd modified, or whether you cared about making it work with non-cooperative headers. I wasn't asking if you were willing to change the test code.

Okay sorry about the confusion. I think the diagnostic should fire when the user is including system's math.h and using float_t inside a scope cotaining a #pragma clang fp eval-method.
I am not sure what you mean by "cooperative headers"?

If not I guess we will have to rely on string comparison for all the typedef in the TU. Aaron pointed out the compile time overhead.

Well, the compile-time overhead of doing this on every typedef *declaration* is way better than the overhead of doing this on every type lookup, at least.

Anyway, there are three possibilities I can see:

  • We accept that this needs cooperative system headers.

Not sure what you mean by cooperative system headers.

  • We add a math.h compiler header that #include_nexts the system math.h and then adds the attribute. I believe you can just add an attribute to a typedef retroactively with something like typedef float_t float_t __attribute__((whatever)).

So when a user writes:

#include <math.h>
 int foo()
 {
    #pragma clang fp eval_method(source)
   return sizeof(float_t);
 }

It would be as though the user wrote:

#include "math.h"
int foo()
{
   #pragma clang fp eval_method(source)
  return sizeof(float_t);
}

where the content of this new math header being:

#include_next <math.h>
typedef float_t float_t __attribute__((whatever));
typedef double_t double_t __attribute__((whatever));

The compiler would have to recognize that the included file is math.h and create a new math.h. Where would this file reside?

  • We do checks on every typedef declaration. There's a builtin-identifier trick that we do with library functions that we should be able to generalize to typedefs, so you wouldn't need to actually do string comparisons, you'd just check whether the IdentifierInfo* was storing a special ID. We'd set that up in initializeBuiltins at the start of the translation unit, so the overhead would just be that we'd create two extra IdentifierInfo*s in every TU.

The builtin ID logic is currently specific to builtin *functions*, and I don't think we'd want to hack typedef names into that. But the same storage in IdentifierInfo* is also used for identifying the ObjC context-sensitive keywords, by just offsetting the builtin IDs by the NUM_OBJC_KEYWORD. You should be able to generalize that by also introducing a concept of a builtin *type* name, so that e.g. IDs in [0,NUM_OBJC_KEYWORDS) are ObjCKeywordKinds, IDs in [NUM_OBJC_KEYWORDS, NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES) are BuiltinTypeKinds (when you subtract NUM_OBJC_KEYWORDS), and IDs in [NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES,∞) are builtin function IDs (when you subtract NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES).

Need to look at this more closely to understand what you are suggesting.

@rjmccall do you have a preference for any one of these solutions? @aaron.ballman ?
Thanks.

Okay, so modifying math.h to use this attribute is acceptable? That's great, that's definitely the best outcome for the compiler.

Just a minor request about the diagnostic name, but otherwise LGTM.

Yes. Added the attribute inside the included math.h header file.

How does this work for, say, glibc, musl, or MSVC CRT? Won't those math.h headers lack the attribute and thus run into problems when used with Clang?

Good point! @rjmccall are you thinking of something in particular with the attribute?

Zahira, this is what I was asking you when I asked whether modifying the math.h header was acceptable: whether you were prepared to accept that the warning would only fire on system math.h headers that we'd modified, or whether you cared about making it work with non-cooperative headers. I wasn't asking if you were willing to change the test code.

Okay sorry about the confusion. I think the diagnostic should fire when the user is including system's math.h and using float_t inside a scope cotaining a #pragma clang fp eval-method.
I am not sure what you mean by "cooperative headers"?

Ones that know about the special marking and use it (cooperative) or ones that don't use the special marking (uncooperative). My intuition is that we'll want this to work with any math.h because the problem still exists if you use an older C standard library release with a newer Clang.

If not I guess we will have to rely on string comparison for all the typedef in the TU. Aaron pointed out the compile time overhead.

Well, the compile-time overhead of doing this on every typedef *declaration* is way better than the overhead of doing this on every type lookup, at least.

Anyway, there are three possibilities I can see:

  • We accept that this needs cooperative system headers.

Not sure what you mean by cooperative system headers.

  • We add a math.h compiler header that #include_nexts the system math.h and then adds the attribute. I believe you can just add an attribute to a typedef retroactively with something like typedef float_t float_t __attribute__((whatever)).

So when a user writes:

#include <math.h>
 int foo()
 {
    #pragma clang fp eval_method(source)
   return sizeof(float_t);
 }

It would be as though the user wrote:

#include "math.h"
int foo()
{
   #pragma clang fp eval_method(source)
  return sizeof(float_t);
}

where the content of this new math header being:

#include_next <math.h>
typedef float_t float_t __attribute__((whatever));
typedef double_t double_t __attribute__((whatever));

Correct.

The compiler would have to recognize that the included file is math.h and create a new math.h. Where would this file reside?

Clang's implementation-specific headers live in clang/lib/Headers and we don't currently have one for math.h so you'd have to add it.

  • We do checks on every typedef declaration. There's a builtin-identifier trick that we do with library functions that we should be able to generalize to typedefs, so you wouldn't need to actually do string comparisons, you'd just check whether the IdentifierInfo* was storing a special ID. We'd set that up in initializeBuiltins at the start of the translation unit, so the overhead would just be that we'd create two extra IdentifierInfo*s in every TU.

The builtin ID logic is currently specific to builtin *functions*, and I don't think we'd want to hack typedef names into that. But the same storage in IdentifierInfo* is also used for identifying the ObjC context-sensitive keywords, by just offsetting the builtin IDs by the NUM_OBJC_KEYWORD. You should be able to generalize that by also introducing a concept of a builtin *type* name, so that e.g. IDs in [0,NUM_OBJC_KEYWORDS) are ObjCKeywordKinds, IDs in [NUM_OBJC_KEYWORDS, NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES) are BuiltinTypeKinds (when you subtract NUM_OBJC_KEYWORDS), and IDs in [NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES,∞) are builtin function IDs (when you subtract NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES).

Need to look at this more closely to understand what you are suggesting.

Basically, he's saying that instead of doing a string comparison against the name of the typedef, we can ask the typedef for its IdentifierInfo * for the name and do a pointer comparison against an IdentiferInfo * that's cached.

@rjmccall do you have a preference for any one of these solutions? @aaron.ballman ?
Thanks.

My slight preference is to modify math.h, but I'd also be okay with doing the pointer comparison trick. I think there are less includes of math.h than there are typedefs in a TU, so adding a tiny bit of compile time overhead when including math.h seems to be a better tradeoff.

Zahira, this is what I was asking you when I asked whether modifying the math.h header was acceptable: whether you were prepared to accept that the warning would only fire on system math.h headers that we'd modified, or whether you cared about making it work with non-cooperative headers. I wasn't asking if you were willing to change the test code.

Okay sorry about the confusion. I think the diagnostic should fire when the user is including system's math.h and using float_t inside a scope cotaining a #pragma clang fp eval-method.
I am not sure what you mean by "cooperative headers"?

Ones that know about the special marking and use it (cooperative) or ones that don't use the special marking (uncooperative). My intuition is that we'll want this to work with any math.h because the problem still exists if you use an older C standard library release with a newer Clang.

Right. It's not a great solution for standard stuff like this.

  • We add a math.h compiler header that #include_nexts the system math.h and then adds the attribute. I believe you can just add an attribute to a typedef retroactively with something like typedef float_t float_t __attribute__((whatever)).

So when a user writes:

#include <math.h>
 int foo()
 {
    #pragma clang fp eval_method(source)
   return sizeof(float_t);
 }

It would be as though the user wrote:

#include "math.h"
int foo()
{
   #pragma clang fp eval_method(source)
  return sizeof(float_t);
}

where the content of this new math header being:

#include_next <math.h>
typedef float_t float_t __attribute__((whatever));
typedef double_t double_t __attribute__((whatever));

Correct.

Right. To be clear, this is a general thing that we already do to a bunch of other headers. It doesn't require any special handling, the compiler's include directory is just the first entry on the search list for system headers.

  • We do checks on every typedef declaration. There's a builtin-identifier trick that we do with library functions that we should be able to generalize to typedefs, so you wouldn't need to actually do string comparisons, you'd just check whether the IdentifierInfo* was storing a special ID. We'd set that up in initializeBuiltins at the start of the translation unit, so the overhead would just be that we'd create two extra IdentifierInfo*s in every TU.

The builtin ID logic is currently specific to builtin *functions*, and I don't think we'd want to hack typedef names into that. But the same storage in IdentifierInfo* is also used for identifying the ObjC context-sensitive keywords, by just offsetting the builtin IDs by the NUM_OBJC_KEYWORD. You should be able to generalize that by also introducing a concept of a builtin *type* name, so that e.g. IDs in [0,NUM_OBJC_KEYWORDS) are ObjCKeywordKinds, IDs in [NUM_OBJC_KEYWORDS, NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES) are BuiltinTypeKinds (when you subtract NUM_OBJC_KEYWORDS), and IDs in [NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES,∞) are builtin function IDs (when you subtract NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES).

Need to look at this more closely to understand what you are suggesting.

Basically, he's saying that instead of doing a string comparison against the name of the typedef, we can ask the typedef for its IdentifierInfo * for the name and do a pointer comparison against an IdentiferInfo * that's cached.

That's a trick we could do, but I'm actually suggesting a step better than that. The way we handle builtin functions is that we have bits in the IdentifierInfo structure saying that the identifier is a builtin name. Those bits are generally ignored except by a few places in Sema that check for them during lookup. We eagerly create those identifiers and set those bits at the start of the TU. We could do the same trick for these names, so that when we're declaring a typedef at global scope we just check whether the name is special and trigger some special processing only if so. It would add a very, very small overhead to processing every typedef declaration, comparable to the overhead we add to processing every function declaration in order to support the declaration of "builtin" library functions.

@rjmccall do you have a preference for any one of these solutions? @aaron.ballman ?
Thanks.

My slight preference is to modify math.h, but I'd also be okay with doing the pointer comparison trick. I think there are less includes of math.h than there are typedefs in a TU, so adding a tiny bit of compile time overhead when including math.h seems to be a better tradeoff.

Yes, adding a math.h to the compiler header seems like the cleanest thing if it doesn't cause integration headaches. The redeclaration thing would be a problem if there are math.h's that don't declare these typedefs, for example (if there isn't a way to check for that).

Zahira, this is what I was asking you when I asked whether modifying the math.h header was acceptable: whether you were prepared to accept that the warning would only fire on system math.h headers that we'd modified, or whether you cared about making it work with non-cooperative headers. I wasn't asking if you were willing to change the test code.

Okay sorry about the confusion. I think the diagnostic should fire when the user is including system's math.h and using float_t inside a scope cotaining a #pragma clang fp eval-method.
I am not sure what you mean by "cooperative headers"?

Ones that know about the special marking and use it (cooperative) or ones that don't use the special marking (uncooperative). My intuition is that we'll want this to work with any math.h because the problem still exists if you use an older C standard library release with a newer Clang.

Right. It's not a great solution for standard stuff like this.

  • We add a math.h compiler header that #include_nexts the system math.h and then adds the attribute. I believe you can just add an attribute to a typedef retroactively with something like typedef float_t float_t __attribute__((whatever)).

So when a user writes:

#include <math.h>
 int foo()
 {
    #pragma clang fp eval_method(source)
   return sizeof(float_t);
 }

It would be as though the user wrote:

#include "math.h"
int foo()
{
   #pragma clang fp eval_method(source)
  return sizeof(float_t);
}

where the content of this new math header being:

#include_next <math.h>
typedef float_t float_t __attribute__((whatever));
typedef double_t double_t __attribute__((whatever));

Correct.

Right. To be clear, this is a general thing that we already do to a bunch of other headers. It doesn't require any special handling, the compiler's include directory is just the first entry on the search list for system headers.

  • We do checks on every typedef declaration. There's a builtin-identifier trick that we do with library functions that we should be able to generalize to typedefs, so you wouldn't need to actually do string comparisons, you'd just check whether the IdentifierInfo* was storing a special ID. We'd set that up in initializeBuiltins at the start of the translation unit, so the overhead would just be that we'd create two extra IdentifierInfo*s in every TU.

The builtin ID logic is currently specific to builtin *functions*, and I don't think we'd want to hack typedef names into that. But the same storage in IdentifierInfo* is also used for identifying the ObjC context-sensitive keywords, by just offsetting the builtin IDs by the NUM_OBJC_KEYWORD. You should be able to generalize that by also introducing a concept of a builtin *type* name, so that e.g. IDs in [0,NUM_OBJC_KEYWORDS) are ObjCKeywordKinds, IDs in [NUM_OBJC_KEYWORDS, NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES) are BuiltinTypeKinds (when you subtract NUM_OBJC_KEYWORDS), and IDs in [NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES,∞) are builtin function IDs (when you subtract NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES).

Need to look at this more closely to understand what you are suggesting.

Basically, he's saying that instead of doing a string comparison against the name of the typedef, we can ask the typedef for its IdentifierInfo * for the name and do a pointer comparison against an IdentiferInfo * that's cached.

That's a trick we could do, but I'm actually suggesting a step better than that. The way we handle builtin functions is that we have bits in the IdentifierInfo structure saying that the identifier is a builtin name. Those bits are generally ignored except by a few places in Sema that check for them during lookup. We eagerly create those identifiers and set those bits at the start of the TU. We could do the same trick for these names, so that when we're declaring a typedef at global scope we just check whether the name is special and trigger some special processing only if so. It would add a very, very small overhead to processing every typedef declaration, comparable to the overhead we add to processing every function declaration in order to support the declaration of "builtin" library functions.

@rjmccall do you have a preference for any one of these solutions? @aaron.ballman ?
Thanks.

My slight preference is to modify math.h, but I'd also be okay with doing the pointer comparison trick. I think there are less includes of math.h than there are typedefs in a TU, so adding a tiny bit of compile time overhead when including math.h seems to be a better tradeoff.

Yes, adding a math.h to the compiler header seems like the cleanest thing if it doesn't cause integration headaches. The redeclaration thing would be a problem if there are math.h's that don't declare these typedefs, for example (if there isn't a way to check for that).

@rjmccall, @aaron.ballman and I looked over various math header files to see when float_t/double_t started. It looks like it started with C99 but is allowed with C89 with MS for example.

In glibc the typedefs are guarded with an #ifdef USE_ISOC99.

MS:

#if defined _M_IX86 && _M_IX86_FP < 2 && !defined _M_FP_FAST
   typedef long double float_t;
   typedef long double double_t;
#else
   typedef float  float_t;
   typedef double double_t;
#endif

It's not guarded with the version of the standard. So, including math.h even with std=c89 would have a float_t/double_t.

MacOS, the definition of float_t is not guarded either. We would have the same issue than with MS.

/*  Define float_t and double_t per C standard, ISO/IEC 9899:2011 7.12 2,       
    taking advantage of GCC's __FLT_EVAL_METHOD__ (which a compiler may         
    define anytime and GCC does) that shadows FLT_EVAL_METHOD (which a          
    compiler must define only in float.h).                                    *\
/
#if __FLT_EVAL_METHOD__ == 0
    typedef float float_t;
    typedef double double_t;
#elif __FLT_EVAL_METHOD__ == 1
    typedef double float_t;
    typedef double double_t;
#elif __FLT_EVAL_METHOD__ == 2 || __FLT_EVAL_METHOD__ == -1

It looks like we have an array of choices that might make the header method a bit uncontrollable.
So, if you all agree I will start looking at the identifier solution that you are proposing above.
Thanks.

Zahira, this is what I was asking you when I asked whether modifying the math.h header was acceptable: whether you were prepared to accept that the warning would only fire on system math.h headers that we'd modified, or whether you cared about making it work with non-cooperative headers. I wasn't asking if you were willing to change the test code.

Okay sorry about the confusion. I think the diagnostic should fire when the user is including system's math.h and using float_t inside a scope cotaining a #pragma clang fp eval-method.
I am not sure what you mean by "cooperative headers"?

Ones that know about the special marking and use it (cooperative) or ones that don't use the special marking (uncooperative). My intuition is that we'll want this to work with any math.h because the problem still exists if you use an older C standard library release with a newer Clang.

Right. It's not a great solution for standard stuff like this.

  • We add a math.h compiler header that #include_nexts the system math.h and then adds the attribute. I believe you can just add an attribute to a typedef retroactively with something like typedef float_t float_t __attribute__((whatever)).

So when a user writes:

#include <math.h>
 int foo()
 {
    #pragma clang fp eval_method(source)
   return sizeof(float_t);
 }

It would be as though the user wrote:

#include "math.h"
int foo()
{
   #pragma clang fp eval_method(source)
  return sizeof(float_t);
}

where the content of this new math header being:

#include_next <math.h>
typedef float_t float_t __attribute__((whatever));
typedef double_t double_t __attribute__((whatever));

Correct.

Right. To be clear, this is a general thing that we already do to a bunch of other headers. It doesn't require any special handling, the compiler's include directory is just the first entry on the search list for system headers.

  • We do checks on every typedef declaration. There's a builtin-identifier trick that we do with library functions that we should be able to generalize to typedefs, so you wouldn't need to actually do string comparisons, you'd just check whether the IdentifierInfo* was storing a special ID. We'd set that up in initializeBuiltins at the start of the translation unit, so the overhead would just be that we'd create two extra IdentifierInfo*s in every TU.

The builtin ID logic is currently specific to builtin *functions*, and I don't think we'd want to hack typedef names into that. But the same storage in IdentifierInfo* is also used for identifying the ObjC context-sensitive keywords, by just offsetting the builtin IDs by the NUM_OBJC_KEYWORD. You should be able to generalize that by also introducing a concept of a builtin *type* name, so that e.g. IDs in [0,NUM_OBJC_KEYWORDS) are ObjCKeywordKinds, IDs in [NUM_OBJC_KEYWORDS, NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES) are BuiltinTypeKinds (when you subtract NUM_OBJC_KEYWORDS), and IDs in [NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES,∞) are builtin function IDs (when you subtract NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES).

Need to look at this more closely to understand what you are suggesting.

Basically, he's saying that instead of doing a string comparison against the name of the typedef, we can ask the typedef for its IdentifierInfo * for the name and do a pointer comparison against an IdentiferInfo * that's cached.

That's a trick we could do, but I'm actually suggesting a step better than that. The way we handle builtin functions is that we have bits in the IdentifierInfo structure saying that the identifier is a builtin name. Those bits are generally ignored except by a few places in Sema that check for them during lookup. We eagerly create those identifiers and set those bits at the start of the TU. We could do the same trick for these names, so that when we're declaring a typedef at global scope we just check whether the name is special and trigger some special processing only if so. It would add a very, very small overhead to processing every typedef declaration, comparable to the overhead we add to processing every function declaration in order to support the declaration of "builtin" library functions.

@rjmccall do you have a preference for any one of these solutions? @aaron.ballman ?
Thanks.

My slight preference is to modify math.h, but I'd also be okay with doing the pointer comparison trick. I think there are less includes of math.h than there are typedefs in a TU, so adding a tiny bit of compile time overhead when including math.h seems to be a better tradeoff.

Yes, adding a math.h to the compiler header seems like the cleanest thing if it doesn't cause integration headaches. The redeclaration thing would be a problem if there are math.h's that don't declare these typedefs, for example (if there isn't a way to check for that).

@rjmccall, @aaron.ballman and I looked over various math header files to see when float_t/double_t started. It looks like it started with C99 but is allowed with C89 with MS for example.

In glibc the typedefs are guarded with an #ifdef USE_ISOC99.

MS:

#if defined _M_IX86 && _M_IX86_FP < 2 && !defined _M_FP_FAST
   typedef long double float_t;
   typedef long double double_t;
#else
   typedef float  float_t;
   typedef double double_t;
#endif

It's not guarded with the version of the standard. So, including math.h even with std=c89 would have a float_t/double_t.

MacOS, the definition of float_t is not guarded either. We would have the same issue than with MS.

/*  Define float_t and double_t per C standard, ISO/IEC 9899:2011 7.12 2,       
    taking advantage of GCC's __FLT_EVAL_METHOD__ (which a compiler may         
    define anytime and GCC does) that shadows FLT_EVAL_METHOD (which a          
    compiler must define only in float.h).                                    *\
/
#if __FLT_EVAL_METHOD__ == 0
    typedef float float_t;
    typedef double double_t;
#elif __FLT_EVAL_METHOD__ == 1
    typedef double float_t;
    typedef double double_t;
#elif __FLT_EVAL_METHOD__ == 2 || __FLT_EVAL_METHOD__ == -1

It looks like we have an array of choices that might make the header method a bit uncontrollable.
So, if you all agree I will start looking at the identifier solution that you are proposing above.
Thanks.

I think the identifier solution makes the most sense. The situation I'm mostly worried about is where the system math.h does not expose float_t but our math.h would then add it -- I don't think we've got the introspection capabilities to be able to determine if the typedef is or isn't present from within our math.h header file, and I don't think we want a pile of libc-specific macros guarding the typedefs.

Yeah, that was the sort of thing I was worried about. The identifier thing is also a reasonable approach, and it'll be nice infrastructure for other, similar things.

The ranges of special identifier we already call out on IdentifierInfo* are for builtin functions (where we need to synthesize a declaration) and ObjC context-sensitive tokens (which are directly important to the parser). I think it makes sense for these to be a third category, but please don't make it as narrow as BuiltinTypedefKind or something like that; call it something generic like InterestingIdentifierKind, and make sure the code in ActOnTypedefNameDecl checks for the float_t and double_t kinds specifically before adding the attribute, so that people using this mechanism for other purposes don't inadvertently trigger your code. We have a number of other places in the compiler that would probably benefit from using this over doing IdentifierInfo* comparisons.

zahiraam updated this revision to Diff 519121.May 3 2023, 9:32 AM

(Sorry for taking so long to get to it!) Thanks.

rjmccall added inline comments.May 7 2023, 2:06 PM
clang/docs/LanguageExtensions.rst
4328

Suggestion:

``math.h`` defines the typedefs ``float_t`` and ``double_t`` based on the active
evaluation method at the point where the header is included, not where the
typedefs are used.  Because of this, it is unwise to combine these typedefs with
``#pragma clang fp eval_method``.  To catch obvious bugs, Clang will emit an
error for any references to these typedefs within the scope of this pragma;
however, this is not a fool-proof protection, and programmers must take care.
clang/include/clang/Basic/IdentifierTable.h
96 ↗(On Diff #519121)

The suggestion was to use a range of values within ObjCOrBuiltinID rather than adding
storage to all identifiers.

zahiraam marked an inline comment as done.May 9 2023, 1:58 PM

Okay, so modifying math.h to use this attribute is acceptable? That's great, that's definitely the best outcome for the compiler.

Just a minor request about the diagnostic name, but otherwise LGTM.

Yes. Added the attribute inside the included math.h header file.

How does this work for, say, glibc, musl, or MSVC CRT? Won't those math.h headers lack the attribute and thus run into problems when used with Clang?

Good point! @rjmccall are you thinking of something in particular with the attribute?

Zahira, this is what I was asking you when I asked whether modifying the math.h header was acceptable: whether you were prepared to accept that the warning would only fire on system math.h headers that we'd modified, or whether you cared about making it work with non-cooperative headers. I wasn't asking if you were willing to change the test code.

If not I guess we will have to rely on string comparison for all the typedef in the TU. Aaron pointed out the compile time overhead.

Well, the compile-time overhead of doing this on every typedef *declaration* is way better than the overhead of doing this on every type lookup, at least.

Anyway, there are three possibilities I can see:

  • We accept that this needs cooperative system headers.
  • We add a math.h compiler header that #include_nexts the system math.h and then adds the attribute. I believe you can just add an attribute to a typedef retroactively with something like typedef float_t float_t __attribute__((whatever)).
  • We do checks on every typedef declaration. There's a builtin-identifier trick that we do with library functions that we should be able to generalize to typedefs, so you wouldn't need to actually do string comparisons, you'd just check whether the IdentifierInfo* was storing a special ID. We'd set that up in initializeBuiltins at the start of the translation unit, so the overhead would just be that we'd create two extra IdentifierInfo*s in every TU.

The builtin ID logic is currently specific to builtin *functions*, and I don't think we'd want to hack typedef names into that. But the same storage in IdentifierInfo* is also used for identifying the ObjC context-sensitive keywords, by just offsetting the builtin IDs by the NUM_OBJC_KEYWORD. You should be able to generalize that by also introducing a concept of a builtin *type* name, so that e.g. IDs in [0,NUM_OBJC_KEYWORDS) are ObjCKeywordKinds, IDs in [NUM_OBJC_KEYWORDS, NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES) are BuiltinTypeKinds (when you subtract NUM_OBJC_KEYWORDS), and IDs in [NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES,∞) are builtin function IDs (when you subtract NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES).

Reading this more carefully... Does that mean that we initialize the float_t, double_t in initializeBuiltins even if they are not used in the source code?
Also not sure how to define the NUM_BUILTIN_TYPES since I don't need to add it to TokenKinds.h? I was proposing to do something like this:
enum InterestingIdentifierKind {
#define Interesting_Identifier(X) X,
#include "clang/Basic/TokenKinds.def"

NUM_INTERESTING_IDENTIFIERS

};
But I guess I don't need since we don't want to add additional storage. Do I understand things correctly?

Okay, so modifying math.h to use this attribute is acceptable? That's great, that's definitely the best outcome for the compiler.

Just a minor request about the diagnostic name, but otherwise LGTM.

Yes. Added the attribute inside the included math.h header file.

How does this work for, say, glibc, musl, or MSVC CRT? Won't those math.h headers lack the attribute and thus run into problems when used with Clang?

Good point! @rjmccall are you thinking of something in particular with the attribute?

Zahira, this is what I was asking you when I asked whether modifying the math.h header was acceptable: whether you were prepared to accept that the warning would only fire on system math.h headers that we'd modified, or whether you cared about making it work with non-cooperative headers. I wasn't asking if you were willing to change the test code.

If not I guess we will have to rely on string comparison for all the typedef in the TU. Aaron pointed out the compile time overhead.

Well, the compile-time overhead of doing this on every typedef *declaration* is way better than the overhead of doing this on every type lookup, at least.

Anyway, there are three possibilities I can see:

  • We accept that this needs cooperative system headers.
  • We add a math.h compiler header that #include_nexts the system math.h and then adds the attribute. I believe you can just add an attribute to a typedef retroactively with something like typedef float_t float_t __attribute__((whatever)).
  • We do checks on every typedef declaration. There's a builtin-identifier trick that we do with library functions that we should be able to generalize to typedefs, so you wouldn't need to actually do string comparisons, you'd just check whether the IdentifierInfo* was storing a special ID. We'd set that up in initializeBuiltins at the start of the translation unit, so the overhead would just be that we'd create two extra IdentifierInfo*s in every TU.

The builtin ID logic is currently specific to builtin *functions*, and I don't think we'd want to hack typedef names into that. But the same storage in IdentifierInfo* is also used for identifying the ObjC context-sensitive keywords, by just offsetting the builtin IDs by the NUM_OBJC_KEYWORD. You should be able to generalize that by also introducing a concept of a builtin *type* name, so that e.g. IDs in [0,NUM_OBJC_KEYWORDS) are ObjCKeywordKinds, IDs in [NUM_OBJC_KEYWORDS, NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES) are BuiltinTypeKinds (when you subtract NUM_OBJC_KEYWORDS), and IDs in [NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES,∞) are builtin function IDs (when you subtract NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES).

Reading this more carefully... Does that mean that we initialize the float_t, double_t in initializeBuiltins even if they are not used in the source code?

Yes. If we decide this is an overhead worth eliminating, we'll find a way to do it lazily to the builtins, and then we'll be able to take advantage of it here, too. The builtins are a much larger contributor to overhead.

Also not sure how to define the NUM_BUILTIN_TYPES since I don't need to add it to TokenKinds.h? I was proposing to do something like this:
enum InterestingIdentifierKind {
#define Interesting_Identifier(X) X,
#include "clang/Basic/TokenKinds.def"

NUM_INTERESTING_IDENTIFIERS

};
But I guess I don't need since we don't want to add additional storage. Do I understand things correctly?

We should have an enum like this, yes. And this is what we do in IdentifierTable.h for all the other kinds. Alternatively, you can just hard-code the number and then static_assert that it's correct in some .cpp file.

FWIW, I think I like NUM_INTERESTING_IDENTIFIERS as a name rather than NUM_BUILTIN_TYPES.

Okay, so modifying math.h to use this attribute is acceptable? That's great, that's definitely the best outcome for the compiler.

Just a minor request about the diagnostic name, but otherwise LGTM.

Yes. Added the attribute inside the included math.h header file.

How does this work for, say, glibc, musl, or MSVC CRT? Won't those math.h headers lack the attribute and thus run into problems when used with Clang?

Good point! @rjmccall are you thinking of something in particular with the attribute?

Zahira, this is what I was asking you when I asked whether modifying the math.h header was acceptable: whether you were prepared to accept that the warning would only fire on system math.h headers that we'd modified, or whether you cared about making it work with non-cooperative headers. I wasn't asking if you were willing to change the test code.

If not I guess we will have to rely on string comparison for all the typedef in the TU. Aaron pointed out the compile time overhead.

Well, the compile-time overhead of doing this on every typedef *declaration* is way better than the overhead of doing this on every type lookup, at least.

Anyway, there are three possibilities I can see:

  • We accept that this needs cooperative system headers.
  • We add a math.h compiler header that #include_nexts the system math.h and then adds the attribute. I believe you can just add an attribute to a typedef retroactively with something like typedef float_t float_t __attribute__((whatever)).
  • We do checks on every typedef declaration. There's a builtin-identifier trick that we do with library functions that we should be able to generalize to typedefs, so you wouldn't need to actually do string comparisons, you'd just check whether the IdentifierInfo* was storing a special ID. We'd set that up in initializeBuiltins at the start of the translation unit, so the overhead would just be that we'd create two extra IdentifierInfo*s in every TU.

The builtin ID logic is currently specific to builtin *functions*, and I don't think we'd want to hack typedef names into that. But the same storage in IdentifierInfo* is also used for identifying the ObjC context-sensitive keywords, by just offsetting the builtin IDs by the NUM_OBJC_KEYWORD. You should be able to generalize that by also introducing a concept of a builtin *type* name, so that e.g. IDs in [0,NUM_OBJC_KEYWORDS) are ObjCKeywordKinds, IDs in [NUM_OBJC_KEYWORDS, NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES) are BuiltinTypeKinds (when you subtract NUM_OBJC_KEYWORDS), and IDs in [NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES,∞) are builtin function IDs (when you subtract NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES).

Reading this more carefully... Does that mean that we initialize the float_t, double_t in initializeBuiltins even if they are not used in the source code?

Yes. If we decide this is an overhead worth eliminating, we'll find a way to do it lazily to the builtins, and then we'll be able to take advantage of it here, too. The builtins are a much larger contributor to overhead.

Also not sure how to define the NUM_BUILTIN_TYPES since I don't need to add it to TokenKinds.h? I was proposing to do something like this:
enum InterestingIdentifierKind {
#define Interesting_Identifier(X) X,
#include "clang/Basic/TokenKinds.def"

NUM_INTERESTING_IDENTIFIERS

};
But I guess I don't need since we don't want to add additional storage. Do I understand things correctly?

We should have an enum like this, yes. And this is what we do in IdentifierTable.h for all the other kinds. Alternatively, you can just hard-code the number and then static_assert that it's correct in some .cpp file.

FWIW, I think I like NUM_INTERESTING_IDENTIFIERS as a name rather than NUM_BUILTIN_TYPES.

Okay, so I did add that in TokenKinds.h. Isn't that the right place for it? The same way it's done for the other builtins? And in TokenKinds.def I added the lines for the interesting identifiers?

Okay, so modifying math.h to use this attribute is acceptable? That's great, that's definitely the best outcome for the compiler.

Just a minor request about the diagnostic name, but otherwise LGTM.

Yes. Added the attribute inside the included math.h header file.

How does this work for, say, glibc, musl, or MSVC CRT? Won't those math.h headers lack the attribute and thus run into problems when used with Clang?

Good point! @rjmccall are you thinking of something in particular with the attribute?

Zahira, this is what I was asking you when I asked whether modifying the math.h header was acceptable: whether you were prepared to accept that the warning would only fire on system math.h headers that we'd modified, or whether you cared about making it work with non-cooperative headers. I wasn't asking if you were willing to change the test code.

If not I guess we will have to rely on string comparison for all the typedef in the TU. Aaron pointed out the compile time overhead.

Well, the compile-time overhead of doing this on every typedef *declaration* is way better than the overhead of doing this on every type lookup, at least.

Anyway, there are three possibilities I can see:

  • We accept that this needs cooperative system headers.
  • We add a math.h compiler header that #include_nexts the system math.h and then adds the attribute. I believe you can just add an attribute to a typedef retroactively with something like typedef float_t float_t __attribute__((whatever)).
  • We do checks on every typedef declaration. There's a builtin-identifier trick that we do with library functions that we should be able to generalize to typedefs, so you wouldn't need to actually do string comparisons, you'd just check whether the IdentifierInfo* was storing a special ID. We'd set that up in initializeBuiltins at the start of the translation unit, so the overhead would just be that we'd create two extra IdentifierInfo*s in every TU.

The builtin ID logic is currently specific to builtin *functions*, and I don't think we'd want to hack typedef names into that. But the same storage in IdentifierInfo* is also used for identifying the ObjC context-sensitive keywords, by just offsetting the builtin IDs by the NUM_OBJC_KEYWORD. You should be able to generalize that by also introducing a concept of a builtin *type* name, so that e.g. IDs in [0,NUM_OBJC_KEYWORDS) are ObjCKeywordKinds, IDs in [NUM_OBJC_KEYWORDS, NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES) are BuiltinTypeKinds (when you subtract NUM_OBJC_KEYWORDS), and IDs in [NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES,∞) are builtin function IDs (when you subtract NUM_OBJC_KEYWORDS+NUM_BUILTIN_TYPES).

Reading this more carefully... Does that mean that we initialize the float_t, double_t in initializeBuiltins even if they are not used in the source code?

Yes. If we decide this is an overhead worth eliminating, we'll find a way to do it lazily to the builtins, and then we'll be able to take advantage of it here, too. The builtins are a much larger contributor to overhead.

Also not sure how to define the NUM_BUILTIN_TYPES since I don't need to add it to TokenKinds.h? I was proposing to do something like this:
enum InterestingIdentifierKind {
#define Interesting_Identifier(X) X,
#include "clang/Basic/TokenKinds.def"

NUM_INTERESTING_IDENTIFIERS

};
But I guess I don't need since we don't want to add additional storage. Do I understand things correctly?

We should have an enum like this, yes. And this is what we do in IdentifierTable.h for all the other kinds. Alternatively, you can just hard-code the number and then static_assert that it's correct in some .cpp file.

FWIW, I think I like NUM_INTERESTING_IDENTIFIERS as a name rather than NUM_BUILTIN_TYPES.

Okay but should this be added to TokenKinds.def or Builtin.defs? the patch is adding it to TokenKinds.

Okay, so I did add that in TokenKinds.h. Isn't that the right place for it? The same way it's done for the other builtins? And in TokenKinds.def I added the lines for the interesting identifiers?

What you're doing in TokenKinds.{def,h} seems fine. What I'm objecting to is adding another field to IdentifierInfo to store it instead of fitting it into ObjCOrBuiltinID. Currently an given identifier is either normal, an ObjC keyword, or a builtin ID, and it determines this by checking what range of values ObjCOrBuiltinID fits into. You can use the same idea to allow an identifier to be either normal, an ObjC keyword, a builtin ID, or an interesting.

clang/include/clang/Basic/TokenKinds.def
89 ↗(On Diff #519121)

Please follow the existing pattern by spelling this INTERESTING_IDENTIFIER.

zahiraam updated this revision to Diff 522800.May 16 2023, 2:53 PM
zahiraam marked an inline comment as done.

@rjmccall Made a bit of progress on this, but still not quite where it should be! Would appreciate your feedback/guidance to see that's moving in the right direction. Not sure about lazily creating the interesting identifiers? There are still some LIT tests failing but will care for those when I get your feedback. Thanks.

rjmccall added inline comments.May 19 2023, 11:04 AM
clang/include/clang/Basic/IdentifierTable.h
316 ↗(On Diff #522800)

This is closer to the right approach, thanks. I think the best way to do this is to put the ObjC keywords first, then the interesting identifiers, then the builtins. That means the builtin ID code above should check for / add / subtract tok::NUM_OBJC_KEYWORDS + tok::NUM_INTERESTING_IDENTIFIERS; please add a FirstBuiltinID constant for that in this class so that we can more easily rework things in the future (e.g. if we decide to stuff more things into this field). And then the code for InterestingIdentifierID should add/subtract tok::NUM_OBJC_KEYWORDS; again, please add a FirstInterestingIdentifier constant for that.

Please make these two functions use tok::InterestingIdentifierKind. For getInterestingIdentifierID(), you'll have to decide how you want to represent that something isn't an interesting identifier. ObjCKeywordKind has a special enumerator (not_keyword) at value 0, so you could do the same thing here, but I think it might be better to use llvm::Optional.

zahiraam updated this revision to Diff 524882.May 23 2023, 2:16 PM
zahiraam added inline comments.
clang/include/clang/Basic/IdentifierTable.h
316 ↗(On Diff #522800)

This is closer to the right approach, thanks. I think the best way to do this is to put the ObjC keywords first, then the interesting identifiers, then the builtins. That means the builtin ID code above should check for / add / subtract tok::NUM_OBJC_KEYWORDS + tok::NUM_INTERESTING_IDENTIFIERS; please add a FirstBuiltinID constant for that in this class so that we can more easily rework things in the future (e.g. if we decide to stuff more things into this field). And then the code for InterestingIdentifierID should add/subtract tok::NUM_OBJC_KEYWORDS; again, please add a FirstInterestingIdentifier constant for that.

I have made the changes in this new version of the patch.

Please make these two functions use tok::InterestingIdentifierKind. For getInterestingIdentifierID(), you'll have to decide how you want to represent that something isn't an interesting identifier. ObjCKeywordKind has a special enumerator (not_keyword) at value 0, so you could do the same thing here, but I think it might be better to use llvm::Optional.

However, I am stumped for this. If we want to use tok::InterestingIdentifierKind, then the addition of the interesting Identifiers need to be made within IdentifierTable::Addkeywords (by adding a new macro). If we do it the way it's done in this patch in Builtin::Context::initializeBuiltins then tok::InterestingIdentifierKind can't be used. Unless I am missing something?

Also, I left the tok::not_interesting only because not really sure how to use the llvm::Optional.

@rjmccall any chance we get your feedback on this please? Thanks.

rjmccall added inline comments.Jun 13 2023, 11:46 AM
clang/include/clang/Basic/Builtins.h
65 ↗(On Diff #524882)

You shouldn't muddle this into Builtin::ID.

clang/include/clang/Basic/IdentifierTable.h
79 ↗(On Diff #524882)
// The "layout" of ObjCOrBuiltinID is:
//   - The first value (0) represents "not a special identifier".
//   - The next (NUM_OBJC_KEYWORDS - 1) values represent ObjCKeywordKinds (not
//      including objc_not_keyword).
//   - The next (NUM_INTERESTING_IDENTIFIERS - 1) values represent InterestingIdentifierKinds
//      (not including not_interesting).
//   - The rest of the values represent builtin IDs (not including not_builtin).
static constexpr int FirstObjCKeywordID = 1;
static constexpr int LastObjCKeywordID = FirstObjCKeywordID + tok::NUM_OBJC_KEYWORDS - 2;
static constexpr int FirstInterestingIdentifierID = LastObjCKeywordID + 1;
static constexpr int LastInterestingIdentifierID = LastObjCKeywordID + tok::NUM_INTERESTING_IDENTIFIERS - 2;
static constexpr int FirstBuiltinID = LastInterestingIdentifierID + 1;
295 ↗(On Diff #524882)
static_assert(FirstObjCKeywordID == 1, "hard-coding this assumption to simplify code");
if (ObjCOrBuiltinID <= LastObjCKeywordID)
  return tok::ObjCKeywordKind(ObjCOrBuiltinID);
else
  return tok::objc_not_keyword;
306 ↗(On Diff #524882)
if (ObjCOrBuiltinID >= FirstBuiltinID)
  return 1 + (ObjCOrBuiltinID - FirstBuiltinID);
else
  return 0;
314 ↗(On Diff #524882)
assert(ID != NotBuiltin);
ObjCOrBuiltinID = FirstBuiltinID + (ID - 1);
assert(getBuiltinID() == ID && "ID too large for field!");
323 ↗(On Diff #524882)
tok::InterestingIdentifierKind getInterestingIdentifierID() const {
  if (ObjCOrBuiltinID >= FirstInterestingIdentifierID && ObjCOrBuiltinID <= LastInterestingIdentifierID)
    return tok::InterestingIdentifierKind(1 + (ObjCOrBuiltinID - FirstInterestingIdentifierID));
  else
    return tok::not_interesting;
}
zahiraam marked an inline comment as done.Jun 14 2023, 6:34 AM
zahiraam added inline comments.
clang/lib/Basic/Builtins.cpp
33 ↗(On Diff #524882)

From your comment in Builtin.h: "You shouldn't muddle this into Builtin::ID."
This means this shouldn't happen here. It should be done in IdentiferTable::AddKeywords as I had done previously?

rjmccall added inline comments.Jun 14 2023, 10:06 AM
clang/lib/Basic/Builtins.cpp
33 ↗(On Diff #524882)

You shouldn't be adding entries for the interesting keywords to the builtins table, no. This table is indexed by Builtin::ID.

I agree that AddKeywords seems like the right place to set up the interesting identifiers rather than initializeBuiltins.

138 ↗(On Diff #524882)

You shouldn't use FirstBuiltinID and FirstInterestingIdentifierID as the starting points here.

zahiraam updated this revision to Diff 531884.Jun 15 2023, 1:08 PM
zahiraam marked an inline comment as done.

Thanks.

rjmccall added inline comments.Jun 15 2023, 2:15 PM
clang/include/clang/Basic/IdentifierTable.h
85 ↗(On Diff #531884)

The code below does not represent NotBuiltin (that's what adding and subtracting 1 does).

89 ↗(On Diff #531884)

I see that I had a bug in my suggestion: I had meant to write LastInterestingIdentifierID = FirstInterestingIdentifierID + tok::NUM_INTERESTING_IDENTIFIERS - 2; but I left it in terms of LastObjCKeywordID instead, making the ranges off by 1. Your math fixes that; sorry about that. I do think it would be clearer if each of these chained off the last one, the way I meant to have it, though. So with your ranges (which leave space to explicitly represent not_interesting), that would look like LastInterestingIdentifierID = FirstInterestingIdentifierID + tok::NUM_INTERESTING_IDENTIFIERS - 1;.

I'm not going to push you to not represent not_interesting, since you seem to have deliberately changed things back that way, and I don't think it matters that much. Although maybe you did that just because it didn't work in the code I gave you? It would be more consistent with the other enums to not explicitly represent not_interesting.

325 ↗(On Diff #531884)

initializeBuiltins does call setBuiltinID(NotBuiltin) in order to reset any identifiers for which we have a -fno-builtin=X argument, so you need to handle that somehow, because otherwise this is going to underflow and set up the identifier with the last InterestingIdentifierKind. There are two options:

  • You could just make initializeBuiltins call some sort of clearBuiltinID method that resets ObjCOrBuiltinID to 0. That has the advantage of making the "lifecycle" of this field much more straightforward. For example, we probably ought to be asserting in these methods that we're not overwriting existing data; clearBuiltinID would know that it was okay to clear data, but only from a builtin.
  • You could also just check for NotBuiltin and set ObjCBuiltinID = 0 when you see it.
zahiraam added inline comments.Jun 16 2023, 5:15 AM
clang/include/clang/Basic/IdentifierTable.h
89 ↗(On Diff #531884)

Sorry @rjmccall, I didn't mean to go against your proposal. It's a misunderstanding of my part. I was going through the debugger with an objc test case and was under the impression that not_keyword was added. Will make the change to not represent not_interesting.

zahiraam added inline comments.Jun 16 2023, 11:44 AM
clang/include/clang/Basic/IdentifierTable.h
325 ↗(On Diff #531884)

initializeBuiltins does call setBuiltinID(NotBuiltin) in order to reset any identifiers for which we have a -fno-builtin=X argument, so you need to handle that somehow, because otherwise this is going to underflow and set up the identifier with the last InterestingIdentifierKind. There are two options:

  • You could just make initializeBuiltins call some sort of clearBuiltinID method that resets ObjCOrBuiltinID to 0. That has the advantage of making the "lifecycle" of this field much more straightforward. For example, we probably ought to be asserting in these methods that we're not overwriting existing data; clearBuiltinID would know that it was okay to clear data, but only from a builtin.

Would that be an additional step in initializeBuiltins or part of step #4? I understants intuitively what the issue would be here if we didn't do that, but I can't find a test case that goes through that step. Would be clearer to me.

  • You could also just check for NotBuiltin and set ObjCBuiltinID = 0 when you see it.

I don't think this can be done here. builtin::NotBuiltin is not known here in IdentierTable.h (get a lots of build errors). So, this needs to be done in initializerBultins method.

rjmccall added inline comments.Jun 16 2023, 12:57 PM
clang/include/clang/Basic/IdentifierTable.h
325 ↗(On Diff #531884)

Would that be an additional step in initializeBuiltins or part of step #4? I understants intuitively what the issue would be here if we didn't do that, but I can't find a test case that goes through that step. Would be clearer to me.

Yeah, it'd be step 4: instead of calling Table.get(Name).setBuiltinID(Builtin::NotBuiltin);, it would call Table.get(Name).clearBuiltinID().

...although while you're editing that code, please change Table.get(Name) to NameIt->second. We've already looked up the IdentifierInfo and don't need to do it again.

I don't think this can be done here. builtin::NotBuiltin is not known here in IdentierTable.h (get a lots of build errors). So, this needs to be done in initializerBultins method.

Okay, that works for me. Please at least assert that the argument to setBuiltinID is not this value. It's acceptable to assume that the value is 0 — we're doing that already with the -1 / +1.

zahiraam updated this revision to Diff 532293.Jun 16 2023, 2:26 PM
zahiraam marked 3 inline comments as done.
zahiraam added inline comments.Jun 16 2023, 2:36 PM
clang/include/clang/Basic/IdentifierTable.h
326 ↗(On Diff #531884)

Is this the assert you are looking for?

rjmccall added inline comments.Jun 16 2023, 3:01 PM
clang/include/clang/Basic/IdentifierTable.h
341 ↗(On Diff #532293)

Similarly, this assertion needs to be != tok::not_interesting (i.e. != 0).

326 ↗(On Diff #531884)

No, I want you to assert that ID (the argument) is not 0, because if it is you'll end up setting ObjCOrBuiltinID to FirstBuiltinID - 1, i.e. LastInterestingIdentifierID.

zahiraam updated this revision to Diff 532643.Jun 19 2023, 7:18 AM
zahiraam marked 2 inline comments as done.
zahiraam updated this revision to Diff 532734.Jun 19 2023, 12:01 PM
rjmccall added inline comments.Jun 20 2023, 10:38 AM
clang/lib/Sema/SemaDecl.cpp
6784 ↗(On Diff #532734)

Please switch over the interesting identifiers here; we don't want to assume this feature is only used for these two names.

In fact, should we go ahead and immediately apply it to the four identifiers above this? That would be nice, because then we could actually do this in two patches: one patch that does the refactor to track interesting identifiers but doesn't cause any functionality changes and a second, very small patch that just introduces the new special treatment for float_t and double_t.

zahiraam added inline comments.Jun 20 2023, 11:37 AM
clang/lib/Sema/SemaDecl.cpp
6784 ↗(On Diff #532734)

Please switch over the interesting identifiers here; we don't want to assume this feature is only used for these two names.

In fact, should we go ahead and immediately apply it to the four identifiers above this? That would be nice, because then we could actually do this in two patches: one patch that does the refactor to track interesting identifiers but doesn't cause any functionality changes and a second, very small patch that just introduces the new special treatment for float_t and double_t.

Are you saying that "FILE", "jmp_buf"," sigjmp_buf" and "ucontext_t" are also interesting identifiers? If yes, they should be added to the list of interesting identifiers in TokenKinds.def?

rjmccall added inline comments.Jun 20 2023, 2:39 PM
clang/lib/Sema/SemaDecl.cpp
6784 ↗(On Diff #532734)

Right. The basic idea of interesting identifiers is to replace these sorts of identifier comparisons in performance-critical code. So your first patch would *only* add those four identifiers as interesting identifiers, handling them here by registering the typedef with the ASTContext like the code is already doing. Then you'd make a follow-up patch that adds float_t and double_t and handles them here by implicitly adding your new attribute.

zahiraam updated this revision to Diff 533334.Jun 21 2023, 10:50 AM
zahiraam retitled this revision from Float_t and double_t types shouldn't be modified by #pragma clang fp eval_method to [clang] Add a namespace for interesting identifiers..
zahiraam edited the summary of this revision. (Show Details)
zahiraam marked an inline comment as done.
zahiraam added inline comments.
clang/lib/Sema/SemaDecl.cpp
6784 ↗(On Diff #532734)

Right. The basic idea of interesting identifiers is to replace these sorts of identifier comparisons in performance-critical code. So your first patch would *only* add those four identifiers as interesting identifiers, handling them here by registering the typedef with the ASTContext like the code is already doing. Then you'd make a follow-up patch that adds float_t and double_t and handles them here by implicitly adding your new attribute.

I think that does it?

Yes, this is very close, thank you.

clang/include/clang/Basic/TokenKinds.def
805 ↗(On Diff #533334)

I think it would be cleaner if you added these two as interesting identifiers in your follow-up patch.

clang/lib/Sema/SemaLookup.cpp
950 ↗(On Diff #533334)

The two changes in this file shouldn't be necessary, since interesting identifiers should be disjoint from builtins.

zahiraam updated this revision to Diff 533360.Jun 21 2023, 12:09 PM
zahiraam marked 3 inline comments as done.
rjmccall accepted this revision.Jun 21 2023, 12:15 PM

Thanks, LGTM!

This revision is now accepted and ready to land.Jun 21 2023, 12:15 PM
zahiraam updated this revision to Diff 533583.Jun 22 2023, 6:34 AM
rjmccall accepted this revision.Jun 22 2023, 8:23 AM
This revision was automatically updated to reflect the committed changes.