Page MenuHomePhabricator

[C2X] N3007 Type inference for object definitions
Needs ReviewPublic

Authored by to268 on Sep 5 2022, 4:30 AM.

Details

Reviewers
aaron.ballman
jyknight
sammccall
rjmccall
efriedma
Group Reviewers
Restricted Project
Summary

This patches implements the auto keyword from the N3007 standard specification.
This allows deducing the type of the variable like in C++

auto nb = 1;
auto chr = 'A';
auto str = "String";

The list of statements which allows the usage of auto:

  • Basic variables declarations (int, float, double, char, char*...)
  • Macros declaring a variable with the auto type

The list of statements which will not work with the auto keyword:

  • auto arrays
  • sizeof(), alignas()
  • auto parameters, auto return type
  • auto as a struct/typedef member
  • uninitialized auto variables
  • auto in an union
  • auto as a enum type specifier
  • auto casts
  • auto in an compound statement

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2022, 4:30 AM
to268 requested review of this revision.Sep 5 2022, 4:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2022, 4:30 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
to268 edited the summary of this revision. (Show Details)Sep 5 2022, 4:39 AM
to268 updated this revision to Diff 458089.Sep 5 2022, 9:04 PM

I have fixed my CodeGen test with the windows platform.
(Linux uses an i64 for the long type instead windows is using an i32)

to268 updated this revision to Diff 458092.Sep 5 2022, 9:40 PM

Fixed DeclSpec.cpp formatting

shafik added a subscriber: shafik.Sep 6 2022, 11:43 AM

The proposal has several nice tests and I don't they are all covered here e.g.:

{
  double a = 7;
  double b = 9;
  {
    double b = b * b;       // undefined, uses uninitialized variable without address
to268 updated this revision to Diff 458358.Sep 6 2022, 10:08 PM

I've added a test case with scopes.

to268 added a comment.Sep 6 2022, 10:11 PM

Also @aaron.ballman said me that he was thinking about maybe adding an extension to add support for auto in a function return or in a parameter list.

cor3ntin added a subscriber: cor3ntin.EditedSep 7 2022, 12:55 AM

Also @aaron.ballman said me that he was thinking about maybe adding an extension to add support for auto in a function return or in a parameter list.

I think extensions are best left to a separate pr, if at all.
Supporting deduced return type sounds fine, with a warning.
Supporting auto template parameters... Not so much, as it would in effect bring function templates, overloading and mangling into C... I don't think we want to go there.

I think we should consider supporting auto in older language modes though.

Or at least, improving the diagnostics for auto in older language modes.

type specifier missing, defaults to 'int'.

The probability that someone tries to use auto as a storage specifier and wants an implicit int at the same time is about 0.

This is awesome, thank you for working on it! @to268 and I had a really good discussion during my office hours and what we decided for next steps were:

0) Next time you upload a patch, please use -U9999 to give the patch more context for reviewers to see.

  1. Need to handle function pointers: auto (*)(void) and void (*)(auto) should both be rejected - you may want to consider adding logic to Sema::BuildPointerType() to semantically restrict forming such a function pointer type.
  2. Need a ton of random tests:
auto file_scope_1 = 12;
// This <might> be something we accept because the redundant storage class specifier is not
// specifying automatic storage duration due to the lack of a type specifier, or we might reject
// because of the redundant storage class specifier. I have a question out to the author to clarify.
auto auto file_scope_2 = 12;

void func(void) {
  int i;
  _Generic(i, auto : 0); // Should reject auto in an association
  _Alignof(auto); // Should also reject
  _Alignas(auto); // Should also reject


  auto j = ({ auto x = 12; x; }); // Should work, x and j should both be int
  auto auto k = 12; // 99% this is intended to be accepted; first `auto` is the storage class specifier, second `auto` is a redundant storage class specifier

  const int ci = 12;
  auto yup = ci;
  yup = 12; // Okay, the type of yup is int, not const int

  _Atomic(auto) atom1 = 12; // Should reject, asking WG14 about it
  _Atomic auto atom2 = 12; // Should also reject
}
`

It's worth noting that auto in C is a very, very strange beast. It's a storage-class-specifier, not a type-name. If there's a storage class specifier and NO TYPE NAME, then the type is inferred. So grammar productions that use type-name without also using storage-class-specifier should reject any use of auto because that's not a valid type name.

clang/test/C/C2X/n3007.c
6 ↗(On Diff #458358)

Rather than include the system header, you should use _Alignas and _Alignof spellings.

45 ↗(On Diff #458358)

Heh, this doesn't have much to do with structs. :-)

50 ↗(On Diff #458358)

An additional tests here that's interesting is:

_Static_assert(_Generic(test_char_ptr, const char * : 1, char * : 2) == 2, "C is weird");
clang/test/Parser/c2x-auto.c
10

Should also test a structure member just to have the coverage.

clang/test/Sema/c2x-auto.c
4

Same suggestion here as above.

12–21

This seems to be duplicated from the parser tests -- there is only a need to test this once (and in each of these cases it seems to be a parsing restriction, so the tests should live in Parser).

33

This should have a FIXME comment -- we should produce a better diagnostic here. That we give an error is correct though!

C2x 6.5.2.5p4 specifies that compound literal is rewritten as if it were a declaration: SC typeof(T) ID = { IL }; (where SC is the storage class, T is the type, ID is a program-wide unique ID, and IL is the initializer list).

C2x 6.7.2.5p1 only allows typeof on an expression or a type-name, and type-name does not include auto as an option.

So the rewrite makes this invalid.

clang/www/c_status.html
1196

This should use class unreleased and specify Clang 16 instead of Yes ("Yes" basically means "we have no idea when we started supporting something, we've probably supported this forever")

shafik added inline comments.Sep 12 2022, 9:14 AM
clang/test/Sema/c2x-auto.c
50

When I made the comment about the example from the proposal, this was what I was thinking about.

FYI: regarding auto auto, I've heard back from the author of N3007 and the intent was to disallow this construct ("others" was meant to exclude auto and "except typedef" was meant to add onto the list of exclusions).

One thought that occurred to me is that the way C has this specified is awfully effectively the same way implicit int worked. It may be worth exploring a change to our implicit int functionality to instead generate an implicit auto type when in C2x mode.

to268 marked 6 inline comments as done.EditedSep 13 2022, 10:24 PM
auto auto k = 12; // 99% this is intended to be accepted; first `auto` is the storage class specifier, second `auto` is a redundant storage class specifier

It seems that specifying auto twice isn't supported at the time being, so i think i should fix that too.

auto auto test = 12; // error: cannot combine with previous 'auto' declaration specifier
auto auto k = 12; // 99% this is intended to be accepted; first `auto` is the storage class specifier, second `auto` is a redundant storage class specifier

It seems that specifying auto twice isn't supported at the time being, so i think i should fix that too.

auto auto test = 12; // error: cannot combine with previous 'auto' declaration specifier

Yup! I just learned yesterday that the error there is correct and intended, at least according to the author of WG N3007.

to268 updated this revision to Diff 461790.Sep 20 2022, 10:19 PM
to268 marked 2 inline comments as done.

I've added more test cases, added a diagnostic when using auto in compound literals and advised fixes

Also i have found that we don't parse the compound literal with the auto keyword correctly in ParseExpr.cpp between line 939 to 973 which is the beginning of Parser::ParseCastExpression(...)

int test_cl = (int){12};      // Compound literal is detected
auto test_cl2 = (auto){12};   // Compound literal is not detected

I've haven't dig deeper yet but maybe it's because we are not including auto when trying to guess if it's a compound literal due to the fact that auto is a storage-class-specifier

clang/test/Sema/c2x-auto.c
50

Do i need to treat shadowing when using auto as invalid?

One thought that occurred to me is that the way C has this specified is awfully effectively the same way implicit int worked. It may be worth exploring a change to our implicit int functionality to instead generate an implicit auto type when in C2x mode.

I've been thinking about this more and I'm starting to make myself a bit uncomfortable with the current approach, at least in terms of how we're handling it on the parsing side of things. I think it's reasonable for us to form an AutoType when we eventually get down to forming the type. But I'm uncomfortable with not modeling the language when parsing. e.g., I think we want to parse auto as a storage class specifier rather than a type specifier, and we want to rely on the lack of a type specifier coupled with use of auto as a storage class specifier to determine the situations where we want to infer a type. The resulting semantics should be basically equivalent, but this ensures that we're properly parsing as the language expects which helps us be forwards-compatible with future changes in C that build on top of this being a storage class specifier rather than a type.

Adding a few more reviewers for some extra opinions on this.

clang/include/clang/Basic/DiagnosticParseKinds.td
374 ↗(On Diff #461790)
clang/lib/Parse/ParseExpr.cpp
1515 ↗(On Diff #461790)

Why would this not be handled from Sema::ActOnCompoundLiteral()?

clang/test/C/C2x/n3007.c
8

Why? My reading of the grammar is that const auto is valid. const is a type-specifier-qualifier declaration-specifier, and auto is a storage-class-specifier declaration-specifier, and a declaration is allowed to use a sequence of declaration-specifiers.

16
clang/test/Sema/c2x-auto.c
50

You shouldn't need to do anything special there, I think. It should naturally fall out that you cannot use a variable in its own initialization. Note, you don't even need an inner scope for that: https://godbolt.org/z/EYa3fMxqx (example is compiled in C++ mode).

shafik added inline comments.Sep 21 2022, 11:17 AM
clang/test/Sema/c2x-auto.c
50

The scope of the identifier begins after its declarator so a is being used in it's own deduction which can not be.

e.g.

double a = 1.0;
  {
      int a = a*a; // undefined behavior a is used uninitialized within it's own initialization
  }
to268 updated this revision to Diff 463415.Sep 27 2022, 10:12 PM
to268 marked 3 inline comments as done.

I've fixed the diagnostic message and all of the bad TODOs comments from the Sema test.
I've explained in details why i'm only able to handle auto in compound literals.

One thought that occurred to me is that the way C has this specified is awfully effectively the same way implicit int worked. It may be worth exploring a change to our implicit int functionality to instead generate an implicit auto type when in C2x mode.

That's something is was trying to do a few weeks earlier, so i'll work on that too.

I've been thinking about this more and I'm starting to make myself a bit uncomfortable with the current approach, at least in terms of how we're handling it on the parsing side of things. I think it's reasonable for us to form an AutoType when we eventually get down to forming the type. But I'm uncomfortable with not modeling the language when parsing. e.g., I think we want to parse auto as a storage class specifier rather than a type specifier, and we want to rely on the lack of a type specifier coupled with use of auto as a storage class specifier to determine the situations where we want to infer a type. The resulting semantics should be basically equivalent, but this ensures that we're properly parsing as the language expects which helps us be forwards-compatible with future changes in C that build on top of this being a storage class specifier rather than a type.

I agree, that's a better approach to design it like that, i'll change my approach when i'll be able to generate an implicit auto.

Here are all the details that i've said earlier

clang/lib/Parse/ParseExpr.cpp
1515 ↗(On Diff #461790)

You're right, it's better to handle this in Sema::ActOnCompoundLiteral()
The problem is that right now, the usage of the auto keyword in a compound literal isn't parsed as a compound literal.
This compound literal is unable to reach Sema::ActOnCompoundLiteral() because the parser emits that it's an invalid expression.

To summarize, i'm unable to handle handle a compound literal that uses the auto keyword inside Sema::ActOnCompoundLiteral()
if it's never going to be called.

int test_ncl = (int){12};           // Parsed as a CL
auto test_cl = (auto){12};          // Not parsed as a CL and emits "expected expression"
/*
 * |-DeclStmt 0x562180ede8b0 <line:17:5, col:29>
 * | `-VarDecl 0x562180ede730 <col:5, col:28> col:9 test_ncl 'int' cinit
 * |   `-ImplicitCastExpr 0x562180ede898 <col:20, col:28> 'int' <LValueToRValue>
 * |     `-CompoundLiteralExpr 0x562180ede870 <col:20, col:28> 'int' lvalue
 * |       `-InitListExpr 0x562180ede828 <col:25, col:28> 'int'
 * |         `-IntegerLiteral 0x562180ede7b0 <col:26> 'int' 12
 * |-DeclStmt 0x562180ede970 <line:18:5, col:30>
 * | `-VarDecl 0x562180ede908 <col:5, col:10> col:10 invalid test_cl 'auto'
 * `-ReturnStmt 0x562180ede9a8 <line:19:5, col:12>
 *   `-IntegerLiteral 0x562180ede988 <col:12> 'int' 0
 */
1515 ↗(On Diff #461790)

When we are parsing an expression between parentheses in Parser::ParseParenExpression() line 2972,
we are calling Parser::isTypeIdInParens() which when using C will return true if it's a type specifier,
which is not the case for the auto keyword.

Ultimately, we fall into the conditional block of Parser::ParseParenExpression() line 3191,
which will return an ExprError() (AKA: a parsing error).

This is why we are unable to forbid a compound literal using the auto keyword at the
Semantic Analysis stage and why this feature is not working out of the box in the first place.

clang/test/C/C2x/n3007.c
8

It's a mistake, i don't remember why i've added this comment in the first place and i've forgot his existence

aaron.ballman added inline comments.Sep 28 2022, 9:45 AM
clang/lib/Parse/ParseExpr.cpp
1515 ↗(On Diff #461790)

When we are parsing an expression between parentheses in Parser::ParseParenExpression() line 2972,

we are calling Parser::isTypeIdInParens() which when using C will return true if it's a type specifier,
which is not the case for the auto keyword.

Ahhhh, that makes a lot of sense!

to268 updated this revision to Diff 465280.Oct 4 2022, 10:30 PM
This comment was removed by to268.
to268 updated this revision to Diff 465312.Oct 5 2022, 1:02 AM

Added a check to not diagnose a missing type specifier in C2x mode

to268 updated this revision to Diff 471535.Oct 28 2022, 6:48 AM

Reworked the parsing side of the patch

to268 added a comment.Oct 28 2022, 6:52 AM

Also i'm not sure if must add C2x to this condition just in case (falling back or add an assertion)

clang/lib/Sema/DeclSpec.cpp
1361

Do we need to include C2x just in case?

Thank you for the update to this!

clang/lib/Parse/ParseExpr.cpp
1526 ↗(On Diff #471535)

I don't think this is quite right; I suspect we're going to need more complicated logic here. Consider a case like: (auto const){ 12 } or (auto *){ nullptr } where two tokens ahead is ) and not {. (Note, these type specifications can be pretty arbitrarily complex.)

Given that this is a temporary workaround for a lack of support for storage class specifiers in compound literals, perhaps another approach is to not try to catch uses in compound literals. Instead, we could add tests with FIXME comments to demonstrate the behavior we get with compound literals now, and when we do that storage class specifier work, we will (hopefully) break those tests and come back to make them correct.

1530–1531 ↗(On Diff #471535)

This looks a bit less visually jarring to me (the indentation differences were distracting). WDYT?

clang/lib/Sema/DeclSpec.cpp
1361

I don't think so -- we're in a language where the type specifier *is* optional for C2x.

1369–1372
clang/test/C/C2x/n3007.c
14

I'd also like a test for:

_Atomic auto i = 12;
_Atomic(auto) j = 12;

_Atomic(int) foo(void);
auto k = foo(); // Do we get _Atomic(int) or just int?
37

I think this test should be _Alignas(auto) right?

41–42

I think other array cases we want to test are:

auto a = { 1 , 2 }; // Error
auto b = { 1, }; // OK
auto c = (int [3]){ 1, 2, 3 }; // OK
57

I'd also like to ensure we reject:

typedef auto (*fp)(void);
typedef void (*fp)(auto);

_Generic(0, auto : 1);

and we should ensure we properly get integer promotions:

short s;
auto a = s;
_Generic(a, int : 1);

and a test for use of an explicit pointer declarator (which is UB that we can define if we want to):

int a;
auto *ptr = &a; // Okay?
auto *ptr = a; // Error
clang/test/Sema/c2x-auto.c
28

This one should not be accepted -- the grammar productions for the initialization allow for {0} and {0,} but no more than one element in the braces.

to268 marked 7 inline comments as done.Sun, Nov 13, 10:43 PM

This is a status update of the patch.

clang/lib/Parse/ParseExpr.cpp
1526 ↗(On Diff #471535)

I think it's better to make a test of the actual behavior than trying to correct this case.

1530–1531 ↗(On Diff #471535)

That's better but i remember that clang-format has failed on that.

clang/test/C/C2x/n3007.c
14

The _Atomic qualifier is dropped so the type of k is an int

41–42

These statements are not working:

auto b = { 1, }; // Not valid
auto d = { 1 };  // Not valid

I think it's related to the compound literal.

57
> and we should ensure we properly get integer promotions:
>

short s;
auto a = s;
_Generic(a, int : 1);

I got the error controlling expression type 'short' not compatible with any generic association type so we don't promote from short to int

to268 updated this revision to Diff 478113.Sun, Nov 27, 11:18 AM
to268 marked 5 inline comments as done.

Added test cases and added an explicit auto* diagnostic

to268 updated this revision to Diff 481339.Thu, Dec 8, 9:44 AM

Removed compound literal diagnostic in ParseExpr