This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Reject invalid enum initializers in C files
ClosedPublic

Authored by LegalizeAdulthood on May 14 2022, 7:49 PM.

Details

Summary

C requires that enum values fit into an int. Scan the macro tokens
present in an initializing expression and reject macros that contain
tokens that have suffixes making them larger than int.

C forbids the comma operator in enum initializing expressions, so
optionally reject comma operator.

Fixes #55467

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2022, 7:49 PM
LegalizeAdulthood requested review of this revision.May 14 2022, 7:49 PM

Note, I'm in C standards committee meetings all week this week, so I expect I won't be doing many reviews this week and I'll be catching up as best I can starting next week.

clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
324–331

Huh? Comma wasn't allowed in C89 either... In fact, there's no language mode which allows top-level commas in either C or C++ -- the issue with the comma operator is a parsing one. You can't tell the difference between the comma being part of the initializer expression or the comma being the separator between enumerators.

326

And C89?

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.c
2

It'd be useful to run this test in C++ mode as well to demonstrate the behavioral differences.

4–8

How do we want to handle the fact that Clang has an extension to allow this? Perhaps we want a config option for pedantic vs non-pedantic fixes?

10

It's also forbidden in C++.

11

Can you add a test:

#define GOOD_OP (1, 2)

to make sure it still gets converted to an enum?

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.c
10

C++ just talks about a constant expression and doesn't explicitly disallow operator, in such expressions that I could find. Can you point me to where it is disallowed?

If it is disallowed universally, then I don't see why you asked me to parse it :)

aaron.ballman added inline comments.May 16 2022, 10:20 AM
clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.c
10

C++ just talks about a constant expression and doesn't explicitly disallow operator, in such expressions that I could find. Can you point me to where it is disallowed?

It falls out from the grammar:

http://eel.is/c++draft/enum#nt:enumerator-definition
http://eel.is/c++draft/expr.const#nt:constant-expression
http://eel.is/c++draft/expr.cond#nt:conditional-expression

If it is disallowed universally, then I don't see why you asked me to parse it :)

It can show up in paren expressions and the interface is generally about integer literal expressions.

LegalizeAdulthood marked 2 inline comments as done.May 16 2022, 9:38 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.c
10

OK, yeah, I was looking at the PDF and it doesn't have those nice navigable links, I'll have to keep that site in mind.

So.... we have to disallow top-level comma but allow it as a parenthesized expression. Tricky!

11

Yeah, another one I was thinking of is when someone does something as disgusting as this.

#define ARGS 1, 2
#define OTHER_ARGS (1, 2)

int f(int x, int y) { return x + y; }

int main()
{
    return f(ARGS) + f OTHER_ARGS;
}

However, the only real way to handle avoiding conversion of the 2nd case is to examine the context of macro expansion. This is another edge case that will have to be handled subsequently.

This gets tricky because AFAIK there is no way to select expressions in the AST that result from macro expansion. You have to match the macro expansion locations against AST nodes to identify the node(s) that match the expansion location yourself.

aaron.ballman added inline comments.May 24 2022, 12:03 PM
clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.c
11

That's a good test case (for some definition of good)! :-)

At this point, there's a few options:

  1. Go back to the way things were -- disallow comma expressions, even in parens. Document it as a limitation.
  2. Accept comma expressions in parens, ignore the problem case like you described above. Document it as a limitation?
  3. Try to solve every awful code construct we can ever imagine. Document the check is perfect in every way, but probably not land it for several years.

I think I'd be happy with #2 but I could also live with #1

LegalizeAdulthood marked an inline comment as done.
  • Update from review comments
  • Disallow operator, unless inside parentheses
LegalizeAdulthood marked 5 inline comments as done.May 27 2022, 5:52 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
324–331

The most recent diff handles the issue with operator, but the previous version was only handling the overly-big literal

326

C99 was the earliest dialect of C I could find in LangOptions.def. If you can show me an earlier version for C, I'm happy to switch it.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.c
4–8

I was trying to find a way to make it fail on gcc/clang on compiler explorer and I couldn't get it to fail in either compiler....

Where is the extension documented? It appears to be on by default in both compilers.

11

In the most recent diff, I'm supporting operator, inside parens for C++ and never allowing it in C.

I plan to do a subsequent enhancement that examines the expansion locations of macros and rejects any macro whose expansion doesn't encompass a single expression. In other words, if the tokens in the macro somehow get incorporated into some syntactic element that isn't completely encased in a single expression, then reject the macro.

LegalizeAdulthood marked 3 inline comments as done.
  • Add C++ test case for acceptable operator, initializer
aaron.ballman accepted this revision.May 31 2022, 6:18 AM

LGTM with a few comments that can be addressed when you commit.

clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
87–89
91–93

I (personally) think this is a more clear form of initialization to most folks, but don't insist.

clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
326

This keeps catching folks out, I may add a helper method to make this far more clear. !LangOpts.CPlusPlus means "in a C mode" for us currently.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.c
4–8

I don't think we have any documentation for it (we basically don't bother to document our implementation-defined behavior, which drives me mildly nuts -- we fall back on "the source code is the definitive documentation", which is not user-friendly at all).

That's why I was wondering if we wanted a pedantic vs non-pedantic option here. Pedantically, the behavior you have today is correct. However, because this is a super common extension to compilers, we might want to allow it to be transformed despite being pedantically an extension.

We can handle that as a follow-up though.

This revision is now accepted and ready to land.May 31 2022, 6:18 AM
LegalizeAdulthood marked 2 inline comments as done.Jun 1 2022, 8:30 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.c
4–8

Is there a way to force it to fail with some compiler flag? If you could show me on compiler explorer, that would be great.

LegalizeAdulthood marked 4 inline comments as done.

Update from review comments

This revision was landed with ongoing or failed builds.Jun 1 2022, 9:57 PM
This revision was automatically updated to reflect the committed changes.