This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Support expressions of literals in modernize-macro-to-enum
ClosedPublic

Authored by LegalizeAdulthood on Apr 26 2022, 8:29 PM.

Details

Summary

Add a recursive descent parser to match macro expansion tokens against
fully formed valid expressions of integral literals. Partial expressions will
not be matched -- they can't be valid initializing expressions for an enum.

Fixes #55055

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2022, 8:29 PM
LegalizeAdulthood requested review of this revision.Apr 26 2022, 8:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2022, 8:29 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
LegalizeAdulthood set the repository for this revision to rG LLVM Github Monorepo.Apr 26 2022, 8:29 PM
clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
2

ditto

293

Structure of these feels very similar, I'll see if I can squish out the duplication

clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.h
2

Uh.... I guess this needs some sort of copyright notice?

clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
325

inline variable made explicit for debugging

  • Add banner block comment for new files
  • Extract Functions to eliminate duplication
clang-tools-extra/unittests/clang-tidy/ModernizeModuleTest.cpp
2

Needs a file header

Add file block comment

Update documentation

aaron.ballman added inline comments.Apr 29 2022, 7:43 AM
clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
26

(Same suggestion elsewhere -- just double check that all the comments are full sentences with capitalization and punctuation.)

33–34

Do we need to care about integer suffixes that make a non-integer type, like: https://godbolt.org/z/vx3xbGa41

100

I know this is code moved from elsewhere, but I suppose we never considered the odd edge case where a user does something like "foo"[0] as a really awful integer constant. :-D

186

There is GNU extension in this space: https://godbolt.org/z/PrWY3T6hY

205

Comma operator?

clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.h
10

We don't use #pragma once (not portable, not reliable).

19–21

Oh boy, I'm not super excited about having another parser to maintain...

It'd be nice if we had a ParserUtils.cpp/h file that made it easier to go from an arbitrary array of tokens to AST nodes + success/failure information on parsing the tokens. It's not strictly needed for what you're trying to accomplish here, but it would be a much more general interface and it would remove the support burden from adding another parser that's out of Clang's tree.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
26

Other interesting tests I'd expect we could convert into an enum (at least theoretically):

#define A 12 + +1
#define B 12 - -1
#define C (1, 2, 3)
#define D 100 ? : 8
#define E 100 ? 100 : 8
#define F 'f'
#define G "foo"[0]
#define H 1 && 2
#define I 1 || 2
clang-tools-extra/unittests/clang-tidy/ModernizeModuleTest.cpp
67
12i
.0
100
100 ? : 10
1, 2
135

This one is valid

LegalizeAdulthood marked 2 inline comments as done.Apr 29 2022, 10:45 AM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
33–34

I don't think those will be parsed as literal tokens by the preprocessor, but I'll check.

100

It's always possible to write crazy contorted code and have a check not recognize it. I don't think it's worthwhile to spend time trying to handle torture cases unless we can find data that shows it's prevalent in real world code.

If I was doing a code review and saw this:

enum {
    FOO = "foo"[0]
};

I'd flag that in a code review as bogus, whereas if I saw:

enum {
    FOO = 'f'
};

That would be acceptable, which is why character literals are accepted as an integral literal initializer for an enum in this check.

186

Do you have a link for the extension?

205

Remember that the use case here is identifying expressions that are initializers for an enum. If you were doing a code review and you saw this:

enum {
    FOO = (2, 3)
};

Would you be OK with that? I wouldn't. Clang even warns about it: https://godbolt.org/z/Y641cb8Y9

Therefore I deliberately left comma operator out of the grammar.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
26

Most of these (except comma operator and string subscript, see my comments earlier) are covered in the unit test for the matcher. I'll add tests for these:

12 + +1
12 - -1
100 ? : 8
clang-tools-extra/unittests/clang-tidy/ModernizeModuleTest.cpp
67

.0 is already covered by the case 1.23. I'm not home brewing tokenization, but using the Lexer to do that.

12i I need to investigate to find out what the Lexer does.

aaron.ballman added inline comments.Apr 29 2022, 11:50 AM
clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
100

I don't think it's worthwhile to spend time trying to handle torture cases unless we can find data that shows it's prevalent in real world code.

I think I'm okay agreeing to that in this particular case, but this is more to point out that writing your own parser is a maintenance burden. Users will hit cases we've both forgotten about here, they'll file a bug, then someone has to deal with it. It's very hard to justify to users "we think you write silly code" because they often have creative ways in which their code is not actually so silly, especially when we support "most" valid expressions.

186
205

This is another case where I think you're predicting that users won't be using the full expressivity of the language and we'll get bug reports later. Again, in insolation, I tend to agree that I wouldn't be happy seeing that code. However, users write some very creative code and there's no technical reason why we can't or shouldn't handle comma operators.

To clarify my previous comments, I'm fine punting on the edge cases until user reports come in, so don't let them block this review if you feel strongly about not supporting them. But when user reports start coming in, at some point I might start asking to replace the custom parser with calling into the clangParse library through some invented utility interface so that we don't have to deal with a long tail of bug reports.

LegalizeAdulthood marked 3 inline comments as done.Apr 29 2022, 12:40 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
100

Writing your own parser is unavoidable here because we can't just assume that any old thing will be a valid initializer just by looking at the set of tokens present in the macro body. (There is a separate discussion going on about improving the preprocessor support and parsing things more deeply, but that isn't even to the point of a prototype yet.) The worst thing we can do is create "fixits" that produce invalid code.

The worst that happens if your expression isn't recognized is that your macro isn't matched as a candidate for an enum. You can always make it an enum manually and join it with adjacent macros that were recognized and converted.

As it stands, the check only recognizes a single literal with an optional unary operator.

This change expands the check to recognize a broad range of expressions, allowing those macros to be converted to enums. I opened the issue because running modernize-macro-to-enum on the ITK codebase showed some simple expressions involving literals that weren't recognized and converted.

If an expression isn't recognized and an issue is opened, it will be an enhancement request to support a broader range of expression, not a bug that this check created invalid code.

IMO, the more useful thing that's missing from the grammar is recognizing sizeof expressions rather than indexing string literals with an integral literal subscript.

I had planned on doing another increment to recognize sizeof expressions.

205

"Don't let the perfect be the enemy of the good."

My inclination is to simply explicitly state that comma operator is not recognized in the documentation. It's already implicit by it's absence from the list of recognized operators.

Again, the worst that happens is that your macro isn't converted.

I'm open to being convinced that it's important, but you haven't convinced me yet :)

To clarify my previous comments, I'm fine punting on the edge cases until user reports come in, so don't let them block this review if you feel strongly about not supporting them. But when user reports start coming in, at some point I might start asking to replace the custom parser with calling into the clangParse library through some invented utility interface so that we don't have to deal with a long tail of bug reports.

Yeah, I think punting on edge cases is the right thing to do here. As I say,
the worst that happens is your macro isn't converted automatically when you
could convert it manually.

Maybe we're thinking about this check differently.

I want this check to do the majority of the heavy lifting so that I'm only left with a
few "weird" macros that I might have to convert by hand. I never, ever, ever want
this check to generate invalid code. Therefore it is of paramount importance that
it be conservative about what it recognizes as a candidate for conversion.

I think this is the baseline for all the modernize checks, really. There are still cases
where loop-convert doesn't recognize an iterator based loop that could be
converted to a range-based for loop. The most important thing is that loop-convert
doesn't take my weird iterator based loop and convert it to a range based for loop
that doesn't compile.

As for calling into clangParse, I think that would be overkill for a couple reasons.
First, the real parser is going to do a lot of work creating AST nodes which we will
never use, except to traverse the structure looking for things that would invalidate
it as a candidate for a constant initializing expression. Second, we only need to
match the structure, we don't need to extract any information from the token stream
other than a "thumbs up" or "thumbs down" that it is a valid initializing expression.
Many times in clang-tidy reviews performance concerns are raised and I think
matching the token stream with the recursive descent matcher here is going to be
much, much faster than invoking clangParse, particularly since the matcher bails
out early on the first token that doesn't match. The only thing I can think of that
would make it faster is if we could get the lexed tokens from the preprocessor
instead of making it re-lex the macro body, but that's a change beyond the scope
of this check or even clang-tidy.

LegalizeAdulthood marked 3 inline comments as done.Apr 29 2022, 1:02 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/unittests/clang-tidy/ModernizeModuleTest.cpp
67

OK, so 12i turns into numeric_constant token, so I've added test cases to exclude those and enhanced the matcher.

Essentially that's a bug in the existing implementation that 12i wasn't rejected outright.

LegalizeAdulthood marked 7 inline comments as done.Apr 29 2022, 1:19 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.h
19–21

Yeah, I'm not a fan of duplication either, but see my earlier comments about why I think clangParse is overkill here.

LegalizeAdulthood marked an inline comment as done.

Update from review comments

clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
205

It wasn't much extra work/code to add comma operator support so I've done that.

Recognize comma operator expressions

FYI, once nice addition of the parsing of macro bodies is that it paves the way for
a modernize-macro-to-function check that converts function-like macros that
compute values to template functions. Once this change has landed, I'll be putting
up a review for that.

To clarify my previous comments, I'm fine punting on the edge cases until user reports come in, so don't let them block this review if you feel strongly about not supporting them. But when user reports start coming in, at some point I might start asking to replace the custom parser with calling into the clangParse library through some invented utility interface so that we don't have to deal with a long tail of bug reports.

Yeah, I think punting on edge cases is the right thing to do here. As I say,
the worst that happens is your macro isn't converted automatically when you
could convert it manually.

I largely agree, but I've found cases where we'll convert correct code to incorrect code, so it's a bit stronger than that.

Maybe we're thinking about this check differently.

I want this check to do the majority of the heavy lifting so that I'm only left with a
few "weird" macros that I might have to convert by hand. I never, ever, ever want
this check to generate invalid code. Therefore it is of paramount importance that
it be conservative about what it recognizes as a candidate for conversion.

I think that's a reasonable goal, but we're not meeting the "never ever generate invalid code" part. I already know we can break correct C and C++ code through overflow. Should we ever allow an option to use an enum with a fixed underlying type in C++ (either enum class or enum : type form), we'll have the same breakage there but at different thresholds.

I think this is the baseline for all the modernize checks, really. There are still cases
where loop-convert doesn't recognize an iterator based loop that could be
converted to a range-based for loop. The most important thing is that loop-convert
doesn't take my weird iterator based loop and convert it to a range based for loop
that doesn't compile.

As for calling into clangParse, I think that would be overkill for a couple reasons.
First, the real parser is going to do a lot of work creating AST nodes which we will
never use, except to traverse the structure looking for things that would invalidate
it as a candidate for a constant initializing expression. Second, we only need to
match the structure, we don't need to extract any information from the token stream
other than a "thumbs up" or "thumbs down" that it is a valid initializing expression.

There's a few reasons I disagree with this. First, you need to know the value of the constant expression in order to know whether it's valid as an enumeration constant. That alone requires expression evaluation capabilities, assuming you want the check to behave correctly for those kinds of cases. But second, without support for generating that AST and validating it, you can never handle cases like this:

constexpr int a = 12;
constexpr int foo() { return 12; }

#define FOO (a + 1)
#define BAR (a + 2)
#define BAZ (a + 3)
#define QUUX (foo() + 4)

because you have no way to know whether or not that constant expression is valid based solely on token soup (especially when you start to factor in namespaces, qualified lookup, template instantiations, etc). So I think avoiding the parser will limit the utility of this check. And maybe that's fine, maybe it's only ever intended to convert C code using defines to C code using enumerations or other such simple cases.

Many times in clang-tidy reviews performance concerns are raised and I think
matching the token stream with the recursive descent matcher here is going to be
much, much faster than invoking clangParse, particularly since the matcher bails
out early on the first token that doesn't match.

Absolutely 100% agreed on this point.

The only thing I can think of that
would make it faster is if we could get the lexed tokens from the preprocessor
instead of making it re-lex the macro body, but that's a change beyond the scope
of this check or even clang-tidy.

Yeah, and that wouldn't even be sufficient because I still think we're going to want to know the *value* of the expression at some point.

Allllll that "this is the wrong way!" doom and gloom aside, I still think you're fine to proceed with the current approach if you'd like to. The situations under which it will break correct code should be something we document explicitly in the check somewhere, but they feel sufficiently like edge cases to me that I'm fine moving forward with the caution in mind that if this becomes too much more problematic in the future, we have *a* path forward we could use to improve things should we decide we need to.

clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
100

Writing your own parser is unavoidable here because we can't just assume that any old thing will be a valid initializer just by looking at the set of tokens present in the macro body.

If you ran the token sequence through clang's parser and got an AST node out, you'd have significantly *more* information as to whether something is a valid enum constant initializer because you can check that it's an actual constant expression *and* that it's within a valid range of values. This not only fixes edge case bugs with your approach (like the fact that you can generate a series of literal expressions that result in a value too large to store within an enumerator constant), but it enables new functionality your approach currently disallows (like using constexpr variables instead of just numeric literals).

So I don't agree that it's unavoidable to write another custom parser.

205

"Don't let the perfect be the enemy of the good."

This is a production compiler toolchain. Correctness is important and that sometimes means caring more about perfection than you otherwise would like to.

I'm open to being convinced that it's important, but you haven't convinced me yet :)

It's less about importance and more about maintainability coupled with correctness. With your approach, we get something that will have a long tail of bugs. If you used Clang's parser, you don't get the same issue -- maintenance largely comes along for free, and the bugs are far less likely.

About the only reason I like your current approach over using clang's parsing is that it quite likely performs much better than doing an actual token parsing of the expression. But as you pointed out, about the worst thing for a check can do is take correct code and make it incorrect -- doing that right requires some amount of semantic evaluation of the expressions (which you're not doing). For example:

#define FINE 1LL << 30LL;
#define BAD 1LL << 31LL;
#define ALSO_BAD 1LL << 32L;

We'll convert this into an enumeration and break -pedantic-errors builds in C. If we had a ConstantExpr object, we could see what it's value is and note that it's greater than what fits into an int and decide to do something smarter.

So I continue to see the current approach as being somewhat reasonable (especially for experimentation), but incorrect in the long run. Not sufficiently incorrect for me to block this patch on, but incorrect enough that the first time this check becomes a maintenance burden, I'll be asking more strongly to do this the correct way.

To clarify my previous comments, I'm fine punting on the edge cases until user reports come in, so don't let them block this review if you feel strongly about not supporting them. But when user reports start coming in, at some point I might start asking to replace the custom parser with calling into the clangParse library through some invented utility interface so that we don't have to deal with a long tail of bug reports.

Yeah, I think punting on edge cases is the right thing to do here. As I say,
the worst that happens is your macro isn't converted automatically when you
could convert it manually.

I largely agree, but I've found cases where we'll convert correct code to incorrect code, so it's a bit stronger than that.

Are you talking generally, or with this check? I can't see how this check
is going to generate incorrect code (so far).

I think that's a reasonable goal, but we're not meeting the "never ever generate invalid code" part.

How so? Can you give an example where this check will produce invalid code?

As for calling into clangParse, I think that would be overkill for a couple reasons.
First, the real parser is going to do a lot of work creating AST nodes which we will
never use, except to traverse the structure looking for things that would invalidate
it as a candidate for a constant initializing expression. Second, we only need to
match the structure, we don't need to extract any information from the token stream
other than a "thumbs up" or "thumbs down" that it is a valid initializing expression.

There's a few reasons I disagree with this. First, you need to know the value of the
constant expression in order to know whether it's valid as an enumeration constant.

I'm not following you. Nothing requires knowing this yet.

constexpr int a = 12;
constexpr int foo() { return 12; }

#define FOO (a + 1)
#define BAR (a + 2)
#define BAZ (a + 3)
#define QUUX (foo() + 4)

QUUX will never be converted to an enum by this check. It references an identifier foo.

The situations under which it will break correct code should be something we document explicitly

You haven't shown an example yet where it will break code.

clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
100

You keep bringing up the idea that the values have to be known, but so far they don't.

Remember, we are replacing macro identifiers with anonymous enum identifiers. We aren't specifying a restricting type to the enum, so as long as it's a valid integral literal expression, we're not changing any semantics. Unscoped enums also allow arbitrary conversions to/from an underlying integral type chosen by the compiler.

C++20 9.7.1 paragraph 7 says:

For an enumeration whose underlying type is not fixed, the underlying type is an integral type that can
represent all the enumerator values defined in the enumeration. If no integral type can represent all the
enumerator values, the enumeration is ill-formed. It is implementation-defined which integral type is used
as the underlying type except that the underlying type shall not be larger than int unless the value of an
enumerator cannot fit in an int or unsigned int . If the enumerator-list is empty, the underlying type is as
if the enumeration had a single enumerator with value 0.

So the compiler is free to pick an underlying type that's large enough to handle all the explicitly listed initial values. Do we actually need to know the values for this check? I don't think so, because we aren't changing anything about the type of the named values. When the compiler evaluates an integral literal, it goes through a similar algorithm assigning the appropriate type to those integral values:

C++20 5.9 paragraph 2:

A preprocessing number does not have a type or a value; it acquires both after a successful conversion to an
integer-literal token or a floating-point-literal token.

C++20 5.13.2 paragraph 3:

The type of an integer-literal is the first type in the list in Table 8 corresponding to its optional integer-suffix
in which its value can be represented.

The table says the type is int, unsigned int, long int, unsigned long int, long long int, or unsigned long long int based on the suffix and the value and that the type is chosen to be big enough to hold the value if the suffix is unspecified.

but [using clangParse] enables new functionality your approach currently disallows (like using constexpr variables instead of just numeric literals).

I agree that if we used the full parser, we'd bring in constexpr expressions as valid initializers for the enums. However, before engaging in all that work, I'd like to see how likely this is in existing codebases by feedback from users requesting the support. Maybe engaging the parser isn't a big amount of work, I don't actually know. I've never looked deeply at the actual parsing code in clang. Maybe it's easy enough to throw a bag of tokens at it and get back an AST node, maybe not. (I suspect not based on my experience with the code base so far.)

My suspicion is that code bases that are heavy with macros for constants aren't using modern C++ in the body of those macros to define the values of those constants. Certainly this is 100% true for C code that uses macros to define constants, by definition. This check applies equally well to C code as C has had enums forever but even recent C code still tends to use macros for constants.

Still, my suspicions aren't data. I'd like to get this check deployed in a basic fashion and let user feedback provide data on what is important.

So I don't agree that it's unavoidable to write another custom parser.

That's a fair point. Some kind of parser is needed to recognize valid initializer expressions or we run the risk of transforming valid code into invalid code. Whether it is a custom recognizer as I've done or clangParse is what we're debating here.

205

"Don't let the perfect be the enemy of the good."

This is a production compiler toolchain. Correctness is important and that sometimes means caring more about perfection than you otherwise would like to.

That's fair.

For example:

#define FINE 1LL << 30LL;
#define BAD 1LL << 31LL;
#define ALSO_BAD 1LL << 32L;

Oh this brings up the pesky "semicolons disappear from the AST" issue. I wonder what happens when we're just processing tokens, though. I will add a test to see. This could be a case where my approach results in more correctness than clangParse!

Not sufficiently incorrect for me to block this patch on, but incorrect enough that the first time this check becomes a maintenance burden, I'll be asking more strongly to do this the correct way.

I agree.

LegalizeAdulthood marked 2 inline comments as done.May 2 2022, 9:47 AM

I largely agree, but I've found cases where we'll convert correct code to incorrect code, so it's a bit stronger than that.

Are you talking generally, or with this check? I can't see how this check
is going to generate incorrect code (so far).

Specifically this check.

I think that's a reasonable goal, but we're not meeting the "never ever generate invalid code" part.

How so? Can you give an example where this check will produce invalid code?

As posted before:

#define FINE 1LL << 30LL
#define BAD 1LL << 31LL
#define ALSO_BAD 1LL << 32L

Now with godbolt goodness: https://godbolt.org/z/Tzbe8qWT5

clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
100

You keep bringing up the idea that the values have to be known, but so far they don't.

See comments at the top level.

So the compiler is free to pick an underlying type that's large enough to handle all the explicitly listed initial values. Do we actually need to know the values for this check?

Yes, C requires the enumeration constants to be representable with int. But also, because this is in the modernize module, it's very likely we'll be getting a request to convert to using a scoped enumeration or an enumeration with the appropriate fixed underlying type in C++ as well.

clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
100

Oh, I see now, thanks for explaining it. I didn't realize that C restricts the values to int.

aaron.ballman added inline comments.May 2 2022, 1:17 PM
clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
100

You're welcome, sorry for not pointing it out sooner!

clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
100

Regarding conversion to a scoped enum, I think that is best handled by a separate enum-to-scoped-enum check. I have one I've been working on separately. As bad as it is to convert macros (since they have no respect for structure or scope), it's quite a bit of work to convert a non-scoped enum to an enum because now implicit conversions enter the picture and expressions involving macros (e.g. FLAG_X | FLAG_Y) also get much more complicated. Not only that but usages have to have types updated. I don't think it's very useful to upgrade to a scoped enum and then have every use wrapped in static_cast<int>(). It just creates uglier code than what was there before and I don't think people would adopt such a check.

Regarding conversion to an appropriate fixed underlying type, that isn't allowed on unscoped enums, only on scoped enums, so it has all the above complexity plus selecting the appropriate fixed underlying type.

aaron.ballman added inline comments.May 3 2022, 4:49 AM
clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
100

Regarding conversion to a scoped enum, I think that is best handled by a separate enum-to-scoped-enum check.

It's been a while since I checked, but I recall that checks with interacting fix-its tend not to play well together. We should probably see if that's still the case today. As an example, if the enum-to-scoped-enum check runs BEFORE the modernize-macros-to-enum check, then the behavior will be worse than if the checks are run in the reverse order. Because of issues like that, I'm not quite as convinced that a separate check is best (though I do agree it's notionally better).

Regarding conversion to an appropriate fixed underlying type, that isn't allowed on unscoped enums, only on scoped enums, so it has all the above complexity plus selecting the appropriate fixed underlying type.

That's incorrect; fixed underlying types and scoped enumerations are orthogonal features (though a scoped enumeration always has a fixed underlying type): https://godbolt.org/z/sGYsjdnrT

LegalizeAdulthood marked an inline comment as done.May 8 2022, 7:24 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
100

You're right that there can be unexpected interactions between checks when you run multiple of them concurrently, but this has always been the case and isn't surprising to me. This doesn't seem to be a situation unique to these checks though. As more and more transformations become available through clang-tidy, it's inevitable that two different checks will want to modify the same piece of code. For instance, the identifier naming check and modernize-loop-convert. Modernize-loop-convert can eliminate variables entirely (iterators go poof), while the identifier check wants to rename the iterators.

Huh. OK, good to know. I tried doing an underlying type on an unscoped enum and I got a compilation error; I must have just done it wrong.

LegalizeAdulthood marked 2 inline comments as done.May 8 2022, 7:38 PM

OK, so thinking about this review a little more, I propose this:

  • Take the check as is, but document that the initializing expressions may result in an invalid enum, particularly for C which restricts the underlying type to be int
  • Create a subsequent commit that rejects the enums where the language is C and the initializing expression is a value larger than an int by rejecting any macro where any integer token in the expression is larger than an int
  • Create an additional subsequent commit that not only matches the expression but also computes the value and checks it for range.

How does that sound?

aaron.ballman accepted this revision.May 9 2022, 4:22 AM

OK, so thinking about this review a little more, I propose this:

  • Take the check as is, but document that the initializing expressions may result in an invalid enum, particularly for C which restricts the underlying type to be int
  • Create a subsequent commit that rejects the enums where the language is C and the initializing expression is a value larger than an int by rejecting any macro where any integer token in the expression is larger than an int
  • Create an additional subsequent commit that not only matches the expression but also computes the value and checks it for range.

How does that sound?

I think that sounds like a good approach -- I expect the third bullet is when we'll likely learn whether we should have let Clang do the parsing or not (because then we're replicating not only the expression parsing but the constant evaluation calculations from the frontend).

I re-reviewed the patch as it stands today, and given the above plan, I think this is good to go. So it gets my LGTM and my thanks for the good discussions!

clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
100

You're right that there can be unexpected interactions between checks when you run multiple of them concurrently, but this has always been the case and isn't surprising to me.

+1, this isn't a new issue. The reason I brought it up is because we've been bringing up this issue for a few years now and nobody has had the chance to try to fix the fixit infrastructure to improve the behavior of these kinds of interactions. So my fear is that we keep making the situation incrementally worse, and then it gets incrementally harder for anyone to fix it because of odd edge case behavior. That's not a reason for you to change what you're doing in this patch right now, though -- just background on where I'm coming from.

clang-tools-extra/docs/clang-tidy/checks/modernize-macro-to-enum.rst
21–22

Maybe we should also list the binary ?: GNU extension and comma expressions?

This revision is now accepted and ready to land.May 9 2022, 4:22 AM
clang-tools-extra/docs/clang-tidy/checks/modernize-macro-to-enum.rst
21–22

Yes, I need to update the docs on that and also call out the potential false positives explicitly.

clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
100

I've already observed that run-clang-tidy.py can produce invalid fixits for header files, see bug 54885 and this discussion. I haven't yet concluded if it's a bug in the way I'm emitting fixits or a bug in the way clang-apply-replacements tries to de-duplicate fixits.

LegalizeAdulthood marked 3 inline comments as done.May 13 2022, 12:47 PM

Update documentation from review comments

This revision was landed with ongoing or failed builds.May 13 2022, 5:46 PM
This revision was automatically updated to reflect the committed changes.
aaronpuchert added inline comments.
clang-tools-extra/unittests/clang-tidy/ModernizeModuleTest.cpp
210–214

Seems to have caused a build failure:

FAILED: tools/clang/tools/extra/unittests/clang-tidy/CMakeFiles/ClangTidyTests.dir/ModernizeModuleTest.cpp.o 
/home/buildbots/clang.11.0.0/bin/clang++ --gcc-toolchain=/opt/rh/devtoolset-7/root/usr  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/tools/extra/unittests/clang-tidy -I/home/buildbots/docker-RHEL-buildbot/SetupBot/worker_env/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/clang-tools-extra/unittests/clang-tidy -I/home/buildbots/docker-RHEL-buildbot/SetupBot/worker_env/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/clang/include -Itools/clang/include -Iinclude -I/home/buildbots/docker-RHEL-buildbot/SetupBot/worker_env/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/include -I/home/buildbots/docker-RHEL-buildbot/SetupBot/worker_env/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/clang-tools-extra/clang-tidy -I/home/buildbots/docker-RHEL-buildbot/SetupBot/worker_env/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/utils/unittest/googletest/include -I/home/buildbots/docker-RHEL-buildbot/SetupBot/worker_env/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/llvm/utils/unittest/googlemock/include -fPIC -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG    -Wno-variadic-macros -Wno-gnu-zero-variadic-macro-arguments -fno-exceptions -fno-rtti -UNDEBUG -Wno-suggest-override -std=c++14 -MD -MT tools/clang/tools/extra/unittests/clang-tidy/CMakeFiles/ClangTidyTests.dir/ModernizeModuleTest.cpp.o -MF tools/clang/tools/extra/unittests/clang-tidy/CMakeFiles/ClangTidyTests.dir/ModernizeModuleTest.cpp.o.d -o tools/clang/tools/extra/unittests/clang-tidy/CMakeFiles/ClangTidyTests.dir/ModernizeModuleTest.cpp.o -c /home/buildbots/docker-RHEL-buildbot/SetupBot/worker_env/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/clang-tools-extra/unittests/clang-tidy/ModernizeModuleTest.cpp
/home/buildbots/docker-RHEL-buildbot/SetupBot/worker_env/ppc64le-clang-rhel-test/clang-ppc64le-rhel/llvm-project/clang-tools-extra/unittests/clang-tidy/ModernizeModuleTest.cpp:210:15: error: unused function 'operator<<' [-Werror,-Wunused-function]
std::ostream &operator<<(std::ostream &Str,
              ^
1 error generated.
clang-tools-extra/unittests/clang-tidy/ModernizeModuleTest.cpp
210–214

Simon Pilgrim fixed it, but I don't understand why clang calls this function unused. When the test fails, gtest uses this function to pretty print the parameter. I'm rebuilding with a forced test failure to validate.

clang-tools-extra/unittests/clang-tidy/ModernizeModuleTest.cpp
210–214

Yes, without this function the failing test prints results like this:

[ RUN      ] TokenExpressionParserTests/MatcherTest.MatchResult/123
D:\legalize\llvm\llvm-project\clang-tools-extra\unittests\clang-tidy\ModernizeModuleTest.cpp(200): error: Value of: matchText(GetParam().Text) == GetParam().Matched
  Actual: false
Expected: true
[  FAILED  ] TokenExpressionParserTests/MatcherTest.MatchResult/123, where GetParam() = 16-byte object <00-00 00-00 00-00 00-00 40-EC 3B-D6 F6-7F 00-00> (1 ms)

....which isn't particularly useful.

So how do we include pretty printers for tests without clang erroneously flagging them as unused?

aaronpuchert added inline comments.May 14 2022, 1:45 PM
clang-tools-extra/unittests/clang-tidy/ModernizeModuleTest.cpp
210–214

What got me wondering: this definition is last in the file, and there is no prior declaration of this function. How can there be any uses of it? We're not in class scope, so all prior uses of operator<< or perhaps PrintTo must have been resolved to some other function already. Or am I missing something?

LegalizeAdulthood marked an inline comment as done.May 14 2022, 1:47 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/unittests/clang-tidy/ModernizeModuleTest.cpp
210–214

It seems what other tests do is define a friend function in the parameter class. I'm going to push that and see if that is accepted.

LegalizeAdulthood marked an inline comment as done.May 14 2022, 1:48 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/unittests/clang-tidy/ModernizeModuleTest.cpp
210–214

Could have been an MSVC-ism, which is what I develop with. It was most definitely being used while I was working on the test cases.

LegalizeAdulthood marked an inline comment as done.May 14 2022, 5:28 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
205

So I was research the C standard for what it says are acceptable initializer values for an enum and it disallows the comma operator:

https://en.cppreference.com/w/c/language/enum

integer constant expression whose value is representable as a value of type int

https://en.cppreference.com/w/c/language/constant_expression

An integer constant expression is an expression that consists only of

  • operators other than assignment, increment, decrement, function-call, or comma, except that cast operators can only cast arithmetic types to integer types

So I'll have to reject initializing expressions that use the comma operator when processing C files.

aaron.ballman added inline comments.May 16 2022, 4:33 AM
clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp
205

So I'll have to reject initializing expressions that use the comma operator when processing C files.

Be careful when you do so: https://godbolt.org/z/dTMKv3a4v