This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add modernize-macro-to-enum check
ClosedPublic

Authored by LegalizeAdulthood on Jan 17 2022, 4:19 PM.

Details

Summary

This check performs basic analysis of macros and replaces them
with an anonymous unscoped enum. Using an unscoped anonymous enum
ensures that everywhere the macro token was used previously, the
enumerator name may be safely used.

Potential macros for replacement must meet the following constraints:

  • Macros must expand only to integral literal tokens.
  • Macros must be defined on sequential source file lines, or with only comment lines in between macro definitions.
  • Macros must all be defined in the same source file.
  • Macros must not be defined within a conditional compilation block.
  • Macros must not be defined adjacent to other preprocessor directives.
  • Macros must not be used in preprocessor conditions

Each cluster of macros meeting the above constraints is presumed to
be a set of values suitable for replacement by an anonymous enum.
From there, a developer can give the anonymous enum a name and
continue refactoring to a scoped enum if desired. Comments on the
same line as a macro definition or between subsequent macro definitions
are preserved in the output. No formatting is assumed in the provided
replacements.

Fixes #27408

Diff Detail

Event Timeline

LegalizeAdulthood requested review of this revision.Jan 17 2022, 4:19 PM

RFC: Should we create a cppcoreguidelines alias since this implements
"Enum.1: Prefer enumerations over macros"?

This check tries to be very conservative so as to not generate false
positives or invalid code.

njames93 added a comment.EditedJan 17 2022, 5:33 PM
This comment has been deleted.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
106
116

Make this a switch?

134

Is there any practical reason for these to be defined outline?

147–148

No need to create a std::string here, just keep it as a StringRef.

182

This ConditionScope checks looks like it would prevent warning in header files that use a header guard(instead of pragma once) to prevent multiple inclusion.

192–194

This seems a strange way to decect changes in file when you can just override the FileChanged callback.

212–218

Probably a syntax error there, but something like that would be much more readable.

clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.h
13

IWYU Violation

19

FIXME

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
7–15

These checks aren't needed as notes are implicitly ignored by the check_clang_tidy script.
Same goes for below

37–40

Not essential for this to go in, but it would be nice if we could detect the longest common prefix for all items in a group and potentially use that to name the enum. This would only be valid if the generated name is not currently a recognised token,

Eugene.Zelenko added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.h
19

Please do this :-)

clang-tools-extra/clang-tidy/modernize/ModernizeTidyModule.cpp
63

Please address this.

clang-tools-extra/docs/clang-tidy/checks/modernize-macro-to-enum.rst
6

Please make first statement same as in Release Notes. This check should be omitted.

45

It'll be reasonable to support indentation. Option could be used to specify string (tab or several spaces).

njames93 added inline comments.Jan 17 2022, 6:01 PM
clang-tools-extra/docs/clang-tidy/checks/modernize-macro-to-enum.rst
45

That's not actually necessary as clang-tidy runs clang-format on the fixes it generates

LegalizeAdulthood marked 13 inline comments as done.Jan 17 2022, 7:54 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
134

Is there any practical reason for it to be inline?

I don't like making large inline functions. It's my understanding that
the compiler may inline small "outlined" functions anyway, where it
has a better idea of what "small" means.

182

Oh, good catch, you're probably right. I'll test that manually.

(Gee, another case where we need check_clang_tidy.py to validate
changes to header files! This keeps coming up! My implementation
of this from several years ago died in review hell and was never born.)

192–194

I'll try that. Does it fire once at the beginning?

212–218

I was unaware of llvm::erase, presumably it's a range-like algo that
does the boiler plate of begin(), end(), std::remove_if and
std::vector<T>::erase?

Looks like I need to switch to llvm::SmallVector for that to work.
I'll try that.

clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.h
13

Good catch. This is a leftover when I had more methods declared
on the Check class.

19

Oops :)

clang-tools-extra/clang-tidy/modernize/ModernizeTidyModule.cpp
63

Yep, I didn't clang-format this file after add_new_check ran.

clang-tools-extra/docs/clang-tidy/checks/modernize-macro-to-enum.rst
45

It can optionally do this, but it is off by default.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
37–40

Yeah, for this example that would yield a reasonable enum
name of CoordMode, but many macros have acronym prefixes
and they wouldn't yield useful names, e.g. WM_COMMAND
would get a name like WM which isn't particularly useful and
may cause other problems.

IMO, introducing names should be a step that's invoked
manually by the developer.

I intend to write another check that migrates a named, unscoped
enum to a scoped enum. Once everything has been lifted to an
enum from macros, then I can use matchers to locate the enumerators
and perform usage checks to make sure that lifting the enumerator
to a scoped identifier doesn't create invalid code. At such time, the
common prefix of the enumerators would be stripped to be replaced
with the scoped name.

LegalizeAdulthood marked 7 inline comments as done.Jan 17 2022, 8:02 PM
LegalizeAdulthood marked an inline comment as done.Jan 17 2022, 8:33 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
192–194

Tests continue to pass after switching to FileChanged, so looks good.
I'll double check in the debugger.

Addresses most of the review comments, still need to:

  • Verify that include guards don't interfere with analysis of headers
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
182

Any ideas for an algorithm that detects a header guard condition
from some other condition? I don't see anything obvious.

clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
182

So I created a little state machine and that covers my test cases,
but I worry that it might be too fragile, so I'm open to suggestions
for improvement on my state machine algorithm and/or test cases
that could break the algorithm.

  • Handle macros inside header files with include guards
  • Track state per-file and push/pop that state as files are included
LegalizeAdulthood marked an inline comment as done.Jan 19 2022, 7:54 AM
  • clang-format
  • prefer llvm::any_of over std::any_of
  • Tweak documentation, make sure sphinx runs without errors

Testing on iterated dynamics revealed a number of shortcomings;
update the code to address those:

  • Improve handling of include guards
  • Handle conditional compilation blocks better
  • Reject macros that expand to floating point literals
  • Start a new enum when macros have intervening text

Switch back to SmallVector after debugging with std::vector
Drop unnecessary includes

Update from additional testing:

  • reject floating-point literals (they can't be enumerations)
  • add more test cases
clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
2

Ah, this leaked out from incremental testing I was doing;
it doesn't actually require C++14 or later (although it did
at one time while I was working on rejecting floating-point
literals.)

Remove -std for test

LegalizeAdulthood added a comment.EditedJan 30 2022, 11:10 PM

This situation isn't properly diagnosed (false positive):

#ifdef USE_FOO
#if USE_FOO
// code
#endif
#endif

Eh, never mind this comment it's unrelated to this check :)

Ping.

This check has gone for a week without review comments.

Since this is a modernize check, shouldn't it be proposing to use enum class instead of enum? This will conflict with Enum.3 from cppcoreguidelines, and I don't think it's unreasonable that people enable both modernize* and cppcoreguidelines*. Not sure if such a check for Enum.3 exists today but it could easily be added in the future.

I understand that the fix perhaps cannot be automated just as easily, but maybe it's OK? Or at least make the default use enum class and leave it as an option to use unscoped enums to be able to apply the fix. Still I find it a bit inconsistent with the rest of the modernize module (modernize as in "use post-C++11 stuff").

Another point is that macros typically have a different naming convention than enums or constants, so users will likely have to do manual work anyway, or rely on the "readability-identifier-naming" check - how does it play out then if 2 checks propose different fixes?

I think it helps to understand this check better if we consider what
we would do manually if we were modernizing code with a lot of
macro-style enums in it. I've done this myself; in fact all the checks
I've written for clang-tidy have been motivated by modernizing dusty
deck C code that was converted to C++.

The motivating example is on github and you can find my manual
refactorings in the commit history.

A simple behavior preserving first pass is to hoist all integral constant
macros to anonymous unscoped enums. A second pass is to examine
related anonymous unscoped enums and hoist them into named unscoped
enums. If they are used only as discriminating values, e.g. not used in
arithmetic expressions, then they can be hoisted to a named scoped
enum and you can "lean on the compiler" to adjust the relevant
declarations of variables and function parameters.

This check implements the first pass. My intention is to create another
check that assists in the "hoist a named unscoped enum to a named
scoped enum" which a) removes any common enumeration prefix from
all the enumerations (since they are now uniquely identified by scope),
b) adds the necessary scope qualifiers to all uses of the enumerations,
c) propagates the necessary declaration changes to all variables and
parameters. Obviously such a check involves quite a bit of analysis
of all the usages of an enumeration and may not even be feasible in
the context of clang-tidy since it can only examine a single translation
unit at a time, but this is the goal.

Note that this still leaves humans in the middle deciding which clumps
of anonymous unscoped enums represent semantically related items
and giving them an appropriate name.

Since this is a modernize check, shouldn't it be proposing to use enum class instead of enum?

One step at a time :). In short, the answer is no because of the
way preprocessor macros interact with the rest of the code. While
it is safe to hoist the macro into an unscoped enum without analyzing
all the macro expansion sites, the same can't be said for a scoped
enum.

First, we would have to determine a name for the scoped enum. This
may not be possible, depending on the macros. For instance some
macros are really just named constants, not enumerations, but we
can't tell from the macro itself. (There are heuristics you can apply,
and I have done some of that in this check, but the only way to know
for certain whether a macro is a named constant or intended to be
a set of related names is manual inspection.)

Second, this check can apply to C source files as well as C++ source
files. C doesn't have scoped enums, but it does have enums and they
can be anonymous.

This will conflict with Enum.3 from cppcoreguidelines,

You're correct that both this check and cppcoreguidelines-macro-usage
will make suggestions for the same macro. Again, the problem with
a macro is that it's hard to identify which is an enum and which is a
named constant.

If I have a block of macros that represent enumerations, I'm not going
to be happy with cppcoreguidelines-macro-usage suggesting that
they all be turned into constexpr declarations.

If I have a block of macros that represent named constants, I'm not
going to be happy with modernize-macro-to-enum suggesting that
they all be turned into anonymous enum declarations.

I don't think there's any clear winner over which should be chosen by
an automated tool.

If it were me, I'd apply modernize-macro-to-enum first and then
cppcoreguidelines-macro-usage. Why? Because I'd then have enums
to scoop up all the integral constants, which is a more specific language
construct, and then I'd have constexpr constants for the macros that
expand to floating-point and string values. I'd still be left with function-
like macros that macro-usage warned me about and didn't fix.

Or at least make the default use enum class and leave it as an
option to use unscoped enums to be able to apply the fix.

An additional problem with scoped enums, besides having to invent
a name for it, is that you have to analyze all the expansion sites of the
defined macros in order to determine if the macro expands into a value
that is used in an expression.

Consider this code:

#define FOO_ARG1 0x1
#define FOO_ARG2 0x2
#define FOO_ARG3 0x3

void f()
{
  int options = FOO_ARG1 | FOO_ARG3;
  g(options);
}

If I replace the macros with a scoped enum, now I've created a compilation
error at the line initializing options and I've created a compilation error
at the line calling g(). An anonymous unscoped enum doesn't suffer from
these problems.

Still I find it a bit inconsistent with the rest of the modernize module
(modernize as in "use post-C++11 stuff").

There is a lot of legacy C-isms in C++ code and not all modernization
transformations imply a shift to C++11. For instance, another modernizing
checker that I wrote -- modernize-redundant-void-arg -- applies to C++98
code.

Another point is that macros typically have a different naming convention
than enums or constants, so users will likely have to do manual work anyway,
or rely on the "readability-identifier-naming" check - how does it play out
then if 2 checks propose different fixes?

The difficulty of naming is why this check doesn't propose names. They have
to be chosen by humans. The readability check imposes a naming convention
on the identifiers but it doesn't impute semantics to those identifiers, it simply
checks that they follow a certain lexicographical convention.

Oh, I think I misinterpreted what you were saying in the last bit:

Another point is that macros typically have a different naming
convention than enums or constants, so users will likely have
to do manual work anyway, or rely on the
"readability-identifier-naming" check - how does it play out
then if 2 checks propose different fixes?

All the checks look at the original version of the source file, so if
you run modernize-macro-to-enum and readability-identifier-naming
checks at the same time, they won't conflict with each other.
You're correct that once a macro has been hoisted to an
anonymous enum, running the identifier naming check on the
transformed result might then complain about the names being
stylized as a macro and not as an enum. Since the identifier
naming check provides fix-its, I don't see this as a problem.
If you want the was-a-macro-now-is-an-enum name to follow
a different identifier naming convention, it makes sense to run
the one check after the other in order to enforce this. IMO, it's
better to have small, discrete, well-defined transformation steps
than it is to have a "kitchen sink" transformation that attempts to
do too much.

However, I've seen many coding style guidelines (current workplace
is one example) that explicitly say to name enumerations with the
same identifier convention as are used for macros, so this is style
guide specific.

This will conflict with Enum.3 from cppcoreguidelines

I went back and looked at Enum.3 Prefer class enums over “plain” enums,
and implementing that is the second-pass check I discussed earlier.

This check is more about implementing Enum.1 Prefer enumerations over macros.

This brings up the question of whether or not there should be a cppguidelines alias
for this check.

Ok, thanks for the explanation! I'm mostly interested on the warning message, we've had situations before where the warning describes the problem and the solution, which can easily lead to confusion. From the tests I can see the message is quite generic "use an enum", so it won't push users to prefer one variant over the other.

I'd like to play a bit with the patch and see what pops in our codebase but somehow I get an error when downloading, do you happen to know what could be wrong? Alternatively if there's any other easy way to checkout the patch and test it :)

$ arc patch D117522
 Exception 
preg_match(): Passing null to parameter #2 ($subject) of type string is deprecated
(Run with `--trace` for a full exception trace.)

This check is more about implementing Enum.1 Prefer enumerations over macros.

Should a cppcoreguidelines alias be added in that case?

I'm mostly interested on the warning message, [...]

I'm open to changes in the wording of the warning message.

I get an error when downloading, do you happen to know what could be wrong?

I haven't used arcanist, I've created the review by uploading the diff from git.

You should be able to download the raw diff and then use git apply to apply
the diff to your working tree.

I tested with a clone of [[ https://github.com/LegalizeAdulthood/iterated-dynamics | iterated dynamics ]]from github.
That code originates as multi-decade old C code which I've incrementally modernized
and has lots of integral constant macros in it. That led me to some improvements over
the first version of this that I put up for review.

This check is more about implementing Enum.1 Prefer enumerations over macros.

Should a cppcoreguidelines alias be added in that case?

Yeah, I think I'm going to add that.

Mmm... just realized that this check should invalidate any macro
that is used in a conditional compilation expression, because enums
can't be used there.

  • Rejects macros that are used in preprocessor conditionals
  • cppcoreguidelines-macro-to-enum is an alias for modernize-macro-to-enum
  • Document alias
  • Move small methods to class decl
LegalizeAdulthood edited the summary of this revision. (Show Details)Feb 12 2022, 9:24 PM

Ping. Waiting for review feedback for about a week now.

I tried to download the diff and apply it from the root of llvm-project, but I must be doing something wrong...

$ git apply D117522.diff
D117522.diff:808: trailing whitespace.
                                - attempt to free an illegal 
D117522.diff:809: trailing whitespace.
                                  cmap entry 
D117522.diff:810: trailing whitespace.
                                - attempt to store into a read-only 
D117522.diff:824: trailing whitespace.
// CHECK-FIXES-NEXT:                                 - attempt to free an illegal 
D117522.diff:825: trailing whitespace.
// CHECK-FIXES-NEXT:                                   cmap entry 
error: clang-tidy/modernize/CMakeLists.txt: No such file or directory
error: clang-tidy/modernize/ModernizeTidyModule.cpp: No such file or directory
error: docs/ReleaseNotes.rst: No such file or directory
error: docs/clang-tidy/checks/list.rst: No such file or directory
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
383

Maybe wrap the raw strings inside a StringRef for a more robust and readable comparison? Not a big fan of the magic 6 there :)

carlosgalvezp added inline comments.Feb 20 2022, 3:03 AM
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
390

Same here

clang-tools-extra/docs/clang-tidy/checks/cppcoreguidelines-macro-to-enum.rst
7–8 ↗(On Diff #408235)

Would it make sense to mention this check covers the rule Enum.1?

I see that we follow a standard pattern for aliases where we simply redirect to the main check, but maybe it's good to point this out anyway? Otherwise it might be unclear exactly which rule this check is referring to.

One drawback though is that aliases redirect to the main check after 5 seconds...

clang-tools-extra/docs/clang-tidy/checks/modernize-macro-to-enum.rst
10–14

Oh, it's right here :) I suppose as a user I would expect to find this info in the cppcoreguidelines doc, not here. But again I don't know what the de-facto pattern is with aliases so I'll leave that to someone that knows better.

20

Hmm, what about this situation?

// This is some macro
#define FOO 123
// This is some other unrelated macro
#define BAR 321

Would the check group these 2 together? Personally I'd put an empty line between the two to show they are unrelated, but I know many people prefer to save vertical space and keep everything tight without empty lines.

LegalizeAdulthood marked 5 inline comments as done.Feb 25 2022, 4:25 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
383

OK, I've added some intention revealing functions that I think clean this up. I'm testing now and will upload a diff when the tests pass.

clang-tools-extra/docs/clang-tidy/checks/modernize-macro-to-enum.rst
10–14

readability-magic-numbers is similar; the alias simply links to the readability check and the readability check cites the C++ Core Guideline with a link.

20

They're "grouped" into the same anonymous enum. The basic heuristic is that blank lines separate groups of related identifiers, not comment lines. This is the most common pattern in header files with macros that I've observed for decades.

LegalizeAdulthood marked 3 inline comments as done.
  • Add intention revealing functions for details of pragma once parsing

Ping. Another week waiting for reviews...

Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 9:23 AM

Ping. Another week waiting for reviews...

Thanks for the ping! FWIW, it's also not uncommon for there to be week delays (reviewers go on vacation, have other obligations, etc), so hopefully the delays are not too frustrating. We do our best to review in a timely manner.

Overall, I think this is a really neat new check. One thing I think we should do is entirely ignore macros that are defined in system headers. We don't diagnose in system headers by default, but there's no reason to even do the processing work within a system header because those macros can't be changed (not only can the user not change them because it's a system header, but it's also likely that the macro is required for standards conformance). I think we can get some good compile-time performance wins from bailing on system headers, but this is speculative.

clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

I'm a bit confused by this one as this is not a valid line ending (it's either three valid line endings or two valid line endings, depending on how you look at it). Can you explain why this is needed?

84–85

Won't this cause a problem for hex literals like 0xE?

107

Maybe not worth worrying about, but should this be a uint64_t to handle massive source files?

316

We don't seem to have any tests for literal suffixes and I *think* those will show up here as additional tokens after the literal. e.g., #define FOO +1ULL, so I think the size check here may be a problem.

319

It's worth pointing out that both of these are expressions that operate on a literal, and if we're going to allow expressions that operator on a literal, why only these two? e.g. why not allow #define FOO ~0U or #define BAR FOO + 1? Someday (not today), it would be nice for this to work on any expression that's a valid constant expression.

419

Diagnostics in clang-tidy don't start with a capital letter or have trailing punctuation.

420

I *think* you don't need to call getName() here because the diagnostics engine already knows how to handle an IdentifierInfo * (but I could be remembering wrong)

429
clang-tools-extra/clang-tidy/modernize/ModernizeTidyModule.cpp
89–93

Unrelated formatting changes? (Feel free to land separately)

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

What about an #undef that's not adjacent to any macros? e.g.,

#define FOO 1
#define BAR 2
#define BAZ 3

int i = 12;

#if defined(FROBBLE)
#undef FOO
#endif

I'm worried that perhaps other code elsewhere will be checking defined(FOO) perhaps in cases conditionally compiled away, and switching FOO to be an enum constant will break other configurations. To be honest, I'm a bit worried about that for all of the transformations here... and I don't know a good way to address that aside from "don't use the check". It'd be interesting to know what kind of false positive rate we have for the fixes if we ran it over a large corpus of code.

LegalizeAdulthood marked 2 inline comments as done.Mar 15 2022, 9:46 AM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

It's a state machine, where the states are named for what we've seen so far and we're looking for two consecutive line endings, not just one. Does it make sense now?

84–85

Good catch, I'll add a test.

107

I use the type returned by getSpellingLineNumber.

316

On an earlier iteration I had some tests for literals with suffixes and the suffix is included in the literal token. I can add some test cases just to make it clear what the behavior is when they are encountered.

319

A later enhancement can generalize to literal expressions (which are valid initializers for an enumeration), but I wanted to cover the most common case of simple negative integers in this first pass.

clang-tools-extra/clang-tidy/modernize/ModernizeTidyModule.cpp
89–93

Probably, I just routinely clang-format the whole file instead of just isolated changes.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

Yeah, the problem arises whenever you make any changes to a header file. Did you also change all translation units that include the header? What about conditionally compiled code that was "off" in the translation unit for the automated change? Currently, we don't have a way of analyzing a group of translation units together for a cohesive change, nor do we have any way of inspecting more deeply into conditionally compiled code. Addressing those concerns is beyond the scope of this check (or any clang-tidy check) as it involves improvements to the entire infrastructure.

However, I think it is worth noting in the documentation about possible caveats. I think the way clang-tidy avoids this problem now is that you have to request fixes and the default mode is to issue warnings and leave it up to the reader as to whether or not they should apply the fixes.

I believe I already have logic to disqualify any cluster of macros where any one of them are used in a preprocessor condition (that was the last functional change I made to this check). Looks like I need to extend that slightly to include checking for macros that are #undef'ed.

LegalizeAdulthood marked 3 inline comments as done.Mar 16 2022, 6:19 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

OK, looks like I was already handling this, LOL. See line 135

// Undefining an enum-like macro results in the enum set being dropped.
LegalizeAdulthood marked an inline comment as done.Mar 16 2022, 6:19 PM
LegalizeAdulthood marked 5 inline comments as done.Mar 16 2022, 9:26 PM
LegalizeAdulthood marked an inline comment as done.
aaron.ballman added inline comments.Mar 17 2022, 6:11 AM
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

Thanks, I understood it was a state machine, but it's a confused one to me. \r was the line ending on Mac Classic, I've not seen it used outside of that platform (and I've not seen anyone write code for that platform in a long time). So, to me, the only valid combinations of line endings to worry about are: LF LF; CRLF CRLF; CRLF LF; LF CRLF.

LF LF returns false (Nothing -> LF -> return false)
CRLF CRLF returns false (Nothing -> CR -> CRLF -> CRLFCR -> return false)
CRLF LF returns true (Nothing -> CR -> CRLF -> LF -> finish loop)
LF CRLF returns true (Nothing -> LF -> CR -> CRLF -> finish loop)

(If you also intend to support Mac Classic line endings for some reason, this gets even more complicated.)

107

Sounds like a good enough reason to me, thanks!

316

Thanks, I'd appreciate adding the tests just to be sure we handle the case properly.

319

I'm less worried about the arbitrary constant expressions than I am about not supporting ~ because ~0U is not uncommon in macros as a way to set all bits to 1. It's certainly more common than seeing a unary +, at least in my experience. However, an even more important use case that I should have thought of first is surrounding the value in parens (which is another common idiom with macros). e.g, #define ONE (1)

Some examples of this in the wild (search the files for ~0U):

https://codesearch.isocpp.org/actcd19/main/l/linux/linux_4.15.17-1/drivers/gpu/drm/i915/gvt/handlers.c
https://codesearch.isocpp.org/actcd19/main/w/wine/wine_4.0-1/dlls/d3d8/d3d8_private.h
https://codesearch.isocpp.org/actcd19/main/q/qtwebengine-opensource-src/qtwebengine-opensource-src_5.11.3+dfsg-2/src/3rdparty/chromium/third_party/vulkan/include/vulkan/vulkan.h

(There's plenty of other examples to be found on the same site.)

I'm fine not completing the set in the initial patch, but the current behavior is a bit confusing (+ is almost of negligible importance). I think ~ and parens need to be supported; I'd prefer in this patch, but I'm fine if it comes in a subsequent patch so long as those two are supported before we release.

clang-tools-extra/clang-tidy/modernize/ModernizeTidyModule.cpp
89–93

Please don't clang-format the whole file (it makes code reviews more difficult; we document this request a bit in https://llvm.org/docs/CodingStandards.html#introduction), there's a script for formatting just the isolated changes: https://clang.llvm.org/docs/ClangFormat.html#script-for-patch-reformatting that I've found works really well in most cases.

clang-tools-extra/docs/clang-tidy/checks/modernize-macro-to-enum.rst
45

+1 to it not being necessary, there's a command line option to reformat fixes ( --format-style=<string>).

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

Yeah, you already have the code for handling this somewhat (that's one of the reasons why I brought this particular use case up). My greater concern is: how many false positives does this check generate on real world code? Documentation may help alleviate those concerns well enough, but if the false positive rate is sufficiently high that you basically have to disable this check for real world code, we need to do better. I don't fully trust my intuition on this one because preprocessor code in the real world has 40+ years worth of accumulated oddities, so having some actual measurements against real world code would be very informative.

I think I've got all the changes incorporated, but I'm getting a test failure so I haven't uploaded a new diff.

Honestly, I don't understand the test failure output. On Windows, the test failure output is all mangled and
difficult to interpret correctly as to what exactly is the problem, so I'm still trying to figure it out.

Improving the readability of the test failures is one of the things I would like to address in a future change.
I suspect it's only a problem on Windows as it seems most of the clang-tidy devs are using unix?

I think I've got all the changes incorporated, but I'm getting a test failure so I haven't uploaded a new diff.

Honestly, I don't understand the test failure output. On Windows, the test failure output is all mangled and
difficult to interpret correctly as to what exactly is the problem, so I'm still trying to figure it out.

If you're able to post the output you're getting, I can try to help psychically debug it.

Improving the readability of the test failures is one of the things I would like to address in a future change.
I suspect it's only a problem on Windows as it seems most of the clang-tidy devs are using unix?

FWIW, I use Windows almost exclusively. Also, clang-tidy is frequently integrated into people's CI pipelines so even folks on *nix can be impacted by the behavior on Windows in those cases. That said, I have no idea what the test failures look like or how likely they are to be hit, so it may be reasonable to improve in a follow-up.

I've worked through this issue before, I just need to spend some time on it.

The basic problem is that the test fails and dumps a bunch of output to "help" you
understand the failure, but the way it is formatted and mangled doesn't end up
being useful.

LegalizeAdulthood marked 2 inline comments as done.Mar 18 2022, 10:09 AM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
316

I added tests in my latest diff, which I'll upload now even though the tests are failing. (I'll upload again after I've got the test failure figured out.)

319

The difficulty in supporting more complex expressions is that we have NO AST support here and it involves manually matching tokens in the macro definition.

However, your point about ~ is well taken and that's easy to add based on what I've got here. I thought it important to handle negative literals, so I added support for unary -. I added support for unary + out of symmetry.

clang-tools-extra/clang-tidy/modernize/ModernizeTidyModule.cpp
89–93

Most of the time it doesn't result in any new diffs besides those I'm adding myself. This is particularly true because most of the time I'm contributing new checks (which means whole new files) or fixing bugs on my existing checks (which have already had the whole file clang-format'ed).

I was unaware of the script, I'll take a look at that, thanks.

Should we cross link to the docs for this script in the "contributing" docs for clang-tidy?

LegalizeAdulthood marked 2 inline comments as done.
  • Update from review comments
  • check-clang-tools is reporting a test failure that still needs to be diagnosed (I think it is a mismatch between a CHECK-MESSAGES line and the exact output)
LegalizeAdulthood marked 3 inline comments as done.Mar 18 2022, 10:20 AM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

I was trying to follow "be liberal in what you accept as input and conservative in what you generate as output" maxim. I can remove the CR as a line ending case if you think it's too obscure.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

In the latest diff, I added a test case to make it clear that even if you #undef a macro later in the file, it invalidates all the surrounding macros in the cluster containing the undef'ed macro.

I'm open to suggestions for code bases on which to run this. Since this was motivated by my modernization of the fractint code base, I will run it on an old version of the code from my repo before I had manually converted the defines to enums.

LLVM isn't a good test case here, because LLVM doesn't use macros as enums :)

For this scenario (a macro that is later undef'ed), I don't believe I will emit any false positives.

LegalizeAdulthood marked 2 inline comments as done.Mar 18 2022, 10:21 AM

I think I've got all the changes incorporated, but I'm getting a test failure so I haven't uploaded a new diff.

If you're able to post the output you're getting, I can try to help psychically debug it.

I finally figured it out! I previously had written:

void MacroToEnumCallbacks::warnMacroEnum(const EnumMacro &Macro) const {
  Check->diag(Macro.Directive->getLocation(),
              "macro '%0' defines an integral constant; prefer an enum instead")
      << Macro.Name.getIdentifierInfo()->getName();
}

and you suggested that I could drop the ->getName() as it had an insertion operator
for identifier info. Well, that was true, but what I didn't realize is that this would add an
extra single quotes around the identifier, so my diagnostic output now had doubled-up
single quotes everywhere and I couldn't figure out what was doing this as I couldn't find
doubled up single quotes in my format string and I was suspecting the python script
was somehow treating single quotes special somewhere.

Once I accounted for this, everything passes, so I'll be uploading a new diff shortly.

  • Tests pass again
  • Recognize bitwise negated integers
LegalizeAdulthood marked an inline comment as done.Mar 18 2022, 4:03 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
319

I've added support for bitwise negated integers. I didn't go further and try to recognize parenthesized literals (this just seems dumb, anyway... the extra parentheses do nothing and aren't ever needed).

  • Undo change introduced by clang-format
aaron.ballman added inline comments.Mar 22 2022, 9:20 AM
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

If Clang supports it as a line ending, we probably should too, but... how do we handle CRLF vs "I mixed a CR with an LF by accident" kind of inputs? (Maybe we just treat that as CRLF and if the behavior is bad, the user shouldn't mix their line endings that way; I think that's defensible.) That seems to be similar to the scenario that's confusing me above where the user mixed an LF and CRLF by accident.

319

Thanks for adding the ~ support! I think we'll want parens supported as well because there's very common guidance to parenthesize EVERYTHING in the macro expansion list, but at the same time, I think that can be done once we get a bug report from a user.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

In the latest diff, I added a test case to make it clear that even if you #undef a macro later in the file, it invalidates all the surrounding macros in the cluster containing the undef'ed macro.

Oh, thank you! I had the impression we only cared if it was undefed within the same "run" we used to determine the macro.

I'm open to suggestions for code bases on which to run this.

I'd recommend some FreeBSD or Linux packages (or the whole distro if that's easy enough), as those tend to have a fair number of surprises. If that's harder and you'd rather just do one big project, I'd recommend something openssl, sqlite3, maybe postgres for C projects likely to make a fair amount of use of macros (from what I recall of them anyway).

clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

Well, as far as Clang is concerned it's all just "whitespace" that gets eaten up by the preprocessor. Actually, that gives me a thought. A preprocessing directive is considered to end at the physical line ending, so I should look to see what sort of characters it considers to "end the line".

For the accidental mix-up, I'm not going to worry about that here. Your input files are assumed to be "well formed". The worst that happens in this check is that two blocks of macros that look like they are separated by a blank line are considered as a single clump by this check.

In other words, the worst that can happen is:

  • Two clumps of macros are considered together.
  • One clump of macros that is discarded because it doesn't follow the constraints "taints" an adjacent clump of macros that do follow the constraints.

Either way, nothing harmful happens to your code. It will still compile and be syntactically and semantically equivalent to what was there before.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

Those are all good suggestions, I'll see what I can do.

aaron.ballman added inline comments.Mar 24 2022, 5:42 AM
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

Actually, that gives me a thought. A preprocessing directive is considered to end at the physical line ending, so I should look to see what sort of characters it considers to "end the line".

All of \r, \n, \r\n I believe (you can double-check in Lexer::LexTokenInternal()

Either way, nothing harmful happens to your code. It will still compile and be syntactically and semantically equivalent to what was there before.

Oh, that's a very good point, thank you. I think that's reasonable fallback behavior for these weird edge cases.

LegalizeAdulthood marked 3 inline comments as done.Mar 24 2022, 7:53 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

Well..... maybe.

If you look at Lexer::ReadToEndOfLine which is used to skip to the end of a preprocessor directive you'll see that it considers the first of '\r', '\n' or '\0' (end of file) as the end of the "line". This is around line 2835 of Lexer.cpp in my tree.

LegalizeAdulthood marked an inline comment as done.
  • clang-format
aaron.ballman accepted this revision.Mar 25 2022, 6:12 AM

Thanks for the discussion on this new check, it LGTM!

clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

Yeah, I saw that as well (that's typically used for error recovery in the preprocessor). We also have Lexer::SkipWhitespace() which skips all vertical whitespace, but not \0.

This revision is now accepted and ready to land.Mar 25 2022, 6:12 AM
This revision was automatically updated to reflect the committed changes.
RKSimon added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
217

@LegalizeAdulthood I'm seeing errors in https://lab.llvm.org/buildbot/#/builders/93/builds/7956

/home/buildbots/ppc64be-clang-multistage-test/clang-ppc64be-multistage/llvm/clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp:216:22: error: declaration of ‘const clang::LangOptions& clang::tidy::modernize::{anonymous}::MacroToEnumCallbacks::LangOptions’ changes meaning of ‘LangOptions’ [-fpermissive]
  216 |   const LangOptions &LangOptions;
dyung added a subscriber: dyung.Mar 25 2022, 11:55 AM

I've reverted this change in order to get the build bots green again.