Page MenuHomePhabricator

[clang-tidy] Add modernize-macro-to-enum check
ClosedPublic

Authored by LegalizeAdulthood on Jan 17 2022, 4:19 PM.

Details

Summary

This check performs basic analysis of macros and replaces them
with an anonymous unscoped enum. Using an unscoped anonymous enum
ensures that everywhere the macro token was used previously, the
enumerator name may be safely used.

Potential macros for replacement must meet the following constraints:

  • Macros must expand only to integral literal tokens.
  • Macros must be defined on sequential source file lines, or with only comment lines in between macro definitions.
  • Macros must all be defined in the same source file.
  • Macros must not be defined within a conditional compilation block.
  • Macros must not be defined adjacent to other preprocessor directives.
  • Macros must not be used in preprocessor conditions

Each cluster of macros meeting the above constraints is presumed to
be a set of values suitable for replacement by an anonymous enum.
From there, a developer can give the anonymous enum a name and
continue refactoring to a scoped enum if desired. Comments on the
same line as a macro definition or between subsequent macro definitions
are preserved in the output. No formatting is assumed in the provided
replacements.

Fixes #27408

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
carlosgalvezp added inline comments.Feb 20 2022, 3:03 AM
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
383

Maybe wrap the raw strings inside a StringRef for a more robust and readable comparison? Not a big fan of the magic 6 there :)

390

Same here

clang-tools-extra/docs/clang-tidy/checks/cppcoreguidelines-macro-to-enum.rst
8–9

Would it make sense to mention this check covers the rule Enum.1?

I see that we follow a standard pattern for aliases where we simply redirect to the main check, but maybe it's good to point this out anyway? Otherwise it might be unclear exactly which rule this check is referring to.

One drawback though is that aliases redirect to the main check after 5 seconds...

clang-tools-extra/docs/clang-tidy/checks/modernize-macro-to-enum.rst
10–14

Oh, it's right here :) I suppose as a user I would expect to find this info in the cppcoreguidelines doc, not here. But again I don't know what the de-facto pattern is with aliases so I'll leave that to someone that knows better.

20

Hmm, what about this situation?

// This is some macro
#define FOO 123
// This is some other unrelated macro
#define BAR 321

Would the check group these 2 together? Personally I'd put an empty line between the two to show they are unrelated, but I know many people prefer to save vertical space and keep everything tight without empty lines.

LegalizeAdulthood marked 5 inline comments as done.Feb 25 2022, 4:25 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
383

OK, I've added some intention revealing functions that I think clean this up. I'm testing now and will upload a diff when the tests pass.

clang-tools-extra/docs/clang-tidy/checks/modernize-macro-to-enum.rst
10–14

readability-magic-numbers is similar; the alias simply links to the readability check and the readability check cites the C++ Core Guideline with a link.

20

They're "grouped" into the same anonymous enum. The basic heuristic is that blank lines separate groups of related identifiers, not comment lines. This is the most common pattern in header files with macros that I've observed for decades.

LegalizeAdulthood marked 3 inline comments as done.
  • Add intention revealing functions for details of pragma once parsing

Ping. Another week waiting for reviews...

Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 9:23 AM

Ping. Another week waiting for reviews...

Thanks for the ping! FWIW, it's also not uncommon for there to be week delays (reviewers go on vacation, have other obligations, etc), so hopefully the delays are not too frustrating. We do our best to review in a timely manner.

Overall, I think this is a really neat new check. One thing I think we should do is entirely ignore macros that are defined in system headers. We don't diagnose in system headers by default, but there's no reason to even do the processing work within a system header because those macros can't be changed (not only can the user not change them because it's a system header, but it's also likely that the macro is required for standards conformance). I think we can get some good compile-time performance wins from bailing on system headers, but this is speculative.

clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

I'm a bit confused by this one as this is not a valid line ending (it's either three valid line endings or two valid line endings, depending on how you look at it). Can you explain why this is needed?

84–85

Won't this cause a problem for hex literals like 0xE?

107

Maybe not worth worrying about, but should this be a uint64_t to handle massive source files?

316

We don't seem to have any tests for literal suffixes and I *think* those will show up here as additional tokens after the literal. e.g., #define FOO +1ULL, so I think the size check here may be a problem.

319

It's worth pointing out that both of these are expressions that operate on a literal, and if we're going to allow expressions that operator on a literal, why only these two? e.g. why not allow #define FOO ~0U or #define BAR FOO + 1? Someday (not today), it would be nice for this to work on any expression that's a valid constant expression.

419

Diagnostics in clang-tidy don't start with a capital letter or have trailing punctuation.

420

I *think* you don't need to call getName() here because the diagnostics engine already knows how to handle an IdentifierInfo * (but I could be remembering wrong)

429
clang-tools-extra/clang-tidy/modernize/ModernizeTidyModule.cpp
88–92

Unrelated formatting changes? (Feel free to land separately)

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

What about an #undef that's not adjacent to any macros? e.g.,

#define FOO 1
#define BAR 2
#define BAZ 3

int i = 12;

#if defined(FROBBLE)
#undef FOO
#endif

I'm worried that perhaps other code elsewhere will be checking defined(FOO) perhaps in cases conditionally compiled away, and switching FOO to be an enum constant will break other configurations. To be honest, I'm a bit worried about that for all of the transformations here... and I don't know a good way to address that aside from "don't use the check". It'd be interesting to know what kind of false positive rate we have for the fixes if we ran it over a large corpus of code.

LegalizeAdulthood marked 2 inline comments as done.Mar 15 2022, 9:46 AM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

It's a state machine, where the states are named for what we've seen so far and we're looking for two consecutive line endings, not just one. Does it make sense now?

84–85

Good catch, I'll add a test.

107

I use the type returned by getSpellingLineNumber.

316

On an earlier iteration I had some tests for literals with suffixes and the suffix is included in the literal token. I can add some test cases just to make it clear what the behavior is when they are encountered.

319

A later enhancement can generalize to literal expressions (which are valid initializers for an enumeration), but I wanted to cover the most common case of simple negative integers in this first pass.

clang-tools-extra/clang-tidy/modernize/ModernizeTidyModule.cpp
88–92

Probably, I just routinely clang-format the whole file instead of just isolated changes.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

Yeah, the problem arises whenever you make any changes to a header file. Did you also change all translation units that include the header? What about conditionally compiled code that was "off" in the translation unit for the automated change? Currently, we don't have a way of analyzing a group of translation units together for a cohesive change, nor do we have any way of inspecting more deeply into conditionally compiled code. Addressing those concerns is beyond the scope of this check (or any clang-tidy check) as it involves improvements to the entire infrastructure.

However, I think it is worth noting in the documentation about possible caveats. I think the way clang-tidy avoids this problem now is that you have to request fixes and the default mode is to issue warnings and leave it up to the reader as to whether or not they should apply the fixes.

I believe I already have logic to disqualify any cluster of macros where any one of them are used in a preprocessor condition (that was the last functional change I made to this check). Looks like I need to extend that slightly to include checking for macros that are #undef'ed.

LegalizeAdulthood marked 3 inline comments as done.Mar 16 2022, 6:19 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

OK, looks like I was already handling this, LOL. See line 135

// Undefining an enum-like macro results in the enum set being dropped.
LegalizeAdulthood marked an inline comment as done.Mar 16 2022, 6:19 PM
LegalizeAdulthood marked 5 inline comments as done.Mar 16 2022, 9:26 PM
LegalizeAdulthood marked an inline comment as done.
aaron.ballman added inline comments.Mar 17 2022, 6:11 AM
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

Thanks, I understood it was a state machine, but it's a confused one to me. \r was the line ending on Mac Classic, I've not seen it used outside of that platform (and I've not seen anyone write code for that platform in a long time). So, to me, the only valid combinations of line endings to worry about are: LF LF; CRLF CRLF; CRLF LF; LF CRLF.

LF LF returns false (Nothing -> LF -> return false)
CRLF CRLF returns false (Nothing -> CR -> CRLF -> CRLFCR -> return false)
CRLF LF returns true (Nothing -> CR -> CRLF -> LF -> finish loop)
LF CRLF returns true (Nothing -> LF -> CR -> CRLF -> finish loop)

(If you also intend to support Mac Classic line endings for some reason, this gets even more complicated.)

107

Sounds like a good enough reason to me, thanks!

316

Thanks, I'd appreciate adding the tests just to be sure we handle the case properly.

319

I'm less worried about the arbitrary constant expressions than I am about not supporting ~ because ~0U is not uncommon in macros as a way to set all bits to 1. It's certainly more common than seeing a unary +, at least in my experience. However, an even more important use case that I should have thought of first is surrounding the value in parens (which is another common idiom with macros). e.g, #define ONE (1)

Some examples of this in the wild (search the files for ~0U):

https://codesearch.isocpp.org/actcd19/main/l/linux/linux_4.15.17-1/drivers/gpu/drm/i915/gvt/handlers.c
https://codesearch.isocpp.org/actcd19/main/w/wine/wine_4.0-1/dlls/d3d8/d3d8_private.h
https://codesearch.isocpp.org/actcd19/main/q/qtwebengine-opensource-src/qtwebengine-opensource-src_5.11.3+dfsg-2/src/3rdparty/chromium/third_party/vulkan/include/vulkan/vulkan.h

(There's plenty of other examples to be found on the same site.)

I'm fine not completing the set in the initial patch, but the current behavior is a bit confusing (+ is almost of negligible importance). I think ~ and parens need to be supported; I'd prefer in this patch, but I'm fine if it comes in a subsequent patch so long as those two are supported before we release.

clang-tools-extra/clang-tidy/modernize/ModernizeTidyModule.cpp
88–92

Please don't clang-format the whole file (it makes code reviews more difficult; we document this request a bit in https://llvm.org/docs/CodingStandards.html#introduction), there's a script for formatting just the isolated changes: https://clang.llvm.org/docs/ClangFormat.html#script-for-patch-reformatting that I've found works really well in most cases.

clang-tools-extra/docs/clang-tidy/checks/modernize-macro-to-enum.rst
46

+1 to it not being necessary, there's a command line option to reformat fixes ( --format-style=<string>).

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

Yeah, you already have the code for handling this somewhat (that's one of the reasons why I brought this particular use case up). My greater concern is: how many false positives does this check generate on real world code? Documentation may help alleviate those concerns well enough, but if the false positive rate is sufficiently high that you basically have to disable this check for real world code, we need to do better. I don't fully trust my intuition on this one because preprocessor code in the real world has 40+ years worth of accumulated oddities, so having some actual measurements against real world code would be very informative.

I think I've got all the changes incorporated, but I'm getting a test failure so I haven't uploaded a new diff.

Honestly, I don't understand the test failure output. On Windows, the test failure output is all mangled and
difficult to interpret correctly as to what exactly is the problem, so I'm still trying to figure it out.

Improving the readability of the test failures is one of the things I would like to address in a future change.
I suspect it's only a problem on Windows as it seems most of the clang-tidy devs are using unix?

I think I've got all the changes incorporated, but I'm getting a test failure so I haven't uploaded a new diff.

Honestly, I don't understand the test failure output. On Windows, the test failure output is all mangled and
difficult to interpret correctly as to what exactly is the problem, so I'm still trying to figure it out.

If you're able to post the output you're getting, I can try to help psychically debug it.

Improving the readability of the test failures is one of the things I would like to address in a future change.
I suspect it's only a problem on Windows as it seems most of the clang-tidy devs are using unix?

FWIW, I use Windows almost exclusively. Also, clang-tidy is frequently integrated into people's CI pipelines so even folks on *nix can be impacted by the behavior on Windows in those cases. That said, I have no idea what the test failures look like or how likely they are to be hit, so it may be reasonable to improve in a follow-up.

I've worked through this issue before, I just need to spend some time on it.

The basic problem is that the test fails and dumps a bunch of output to "help" you
understand the failure, but the way it is formatted and mangled doesn't end up
being useful.

LegalizeAdulthood marked 2 inline comments as done.Mar 18 2022, 10:09 AM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
316

I added tests in my latest diff, which I'll upload now even though the tests are failing. (I'll upload again after I've got the test failure figured out.)

319

The difficulty in supporting more complex expressions is that we have NO AST support here and it involves manually matching tokens in the macro definition.

However, your point about ~ is well taken and that's easy to add based on what I've got here. I thought it important to handle negative literals, so I added support for unary -. I added support for unary + out of symmetry.

clang-tools-extra/clang-tidy/modernize/ModernizeTidyModule.cpp
88–92

Most of the time it doesn't result in any new diffs besides those I'm adding myself. This is particularly true because most of the time I'm contributing new checks (which means whole new files) or fixing bugs on my existing checks (which have already had the whole file clang-format'ed).

I was unaware of the script, I'll take a look at that, thanks.

Should we cross link to the docs for this script in the "contributing" docs for clang-tidy?

LegalizeAdulthood marked 2 inline comments as done.
  • Update from review comments
  • check-clang-tools is reporting a test failure that still needs to be diagnosed (I think it is a mismatch between a CHECK-MESSAGES line and the exact output)
LegalizeAdulthood marked 3 inline comments as done.Mar 18 2022, 10:20 AM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

I was trying to follow "be liberal in what you accept as input and conservative in what you generate as output" maxim. I can remove the CR as a line ending case if you think it's too obscure.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

In the latest diff, I added a test case to make it clear that even if you #undef a macro later in the file, it invalidates all the surrounding macros in the cluster containing the undef'ed macro.

I'm open to suggestions for code bases on which to run this. Since this was motivated by my modernization of the fractint code base, I will run it on an old version of the code from my repo before I had manually converted the defines to enums.

LLVM isn't a good test case here, because LLVM doesn't use macros as enums :)

For this scenario (a macro that is later undef'ed), I don't believe I will emit any false positives.

LegalizeAdulthood marked 2 inline comments as done.Mar 18 2022, 10:21 AM

I think I've got all the changes incorporated, but I'm getting a test failure so I haven't uploaded a new diff.

If you're able to post the output you're getting, I can try to help psychically debug it.

I finally figured it out! I previously had written:

void MacroToEnumCallbacks::warnMacroEnum(const EnumMacro &Macro) const {
  Check->diag(Macro.Directive->getLocation(),
              "macro '%0' defines an integral constant; prefer an enum instead")
      << Macro.Name.getIdentifierInfo()->getName();
}

and you suggested that I could drop the ->getName() as it had an insertion operator
for identifier info. Well, that was true, but what I didn't realize is that this would add an
extra single quotes around the identifier, so my diagnostic output now had doubled-up
single quotes everywhere and I couldn't figure out what was doing this as I couldn't find
doubled up single quotes in my format string and I was suspecting the python script
was somehow treating single quotes special somewhere.

Once I accounted for this, everything passes, so I'll be uploading a new diff shortly.

  • Tests pass again
  • Recognize bitwise negated integers
LegalizeAdulthood marked an inline comment as done.Mar 18 2022, 4:03 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
319

I've added support for bitwise negated integers. I didn't go further and try to recognize parenthesized literals (this just seems dumb, anyway... the extra parentheses do nothing and aren't ever needed).

  • Undo change introduced by clang-format
aaron.ballman added inline comments.Mar 22 2022, 9:20 AM
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

If Clang supports it as a line ending, we probably should too, but... how do we handle CRLF vs "I mixed a CR with an LF by accident" kind of inputs? (Maybe we just treat that as CRLF and if the behavior is bad, the user shouldn't mix their line endings that way; I think that's defensible.) That seems to be similar to the scenario that's confusing me above where the user mixed an LF and CRLF by accident.

319

Thanks for adding the ~ support! I think we'll want parens supported as well because there's very common guidance to parenthesize EVERYTHING in the macro expansion list, but at the same time, I think that can be done once we get a bug report from a user.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

In the latest diff, I added a test case to make it clear that even if you #undef a macro later in the file, it invalidates all the surrounding macros in the cluster containing the undef'ed macro.

Oh, thank you! I had the impression we only cared if it was undefed within the same "run" we used to determine the macro.

I'm open to suggestions for code bases on which to run this.

I'd recommend some FreeBSD or Linux packages (or the whole distro if that's easy enough), as those tend to have a fair number of surprises. If that's harder and you'd rather just do one big project, I'd recommend something openssl, sqlite3, maybe postgres for C projects likely to make a fair amount of use of macros (from what I recall of them anyway).

clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

Well, as far as Clang is concerned it's all just "whitespace" that gets eaten up by the preprocessor. Actually, that gives me a thought. A preprocessing directive is considered to end at the physical line ending, so I should look to see what sort of characters it considers to "end the line".

For the accidental mix-up, I'm not going to worry about that here. Your input files are assumed to be "well formed". The worst that happens in this check is that two blocks of macros that look like they are separated by a blank line are considered as a single clump by this check.

In other words, the worst that can happen is:

  • Two clumps of macros are considered together.
  • One clump of macros that is discarded because it doesn't follow the constraints "taints" an adjacent clump of macros that do follow the constraints.

Either way, nothing harmful happens to your code. It will still compile and be syntactically and semantically equivalent to what was there before.

clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp
68–69

Those are all good suggestions, I'll see what I can do.

aaron.ballman added inline comments.Mar 24 2022, 5:42 AM
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

Actually, that gives me a thought. A preprocessing directive is considered to end at the physical line ending, so I should look to see what sort of characters it considers to "end the line".

All of \r, \n, \r\n I believe (you can double-check in Lexer::LexTokenInternal()

Either way, nothing harmful happens to your code. It will still compile and be syntactically and semantically equivalent to what was there before.

Oh, that's a very good point, thank you. I think that's reasonable fallback behavior for these weird edge cases.

LegalizeAdulthood marked 3 inline comments as done.Mar 24 2022, 7:53 PM
LegalizeAdulthood added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

Well..... maybe.

If you look at Lexer::ReadToEndOfLine which is used to skip to the end of a preprocessor directive you'll see that it considers the first of '\r', '\n' or '\0' (end of file) as the end of the "line". This is around line 2835 of Lexer.cpp in my tree.

LegalizeAdulthood marked an inline comment as done.
  • clang-format
aaron.ballman accepted this revision.Mar 25 2022, 6:12 AM

Thanks for the discussion on this new check, it LGTM!

clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
48

Yeah, I saw that as well (that's typically used for error recovery in the preprocessor). We also have Lexer::SkipWhitespace() which skips all vertical whitespace, but not \0.

This revision is now accepted and ready to land.Mar 25 2022, 6:12 AM
This revision was automatically updated to reflect the committed changes.
RKSimon added inline comments.
clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp
216

@LegalizeAdulthood I'm seeing errors in https://lab.llvm.org/buildbot/#/builders/93/builds/7956

/home/buildbots/ppc64be-clang-multistage-test/clang-ppc64be-multistage/llvm/clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp:216:22: error: declaration of ‘const clang::LangOptions& clang::tidy::modernize::{anonymous}::MacroToEnumCallbacks::LangOptions’ changes meaning of ‘LangOptions’ [-fpermissive]
  216 |   const LangOptions &LangOptions;
dyung added a subscriber: dyung.Mar 25 2022, 11:55 AM

I've reverted this change in order to get the build bots green again.