Add a recursive descent parser to match macro expansion tokens against
fully formed valid expressions of integral literals. Partial expressions will
not be matched -- they can't be valid initializing expressions for an enum.
Fixes #55055
Differential D124500
[clang-tidy] Support expressions of literals in modernize-macro-to-enum LegalizeAdulthood on Apr 26 2022, 8:29 PM. Authored by
Details Add a recursive descent parser to match macro expansion tokens against Fixes #55055
Diff Detail Event Timeline
Comment Actions To clarify my previous comments, I'm fine punting on the edge cases until user reports come in, so don't let them block this review if you feel strongly about not supporting them. But when user reports start coming in, at some point I might start asking to replace the custom parser with calling into the clangParse library through some invented utility interface so that we don't have to deal with a long tail of bug reports.
Comment Actions Yeah, I think punting on edge cases is the right thing to do here. As I say, Maybe we're thinking about this check differently. I want this check to do the majority of the heavy lifting so that I'm only left with a I think this is the baseline for all the modernize checks, really. There are still cases As for calling into clangParse, I think that would be overkill for a couple reasons.
Comment Actions FYI, once nice addition of the parsing of macro bodies is that it paves the way for Comment Actions I largely agree, but I've found cases where we'll convert correct code to incorrect code, so it's a bit stronger than that.
I think that's a reasonable goal, but we're not meeting the "never ever generate invalid code" part. I already know we can break correct C and C++ code through overflow. Should we ever allow an option to use an enum with a fixed underlying type in C++ (either enum class or enum : type form), we'll have the same breakage there but at different thresholds.
There's a few reasons I disagree with this. First, you need to know the value of the constant expression in order to know whether it's valid as an enumeration constant. That alone requires expression evaluation capabilities, assuming you want the check to behave correctly for those kinds of cases. But second, without support for generating that AST and validating it, you can never handle cases like this: constexpr int a = 12; constexpr int foo() { return 12; } #define FOO (a + 1) #define BAR (a + 2) #define BAZ (a + 3) #define QUUX (foo() + 4) because you have no way to know whether or not that constant expression is valid based solely on token soup (especially when you start to factor in namespaces, qualified lookup, template instantiations, etc). So I think avoiding the parser will limit the utility of this check. And maybe that's fine, maybe it's only ever intended to convert C code using defines to C code using enumerations or other such simple cases.
Absolutely 100% agreed on this point.
Yeah, and that wouldn't even be sufficient because I still think we're going to want to know the *value* of the expression at some point. Allllll that "this is the wrong way!" doom and gloom aside, I still think you're fine to proceed with the current approach if you'd like to. The situations under which it will break correct code should be something we document explicitly in the check somewhere, but they feel sufficiently like edge cases to me that I'm fine moving forward with the caution in mind that if this becomes too much more problematic in the future, we have *a* path forward we could use to improve things should we decide we need to.
Comment Actions Are you talking generally, or with this check? I can't see how this check
How so? Can you give an example where this check will produce invalid code?
I'm not following you. Nothing requires knowing this yet.
QUUX will never be converted to an enum by this check. It references an identifier foo.
You haven't shown an example yet where it will break code.
Comment Actions Specifically this check.
As posted before: #define FINE 1LL << 30LL #define BAD 1LL << 31LL #define ALSO_BAD 1LL << 32L Now with godbolt goodness: https://godbolt.org/z/Tzbe8qWT5
Comment Actions OK, so thinking about this review a little more, I propose this:
How does that sound? Comment Actions I think that sounds like a good approach -- I expect the third bullet is when we'll likely learn whether we should have let Clang do the parsing or not (because then we're replicating not only the expression parsing but the constant evaluation calculations from the frontend). I re-reviewed the patch as it stands today, and given the above plan, I think this is good to go. So it gets my LGTM and my thanks for the good discussions!
|
Uh.... I guess this needs some sort of copyright notice?