Page MenuHomePhabricator

[clang-tidy] Add cppcoreguidelines-prefer-scoped-enums
Needs ReviewPublic

Authored by janosbenjaminantal on Aug 10 2020, 5:08 PM.

Details

Summary

This check aims to flag every occurence of unscoped enumerations and provide useful fix to convert them to scoped enumerations.

It works for the most cases, except when an enumeration is defined within a macro, but the name of enumeration is a macro argument. This is indicated in the documentation.

Diff Detail

Event Timeline

janosbenjaminantal requested review of this revision.Aug 10 2020, 5:08 PM
Eugene.Zelenko added a project: Restricted Project.
Eugene.Zelenko removed a subscriber: aaron.ballman.

Please mention new check in Release Notes.

clang-tools-extra/docs/clang-tidy/checks/cppcoreguidelines-prefer-scoped-enums-over-unscoped.rst
22 ↗(On Diff #284531)

Please enclose enum in double back-ticks.

65 ↗(On Diff #284531)

Please add newline.

Is 'over-unscoped' really needed in the name, would just 'prefer-scoped-enum' be better, WDYT?

For the case of macro, you can check if the location is a macro using SourceLocation::isMacroID().

For this to work you would also need to check every single usage of the the values in the enum to make sure they are converted to use the scoped access.
You're best bet would be to actually store a map indexed by unscoped enum decls with a set of all their locations and maybe a flag to say if the fix can be applied or not.
For instance a fix couldn't be applied if any of the usages or decls are in macros.
This map could then be checked using the endOfTranslationUnit virtual method, with all the diags and fixes being spawned there.

clang-tools-extra/clang-tidy/cppcoreguidelines/PreferScopedEnumsOverUnscopedCheck.cpp
31–32 ↗(On Diff #284531)

Please hoist this logic into the matcher.

enumDecl(unless(isScoped())).bind("enumDecls")`
clang-tools-extra/docs/clang-tidy/checks/cppcoreguidelines-prefer-scoped-enums-over-unscoped.rst
20 ↗(On Diff #284531)

Semi-colon after the struct definition.

29 ↗(On Diff #284531)

Semi-colon after the struct definition.

clang-tools-extra/test/clang-tidy/checkers/cppcoreguidelines-prefer-scoped-enums-over-unscoped.cpp
27–28 ↗(On Diff #284531)

nit:

Is 'over-unscoped' really needed in the name, would just 'prefer-scoped-enum' be better, WDYT?

For the case of macro, you can check if the location is a macro using SourceLocation::isMacroID().

For this to work you would also need to check every single usage of the the values in the enum to make sure they are converted to use the scoped access.
You're best bet would be to actually store a map indexed by unscoped enum decls with a set of all their locations and maybe a flag to say if the fix can be applied or not.
For instance a fix couldn't be applied if any of the usages or decls are in macros.
This map could then be checked using the endOfTranslationUnit virtual method, with all the diags and fixes being spawned there.

Yes, it is better without 'over-unscoped'. I will change this.

I checked how other checks are dealing with macros, but I haven't found anything useful. Your idea sounds good, I will try to implement it.

  • Fix not just the declarations, but also the enum usages
  • Fix only the enums that are not declared/used in macros

As I am totally new to llvm, it might take a few days to come with a proper solutions, but I will do my best.

I addressed all of the review comments. Apart from the small fixes I extended a checker logic.

The check collects the following information from every unscoped enum (into the FoundEnumInfo struct):

  1. The enum's definition: Every information from an enum is attached to the enum definition. If an enum is not defined in the translation unit, then the first forward declaration will represent the enum.
  2. Whether the enum definition can be fixed: if the enum definition appears in a macro, then it cannot be fixed. otherwise it can be fixed by inserting the class keyword into the definition.
  3. Fixable usages of enum values: these are the usages that can be fixed by inserting the right qualifier.
  4. Fixable forward declarations of the enum: these can and must be fixed by inserting the class keyword similarly to the definition.
  5. Usages that cannot be fixed: these are the usages that are in a macro.
  6. Forward declarations that cannot be fixed: similarly to the unfixable usages, these are in a macro.
  7. Implicit casts: the implicit casts prevent the enum from being fixed, because the scoped enumerations cannot be implicitly casted to numerical types.

If any of the definition/usages/forward declarations/implicit casts prevent the enum from being fixed, a note is prompted out for every occurrences to inform the users about this.

I though about more cases that can prevent an enum from being fixed, but I haven't found any other reason. If you know anything else, feel free to mention it.

I tried to come up with relevant, complex and stressful test cases. I already spot a few implementation bugs with them. If you miss anything, feel free to let me know.

The known "limitations" are mostly similar to other checks:

  • It cannot detect unfixable cases from other translation units. Practically that means if an enum is used in multiple source files, one of them might end up not fixed. I tried to work around this, but I haven't found any solution for this, moreover this cause problems for other checks also. Therefore I think it shouldn't be a blocking issue.
  • It doesn't take into account if an enum is defined in a header that is filtered out. Similarly to the previous one, I tried to find a solution for this, but I was unable (the ClangTidyCheck class can access the header filter information, but it doesn't expose it for the descendant classes). I also checked other checks, and they behave in the same way. Therefore I also think it is shouldn't be a blocking issue.
  • The checks doesn't take into consideration the aliases in the diagnostic messages and fixes: always the original name is showed/inserted. I checked other checks also, and for the diagnostics message they also show the original name. I didn't find a check where the alias can be relevant when the fixes are applied. This issue might be a problem, however in my opinion it is not serious issue. Either it is okay for you or not, if you know how it could be improved, please provide me some points where to start searching. Even if this patch is got approved, I plan to improve it in the future and this point is a good things to improve.

It is not strongly connected to this review, but in the future I am planning to extend the check with:

  • options to exclude enums, because changing them to scoped enumerations might not be suitable for every cases
  • options to force the doable fixes: based on my little experience it might be easier to force the doable fixes and manually fix the remaining ones

If you have any opinion, thoughts or counter-argument against the planned extensions, please let me know.

Fix clang-tidy issues.

It may be a pain, but I'd also like to see an option like RemovePrefix.
Generally most unscoped enums constants have prefixes based on the enum name. A simple way around this would just be to get the longest common prefix in all the enums constants. There is probably some other heuristics that could be used like maybe stop after you reach an underscore or change of case but probably not essential.

clang-tools-extra/docs/ReleaseNotes.rst
82

Could this perhaps be a little more informative?

clang-tools-extra/docs/clang-tidy/checks/cppcoreguidelines-prefer-scoped-enums.rst
36–37

I'd prefer if this was user controlled, An option like UseEnumStruct that defaults to false. Just because its mainly used, doesn't mean there aren't some projects that prefer to use enum struct.

The known "limitations" are mostly similar to other checks:

  • It cannot detect unfixable cases from other translation units. Practically that means if an enum is used in multiple source files, one of them might end up not fixed. I tried to work around this, but I haven't found any solution for this, moreover this cause problems for other checks also. Therefore I think it shouldn't be a blocking issue.

That's what the run_clang_tidy.py script is meant to handle.

  • It doesn't take into account if an enum is defined in a header that is filtered out. Similarly to the previous one, I tried to find a solution for this, but I was unable (the ClangTidyCheck class can access the header filter information, but it doesn't expose it for the descendant classes). I also checked other checks, and they behave in the same way. Therefore I also think it is shouldn't be a blocking issue.

I recently pushed an upgrade to readability-identifier-naming where it would check the naming style for identifiers declared in header files, maybe thats something this could also use, this is the commit 4888c9ce97d8

It is not strongly connected to this review, but in the future I am planning to extend the check with:

  • options to exclude enums, because changing them to scoped enumerations might not be suitable for every cases

Not strictly necessary, if people don't want the fix they could annotate the code with a // NOLINT(*prefer-unscoped-enums) comment.

  • options to force the doable fixes: based on my little experience it might be easier to force the doable fixes and manually fix the remaining ones

Forcing the fix is usually just a case of converting implicit cast usages of the constants into static casts.

njames93 retitled this revision from [clang-tidy] Add cppcoreguidelines-prefer-scoped-enums-over-unscoped to [clang-tidy] Add cppcoreguidelines-prefer-scoped-enums.Oct 19 2020, 4:00 AM

I tend to be very skeptical of the value of checks that basically throw out entire usable chunks of the base language because the check is effectively impossible to apply to an existing code base. Have you run the check over large C++ code bases to see just how often the diagnostic triggers and whether there are ways we might want to reduce false-positives that the C++ Core Guidelines haven't considered as part of their enforcement strategy?

Btw, one fear I have with this check is that the automated fixits are somewhat risky -- unscoped vs scoped enumerations has ABI implications and changing from one to the other may break the ABI.

clang-tools-extra/clang-tidy/cppcoreguidelines/PreferScopedEnumsCheck.cpp
23

FWIW, Clang accepts scoped enumerations as a conforming language extension in older language modes. Should this check be enabled for any C++ mode?

27

Drop top-level const qualifiers on things that are not pointers or references (it's not a pattern we typically use in the project). This applies elsewhere in the patch as well.

29

This should be checking whether the enumeration comes from a system header or not -- we should definitely not warn about enumerations the user cannot change.

92

This feels a bit like it's a user option given that enum struct is also used. I think using class is a sensible default, though.

95–96

Please do not use auto when the type is not spelled out in the initialization (or is blindingly obvious from context, like iterators, or is unutterable, like lambdas).

125

I would drop the top-level const pointer qualification (elsewhere as well). The object being pointed at is fine, I'm only suggesting removing the qualification for the pointer itself.

135

llvm::for_each()? This may actually shorten many of the loops in this function.

168

Please don't use auto here.

195

Why are you using the internal interfaces here?

clang-tools-extra/clang-tidy/cppcoreguidelines/PreferScopedEnumsCheck.h
34

Does this need to be public?

39–43

Please remove the unnecessary empty initializers.

clang-tools-extra/docs/clang-tidy/checks/cppcoreguidelines-prefer-scoped-enums.rst
4

Underlining looks to be the wrong length.

36–37

+1. Also, we should document when the check doesn't trigger (like macros or system headers), discuss forward reference behavior, etc.

clang-tools-extra/test/clang-tidy/checkers/cppcoreguidelines-prefer-scoped-enums.cpp
34

Another interesting test case:

enum BigEnumConstants {
  TooBig = 0xFFFF'FFFF'0
};

Changing this unscoped enumeration into a scoped enumeration without specifying the fixed underlying type will break code in this case because the enum constant value is too large to be represented by an int.

166

What should happen in cases like:

// A case where the user has manually added a prefix they want; it
// seems like fixing this to use a scoped enumeration changes the
// expected way you interface with the enumeration by name.
namespace EnumPrefix {
enum Whatever {
  One,
  Two
};
}

// Another form of the same idea above.
struct AnotherPrefix {
  enum Whatever {
    One,
    Two
  };
};

// What about use in class hierarchies?
// Header file the user controls
enum SomeEnum {
  One,
  Two
};

struct SomeType {
  virtual void func(SomeEnum E) = 0; // Changing this may break the hierarchy
};

// Here's another situation where even warning is a bit suspect
enum E : int;
extern void func(E);

I tend to be very skeptical of the value of checks that basically throw out entire usable chunks of the base language because the check is effectively impossible to apply to an existing code base. Have you run the check over large C++ code bases to see just how often the diagnostic triggers and whether there are ways we might want to reduce false-positives that the C++ Core Guidelines haven't considered as part of their enforcement strategy?

I disagree with this mentality, This is following what the core guideline states perfectly. If a codebase doesn't want to adhere to those guidelines, they don't need to run this check. If you think there are shortcomings in the guidelines, raise an issue on their github.

clang-tools-extra/clang-tidy/cppcoreguidelines/PreferScopedEnumsCheck.cpp
23

If you go down that route, you lose portability as the transformed code will only compile on versions of clang, Could be option controlled I guess, WDYT?

29

This should be checking whether the enumeration comes from a system header or not -- we should definitely not warn about enumerations the user cannot change.

Won't clang-tidy handle this by suppressing system header diagnostics, I thought that's what the feature was there for.

clang-tools-extra/test/clang-tidy/checkers/cppcoreguidelines-prefer-scoped-enums.cpp
166

Maybe the check should only flag enumerations that appear at TU level, any other enumeration is kind of scoped already.

The known "limitations" are mostly similar to other checks:

  • It cannot detect unfixable cases from other translation units. Practically that means if an enum is used in multiple source files, one of them might end up not fixed. I tried to work around this, but I haven't found any solution for this, moreover this cause problems for other checks also. Therefore I think it shouldn't be a blocking issue.

That's what the run_clang_tidy.py script is meant to handle.

It is good to know, so it is not an issue.

  • It doesn't take into account if an enum is defined in a header that is filtered out. Similarly to the previous one, I tried to find a solution for this, but I was unable (the ClangTidyCheck class can access the header filter information, but it doesn't expose it for the descendant classes). I also checked other checks, and they behave in the same way. Therefore I also think it is shouldn't be a blocking issue.

I recently pushed an upgrade to readability-identifier-naming where it would check the naming style for identifiers declared in header files, maybe thats something this could also use, this is the commit 4888c9ce97d8

Will check.

It is not strongly connected to this review, but in the future I am planning to extend the check with:

  • options to exclude enums, because changing them to scoped enumerations might not be suitable for every cases

Not strictly necessary, if people don't want the fix they could annotate the code with a // NOLINT(*prefer-unscoped-enums) comment.

I think with the inline suppression the users should annotate every usage of the enum while with the option it would be enough to list the enum's name to ignore it from the whole check. However it is just an improvement, when I reach that point I will do further digging about this topic, how the suppression actually work etc.

  • options to force the doable fixes: based on my little experience it might be easier to force the doable fixes and manually fix the remaining ones

Forcing the fix is usually just a case of converting implicit cast usages of the constants into static casts.

What I meant is an option to express that the user don't care if there are some cases for an enum that cannot be fixed: fixed the other occurrences and the user will handle the rest. Currently the automated fix only fix the enums where every occurrence can be fixed. As a second thought this might be superfluous, because the other way is always possible: the user first fix the occurrences that cannot be fixed automatically and then use the check to fix the rest. So I think it is something that wouldn't mean real value.

clang-tools-extra/clang-tidy/cppcoreguidelines/PreferScopedEnumsCheck.cpp
23

I think by default the check shouldn't do that (source codes that uses multiple compilers?), but it might worth an option to be able to turn it on on "clang-only" code bases.

92

I would like to keep this as small as possible, so I didn't planned to add this option as part of the first version. Do you think it is necessary to add this to approve the review? If yes, then I will add it of course, otherwise maybe in a different review in the future.

clang-tools-extra/clang-tidy/cppcoreguidelines/PreferScopedEnumsCheck.h
34

The original function in MatchCallback is public, so I think it is reasonable to have it as a public function here also.

clang-tools-extra/test/clang-tidy/checkers/cppcoreguidelines-prefer-scoped-enums.cpp
34

That's an excellent catch. I think comparing the size of int and the underlying type of the enum might help with some extra logic, because

enum class ValidWithLongLongInt {
  Value = 0x7FFF'FFFF'FFFF'FFFF
};

enum InvalidWithLongLongInt {
  Value = 0x8FFF'FFFF'FFFF'FFFF
};

The second one must use unsigned int. I think this is manageable and will have a look at it.

166

I think there is no good solution for the first two cases. We cannot be sure whether it is meant to be a prefix or not. It is just a very educated guess (if the the scope contains only one enum, then it is very like just a prefix, but might not be true in every cases). I don't have a very strong opinion about this, but I think this check can be useful even without supporting this case. Do you think it is a very common solution?

For the third one, I think if the fix is applied for every files, then it won't be a problem. If the fix cannot be applied to all of the files (e.g.: in case of a lib where the users can inherit from the classes defined in the lib) it will definitely break the hierarchy, however I don't think we can do anything about this. I am not totally sure, but I think this a problem for a lot of other checks too, e.g.: google-explicit-constructor or readability-identifier-naming. However I am afraid I misunderstood something, so let me in that case.

The last case with the extern function is a perfect place to suppress the warning in my opinion. How it could be decided, whether the user can change the function or not? Therefore I would keep the warning in that case and trust the users to suppress it if necessary.

I tend to be very skeptical of the value of checks that basically throw out entire usable chunks of the base language because the check is effectively impossible to apply to an existing code base. Have you run the check over large C++ code bases to see just how often the diagnostic triggers and whether there are ways we might want to reduce false-positives that the C++ Core Guidelines haven't considered as part of their enforcement strategy?

No, I haven't checked over large C++ code bases, but I will do this.

Btw, one fear I have with this check is that the automated fixits are somewhat risky -- unscoped vs scoped enumerations has ABI implications and changing from one to the other may break the ABI.

I am not familiar with these specific ABI implications, if you could help me with some keywords/references/link to start to investigate, I am happy to deep dive into it. I am also willing to discard the automated fixes if it makes this review better. What do you think?

It is not strongly connected to this review, but in the future I am planning to extend the check with:

  • options to exclude enums, because changing them to scoped enumerations might not be suitable for every cases

Not strictly necessary, if people don't want the fix they could annotate the code with a // NOLINT(*prefer-unscoped-enums) comment.

I think with the inline suppression the users should annotate every usage of the enum while with the option it would be enough to list the enum's name to ignore it from the whole check. However it is just an improvement, when I reach that point I will do further digging about this topic, how the suppression actually work etc.

Ah so an issue here is you emit a warning for the enum declaration as well as each usage and the NOLINT will only affect warnings emitted for the line its declared on.
Perhaps this needs a reshape.

Only emit a warning for the enum declaration and make sure the fix-its are attached to that warning, This is a safer way around this, and its also how RenamerClangTidy checks work. It means that if there is any conflict when trying to apply one of the fix-its none of them will be applied.
If you want to emit notes about usages that need updating you can, but don't attach fixes there, they wont be applied.

Due to how diagnostics works this will make it the code slightly more finicky, you can only have one DiagnosticBuilder active at once, and the notes need to be added after the warning has been emitted.

Maybe if it can be fixed, Emit a warning for the enum declaration, then add fix-its for the decl, any forward declares and any usages, then emit that diagnostic (Happens when the DiagnosticBuilder gets destroyed). After that you could emit notes for the forward declarations and usages(though I don't think we really needs notes for those)
If it can't be fixed its kind of similar, emit a warning for the declaration, then a note for each forward declaration and fault for why a fix can't be applied.
This has the added bonus that clang-tidy will output the diagnostics for an unscoped enum and all its uses next to each other.
If you want I've done the legwork to change the diagnostics behaviour to this and fix up the test cases if you want to have a look, should apply cleanly atop this version.

I tend to be very skeptical of the value of checks that basically throw out entire usable chunks of the base language because the check is effectively impossible to apply to an existing code base. Have you run the check over large C++ code bases to see just how often the diagnostic triggers and whether there are ways we might want to reduce false-positives that the C++ Core Guidelines haven't considered as part of their enforcement strategy?

I disagree with this mentality, This is following what the core guideline states perfectly. If a codebase doesn't want to adhere to those guidelines, they don't need to run this check. If you think there are shortcomings in the guidelines, raise an issue on their github.

As with other C++ core guideline checks, the usual approach is for the patch authors to go back to the C++ Core Guideline folks with these issues to find out how they should be addressed. This has produced better C++ Core Guidelines and better checkers in the past and is the way I think we should continue to proceed. That said, I agree with you that we will eventually go with whatever the guideline authors want -- but this is us doing our due diligence since the authors put very little thought into enforcement of the rules in general, and is appropriate (for any guidelines, not just the C++ core guidelines). Note, this is in keeping with the guidelines themselves which state that they're not meant to subset the language and are intended to allow for gradual adoption of rules (https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#innot-non-aims).

Btw, one fear I have with this check is that the automated fixits are somewhat risky -- unscoped vs scoped enumerations has ABI implications and changing from one to the other may break the ABI.

I am not familiar with these specific ABI implications, if you could help me with some keywords/references/link to start to investigate, I am happy to deep dive into it. I am also willing to discard the automated fixes if it makes this review better. What do you think?

I did some digging and I'm less certain I'm worried about this now. I was thinking that the mangling was different for these two cases, but it's the same mangling for func() either way: https://godbolt.org/z/xhcd7E The automated fixits still worry me in the places where they may take working code and change it to not be working (like narrowing conversions with the enumerator values, etc), but if those are addressed, then I think the fix-its are a benefit rather than a risk.

clang-tools-extra/clang-tidy/cppcoreguidelines/PreferScopedEnumsCheck.cpp
23

The impression I have from the guidelines is that they're intended for C++11 and later code bases as a way to improve the coding practices. They don't really seem to imply there's a requirement that code remains portable, but that sure seems like a reasonable goal to me. I think I'm fine leaving this disabled unless you're in C++11 mode, but we may want to see if there's a policy decision here for the core guideline checks (I have no idea how consistent we're being with considering older standards for these particular checks).

29

Good point! I forgot that clang-tidy already handled that automatically.

92

Given that two reviewers both thought it was worthwhile, I think it's probably not a bad idea to support as an option. It could be done in a follow-up patch if that's the way you'd like to proceed, though.

clang-tools-extra/test/clang-tidy/checkers/cppcoreguidelines-prefer-scoped-enums.cpp
166

Do you think it is a very common solution?

I think it likely is -- it was a common tactic used in pre-C++11 code bases (for instance, Clang uses this quite often, as in: https://github.com/llvm/llvm-project/blob/master/clang/include/clang/AST/Decl.h#L837)

For the third one, I think if the fix is applied for every files, then it won't be a problem.

Agreed -- the scenario I was thinking about is when the virtual hierarchy is part of a published interface, like in a plugin system. Changing the base declaration may be easy, but you may not even *see* the other derived classes. I'm not certain it's a solvable issue.

The last case with the extern function is a perfect place to suppress the warning in my opinion.

Agreed.

Maybe the check should only flag enumerations that appear at TU level, any other enumeration is kind of scoped already.

I was sort of wondering the same thing, but perhaps instead of TU level, I'd go with TU or anonymous namespace so that we still diagnose things like these:

namespace {
enum E { One }; // Seems reasonable to suggest scoping
namespace {
enum F { Two }; // Same here even though there are nested namespaces
}
}
clang-tools-extra/test/clang-tidy/checkers/cppcoreguidelines-prefer-scoped-enums.cpp
166

I think the scoped vs unscoped enumeration is not about the scope itself, more likely the conversion rules applied to them: the unscoped enumerations are easily, even mistakenly convertible to/from ints. The core guideline reasons with this: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Renum-class

From that point of view, it is reasonable to convert the enums that are in their own namespaces also. Of course, it would be nice to eliminate the extra scope, but I don't think we can detect it. From my understanding, the only we could do is just check whether and enumeration is the only thing in the namespace, and if yes, then remove the extra namespace. However I think this is not very reliable.

I will try to find a pre C++11 projects which use this and check there how it could be solved.

clang-tools-extra/clang-tidy/cppcoreguidelines/PreferScopedEnumsCheck.cpp
92

Will implement this. Not sure whether in a follow-up patch or not. If it is just a few lines of code, then I will include here, otherwise I will create a follow-up patch.

aaron.ballman added inline comments.Oct 22 2020, 5:36 AM
clang-tools-extra/test/clang-tidy/checkers/cppcoreguidelines-prefer-scoped-enums.cpp
166

From an academic and API perspective, I definitely agree that the conversion rules and the fixed underlying type for scoped enumerations are really valuable. However, I think it's not uncommon for developers to primarily be worried about how people access their interfaces and so scoped nature of the name is the property they're most interested in. e.g.,

struct Prefix {
  enum TypeName {
    Name,
    OtherName
  };
};

Whether you access via Prefix::Name or Prefix::TypeName::Name matters for readability of code.

One possibility is that we make it a configuration option so the user can decide.