This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] implement cppcoreguidelines macro rules
ClosedPublic

Authored by JonasToth on Dec 31 2017, 7:29 AM.

Details

Summary

In short macros are discouraged by multiple rules (and sometimes reference randomly). [Enum.1], [ES.30], [ES.31]
This check allows only headerguards and empty macros for annotation.

Event Timeline

JonasToth created this revision.Dec 31 2017, 7:29 AM
JonasToth edited the summary of this revision. (Show Details)Dec 31 2017, 7:31 AM
JonasToth updated this revision to Diff 128372.Dec 31 2017, 7:34 AM
  • merge with local repos
JonasToth updated this revision to Diff 128373.Dec 31 2017, 7:37 AM
  • remove unneeded includes

I think this check is going to be extraordinarily chatty. For instance, macros are often used to hide compiler details in signatures, such as use of attributes. This construct cannot be replaced with anything else because the macro isn't defining an object or value. Another example where this will be bad is for conditional processing where the macro is later used in a #if, #elif, #ifdef, or #ifndef preprocessor conditional, as this also cannot be replaced with a constexpr variable. Without handling things like this, I don't see how this rule can provide utility to real world code. Do you have ideas for how to handle these kind of situations?

clang-tidy/cppcoreguidelines/MacroUsageCheck.cpp
22

I'd use Check(Check) instead of curly braces.

29

I don't think this warning would be appropriate in all language modes. For instance, macros in C code, or in C++98 mode, etc.

43

These diagnostics should be reworded to not use contractions and instead explain what the issue is. e.g., macro used to define a constant; consider using 'constexpr' instead (or something along those lines).

45

Don't use auto.

I think this check is going to be extraordinarily chatty. For instance, macros are often used to hide compiler details in signatures, such as use of attributes. This construct cannot be replaced with anything else because the macro isn't defining an object or value. Another example where this will be bad is for conditional processing where the macro is later used in a #if, #elif, #ifdef, or #ifndef preprocessor conditional, as this also cannot be replaced with a constexpr variable. Without handling things like this, I don't see how this rule can provide utility to real world code. Do you have ideas for how to handle these kind of situations?

The check will only warn in the definition of the macro not the expansion. The guidelines are really strict with macros and explicitly state that program maniupulation is a bad thing.
Having a macro without value, like header guards or compile time flag style things are not reported with this check.

Other (valid) use cases require a NOLINT with this check. But i do not know how to figure out automatically what a macro is meant to do. I can introduce a whitelist for allowed macros.
Furthermore i could make each use case a configurable option. E.g. forbidding constant definitions i still a good thing (imho).

I think this check is going to be extraordinarily chatty. For instance, macros are often used to hide compiler details in signatures, such as use of attributes. This construct cannot be replaced with anything else because the macro isn't defining an object or value. Another example where this will be bad is for conditional processing where the macro is later used in a #if, #elif, #ifdef, or #ifndef preprocessor conditional, as this also cannot be replaced with a constexpr variable. Without handling things like this, I don't see how this rule can provide utility to real world code. Do you have ideas for how to handle these kind of situations?

The check will only warn in the definition of the macro not the expansion.

I understand. It's the definitions I am concerned about.

The guidelines are really strict with macros and explicitly state that program maniupulation is a bad thing.

Yes, and I'm saying that the guidelines aren't useful for real code bases because they restrict more than is reasonable. So while I think the check is largely implementing what the guidelines recommend, I think that some of these scenarios should be brought back to the guideline authors to weigh in on before we expose the check to users.

Having a macro without value, like header guards or compile time flag style things are not reported with this check.

Other (valid) use cases require a NOLINT with this check. But i do not know how to figure out automatically what a macro is meant to do. I can introduce a whitelist for allowed macros.
Furthermore i could make each use case a configurable option. E.g. forbidding constant definitions i still a good thing (imho).

I don't think a whitelist is going to cut it here -- users are not going to try to figure out every conditional compilation or attribute macro definition used in their large code base; they'll simply disable the check as not being overly useful. For example, look at Compiler.h in LLVM and try to see how you would rewrite that code to perform the same purposes without triggering diagnostics from this check. That sort of file is common to almost every production code base I've ever come across.

One way to make this less chatty would be to check whether the macro replacement list is defining a source construct that cannot be replaced by constexpr variables or inline functions (such as LLVM_ATTRIBUTE_ALWAYS_INLINE). If we had whole-program analysis, I think we could do something further like scan the uses of the macros to determine whether they're used for conditional compilation (such as LLVM_ENABLE_EXCEPTIONS). However, I have no idea how we would handle cases like LLVM_LIKELY and LLVM_UNLIKELY, but expecting users to NOLINT 500+ lines of common macro code doesn't seem like a good first approach. I'd be curious to know what the C++ Core Guidelines folks think about those use cases and how they would recommend rewriting the code to adhere to their guidelines. Do they really expect users to use *no* macros in C++ code outside of header include guards and noop definitions even when asked about reasonable use cases like cross-compiler attributes or builtins? I mean, many of these things cannot even use the attribute the C++ Core Guidelines require for silencing such diagnostics -- how do they want to use gsl::suppress to silence a diagnostic on a macro definition?

Eugene.Zelenko added inline comments.
clang-tidy/cppcoreguidelines/MacroUsageCheck.cpp
14

Please include string and STLExtras.h.

docs/clang-tidy/checks/cppcoreguidelines-macro-usage.rst
9

Will be good idea to add link to relevant section in documentation.

JonasToth updated this revision to Diff 128869.Jan 7 2018, 4:29 AM
  • rebase after release for 6.0

Yes, and I'm saying that the guidelines aren't useful for real code bases because they restrict more than is reasonable. So while I think the check is largely implementing what the guidelines recommend, I think that some of these scenarios should be brought back to the guideline authors to weigh in on before we expose the check to users.

I see. Are u willing to identify some of the exceptions with me or don't you have enough time? I would like to propose the first list here. My opinion is, that we should leave some useful macros (e.g. Ensure from guidelines) left for manual inspection and to disable the check for these specific macros.

I don't think a whitelist is going to cut it here...

The whitelist could be a regex or something similar. Configuring the check to allow everything with a prefix would allow poor mans namespaces for macros. For llvm it would be LLVM_*, blaze could be BLAZE_*. These are the bigger codebases i know where macro usage was reasonable. :)

... -- users are not going to try to figure out every conditional compilation or attribute macro definition used in their large code base;

Conditional compilation can be done with empty macros.

#define ALLOW_EXCEPTION

#ifdef ALLOW_EXCEPTION
// exception code
#else
// no exception code
#endif

That usecase is in my eyes pretty much like an include guard an therfore acceptable, and i think not diagnosed with the check currently.

For example, look at Compiler.h in LLVM and try to see how you would rewrite that code to perform the same purposes without triggering diagnostics from this check. That sort of file is common to almost every production code base I've ever come across.
One way to make this less chatty would be to check whether the macro replacement list is defining a source construct that cannot be replaced by constexpr variables or inline functions (such as LLVM_ATTRIBUTE_ALWAYS_INLINE). If we had whole-program analysis, I think we could do something further like scan the uses of the macros to determine whether they're used for conditional compilation (such as LLVM_ENABLE_EXCEPTIONS). However, I have no idea how we would handle cases like LLVM_LIKELY and LLVM_UNLIKELY, but expecting users to NOLINT 500+ lines of common macro code doesn't seem like a good first approach. I'd be curious to know what the C++ Core Guidelines folks think about those use cases and how they would recommend rewriting the code to adhere to their guidelines. Do they really expect users to use *no* macros in C++ code outside of header include guards and noop definitions even when asked about reasonable use cases like cross-compiler attributes or builtins? I mean, many of these things cannot even use the attribute the C++ Core Guidelines require for silencing such diagnostics -- how do they want to use gsl::suppress to silence a diagnostic on a macro definition?

Common constructs that I would identify as valid:

  1. Compiler specific Attributes. #define ALWAYS_INLINE __attribute__((always_inline))

Attributes could be identified when inspecting the tokens the macro expands to. I have no clue how to implement, but i think that should even possible with string manipulations.

  1. Compiler intrinsics

These are similar to attributes and could maybe even treated the same.

  1. Compatibility with older C++ Standards

Stuff like #define OVERRIDE override if the code is compiled with C++11 or newer might be acceptable for stepwise modernization.
This can can be considered as program text manipulation that was explicitly forbidden but does not obscure code so an exception could be made.
Such macros can be detected with string comparisons.

  1. Debugging/Logging stuff that contains the Line and file.

These can not be modeled with current C++(?), but LineInfo or similar exists. I don't know what its status is. I think it is ok to require manual silencing for such macros.

@aaron.ballman Do you agree with the list? I would ask the guideline authors about the issue and propose exceptions to the strict rules.

jbcoe added a subscriber: jbcoe.Jan 11 2018, 12:45 AM
JonasToth marked 3 inline comments as done.
  • address review comments
  • minor issues fixed

Yes, and I'm saying that the guidelines aren't useful for real code bases because they restrict more than is reasonable. So while I think the check is largely implementing what the guidelines recommend, I think that some of these scenarios should be brought back to the guideline authors to weigh in on before we expose the check to users.

I see. Are u willing to identify some of the exceptions with me or don't you have enough time? I would like to propose the first list here. My opinion is, that we should leave some useful macros (e.g. Ensure from guidelines) left for manual inspection and to disable the check for these specific macros.

I can try to help as I have time.

I don't think a whitelist is going to cut it here...

The whitelist could be a regex or something similar. Configuring the check to allow everything with a prefix would allow poor mans namespaces for macros. For llvm it would be LLVM_*, blaze could be BLAZE_*. These are the bigger codebases i know where macro usage was reasonable. :)

Using regex could be a reasonable way out, but isn't going to solve the problem in general because not all of the macro names are ones the user has control over. Having to whitelist dozens of macros by name means this check is simply going to be disabled by the user as being low-value.

... -- users are not going to try to figure out every conditional compilation or attribute macro definition used in their large code base;

Conditional compilation can be done with empty macros.

#define ALLOW_EXCEPTION

#ifdef ALLOW_EXCEPTION
// exception code
#else
// no exception code
#endif

That usecase is in my eyes pretty much like an include guard an therfore acceptable, and i think not diagnosed with the check currently.

Except that's not the only form of conditional compilation I'm worried about. See Compiler.h where we do things like #define __has_attribute(x) 0, which is a perfectly reasonable macro to define (and is an example of a macro name outside of the user's control, so they cannot simply prefix it for a whitelist).

For example, look at Compiler.h in LLVM and try to see how you would rewrite that code to perform the same purposes without triggering diagnostics from this check. That sort of file is common to almost every production code base I've ever come across.
One way to make this less chatty would be to check whether the macro replacement list is defining a source construct that cannot be replaced by constexpr variables or inline functions (such as LLVM_ATTRIBUTE_ALWAYS_INLINE). If we had whole-program analysis, I think we could do something further like scan the uses of the macros to determine whether they're used for conditional compilation (such as LLVM_ENABLE_EXCEPTIONS). However, I have no idea how we would handle cases like LLVM_LIKELY and LLVM_UNLIKELY, but expecting users to NOLINT 500+ lines of common macro code doesn't seem like a good first approach. I'd be curious to know what the C++ Core Guidelines folks think about those use cases and how they would recommend rewriting the code to adhere to their guidelines. Do they really expect users to use *no* macros in C++ code outside of header include guards and noop definitions even when asked about reasonable use cases like cross-compiler attributes or builtins? I mean, many of these things cannot even use the attribute the C++ Core Guidelines require for silencing such diagnostics -- how do they want to use gsl::suppress to silence a diagnostic on a macro definition?

Common constructs that I would identify as valid:

  1. Compiler specific Attributes. #define ALWAYS_INLINE __attribute__((always_inline))

Attributes could be identified when inspecting the tokens the macro expands to. I have no clue how to implement, but i think that should even possible with string manipulations.

Agreed.

  1. Compiler intrinsics

These are similar to attributes and could maybe even treated the same.

I'm less certain about this one -- with attributes, there's syntax that we can inspect (is there an __attribute__ or __declspec keyword?), but with intrinsics there's no commonality. Or are you thinking of tablegenning the list of intrinsics?

  1. Compatibility with older C++ Standards

Stuff like #define OVERRIDE override if the code is compiled with C++11 or newer might be acceptable for stepwise modernization.
This can can be considered as program text manipulation that was explicitly forbidden but does not obscure code so an exception could be made.
Such macros can be detected with string comparisons.

How would you generalize this? I've seen people use macros for arbitrary keywords (#define CONST const and #define false false) as well as macros for arbitrary syntax (#define DEFAULTED {} or #define DEFAULTED = default).

  1. Debugging/Logging stuff that contains the Line and file.

These can not be modeled with current C++(?), but LineInfo or similar exists. I don't know what its status is. I think it is ok to require manual silencing for such macros.

In addition to __LINE__ and __FILE__, I think we want to include __DATE__, __TIME__, and __func__.

@aaron.ballman Do you agree with the list? I would ask the guideline authors about the issue and propose exceptions to the strict rules.

I think this list is a good start, but is likely incomplete. We'll probably discover other things to add to the list when testing the check against other real world code bases to see what patterns emerge from them.

docs/clang-tidy/checks/cppcoreguidelines-macro-usage.rst
9

Coreguidelines -> Core Guidelines

  • [Feature] implement regex for allowed macros
  • [Test] add proper tests for regex silencing
clang-tidy/cppcoreguidelines/MacroUsageCheck.h
16

Please include <string>.

docs/clang-tidy/checks/cppcoreguidelines-macro-usage.rst
9

Will be good idea to separate statements with empty line.

test/clang-tidy/cppcoreguidelines-macro-usage-custom.cpp
18

Unnecessary empty line.

JonasToth marked 2 inline comments as done.
  • address review comments
JonasToth marked 2 inline comments as done.
  • doc fix
JonasToth marked 2 inline comments as done.Jan 15 2018, 11:16 AM
JonasToth added inline comments.
docs/clang-tidy/checks/cppcoreguidelines-macro-usage.rst
9

As a list or how do you mean?

  • address eugene comments

Using regex could be a reasonable way out, but isn't going to solve the problem in general because not all of the macro names are ones the user has control over. Having to whitelist dozens of macros by name means this check is simply going to be disabled by the user as being low-value.

Yes. I think with the regex its exactly implemented like the rules say. But adding sugar is still reasonable. I will run the check over llvm to get a feeling how much is still left after silence LLVM_* and others that are clearly scoped.

Except that's not the only form of conditional compilation I'm worried about. See Compiler.h where we do things like #define __has_attribute(x) 0, which is a perfectly reasonable macro to define (and is an example of a macro name outside of the user's control, so they cannot simply prefix it for a whitelist).

Having a regex allows to add __has_attribute to it or a pattern like __has*. But how many cases for such macros are left? The goal of the guidelines is to change coding style and requiring to silence _SOME_ warnings is reasonable to me.

However, I have no idea how we would handle cases like LLVM_LIKELY and LLVM_UNLIKELY, but expecting users to NOLINT 500+ lines of common macro code doesn't seem like a good first approach. I'd be curious to know what the C++ Core Guidelines folks think about those use cases and how they would recommend rewriting the code to adhere to their guidelines. Do they really expect users to use *no* macros in C++ code outside of header include guards and noop definitions even when asked about reasonable use cases like cross-compiler attributes or builtins? I mean, many of these things cannot even use the attribute the C++ Core Guidelines require for silencing such diagnostics -- how do they want to use gsl::suppress to silence a diagnostic on a macro definition?

I think the long term goal is removing the preprocessor, but especially compiler specifics are very unlikely to get away soon.
They themself make exceptions: ES.33: If you must use macros, give them unique names

  1. Compiler intrinsics

These are similar to attributes and could maybe even treated the same.

I'm less certain about this one -- with attributes, there's syntax that we can inspect (is there an __attribute__ or __declspec keyword?), but with intrinsics there's no commonality. Or are you thinking of tablegenning the list of intrinsics?

I thought mostly of __builtin when proposing. Checking the MSVC docs showed that not all compilers do follow such nice naming conventions. Adding a clang and gcc list of builtins might still be manageable with basic string handling.
I feel that keeping any list of all compiler intrinsics is too much.

  1. Compatibility with older C++ Standards

Stuff like #define OVERRIDE override if the code is compiled with C++11 or newer might be acceptable for stepwise modernization.
This can can be considered as program text manipulation that was explicitly forbidden but does not obscure code so an exception could be made.
Such macros can be detected with string comparisons.

How would you generalize this? I've seen people use macros for arbitrary keywords (#define CONST const and #define false false) as well as macros for arbitrary syntax (#define DEFAULTED {} or #define DEFAULTED = default).

Not sure about all. I think new keywords like #define OVERRIDE override should definitly be allowed. But things like false not, because it already is a keyword in C++98.

I am against allowing #define DEFAULTED because both are perfectly valid and we ship a check for modernizing from one to the other version. This is different to OVERRIDE, because it adds additional value for the programmer who might be stuck at C++98.

  1. Debugging/Logging stuff that contains the Line and file.

These can not be modeled with current C++(?), but LineInfo or similar exists. I don't know what its status is. I think it is ok to require manual silencing for such macros.

In addition to __LINE__ and __FILE__, I think we want to include __DATE__, __TIME__, and __func__.

Agreed.

I think this list is a good start, but is likely incomplete. We'll probably discover other things to add to the list when testing the check against other real world code bases to see what patterns emerge from them.

Testing Frameworks are a common source for macro usage too. They should be coverable with the regex approach, but I don't know if there are other things hidden behind the interface that would be super annoying to silence.
Same for benchmark libraries.

My gut reaction is still that this check is going to be too chatty on real world code bases to be of value beyond compliance with the C++ Core Guidelines, but that's also sufficient enough reason to have the check in the first place. Perhaps a reasonable approach would be to put in the heuristics you think make the most sense, then run the check over some large real world C++ code bases to see how many diagnostics are generated. That will also help you to see code patterns to modify your heuristic until you're happy with the results. Then we can debate whether there's more left to do from the data.

My gut reaction is still that this check is going to be too chatty on real world code bases to be of value beyond compliance with the C++ Core Guidelines, but that's also sufficient enough reason to have the check in the first place. Perhaps a reasonable approach would be to put in the heuristics you think make the most sense, then run the check over some large real world C++ code bases to see how many diagnostics are generated. That will also help you to see code patterns to modify your heuristic until you're happy with the results. Then we can debate whether there's more left to do from the data.

Agreed.

alexfh removed a reviewer: alexfh.Jan 31 2018, 1:53 PM
JonasToth updated this revision to Diff 139707.Mar 24 2018, 4:25 AM
  • Merge branch 'master' into check_macros_usage
  • [Doc] update to new doc style

I checked several codebases and implemented a little helper script to see which kind of macros exist. Then I added some filter regex as configuration and tried again, here are the results:

For https://github.com/nlohmann/json which is a very high quality codebase, the results were as expected: very clean, and good macro usage: (except a #define private public for tests)

Filter: JSON*|NLOHMANN*

CMake on the other hand uses many macros and don't obey CAPS_ONLY rules for all macros.

Filter: _FILE_OFFSET_BITS|AE_*|ARCHIVE_*|cal_*|CMAKE_*|cm*|CM_*|CPP_*|CURL*|E_*|exp_*|F90*|jp*|JSON_*|KW*|kw*|O_*|REQ_*|UV_*|YY*|yy*|Z_*

OpenCV isn't clean either, here the results:

Filter: ASSERT*|ALL*|CAL*|CC*|CALC*|calc*|CL*|cl*|CUDA*|CV*|cv*|EXPECT*|GTEST*|FUNCTOR*|HAVE*|ICV*|IPL*|IPP*|ipp*|__itt*|ITT*|JAS*|jas*|MESSAGE*|MAX*|OCL*|opengl*|OPENCV*|TYP*

libtorrent https://github.com/arvidn/libtorrent has reasonable macro usage

Filter: DEBUG*|LIBTORRENT*|TORRENT*|UNI*

LLVM lib/ defines many macros, almost all of them are ALL_CAPS

Filter: AARCH64*|X86*|ARM*|AMDGPU*|ASAN*|ATTR*|CHECK*|COMMON*|CV*|CXX*|DEBUG*|DEC*|DEFINE*|ESAN*|ENUM*|FMA*|FUZZER*|HANDLE*|INTERCEPTOR*|LSAN*|MATH*|MSAN*|OPENMP*|TSAN*|TYPE*|UBSAN*


LLVM tools/ is a similar story
Filter: ANALYSIS*|ASSERT*|CHECK*|DEBUG*|DECL*|DEPENDENT*|DIAG*|END*|ENUM*|GENERIC*|IMAGE*|KEYWORD*|LANG*|LLVM*|NON*|OBJC*|OPENMP*|OVERLOAD*|PROC*|REGISTER*|SANITIZER*|STMT*|TYPE*|X86*

@aaron.ballman Would you like to see other projects checked? I think this is a starting point for now.

My opinion based on what i see is

  • maybe two modes for this check make sense, having one requiring CAPS_ONLY and the currently implemented stricter check. This allows easier migration to safer macro usage
  • it is possible to reduce macro usage to a minimal amount, and the complex macros like AST stuff can be filtered with the regex. Furthermore, restricting all macros to a "macro namespace" is possible for sure.

Things I would like to add to the check:

  • my little filtering script is valuable for developers, that want to address the macro issue. It should be added to the docs and everyone can make something based on that. It will be linux centered.
  • enforcing ALL_CAPS, including its usage
  • (adding a prefix to all macro, passing the filter, including its usage)

Code transformation has the problems of scope and potentially breaking code badly, because clang-tidy wasn't run over all of the code.

The check is chatty, as expected, because macros in header files, especially central ones, pollute everything. That is the reason for the check, too.
Overall I am still in favor of this approach, seeing some obscure macros that should be replaced with actual language features (like overloading, .inline functions, constants, ...) in the checked codebases.

JonasToth updated this revision to Diff 139793.Mar 26 2018, 8:20 AM
  • [Feature] implement CAPS_ONLY features, adjust docs
JonasToth marked 2 inline comments as done.Mar 26 2018, 8:21 AM
Eugene.Zelenko added inline comments.Mar 26 2018, 9:04 AM
docs/clang-tidy/checks/cppcoreguidelines-macro-usage.rst
27

Will be good idea to rephrase statement: except those with CAPS_ONLY names?

28

is intended?

Eugene.Zelenko added inline comments.Mar 26 2018, 9:05 AM
docs/clang-tidy/checks/cppcoreguidelines-macro-usage.rst
25

Will be good idea to generalize this option with other check which deal with macros.

Which checks do you have in mind?

Am 26.03.2018 um 18:07 schrieb Eugene Zelenko via Phabricator:

Eugene.Zelenko added inline comments.

Comment at: docs/clang-tidy/checks/cppcoreguidelines-macro-usage.rst:24
+
+.. option:: CheckCapsOnly

+

Will be good idea to generalize this option with other check which deal with macros.

Repository:

rCTE Clang Tools Extra

https://reviews.llvm.org/D41648

Which checks do you have in mind?

bugprone-macro-*, bugprone-multiple-statement-macro.

hintonda removed a subscriber: hintonda.Mar 26 2018, 7:17 PM
JonasToth added inline comments.Mar 27 2018, 4:07 AM
docs/clang-tidy/checks/cppcoreguidelines-macro-usage.rst
25

I dont think that the bugprone- macro checks would benefit from such an option. They all address specific issues that can occur with macros and that occur even if you have restrained usage of macros.

The general goal of this check is to remove macros for pretty much everything. The Regex and CapsOnly option only try to lower the bar for enabling this check and allow gradual adoption.

I checked several codebases and implemented a little helper script to see which kind of macros exist. Then I added some filter regex as configuration and tried again, here are the results:

For https://github.com/nlohmann/json which is a very high quality codebase, the results were as expected: very clean, and good macro usage: (except a #define private public for tests)

Filter: JSON*|NLOHMANN*

CMake on the other hand uses many macros and don't obey CAPS_ONLY rules for all macros.

Filter: _FILE_OFFSET_BITS|AE_*|ARCHIVE_*|cal_*|CMAKE_*|cm*|CM_*|CPP_*|CURL*|E_*|exp_*|F90*|jp*|JSON_*|KW*|kw*|O_*|REQ_*|UV_*|YY*|yy*|Z_*

OpenCV isn't clean either, here the results:

Filter: ASSERT*|ALL*|CAL*|CC*|CALC*|calc*|CL*|cl*|CUDA*|CV*|cv*|EXPECT*|GTEST*|FUNCTOR*|HAVE*|ICV*|IPL*|IPP*|ipp*|__itt*|ITT*|JAS*|jas*|MESSAGE*|MAX*|OCL*|opengl*|OPENCV*|TYP*

libtorrent https://github.com/arvidn/libtorrent has reasonable macro usage

Filter: DEBUG*|LIBTORRENT*|TORRENT*|UNI*

LLVM lib/ defines many macros, almost all of them are ALL_CAPS

Filter: AARCH64*|X86*|ARM*|AMDGPU*|ASAN*|ATTR*|CHECK*|COMMON*|CV*|CXX*|DEBUG*|DEC*|DEFINE*|ESAN*|ENUM*|FMA*|FUZZER*|HANDLE*|INTERCEPTOR*|LSAN*|MATH*|MSAN*|OPENMP*|TSAN*|TYPE*|UBSAN*


LLVM tools/ is a similar story
Filter: ANALYSIS*|ASSERT*|CHECK*|DEBUG*|DECL*|DEPENDENT*|DIAG*|END*|ENUM*|GENERIC*|IMAGE*|KEYWORD*|LANG*|LLVM*|NON*|OBJC*|OPENMP*|OVERLOAD*|PROC*|REGISTER*|SANITIZER*|STMT*|TYPE*|X86*

@aaron.ballman Would you like to see other projects checked? I think this is a starting point for now.

My opinion based on what i see is

  • maybe two modes for this check make sense, having one requiring CAPS_ONLY and the currently implemented stricter check. This allows easier migration to safer macro usage
  • it is possible to reduce macro usage to a minimal amount, and the complex macros like AST stuff can be filtered with the regex. Furthermore, restricting all macros to a "macro namespace" is possible for sure.

Things I would like to add to the check:

  • my little filtering script is valuable for developers, that want to address the macro issue. It should be added to the docs and everyone can make something based on that. It will be linux centered.
  • enforcing ALL_CAPS, including its usage
  • (adding a prefix to all macro, passing the filter, including its usage)

Code transformation has the problems of scope and potentially breaking code badly, because clang-tidy wasn't run over all of the code.

The check is chatty, as expected, because macros in header files, especially central ones, pollute everything. That is the reason for the check, too.
Overall I am still in favor of this approach, seeing some obscure macros that should be replaced with actual language features (like overloading, .inline functions, constants, ...) in the checked codebases.

@aaron.ballman Did you have time to look at my analysis result?

I checked several codebases and implemented a little helper script to see which kind of macros exist. Then I added some filter regex as configuration and tried again, here are the results:

For https://github.com/nlohmann/json which is a very high quality codebase, the results were as expected: very clean, and good macro usage: (except a #define private public for tests)

Filter: JSON*|NLOHMANN*

CMake on the other hand uses many macros and don't obey CAPS_ONLY rules for all macros.

Filter: _FILE_OFFSET_BITS|AE_*|ARCHIVE_*|cal_*|CMAKE_*|cm*|CM_*|CPP_*|CURL*|E_*|exp_*|F90*|jp*|JSON_*|KW*|kw*|O_*|REQ_*|UV_*|YY*|yy*|Z_*

This is a pretty long list of macro patterns to have to filter, but doesn't look to problematic.

OpenCV isn't clean either, here the results:

Filter: ASSERT*|ALL*|CAL*|CC*|CALC*|calc*|CL*|cl*|CUDA*|CV*|cv*|EXPECT*|GTEST*|FUNCTOR*|HAVE*|ICV*|IPL*|IPP*|ipp*|__itt*|ITT*|JAS*|jas*|MESSAGE*|MAX*|OCL*|opengl*|OPENCV*|TYP*

This one worries me a bit more because of patterns like cl* and calc* -- those seem like they're not uncommon and the pattern may silence otherwise valid diagnostics.

libtorrent https://github.com/arvidn/libtorrent has reasonable macro usage

Filter: DEBUG*|LIBTORRENT*|TORRENT*|UNI*

LLVM lib/ defines many macros, almost all of them are ALL_CAPS

Filter: AARCH64*|X86*|ARM*|AMDGPU*|ASAN*|ATTR*|CHECK*|COMMON*|CV*|CXX*|DEBUG*|DEC*|DEFINE*|ESAN*|ENUM*|FMA*|FUZZER*|HANDLE*|INTERCEPTOR*|LSAN*|MATH*|MSAN*|OPENMP*|TSAN*|TYPE*|UBSAN*


LLVM tools/ is a similar story
Filter: ANALYSIS*|ASSERT*|CHECK*|DEBUG*|DECL*|DEPENDENT*|DIAG*|END*|ENUM*|GENERIC*|IMAGE*|KEYWORD*|LANG*|LLVM*|NON*|OBJC*|OPENMP*|OVERLOAD*|PROC*|REGISTER*|SANITIZER*|STMT*|TYPE*|X86*

@aaron.ballman Would you like to see other projects checked? I think this is a starting point for now.

I think this is a good starting point. It's certainly demonstrated that even clean projects cannot conform to the C++ Core Guidelines.

My opinion based on what i see is

  • maybe two modes for this check make sense, having one requiring CAPS_ONLY and the currently implemented stricter check. This allows easier migration to safer macro usage

I think I agree.

  • it is possible to reduce macro usage to a minimal amount, and the complex macros like AST stuff can be filtered with the regex. Furthermore, restricting all macros to a "macro namespace" is possible for sure.

I'm not certain I understand what you mean by "macro namespace". Can you clarify?

I'm still a bit worried about using regex to filter the results. It seems like any real world project is going to require somewhere between a reasonable and an unreasonable amount of configuration to reduce the noise. Perhaps as a first pass, however, it will suffice. Hopefully we can add other heuristics to reduce the false positives as we see patterns emerge from real world usage.

Things I would like to add to the check:

  • my little filtering script is valuable for developers, that want to address the macro issue. It should be added to the docs and everyone can make something based on that. It will be linux centered.

Why does the usage need to be limited to Linux?

  • enforcing ALL_CAPS, including its usage

What does "including its usage" mean?

  • (adding a prefix to all macro, passing the filter, including its usage)

Code transformation has the problems of scope and potentially breaking code badly, because clang-tidy wasn't run over all of the code.

The check is chatty, as expected, because macros in header files, especially central ones, pollute everything. That is the reason for the check, too.
Overall I am still in favor of this approach, seeing some obscure macros that should be replaced with actual language features (like overloading, .inline functions, constants, ...) in the checked codebases.

OpenCV isn't clean either, here the results:

Filter: ASSERT*|ALL*|CAL*|CC*|CALC*|calc*|CL*|cl*|CUDA*|CV*|cv*|EXPECT*|GTEST*|FUNCTOR*|HAVE*|ICV*|IPL*|IPP*|ipp*|__itt*|ITT*|JAS*|jas*|MESSAGE*|MAX*|OCL*|opengl*|OPENCV*|TYP*

This one worries me a bit more because of patterns like cl* and calc* -- those seem like they're not uncommon and the pattern may silence otherwise valid diagnostics.

The macros are worriesome. Instead of using proper function overloading or similar constructs, they defined macros that replace function names + arguments. I would say, these macros were laziness, because C++ provides the proper language tools to deal with the missing overloading capabilities for C functions. I removed them because some many macros did exist.

  • it is possible to reduce macro usage to a minimal amount, and the complex macros like AST stuff can be filtered with the regex. Furthermore, restricting all macros to a "macro namespace" is possible for sure.

I'm not certain I understand what you mean by "macro namespace". Can you clarify?

With "macro namespace" i mean a common prefix for macros, like LLVM_UNREACHABLE, DEBUG_...

I'm still a bit worried about using regex to filter the results. It seems like any real world project is going to require somewhere between a reasonable and an unreasonable amount of configuration to reduce the noise. Perhaps as a first pass, however, it will suffice. Hopefully we can add other heuristics to reduce the false positives as we see patterns emerge from real world usage.

It was hard to find reasonable patterns. I definitely saw the usage that is not supposed to happen, like simulating inline functions, overloading, ...

Things I would like to add to the check:

  • my little filtering script is valuable for developers, that want to address the macro issue. It should be added to the docs and everyone can make something based on that. It will be linux centered.

Why does the usage need to be limited to Linux?

I created small sed scripts with a chain of uniq and sort, so it will be UNIX, but windows falls short i guess.

  • enforcing ALL_CAPS, including its usage

What does "including its usage" mean?

The usage of the macro in code. That means not only definition of the macro is replaced, but all occurences. This can only be done if *all* macros are treated the same. Otherwise different TU, different transformation.

Code transformation has the problems of scope and potentially breaking code badly, because clang-tidy wasn't run over all of the code.

aaron.ballman added inline comments.Apr 24 2018, 4:36 AM
clang-tidy/cppcoreguidelines/MacroUsageCheck.cpp
78

function like -> function-like

Why is constexpr in parentheses?

81

Should this also suggest a constexpr variadic template?

87

How about: macro definition does not define the macro name using all uppercase characters or something along those lines?

  • Merge branch 'master' into check_macros_usage
JonasToth marked 7 inline comments as done.
  • Merge branch 'master' into check_macros_usage
  • address review comments
JonasToth updated this revision to Diff 168094.Oct 3 2018, 3:32 AM
  • Merge branch 'master' into check_macros_usage
JonasToth updated this revision to Diff 168095.Oct 3 2018, 3:40 AM
  • add more positive tests
  • Merge branch 'master' into check_macros_usage

@aaron.ballman What do you think of the current version of this check?
As migration strategy the option for CAPS_ONLY is provided, otherwise the filtering is provided to silence necessary macros (which kind of enforces a common prefix for macros). This does implement both the cppcoreguidelines and allows migration.

@aaron.ballman What do you think of the current version of this check?
As migration strategy the option for CAPS_ONLY is provided, otherwise the filtering is provided to silence necessary macros (which kind of enforces a common prefix for macros). This does implement both the cppcoreguidelines and allows migration.

I think that's a reasonable approach.

clang-tidy/cppcoreguidelines/MacroUsageCheck.cpp
79

Aren't all vararg macros also function-like macros, so this will trigger two diagnostics for all variadic macro definitions?

docs/clang-tidy/checks/cppcoreguidelines-macro-usage.rst
27

warn all macros -> warn on all macros

test/clang-tidy/cppcoreguidelines-macro-usage-custom.cpp
15

Please add a test like:

#define PROBLEMATIC_VARIADIC_TWO(x, ...) (__VA_ARGS__)

To ensure that you don't get two diagnostics (one for function-like and one for variadic).

JonasToth updated this revision to Diff 169402.Oct 12 2018, 7:57 AM
JonasToth marked 3 inline comments as done.
  • add tests and adjust doc
  • one more test case

Updated

clang-tidy/cppcoreguidelines/MacroUsageCheck.cpp
79

MacroInfo contains bits about function-like and beeing variadic separatly (gnu variadic as well, but thats included in isVariadic). There are no double warnings created.

JonasToth marked an inline comment as done.Oct 12 2018, 7:57 AM
JonasToth updated this revision to Diff 169403.Oct 12 2018, 7:59 AM
  • remove unused enum in header file, no idea what i intended to do with it :D
JonasToth updated this revision to Diff 169410.Oct 12 2018, 8:29 AM
  • make the logic with variadic clear
aaron.ballman accepted this revision.Oct 17 2018, 2:42 PM

LGTM aside from some minor nits.

clang-tidy/cppcoreguidelines/MacroUsageCheck.cpp
72

It seems unnecessarily complicated, to me, to use a lambda here. Why not use a StringRef local variable?

StringRef Msg = "blah";
if (Info->isVariadic())
  Msg = "blah";
else if (Info->isFunctionLike())
  Msg = "blah";
79

Nit: else after return

82

Nit: same

clang-tidy/cppcoreguidelines/MacroUsageCheck.h
35–36

Nit: these can be private functions instead.

This revision is now accepted and ready to land.Oct 17 2018, 2:42 PM
JonasToth marked 4 inline comments as done.
  • address review nits
clang-tidy/cppcoreguidelines/MacroUsageCheck.h
35–36

They are used by the PPCallbacks. I think adding a friend is not worth it.

JonasToth marked an inline comment as done.Oct 22 2018, 12:13 PM
  • add missing private
  • [Doc] add missing release notes
This revision was automatically updated to reflect the committed changes.
lebedev.ri added inline comments.
clang-tidy/cppcoreguidelines/MacroUsageCheck.cpp
68

And once again, i'm going to bitch about useless diagnostic messages:

[1/357] Building CXX object src/CMakeFiles/rawspeed.dir/librawspeed/metadata/CameraMetaData.cpp.o
FAILED: src/CMakeFiles/rawspeed.dir/librawspeed/metadata/CameraMetaData.cpp.o 
/usr/bin/cmake -E __run_co_compile --tidy=/usr/local/bin/clang-tidy --source=../src/librawspeed/metadata/CameraMetaData.cpp -- /usr/local/bin/clang++  -DDEBUG -D_GLIBCXX_SANITIZE_VECTOR -Isrc -I../src/librawspeed -isystem ../src/external -std=c++14 -Wall -Wextra -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-covered-switch-default -Wno-deprecated -Wno-double-promotion -Wno-exit-time-destructors -Wno-global-constructors -Wno-gnu-zero-variadic-macro-arguments -Wno-old-style-cast -Wno-padded -Wno-switch-enum -Wno-unused-macros -Wno-unused-parameter -Wno-weak-vtables -Wno-zero-as-null-pointer-constant -Wextra-semi -O3 -fno-optimize-sibling-calls -fsanitize=address -fno-omit-frame-pointer -fno-common -U_FORTIFY_SOURCE -fsanitize-address-use-after-scope -fsanitize=undefined -fno-sanitize-recover=undefined -fsanitize=integer -fno-sanitize-recover=integer -fPIC   -march=native -g3 -ggdb3 -Werror -pthread -fopenmp=libomp -std=c++14 -MD -MT src/CMakeFiles/rawspeed.dir/librawspeed/metadata/CameraMetaData.cpp.o -MF src/CMakeFiles/rawspeed.dir/librawspeed/metadata/CameraMetaData.cpp.o.d -o src/CMakeFiles/rawspeed.dir/librawspeed/metadata/CameraMetaData.cpp.o -c ../src/librawspeed/metadata/CameraMetaData.cpp
error: macro used to declare a constant; consider using a 'constexpr' constant [cppcoreguidelines-macro-usage,-warnings-as-errors]
error: macro used to declare a constant; consider using a 'constexpr' constant [cppcoreguidelines-macro-usage,-warnings-as-errors]
9174 warnings generated.
Suppressed 9172 warnings (9172 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
2 warnings treated as errors
ninja: build stopped: subcommand failed.
  1. It complains about something that has to location (so it was declared in command line?)
  2. I have absolutely no clue whatsoever on what macro it complains. And i don't really see a way to find it out, without modifying clang-tidy :/