This is an archive of the discontinued LLVM Phabricator instance.

clang-tidy checker that enforce proper parentheses in macros
ClosedPublic

Authored by danielmarjamaki on May 6 2015, 7:16 AM.

Details

Summary

This clang-tidy checker will make sure macros have proper parentheses.

I previously made a compiler-check for this but that only warned when bad usage was seen. This clang-tidy checker warns when dangerous macros are seen.

The rules are:

  • Expressions must be surrounded with parentheses.
  • Arguments must be surrounded with parentheses.

I have tested this on 193 debian projects so far. there was 47668 warnings. I have not looked at every warning in detail but overall it looks very good. The majority of the warnings point out dangerous macros that can be misused, and I therefore classify them as true positives. Maybe there are some 1-5% that are "false positives", in most of these cases it would not hurt to add parentheses.

Example false positives:

#define   A     (INT)&a

Is the & a bitand or a address-of operator. It's not trivial to know when looking at the tokens. Putting parentheses around the macro doesn't hurt.

#define   A     INT*

... I warn about this. Can be fixed by using typedef instead. But maybe user wants to have it this way.

and then sometimes there are warnings inside ({..}) that I think ideally would not be shown. I could bailout for instance when a { is seen in the macro, but most warnings are true positives so I think such bailout would mostly cause false negatives.

Diff Detail

Event Timeline

danielmarjamaki retitled this revision from to clang-tidy checker that enforce proper parentheses in macros.
danielmarjamaki updated this object.
danielmarjamaki edited the test plan for this revision. (Show Details)
danielmarjamaki added a subscriber: Unknown Object (MLST).
alexfh edited edge metadata.May 8 2015, 7:25 AM

Sorry for the delay. I was on vacation and currently digging out stuff out of my inbox. I'll try to review this thoroughly early next week. A couple of comments for now.

clang-tidy/misc/MacroParenthesesCheck.cpp
33

Here and below. Please use C++ style comments (// or /// for Doxygen comments). The comments should be English prose with proper capitalization and punctuation.

clang-tidy/misc/MacroParenthesesCheck.h
19

Please add a class comment describing the purpose of this check, motivation for it and what specific code patterns it triggers on.

I agree with most Aaron's comments on the list. The main concern is the noisiness of the check: it makes sense to look at a bigger sample of results and see whether the warning should be silenced in more cases. Another possibility to make the check more usable would be to detect cases where automated fixes would be possible. Otherwise I doubt anyone will be willing to fix hundreds of warnings in their projects manually.

A few more comments inline.

clang-tidy/misc/MacroParenthesesCheck.cpp
69

This looks like a reason to move isOneOf from clang::format::FormatToken (tools/clang/lib/Format/FormatToken.h) to clang::Token. It would greatly simplify this and similar conditions.

78

TI + 2 == TE implies TI + 1 != TE. I'd also put it next to TI == MI->tokens_begin().

87

nit: No need for a variable here, just use the expression below.

The main concern is the noisiness of the check: it makes sense to look at a bigger sample of results and see whether the warning should be silenced in more cases.

Do you have a suggestion how I share the results?

clang-tidy/misc/MacroParenthesesCheck.cpp
78

Yes that is implied. But are you sure it's not UB? I wanted to avoid undefined behaviour when creating an out-of-bounds pointer (calculating TI+2 without checking TI+1). is the buffer always padded with extra elements after TE?

The main concern is the noisiness of the check: it makes sense to look at a bigger sample of results and see whether the warning should be silenced in more cases.

Do you have a suggestion how I share the results?

Well, there are many convenient ways to do this. You can use Google Docs, for example.

clang-tidy/misc/MacroParenthesesCheck.cpp
78

Why does the buffer need to be padded with something when you compare iterators without dereferencing them? I didn't find any violation of the preconditions for random access iterators (http://en.cppreference.com/w/cpp/concept/RandomAccessIterator).

A few random issues I noticed on the first couple of pages of results you posted on a different thread:

../src/transmit.c:37:25: warning: macro replacement list should be enclosed in parentheses [misc-macro-parentheses]
#define MKQUERY_ADDB(b) *rqp++= (b)
                        ^

It looks like the check is complaining about the rgp variable, which is not a parameter of the macro.

../src/addrfam.c:538:3: warning: macro replacement list should be enclosed in parentheses [misc-macro-parentheses]
  const char *p= dgram + rps->labstart[labnum]; \
  ^

Does it complain about const? Then it looks like the same issue as the first one.

adnstest.c:82:3: warning: macro replacement list should be enclosed in parentheses [misc-macro-parentheses]
  case e: fprintf(stderr,"adns failure: %s: errno=" #e "\n",what); break
  ^
adnstest.c:82:8: warning: macro argument should be enclosed in parentheses [misc-macro-parentheses]
  case e: fprintf(stderr,"adns failure: %s: errno=" #e "\n",what); break
       ^

Same.

It looks like a frequent case, so fixing this would help fixing lots of false positives.

../src/addrfam.c:47:48: warning: macro argument should be enclosed in parentheses [misc-macro-parentheses]
#define SIN(cnst, sa) ((void)(sa)->sa_family, (cnst struct sockaddr_in *)(sa))
                                               ^

cnst is a parameter of the macro, but parens are not needed (if acceptable) in this context. The rule only makes sense for the macro arguments that can be expressions, so you should try to restrict the check to the cases where macro argument resembles an expression or when the context where the parameter is expanded looks like an expression. I'm not sure how exactly this can be done, and whether it's possible to do this precisely enough, but some cases can definitely be improved.

A few random issues I noticed on the first couple of pages of results you posted on a different thread:

../src/transmit.c:37:25: warning: macro replacement list should be enclosed in parentheses [misc-macro-parentheses]
#define MKQUERY_ADDB(b) *rqp++= (b)
                        ^

It looks like the check is complaining about the rgp variable, which is not a parameter of the macro.

../src/addrfam.c:538:3: warning: macro replacement list should be enclosed in parentheses [misc-macro-parentheses]
  const char *p= dgram + rps->labstart[labnum]; \
  ^

Does it complain about const? Then it looks like the same issue as the first one.

adnstest.c:82:3: warning: macro replacement list should be enclosed in parentheses [misc-macro-parentheses]
  case e: fprintf(stderr,"adns failure: %s: errno=" #e "\n",what); break
  ^
adnstest.c:82:8: warning: macro argument should be enclosed in parentheses [misc-macro-parentheses]
  case e: fprintf(stderr,"adns failure: %s: errno=" #e "\n",what); break
       ^

Same.

It looks like a frequent case, so fixing this would help fixing lots of false positives.

../src/addrfam.c:47:48: warning: macro argument should be enclosed in parentheses [misc-macro-parentheses]
#define SIN(cnst, sa) ((void)(sa)->sa_family, (cnst struct sockaddr_in *)(sa))
                                               ^

cnst is a parameter of the macro, but parens are not needed (if acceptable) in this context. The rule only makes sense for the macro arguments that can be expressions, so you should try to restrict the check to the cases where macro argument resembles an expression or when the context where the parameter is expanded looks like an expression. I'm not sure how exactly this can be done, and whether it's possible to do this precisely enough, but some cases can definitely be improved.

Hmm.. I think that the results are a bit confusing. When it says "macro replacement list should be enclosed in parentheses" I want that the whole expression is surrounded with parentheses.

It looks like the check is complaining about the rgp variable, which is not a parameter of the macro.

Yes it is confusing.. maybe it shouldn't point at the rgp.

Does it complain about const? Then it looks like the same issue as the first one.

no.. but I do think it's a confusing message.

cnst is a parameter of the macro, but parens are not needed (if acceptable) in this context. The rule only makes sense for the macro arguments that can be expressions, so you should try to restrict the check to the cases where macro argument resembles an expression or when the context where the parameter is expanded looks like an expression. I'm not sure how exactly this can be done, and whether it's possible to do this precisely enough, but some cases can definitely be improved.

I agree .. maybe it's acceptable to hide the warning when the next token is a name.

clang-tidy/misc/MacroParenthesesCheck.cpp
33

ok will try to fix in next patch

danielmarjamaki edited edge metadata.

The argument checking was rewritten.

Here are updated results I made today with latest patch:

https://drive.google.com/file/d/0BykPmWrCOxt2S2lkZWpOSEJKUHM/view

clang-tidy/misc/MacroParenthesesCheck.cpp
78

TI is a pointer.

I refer to 6.5.6.8 in the C standard (about pointer addition):

If both the pointer operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

78

hmm.. maybe I was confused. I thought it was a pointer addition.. however random access iterators should not overflow neither as far as I know..

78

I can't find that TI+2 is UB right now in the specification.

However for information this sample code:

#include <vector>

int main() {
  std::vector<int> x;
  x.push_back(123);
  std::vector<int>::iterator it = x.begin();
  if (it+2 == x.end()) {}
  return 0;
}

When I compile and run it with the STL assertions turned on I get a fault..

danielm@debian:~$ g++ -D_GLIBCXX_DEBUG -g bad-stl.cpp -o bad-stl                                                                                                         
danielm@debian:~$ ./bad-stl                                                                                                                    
/usr/include/c++/4.7/debug/safe_iterator.h:359:error: attempt to advance a                                                                                                       
    dereferenceable (start-of-sequence) iterator 2 steps, which falls     
    outside its valid range.

Here are updated results I made today with latest patch:

https://drive.google.com/file/d/0BykPmWrCOxt2S2lkZWpOSEJKUHM/view

This time I found just one issue: in for (...; X; ...) ... X doesn't need parentheses. Though it looks better, the check is rather noisy and without a way to fix the issues automatically it will require too much cleanup efforts to be used on any real code base. It's also difficult to point out false positives just by looking at macro definitions.

But if the check provided fix-it hints, it would be easier to test it on real code and it would also be possible to clean up code before starting to use the check.

WDYT?

clang-tidy/misc/MacroParenthesesCheck.cpp
78

Fair enough. Thanks for trying.

141

nit: "Don't"

149

nit: Add a trailing period.

158

nit: Initialization.

Updated patch:

  • Added fixits
  • Tweaked code comments
  • Avoid FP when argument is surrounded with semicolon ;X;

And now it would be nice to apply fixes to a bunch of code (at least, to LLVM+Clang and a few other projects where you had detected issues), and see if anything breaks as a result of the fixes.

danielmarjamaki updated this revision to Diff 26953.EditedJun 1 2015, 11:20 PM

when there is semicolon/colon in the replacement list don't warn that it should be surrounded with parentheses.

have verified that -fix works by fixing various warnings in a52dec, abcmidi, adjtime and adol.

when checking llvm/clang this checker does not generate any warnings.

Here are the results with the latest changes: https://drive.google.com/file/d/0BykPmWrCOxt2M1hmRm5qbG1CbXM/view?usp=sharing

alexfh added a comment.Jun 2 2015, 5:25 AM

A few nits, but looks good in general. I would also like to clear a couple of questions before you submit this.

Did you encounter any cases when the code broke after applying fixes? How many instances of the warning does the latest version of the check produce on the same set of projects where you saw 47k warnings initially? I would also like to see an estimate of the false positive rate (from a random sample of 100 warnings).

clang-tidy/misc/MacroParenthesesCheck.cpp
33

nit: Missing trailing period.

36

nit: Missing trailing period, no capitalization.

61

What is a "calculation operator"? How is it different from other binary operators in this context?

90

I'd make this even stricter: if the replacement list contains unbalanced parentheses/braces/brackets, then it's clearly not meant to be enclosed in parentheses.

clang-tidy/misc/MacroParenthesesCheck.h
26

nit: Missing comma after "expression".

33

nit: Remove the empty line, please.

danielmarjamaki added a comment.EditedJun 2 2015, 11:28 AM

Did you encounter any cases when the code broke after applying fixes?

No I did not. The projects compiled fine after the fixes.

How many instances of the warning does the latest version of the check produce on the same set of projects where you saw 47k warnings initially?

I can't say exactly right now. It's a little less as far as I know.

I would also like to see an estimate of the false positive rate (from a random sample of 100 warnings).

I fixed 300 warnings with -fix and saw no compiler warnings.

However I looked now at 100 random warnings and saw 5 fp (for type definitions)! I can try to write a heuristic for some of those.

clang-tidy/misc/MacroParenthesesCheck.cpp
69

Yes... I added a new patch today where this code was rewritten. I did not move isOneOf.. do you still I should do it?

78

This code has been rewritten.

87

Fixed

141

Fixed

149

Fixed.

158

ah.. I missed this.. I'll fix it immediately locally..

alexfh added a comment.Jun 3 2015, 7:15 AM

Did you encounter any cases when the code broke after applying fixes?

No I did not. The projects compiled fine after the fixes.

How many instances of the warning does the latest version of the check produce on the same set of projects where you saw 47k warnings initially?

I can't say exactly right now. It's a little less as far as I know.

I would also like to see an estimate of the false positive rate (from a random sample of 100 warnings).

I fixed 300 warnings with -fix and saw no compiler warnings.

However I looked now at 100 random warnings and saw 5 fp (for type definitions)! I can try to write a heuristic for some of those.

So the incorrect fixes in these cases were applied, but the code still compiled? Can you give a couple of examples?

clang-tidy/misc/MacroParenthesesCheck.cpp
69

The new code seems to use this pattern as well. It would be nice if you moved and used the isOneOf instead.

danielmarjamaki added a comment.EditedJun 4 2015, 10:35 PM

Did you encounter any cases when the code broke after applying fixes?

No I did not. The projects compiled fine after the fixes.

How many instances of the warning does the latest version of the check produce on the same set of projects where you saw 47k warnings initially?

I can't say exactly right now. It's a little less as far as I know.

I would also like to see an estimate of the false positive rate (from a random sample of 100 warnings).

I fixed 300 warnings with -fix and saw no compiler warnings.

However I looked now at 100 random warnings and saw 5 fp (for type definitions)! I can try to write a heuristic for some of those.

So the incorrect fixes in these cases were applied, but the code still compiled? Can you give a couple of examples?

No.

I only used -fix on the first 300 warnings in the results log I showed. There were no type definitions there. I believe all these 300 are true positives.

After that I looked at random warnings in the results log as you had requested. I did not use -fix on these warnings. I do not think that 5% of the warnings are FP I just think that I was unlucky when I did my random selection. But through "bad luck" I believe I discovered some FP warnings that I should hide with a heuristic.

Updated patch for misc-macro-parentheses checker in clang-tidy. This patch now also avoids fp warnings for type definitions.

I have scanned 100 projects with this checker today and got 42637 warnings. When comparing with the previous results, fp warnings for type definitions have been removed but no tp were removed.

To clarify, warnings are often duplicated when a there is a dangerous macro in a header that is reused from many files. So there are not 42637 unique warnings.

I have looked at > 100 warnings throughout the results file and dont see false positives.

alexfh added a comment.Jun 5 2015, 7:20 AM

I have scanned 100 projects with this checker today and got 42637 warnings. When comparing with the previous results, fp warnings for type definitions have been removed but no tp were removed.

This sounds interesting, but it would be nice to know how many warnings were generated on the same code previously. Is there a large enough subset of the projects you analyzed previously (the "193 debian projects") and now?

To clarify, warnings are often duplicated when a there is a dangerous macro in a header that is reused from many files. So there are not 42637 unique warnings.

That doesn't make the stats more useful as well. Clang-tidy deduplicates warnings on each run, but when multiple instances run, post-processing should be done externally. Maybe you could use some grep|sort|uniq-foo to count just the unique source locations? Having the output in the one line per warning form would also allow to choose a random subset easily (shuf|head -n100).

I have looked at > 100 warnings throughout the results file and dont see false positives.

That also sounds good.

Can you fix the issues pointed out in earlier reviews as well?

danielmarjamaki added a comment.EditedJun 7 2015, 11:32 PM

I have scanned 100 projects with this checker today and got 42637 warnings. When comparing with the previous results, fp warnings for type definitions have been removed but no tp were removed.

This sounds interesting, but it would be nice to know how many warnings were generated on the same code previously. Is there a large enough subset of the projects you analyzed previously (the "193 debian projects") and now?

I did not keep those original warnings.

I can rerun the old patch again and that should generate very similar results but not identical. The source code in the packages I scan are continuously updated.

To clarify, warnings are often duplicated when a there is a dangerous macro in a header that is reused from many files. So there are not 42637 unique warnings.

That doesn't make the stats more useful as well. Clang-tidy deduplicates warnings on each run, but when multiple instances run, post-processing should be done externally. Maybe you could use some grep|sort|uniq-foo to count just the unique source locations? Having the output in the one line per warning form would also allow to choose a random subset easily (shuf|head -n100).

I will look into this.

I have looked at > 100 warnings throughout the results file and dont see false positives.

That also sounds good.

Can you fix the issues pointed out in earlier reviews as well?

will do that.

I have copied the isOneOf from FormatToken. I did not want to move TokenType etc and therefore the FormatToken still has isOneOf also. As far as I see all comments in the code has been taken care of.

I've rerun original and latest patch with my script and compared warnings. This time duplicated warnings was filtered out.

Original patch:
projects: 217
warnings: 3758

Latest patch:
projects: 100
warnings: 2529

The "193 projects" I initially talked about should be a subset of the projects I checked today.. unless some project has been removed from the debian archive I'm checking last month.

Latest patch gives more warnings than the original patch. The original checker stopped the checking of a macro when a problem was found. Now all violations in the macro are reported (and fixed).

Also when comparing warnings from the original patch and latest patch there are some warnings that are not reported with latest patch. I don't see false negatives, only fixed false positives.

alexfh added inline comments.Jun 8 2015, 7:13 AM
include/clang/Lex/Token.h
97 ↗(On Diff #27309)

Thanks! This should go in a separate patch though. And I would appreciate if you updated FormatToken::isOneOf to just redirect to this implementation.

tools/extra/clang-tidy/misc/MacroParenthesesCheck.cpp
121 ↗(On Diff #27309)

!Tok.isOneOf(tok::identifier, tok::raw_identifier)

136 ↗(On Diff #27309)

I'd make Prev and Next instances of clang::Token and used is and isOneOf below and inside all of your is.* methods.

tools/extra/clang-tidy/misc/MacroParenthesesCheck.h
32 ↗(On Diff #27309)

Please remove the empty line.

edse added inline comments.
include/clang/Lex/Token.h
97 ↗(On Diff #27309)

Separate patch is no problem, but redirecting that is a simple word.. I've come up with something like this.

template<> bool isOneOf<tok::TokenKind, tok::TokenKind>
(tok::TokenKind K1, tok::TokenKind K2) const {
  return Tok.isOneOf(K1, K2);
}

but the variadic function is more tricky, I always get the function template partial specialization is not allowed warning, and to just redirect one of the functions is imho not a good idea.

My proposal is to leave this as is, do you have a comment?

alexfh added inline comments.Jun 10 2015, 8:21 AM
include/clang/Lex/Token.h
97 ↗(On Diff #27309)

It would be nice as a separate patch as it changes a totally different part of the project (residing in a different part of the repository, btw).

As for FormatToken::isOneOf, I missed that it also works for other FormatToken::is overloads (taking TokenType and IdentifierInfo *), so it doesn't make sense to redirect it.

refactoring the code with Token::isOneOf()

alexfh added inline comments.Jun 15 2015, 5:53 AM
clang-tidy/misc/MacroParenthesesCheck.cpp
33

Seems like you've missed this.

36

Seems like you've missed this.

61

Seems like you've missed this.

90

In this comment I meant to ask you to handle the Count < 0 case separately: when it happens, we clearly don't want to complain about missing parentheses or add them.

Sorry if it wasn't clear enough.

clang-tidy/misc/MacroParenthesesCheck.h
26

Seems like you've missed this.

33

Seems like you've missed this.

Fixed comments that I had missed to fix in my previous patch

alexfh accepted this revision.Jun 16 2015, 6:59 AM
alexfh edited edge metadata.

Looks good! Let's try this. If it turns out to generate too many false positives, we may need to disable the check by default. I'll let you know if I find any issues.

Thank you for the contribution!

This revision is now accepted and ready to land.Jun 16 2015, 6:59 AM

Committed with 239820.

clang-tidy/misc/MacroParenthesesCheck.cpp
33

Fixed

36

Fixed

61

Fixed

90

Fixed

clang-tidy/misc/MacroParenthesesCheck.h
19

Fixed

26

Fixed

33

Yes, fixed