Page MenuHomePhabricator

[clang-tidy] Fix a false positive in misc-redundant-expression check
ClosedPublic

Authored by dkrupp on Nov 30 2018, 8:00 AM.

Details

Summary

Do not warn for redundant conditional expressions
when the true and false branches are expanded from
different macros even when they are
defined by one another.

We used to get a false warning in the following case:

#define M1 1
#define M2 M1
int test(int cond) {
 return cond ? M1 : M2; // false warning: 'true' and 'false' expressions are equivalent
}

The problem was that the Lexer::getImmediateMacroName() call was used for comparing macros, which returned "M1" for the expansion of M1 and M2.
Now we are comparing macros from token to token.

Diff Detail

Event Timeline

dkrupp created this revision.Nov 30 2018, 8:00 AM
Szelethus retitled this revision from Fix a false positive in misc-redundant-expression check to [clang-tidy] Fix a false positive in misc-redundant-expression check.Nov 30 2018, 8:16 AM
Szelethus added a reviewer: JonasToth.

@JonasToth this is the Lexer based expression equality check I talked about in D54757#1311516. The point of this patch is that the definition is a macro sure could be build dependent, but the macro name is not, so it wouldn't warn on the case I showed. What do you think?

@JonasToth this is the Lexer based expression equality check I talked about in D54757#1311516. The point of this patch is that the definition is a macro sure could be build dependent, but the macro name is not, so it wouldn't warn on the case I showed. What do you think?

Yes, this approach is possible.
IMHO it is still a bug/redudant if you do the same thing on both paths and warning on it makes the programmer aware of the fact. E.g. the macros might have been something different before, but a refactoring made them equal and resulted in this situation.

clang-tidy/misc/RedundantExpressionCheck.cpp
601 ↗(On Diff #176124)

Please run clang-format

630 ↗(On Diff #176124)

clang:: should not be necessary here, is it? SourceLocation and other types are access without namespace, too.

test/clang-tidy/misc-redundant-expression.cpp
109 ↗(On Diff #176124)

Could you please add a test where the macro is undefed and redefined to something else?

Szelethus added a subscriber: donat.nagy.EditedNov 30 2018, 3:15 PM

I see your point, but here's why I think it isn't a bug: I like to see macros as constexpr variables, and if I used those instead, I personally wouldn't like to get a warning just because they have the same value. In C, silencing such a warning isn't even really possible.

Another interesting thought, @donat.nagy's check works by comparing actual nodes of the AST, while this one would work with Lexer, but they almost want to do the same thing, the only difference is that the first checks whether two pieces of code are equivalent, and the second checks whether they are the same. Maybe it'd be worth to extract the logic into a simple areStmtsEqual(const Stmt *S1, const Stmt *S2, bool ShouldCompareLexically = false) function, that would do AST based comparison if the last param is set to false, and would use Lexer if set to true. After that, we could just add command line options to both of these checks, to leave it up to the user to choose in between them. Maybe folks who suffer from heavily macro-infested code would rather bear the obvious performance hit Lexer causes for little more precise warnings, and (for example) users of C++11 (because there are few excuses not to prefer constexpr there) could run in AST mode.

edit: I'm not actually all that sure about the performance hit. Just a guess.

But I'm just thinking aloud really.

dkrupp updated this revision to Diff 176247.Dec 1 2018, 3:01 AM

-clang-format applied
-clang:: namespace qualifiers removed

dkrupp marked 3 inline comments as done.Dec 1 2018, 3:12 AM

@JonasToth this is the Lexer based expression equality check I talked about in D54757#1311516. The point of this patch is that the definition is a macro sure could be build dependent, but the macro name is not, so it wouldn't warn on the case I showed. What do you think?

Yes, this approach is possible.
IMHO it is still a bug/redudant if you do the same thing on both paths and warning on it makes the programmer aware of the fact. E.g. the macros might have been something different before, but a refactoring made them equal and resulted in this situation.

@JonasThoth: I see you point with refactoring, but let's imagine that the two macros COND_OP_MACRO and COND_OP_THIRD_MACRO are defined by compile time parameters. If the two macros are happened to be defined to the same value the user would get a warning, and if not she would not. How would the user would "fix" her code in the first case? So if the macro names are different in the conditional expression, it is not a redundant expression because the macro names are compile time parameters (with just eventually same values). This was the same logic the old test case were testing: k += (y < 0) ? COND_OP_MACRO : COND_OP_OTHER_MACRO;

test/clang-tidy/misc-redundant-expression.cpp
109 ↗(On Diff #176124)

I am not sure what you exactly suggest here. It should not matter what COND_OP_THIRD_MACRO is defined to as we lexically compare tokens as they are written in the original source code.

Could you be a bit more specific what test case you would like to add?

dkrupp added a comment.Dec 1 2018, 3:18 AM

I see your point, but here's why I think it isn't a bug: I like to see macros as constexpr variables, and if I used those instead, I personally wouldn't like to get a warning just because they have the same value. In C, silencing such a warning isn't even really possible.

Another interesting thought, @donat.nagy's check works by comparing actual nodes of the AST, while this one would work with Lexer, but they almost want to do the same thing, the only difference is that the first checks whether two pieces of code are equivalent, and the second checks whether they are the same. Maybe it'd be worth to extract the logic into a simple areStmtsEqual(const Stmt *S1, const Stmt *S2, bool ShouldCompareLexically = false) function, that would do AST based comparison if the last param is set to false, and would use Lexer if set to true. After that, we could just add command line options to both of these checks, to leave it up to the user to choose in between them. Maybe folks who suffer from heavily macro-infested code would rather bear the obvious performance hit Lexer causes for little more precise warnings, and (for example) users of C++11 (because there are few excuses not to prefer constexpr there) could run in AST mode.

edit: I'm not actually all that sure about the performance hit. Just a guess.

But I'm just thinking aloud really.

Wiring out the lexical comparison and AST based comparison sounds like an interesting idea, however IMHO such a setting is too coarse-grained on the file level. My guess would be that it depends from expression (fragment) to expression fragment how you want to compare them: for macros lexical comparison is better, for arithmetic expressions with many parentheses AST based recursive comparison may be a better fit (as implemented also in this checker).

Wiring out the lexical comparison and AST based comparison sounds like an interesting idea, however IMHO such a setting is too coarse-grained on the file level. My guess would be that it depends from expression (fragment) to expression fragment how you want to compare them: for macros lexical comparison is better, for arithmetic expressions with many parentheses AST based recursive comparison may be a better fit (as implemented also in this checker).

Yes, I agree. I think having such a utility would make sense as an future addition, but I would not like to block on it for several reasons:

  • do we know how often it this case even appears?
  • is it a real false positive in all cases?
  • it can be silence with // NOLINT if it actually is a false-positive

My gut feeling is that its rather a corner-case (this late patch to misc-redundant-expression is somewhat indicative for it) and would block the patch from landing.
Allowing future room for flexibility is certainly good.

test/clang-tidy/misc-redundant-expression.cpp
109 ↗(On Diff #176124)

*should* not matter is my point, please show that :)

Yup, I agree with everything that was said.

dkrupp updated this revision to Diff 176372.Dec 3 2018, 4:59 AM

new undef/defined testcase added

dkrupp marked 2 inline comments as done.Dec 3 2018, 5:02 AM
dkrupp added inline comments.
test/clang-tidy/misc-redundant-expression.cpp
109 ↗(On Diff #176124)

undef/define testcase added and passes. @JonasToth is this what you thought of?

JonasToth added inline comments.Dec 3 2018, 7:50 AM
clang-tidy/misc/RedundantExpressionCheck.cpp
605 ↗(On Diff #176372)

This operation could overflow if len(T1) > len(T2) and T2 is the last token of the file. I think std::min(T1.getLength(), T2.getLength()) would be better.

608 ↗(On Diff #176372)

Nit: Please add whitespace after and use / for the function documentation.

629 ↗(On Diff #176372)

Please format with clang-format, some places seem to be a little off whitespace-wise.

test/clang-tidy/misc-redundant-expression.cpp
109 ↗(On Diff #176124)

yup. thank you

alexfh added inline comments.Dec 4 2018, 4:02 AM
clang-tidy/misc/RedundantExpressionCheck.cpp
601 ↗(On Diff #176372)

Should this function compare token kinds first?

602 ↗(On Diff #176372)

Token::getLength() will assert-fail for annotation tokens.

605 ↗(On Diff #176372)

This code is only executed when they have the same length.

Szelethus added inline comments.Dec 12 2018, 2:16 PM
clang-tidy/misc/RedundantExpressionCheck.cpp
601 ↗(On Diff #176372)

I personally prefer to see boolean functions to have a name that starts with either "should", "is", "does", "has", or anything that clearly indicates that it returns with either true or false. In this case, "compare" is especially misleading, since it might as well return -1, 0 or 1.

Maybe hasSameLength?

602 ↗(On Diff #176372)

I guess if T1.isNot(tok::raw_identifier) (or T2) we could get away with simply comparing token kinds. If they are, it wouldn't assert. :)

638 ↗(On Diff #176372)

It seems like we're only checking whether these values are 0 or not, maybe make them bool? Also, I find Rem a little cryptic, what is it referring to? Maybe RemainingCharCount?

Maybe we should just make a simple function, so we could both boost readability, and get rid of some these variables (LLoc, RLoc) entirely:

unsinged getCharCountUntilEndOfExpr(SourceRange ExprSR, Token T,
                                    const SourceManager &SM) {

  assert((ExprSR.getBegin() > T.getLocation() ||
          ExprSR.getEnd() < T.getLocation()) &&
         "Token is not within the expression range!");

  return SM.getCharacterData(SM.getExpansionLoc(ExprSR.getEnd())) -
         SM.getCharacterData(T.getLocation());
}
Szelethus set the repository for this revision to rCTE Clang Tools Extra.Dec 12 2018, 2:20 PM
Szelethus added inline comments.Dec 12 2018, 2:23 PM
clang-tidy/misc/RedundantExpressionCheck.cpp
601 ↗(On Diff #176372)

Sorry, upon closer inspection, isSameToken would be more fitting. Mind you, there already is a function somewhere for this, but due to the lack of a Preprocessor object, we can't use it.
https://clang.llvm.org/doxygen/RewriteMacros_8cpp.html#a0ba058873ae3930f71b19b4dee4b1cbb

dkrupp updated this revision to Diff 179277.Dec 21 2018, 6:01 AM
dkrupp marked an inline comment as done.

All comments fixed.

I also added the handling of redundancy comparison of sizeof(..), alignof() operators.

dkrupp marked 13 inline comments as done.Dec 21 2018, 6:12 AM

Thanks for your comments. I fixed them all. I also added the handling of redundant sizeof() and alignof() operators on the way. Please check if OK now...

clang-tidy/misc/RedundantExpressionCheck.cpp
602 ↗(On Diff #176372)

I changed to code as per the suggestion from Szelethus.
So we will compare only raw_identifiers char-by-char, the rest by kind only.
According to my tests, macros and types, typedefs for example are such raw_identifiers.

605 ↗(On Diff #176372)

exactly, overflow should not happen.

638 ↗(On Diff #176372)

good point. Refactored ass suggested.

alexfh added inline comments.Jan 23 2019, 5:17 AM
clang-tidy/misc/RedundantExpressionCheck.cpp
136 ↗(On Diff #179277)

Any reasons not to pull out the results of cast<>() to local variables? In this particular case this would make the code look much simpler.

613 ↗(On Diff #179277)

const Token&

614 ↗(On Diff #179277)

clang-format, please

616 ↗(On Diff #179277)
668–669 ↗(On Diff #179277)

Can you just compare the location of (the end of?) the token with the end of the range?

whisperity edited the summary of this revision. (Show Details)Jan 23 2019, 10:06 AM
whisperity edited the summary of this revision. (Show Details)
dkrupp updated this revision to Diff 198041.EditedMay 3 2019, 10:13 AM
dkrupp marked 6 inline comments as done.

Thanks for your reviews!
I have fixed all your comments and rebased the patch to the latest master.

aaron.ballman added inline comments.May 3 2019, 10:46 AM
clang-tidy/misc/RedundantExpressionCheck.cpp
135–137 ↗(On Diff #198041)

const auto * since you already spell the type in the initializer.

832 ↗(On Diff #198041)

Not that I oppose the cleanup, but there are a bunch of these kinds of changes that are orthogonal to the patch. Feel free to commit these cleanups as an NFC commit and then rebase on top of it if you'd like, but they shouldn't be in this patch.

dkrupp updated this revision to Diff 214529.Aug 10 2019, 7:41 AM

@aaron.ballman 's comments are fixed.

dkrupp marked 2 inline comments as done.Aug 10 2019, 7:42 AM
aaron.ballman added inline comments.Aug 10 2019, 8:57 AM
clang-tidy/misc/RedundantExpressionCheck.cpp
621 ↗(On Diff #214529)

isSameRawIdentifierToken() so that the name isn't too generic.

629–630 ↗(On Diff #214529)

I would feel more comfortable if this were written:

return StringRef(SM.getCharacterData(T1.getLocation()), T1.getLength()) == StringRef(SM.getCharacterData(T2.getLocation()), T2.getLength());
674 ↗(On Diff #214529)

You can drop a set of parens here as they don't change the order of evaluation.

test/clang-tidy/misc-redundant-expression.cpp
152 ↗(On Diff #214529)

Spurious newline.

One more nit.

clang-tidy/misc/RedundantExpressionCheck.cpp
674 ↗(On Diff #214529)

I'd propagate the negation into the parentheses. I find it easier to understand !a || !b || !c than !(a && b && c), since the former can be digested piece by piece, but the latter requires the reader to keep the ! in their mind and "recurse" into the nested parentheses. It's similar to the "early return" pattern vs. nested conditional statements.

dkrupp updated this revision to Diff 223501.Oct 7 2019, 4:32 AM
dkrupp marked 5 inline comments as done.

Thanks @aaron.ballman and @alexfh for your review.
I fixed your findings.

This revision is now accepted and ready to land.Oct 9 2019, 10:50 AM

@aaron.ballman could you please commit?
I don't have commit access. Thx.

@aaron.ballman could you please commit?
I don't have commit access. Thx.

I'm happy to do so, but the patch does not apply cleanly to trunk. Can you rebase? (Sorry for the delayed on applying the patch, I was traveling last week and didn't have access to the repo.)

dkrupp updated this revision to Diff 226037.Oct 22 2019, 5:57 AM

The patch is rebased to the latest master.

@aaron.ballman could you please check now? Thanks!

@aaron.ballman could you please check now? Thanks!

Sorry for the delay, but I've committed this on your behalf in 1caa66d0759f6bd0851a40645afac8e8a7f84341