This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add a checker to remove useless intermediate variables before return statements with comparisons
Needs RevisionPublic

Authored by tbourvon on Aug 22 2017, 8:44 AM.

Details

Summary

This patch adds a checker to detect patterns of the following form:

auto IntermediateVar = foo();
return (IntermediateVar == 1);

and suggests to turn them into:

return (foo() == 1);

The reasoning behind this checker is that this kind of pattern is useless and lowers readability as long as the return statement remains rather short.
The idea of this checker was suggested to me by Sylvestre Ledru.

Diff Detail

Event Timeline

tbourvon created this revision.Aug 22 2017, 8:44 AM

Some more precisions I kept out of the revision body for clarity:

The checker contains a configuration option to set the maximum line length for the fixed return statement, so that we make sure this checker always contributes to improving readability and not the contrary.
The top-most non-trivial expression of the return statement has to be a comparison operator.

This checker tries to catch as many corner cases as possible. Most if not all of them are showcased in the tests, but the main ones are:

  • It handles double variable declarations like this:
auto Var1 = 1;
auto Var2 = 2;
return (Var1 < Var2);

so that the checker doesn't have to be run twice to fix these cases.

  • It always preserves order of execution:
auto Var = step1();
return (step2() > Var);

will correctly be transformed into:

return (step1() < step2()); // notice the comparison operator inversion
  • It never duplicates code execution:
auto Var = foo();
return (Var == Var);

will not be matched.

I did not feel it was a good idea to include this in the user-facing checker documentation because it seems to me that the user of the checker does not need to worry about these exceptions as long as they are covered. The user should only be interested in the idea behind the checker and how it can help.

Hi :)

I added my thoughts for the check. Many variables in your code could be const, i didn't mention all cases.

clang-tidy/readability/UselessIntermediateVarCheck.h
29 ↗(On Diff #112179)

It would be nice, if there is a way to get this information from a clang-format file.

JonasToth added inline comments.Aug 22 2017, 9:25 AM
clang-tidy/readability/UselessIntermediateVarCheck.cpp
24 ↗(On Diff #112179)

The matching expression can be const auto.
I would write the comment above the declaration, that the splitting does not occur (here and the next simpler matchers).

The local variables for matcher should have a capital letter as start -> s/directDeclRefExprLHS1/DirectDeclRefExpr/ here and elsewhere.

62 ↗(On Diff #112179)

here and elsewhere, is the indendation result of clang-format? Personally i like it, since it shows the structure, but there might be concerns since it probably does not match with the coding guideline.

186 ↗(On Diff #112179)

The warning message sounds a little harsh/judging, and does not explain why the intermediate is useless. (the notes do which is ok i think).

Maybe go with unnecessary intermediate variable %0

236 ↗(On Diff #112179)

const auto ?

290 ↗(On Diff #112179)

some const is possible for this an the following variables.

335 ↗(On Diff #112179)

const auto?

350 ↗(On Diff #112179)

const

clang-tidy/readability/UselessIntermediateVarCheck.h
47 ↗(On Diff #112179)

Maybe a SmallSet? the optimized datastructures from llvm would save dynamic memory allocations.

clang-tidy/utils/Matchers.h
67

formatted? usually the = ends up on the same line

docs/ReleaseNotes.rst
63

maybe "useless" should be replaced with "unnecessary". but thats only my opinion.

docs/clang-tidy/checks/readability-useless-intermediate-var.rst
14 ↗(On Diff #112179)

especially to avoid magic numbers, this kind of code could be more readable.

test/clang-tidy/readability-useless-intermediate-var.cpp
1 ↗(On Diff #112179)

the tests seem to be to less compared to the code volume of the check.

situations i think should be tested:

  • initialization from a function, method call
  • initialization with a lambda
const auto SomeVar = []() { /* super complicated stuff */ return Result; } ();
return SomeVar;
  • template stuff -> proof that they work two, even thought it doesnt seem to be relevant, at least what i can see.
  • what happens if a "temporary" is used as an argument for a function call?
const auto Var = std::sqrt(10);
return std::pow(Var, 10);
  • proof that really long function calls (like STL Algorithm tends to be) are not inlined as well
Eugene.Zelenko added inline comments.
docs/ReleaseNotes.rst
60

Please place new checks in alphabetical order.

63

Checker -> check.
Please enclose return (in statement context) in ``.
Same for other places.

Eugene.Zelenko set the repository for this revision to rL LLVM.
Eugene.Zelenko added a project: Restricted Project.

Please add the following test: (and make sure that it does the right thing :))

bool f_with_preproc_condition() {
  auto test = 42;
  assert(test == 42);
  return test;
}

I.e. if -DNDEBUG is present, variable is not needed, but if -DNDEBUG is *NOT* present...

tbourvon added a comment.EditedAug 23 2017, 6:07 AM

Please add the following test: (and make sure that it does the right thing :))

bool f_with_preproc_condition() {
  auto test = 42;
  assert(test == 42);
  return test;
}

I.e. if -DNDEBUG is present, variable is not needed, but if -DNDEBUG is *NOT* present...

Shouldn't we ignore these cases whether or not -DNDEBUG is present? If we apply the fix here and remove the variable while we have -DNDEBUG, and then remove this define, then we'll get a compiler error for test not found.
In this case it can be fixed fairly easily, but this could in fact introduce bugs with more complex conditional macros and preprocessor directives. For example:

#ifdef WIN32
#define MY_MACRO(test) test = 0
#else
#define MY_MACRO(test) /* nothing */
#endif

bool f() {
  auto test = 42;
  MY_MACRO(test);
  return (test == 42);
}

If we want to ignore these cases not matter what, which seems to me like the most logical and reasonable thing to do, we need to be able to know if there are preprocessor directives between the variable declaration and the return statement.
In order to do that, we would need to be able to get the PreprocessingRecord of the current compilation in order to call getPreprocessedEntitiesInRange(), but AFAICT this cannot be accessed from clang-tidy checks at the moment. Should we introduce a way to reach that object from checks?

Please add the following test: (and make sure that it does the right thing :))

bool f_with_preproc_condition() {
  auto test = 42;
  assert(test == 42);
  return test;
}

I.e. if -DNDEBUG is present, variable is not needed, but if -DNDEBUG is *NOT* present...

Shouldn't we ignore these cases whether or not -DNDEBUG is present?

That's *exactly* what i was talking about.

If we apply the fix here and remove the variable while we have -DNDEBUG, and then remove this define, then we'll get a compiler error for test not found.
In this case it can be fixed fairly easily, but this could in fact introduce bugs with more complex conditional macros and preprocessor directives. For example:

#ifdef WIN32
#define MY_MACRO(test) test = 0
#else
#define MY_MACRO(test) /* nothing */
#endif

bool f() {
  auto test = 42;
  MY_MACRO(test);
  return (test == 42);
}

If we want to ignore these cases not matter what, which seems to me like the most logical and reasonable thing to do, we need to be able to know if there are preprocessor directives between the variable declaration and the return statement.
In order to do that, we would need to be able to get the PreprocessingRecord of the current compilation in order to call getPreprocessedEntitiesInRange(), but AFAICT this cannot be accessed from clang-tidy checks at the moment. Should we introduce a way to reach that object from checks?

Currently, there is a registerPPCallbacks() function in ClangTidyCheck class, which allows you to register a subclass of PPCallbacks, and manually handle all the preprocessor thingies.
That being said, getPreprocessedEntitiesInRange() looks interesting, and it *might* help me in D36836.

Please add the following test: (and make sure that it does the right thing :))

bool f_with_preproc_condition() {
  auto test = 42;
  assert(test == 42);
  return test;
}

I.e. if -DNDEBUG is present, variable is not needed, but if -DNDEBUG is *NOT* present...

Shouldn't we ignore these cases whether or not -DNDEBUG is present?

That's *exactly* what i was talking about.

If we apply the fix here and remove the variable while we have -DNDEBUG, and then remove this define, then we'll get a compiler error for test not found.
In this case it can be fixed fairly easily, but this could in fact introduce bugs with more complex conditional macros and preprocessor directives. For example:

#ifdef WIN32
#define MY_MACRO(test) test = 0
#else
#define MY_MACRO(test) /* nothing */
#endif

bool f() {
  auto test = 42;
  MY_MACRO(test);
  return (test == 42);
}

If we want to ignore these cases not matter what, which seems to me like the most logical and reasonable thing to do, we need to be able to know if there are preprocessor directives between the variable declaration and the return statement.
In order to do that, we would need to be able to get the PreprocessingRecord of the current compilation in order to call getPreprocessedEntitiesInRange(), but AFAICT this cannot be accessed from clang-tidy checks at the moment. Should we introduce a way to reach that object from checks?

Currently, there is a registerPPCallbacks() function in ClangTidyCheck class, which allows you to register a subclass of PPCallbacks, and manually handle all the preprocessor thingies.
That being said, getPreprocessedEntitiesInRange() looks interesting, and it *might* help me in D36836.

IMO it makes more sense to expose PreprocessingRecord, as registering callbacks and maintaining a private inner database for every check not only isn't ideal in terms of performance, but also duplicates logic and could lead to inconsistencies.
I think I'm going to work on getting this object accessible through MatchResult and MatchFinder so that it is accessible from both checks and matchers. Probably in a separate patch. Please let me know if you have any more thoughts on this.

tbourvon updated this revision to Diff 112351.Aug 23 2017, 6:39 AM

Fixing the reviewers' remarks, mainly formatting and const-correctness, as well as adding a few more tests.

tbourvon marked 14 inline comments as done.Aug 23 2017, 6:46 AM
tbourvon added inline comments.
clang-tidy/readability/UselessIntermediateVarCheck.h
29 ↗(On Diff #112179)

There is no easy way to get this information unless you want to use all the LibFormat stuff that includes finding the .clang-format file, etc... It doesn't seem like it is the responsibility of this checker to take care of that.

test/clang-tidy/readability-useless-intermediate-var.cpp
1 ↗(On Diff #112179)

Your 4th point is not something possible with this check. For now it only looks at return statements with comparisons.
As for the 5th, it doesn't really matter what the expression we are looking at is, so this case is already covered by the very long string literal in the last test.

tbourvon updated this revision to Diff 112356.Aug 23 2017, 6:49 AM
tbourvon marked an inline comment as done.

Forgot to move the comments on small matchers as suggested by review.

lebedev.ri added inline comments.Aug 23 2017, 6:55 AM
clang-tidy/readability/UnnecessaryIntermediateVarCheck.cpp
376

Is there really no way to format the fixes, and *then* check the line length?

$ clang-tidy --help
...
  -format-style=<string>       - 
                                 Style for formatting code around applied fixes:
                                   - 'none' (default) turns off formatting
                                   - 'file' (literally 'file', not a placeholder)
                                     uses .clang-format file in the closest parent
                                     directory
                                   - '{ <json> }' specifies options inline, e.g.
                                     -format-style='{BasedOnStyle: llvm, IndentWidth: 8}'
                                   - 'llvm', 'google', 'webkit', 'mozilla'
                                 See clang-format documentation for the up-to-date
                                 information about formatting styles and options.
                                 This option overrides the 'FormatStyle` option in
                                 .clang-tidy file, if any.
...

so clang-tidy is at least aware of clang-format.

clang-tidy/readability/UnnecessaryIntermediateVarCheck.h
30

It would be better to default to 80, especially if it can't be read from .clang-tidy.

test/clang-tidy/readability-unnecessary-intermediate-var.cpp
172

Please add edge-cases, i.e. just below MaximumLineLength, exactly MaximumLineLength, ...

Abpostelnicu added inline comments.Jan 3 2018, 7:50 AM
clang-tidy/readability/UnnecessaryIntermediateVarCheck.cpp
376

I think this is doable since I see this in the code:

https://code.woboq.org/llvm/clang-tools-extra/clang-tidy/ClangTidy.cpp.html#199

That leads me to think that we can have this before applying the fixes and in case the fix after re-format has a line that violates our rule it gets dropped. I'm gonna update the patch with this new addon.

tbourvon updated this revision to Diff 129278.Jan 10 2018, 7:21 AM

Add retrieval of MaximumLineLength from clang-format options, otherwise defaults to 80.
Also add unit tests around the limit case for the line length.

tbourvon marked 4 inline comments as done.Jan 10 2018, 7:30 AM

(By the way, credits go to @Abpostelnicu for the latest changes regarding MaximumLineLength interop with clang-format options)

clang-tidy/readability/UnnecessaryIntermediateVarCheck.cpp
376

Regarding this comment, Andi (@Abpostelnicu) and I have analyzed the situation and we believe that formatting the replacement before checking the line length achieves almost nothing, yet complicates the code a lot.

If we format the replacement before applying the fix, then we're almost sure that clang-format will split the replacement into multiple lines and that it will never go above the maximum line length (except for extremely rare cases like 80+ characters identifiers).
Actually, we believe that splitting the return statement into multiple lines hinders its readability, and therefore that in cases where the fix would exceed the maximum line length, we're better off not applying it, since the main goal of this check is readability.

clang-format can still be run after the fix is applying, of course.

tbourvon marked an inline comment as done.Jan 10 2018, 7:31 AM

Thank you for working on this!
Some thoughts below.

clang-tidy/readability/CMakeLists.txt
31

Please upload patches with full context (-U999999)

clang-tidy/utils/LexerUtils.h
26

This should probably be split into new differential, and it should have a unit-test.
(And mark that diff as parent of this one)

clang-tidy/utils/Matchers.h
54

This should be split into another new differential, and it should have a unit-test.
(And mark that diff as parent of this one)
Also, the docs will need to be updated, which is sadly impossible now, see D41455.

test/clang-tidy/readability-unnecessary-intermediate-var.cpp
2

Still no tests for macros, preprocessor definitions.
I do see that it will still do the wrong thing, but i need to see that in the tests and release notes.

tbourvon updated this revision to Diff 131709.Jan 28 2018, 10:45 AM

Separated the added matcher and lexer utils into a different diff.
Also added unit tests to make sure we behave as expected when macros get in the way of the code we detect.

lebedev.ri added inline comments.Jan 28 2018, 10:52 AM
test/clang-tidy/readability-unnecessary-intermediate-var.cpp
207

Tests are nice :)
But please, add the test with assert-like macro..

lebedev.ri set the repository for this revision to rCTE Clang Tools Extra.Jan 28 2018, 10:55 AM
tbourvon marked an inline comment as done.Jan 28 2018, 11:12 AM
tbourvon added inline comments.
test/clang-tidy/readability-unnecessary-intermediate-var.cpp
207

Sorry, I'm not sure I understand what kind of test you are suggesting... Could you give me a short example?

lebedev.ri added inline comments.Jan 28 2018, 11:21 AM
test/clang-tidy/readability-unnecessary-intermediate-var.cpp
207

Something like this: (did not test at all, just back of the envelope code)

void die();

#ifndef NDEBUG
#define assert(cond) void(0) // i.e. it does nothing
#else
#define assert(cond) (cond) ? void(0) : die();
#endif
bool some_func();
bool f_preprocessor_macro2() {
  auto test = some_func();
  assert(test); // <- should not be removed regardless of whether NDEBUG is set or not
  return test;
}
tbourvon updated this revision to Diff 134936.Feb 19 2018, 9:22 AM

This updates the patch to use clang::tooling::fixit::getText() instead of the custom getStmtText. This also adds a macro unit test.

tbourvon marked 4 inline comments as done.Feb 19 2018, 9:24 AM
tbourvon updated this revision to Diff 137588.Mar 8 2018, 8:56 AM

Moves the custom matcher to the check instead of having it in utils/Matchers.h

@alexfh Do you think we can merge this? I think I've been through every suggestion and it would be nice to finally land the check!

aaron.ballman added inline comments.Mar 12 2018, 5:32 AM
clang-tidy/readability/UnnecessaryIntermediateVarCheck.cpp
28–31

Please do not use auto here as the type is not explicitly spelled out in the initializer. Same comment applies elsewhere.

33

Elide braces.

190

This should be using a %select in the diagnostic.

235–242

Can use const auto * because the type is explicitly spelled out in the initialization. Same comment applies elsewhere.

275

Elide braces (applies elsewhere).

clang-tidy/readability/UnnecessaryIntermediateVarCheck.h
52

Can this be const auto &?

58

Same here.

docs/clang-tidy/checks/readability-unnecessary-intermediate-var.rst
5

Underline is of incorrect length.

alexfh requested changes to this revision.Mar 23 2018, 8:01 AM
alexfh added inline comments.
clang-tidy/readability/UnnecessaryIntermediateVarCheck.cpp
29

I've just noticed this use of format::getStyle. The problem is that it will use the real file system, which is completely wrong for some clang-tidy use cases. I guess, we'll either have to roll this back to use a check option or create some way for checks to get the formatting style that can be implemented correctly by the client, when clang-tidy is used as a library.

I suggest going back to the option and leave a FIXME to investigate the other possibility.

This revision now requires changes to proceed.Mar 23 2018, 8:01 AM
Eugene.Zelenko added inline comments.Mar 23 2018, 8:07 AM
docs/ReleaseNotes.rst
62–63

Please rebase from trunk and use :doc: for link.

68

Please remove This new check. Please enclose return in ``, also in other places.

docs/clang-tidy/checks/readability-unnecessary-intermediate-var.rst
8

Checker -> check.

tbourvon updated this revision to Diff 140278.Mar 29 2018, 10:15 AM
tbourvon marked 10 inline comments as done.

Fixed review comments

tbourvon marked an inline comment as done.Mar 29 2018, 10:16 AM
tbourvon added inline comments.
docs/ReleaseNotes.rst
62–63

I'm sorry, what do you mean by "use :doc: for link"?

docs/ReleaseNotes.rst
148

Please use alphabetic order and :doc: as well as proper link. See other checks as example.

hintonda removed a subscriber: hintonda.Mar 29 2018, 1:18 PM
tbourvon updated this revision to Diff 140541.Mar 31 2018, 8:17 AM
tbourvon marked 3 inline comments as done.

Order and link fixes in the release notes

Eugene.Zelenko added inline comments.Mar 31 2018, 8:58 AM
clang-tidy/readability/UnnecessaryIntermediateVarCheck.cpp
2

Please make fit line into 80 characters and remove incorrect continuation.

clang-tidy/readability/UnnecessaryIntermediateVarCheck.h
2

Please make fit line into 80 characters and remove incorrect continuation.

docs/ReleaseNotes.rst
133

Final underscore is not needed.

docs/clang-tidy/checks/readability-unnecessary-intermediate-var.rst
7

Please add missing `. Same below.

alexfh requested changes to this revision.Apr 4 2018, 10:44 AM
alexfh added inline comments.
clang-tidy/readability/UnnecessaryIntermediateVarCheck.h
28

Looking at how this matcher is used, I'm starting to doubt this is an efficient way to find the kind of the coding pattern this check targets. Creating CFG for each declaration statement would result in a lot of overlapping and, thus, unnecessary work. I guess, it should be easy to create a test case (e.g. thousands of declarations in the same function that would trigger one or better both hasSuccessor matchers) where this check would run for extremely long time. I'd recommend doing this to test this or any alternative solution you come up with. Performance problems are very real for some clang-tidy checks. And this check raises a number of red flags w.r.t. performance.

An alternative I could suggest is to build CFG for each function once and then search for the interesting declarations while traversing it (instead of running AST matchers and building CFG inside them).

This revision now requires changes to proceed.Apr 4 2018, 10:44 AM