Page MenuHomePhabricator

[clang-tidy] new check: bugprone-too-small-loop-variable
ClosedPublic

Authored by ztamas on Nov 1 2018, 6:07 AM.

Details

Summary

The new checker searches for those for loops which has a loop variable with a "too small" type which means this type can't represent all values which are part of the iteration range.

For example:

int main() {
  long size = 300000;
  for( short int i = 0; i < size; ++i) {}
}

The short type leads to infinite loop here because it can't store all values in the [0..size] interval. In a real use case, size means a container's size which depends on the user input. Which means for small amount of objects the algorithm works, but with a larger user input the software will freeze.

The idea of the checker comes from the LibreOffice project, where the same check was implemented as a clang compiler plugin, called LoopVarTooSmall (LLVM licensed).
The idea is the same behind this check, but the code is different because of the different framework.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ztamas added inline comments.Nov 2 2018, 1:47 PM
test/clang-tidy/bugprone-too-small-loop-variable.cpp
6 ↗(On Diff #172226)

Do you mean tests like voidForLoopWithLiteralCond()?
Is it worth to add more tests like that?

152 ↗(On Diff #172226)

Hmm, it can be handled I think. However, I'm not sure how often it is, that an enum has an item value bigger than 32767 (short) or 65535 (unsigned short) or another even bigger value.
For now, I think it's better to just ignore these cases to avoid false positives and keep the first version of the check simple. The scope can be extended anytime later I think.

JonasToth added inline comments.Nov 2 2018, 1:54 PM
clang-tidy/bugprone/TooSmallLoopVariableCheck.cpp
20 ↗(On Diff #172226)

I didn't see that, but they should still be StringRef as type.

test/clang-tidy/bugprone-too-small-loop-variable.cpp
6 ↗(On Diff #172226)

I didn't see it. In principle yes, but i would like to see a test with a bigger number then iterateable (maybe there is a frontend warning for that?). If there is no warning, this should definitely be implemented here (possible follow up) or even become a frontend warning.

152 ↗(On Diff #172226)

enum values can become very big for flag-enums

enum Foo {
  value1 = 1 << 1;
// ...
  value 24 = 1 << 24;
};

You should still add a test for a enum with specified underlying type to ensure, there is no noise.

Hmm, i thought Clang has some warning for this, but I was wrong... Did you think to implement this check as Clang warning?

That is an interesting point actually -- maybe it'd be worth doing that, and if more powerful analysis is required, Static Analyzer would be the next step. I haven't actually implemented any 'regular' clang warning, so I'm not sure.

clang-tidy/bugprone/TooSmallLoopVariableCheck.cpp
109 ↗(On Diff #172422)

You seem to have code for handling templates, but I don't see test cases for it.

ztamas updated this revision to Diff 172436.Nov 2 2018, 2:33 PM
  • Use StringRef instead of char[]
  • Add test cases about big constant / literal / enum values
ztamas marked 3 inline comments as done.Nov 2 2018, 2:37 PM
ztamas added inline comments.
test/clang-tidy/bugprone-too-small-loop-variable.cpp
6 ↗(On Diff #172226)

I added a test case. This kind of test cases are caught by -Wtautological-constant-out-of-range-compare, so we are good I think.

Eugene.Zelenko added inline comments.Nov 2 2018, 4:07 PM
docs/ReleaseNotes.rst
116 ↗(On Diff #172226)

Please avoid This check, may be Detects? Same in documentation.

116 ↗(On Diff #172436)

Somehow for is still highlighted with single , should be `. Same in documentation.

ztamas updated this revision to Diff 172486.Nov 3 2018, 12:18 AM
ztamas marked an inline comment as done.
  • Analyze template dependant for loops too (add some test cases)
  • Use double backticks for marking code in docs
ztamas marked 2 inline comments as done.Nov 3 2018, 12:27 AM
ztamas added inline comments.
clang-tidy/bugprone/TooSmallLoopVariableCheck.cpp
109 ↗(On Diff #172422)

Thanks for mentioning it. I was overdefensive against false positives. This isInstantiationDependent() is not needed here. So I removed and added some test cases with templates.

ztamas updated this revision to Diff 172487.Nov 3 2018, 12:30 AM
ztamas marked an inline comment as done.

Update docs based on reviewer comment

ztamas marked an inline comment as done.Nov 3 2018, 12:31 AM

Hmm, i thought Clang has some warning for this, but I was wrong... Did you think to implement this check as Clang warning?

That is an interesting point actually -- maybe it'd be worth doing that, and if more powerful analysis is required, Static Analyzer would be the next step. I haven't actually implemented any 'regular' clang warning, so I'm not sure.

Well, I'm implementing it as a clang-tidy check now. I guess in the future anyone can replace it with a clang warning if he/she can implement it effectively (e.g. no false positives).

My first impression was that having something accepted as clang static analyzer check takes ages and so I expect that implementing something as a clang warning takes even more time. My impression is based on bugzilla activity and on some read review history. It seems to me it's not rare to have comments like: "Ping, let's not abandon this change" or the author says that he/she has no more time for further work, etc. However clang-tidy seems more progressive. So I prefer to have something as a clang-tidy check (and actually get it in the upstream tool) than implementing it as a clang warning (if it can be implemented effectively at all), wait for a year of review and most probably abandon the change. Of course, it's just a first impression, but why should I take the risk. I think this clang-tidy check is powerful, so useful to have it.

Hmm, i thought Clang has some warning for this, but I was wrong... Did you think to implement this check as Clang warning?

That is an interesting point actually -- maybe it'd be worth doing that, and if more powerful analysis is required, Static Analyzer would be the next step. I haven't actually implemented any 'regular' clang warning, so I'm not sure.

Well, I'm implementing it as a clang-tidy check now. I guess in the future anyone can replace it with a clang warning if he/she can implement it effectively (e.g. no false positives).

My first impression was that having something accepted as clang static analyzer check takes ages and so I expect that implementing something as a clang warning takes even more time. My impression is based on bugzilla activity and on some read review history. It seems to me it's not rare to have comments like: "Ping, let's not abandon this change" or the author says that he/she has no more time for further work, etc. However clang-tidy seems more progressive. So I prefer to have something as a clang-tidy check (and actually get it in the upstream tool) than implementing it as a clang warning (if it can be implemented effectively at all), wait for a year of review and most probably abandon the change. Of course, it's just a first impression, but why should I take the risk. I think this clang-tidy check is powerful, so useful to have it.

Okay, I'm sold on that :).

ztamas marked 10 inline comments as done.Nov 4 2018, 2:16 AM

Mark all handled inline comments as Done.

Regarding the warning discussion: It is ok to have this check here in clang-tidy first and let it mature. Once we can handle all kind of loops and do not produce false positives this logic can move into the frontend.
But in my opinion the warning-version should handle all loops. Until then we can happily have this here :)

clang-tidy/bugprone/TooSmallLoopVariableCheck.cpp
45 ↗(On Diff #172487)

missing full stop.

51 ↗(On Diff #172487)

same

60 ↗(On Diff #172487)

same.

82 ↗(On Diff #172487)

same, other places too, but i won't mark them anymore.

100 ↗(On Diff #172487)

marco defined is outdated.

I think the sentence should be improved. Maybe Ignore casting cause by constant values inside a binary operator, e.g. from ... .?

120 ↗(On Diff #172487)

I think that comment should be before the if to be consistent with other comment positions, and missing full stop.

122 ↗(On Diff #172487)

you can utilize early return for all these cases.
Please don't use if-else then, because no return after else-rule.

142 ↗(On Diff #172487)

Does this try to ensure a precondition? Then it should become an assertion instead.
Please adjust the comment like above (punctuation, position)

docs/clang-tidy/checks/bugprone-too-small-loop-variable.rst
6 ↗(On Diff #172487)

Disclaimer: english isn't my mother tongue and its not perfect :)

which has -> that have? Sounds better to me.

10 ↗(On Diff #172487)

the .. code-block:: c++ is usually not indended, only the code itself.

test/clang-tidy/bugprone-too-small-loop-variable.cpp
5 ↗(On Diff #172487)

please add tests for range-for loops to ensure the implicitly generated loop does not generate any false positives or the like.

ztamas added inline comments.Nov 5 2018, 12:24 PM
clang-tidy/bugprone/TooSmallLoopVariableCheck.cpp
45 ↗(On Diff #172487)

Sorry, I forgot the punctuation what you've already mentioned.

142 ↗(On Diff #172487)

It's not an assumed precondition. This if handles the case when LoopVarMatcher is not fitted with the actual loop variable. That's why the IncrementMatcher is there so we can check whether we found the loop variable.
See voidForLoopReverseCond() test case which hits this if branch.
I did not find a solution to handle this check inside the matcher.

docs/clang-tidy/checks/bugprone-too-small-loop-variable.rst
10 ↗(On Diff #172487)

Hmm, I copied this from somewhere. It might be a good idea to make this consistent across all the *.rst files. See bugprone-suspicious-semicolon.rst or bugprone-use-after-move.rst for example.

ztamas updated this revision to Diff 172635.Nov 5 2018, 12:35 PM
  • Add a range-based loop test case
  • Restructure test cases a bit
  • Fix-up comments, position, punctuation
ztamas marked 10 inline comments as done.Nov 5 2018, 12:38 PM
ztamas marked an inline comment as done.Nov 5 2018, 12:43 PM
ztamas added inline comments.Nov 5 2018, 1:25 PM
clang-tidy/bugprone/TooSmallLoopVariableCheck.cpp
142 ↗(On Diff #172487)

voidForLoopReverseCond() was renamed to voidForLoopCondImplicitCast() in the mean time.

last nits from my side (for now :)).
If the other reviews could take a look at it as well, would be great.
I am uncertain about the english in some comments @aaron.ballman finds all these language bugs ;)

clang-tidy/bugprone/TooSmallLoopVariableCheck.cpp
142 ↗(On Diff #172487)

Ok, I can't think of a solution of head right now as well. It's fine to leave as is.

123 ↗(On Diff #172635)

please no else after return

126 ↗(On Diff #172635)

same

docs/clang-tidy/checks/bugprone-too-small-loop-variable.rst
10 ↗(On Diff #172487)

True, but nobody want's to do the documentation work :D

Eugene.Zelenko added inline comments.Nov 5 2018, 2:10 PM
docs/clang-tidy/checks/bugprone-too-small-loop-variable.rst
10 ↗(On Diff #172487)

I could try to fix, but I need to be pointed to proper example :-)

JonasToth added inline comments.Nov 5 2018, 2:24 PM
docs/clang-tidy/checks/bugprone-too-small-loop-variable.rst
10 ↗(On Diff #172487)

A nice little sed line i had to write 3 times fixed the thing :D

JonasToth added inline comments.Nov 6 2018, 12:56 AM
clang-tidy/bugprone/TooSmallLoopVariableCheck.cpp
20 ↗(On Diff #172226)

One more nit i forgot: these variables should be static for linkage and be CamelCase to match the naming conventions.

Szelethus added inline comments.Nov 6 2018, 1:16 AM
clang-tidy/bugprone/TooSmallLoopVariableCheck.cpp
20 ↗(On Diff #172226)

Hmmm, StringRef actually has a constexpr constructor, we could make these constexpr too I guess.

lebedev.ri added inline comments.
clang-tidy/bugprone/TooSmallLoopVariableCheck.cpp
20 ↗(On Diff #172226)

You want StringLiteral:

static constexpr llvm::StringLiteral loopVarCastName = llvm::StringLiteral("loopVarCast");
static constexpr llvm::StringLiteral loopCondName = llvm::StringLiteral("loopCond");
static constexpr llvm::StringLiteral loopIncrementName = llvm::StringLiteral("loopIncrement");
MTC added a subscriber: MTC.Nov 6 2018, 6:22 PM
ztamas updated this revision to Diff 172938.Nov 7 2018, 6:08 AM
  • no else after return
  • static constexpr llvm::StringLiteral
  • CamelCase variable names
  • Remove unneeded isIntegerType() check
  • Better terminology: not terminating condition, but iteration's upper bound.
ztamas marked 13 inline comments as done.Nov 7 2018, 6:12 AM

LG from my side. Please wait for feedback from @aaron.ballman or @alexfh before committing.

ztamas updated this revision to Diff 173372.Nov 9 2018, 10:05 AM
  • Make local functions static
JonasToth accepted this revision.Nov 9 2018, 10:52 AM

LGTM.
Did you run this check in its final form against a bigger project? These results would be interesting.
Do you have commit access?

This revision is now accepted and ready to land.Nov 9 2018, 10:52 AM
ztamas added a comment.Nov 9 2018, 3:11 PM

LGTM.
Did you run this check in its final form against a bigger project? These results would be interesting.

I'll run it on LibreOffice code again and we'll see.

Do you have commit access?

Commit access? This is my first patch.

I'll run it on LibreOffice code again and we'll see.

Perfect, if you have the time, running it against LLVM would be very interesting as well.

Do you have commit access?

Commit access? This is my first patch.

If you plan to regularly contribute to LLVM you can ask for commit access (process written here: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access).
I (or someone else) can commit this patch for you in the meanwhile (of course with attribution to you).

ztamas added a comment.EditedNov 10 2018, 2:17 PM

I have the result after running the current version of the check on LibreOffice.

Found around 50 new issues were hidden by macros:
https://cgit.freedesktop.org/libreoffice/core/commit/?id=afbfe42e63cdba1a18c292d7eb4875009b0f19c0

Also found some new false positives related to macros. The number of all false positives is around 38, which is still seems good to me.

ztamas added a comment.EditedNov 10 2018, 2:42 PM

I also tested on LLVm code.
The output is here:
https://gist.github.com/tzolnai/fe4f23031d3f9fdbdbf1ee38abda00a4

I found 362 warnings.

Around 95% of these warnings are similar to the next example:

/home/zolnai/lohome/llvm/lib/Support/Chrono.cpp:64:24: warning: loop variable has narrower type 'unsigned int' than iteration's upper bound 'size_t' (aka 'unsigned long') [bugprone-too-small-loop-variable]
  for (unsigned I = 0; I < Style.size(); ++I) {

Where the loop variable has an unsigned int type while in the loop condition it is compared with a container size which has size_t type. The actual size method can be std::string::length() or array_lengthof() too.

An interesting catch related to a template function:

/home/zolnai/lohome/llvm/tools/clang/lib/CodeGen/CGNonTrivialStruct.cpp:310:24: warning: loop variable has narrower type 'unsigned int' than iteration's upper bound 'size_t' (aka 'unsigned long') [bugprone-too-small-loop-variable]
  for (unsigned I = 0; I < N; ++I)

Where N is a template value with size_t type. If the container function is instantiated with a "large" value I expect the function won't work. I guess it never actually happens.

An other catch inside a macro:

/home/zolnai/lohome/llvm/lib/ExecutionEngine/Interpreter/Execution.cpp:157:5: warning: loop variable has narrower type 'uint32_t' (aka 'unsigned int') than iteration's upper bound 'std::vector::size_type' (aka 'unsigned long') [bugprone-too-small-loop-variable]
    IMPLEMENT_VECTOR_INTEGER_ICMP(ne,Ty);
    ^
/home/zolnai/lohome/llvm/lib/ExecutionEngine/Interpreter/Execution.cpp:123:24: note: expanded from macro 'IMPLEMENT_VECTOR_INTEGER_ICMP'
    for( uint32_t _i=0;_i<Src1.AggregateVal.size();_i++)

I can't see similar false positives what LibreOffice code produces. In the case of LibreOffice short type loop variables used to lead to false positives. I expect that in LLVM code short type is not used frequently in a loop. The closest case which can be interpreted as a false positive is the next one:

/home/zolnai/lohome/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:2432:21: warning: loop variable has narrower type 'unsigned int' than iteration's upper bound 'unsigned long' [bugprone-too-small-loop-variable]
    for (Chunk = 0; Chunk < NumBytes / sizeof(uint64_t); ++Chunk)

Where Chunk and NumBytes both have unsigned type. sizeof(uint64_t) leads to the warning, which won't make the whole expression actually bigger. It's a similar constant like value as enums, literals, etc, which are ignored by the check. I count around 10 similar use cases where I saw that the sizeof() is used on a type, so must be constant in runtime.

I did not see any other kind of false positives.

Thank you for checking these two projects!

I have the result after running the current version of the check on LibreOffice.

Found around 50 new issues were hidden by macros:
https://cgit.freedesktop.org/libreoffice/core/commit/?id=afbfe42e63cdba1a18c292d7eb4875009b0f19c0

Also found some new false positives related to macros. The number of all false positives is around 38, which is still seems good to me.

I would be very interested why these false positives are created. Is there a way to avoid them and maybe it makes sense to already add FIXME at the possible places the check needs improvements.

I also tested on LLVm code.
The output is here:
https://gist.github.com/tzolnai/fe4f23031d3f9fdbdbf1ee38abda00a4

I found 362 warnings.

Around 95% of these warnings are similar to the next example:

/home/zolnai/lohome/llvm/lib/Support/Chrono.cpp:64:24: warning: loop variable has narrower type 'unsigned int' than iteration's upper bound 'size_t' (aka 'unsigned long') [bugprone-too-small-loop-variable]
  for (unsigned I = 0; I < Style.size(); ++I) {

Where the loop variable has an unsigned int type while in the loop condition it is compared with a container size which has size_t type. The actual size method can be std::string::length() or array_lengthof() too.

[snip snip]

I can't see similar false positives what LibreOffice code produces.

I am fairly concerned the example with unsigned use for container iteration are not false positives, just examples of bad happenstance code which never breaks under real life applications due to uint32_t being good enough but is actually not type-safe.
Those examples that I kept in my quote are especially bad and should be fixed eventually...

ztamas updated this revision to Diff 173559.Nov 11 2018, 8:30 AM
  • Add new test cases based on findings in LibreOffice code base

Also found some new false positives related to macros. The number of all false positives is around 38, which is still seems good to me.

I would be very interested why these false positives are created. Is there a way to avoid them and maybe it makes sense to already add FIXME at the possible places the check needs improvements.

voidForLoopFalsePositive() and voidForLoopFalsePositive2() test cases covers most of the false positives found in LibreOffice.

ztamas updated this revision to Diff 173560.Nov 11 2018, 8:40 AM
  • Fix comment

I would be very interested why these false positives are created. Is there a way to avoid them and maybe it makes sense to already add FIXME at the possible places the check needs improvements.

voidForLoopFalsePositive() and voidForLoopFalsePositive2() test cases covers most of the false positives found in LibreOffice.

I would not consider them as full false positives, the constants that are created allow to filter more, but type-based the diagnostics are correct. So I think that is fine. If the constants would be a big value, the check does find a real problem.

@whisperity

I am fairly concerned the example with unsigned use for container iteration are not false positives, just examples of bad happenstance code which never breaks under real life applications due to uint32_t being good enough but is actually not type-safe.
Those examples that I kept in my quote are especially bad and should be fixed eventually...

clang-tidy is not about finding _only_ bugs, but find patterns that can be problematic but are not in every instance (therefore bugprone- and not bug-).
uint32_t does not span the whole memory-space for current hardware and a std::string can have more then uint32_t::max elements. Diagnosing this is valid.
Containers where uint32_t::max * sizeof(element_type) > size_t::max could be filtered for normal iteration over the container. I would consider it still a bugprone pattern.

I would be very interested why these false positives are created. Is there a way to avoid them and maybe it makes sense to already add FIXME at the possible places the check needs improvements.

voidForLoopFalsePositive() and voidForLoopFalsePositive2() test cases covers most of the false positives found in LibreOffice.

I would not consider them as full false positives, the constants that are created allow to filter more, but type-based the diagnostics are correct. So I think that is fine. If the constants would be a big value, the check does find a real problem.

Yes, that's right, these are not full false positives, but the check's main focus is on those loops which are runtime dependent. If a loop's upper bound can be calculated in compile time then this loop should be caught by a compiler warning based on the actual value of that constant. See -Wtautological-constant-out-of-range-compare for example. So I think it's the best if we can avoid catching these issues using a type based matching.
Anyway, there is not too many of this kind of false positives, so it's not a big issue. In LLVM code I did not find any similar case.
I can't see full false positives where the check works incorrectly. The detected type mismatch seems correctly detected in every case.

Yes, that's right, these are not full false positives, but the check's main focus is on those loops which are runtime dependent. If a loop's upper bound can be calculated in compile time then this loop should be caught by a compiler warning based on the actual value of that constant. See -Wtautological-constant-out-of-range-compare for example. So I think it's the best if we can avoid catching these issues using a type based matching.
Anyway, there is not too many of this kind of false positives, so it's not a big issue. In LLVM code I did not find any similar case.
I can't see full false positives where the check works incorrectly. The detected type mismatch seems correctly detected in every case.

Agree. I think this check is good to go.

I would commit this check tomorrow if that is ok with you.

JonasToth added inline comments.Nov 11 2018, 11:15 AM
test/clang-tidy/bugprone-too-small-loop-variable.cpp
138 ↗(On Diff #173560)

Could you please add a test

for (short i = 0; i < (!cond ? size : 0); ++i) {
}

as well. Just to make sure the analysis does not depent on the type of the LHS. If it does, add a FIXME for later.

ztamas updated this revision to Diff 173573.Nov 11 2018, 12:54 PM
  • Add requested test case
ztamas marked an inline comment as done.Nov 11 2018, 12:55 PM

Agree. I think this check is good to go.

I would commit this check tomorrow if that is ok with you.

Of course, It would be great if this check can get in, Thanks!

whisperity accepted this revision.Nov 12 2018, 12:15 AM
This revision was automatically updated to reflect the committed changes.

Committed r346665.

Thank you very much for the patch!

Thank you very much for the patch!

Thank you for reviewing!