This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add a clang-tidy check for possible inefficient vector operations
ClosedPublic

Authored by hokein on Apr 6 2017, 5:43 AM.

Details

Summary

The "performance-inefficient-vector-operation" check finds vector oprations in
for-loop statements which may cause multiple memory reallocations.

This is the first version, only detects typical for-loop:

std::vector<int> v;
for (int i = 0; i < n; ++i) {
  v.push_back(i);
}

// or

for (int i = 0; i < v2.size(); ++i) {
  v.push_back(v2[i]);
}

We can extend it to handle more cases like for-range loop in the future.

Diff Detail

Repository
rL LLVM

Event Timeline

hokein created this revision.Apr 6 2017, 5:43 AM
hokein updated this revision to Diff 94356.Apr 6 2017, 5:45 AM

Fix a small nit.

Isn't such analysis is path-sensitive and should be implemented in Static Analyzer?

Please mention this check in docs/ReleaseNotes.rst (in alphabetical order).

docs/clang-tidy/checks/performance-inefficient-vector-operation.rst
6 ↗(On Diff #94356)

I think will be good idea to use std::vector, and enclose it and push_back in ``.

Eugene.Zelenko set the repository for this revision to rL LLVM.
Eugene.Zelenko added a subscriber: zaks.anna.
hokein updated this revision to Diff 94498.Apr 7 2017, 2:42 AM
hokein marked an inline comment as done.

Mention the check in Release note.

hokein added a comment.Apr 7 2017, 2:43 AM

Isn't such analysis is path-sensitive and should be implemented in Static Analyzer?

This check is more like a flow-sensitive analysis IMO, which can be supported by clang-tidy.

Please mention this check in docs/ReleaseNotes.rst (in alphabetical order).

Done. I think we could add this tip in the add_new_check.py script, so that developers don't forget about it ;)

aaron.ballman edited edge metadata.Apr 7 2017, 11:07 AM

Isn't such analysis is path-sensitive and should be implemented in Static Analyzer?

This check is more like a flow-sensitive analysis IMO, which can be supported by clang-tidy.

It can be supported by clang-tidy, but your patch isn't doing it. I think the check would be improved substantially by improving the analysis.

clang-tidy/performance/InefficientVectorOperationCheck.cpp
22 ↗(On Diff #94498)

Why does this check only focus on vector? Other containers allow you to preallocate space for their elements or lend themselves to inefficient operations.

Also, this should be checking ::std::vector instead.

32 ↗(On Diff #94498)

This isn't paying attention to what is being reserved.

46–50 ↗(On Diff #94498)

These conditions seem rather restrictive -- why should you not get the same issue with a range-based for loop?

53–54 ↗(On Diff #94498)

I'm really not keen on this. It will catch trivial cases, so there is some utility, but this will quickly fall apart with anything past the trivial case.

69–71 ↗(On Diff #94498)

These asserts should come with string literal messages, though I think they can be removed entirely (I don't think there's a value add here).

74 ↗(On Diff #94498)

Why the llvm:: qualifier?

80 ↗(On Diff #94498)

Do not use auto, and the formatting is off.

92 ↗(On Diff #94498)

Surround push_back with quotes.

test/clang-tidy/performance-inefficient-vector-operation.cpp
53 ↗(On Diff #94498)

I'd like to see a test where you reserve 5 elements and then loop for 10 elements.

hokein updated this revision to Diff 94734.Apr 10 2017, 1:46 PM
hokein marked 7 inline comments as done.

Address review comments.

hokein added inline comments.Apr 10 2017, 1:52 PM
clang-tidy/performance/InefficientVectorOperationCheck.cpp
22 ↗(On Diff #94498)

Why does this check only focus on vector? Other containers allow you to preallocate space for their elements or lend themselves to inefficient operations.

Only a few std containers (vector, unordered_map, unorder_set) provide the preallocate ability ("reserve") to users. vector is the most widely used container with these inefficient operations in the real world.

Yes, we could extend this check to support other containers, but it will make the check more complicated (in the future, I plan to support for-range loops and for loops with iterators, and use other elegant fix-it solutions like "range insert" or "range constructor" in some specific cases rather than merely inserting reserve statements). So I think it makes more sense to make this check only focus on vector.

32 ↗(On Diff #94498)

Good catch. Added a test for it.

46–50 ↗(On Diff #94498)

Yeah, this is intended. We want to match typical for-loops with counters here. For-range loop will be supported in the future.

53–54 ↗(On Diff #94498)

The motivation of this check is to find code patterns like for (int i = 0; i < n; ++i) { v.push_back(i); } and clean them in our codebase (we have lots of similar cases). These are all cases we want to support. Using hasParent is a simple and sufficient way to do it IMO.

test/clang-tidy/performance-inefficient-vector-operation.cpp
53 ↗(On Diff #94498)

Even for this case, the check will also ignore it.

aaron.ballman added inline comments.Apr 11 2017, 7:01 AM
clang-tidy/performance/InefficientVectorOperationCheck.cpp
53–54 ↗(On Diff #94498)

I'm not convinced of the utility without implementing this in a more sensitive way. Have you run this across any large code bases and found that it catches issues?

hokein added inline comments.Apr 11 2017, 8:02 AM
clang-tidy/performance/InefficientVectorOperationCheck.cpp
53–54 ↗(On Diff #94498)

Yeah, the check catches ~2800 cases (regexp shows ~17,000 total usages) in our internal codebase. And all caught cases are what we are interested in. It would catch more if we support for-range loops and iterator-based for-loops.

aaron.ballman added inline comments.Apr 11 2017, 8:22 AM
clang-tidy/performance/InefficientVectorOperationCheck.cpp
53–54 ↗(On Diff #94498)

I wasn't worried about it not triggering often enough, I was worried about it triggering too often because of the lack of sensitivity. If you randomly sample some of those 2800 cases, do they reserve the space in a way that your check isn't catching?

hokein added inline comments.Apr 11 2017, 11:40 AM
clang-tidy/performance/InefficientVectorOperationCheck.cpp
53–54 ↗(On Diff #94498)

Ok, I see your concern now, thanks for pointing it out.

I have read through these caught cases. The results look reasonable. Most cases (> 95%) are what we expected, the code pattern is like vector<T> v; for (...) { v.push_back(...); } where the vector definition statement and for-loop statement are consecutive. Another option is to make the check more strict (only detects the consecutive code pattern).

aaron.ballman added inline comments.Apr 11 2017, 2:02 PM
clang-tidy/performance/InefficientVectorOperationCheck.cpp
53–54 ↗(On Diff #94498)

Okay, that sounds like it has utility then. Thank you for clarifying!

hokein updated this revision to Diff 94969.Apr 12 2017, 7:41 AM

Improve the way of detecting pre-allocation usage before the loop.

hokein marked an inline comment as done.Apr 12 2017, 7:44 AM

friendly ping @alexfh.

clang-tidy/performance/InefficientVectorOperationCheck.cpp
53–54 ↗(On Diff #94498)

I improved the way of catching the preallocations before the loop. Could you take a look again?

aaron.ballman added inline comments.Apr 13 2017, 7:20 AM
clang-tidy/performance/InefficientVectorOperationCheck.cpp
81 ↗(On Diff #94969)

Perhaps you could support other comparisons, but not attempt to generate a fix-it for them? It seems odd that this would trigger for < but not <=, but I can see why you'd not want to figure out the reserve call for !(foo >= 10).

83 ↗(On Diff #94969)

Thinking out loud, but, what happens if the loop end expression has some hidden complexity to it? e.g.,

for (int i = 0; i < i + 1; ++i) { // This is a "creative" loop.
  v.push_back(i);
}
84 ↗(On Diff #94969)

Can you add a test case for post-increment (all of your tests use pre-increment)? Also, count-down loops seem reasonable to support as well, no?

for (int i = 10; i >= 0; --i) {
  v.push_back(i);
}
86–87 ↗(On Diff #94969)

You should update the documentation to mention that this check only worries about a for loop with a single statement in it. It will be surprising that this code does not trigger the same diagnostic:

for (...) {
  std::cout << i;
  v.push_back(i);
}
95–96 ↗(On Diff #94969)

We don't usually add this test in to our check calls; why are you adding it here?

98 ↗(On Diff #94969)

Might as well make this a reference rather than a pointer (simplifies code elsewhere).

100–104 ↗(On Diff #94969)

Formatting (I would recommend running clang-format over the patch).

109–111 ↗(On Diff #94969)

I'm not certain what types are being used here. Can you turn AllVectorVarRefs into something with an explicit type so that I can know what Ref's type is?

117 ↗(On Diff #94969)

identified -> identify

clang-tidy/performance/InefficientVectorOperationCheck.h
20 ↗(On Diff #94969)

Drop the hyphen in for loops.

docs/clang-tidy/checks/performance-inefficient-vector-operation.rst
7 ↗(On Diff #94969)

The docs should talk more about the limitations of the check (like how many statements it can contain, etc).

hokein updated this revision to Diff 95174.Apr 13 2017, 11:16 AM
hokein marked 11 inline comments as done.

Address review comments.

Thanks for the comments.

clang-tidy/performance/InefficientVectorOperationCheck.cpp
81 ↗(On Diff #94969)

We could support other comparisons in the future. I think it is a good start to just support < in the first version as < usage happens more frequently than other comparisons.

Added a fixme.

83 ↗(On Diff #94969)

Good point. It will copy the expr "i+1" to "reserve" parameter. This kind of case should not be existed, but it is safer to filter out this case.

95–96 ↗(On Diff #94969)

It will prevent some unexpected things happened when the translation unit fails to compile.

We had a few experiences before. We encountered some misbehavior of some checks (e.g. https://reviews.llvm.org/rL294578) with a non-compilable TU, and then we added this statement to avoid the misbehavior.

aaron.ballman added inline comments.Apr 13 2017, 1:54 PM
clang-tidy/performance/InefficientVectorOperationCheck.cpp
95–96 ↗(On Diff #94969)

I think that this should be moved up into whatever layer calls check(), because it does not seem like the checks are actually valid if the TU has had a fatal error. However, that doesn't have to be part of this patch.

109–111 ↗(On Diff #94969)

I may not have been clear -- I don't mean that the variable name should contain type information, I mean that the type should not be automatically deduced. We only use auto when the type is spelled explicitly in the initialization or is otherwise obvious from context (like range-based for loops).

hokein updated this revision to Diff 95269.Apr 13 2017, 10:23 PM
hokein marked an inline comment as done.

Explicit the type.

clang-tidy/performance/InefficientVectorOperationCheck.cpp
109–111 ↗(On Diff #94969)

I'd prefer to use auto to initialize AllVectorVarRefExprs, as its type is SmallPtrSet<const DeclRefExpr *, 16>, which is a long and noisy name. Using auto can increases readability here.

aaron.ballman added inline comments.Apr 14 2017, 11:47 AM
clang-tidy/performance/InefficientVectorOperationCheck.cpp
116 ↗(On Diff #95269)

Formatting (and you can use auto here once you correct the type above).

109–111 ↗(On Diff #94969)

It doesn't increase readability when someone reviewing the code has no idea what types are involved and has to ask you for clarification. auto should only be used when the type is obvious from the context or spelling the type is *really* noisy. A single level of template goop is a very small amount of noise.

hokein updated this revision to Diff 95338.Apr 14 2017, 1:07 PM
hokein marked 2 inline comments as done.

Address remaining comments.

This revision is now accepted and ready to land.Apr 15 2017, 6:04 AM
This revision was automatically updated to reflect the committed changes.

@alexfh, I'm submitting the patch now. If you have following comments, I'm happy to address them afterwards.

Does it work for targeting msvc?
See also; http://bb.pgr.jp/builders/test-clang-tools-msc-x64-on-i686-linux-RA/builds/1043

Appeased in r300545. Would it be happier if it worked with "-target x86_64-win32"?

@chapuni, thanks for the workaround. It makes sense to make it work for targeting msvc. Will fix it in a follow-up.

alexfh edited edge metadata.Apr 18 2017, 10:27 AM

Awesome, thanks! A few late comments inline.

clang-tools-extra/trunk/clang-tidy/performance/InefficientVectorOperationCheck.cpp
56

It might make sense to make the list of types configurable to support custom vector-like types.

65

Loops with emplace_back would suffer from the same issue.

131

No need to namespace-qualify StringRef and Lexer.

133

Actual LangOptions should be used instead of a default-constructed instance. Same below.

143

Please remove the trailing period.