This is an archive of the discontinued LLVM Phabricator instance.

Add -Wrange-loop-analysis to warn when a range-based for-loop is creating a copy.
ClosedPublic

Authored by rtrieu on Jun 16 2014, 10:18 PM.

Details

Summary

This warning will trigger on loop variables that make copies in a range-based for-loop. The three cases this warning catches are:

  1. for (const Foo &x : Foos), where the range Foos does not return a copy. This warning will suggest using the non-reference type so the copy is obvious.
  2. for (const Foo x : Foos), where the range Foos does return a reference, but is copied into x. This warning will suggest using the reference type to prevent a copy from being made.
  3. for (const Bar &x : Foos), where Bar is constructed from Foo. In this case, suggest using the non-reference "const Bar" to indicate a copy is intended to be made, or "const Foo &" to prevent a copy from being made.

-Wrange-loop-analysis is being added as a sub-group to -Wloop-analysis. The previous warnings there are moved to -Wfor-loop-analysis. While the warnings are currently split along statement types, a finer grain division of warnings may be needed.

Diff Detail

Event Timeline

rtrieu updated this revision to Diff 10476.Jun 16 2014, 10:18 PM
rtrieu retitled this revision from to Add -Wrange-loop-analysis to warn when a range-based for-loop is creating a copy..
rtrieu updated this object.
rtrieu edited the test plan for this revision. (Show Details)
rtrieu added a subscriber: Unknown Object (MLST).
dblaikie added inline comments.
lib/Sema/SemaStmt.cpp
2386

I think Alp's been trying to move to using an "isIgnored" function to do this test for some reason - I haven't looked into the motivation, but it might be something to check if you should be consistent with.

2398

What are the cases where a for loop has no loop variable? (or the loop variable has no init, down on line 2382)

2411

Might be worth splitting out these cases as separate functions, maybe?

2474

The header comment for this function doesn't mention this case of Foo v Bar, it just mentions:

Foo foos[5];
for (const Foo f : foos);

->

for (const Foo &f : foos);

And the implementation looks like that's the intent of the code. Is the comment out of date/wrong?

2477

Not sure if it'd be more readable, but you could just assign:

MakeCopy = CE->getConstructor()->isCopyConstructor();

2480

Same assignment possible here

2492

return here seems a bit unnecessary (and/or as above, this whole case might be nice as a separate function for readability)

test/SemaCXX/warn-range-loop-analysis.cpp
15

I'm not sure if this merits a change to the test code, but this is a bit surprising/out-of-character to have a container of T return T by value from its iterator's op*. (usually it would return a reference to T)

42

I think this is going to have a pretty high false positive rate... do you have some numbers?

47

If these test functions provide logical grouping, perhaps they could be named based on that grouping, or have a comment describing it?

48

any reason for the typedef here, but not in test0 above?

51

Looks the same as the previous tests? (line 33, specifically)

If this test is to ensure that a container wrapped in a typedef is handled correctly - what kind of problems did you have with this situation that you had to handle/correct for?

74

I worry about people creating expensive copies this way, but maybe that'll need to be a clang-tidy check rather than a warning.

84

Same questions as above

rtrieu updated this revision to Diff 10909.Jun 26 2014, 6:28 PM
rtrieu added inline comments.Jun 26 2014, 6:44 PM
lib/Sema/SemaStmt.cpp
2386

Phabricator doesn't show it, but the revision this patch was based on preceded Alp's patch, so using the isIgnored wasn't an option at the time. I have rebased to a later revision and now use it.

2398

There is not guarantee that this is a well-formed for loop. It is possible that ill-formed code created a less that perfect AST, which we should protect against.

2411

Done.

2474

Confusing comment removed.

2477

Used an early return in the separate function instead of the MakeCopy variable.

2480

Same as above.

2492

Removed.

test/SemaCXX/warn-range-loop-analysis.cpp
15

Using two separate containers now. Container uses reference returning iterators. ContainerNonReference uses non-reference returning iterators.

42

What is your idea of a false positive?

47

Test comments added above.

48

Removed typedef. Relevant information moved to test comments above.

51

test0 checks the entire warning and note messages while the remainder of the tests focuses on thoroughly testings the different combinations of types.

Typedefs have been removed as nothing tests them.

74

Since there are only two forms of const loop variables, this one must be the one accepted for copies. Otherwise, we would have to suggest something like:

for (const int x : A) {
  const Bar y(x);
}

in order to silence the warning.

84

Same answers as above.

dblaikie added inline comments.Jun 27 2014, 10:04 AM
lib/Sema/SemaStmt.cpp
2387

This check is already performed in the caller - I think it's OK to just omit it entirely here (and in DiagnoseForRangeConstVariableCopies) but if you think it's better for readability, perhaps it could be an assert instead?

test/SemaCXX/warn-range-loop-analysis.cpp
42

Users are taught to pass ints to functions by value, rather than reference, to avoid the indirection overhead. The same logic applies (to a degree) to for loops. It's not unreasonable to write "for (const int i : my_vec_of_ints)".

I don't think this case is likely to be a good thing to warn on and we may need a heuristic of "if a type is 'large' enough (or has a non-trivial copy), warn when a copy is made". StringRef would be a similar situation - I can imagine lots of people (I bet we've got a few in Clang/LLVM already) looping over StringRef's by value.

Do you have some stats on running this warning in its current form over LLVM/Clang? (and Google, for that matter, or any other substantial codebase).

Looking at other test cases, this wouldn't warn if the "const" was removed? Perhaps we just need a fix to disregard cv qualifiers, then?

51

Struck me as a bit strange to have the duplication, but I guess you chose it for consistency? (so this tests each case in turn, doesn't have strangely missing cases from the pattern just because they were covered elsewhere?) I'm OK with that.

74

Honestly, I think that might be worthwhile - but would need stats to understand it better. And there's no harm in the first version of the warning being conservative - more aggressive diagnosis can be done incrementally with relevant data.

rtrieu added inline comments.Jul 9 2014, 6:33 PM
test/SemaCXX/warn-range-loop-analysis.cpp
42

LLVM/Clang has a total of 5 warnings emitted for this warning. (Where changing the variable to reference type will prevent a copy.) Two are pairs from maps, one is a CXXBaseSpecifier, one is QualType, and one is a ParmVarDecl pointer. From your description, the QualType and Decl pointer are both small and trivially copyable so they should be exempt from this warning, so a 40% drop in cases with your suggested filter.

Applying the same filter to results from Google, the drop is closer to 70%. The filter is to ignore trivially copyable types which are 2x pointer size or smaller.

The const is a good indication that the variable is read-only, so we can suggest using a const-reference as the cheaper option. Without the const, we don't know if the use in the loop was intended only for the loop copy or to change the object in the container as well.

dblaikie added inline comments.Jul 15 2014, 9:19 AM
test/SemaCXX/warn-range-loop-analysis.cpp
42

You mention a 70% drop in Google - but what does a random sample of the remaining (and omitted) cases look like?

I'm trying to understand if this warning is good or not, as-is, which means we need a reasonable false positive % estimate/number to decide whether it needs to be tweaked further to reduce false positives (that's the main thing - if it's got lots of false negatives that we can gain back, that can be done in separate patches, but going in-tree with high false positive rates is generally frowned upon)

Perhaps I'm missing something where you described this already.

rtrieu updated this revision to Diff 12976.Aug 26 2014, 5:38 PM

Don't warn when POD types are copied. It's likely that some of the larger POD types should be warned on, but a reasonable limit is needed since copies of smaller POD types is a regular occurrence.

You could probably be a bit narrower than POD types - probably just
types with trivial copy constructors. But for now "all POD types"
shouldn't have any false positives, only false negatives - so perhaps
leave it that way with a FIXME Describing a narrower check for small
types (small to be defined/discovered) with trivial copy construction.

rtrieu updated this revision to Diff 20808.Feb 26 2015, 6:24 PM

Update patch to be based on a newer revision.

You could probably be a bit narrower than POD types - probably just
types with trivial copy constructors. But for now "all POD types"
shouldn't have any false positives, only false negatives - so perhaps
leave it that way with a FIXME Describing a narrower check for small
types (small to be defined/discovered) with trivial copy construction.

Have you addressed these suggestions?

I don't recall where this was all left, exactly.

You could probably be a bit narrower than POD types - probably just
types with trivial copy constructors. But for now "all POD types"
shouldn't have any false positives, only false negatives - so perhaps
leave it that way with a FIXME Describing a narrower check for small
types (small to be defined/discovered) with trivial copy construction.

Have you addressed these suggestions?

I don't recall where this was all left, exactly.

Currently, all POD types for copies are ignored. There is a comment to only ignore types with trivial constructors and to figure out a proper size for small in a future revision.

Results from running the warning on LLVM projects. The actual errors are at the end. In total, this warning was triggered 22 times. The breakdown is:

7 - for (const Foo &x : Foos), where the range Foos only return a copy. Suggest using the non-reference type so the copy is obvious.
5 - for (const Foo x : Foos), where the range Foos does return a reference, but is copied into x. Suggest using the reference type to prevent a copy from being made.
10 - for (const Bar &x : Foos), where Bar is constructed from Foo. Suggest using the non-reference "const Bar" to indicate a copy is intended to be made, or "const Foo &" to prevent a copy from being made.

llvm/lib/Analysis/RegionPass.cpp:197:22: error: loop variable 'BB' is always a copy because the range of type 'llvm::iterator_range<llvm::RegionBase<llvm::RegionTraits<llvm::Function> >::block_iterator_wrapper<false> >' does not return a reference
llvm/lib/Analysis/RegionPass.cpp:197:10: note: use non-reference type 'llvm::BasicBlock *const'
llvm/lib/Analysis/RegionPrinter.cpp:126:22: error: loop variable 'BB' is always a copy because the range of type 'llvm::iterator_range<llvm::RegionBase<llvm::RegionTraits<llvm::Function> >::block_iterator_wrapper<true> >' does not return a reference
llvm/lib/Analysis/RegionPrinter.cpp:126:10: note: use non-reference type 'llvm::BasicBlock *const'
llvm/lib/MC/MCDwarf.cpp:814:19: error: loop variable 'sec' of type 'const std::pair<const llvm::MCSection *, std::pair<llvm::MCSymbol *, llvm::MCSymbol *> >' creates a copy from type 'const std::pair<const llvm::MCSection *, std::pair<llvm::MCSymbol *, llvm::MCSymbol *> >'
llvm/lib/MC/MCDwarf.cpp:814:8: note: use reference type 'const std::pair<const llvm::MCSection *, std::pair<llvm::MCSymbol *, llvm::MCSymbol *> > &' to prevent copying
llvm/lib/Object/COFFObjectFile.cpp:263:31: error: loop variable 'SymbI' has type 'const llvm::object::symbol_iterator &' but is initialized with type 'const llvm::object::SymbolRef' resulting in a copy
llvm/lib/Object/COFFObjectFile.cpp:263:8: note: use non-reference type 'const llvm::object::symbol_iterator' to keep the copy or type 'const llvm::object::SymbolRef &' to prevent copying
llvm/lib/Target/AArch64/AArch64CollectLOH.cpp:331:25: error: loop variable 'Entry' of type 'const llvm::detail::DenseMapPair<unsigned int, unsigned int>' creates a copy from type 'const llvm::detail::DenseMapPair<unsigned int, unsigned int>'
llvm/lib/Target/AArch64/AArch64CollectLOH.cpp:331:14: note: use reference type 'const llvm::detail::DenseMapPair<unsigned int, unsigned int> &' to prevent copying
llvm/lib/Transforms/Scalar/StructurizeCFG.cpp:889:20: error: loop variable 'BB' is always a copy because the range of type 'llvm::iterator_range<llvm::RegionBase<llvm::RegionTraits<llvm::Function> >::block_iterator_wrapper<false> >' does not return a reference
llvm/lib/Transforms/Scalar/StructurizeCFG.cpp:889:8: note: use non-reference type 'llvm::BasicBlock *const'
llvm/tools/clang/lib/AST/MicrosoftMangle.cpp:1614:25: error: loop variable 'Arg' of type 'const clang::QualType' creates a copy from type 'const clang::QualType'
llvm/tools/clang/lib/AST/MicrosoftMangle.cpp:1614:10: note: use reference type 'const clang::QualType &' to prevent copying
llvm/tools/clang/lib/CodeGen/CodeGenModule.cpp:3439:19: error: loop variable 'I' of type 'const llvm::detail::DenseMapPair<const clang::Decl *, bool>' creates a copy from type 'const llvm::detail::DenseMapPair<const clang::Decl *, bool>'
llvm/tools/clang/lib/CodeGen/CodeGenModule.cpp:3439:8: note: use reference type 'const llvm::detail::DenseMapPair<const clang::Decl *, bool> &' to prevent copying
llvm/tools/clang/lib/CodeGen/CoverageMappingGen.cpp:945:22: error: loop variable 'Arg' is always a copy because the range of type 'llvm::iterator_range<clang::ConstExprIterator>' does not return a reference
llvm/tools/clang/lib/CodeGen/CoverageMappingGen.cpp:945:10: note: use non-reference type 'const clang::Expr *const'
llvm/tools/clang/lib/Sema/SemaLookup.cpp:3026:20: error: loop variable 'R' is always a copy because the range of type 'llvm::iterator_range<clang::DeclContext::all_lookups_iterator>' does not return a reference
llvm/tools/clang/lib/Sema/SemaLookup.cpp:3026:8: note: use non-reference type 'const llvm::MutableArrayRef<clang::NamedDecl *>'
llvm/tools/clang/lib/Sema/SemaTemplateInstantiate.cpp:1776:19: error: loop variable 'Base' of type 'const clang::CXXBaseSpecifier' creates a copy from type 'const clang::CXXBaseSpecifier'
llvm/tools/clang/lib/Sema/SemaTemplateInstantiate.cpp:1776:8: note: use reference type 'const clang::CXXBaseSpecifier &' to prevent copying
llvm/tools/clang/tools/extra/clang-rename/USRFindingAction.cpp:49:20: error: loop variable 'CtorDecl' is always a copy because the range of type 'llvm::iterator_range<clang::DeclContext::specific_decl_iterator<clang::CXXConstructorDecl> >' does not return a reference
llvm/tools/clang/tools/extra/clang-rename/USRFindingAction.cpp:49:8: note: use non-reference type 'clang::CXXConstructorDecl *const'
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:299:68: error: loop variable 'VFTableEntry' has type 'const std::pair<std::pair<StringRef, uint64_t>, StringRef> &' but is initialized with type 'std::pair<const std::pair<llvm::StringRef, unsigned long>, llvm::StringRef>' resulting in a copy
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:299:8: note: use non-reference type 'const std::pair<std::pair<StringRef, uint64_t>, StringRef>' to keep the copy or type 'const std::pair<const std::pair<llvm::StringRef, unsigned long>, llvm::StringRef> &' to prevent copying
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:306:58: error: loop variable 'VBTable' has type 'const std::pair<StringRef, ArrayRef<little32_t> > &' but is initialized with type 'std::pair<const llvm::StringRef, llvm::ArrayRef<llvm::support::detail::packed_endian_specific_integral<int, llvm::support::endianness::little, 1> > >' resulting in a copy
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:306:8: note: use non-reference type 'const std::pair<StringRef, ArrayRef<little32_t> >' to keep the copy or type 'const std::pair<const llvm::StringRef, llvm::ArrayRef<llvm::support::detail::packed_endian_specific_integral<int, llvm::support::endianness::little, 1> > > &' to prevent copying
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:314:59: error: loop variable 'COLPair' has type 'const std::pair<StringRef, CompleteObjectLocator> &' but is initialized with type 'std::pair<const llvm::StringRef, CompleteObjectLocator>' resulting in a copy
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:314:8: note: use non-reference type 'const std::pair<StringRef, CompleteObjectLocator>' to keep the copy or type 'const std::pair<const llvm::StringRef, CompleteObjectLocator> &' to prevent copying
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:323:62: error: loop variable 'CHDPair' has type 'const std::pair<StringRef, ClassHierarchyDescriptor> &' but is initialized with type 'std::pair<const llvm::StringRef, ClassHierarchyDescriptor>' resulting in a copy
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:323:8: note: use non-reference type 'const std::pair<StringRef, ClassHierarchyDescriptor>' to keep the copy or type 'const std::pair<const llvm::StringRef, ClassHierarchyDescriptor> &' to prevent copying
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:331:68: error: loop variable 'BCAEntry' has type 'const std::pair<std::pair<StringRef, uint64_t>, StringRef> &' but is initialized with type 'std::pair<const std::pair<llvm::StringRef, unsigned long>, llvm::StringRef>' resulting in a copy
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:331:8: note: use non-reference type 'const std::pair<std::pair<StringRef, uint64_t>, StringRef>' to keep the copy or type 'const std::pair<const std::pair<llvm::StringRef, unsigned long>, llvm::StringRef> &' to prevent copying
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:338:57: error: loop variable 'BCDPair' has type 'const std::pair<StringRef, BaseClassDescriptor> &' but is initialized with type 'std::pair<const llvm::StringRef, BaseClassDescriptor>' resulting in a copy
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:338:8: note: use non-reference type 'const std::pair<StringRef, BaseClassDescriptor>' to keep the copy or type 'const std::pair<const llvm::StringRef, BaseClassDescriptor> &' to prevent copying
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:349:52: error: loop variable 'TDPair' has type 'const std::pair<StringRef, TypeDescriptor> &' but is initialized with type 'std::pair<const llvm::StringRef, TypeDescriptor>' resulting in a copy
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:349:8: note: use non-reference type 'const std::pair<StringRef, TypeDescriptor>' to keep the copy or type 'const std::pair<const llvm::StringRef, TypeDescriptor> &' to prevent copying
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:359:68: error: loop variable 'VTTPair' has type 'const std::pair<std::pair<StringRef, uint64_t>, StringRef> &' but is initialized with type 'std::pair<const std::pair<llvm::StringRef, unsigned long>, llvm::StringRef>' resulting in a copy
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:359:8: note: use non-reference type 'const std::pair<std::pair<StringRef, uint64_t>, StringRef>' to keep the copy or type 'const std::pair<const std::pair<llvm::StringRef, unsigned long>, llvm::StringRef> &' to prevent copying
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:366:47: error: loop variable 'TIPair' has type 'const std::pair<StringRef, StringRef> &' but is initialized with type 'std::pair<const llvm::StringRef, llvm::StringRef>' resulting in a copy
llvm/tools/llvm-vtabledump/llvm-vtabledump.cpp:366:8: note: use non-reference type 'const std::pair<StringRef, StringRef>' to keep the copy or type 'const std::pair<const llvm::StringRef, llvm::StringRef> &' to prevent copying
llvm/utils/TableGen/CodeGenRegisters.cpp:1773:24: error: loop variable 'SUI' is always a copy because the range of type 'const RegUnitList' (aka 'const SparseBitVector<>') does not return a reference
llvm/utils/TableGen/CodeGenRegisters.cpp:1773:12: note: use non-reference type 'const unsigned int'

dblaikie added inline comments.Mar 4 2015, 1:57 PM
lib/Sema/SemaStmt.cpp
2412

Avoid branch-to-unreachable, just uncoditionally cast<MaterializeTemporaryExpr> in the previous block (as an else, not an else-if)

2427

Prefer an assertion rather than branch-to-unreachable

(eg: change the else if to an else, then just use cast<CXXOperatorCallExpr> since it must be true)

test/SemaCXX/warn-range-loop-analysis.cpp
14

Maybe just have one Container class that takes the iterator element type as a parameter? (T, T&, etc) - essentially your ContainerNonReference (renamed), and just instantiate it with T& if you want the reference version.

18

Is it much hassle to just have the code work in such a way that it doesn't have errors?

24

I'd probably jsut make all these classes structs & skip the access modifiers - there's not much to hide/fence off in a test case.

88

We seem to have some varied opinions on whether to drop 'const' when removing a reference - is their diagnostic precedence in clang we can draw from where we already suggest dropping refs (to see if we also suggest dropping const there too)?

I seem to recall this coming up in some discussion about a clang-tidy fix recently (the pass-by-value fixit hint, which is very similar to this warning)

99

Having the comments /after/ the code in question is a bit hard to read - is that intentional? Could it be the other way around, with the comment coming before the code?

181

typo, 'no' rather than 'now'?

293

Is the array case in any way differently handled in the code itself? If not, I probably wouldn't bother testing them (here, or the previous 2 test functions)

rtrieu updated this revision to Diff 23119.Apr 1 2015, 10:02 PM
rtrieu updated this revision to Diff 23120.Apr 1 2015, 10:07 PM
rtrieu added inline comments.Apr 1 2015, 10:47 PM
lib/Sema/SemaStmt.cpp
2412

Done.

2427

Done.

test/SemaCXX/warn-range-loop-analysis.cpp
14

Done.

18

Yes, we could remove all the non-const reference bindings that fail. Each test tries to bind/assign a type 4 different ways, T, T&, const T, and const T&. Since the other type is convertible, binding to T& sometimes will fail. I think we should keep them for completeness.

24

Done.

88

I don't recall how we handle removing references in other cases. Here, the code author had marked to loop variable const, making the intent clear that the const should be kept.

99

The @ after the expected diagnostic can work both ways, using minus to refer to earlier lines and plus to refer to later lines. In grep, I found 130 files with @-NUM while there were 120 files with @+num.

181

Fixed.

293

The arrays use a pointer type as their iterator. The Container struct above used a user-defined iterator. The code path is slightly different for the two.

dblaikie accepted this revision.Apr 10 2015, 9:46 AM
dblaikie added a reviewer: dblaikie.

Optional feedback left (just restating stuff I already said, perhaps it sounds good to you, but don't feel like you have to agree with me on any of those). Otherwise looks good, commit away!

test/SemaCXX/warn-range-loop-analysis.cpp
18

Could we just leave a comment in, explaining that this particular combination is invalid? (that way each test is consistent/follows the same formula, but we just explain why this particular intersection isn't relevant/tested)

I'm just a little scared of semi-random note/error-ignorance in a test & how that might hide some other behavior, maybe... not sure. I'm willing to set aside my preference here, though - said my piece & leave it up to you.

88

I don't think the intent is that clear - it's pretty common to mark references const when you're not planning to mutate them (sometimes out of necessity, because an API only provides a const reference, so you have to follow suit) - but it's pretty rare to put const on local value variables.

I cc'd you on an an internal bug discussing similar behavior in clang-tidy checks for passing by value (I'm not sure if this clang-tidy check is public, the discussion just happened to be internal) which, by the sounds of things, does what I'm suggesting here, and drops const when dropping reference-ness.

I think that's probably the right thing to do.

99

Curious - that seems counter to the usual habit, when writing normal code, to put a comment before the code it applies to. I wouldn't mind breaking in that direction given that there's sufficient variation in the population (though, admittedly, my grepping showed up 173:134 in favor of trailing @- rather than leading @+.. *shrug*)

293

Ah, OK.

This revision is now accepted and ready to land.Apr 10 2015, 9:46 AM
This revision was automatically updated to reflect the committed changes.

This has been committed with dblaikie's suggestions to drop the const when getting the non-reference type and commenting out the loops which produce errors in the test case.