This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Inefficient string operation
ClosedPublic

Authored by bittnerbarni on May 12 2016, 12:35 AM.

Details

Summary

This checker is to warn about the performance overhead caused by concatenating strings using the operator+ instead of basic_string's append or operator+=. It is configurable. In strict mode it matches every instance of a supposed inefficient concatenation, in non-strict mode it matches only those which are inside a loop.

Diff Detail

Event Timeline

bittnerbarni retitled this revision from to [clang-tidy] Inefficient string operation.
bittnerbarni updated this object.
bittnerbarni added a reviewer: alexfh.
bittnerbarni added a subscriber: xazax.hun.

Please mention this check in docs/ReleaseNotes.rst (in alphabetical order).

docs/clang-tidy/checks/performance-inefficient-string-addition.rst
8 ↗(On Diff #56998)

Please highlight operator+ and other class/methods with ``. Same for other places in documentation.

21 ↗(On Diff #56998)

Please use constructor initialization instead of assignment.

58 ↗(On Diff #56998)

Please add new line.

test/clang-tidy/performance-inefficient-string-addition.cpp
1 ↗(On Diff #56998)

Does it really need to be C++11?

45 ↗(On Diff #56998)

Please add new line.

drive-by, some nits.

clang-tidy/performance/InefficientStringAdditionCheck.cpp
31 ↗(On Diff #56998)

nits: I kind of prefer :

const auto BasicStringType

here and below.

74 ↗(On Diff #56998)

nits: remove this line

docs/clang-tidy/checks/performance-inefficient-string-addition.rst
4 ↗(On Diff #56998)

line with "===" should be the same length than title.

11 ↗(On Diff #56998)

nits: delete one blank line

19 ↗(On Diff #56998)

remove extra blank line (only one)

45 ↗(On Diff #56998)

nits: spaces between operator "+"

Eugene.Zelenko added inline comments.May 13 2016, 7:16 AM
docs/ReleaseNotes.rst
191

Please highlight operator+ and operator+= with ``.

alexfh requested changes to this revision.May 14 2016, 5:37 PM
alexfh edited edge metadata.

The last diff has nothing but a 2-line change in the docs.

This revision now requires changes to proceed.May 14 2016, 5:37 PM
bittnerbarni edited edge metadata.

Sorry for uploading 2 line long diffs. I uploaded the whole diff now.

etienneb added inline comments.May 15 2016, 5:32 PM
clang-tidy/performance/InefficientStringAdditionCheck.cpp
35 ↗(On Diff #57294)

you should swap the two parameters

hasOverloadedOperatorName("+"),
hasAnyArgument(...)

matching 'hasOverloadedOperatorName' is less expensive

41 ↗(On Diff #57294)

this is not indented correctly.
could you run

% clang-format -style=file <your-file> -i
47 ↗(On Diff #57294)

space after ","

54 ↗(On Diff #57294)

You forgot a comment here?

bittnerbarni edited edge metadata.
clang-tidy/performance/InefficientStringAdditionCheck.cpp
84 ↗(On Diff #57346)

nit: Warning messages are not full sentences, so they should not start with a capital letter and should not end with a period.

Please also change the comma to a semicolon ;.

85 ↗(On Diff #57346)

std::basic_string::append is correct, but is an unnecessary detail. I'd suggest changing this to string::append or to just append().

alexfh requested changes to this revision.May 20 2016, 2:37 AM
alexfh edited edge metadata.
alexfh added inline comments.
clang-tidy/performance/InefficientStringAdditionCheck.cpp
28 ↗(On Diff #57894)

s/IsStrictMode/StrictMode/

31 ↗(On Diff #57894)

This check is only useful for C++. Please add

if (!getLangOpts().CPlusPlus)
  return;
84 ↗(On Diff #57894)

The root of inefficiency is in unnecessary allocations of temporary strings, so maybe we should note this in the warning message, e.g. "string concatenation results in allocation of unnecessary temporary strings; consider using 'operator+=' or 'string::append()' instead".

clang-tidy/performance/InefficientStringAdditionCheck.h
24 ↗(On Diff #57894)

Please remove the empty line.

25 ↗(On Diff #57894)

s/Addition/Concatenation/?

33 ↗(On Diff #57894)

const bool StrictMode;

docs/clang-tidy/checks/performance-inefficient-string-addition.rst
15 ↗(On Diff #57894)

Please enclose std::string, std::basic_string and other inline code snippets in double backquotes.

This revision now requires changes to proceed.May 20 2016, 2:37 AM
jbcoe added a subscriber: jbcoe.May 22 2016, 5:12 AM
bittnerbarni edited edge metadata.
alexfh requested changes to this revision.Jun 3 2016, 3:49 PM
alexfh edited edge metadata.
alexfh added inline comments.
clang-tidy/performance/InefficientStringConcatenationCheck.cpp
32

clang-format -style=file, please.

67

Each additional matcher has a certain overhead, so it's usually beneficial to merge matchers, if they share common parts. In this case, matchers can be rearranged like this:

Finder->addMatcher(exprWithCleanups(
  hasAncestor(anyOf(cxxForRangeStmt(), whileStmt(), forStmt())),
  anyOf(hasDescendant(AssingOperator), hasDescendant(PlusOperatorMatcher))));

However, the really serious problem with these matchers is the liberal usage of hasDescendant matcher, which traverses the whole subtree. Nested hasDescendant matchers are even worse. Note that the subtree of a statement can be rather large, especially if you consider lambdas. Apart from performance issues, hasDescendant and hasAncestor can yield unexpected results, in particular, when lambdas or local classes are in play. It's always better to use has, if you only need to match direct children and specifically match intermediate nodes, if the descendant you're interested in is always on a fixed depths.

Please try running your check (and maybe a few others for comparison, you can use clang-tidy -enable-check-profile to get the run time) on a few large translation units to estimate its performance and measure the improvement after changing the code.

docs/clang-tidy/checks/performance-inefficient-string-concatenation.rst
7

The problem subheader is the only one here. It doesn't seem like it helps making the document more structured.

9

s/is to warn/warns/
s/strings,/strings/

21

Please make the code snippets use LLVM formatting for consistency. In particular, add a space after for and remove the line break before the opening brace.

40

Add a space before the opening brace.

This revision now requires changes to proceed.Jun 3 2016, 3:49 PM
bittnerbarni edited edge metadata.
bittnerbarni marked 6 inline comments as done.

Removed the unnecessary hasDescendant calls and simplified the checker as suggested. Tested on LLVM codebase, with minor improvements in speed (~1%).

alexfh requested changes to this revision.Jun 13 2016, 5:31 AM
alexfh edited edge metadata.
alexfh added inline comments.
clang-tidy/performance/InefficientStringConcatenationCheck.cpp
42

s/Matcher//

49

s/Assing/Assign/

58

Please avoid unnecessary negation by putting the positive branch first, this will make the logic slightly simpler.

68
  1. The anyOf(hasAncestor(A), hasAncestor(B), ...) construct is still there. Please replace it with hasAncestor(anyOf(A, B, ...)).
  2. Is there really no way to change from hasDescendant / hasAncestor to more strict patterns? I believe, running the check on LLVM doesn't help finding performance issues, since LLVM specifically avoids this pattern by using Twine.
This revision now requires changes to proceed.Jun 13 2016, 5:31 AM
bittnerbarni added inline comments.Jun 13 2016, 12:33 PM
clang-tidy/performance/InefficientStringConcatenationCheck.cpp
68
  1. I was trying to write it like you suggested since the beginning, but it says it cannot deduce the template parameter that way, what's more interesting is that clang-query accepts it.
  2. It seems to me that the assignment operator is (always?) direct child of exprWithCleanups, but I'm not totally sure of that. As for the + operator it could be anywhere among the children.
bittnerbarni edited edge metadata.
alexfh added inline comments.Jul 4 2016, 5:32 AM
clang-tidy/performance/InefficientStringConcatenationCheck.cpp
51
  1. No need for allOf, node matchers are implicitly "all of".
  2. hasDescendant() is potentially the most expensive part here and should go after all other constraints.
69

For 1: if hasAncestor(anyOf(cxxForRangeStmt(), ...)) doesn't work, you can try hasAncestor(stmt(anyOf(cxxForRangeStmt(), ...)). In any case, repeated hasAncestor matchers don't make the check faster ;)

For 2: exprWithCleanups and other implicit nodes can be skipped using the ignoringImplicit() matchers. Leave hasAncestor for the cases, where there can actually be arbitrary nodes in the path.

Here you're matching some arbitrary intermediate node (exprWithCleanups) and then you're constraining its parents and its children. This doesn't make much sense and it's rather inefficient. Since you're interested in the operators in the first place, try rearranging the matcher like this:

cxxOperatorCallExpr(
  anyOf(AssignOperator, PlusOperator),
  hasAncestor(stmt(anyOf(cxxForRangeStmt(), whileStmt(), forStmt()))))
alexfh requested changes to this revision.Jul 4 2016, 5:32 AM
alexfh edited edge metadata.
This revision now requires changes to proceed.Jul 4 2016, 5:32 AM
bittnerbarni edited edge metadata.

Thank you, for your valuable comments Alexander!

A few more nits.

clang-tidy/performance/InefficientStringConcatenationCheck.cpp
51

The allOf(declRefExpr(x), declRefExpr(y)) construct can be replaced with declRefExpr(x, y).

55

Indentation is confusing - as though hasDeclaration is an argument of stmt. Is it a result of clang-format?

64

hasAncestor is potentially more expensive, so it should go last.

alexfh requested changes to this revision.Jul 19 2016, 6:53 AM
alexfh edited edge metadata.
This revision now requires changes to proceed.Jul 19 2016, 6:53 AM
bittnerbarni edited edge metadata.
bittnerbarni marked an inline comment as done.
bittnerbarni added inline comments.
clang-tidy/performance/InefficientStringConcatenationCheck.cpp
56

Yes this is the result of clang-format.

A couple of nits.

clang-tidy/performance/InefficientStringConcatenationCheck.cpp
52

As noted earlier, there's no need to repeat declRefExpr. This should be:

declRefExpr(BasicStringType, hasDeclaration(decl().bind("lhsStrT")))
clang-tidy/performance/InefficientStringConcatenationCheck.h
26

Please format the file with clang-format -style=llvm.

Prazek edited edge metadata.Jul 25 2016, 11:36 AM
Prazek added a project: Restricted Project.
alexfh requested changes to this revision.Aug 2 2016, 3:48 AM
alexfh edited edge metadata.

Sorry for the delay. I almost missed that there was an update to the patch. Please mark addressed comments as "Done". This way it's easier for me to track changes and it's easier for you not to miss review comments.

One more thing that this check lacks is fix-it hints, however, this can be added later.

clang-tidy/performance/InefficientStringConcatenationCheck.h
27

The formatting is still off (specifically, indentation of public: and private: and the lack of an empty line before private:).

This revision now requires changes to proceed.Aug 2 2016, 3:48 AM
bittnerbarni edited edge metadata.
bittnerbarni marked 14 inline comments as done.
alexfh accepted this revision.Aug 2 2016, 4:43 PM
alexfh edited edge metadata.

Looks good. Thank you for working on this!

Do you need me to commit the patch for you?

This revision is now accepted and ready to land.Aug 2 2016, 4:43 PM

Thank you for all the assistance. Could you please do that?

Prazek added a subscriber: Prazek.Aug 2 2016, 9:11 PM

Thank you for all the assistance. Could you please do that?

btw obtaining commit access and commiting is very simple, so if you are planning to send us some more cool patches then I think you should get the commit access :)

I'm planning to submit more patches in the future, as I have time for them. So it wouldn't be in vain :)

Prazek added a comment.Aug 2 2016, 9:26 PM

I'm planning to submit more patches in the future, as I have time for them. So it wouldn't be in vain :)

http://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

This revision was automatically updated to reflect the committed changes.

Please see PR28836. In some cases check should recommend to use insert().