Page MenuHomePhabricator

[clang-tidy] Add check 'modernize-use-algorithm'
AbandonedPublic

Authored by JDevlieghere on Jul 23 2016, 2:12 AM.

Details

Summary

This check emits a warning when memcpy or memset is used and suggest replacing it with a call to std::copy and std::fill respectively.

For the motivation see modernize-use-algorithm.rst

Taken from: https://llvm.org/bugs/show_bug.cgi?id=22209

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
etienneb added inline comments.Jul 25 2016, 2:26 PM
clang-tidy/modernize/UseAlgorithmCheck.cpp
74

It's more efficient to use hasAnyName and merge the three following matcher.
That way, you won't need the function 'getCallExpr' to retrieve the CallExpr.

81

allOf is not needed.

107

Instead of DirectCallee, you could add a "bind" to the Matcher : functionDecl(...).bind("callee")

108

What is the name is not in the map?

etienneb added a subscriber: etienneb.

hmm It seems that I mislead you, I suck at C api - memmove source and destination can overlap, but std::move can't. So I guess you have to remove the memmove support. Sorry.

On windows (MSVC CRT), both function are the same : memmove.

It is possible that the call to move with some trivialy copyable object would end up calling memmove, even if the call is wrong.

quote from http://en.cppreference.com/w/cpp/algorithm/move

d_first - the beginning of the destination range. If d_first is within [first, last), std::move_backward must be used instead of std::move.

But this is only the special ability of memmove, so we should not introduce stl-implementation dependent bugs

Thanks for the reviews everyone!

I'm currently still stuck on an issue I discovered when processing LLVM. I asked for help on the mailing list [0] but maybe someone here can help. Basically the issue is that sometimes pointers are passed to memcpy for which the types are not assignable. That's okay for memcpy because it takes its arguments as void pointers, but of course this causes std::copy to complain.

I want my pass to warn the user about this issue but skip the substitution, but I don't know how to check for this case...

[0] http://lists.llvm.org/pipermail/cfe-dev/2016-July/050105.html

JDevlieghere retitled this revision from [clang-tidy] Add check 'misc-replace-memcpy' to [clang-tidy] Add check 'modernize-use-algorithm'.
JDevlieghere updated this object.
JDevlieghere set the repository for this revision to rL LLVM.
  • Addressed concerns raised by Etienne Bergeron
  • Show warning when using void pointers but don't replace
  • Don't make replacement when destination is const
  • Don't make replacement when types are not compatible

I see you solved the void and conmpatible types problems. Excellent!

Can you post a patch with changes after running LLVM? I would wait for Alex or Aaron to accept it.

clang-tidy/modernize/UseAlgorithmCheck.cpp
59–61

Try just initializer list instead of creating vector here.
Just
for (const auto &keyValue : {std::make_pair("memcpy", "std::copy"), {"memset", "std::fill"}}

You just have to specify the type of the first argument and it will try to convert each next to it.

99–100

this should not happen. You can change it to assert

140

Would it be the same to check here for the first argument instead of second?
So then you could take this if before the if(MatchedName == "memset")

docs/clang-tidy/checks/modernize-use-algorithm.rst
38

You should add 2 notes here

  1. The check runs just with C++.
  2. There are some rare cases that require adding parens (and put example), so the check could right now generate wrong output. (which is something that I would not require to have, but it is good to leave not about it in documentation)

Also leave Fixit comment in the check.

JDevlieghere updated this revision to Diff 66350.EditedAug 1 2016, 11:58 AM

Addressed comments from Piotr Padlewski

LLVM compiles with the latest version of this check, however some tests are failing. I'll investigate the cause of this and update this check if it can be prevented.

Prazek edited edge metadata.Aug 1 2016, 12:36 PM

Addressed comments from Piotr Padlewski

LLVM compiles with the latest version of this check, however some tests are failing. I'll investigate the cause of this and update this check if it can be prevented.

That's weird. I wonder what happen, because the transformations looks legit.

aaron.ballman requested changes to this revision.Aug 3 2016, 6:56 AM
aaron.ballman edited edge metadata.
aaron.ballman added inline comments.
clang-tidy/modernize/UseAlgorithmCheck.cpp
26

I'd like to see some test cases involving attributed types, typedefs to pointers and non-pointers, and multidimensional arrays. For instance, I'm curious how well this handles typedef int some_type; typedef some_type * const * volatile *foo_ptr; or typedef void (__cdecl fn_ptr)(int);

50–51

Please use Twine for this sort of thing.

66

Missing full stop.

94

Please do not use auto here as the type is not spelled out in the RHS (same elsewhere).

97

You should ensure that the callee is actually the function we care about. Consider:

namespace awful {
void memcpy(int, int, int);
}
using namespace awful;

Such a use of memcpy() would trigger your check and create an invalid replacement, I think (and there should be a test for this).

98

Please add some "help text" to the assert in case it ever triggers. e.g., && "Replacements list does not match list of registered matcher names" or some such.

102

Diagnostics should start with a lowercase letter.

More importantly, this warning doesn't actually describe a problem to the user. Given the fact that this only should be triggered on uses where memcpy and std::copy are interchangeable, it's not really a *warning* at all (due to the interchangeability) as there is nothing dangerous about the original code that the suggestion will fix. @alexfh, what do you think of the idea of this being a Note rather than a Warning? I know it's unorthodox, but literally every instance of this diagnostic can be ignored or replaced and there should be no semantic effect on the code either way. Most other checks in modernize have more obvious benefits to modifying the code and so Warning is reasonably appropriate.

If this text remains a warning, it should describe what is dangerous with the code, not simply how to transform the code. Perhaps "'memcpy' reduces type-safety, consider using 'std::copy' instead" or something along those lines?

108–109

What about parens? e.g., memcpy((foo + 12), (bar), (a + b - 10));

119

Same for function pointer types.

138

Missing a colon at the end of this line.

clang-tidy/modernize/UseAlgorithmCheck.h
2

Line length does not match the closing ASCII art line length.

21

This is no longer accurate, and is also missing the full stop.

docs/clang-tidy/checks/modernize-use-algorithm.rst
7

Spurious comma.

8

Comma before "respectively".

9

Comma before "and" because Oxford commas are the best commas.

11–12

Remove the "Furthermore," as this is a continuation of the "type-aware" statement above?

test/clang-tidy/modernize-use-algorithm.cpp
3

Remove blank line.

Missing tests for unqualified uses of calling memcpy() and memset().

27

The asterisk binds to the identifier -- should run clang-format over the file.

This revision now requires changes to proceed.Aug 3 2016, 6:56 AM
JDevlieghere edited edge metadata.
JDevlieghere marked 21 inline comments as done.

Addresses comments from Aaron Ballman

@aaron.ballman Thanks for the thorough review! Can you check whether the tests I added address your concerns? Could you also elaborate on the case with the C-function pointer? Unless I explicitly cast it to void* the compiler rejects will reject it as an argument to memcpy. Am I missing a case where this could go wrong? I still added it to the pointer arithmetic check though, just to be sure.

Prazek added a comment.Aug 3 2016, 8:30 PM

Addresses comments from Aaron Ballman

@aaron.ballman Thanks for the thorough review! Can you check whether the tests I added address your concerns? Could you also elaborate on the case with the C-function pointer? Unless I explicitly cast it to void* the compiler rejects will reject it as an argument to memcpy. Am I missing a case where this could go wrong? I still added it to the pointer arithmetic check though, just to be sure.

Did you manage to see what was wrong in the transformation that you did on LLVM?

I have one idea, which should be fixed. Memset takes char as an argument, so if you would fill the array of ints with memset(arr, 0, sizeof(arr)), then it will set all
the elements to zero. But if you would do it with with set say 1, you will end up having

int val = 1 | (1 << 8) | (1 << 16) | (1 << 24)

for each elemnt of the array.

I don't think tho that any LLVM code depend od this, you usually set everything to zero, or to another value if you operate on char type.

So allow transformation and diagnostic if the argument equals 0, or when the type is char.

Addresses comments from Aaron Ballman

@aaron.ballman Thanks for the thorough review! Can you check whether the tests I added address your concerns? Could you also elaborate on the case with the C-function pointer? Unless I explicitly cast it to void* the compiler rejects will reject it as an argument to memcpy. Am I missing a case where this could go wrong? I still added it to the pointer arithmetic check though, just to be sure.

The tests added mostly cover them -- I elaborated on the function pointer case, which I don't *think* will go wrong with this check, but should have a pathological test case for just to be sure.

clang-tidy/modernize/UseAlgorithmCheck.cpp
27

Consider a pathological case like:

#include <string.h>

void f();

int main() {
  typedef void (__cdecl *fp)();
  
  fp f1 = f;
  fp f2;
  memcpy(&f2, &f1, sizeof(fp)); // transforms into std::copy(&f1, &f1, &f2); ?
}

The calling convention is important in the test because getStrippedType() isn't looking through attributed types, which I *think* that declaration would rely on. I don't think it will cause problems in this case, but the test will ensure it.

54–55

A more idiomatic way to write this is: return (Twine(Function) + "(" + Arg0...<etc>).str();

107

Instead of manually composing the diagnostic, it should use placeholders. e.g.,

diag(Loc, "%0 reduces type-safety, consider using '%1' instead) << Callee << ReplacedName;

(This makes it easier if we ever decide to localize our diagnostics.) Note that there's no quoting required around %0 because we're passing in a NamedDecl argument and the diagnostics engine understands how to display that.

JDevlieghere edited edge metadata.
JDevlieghere marked 9 inline comments as done.
  • Added function pointer test case
  • Used placeholders for diagnostics

I extended the matchers to include ::memcpy and ::memset as well because the check otherwise does not work on my mac because the cstring header that ships with Xcode is implemented as shown below.

_LIBCPP_BEGIN_NAMESPACE_STD
using ::size_t;
using ::memcpy;
using ::memset;
...
_LIBCPP_END_NAMESPACE_STD

This is annoying as I have no way to discriminate functions that have the same signature. Unless there's a better solution, this seems like a reasonable trade-off to me. Of course I'm open to suggestions!

The tests added mostly cover them -- I elaborated on the function pointer case, which I don't *think* will go wrong with this check, but should have a pathological test case for just to be sure.

I added the test case. The call is indeed replaced, which I guess is fine?

Did you manage to see what was wrong in the transformation that you did on LLVM?

Not yet, thought it seemed to be related to std::copy rather than std::fill. I'm still trying to pinpoint the culprit but it's extremely time consuming as I have to recompile LLVM completely in order to run the tests...

  • Added function pointer test case
  • Used placeholders for diagnostics

    I extended the matchers to include ::memcpy and ::memset as well because the check otherwise does not work on my mac because the cstring header that ships with Xcode is implemented as shown below.

    ` _LIBCPP_BEGIN_NAMESPACE_STD using ::size_t; using ::memcpy; using ::memset; ... _LIBCPP_END_NAMESPACE_STD `

You should add a similar approach as a test case to ensure we're covering this usage.

This is annoying as I have no way to discriminate functions that have the same signature. Unless there's a better solution, this seems like a reasonable trade-off to me. Of course I'm open to suggestions!

Given that you also have to handle the case where C++ code includes string.h (rather than cstring) to call memcpy, I think this is the only approach you can really take. Either you are getting a global declaration or one in the std namespace, and both should be handled when the signatures are correct.

The tests added mostly cover them -- I elaborated on the function pointer case, which I don't *think* will go wrong with this check, but should have a pathological test case for just to be sure.

I added the test case. The call is indeed replaced, which I guess is fine?

Yes, that's the behavior I would expect.

Did you manage to see what was wrong in the transformation that you did on LLVM?

Not yet, thought it seemed to be related to std::copy rather than std::fill. I'm still trying to pinpoint the culprit but it's extremely time consuming as I have to recompile LLVM completely in order to run the tests...

alexfh requested changes to this revision.Aug 8 2016, 8:19 AM
alexfh edited edge metadata.
alexfh added inline comments.
clang-tidy/modernize/ModernizeTidyModule.cpp
60

Entries should be sorted alphabetically.

clang-tidy/modernize/UseAlgorithmCheck.cpp
51

I don't think this function pulls its weight. With just two callers and a trivial implementation, it seems that it's better to inline it.

60–62

I'm not sure this works on MSVC2013. Could someone try this before submitting?

87

For consistency, I'd prefer the early return style (if (!...) return;).

103

Notes are treated completely differently - each note has to be attached to a warning.

Clang-tidy can only deal with two severity levels of diagnostics: "warning" and "error", but it's better to let them be controlled by the user via -warnings-as-errors= command-line option or the WarningsAsErrors configuration file option.

104

This should be a StringRef instead. No need to allocate a string.

132

Should clang::tooling::fixit::getText be used instead?

clang-tidy/modernize/UseAlgorithmCheck.h
21

Please enclose inline code snippets in backquotes.

This revision now requires changes to proceed.Aug 8 2016, 8:19 AM
aaron.ballman added inline comments.Aug 8 2016, 8:33 AM
clang-tidy/modernize/UseAlgorithmCheck.cpp
60–62

It does not work in MSVC 2013.

103

Drat. I am not keen on this being a warning (let alone an error) because it really doesn't warn the user against anything bad (the type safety argument is tenuous at best), and it's arguable whether this actually modernizes code. Do you think there's sufficient utility to this check to warrant it being default-enabled as part of the modernize suite? For instance, we have 368 instances of memcpy() and 171 instances of std::copy() in the LLVM code base, which is an example of a very modern C++ code base. I'm not opposed to the check, just worried that it will drown out diagnostics that have more impact to the user.

alexfh added inline comments.Aug 8 2016, 4:56 PM
clang-tidy/modernize/UseAlgorithmCheck.cpp
60–62

Thanks for verifying. Then it should be just

Replacements["memcpy"] = "std::copy";
Replacements["memset"] = "std::fill";

Maybe even remove the map completely and just hardcode the two options in UseAlgorithmCheck::check. Or are we expecting other similar functions to be added to the list? (I guess, unlikely)

103

It's a terminology issue, I think. Clang-tidy uses clang "warnings" for all kinds of warnings, issues, suggestions or recommendations it reports. If we want to extend the range while remaining in the clang diagnostic engine bounds, we could use Remark for some diagnostics, but I think, the mapping would still need to be configurable by the user. Alternatively, we could introduce some kind of a more granular integer-based severity level. However, I don't have an immediate use case for either of these, so I'd leave design to someone, who actually needs this and can explain the motivation.

JDevlieghere edited edge metadata.
JDevlieghere marked 5 inline comments as done.

Fixes issues raised by Alexander and Aaron

JDevlieghere edited edge metadata.

Removed anonymous namespaces in test file. I was playing around with it but forgot to remove it before making my last diff.

alexfh requested changes to this revision.Aug 17 2016, 2:34 AM
alexfh edited edge metadata.
alexfh added inline comments.
clang-tidy/modernize/UseAlgorithmCheck.cpp
38

Please inline this function as well. It's only used once and the implementation + the comment that is already in the call site are more than enough to explain the intent.

42

I'd probably inline this function as well. Its name is not particularly clear anyway (could be better with 'doesNotSupportPointerArithmetic` or similar, but I think, inlining it is even better).

49

Actually, I don't think we need a map of exactly two entries, since the order of arguments still has to be hardcoded. Let's just do a couple of ifs or a StringSwitch in the check() method.

142

No braces around single-statement if bodies, please.

docs/clang-tidy/checks/modernize-use-algorithm.rst
52

I think, we need to fix this right away instead of documenting this bug. Two things here:

  1. problems related to the operator precedence can be fixed by adding parentheses, when needed (see how a similar issue was solved in SimplifyBooleanExprCheck; here we can make a needsParensBeforePlus function similar to needsParensAfterUnaryNegation).
  2. the check should avoid replacements that duplicate an expression with side effects that can result from increment/decrement operators, assignments and some function calls. Function calls is the most difficult part here, since not all function calls have side effects (or are expensive). One approach here could be to add a const auto & variable to keep the value of the argument we need to repeat, if it's anything more complex than just a DeclRefExpr.
This revision now requires changes to proceed.Aug 17 2016, 2:34 AM
JDevlieghere edited edge metadata.

Still working on comment #2 from Alex but wanted to update my diff since it's been a while and I haven't gotten around to looking into it further. So no need to review yet.

alexfh requested changes to this revision.Oct 31 2016, 7:45 AM
alexfh edited edge metadata.

Removing from my dashboard. Ping me once you have a new patch.

This revision now requires changes to proceed.Oct 31 2016, 7:45 AM
JDevlieghere abandoned this revision.Nov 23 2016, 11:33 AM

I'm abandoning this revision because I think this check is getting overly complex. There's still the problem of supporting arguments that can have side effects, and then there's also the unaddressed issue of code possibly using the void* return type of these two functions. Any solution to this would require rather inelegant and complex code that is, in my opinion, simply not warranted by the added value of this check.