This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add modernize-raw-string-literal check
ClosedPublic

Authored by LegalizeAdulthood on Jan 25 2016, 4:02 AM.

Details

Summary

This check examines string literals and looks for those that could be more directly expressed as a raw string literal.

Example:

char const *const BackSlash{"goink\\frob"};

becomes:

char const *const BackSlash{R"(goink\frob)"};

Addresses PR24444

Diff Detail

Event Timeline

LegalizeAdulthood retitled this revision from to [clang-tidy] Add modernize-raw-string-literal check.
LegalizeAdulthood updated this object.
LegalizeAdulthood added a reviewer: alexfh.
LegalizeAdulthood added a subscriber: cfe-commits.
aaron.ballman added inline comments.
clang-tidy/modernize/RawStringLiteralCheck.cpp
36

Newlines are hard. Should this also handle \r (for CRLF code like \r\n)?

67

Please limit registration of the checker to just C++11 or later.

test/clang-tidy/modernize-raw-string-literal.cpp
19

Is there a way that this fix-it can mangle the source file in scary ways, though? For instance, say I have a source file saved with CRLF and a string literal of "foo\nbar" -- this will either make the string literal contain a CRLF instead of just LF, or it will make the string literal contain an LF right up until the user's editor saves the file and converts the line ending (I think).

56

Can you also add a test that ensures we don't mangle already-raw string literals (by re-rawifying them)?

clang-tidy/modernize/RawStringLiteralCheck.cpp
36

The standard [2.14.5 lex.string, clauses 4 and 5] explicitly says that a source file new-line in a raw string literal results in a new-line in the resulting execution string-literal.

The examples shown in the standard are consistent with what I've written and VS interprets CRLF in a raw string literal in the source file as \n in the string literal.

\r is not handled similarly and the presence of \r in the string literal prevents this check from generating any fix-it.

67

Good catch. I meant to do this, but overlooked it.

test/clang-tidy/modernize-raw-string-literal.cpp
19

See above. This doesn't work the way you think it does.

If it did (no newline interpretation), then the feature would be utterly useless for multi-line raw string literals, where simply checking out the source code on Windows with CRLF EOL interpretation would change the meaning of source that was originally committed on Unix with LF EOL interpretation.

56

Another good catch, I will add something for this.

LegalizeAdulthood marked 2 inline comments as done.

Update from comments:

  • leave existing raw string literals unchanged
  • don't change UTF-8, UTF-16, UTF-32 or wide character string literals
  • apply changes only to C++11 or later source files
  • improve detection of strings with control characters
aaron.ballman added inline comments.Jan 27 2016, 6:16 AM
clang-tidy/modernize/RawStringLiteralCheck.cpp
27

Might make sense to hoist this check out of containsEscapedCharacters since it has nothing to do with escaped chars, really. Regardless, comments should be full sentences with capitalization and punctuation (here and elsewhere in the file).

37

Ah, thank you for the explanation!

59

I think it might make more sense to block on D16012 (which will hopefully be reviewed relatively soon), and then just use the StringLiteral object to specify whether it's a raw string literal or not.

104

Can just use !Options.CPlusPlus11 -- 11 is implied in 14 and 1z.

alexfh added inline comments.Jan 27 2016, 7:13 AM
clang-tidy/modernize/RawStringLiteralCheck.cpp
28

Why can't the check convert non-ascii string literals to corresponding raw string literals?

58

nit: Capitalization, punctuation. Same for other comments in the file.

59

Also, why find_first_of instead of just find(char)? It will allow you to get rid of the R"(")" awfulness ;)

62

I'd remove these variables and either just combine all these to a single condition (which would make the code benefit from short-circuiting) or just use a regular expression, which can be more effective than N lookups in a row (not sure for which N though). Another benefit of the regular expression is reduction of the number of R"(s in the code: R"(\\[n'"?\\])" ;).

78

Why don't you try an empty delimiter? It will cover a majority of cases.

Also, even though the function is used only when generating fix-it hints, it still should be more effective than it is now. Most of the heap allocations and copies caused by the usage of std::ostringstream, std::string and std::string::operator+ can be avoided, e.g. like this:

static const StringRef Delimiters[2][] =
  {{"", "R\"(", ")\""}, {"lit", "R\"lit(", ")lit\""}, {"str", "R\"str(", ")str\""},
  /* add more different ones to make it unlikely to meet all of these in a single literal in the wild */};
for (const auto &Delim : Delimiters) {
  if (Bytes.find(Delim[2]) != StringRef::npos)
    return (Delim[1] + Bytes + Delim[2]).str();
}
// Give up gracefully on literals having all our delimiters.
108

I don't see a reason to have a separate function, I'd just inline it here.

114

I'd just use the expression instead of the variable.

118

Replacement will point to deleted memory right after this line.

clang-tidy/modernize/RawStringLiteralCheck.cpp
78

The very first thing tried is an empty delimiter.

I coded a solution to the problem that always works.

I find it less understandable to try a bunch of random delimiters and then fail if they are all present in the string. Then the whole algorithm becomes more complicated because even though I could construct a fixit, I'm failing stupidly because all my "favorite" delimiters were used in your string.

Sometimes I feel these reviews over-obsess with allocations and efficiency instead of implementing a simple general algorithm that works reliably.

We don't even have any measurements to show that this performance consideration is dominant or even noticeable, but I'm being asked to take a perfectly correct algorithm and cram it into a StringRef corset.

104

Then the documentation needs to be updated. (Actually I couldn't find any documentation on these flags and how they relate to each other because they use preprocessor hell to generate bitfields in a structure.)

118

For the record: I hate StringRef. I spend about 80% of my time on clang-tidy submissions dicking around with StringRef based on all the review comments. It is a very hard class to use properly. If a usage like this results in referencing deleted memory, it shouldn't even compile.

clang-tidy/modernize/RawStringLiteralCheck.cpp
28

Just doing one thing at a time and not trying to create a "kitchen sink" check on the first pass.

58

I don't understand what exactly you are asking me to change. There is something wrong with the comment?

clang-tidy/modernize/RawStringLiteralCheck.cpp
108

There is no reason to inline it.

aaron.ballman added inline comments.Jan 27 2016, 12:13 PM
clang-tidy/modernize/RawStringLiteralCheck.cpp
58

// already a raw string literal if R comes before "

Should be:

// Already a raw string literal if R comes before ".

alexfh added inline comments.Jan 27 2016, 5:44 PM
clang-tidy/modernize/RawStringLiteralCheck.cpp
28

It should be rather straightforward to implement handling of the other types of string literals, but if you still want to postpone this, then a TODO would be more useful than the comment that just states that "we only transform ASCII string literals" (without explaining why).

58

Sorry for the brevity. Aaron explained what I meant and here's the motivation (http://llvm.org/docs/CodingStandards.html#commenting):

When writing comments, write them as English prose, which means they should use proper capitalization, punctuation, etc.

78

The very first thing tried is an empty delimiter.

Ah, sorry, this wasn't clear to me at first.

I coded a solution to the problem that always works.

We first need to agree on whether we're trying to create a solution that works for hypothetical worst-case that has no chances to appear outside of the tests for your check, or we're trying to target real code.

The solution I suggest is more than enough for real code. For example, I've just searched for the \)[a-z]\\\" regex (a sequence that would prevent a single-letter delimiter) in a huuuuuge codebase and there were only ~20 hits (roughly half of them in tests for handling raw string literals in clang-format and other C++ tools). Needless to say that )lit\" wasn't found, nor )str\".

I find it less understandable to try a bunch of random delimiters and then fail if they are all present in the string. Then the whole algorithm becomes more complicated because even though I could construct a fixit, I'm failing stupidly because all my "favorite" delimiters were used in your string.

Good news is that this code will "fail" (and it should fail gracefully, e.g. just skip the fixit) extremely infrequently (see above). And by having a list of "favorite" delimiters we can make the resulting code look better in the (extremely rare) case when the version you suggest would have to use )lit0", for example.

Sometimes I feel these reviews over-obsess with allocations and efficiency instead of implementing a simple general algorithm that works reliably.

Unfortunately, in the code that deals with C++ analysis it's very important to be thoughtful about memory allocations and other possible sources of performance issues. It's not a hypothetical problem: the set of checks we normally use already takes much longer than just parsing the code, which sometimes results in minutes for a single translation unit.

We don't even have any measurements to show that this performance consideration is dominant or even noticeable, but I'm being asked to take a perfectly correct algorithm and cram it into a StringRef corset.

There are currently about 100 checks in clang-tidy and in order to be dominant time consumer a single check needs to be about 100 times slower than the average other check. Don't aim for that. Instead, please try to understand that this "obsession with allocations and efficiency" comes from experience of engineers who spent lots of time analyzing performance issues in various parts of LLVM/Clang that frequently enough boil down to excessive allocations and/or copying of strings and other data structures. This experience resulted in a set of recommendations and an extensive library of data structures that help to avoid common performance problems, when used appropriately. See http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task for some relevant documentation.

108

I apologize for the form of this comment. I wanted to ask whether you have any reasons to have this as a separate function.

Here are a few reasons to not have a separate preferRawStringLiterals function:

  • this function has a single caller that's small enough to fit the whole body of this function;
  • the name of this function doesn't make the responsibilities of this function clear to the reader;
  • the code will become shorter (and IMO, easier to read), if you just remove the function and put its body where it is now called.
118

For the record: I hate StringRef. I spend about 80% of my time on clang-tidy submissions dicking around with StringRef based on all the review comments.

And there's a bunch of good reasons to use StringRef instead of std::string (and const char*) where appropriate. See my comment above regarding optimizations.

It is a very hard class to use properly.

Yes, it's like pointers. They are also hard to use to a certain degree. However, we don't have a better tool to work with read-only fragments of strings, pass strings around without copying them unintentionally, etc.

If a usage like this results in referencing deleted memory, it shouldn't even compile.

If it's possible to achieve this without breaking valid use cases, I believe, many LLVM/Clang developers will say thank you to the person proposing and implementing this solution.

But until we have this solution, we still need to be careful about creating StringRef from temporary std::string. One tool that can help finding this kind of a bug is ASan, it can be enabled using CMake LLVM_USE_SANITIZER option (see http://llvm.org/docs/CMake.html for relevant docs).

LegalizeAdulthood marked 3 inline comments as done.

Update from review comments.
Extend analysis of newlines to prevent pathological raw strings from being introduced.
Extend the documentation to provide more examples of strings that are modified or not modified and explain the reasoning.

Why are you re-adding all those Makefiles?

Why are you re-adding all those Makefiles?

Ugh, I didn't even notice they were in there. It must have errantly slipped in from rebasing.

alexfh edited edge metadata.Jan 31 2016, 4:44 AM

Thank you for addressing (most of) the comments!

I have a few more comments and the comment about the implementation of asRawStringLiteral still applies.

Also, please remove unrelated files from the patch.

clang-tidy/modernize/RawStringLiteralCheck.cpp
69

A .find('\0') should be enough, no need for .find_first_of().

79

I think 'R' in string literals can't be anywhere but at the very beginning. So I'd prefer Text.front() == 'R' as a simpler and a more straightforward alternative.

87

What's the reason to make the code longer and less readable by usingb lit here?

116

This is purely a matter of preference, but I personally don't find that, for example, R"(\)" is better that "\\", and in general, that removing one backslash from a string literal is worth adding three more special characters (R, ( and )). I think, the check should either be more conservative (e.g. leave alone string literals that don't become shorter, when rewritten as raw string literals) or be configurable in this regard. Another concern is that replacing a large number of \n sequences with actual newlines might make the code less readable (also a matter of preference, I guess). This also deserves either a cleverer analysis to prevent undesirable replacements and/or a separate configuration option.

docs/clang-tidy/checks/modernize-raw-string-literal.rst
46

I'm not a native speaker, but it seems that the rule to use two spaces after a period is outdated. Also, a quick search shows that in LLVM/Clang documentation a majority of files use exclusively one space after a period. That said, I don't feel strongly about that and this comment is just FYI.

Also, please remove unrelated files from the patch.

Please ignore this part, I missed the new diff.

LegalizeAdulthood marked an inline comment as done.Jan 31 2016, 7:41 PM
LegalizeAdulthood added inline comments.
clang-tidy/modernize/RawStringLiteralCheck.cpp
80

Take a look at http://en.cppreference.com/w/cpp/language/string_literal

There can be a prefix before the R, it's not always the first character.

88

Without R"()", the \ and " have to be escaped. Running this check on this code would turn this into a raw string literal.

docs/clang-tidy/checks/modernize-raw-string-literal.rst
47

It's a habit my fingers learned when they taught me to type on typewriters in high school.

47

...also, AFAIK, it doesn't matter in terms of the output produced.

LegalizeAdulthood edited edge metadata.

Update from comments

aaron.ballman added inline comments.Feb 1 2016, 7:28 AM
clang-tidy/modernize/RawStringLiteralCheck.cpp
81

This is why I would still prefer to block on fixing StringLiteral. This is functional, but really kind of nasty -- in the common case (not already a raw string literal), this does two linear searches through the entire string literal.

89

I think Alex's point is: why not R"('\"?x01)" (removing the need for lit)?

117

A configuration option makes sense to me, but I would be fine if it was a follow-on patch. I think that someone running this check is likely fine with the level of noise it will produce (which should be moderately low).

alexfh added inline comments.Feb 1 2016, 8:34 AM
clang-tidy/modernize/RawStringLiteralCheck.cpp
81

Richard, you're right, it's u8R"(...)", not Ru8"(...)" or something. Nevertheless, doing a full scan for R (the search for " wouldn't result in a full scan) in this condition shouldn't be necessary, it could first find a " and then check if the previous character is R. It would be a bit more code, but less needless work at runtime.

Alternatively, D16012 might be a better answer, if it ever makes it into Clang.

89

Exactly, I was only talking about lit, not about using the raw string literal.

97

Please address my comment above that relates to this code. Specifically, I find this generic algorithm unnecessarily inefficient and too rigid, i.e. one can't configure a custom set of delimiters that don't follow the <prefix><number> pattern. I suggest using a list of pre-concatenated pairs of strings (like R"lit( and )lit").

clang-tidy/modernize/RawStringLiteralCheck.cpp
89

It was looking a little busy to my eyes with the raw string " and the quoted " close together. It isn't necessary, but IMO improves readability.

97

Raw strings are new and not many people are using them because the don't have a good way to automatically convert disgusting quoted strings to raw strings. So I don't think it is reasonable to draw conclusions by looking in existing code bases for raw strings.

We're having the same conversation we've had before. I'm trying to do a basic check and get things working properly and the review comments are tending towards a desire for some kind of perfection.

I don't see why we have to make the perfect the enemy of the good. Nothing you are suggesting must be present now in order for this check to function properly and reasonably.

alexfh added a comment.Feb 4 2016, 4:21 AM

There are still a few comments open. One more important thing is to try running this check over a large enough project (LLVM + Clang, for example), apply fixes, look at the results and try to build the fixed code and run all tests. You can use the clang-tidy/tool/run-clang-tidy.py script (with proper -clang-tidy-binary and -clang-apply-replacements-binary options) to run the check and apply the fixes.

clang-tidy/modernize/RawStringLiteralCheck.cpp
69

Do you have a test for this?

97

We're looking at this from different perspectives. From my point of view, preventing a potentially wasteful code in ClangTidy checks before it's even committed is much easier than tracking down performance issues when tens of checks run on multiple machines analyzing millions lines of code. So I'm asking to avoid writing code using string allocations and concatenations when there are good alternatives. Apart from possible performance issues, in this case the generic solution you suggest is targets extremely rare cases, while most real-world situations can be handled with a fixed set of delimiters (possibly, configured by the user, while I expect the preferences for the choice of delimiters to be very different in different teams).

117

Agreed. This can be done in a follow up.

test/clang-tidy/modernize-raw-string-literal.cpp
2

As usual, please add tests ensuring that replacements are done correctly in templates with multiple instantiations and in macro arguments (but not in macro bodies), e.g.

template<typename T>
void f() {
  (void)"asdf\\\\\\";
  // CHECK-FIXES: void f() {
  // CHECK-FIXES-NEXT: {{^  }}(void)R"(asdf\\\)";{{$}}
  // CHECK-FIXES-NEXT: {{^}$}}
}
void g() {
  f<int>();
  f<double>();
}

#define M(x) x x x "a\\b\\"
void h() {
  const char *const s = M("c\\d\\");
  // CHECK-FIXES: {{^}}#define M(x) x x x "a\\b\\"{{$}}
  // CHECK-FIXES: {{^  }}const char *const s = M(R"(c\d\)");
}

I agree it needs more testing.

I think also my current approach to newlines is overly aggressive and will result in more raw string literals than people would prefer.

It's really the Windows style path separators and regex ones that are not controversial transformations.

LegalizeAdulthood marked 5 inline comments as done.Feb 7 2016, 4:51 PM
LegalizeAdulthood added inline comments.
clang-tidy/modernize/RawStringLiteralCheck.cpp
97

I believe we agree on the following:

  • In order to turn a non-raw string literal into a raw string literal I have to surround it with R"( and )".
  • If the existing string literal contains )" already, then surrounding the literal with the above will result in a syntax error. Therefore, every potential raw string literal will have to be searched at least once for this non-delimited form of raw string literal.
  • clang-tidy checks should never introduce syntax errors.

Therefore, I can either not transform the string literal if it contains )", or I can come up with some delimited form of raw string literal where the delimiter does not appear in the string. In order to not transform the literal, I first have to search for )" in order to know that it should not be transformed. Therefore, the search for )" must be performed, no matter what algorithm is used to determine a delimited or non-delimited raw string literal.

Where we seem to disagree is what algorithm should be used to determine the delimiter when a delimited raw string literal is required.

My implementation is the minimum performance impact for the typical case where the string does not contain )".

Now let's talk about the cases where searching the string repeatedly for a delimiter would be expensive.

First, the string literal will have to contain )".

Second, the string literal would have to be lengthy for the delimiter searching to be expensive. Most of the time lengthy string literals are when someone has transformed a text file into a giant string literal. Such text files include newlines and in the latest version of the code, I've decided to skip any string literal containing a newline. It simply results in too many strings being converted to raw string literals and obfuscates the newlines. The one case where the embedded newlines makes sense is the case of the text file converted into a string literal.

Repeated searching of the string for the literal delimiter could be avoided by changing the routine to search for the delimiter. If present, it could examine the characters following the literal to make the literal more unique and then continue searching from there to look for more occurrences of the extended delimiter. It would proceed incrementally through the rest of the string to create a unique delimiter in a single pass through the string. I think the resulting literals would be even more non-sensical than the current implementation, but it would result in a delimiter obtained by a single pass through the string. It's a significantly more complicated implementation and results in an even weirder delimiter to handle a very edge case.

If I follow your suggested approach of using a fixed number of string delimiters, I still have to check the strings for the presence of those delimiters surrounded by ) and ", so I am not saving anything in terms of performance. In the worst case i still have to perform multiple searches through the string and then fail the entire transformation if all of them are present in the string. Failing the transformation significantly complicates the entire check because more state has to be passed around up and down the call chain of the functions evaluating the string literal so that we can abort the whole replacement.

I ran this check on the entire LLVM code base and not once did the check produce a raw string literal containing a delimiter. Therefore the performance question of how to choose a unique delimiter we are discussing is moot. It never entered the picture.

I'm happy to run this check on other large code bases to determine the likelihood of )" appearing in a string and the need for determining a unique delimiter.

117

In my current version of the code, I've decided to skip strings containing newlines. There are too many heuristics involved and subjective evaluations about readability when newlines are present in raw string literals.

The one case where I see it to be useful is when you process a text file into a C++ source file as a giant string literal. However, I think the better solution there is to have the generating utility use raw string literals directly instead of running clang-tidy on it afterwards.

test/clang-tidy/modernize-raw-string-literal.cpp
2

I thought I could get it to handle macro arguments, but when I applied the check, it ended up inlining the macro instead, so I don't know exactly how to do that. In the meantime, I've simply skipped any location that came from macro body or argument expansion. This could be improved in the future, but prevents unwanted changes in the source file now.

LegalizeAdulthood marked 2 inline comments as done.

Update from review comments.

Added a bunch of test cases for non-printing characters, templates and macros.

Removed string literals containing newline (\n) from being considered for transformation to raw string literals.

I wrote:

Repeated searching of the string for the literal delimiter could be avoided by changing the routine to search for the delimiter. If present, it could examine the characters following the literal to make the literal more unique and then continue searching from there to look for more occurrences of the extended delimiter. It would proceed incrementally through the rest of the string to create a unique delimiter in a single pass through the string. I think the resulting literals would be even more non-sensical than the current implementation, but it would result in a delimiter obtained by a single pass through the string. It's a significantly more complicated implementation and results in an even weirder delimiter to handle a very edge case.

Unfortunately in this paragraph I used the term "literal" in several places where I should have said "delimiter". I hope that wasn't too confusing. It should have read:

Repeated searching of the string for the delimiter could be avoided by changing the routine to search for the delimiter. If present, it could examine the characters following the delimiter to make the delimiter more unique and then continue searching from there to look for more occurrences of the extended delimiter. It would proceed incrementally through the rest of the string to create a unique delimiter in a single pass through the string. I think the resulting delimiters would be even more non-sensical than the current implementation, but it would result in a delimiter obtained by a single pass through the string. It's a significantly more complicated implementation and results in an even weirder delimiter to handle a very edge case.

Sorry for the delay. I'm trying to prioritize reviews that are taking less time per-iteration. Unfortunately, here we have a bunch of disagreements and I have to spend significant time to read through your arguments and address your points.

clang-tidy/modernize/RawStringLiteralCheck.cpp
76

Please remove lit. I understand your desire to make the string less 'crowded', but it's totally uncommon for this code to use delimiters in raw strings unless absolutely necessary.

80

Please remove quote. I understand your desire to make the string less 'crowded', but it's totally uncommon for this code to use delimiters in raw strings unless absolutely necessary.

93

This is not needed, see the comment below at the call to this function.

98

I believe we agree on the following: ...

Yes.

Where we seem to disagree is what algorithm should be used to determine the
delimiter when a delimited raw string literal is required.

Pretty much so. IMO, we don't need a universal algorithm that will hardly ever go further than the second iteration, and even in this case would produce a result that's likely undesirable (as I said, R"lit0(....)lit0" is not what I would want to see in my code).

The possibility of this code causing performance issues is probably not that high, but I can imagine a code where this could be sub-optimal.

My implementation is the minimum performance impact for the typical case
where the string does not contain )".

One concern about this part is that it could issue 4 concatenation calls fewer in case of an empty delimiter. This may already be handled well by the llvm::Twine though.

Any code following potentially-problematic patterns might be fine on its own, but it will attract unnecessary attention when reading code. High frequency of such false alarms has non-trivial cost for code readers and makes it harder to find actual problems.

So please, change the code to avoid these issues. Here's a possible alternative:

llvm::Optional<std::string> asRawStringLiteral(const StringLiteral *Literal) {
  const StringRef Bytes = Literal->getBytes();
  static const StringRef Delimiters[2][] =
    {{"R\"(", ")\""}, {"R\"lit(", ")lit\""}, {"R\"str(", ")str\""},
    /* add more different ones to make it unlikely to meet all of these in a single literal in the wild */};
  for (const auto &Delim : Delimiters) {
    if (Bytes.find(Delim[1]) != StringRef::npos)
      return (Delim[0] + Bytes + Delim[1]).str();
  }
  // Give up gracefully on literals having all our delimiters.
  return llvm::None;
}
110

There's no need to check the result of getNodeAs here, since the binding is done in a non-optional branch of the matcher and the type of the matcher corresponds to the template argument type of getNodeAs. At most, an assert is needed to simplify debugging in case the code is changed.

111
if (Literal->getLocStart().isMacroID())
  return;
test/clang-tidy/modernize-raw-string-literal.cpp
3

This can be sorted out later. Please add a FIXME next to the relevant test for now.

48

Please add a FIXME to handle utf8, utf16, utf32 and wide string literals.

clang-tidy/modernize/RawStringLiteralCheck.cpp
98

I just don't see why replacing a general algorithm that always works with a bunch of hard-coded special cases is better.

You've reiterated your preference for one over the other, but since it simply a matter of preference, I don't see why this should be a requirement.

Working with the hard-coded list of delimiters is no more or less efficient than the general algorithm that I implemented, so efficiency is not the concern here.

111

I don't understand what this code is doing. isMacroID is undocumented, so I don't know what it means for this function to return true without reverse engineering the implementation.

LegalizeAdulthood marked 4 inline comments as done.

Update from review comments.
Avoid some string concatenation when delimiter is empty.

Add FIXME for non-ASCII string literals.
Allow delimiter stem to be configured.
Avoid some string concatenation.

LegalizeAdulthood marked an inline comment as done.

Add FIXME for macro argument test case.

Squeak. This has been waiting on review for over two weeks....

aaron.ballman accepted this revision.Mar 7 2016, 7:41 AM
aaron.ballman edited edge metadata.

Squeak. This has been waiting on review for over two weeks....

Sorry for the delay, I was at WG21 meetings last week, so code reviews mostly were ignored by me.

I think this check LGTM, but please wait for @alexfh before committing.

clang-tidy/modernize/RawStringLiteralCheck.cpp
82

This is a wonderful demonstration of why I hate raw string literals on shorter strings. I had to stare at that raw string for quite some time to figure it out. :-(

This revision is now accepted and ready to land.Mar 7 2016, 7:41 AM

I do not have commit access, so I will need someone to commit this for me.

Patch by Richard Thomson

thanks.

clang-tidy/modernize/RawStringLiteralCheck.cpp
82

I used a raw string literal here because that's what this check would have recommended on this code :-).

However, your feedback is useful. If I subsequently enhance this check with a parameter that says the minimum length a string literal must have before it is to be transformed into a raw string literal and the default value for that parameter is something like 5, would that address your concern?

Looks mostly good, a few nits.

clang-tidy/modernize/RawStringLiteralCheck.cpp
82

That (a way to configure some thresholds of "sensitivity" of this check, so it doesn't change "\"" to R"(")" etc.) is roughly what I suggested a while ago, and we decided that this can be addressed in a follow up.

98

Working with the hard-coded list of delimiters is no more or less efficient than the general algorithm that I
implemented, so efficiency is not the concern here.

The other concern is:

IMO, we don't need a universal algorithm that will hardly ever go further than the second iteration, and even in
this case would produce a result that's likely undesirable (as I said, R"lit0(....)lit0" is not what I would want to
see in my code).

However, given the low probability of this scenario, I don't want to waste more of our time on this. The current algorithm is good enough for the first version.

104

AFAIU, braced initializers in constructor initializer lists are not supported on MSVC 2013.

104

Then the documentation needs to be updated. (Actually I couldn't find any
documentation on these flags and how they relate to each other because they
use preprocessor hell to generate bitfields in a structure.)

As with most things in open source, the documentation is missing, since nobody yet needed it strongly enough to write it.

If you feel it's worth your time, send a patch to whoever maintains Clang frontend code these days. The documentation should go somewhere around include/clang/Frontend/LangStandards.def:115 or include/clang/Basic/LangOptions.def:77.

LegalizeAdulthood edited edge metadata.
LegalizeAdulthood marked an inline comment as done.

Update from comments.
Update documentation to reflect changes in the implementation.

I do not have commit access.

Patch by Richard Thomson

When running the tests from this patch locally (Win10, MSVC 2015, debug, x64), I get:

1>------ Build started: Project: intrinsics_gen, Configuration: Debug x64 ------
2>------ Build started: Project: ClangDiagnosticLex, Configuration: Debug x64 ------
3>------ Build started: Project: ClangDiagnosticParse, Configuration: Debug x64 ------
4>------ Build started: Project: ClangDiagnosticSema, Configuration: Debug x64 ------
5>------ Build started: Project: ClangDiagnosticSerialization, Configuration: Debug x64 ------
6>------ Build started: Project: ClangDiagnosticIndexName, Configuration: Debug x64 ------
7>------ Build started: Project: ClangDiagnosticGroups, Configuration: Debug x64 ------
8>------ Build started: Project: ClangStmtNodes, Configuration: Debug x64 ------
9>------ Build started: Project: ClangARMNeon, Configuration: Debug x64 ------
10>------ Build started: Project: ClangAttrClasses, Configuration: Debug x64 ------
11>------ Build started: Project: ClangAttrDump, Configuration: Debug x64 ------
12>------ Build started: Project: ClangAttrHasAttributeImpl, Configuration: Debug x64 ------
13>------ Build started: Project: ClangAttrImpl, Configuration: Debug x64 ------
14>------ Build started: Project: ClangAttrList, Configuration: Debug x64 ------
15>------ Build started: Project: ClangAttrPCHRead, Configuration: Debug x64 ------
16>------ Build started: Project: ClangAttrPCHWrite, Configuration: Debug x64 ------
17>------ Build started: Project: ClangAttrParsedAttrImpl, Configuration: Debug x64 ------
18>------ Build started: Project: ClangAttrParsedAttrKinds, Configuration: Debug x64 ------
19>------ Build started: Project: ClangAttrParsedAttrList, Configuration: Debug x64 ------
20>------ Build started: Project: ClangAttrParserStringSwitches, Configuration: Debug x64 ------
21>------ Build started: Project: ClangAttrSpellingListIndex, Configuration: Debug x64 ------
22>------ Build started: Project: ClangAttrTemplateInstantiate, Configuration: Debug x64 ------
23>------ Build started: Project: ClangAttrVisitor, Configuration: Debug x64 ------
24>------ Build started: Project: ClangCommentCommandInfo, Configuration: Debug x64 ------
25>------ Build started: Project: ClangCommentCommandList, Configuration: Debug x64 ------
26>------ Build started: Project: ClangCommentHTMLNamedCharacterReferences, Configuration: Debug x64 ------
27>------ Build started: Project: ClangCommentHTMLTags, Configuration: Debug x64 ------
28>------ Build started: Project: ClangCommentHTMLTagsProperties, Configuration: Debug x64 ------
29>------ Build started: Project: ClangCommentNodes, Configuration: Debug x64 ------
30>------ Build started: Project: ClangDeclNodes, Configuration: Debug x64 ------
31>------ Build started: Project: ClangDiagnosticAST, Configuration: Debug x64 ------
32>------ Build started: Project: ClangDiagnosticAnalysis, Configuration: Debug x64 ------
33>------ Build started: Project: ClangDiagnosticComment, Configuration: Debug x64 ------
34>------ Build started: Project: ClangDiagnosticCommon, Configuration: Debug x64 ------
35>------ Build started: Project: ClangDiagnosticDriver, Configuration: Debug x64 ------
36>------ Build started: Project: ClangDiagnosticFrontend, Configuration: Debug x64 ------
37>------ Build started: Project: AttributeCompatFuncTableGen, Configuration: Debug x64 ------
38>------ Build started: Project: ClangDriverOptions, Configuration: Debug x64 ------
39>------ Build started: Project: clang-headers, Configuration: Debug x64 ------
2> Updating DiagnosticLexKinds.inc...
8> Updating StmtNodes.inc...
5> Updating DiagnosticSerializationKinds.inc...
9> Updating arm_neon.inc...
1> Updating Attributes.inc...
40>------ Build started: Project: AMDGPUCommonTableGen, Configuration: Debug x64 ------
41>------ Build started: Project: AArch64CommonTableGen, Configuration: Debug x64 ------
42>------ Build started: Project: XCoreCommonTableGen, Configuration: Debug x64 ------
43>------ Build started: Project: ARMCommonTableGen, Configuration: Debug x64 ------
44>------ Build started: Project: X86CommonTableGen, Configuration: Debug x64 ------
45>------ Build started: Project: MSP430CommonTableGen, Configuration: Debug x64 ------
46>------ Build started: Project: HexagonCommonTableGen, Configuration: Debug x64 ------
47>------ Build started: Project: SparcCommonTableGen, Configuration: Debug x64 ------
48>------ Build started: Project: SystemZCommonTableGen, Configuration: Debug x64 ------
49>------ Build started: Project: PowerPCCommonTableGen, Configuration: Debug x64 ------
50>------ Build started: Project: BPFCommonTableGen, Configuration: Debug x64 ------
51>------ Build started: Project: NVPTXCommonTableGen, Configuration: Debug x64 ------
52>------ Build started: Project: MipsCommonTableGen, Configuration: Debug x64 ------
24> Updating CommentCommandInfo.inc...
20> Updating AttrParserStringSwitches.inc...
25> Updating CommentCommandList.inc...
26> Updating CommentHTMLNamedCharacterReferences.inc...
27> Updating CommentHTMLTags.inc...
29> Updating CommentNodes.inc...
28> Updating CommentHTMLTagsProperties.inc...
30> Updating DeclNodes.inc...
31> Updating DiagnosticASTKinds.inc...
32> Updating DiagnosticAnalysisKinds.inc...
34> Updating DiagnosticCommonKinds.inc...
33> Updating DiagnosticCommentKinds.inc...
35> Updating DiagnosticDriverKinds.inc...
36> Updating DiagnosticFrontendKinds.inc...
37> Updating AttributesCompatFunc.inc...
39> Updating arm_neon.h...
43> Updating ARMGenMCPseudoLowering.inc...
41> Updating AArch64GenRegisterInfo.inc...
42> Updating XCoreGenRegisterInfo.inc...
53>------ Build started: Project: ClangSACheckers, Configuration: Debug x64 ------
50> Updating BPFGenRegisterInfo.inc...
49> Updating PPCGenAsmMatcher.inc...
45> Updating MSP430GenRegisterInfo.inc...
48> Updating SystemZGenAsmMatcher.inc...
44> Updating X86GenRegisterInfo.inc...
40> Updating AMDGPUGenRegisterInfo.inc...
46> Updating HexagonGenAsmMatcher.inc...
47> Updating SparcGenRegisterInfo.inc...
51> Updating NVPTXGenRegisterInfo.inc...
52> Updating MipsGenRegisterInfo.inc...
39> Copying clang's arm_neon.h...
53> Updating Checkers.inc...
43> Updating ARMGenAsmMatcher.inc...
50> Updating X86GenAsmMatcher.inc...
42> Updating XCoreGenCallingConv.inc...
41> Updating AArch64GenMCPseudoLowering.inc...
45> Updating MSP430GenDAGISel.inc...
44> Updating X86GenAsmMatcher.inc...
48> Updating SystemZGenCallingConv.inc...
49> Updating PPCGenRegisterInfo.inc...
47> Updating SparcGenAsmMatcher.inc...
40> Updating AMDGPUGenCallingConv.inc...
46> Updating HexagonGenCallingConv.inc...
51> Updating NVPTXGenSubtargetInfo.inc...
52> Updating MipsGenCallingConv.inc...
43> Updating ARMGenCallingConv.inc...
50> Updating BPFGenCallingConv.inc...
42> Updating XCoreGenSubtargetInfo.inc...
41> Updating AArch64GenFastISel.inc...
45> Updating MSP430GenCallingConv.inc...
44> Updating X86GenFastISel.inc...
48> Updating SystemZGenRegisterInfo.inc...
49> Updating PPCGenFastISel.inc...
47> Updating SparcGenDAGISel.inc...
40> Updating AMDGPUGenSubtargetInfo.inc...
46> Updating HexagonGenDFAPacketizer.inc...
52> Updating MipsGenMCPseudoLowering.inc...
50> Updating BPFGenSubtargetInfo.inc...
41> Updating AArch64GenCallingConv.inc...
45> Updating MSP430GenSubtargetInfo.inc...
44> Updating X86GenCallingConv.inc...
48> Updating SystemZGenSubtargetInfo.inc...
49> Updating PPCGenCallingConv.inc...
47> Updating SparcGenCallingConv.inc...
40> Updating AMDGPUGenIntrinsics.inc...
46> Updating HexagonGenRegisterInfo.inc...
49> Updating PPCGenSubtargetInfo.inc...
44> Updating X86GenSubtargetInfo.inc...
40> Updating AMDGPUGenDFAPacketizer.inc...
46> Updating HexagonGenSubtargetInfo.inc...
54>------ Build started: Project: check-clang-tools, Configuration: Debug x64 ------
54> Running the Clang extra tools' regression tests
54> -- Testing: 244 tests, 32 threads --
54> FAIL: Clang Tools :: clang-tidy/modernize-raw-string-literal.cpp (114 of 244)
54> **** TEST 'Clang Tools :: clang-tidy/modernize-raw-string-literal.cpp' FAILED ****
54> Script:
54> --
54> D:/Python27/python.exe E:/llvm/llvm/tools/clang/tools/extra/test/../test\clang-tidy\check_clang_tidy.py E:\llvm\llvm\tools\clang\tools\extra\test\clang-tidy\modernize-raw-string-literal.cpp modernize-raw-string-literal E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp
54> --
54> Exit Code: 1
54>
54> Command Output (stdout):
54> --
54> Command 0: "D:/Python27/python.exe" "E:/llvm/llvm/tools/clang/tools/extra/test/../test\clang-tidy\check_clang_tidy.py" "E:\llvm\llvm\tools\clang\tools\extra\test\clang-tidy\modernize-raw-string-literal.cpp" "modernize-raw-string-literal" "E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp"
54> Command 0 Result: 1
54> Command 0 Output:
54> Running ['clang-tidy', 'E:\\llvm\\2015\\tools\\clang\\tools\\extra\\test\\clang-tidy\\Output\\modernize-raw-string-literal.cpp.tmp.cpp', '-fix', '--checks=-*,modernize-raw-string-literal', '--', '--std=c++11']...
54> ------------------------ clang-tidy output -----------------------
54> 28 warnings and 8 errors generated.
54>
54> Error while processing E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp.
54>
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:3:29: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> char const *const BackSlash{"goink\\frob"};
54> ^
54> R"(goink\frob)"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:49:1: error: unknown type name 'char16_t' [clang-diagnostic-error]
54> char16_t const *const UTF16Literal{u"foobie\\bletch"};
54> ^
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:49:10: error: expected unqualified-id [clang-diagnostic-error]
54> char16_t const *const UTF16Literal{u"foobie\\bletch"};
54> ^
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:50:1: error: unknown type name 'char16_t' [clang-diagnostic-error]
54> char16_t const *const UTF16RawLiteral{uR"(foobie\\bletch)"};
54> ^
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:50:10: error: expected unqualified-id [clang-diagnostic-error]
54> char16_t const *const UTF16RawLiteral{uR"(foobie\\bletch)"};
54> ^
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:51:1: error: unknown type name 'char32_t' [clang-diagnostic-error]
54> char32_t const *const UTF32Literal{U"foobie\\bletch"};
54> ^
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:51:10: error: expected unqualified-id [clang-diagnostic-error]
54> char32_t const *const UTF32Literal{U"foobie\\bletch"};
54> ^
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:52:1: error: unknown type name 'char32_t' [clang-diagnostic-error]
54> char32_t const *const UTF32RawLiteral{UR"(foobie\\bletch)"};
54> ^
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:52:10: error: expected unqualified-id [clang-diagnostic-error]
54> char32_t const *const UTF32RawLiteral{UR"(foobie\\bletch)"};
54> ^
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:56:31: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> char const *const SingleQuote{"goink\'frob"};
54> ^
54> R"(goink'frob)"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:60:31: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> char const *const DoubleQuote{"goink\"frob"};
54> ^
54> R"(goink"frob)"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:64:32: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> char const *const QuestionMark{"goink\?frob"};
54> ^
54> R"(goink?frob)"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:68:25: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> char const *const RegEx{"goink\\(one|two\\)\\\\\\?.*\\nfrob"};
54> ^
54> R"(goink\(one|two\)\\\?.*\nfrob)"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:72:24: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> char const *const Path{"C:\\Program Files\\Vendor\\Application\\Application.exe"};
54> ^
54> R"(C:\Program Files\Vendor\Application\Application.exe)"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:76:36: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> char const *const ContainsSentinel{"who\\ops)\""};
54> ^
54> R"lit(who\ops)")lit"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:80:33: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> char const *const ContainsDelim{"whoops)\")lit\""};
54> ^
54> R"lit1(whoops)")lit")lit1"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:84:34: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> char const *const OctalPrintable{"\100\\"};
54> ^
54> R"(@\)"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:88:32: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> char const *const HexPrintable{"\x40\\"};
54> ^
54> R"(@\)"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:104:25: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> char const *const Str{"foo\\bar"};
54> ^
54> R"(foo\bar)"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:111:25: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> char const *const Str{"foo\\bar"};
54> ^
54> R"(foo\bar)"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:117:11: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> fn<int>("foo\\bar");
54> ^
54> R"(foo\bar)"
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:120:14: warning: escaped string literal can be written as a raw string literal [modernize-raw-string-literal]
54> fn<double>("foo\\bar");
54> ^
54> R"(foo\bar)"
54> Found compiler errors, but -fix-errors was not specified.
54>
54> Fixes have NOT been applied.
54>
54>
54>
54>
54> ------------------------------------------------------------------
54> ------------------------------ Fixes -----------------------------
54>
54> ------------------------------------------------------------------
54> FileCheck failed:
54> E:\llvm\llvm\tools\clang\tools\extra\test\clang-tidy\modernize-raw-string-literal.cpp:5:17: error: expected string not found in input
54>
54> CHECK-FIXES: {{^}}char const *const BackSlash{R"(goink\frob)"};{{$}}
54>
54> ^
54>
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:1:1: note: scanning from here
54>
54>
RUN: %check_clang_tidy %s modernize-raw-string-literal %t
54>
54> ^
54>
54> E:\llvm\2015\tools\clang\tools\extra\test\clang-tidy\Output\modernize-raw-string-literal.cpp.tmp.cpp:3:1: note: possible intended match here
54>
54> char const *const BackSlash{"goink\\frob"};
54>
54> ^
54>
54>
54>
54>
54> Command 0 Stderr:
54> Traceback (most recent call last):
54>
54> File "E:/llvm/llvm/tools/clang/tools/extra/test/../test\clang-tidy\check_clang_tidy.py", line 135, in <module>
54>
54> main()
54>
54> File "E:/llvm/llvm/tools/clang/tools/extra/test/../test\clang-tidy\check_clang_tidy.py", line 116, in main
54>
54> stderr=subprocess.STDOUT)
54>
54> File "D:\Python27\lib\subprocess.py", line 573, in check_output
54>
54> raise CalledProcessError(retcode, cmd, output=output)
54>
54> subprocess.CalledProcessError: Command '['FileCheck', '-input-file=E:\\llvm\\2015\\tools\\clang\\tools\\extra\\test\\clang-tidy\\Output\\modernize-raw-string-literal.cpp.tmp.cpp', 'E:\\llvm\\llvm\\tools\\clang\\tools\\extra\\test\\clang-tidy\\modernize-raw-string-literal.cpp', '-check-prefix=CHECK-FIXES', '-strict-whitespace']' returned non-zero exit status 1
54>
54>
54>
54>
54> --
54>
54> ****
54>
54> Testing Time: 6.68s
54> ****
54> Failing Tests (1):
54> Clang Tools :: clang-tidy/modernize-raw-string-literal.cpp
54>
54> Expected Passes : 242
54> Unsupported Tests : 1
54> Unexpected Failures: 1
54>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 1.

Build: 53 succeeded, 1 failed, 161 up-to-date, 0 skipped

Looks like I forgot to remove brace initializers from the test files. I will fix that.

Chris Lattner has given me commit access now, so I can commit on my own.

Update release notes

Remove brace initializers from test code.