This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] [PR49298] Sort includes pass will sort inside raw strings
ClosedPublic

Authored by MyDeveloperDay on Dec 6 2021, 9:55 AM.

Details

Summary

https://bugs.llvm.org/show_bug.cgi?id=49298

clang-format does not respect raw string literals when sorting includes

const char *RawStr = R"(
#include "headerB.h"
#include "headerA.h"
)";

Running clang-format over with SortIncludes enabled transforms this code to:

const char *RawStr = R"(
#include "headerA.h"
#include "headerB.h"
)";

The following code tries to minimize this impact during IncludeSorting, by treating R"( and )" as equivalent of // clang-format off/on

Diff Detail

Event Timeline

MyDeveloperDay requested review of this revision.Dec 6 2021, 9:55 AM
MyDeveloperDay created this revision.
curdeius added inline comments.Dec 6 2021, 10:29 AM
clang/lib/Format/Format.cpp
2603

"we we"?

2604

This won't be enough for raw literals with custom delimiter e.g. R"xyz(...)xyz".
But well, without any code parsing, this function will always be easily outsmarted I guess.
Maybe we can use regexp here and save the delimiter? (And then check it below)

clang/lib/Format/Format.cpp
2604

And then we need to check first for closing, since we can close and reopen on the same line, or not?

curdeius added inline comments.Dec 6 2021, 11:45 AM
clang/lib/Format/Format.cpp
2604

It's done below anyway (i.e. Trimmed is still the same line), nope?

clang/lib/Format/Format.cpp
2604

Yes but if we save the delimiter we will save the new one, before checking for the old one to close.
I was just thinking we must close the old one correctly before "opening" the new one. But after rethinking about it, that shouldn't matter.

MyDeveloperDay updated this revision to Diff 392674.EditedDec 8 2021, 1:02 AM

Allow for optional Raw String CharSequences

MyDeveloperDay marked 5 inline comments as done.Dec 8 2021, 1:02 AM
curdeius requested changes to this revision.Dec 8 2021, 1:34 AM
curdeius added inline comments.
clang/lib/Format/Format.cpp
2589

A raw string literal is:

prefix(optional) R"d-char-sequence(optional) (r-char-sequence(optional))d-char-sequence(optional)"	

d-char-sequence	-	A sequence of one or more d-chars, at most 16 characters long
d-char	-	A character from the basic source character set (until C++23)basic character set (since C++23), except parentheses, backslash and spaces

So, you missed the digits and the characters: _{}[]#<>%:;.?*+-/^&|~!=,'.
Please add a test case.
Mind the need to escape the square brackets [] and the minus sign - in the regexp (the latter can be put at the beginning or at the end too).

Cf. https://en.cppreference.com/w/cpp/language/string_literal, https://godbolt.org/z/rb61WzMcs

2606–2608

Since CharSequence is empty, you might want to remove the if altogether and create RawStringTermination unconditionally.

Oh, actually, as it's used in the for loop, you *have to* reassign RawStringTermination, otherwise the code like:

R"x(...
...)x"; // RawStringTermination is "x"
R"(...
...); // RawStringTermination would still be "x"
#include <b.h> // not sorted but it should
#include <a.h> // not sorted but it should

will misbehave.
Please create a test case for this.

This revision now requires changes to proceed.Dec 8 2021, 1:34 AM
MyDeveloperDay added inline comments.Dec 8 2021, 2:27 AM
clang/lib/Format/Format.cpp
2606–2608

Correct I need to remove the if

So, you missed the digits and the characters: _{}[]#<>%:;.?*+-/^&|~!=,'.

The standard is nuts sometimes... let me add that madness in ;-)

Add a suitable amount of C++ madness, (to the limit of what I could coax the regex to accept)

clang/unittests/Format/SortIncludesTest.cpp
1162

You should add a #undef X.

curdeius added inline comments.Dec 9 2021, 11:36 AM
clang/lib/Format/Format.cpp
2590

I'm picky but you missed [ and ], no?

2604

Will this match the first or the last occurrence? If it's the last then it's ok. Otherwise, we should change it. Also, please test something along the lines of:

R"x(...)x" ... on the same line R"y(
#include "..."
)y"
clang/unittests/Format/SortIncludesTest.cpp
1150

Can't | be tested? Same for [ and ]?

MyDeveloperDay added inline comments.Dec 9 2021, 12:15 PM
clang/lib/Format/Format.cpp
2590

I couldn't get them to work, even if I escaped them.. let me go look at the Regex unit tests

curdeius added inline comments.Dec 9 2021, 12:46 PM
clang/lib/Format/Format.cpp
2590

You can try putting`] at the beginning (just after the opening[`). It *should* work.

Add support for | but still can't get [] to work in the regex (checked the unit tests for Support/Regex and its not clear if its supported)

MyDeveloperDay marked 4 inline comments as done.Dec 10 2021, 4:48 AM

Add support for | but still can't get [] to work in the regex (checked the unit tests for Support/Regex and its not clear if its supported)

How about "R\"(([A-Za-z0-9_{}#<>%:;.?*+/^&\\$|~!=,'\\-]|]|\[)*)\\(", not putting the [] symbols within the [].

Does it work with std::regex which should also be available?

MyDeveloperDay updated this revision to Diff 393474.EditedDec 10 2021, 6:53 AM

Thanks @HazardyKnusperkeks you got me over the line, I was able to escape the [ but not the ] so I used your trick there

"R\"(([\\[A-Za-z0-9_{}#<>%:;.?*+/^&\\$|~!=,'\\-]|])*)\\("

Add unit tests to cover those cases too

MyDeveloperDay marked 3 inline comments as done.Dec 10 2021, 6:54 AM
curdeius accepted this revision.Dec 10 2021, 8:51 AM

LGTM. Great!

This revision is now accepted and ready to land.Dec 10 2021, 8:51 AM