This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Add case aware include sorting.
ClosedPublic

Authored by kentsommer on Jan 19 2021, 7:20 PM.

Details

Summary

Adds an option to [clang-format] which sorts headers in an alphabetical manner using case only for tie-breakers. The options is off by default in favor of the current ASCIIbetical sorting style.

Diff Detail

Event Timeline

kentsommer requested review of this revision.Jan 19 2021, 7:20 PM
kentsommer created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJan 19 2021, 7:20 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
kentsommer added a comment.EditedJan 19 2021, 8:42 PM

I'll work on fixing the unit tests. Thought they ran with check-clang-format but I was obviously wrong.

kentsommer updated this revision to Diff 317790.EditedJan 20 2021, 12:07 AM

Fix both failing unit tests

kentsommer added a project: Restricted Project.Jan 20 2021, 2:16 PM

Please expand the test case to different options like grouping. Otherwise looks good.

Expanded unit test

@HazardyKnusperkeks let me know if the additions to the unit test are along the lines you were hoping for or not, I think this captures what you had in mind.

This revision is now accepted and ready to land.Jan 21 2021, 4:27 AM

I do not have commit access.

Full Name: Kent Sommer
Email: work@kentsommer.com

I can push that, but will wait a bit longer so the others have time to object.

curdeius added inline comments.Jan 24 2021, 1:00 PM
clang/docs/ClangFormatStyleOptions.rst
2039

The name is somewhat awkward IMO (in the sense it doesn't read nicely), but it's nice to be (alphabetically) closer to other Include-related options in the documentation.
I have no better name though. Just noticing.

2045

I'd write this in the same manner as other options, so sth like "When false: <code>... When true: <code>".

Otherwise, it's necessary to read up to the end of the description "This option is off by default" and conclude that off=false and that was the first snippet that corresponded to false.

clang/lib/Format/Format.cpp
2195

Maybe it would be better for CodeGen to get the if outside of sort? It would mean some code duplication, but maybe it could be mitigated with a lambda.

clang/unittests/Format/SortIncludesTest.cpp
602

You should test that this option is false for LLVM style.
BTW, shouldn't you set it somewhere so that the style is explicit?

MyDeveloperDay added inline comments.Jan 24 2021, 2:43 PM
clang/docs/ClangFormatStyleOptions.rst
2039

I agree how about IncludeSortCaseSensitive

kentsommer marked 5 inline comments as done.

Addressed comments

kentsommer added inline comments.Jan 24 2021, 6:04 PM
clang/docs/ClangFormatStyleOptions.rst
2039

I went with IncludeSortAlphabetically for now. To me, this reads better and seems to more clearly indicate the effect of turning this on. The downside is it no longer captures the fact that case does still play a part in the sort.

What do you both think?

2045

Good call, I definitely agree. I've rewritten this to better match the bool flag instances.

clang/lib/Format/Format.cpp
2195

This is a good point. I moved the if outside of the call to stable_sort. I don't feel there is actually much code duplication caused by this (other than the call signature).

Undo some accidental comment formatting

curdeius accepted this revision.Jan 25 2021, 12:26 AM

LGTM. Thanks for being prompt at addressing the comments. I'll merge it today for you if @MyDeveloperDay has no more remarks.

This revision was automatically updated to reflect the committed changes.
MyDeveloperDay added inline comments.Jan 25 2021, 2:18 PM
clang/docs/ClangFormatStyleOptions.rst
2040

Are you sure IncludeSortAlphabetically expresses what you mean? Once these things get released they cannot be changed easily.

If you were not sure of a new option, in my view we should slow down and make sure we have the correct design, for example you could have used a enum and it might have given you more possibility for greater flexibility

enum IncludeSort
{
       CaseInsensitive
       CaseSensitive
}

Please give people time to re-review your changes before we commit, especially if they've taken the time to look at your review in the first place. Just saying.

kentsommer added inline comments.Jan 25 2021, 3:07 PM
clang/docs/ClangFormatStyleOptions.rst
2040

Hi, @MyDeveloperDay I definitely agree. It was not my intention to rush through the review. I was simply trying to follow the process outlined in https://llvm.org/docs/Contributing.html#how-to-submit-a-patch which mentions giving sufficient information to allow for a commit on your behalf when you don't have access after an LGTM (which is all that I did). As you can see from the lack of additional comments from my end, I was happy to let this sit and be reviewed.

Per the discussion about the option itself, I do believe IncludeSortAlphabetically currently expresses what I mean as the behavior with this off is indeed not an alphabetical sort as case takes precedence over the alphabetical ordering. However, looking at the enum and realizing that others probably will have additional styles they prefer (maybe they want alphabetical but lower case first, etc.) I do believe it might have been a better way to go as it leaves more flexibility and room for additional ordering styles. Given that this just landed, I would be happy to open a patch to turn this into an enum as I do see benefits to doing so. What do you think?

Should we then revert ASAP and rework it later? @MyDeveloperDay

Should we then revert ASAP and rework it later? @MyDeveloperDay

Reverting would also be fine with me. I am happy to drive towards a feature (and flags) that everyone is comfortable with in whichever way makes the most sense.

Ok, reverted. If there's a doubt, then I guess it's the best solution.

curdeius reopened this revision.Jan 26 2021, 3:01 AM
This revision is now accepted and ready to land.Jan 26 2021, 3:01 AM
curdeius added inline comments.Jan 26 2021, 3:05 AM
clang/docs/ClangFormatStyleOptions.rst
2040

Hmmm, and how about using the existing option SortIncludes and change its type from bool to some enum?
We could then, for backward-compatibility, map false to (tentative) Never and true to ASCII/CaseInsensitive, and add CaseSensitive.

This will have the advantage of not adding additional style options.
... and it will prove once again that using bools for style options is not a good idea.

curdeius requested changes to this revision.Jan 26 2021, 3:06 AM

Requesting changes so that it appears correctly in the review queue.

This revision now requires changes to proceed.Jan 26 2021, 3:06 AM
MyDeveloperDay added inline comments.Jan 26 2021, 9:25 AM
clang/docs/ClangFormatStyleOptions.rst
2040

I think that is an excellent idea @curdeius

clang/docs/ClangFormatStyleOptions.rst
2040

I also fully support that! (Although I would not say a bool is per se bad.)

Turn SortIncludes into an enum

kentsommer marked an inline comment as done.Jan 28 2021, 1:22 AM
kentsommer added inline comments.
clang/docs/ClangFormatStyleOptions.rst
2040

@curdeius @MyDeveloperDay @HazardyKnusperkeks this is now done.

However... The command-line option (--sort-includes) is not in a place that I like at the moment but I am having trouble thinking of any really good options.

The issue as it stands is that there are a lot of usages of the flag that assume it is a bool and therefore sometimes do not pass any value. These of course could be updated along with the flag to accept a std::string value, however, this breaks backward capability for anyone relying on that flag not requiring a value. As I have it now, backward compatibility is maintained but the command line flag is rather severely limited compared to the configuration option. Thoughts on which path to take? A third option I have not considered?

Remove improper enum mapping

curdeius added inline comments.Jan 28 2021, 1:26 AM
clang/lib/Format/Format.cpp
399

Is this needed?

curdeius added inline comments.Jan 28 2021, 1:29 AM
clang/lib/Format/Format.cpp
399

Ok. You just fixed it.

Concerning the --sort-includes command-line flag. I'd rather keep it as is and, if need be, work on accepting an optional string argument.
Please update release notes.

curdeius added inline comments.Jan 28 2021, 1:34 AM
clang/docs/ClangFormatStyleOptions.rst
2040

I also fully support that! (Although I would not say a bool is per se bad.)

@HazardyKnusperkeks, I was of course a bit exaggerating :).

Concerning the --sort-includes command-line flag. I'd rather keep it as is and, if need be, work on accepting an optional string argument.
Please update release notes.

@curdeius Just to confirm, are you asking for the "commit" message to be updated or something else?

I was thinking of https://github.com/llvm/llvm-project/blob/main/clang/docs/ReleaseNotes.rst#clang-format. Look at the history to see how it was written before.
Also, you may change the revision title if you want (it will become the commit message).

Added release notes, updated commit message and summary

Fixed release notes typo

curdeius accepted this revision.Jan 28 2021, 4:23 AM

LGTM.

This revision is now accepted and ready to land.Jan 28 2021, 4:23 AM
curdeius retitled this revision from [clang-format] add case aware include sorting to [clang-format] Add case aware include sorting..Jan 28 2021, 4:24 AM
curdeius edited the summary of this revision. (Show Details)
MyDeveloperDay accepted this revision.Jan 28 2021, 10:15 AM

I like this much better LGTM

NFC rebase to fix CI

This revision was landed with ongoing or failed builds.Feb 2 2021, 6:12 AM
This revision was automatically updated to reflect the committed changes.

I find the naming of the case sensitive options confusing here.
When I read "CaseSensitive" I think of the behavior of strcmp() which sorts according to "ASCIIbetical" order. But here "CaseSensitive" throws away case by comparing the result of "Filename.lower()" which I would consider case-insensitive, or the same behavior as a call to stricmp().

Do you have an idea for better names?
I see that e.g. MS documentation uses ascending order and case-sensitive order.

Do you have an idea for better names?
I see that e.g. MS documentation uses ascending order and case-sensitive order.

I'm okay with the names, it just seems to me that they are reversed, where CaseSensitive should sort using ASCIIbetic order, and CaseInsensitive should ignore case entirely in the same way that most case-insensitive string compares would.

Do you have an idea for better names?
I see that e.g. MS documentation uses ascending order and case-sensitive order.

I'm okay with the names, it just seems to me that they are reversed, where CaseSensitive should sort using ASCIIbetic order, and CaseInsensitive should ignore case entirely in the same way that most case-insensitive string compares would.

After initially reading it I was not sure about it, but now I have looked into the wikipedia:

In computers, case sensitivity defines whether uppercase and lowercase letters are treated as distinct (case-sensitive) or equivalent (case-insensitive).

So I would support a name switch. I would wait a few days for @kentsommer to say something, if he doesn't respond just make a diff and add me as reviewer. :)

Do you have an idea for better names?
I see that e.g. MS documentation uses ascending order and case-sensitive order.

I'm okay with the names, it just seems to me that they are reversed, where CaseSensitive should sort using ASCIIbetic order, and CaseInsensitive should ignore case entirely in the same way that most case-insensitive string compares would.

Do you make a change? Otherwise I will do, but that will take some time, because I'm rather busy.

Do you make a change? Otherwise I will do, but that will take some time, because I'm rather busy.

Sorry, I have not; unfortunately I've been rather busy myself and haven't had a moment to get on it.