Page MenuHomePhabricator

[clang-format] SortIncludes should support "@import" lines in Objective-C
Changes PlannedPublic

Authored by kwk on Mar 10 2022, 3:44 AM.

Details

Summary

[clang-format] SortIncludes should support "@import" lines in Objective-C

Fixes #38995

This is an attempt to modify the regular expression to identify
@import and import alongside the regular #include. The challenging
part was not to support @ in addition to # but how to handle
everything that comes after the include|import keywords. Previously
everything that wasn't " or < was consumed. But as you can see in
this example from the issue #38995, there is no " or < following the
keyword:

@import Foundation;

I experimented with a lot of fancy and useful expressions in this online regex tool only to find out that some
things are simply not supported by the regex implementation in LLVM.

  • For example the beginning [\t\ ]* should be replacable by the horizontal whitespace character \h* but this will break the SortIncludesTest.LeadingWhitespace test.

That's why I've chosen to come back to the basic building blocks.

The essential change in this patch is the change from this regular
expression:

^[\t\ ]*#[\t\ ]*(import|include)[^"<]*(["<][^">]*[">])
        ~                              ~~~~~~~~~~~~~~
        ^                              ^
        |                              |
        only support # prefix not @    |
                                       only support "" and <> as delimiters
                                       no support for C++ modules and ;
                                       ending. Also this allows for ">
                                       or <" or "" or <> which all seems
                                       either off or wrong.

to this:

^[\t\ ]*[@#][\t\ ]*(import|include)([^"/]*("[^"]+")|[^</]*(<[^>]+>)|[\t/\ ]*([^;]+;))
        ~~~~                        ~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~
        ^                                 ^           ^       ^       ^
        |                                 |           |       |       |
        Now support @ and #.            Clearly support "" and <> as ell as an
                                        include name without enclosing characters.
                                        Allows for no mixture of "> or <" or
                                        empty include names. Trailing
                                        comments on include lines as shown in
                                        D124513 are ignored now by
                                        ignoring slashes in the characters that
                                        lead up to a match group.

Here is how I've tested this patch:

ninja clang-Format
ninja FormatTests
./tools/clang/unittests/Format/FormatTests
--gtest_filter=SortIncludesTest*

And if that worked I doubled checked that nothing else broke by running
all format checks:

./tools/clang/unittests/Format/FormatTests

One side effect of this change is it should partially support
C++20 Module
import lines without the optional export in front. Adding
this can be a change on its own that shouldn't be too hard. I say
partially because the @ or # are currently *NOT* optional in the
regular expression.

I see an opportunity to optimized the matching to exclude @include for
example. But eventually these should be caught by the compiler, so...

With my change, the matching group is not at a fixed position any
longer. I decided to choose the last match (group) that is not empty.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
MyDeveloperDay added inline comments.Mar 16 2022, 2:58 PM
clang/lib/Format/Format.cpp
2801

int false?

This revision now requires changes to proceed.Mar 16 2022, 2:58 PM
kwk updated this revision to Diff 416876.Mar 21 2022, 3:05 AM
kwk marked an inline comment as done.
  • Address review comments on grouping and main headers
kwk updated this revision to Diff 416877.Mar 21 2022, 3:06 AM
  • Fix: int -> bool
kwk added a comment.Mar 21 2022, 3:11 AM

Some test with a main header and blocks would be nice.

This one please. :)

Done.

clang/lib/Format/Format.cpp
2801

int false?

Yes, this was just to see if everybody is awake ;). Just kiddin'. Good catch.

kwk added a comment.Mar 21 2022, 3:13 AM
This comment was removed by kwk.
kwk updated this revision to Diff 416878.EditedMar 21 2022, 3:19 AM
  • Fix tests

But please fix the format. :)

kwk updated this revision to Diff 417097.Mar 21 2022, 2:21 PM
kwk marked an inline comment as done.
  • introduce getIncludeNameFromMatches
  • Fix format
kwk marked 3 inline comments as done.Mar 21 2022, 2:24 PM

@HazardyKnusperkeks thank you for the approval. Can you have one last view please? I've introduced a function to get the match instead of repeating the for loop two times.

kwk updated this revision to Diff 417099.Mar 21 2022, 2:32 PM
  • Remove left-over comment
kwk updated this revision to Diff 417101.Mar 21 2022, 2:34 PM
  • Remove duplicate assert
kwk added a comment.Mar 22 2022, 2:37 AM

@MyDeveloperDay can you please have another look now that the patch has additional tests and most comments that still apply have been addressed?

HazardyKnusperkeks added inline comments.
clang/lib/Format/Format.cpp
2733–2736
2738

Is there something like LLVM_UNREACHABLE?

kwk updated this revision to Diff 417379.Mar 22 2022, 1:11 PM
  • Address review comment
kwk marked 2 inline comments as done.Mar 22 2022, 1:13 PM

@HazardyKnusperkeks I've addressed your comments and did an early return together with llvm_unreachable() which is used in the clang/lib/Format in other places as well. I hope this is to your liking. There's no longer an assert. Do you want that as well?

owenpan added inline comments.Mar 22 2022, 5:35 PM
clang/lib/Format/Format.cpp
2731–2734

I think you missed Matches[0].

3083

We should make it static if it's only used in this file.

clang/lib/Tooling/Inclusions/HeaderIncludes.cpp
171–172

If these are the same as in Format.cpp above, we should move the definitions to HeaderIncludes.h.

owenpan added inline comments.Mar 22 2022, 5:42 PM
clang/lib/Tooling/Inclusions/HeaderIncludes.cpp
171–172

If these are the same as in Format.cpp above, we should move the definitions to HeaderIncludes.h.

I meant we should remove the definitions from Format.cpp and add the declarations to HeaderIncludes.h.

kwk updated this revision to Diff 423254.Apr 16 2022, 12:19 PM
  • Put shared code for finding include names HeaderIncludes
kwk updated this revision to Diff 423255.Apr 16 2022, 12:23 PM
  • Added llvm:: namespace
kwk updated this revision to Diff 423256.Apr 16 2022, 12:26 PM
  • Use proposed reverse loop
kwk updated this revision to Diff 423257.Apr 16 2022, 12:29 PM
kwk marked an inline comment as done.
  • Make function static
kwk marked 3 inline comments as done.Apr 16 2022, 12:30 PM
kwk added inline comments.
clang/lib/Format/Format.cpp
2731–2734

I did miss it and fixed it but it doesn't make much sense to use the first match because that's always the overall string that's being matched. Nevertheless, I've used your porposed reverse loop and the logic is better if it includes even the first match.

clang/lib/Tooling/Inclusions/HeaderIncludes.cpp
171–172

I think I've done what you intended me to do.

Should we handle #import and @import for Object-C only so as to simply the regex for C++?

clang/include/clang/Tooling/Inclusions/HeaderIncludes.h
136

I think you meant @import bar;, which is valid in Objective-C only. So is #import btw.

139–147

Function names should start with a lowercase letter.

kwk updated this revision to Diff 423582.Apr 19 2022, 3:45 AM
  • Make functions lowercase
kwk marked 2 inline comments as done.Apr 19 2022, 3:51 AM

Should we handle #import and @import for Object-C only so as to simply the regex for C++?

@owenpan my counter question probably reveals how few things I know about clang-format and how much I still have to learn. I only see sortCppIncludes but no sortObjCIncludes in the function list in Format.h. That said, do you want to introduce that or do you want to put logic in sortCppIncludes that knows about the programming language at hand and decides which regex to pick?

kwk updated this revision to Diff 423584.Apr 19 2022, 3:53 AM
  • typo: @include bar; -> @import bar;

The hard part about doing 2 separate cases is where we miss-interpret a .hpp or .h file as being the wrong language (C/C++/ObjC/ObjC++), I don't think I have a problem with a single regex (its not like we change it that often).

Is there a use case for having them separate or not?

MyDeveloperDay accepted this revision.Apr 19 2022, 6:56 AM

Accepting but please wait for @owenpan

This revision is now accepted and ready to land.Apr 19 2022, 6:56 AM
owenpan accepted this revision.Apr 19 2022, 9:00 PM

LGTM as I defer to @MyDeveloperDay.

kwk edited the summary of this revision. (Show Details)Apr 20 2022, 12:00 AM
kwk edited the summary of this revision. (Show Details)Apr 20 2022, 12:05 AM
This revision was landed with ongoing or failed builds.Apr 20 2022, 12:22 AM
This revision was automatically updated to reflect the committed changes.

It appears that this regressed include sorting in the following case, where the contents of test.h show the expected include order and the clang-format behavior before this patch:

% cat test.h
#include <cstdint>

#include "util/bar.h"
#include "util/foo/foo.h"  // foo<T>
% bin/clang-format --version; bin/clang-format -style=google test.h
clang-format version 15.0.0 (https://github.com/llvm/llvm-project.git d46fa023caa2db5a9f1e21dd038bcb626261d958)
#include "util/foo/foo.h"  // foo<T>
#include <cstdint>

#include "util/bar.h"

@kwk could you please take a look

kwk added a comment.Apr 26 2022, 12:01 PM

It appears that this regressed include sorting in the following case, where the contents of test.h show the expected include order and the clang-format behavior before this patch:

% cat test.h
#include <cstdint>

#include "util/bar.h"
#include "util/foo/foo.h"  // foo<T>
% bin/clang-format --version; bin/clang-format -style=google test.h
clang-format version 15.0.0 (https://github.com/llvm/llvm-project.git d46fa023caa2db5a9f1e21dd038bcb626261d958)
#include "util/foo/foo.h"  // foo<T>
#include <cstdint>

#include "util/bar.h"

@kwk could you please take a look

@krasimir I'm on sick leave and will have a look as soon as I'm back. Can it wait that long?

@krasimir I'm on sick leave and will have a look as soon as I'm back. Can it wait that long?

I'm planning to revert this temporarily and add a regression test, you can proceed with this work without time pressure. We aim to release a new clang-format version weekly to ease the process of finding and reporting regressions.

kwk reopened this revision.May 3 2022, 7:24 AM
This revision is now accepted and ready to land.May 3 2022, 7:24 AM
kwk updated this revision to Diff 426701.May 3 2022, 7:26 AM

Address trailing comments on include lines by ignoring slashes in
characters that lead up to the match groups.

^[\t\ ]*[@#][\t\ ]*(import|include)([^"/]*("[^"]+")|[^</]*(<[^>]+>)|[\t/\ ]*([^;]+;))
        ~~~~                        ~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~
        ^                                 ^           ^       ^       ^
        |                                 |           |       |       |
        Now support @ and #.            Clearly support "" and <> as ell as an
                                        include name without enclosing characters.
                                        Allows for no mixture of "> or <" or
                                        empty include names. Trailing
                                        comments on include lines as shown in
                                        D124513 are ignored now by
                                        ignoring slashes in the characters that
                                        lead up to a match group.
kwk added a comment.May 3 2022, 7:32 AM

@krasimir could you please test this patch on your side? I've reopened it and tested it with the following file that contains both // and /* ... */ trailing comments on include lines followed by foo<T> and alike. The includes use <...> and "..." notation to locate files.

*test.h*

#include <cstdint>

#include "util/bar.h"
#include "util/foo/foo.h"  // foo<T>
#include "util/bar/bar.h"  // bar
#include "utilzbar0.h"  // bar0<T>
#include <util/foo/foo3.h>  // foo3<T>
#include <util/bar/bar3.h>  // bar3
#include <util/foo/foo2.h>  // foo2<T>
#include <util/bar/bar2.h>  // bar2
#include "util/bar.h"
#include "util/foo/foo.h"  /* foo<T> */
#include "util/bar/bar.h"  /* bar */
#include "utilzbar0.h"  /* bar0<T> */
#include <util/foo/foo3.h>  /* foo3<T> */
#include <util/bar/bar3.h>  /* bar3 */
#include <util/foo/foo2.h>  /* foo2<T> */
#include <util/bar/bar2.h>  /* bar2 */

Run test with ninja clang-format && bin/clang-format -style=google test.h:

#include <util/bar/bar2.h>  // bar2
#include <util/bar/bar2.h>  /* bar2 */
#include <util/bar/bar3.h>  // bar3
#include <util/bar/bar3.h>  /* bar3 */
#include <util/foo/foo2.h>  // foo2<T>
#include <util/foo/foo2.h>  /* foo2<T> */
#include <util/foo/foo3.h>  // foo3<T>
#include <util/foo/foo3.h>  /* foo3<T> */

#include <cstdint>

#include "util/bar.h"
#include "util/bar/bar.h"  // bar
#include "util/bar/bar.h"  /* bar */
#include "util/foo/foo.h"  // foo<T>
#include "util/foo/foo.h"  /* foo<T> */
#include "utilzbar0.h"     // bar0<T>
#include "utilzbar0.h"     /* bar0<T> */

To me, this looks precisely how it should look like. What do you think?

kwk edited the summary of this revision. (Show Details)May 3 2022, 7:33 AM
kwk planned changes to this revision.May 3 2022, 9:00 AM

Some thing that doesn't work at the moment is ordering an include like this:

#include /*some include*/ "wontwork.h"

I wonder if it worked before. Let's see. Yes it worked. Grrr. I guess I need to do more...

kwk added a comment.May 3 2022, 9:13 AM

I found another problem with both regular expressions (pre mine and mine): They cannot find

/* a comment before */ #include <foo>

Is this a known problem @MyDeveloperDay @owenpan @krasimir ?

@krasimir could you please test this patch on your side?

@kwk thank you! the new version of the patch looks good!

I didn't know before about the issue you found with the /* a comment before */ #include <foo> (and since it's preexisting, if it's tricky to address, it would be reasonable to just file a bug for it and omit fixing it by this work, I think).

Now support @ and #.            Clearly support "" and <> as ell as an

as well as?

@krasimir could you please test this patch on your side?

@kwk thank you! the new version of the patch looks good!

I didn't know before about the issue you found with the /* a comment before */ #include <foo> (and since it's preexisting, if it's tricky to address, it would be reasonable to just file a bug for it and omit fixing it by this work, I think).

+1.

Eitot added a subscriber: Eitot.May 25 2022, 3:10 AM