This is an archive of the discontinued LLVM Phabricator instance.

Fix clang-formats IncludeCategory to match the documentation
Needs ReviewPublic

Authored by Febbe on Feb 9 2023, 4:24 PM.

Details

Summary
  • Main Include Files have now always prio (0,0)
  • They can't be matched by negative matchers anymore.
  • SortPriority now truly defaults to Priority
  • If it is unclear, which include is the main include, use both instead of using the first.

fixes #58284
fixes #60589

Diff Detail

Event Timeline

Febbe created this revision.Feb 9 2023, 4:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 9 2023, 4:24 PM
Febbe requested review of this revision.Feb 9 2023, 4:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 9 2023, 4:24 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
Febbe added a comment.EditedFeb 9 2023, 4:49 PM

Still some issues with SortPriorities.
The following settings (note the SortPriority of '^<.*' == 1) which will produce an extra group for the attached includes:

IncludeCategories:
  - Regex:           '^<ext/.*\.h>'
    Priority:        2
    SortPriority:    2
  - Regex:           '^<.*\.h>'
    Priority:        1
    SortPriority:    1
  - Regex:           '^<.*'
    Priority:        2
    SortPriority:    1
  - Regex:           '.*'
    Priority:        -1
    SortPriority:    1
#include "toolset_febbe.h" // <- Not expected to be in group 1

#include "zhello_world.h" // <- Not expected to be in group 2

#include <cstddef>
#include <cstdint>

#include <ext/bar.h>

#include <iostream> // <- no group expected

But at least it does what the documentation states and the most users would expect for the default case (Priority == SortPriority).

It seems that the include list is currently first sorted (by SortPriority), and then split into groups, which does not make much sense, since it also rearranges the groups and splits them up. But the sorting before the regroup is at least documented.

Febbe added inline comments.Feb 9 2023, 4:55 PM
clang/lib/Tooling/Inclusions/HeaderIncludes.cpp
17

<limits> is used, but not included. Currently, it works, since another header already included it.

Febbe added a reviewer: Restricted Project.Feb 9 2023, 4:58 PM
Febbe edited projects, added Restricted Project; removed Restricted Project.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 9 2023, 4:58 PM
Febbe added a subscriber: Restricted Project.Feb 9 2023, 4:59 PM
rymiel edited reviewers, added: owenpan, MyDeveloperDay, rymiel, HazardyKnusperkeks; removed: Restricted Project.Feb 9 2023, 6:51 PM
Febbe updated this revision to Diff 496647.Feb 10 2023, 5:58 PM

Fixed the Unit Tests

  • rewritten one test, which made the assumption, that there can be only one main header.
    • it now asserts, that all matching headers are considered as main header.
  • replaced initialisations of IncludeCategories.SortPriority to zero with the value from IncludeCategories.Priority
    • the previous approach to set the SortPriority when it's 0 to Priority, instead of initializing it directly as intended had several drawbacks:
      • values of 0 were not possible and resulted in weird behav.
      • when a main include was matched by another matcher, it got annother sortpriority than 0.
Febbe updated this revision to Diff 496650.Feb 10 2023, 6:31 PM

Added priority to the sorting algorithm.
This prevents unwanted splits and weird resorts of groups with the same priority, but with different and overlapping (other groups) SortPriority.
The new system can now be understood as Primary.SecondaryPriority instead of:

sort by SecondPriority + bucket By PrimaryPriority

Febbe added a comment.Feb 10 2023, 6:38 PM

I think I am done, and you can review it now.

With the latest changes, SortPriority is no longer required to match Priority, it can always be 0, since Priority is now used in the sorting algorithm.
This is what the documentation tends to depict:

"and #includes are sorted first according to increasing category number and then alphabetically within each category."
" There is a third and optional field SortPriority which can used while IncludeBlocks = IBS_Regroup to define the priority in which #includes should be ordered."

HazardyKnusperkeks requested changes to this revision.Feb 13 2023, 12:05 PM
HazardyKnusperkeks added inline comments.
clang/lib/Format/Format.cpp
1381

I don't follow why this is necessary. And I don't feel comfortable with it.

clang/lib/Tooling/Inclusions/HeaderIncludes.cpp
218–219

Drop the braces (now).

clang/lib/Tooling/Inclusions/IncludeStyle.cpp
28–29

A big no! You will break existing configurations.

This revision now requires changes to proceed.Feb 13 2023, 12:05 PM
Febbe added a comment.Feb 13 2023, 1:07 PM

I've added some elaborations and justifications for the criticized changes.

clang/lib/Format/Format.cpp
1381

This was required, but is not required necessarily, my last update of the patch made this change most likely obsolete.

But the actual reason is, that SortPriority is described in the documentation as "optional value which is defaulted to Priority, if absent", but it is neither a (std::)optional value nor it was defaulted to Priority if absent.
It was indirectly updated from 0 to Priority in the algorithm.

Since I moved the buggy, re-initialization of SortPriority to the first initialization:
clang/lib/Tooling/Inclusions/IncludeStyle.cpp:L20 this change must also be covered by the tests: All zero initializations must now be default to Priority to not change the test behavior, which is what you want.

Either by creating a Constructor without SortPriority, which defaults to Priority,
creating a factory function, or by doing this for all tests manually.

Imagine the tests would also test the interpretation of the JSON input. Then the tests would not require an adjustment to be semantically equal.

But since I added Priority to the sorting after this change, this is kind of irrelevant.

clang/lib/Tooling/Inclusions/IncludeStyle.cpp
28–29

Can you elaborate?
Do you mean the change from mapOptional to mapRequired?
From what I thought, this change only affects clarity (improvement), not the semantic:

  • the parent JSON list/array entry is already optional.
  • therefore any of them is required, but none is useful alone.
    • an empty regex, just matches nothing => regex required
    • an empty Priority should default (documentation) to INT_MAX, but actually did not (0).
  • Also, from the documentations view, omitting one of them was never intended.

The only effect this change will have for old configurations, is, that it breaks "wrong" configurations. It's like abusing undefined behav.: You can't be sure it works after a compiler update.

clang/lib/Format/Format.cpp
1381

My main question is, does this change behavior? In either case, please add some tests to show the difference, or the lack thereof.

clang/lib/Tooling/Inclusions/IncludeStyle.cpp
28–29

Can you elaborate?
Do you mean the change from mapOptional to mapRequired?
From what I thought, this change only affects clarity (improvement), not the semantic:

  • the parent JSON list/array entry is already optional.
  • therefore any of them is required, but none is useful alone.
    • an empty regex, just matches nothing => regex required

While a configuration without a regex may be stupid, it currently is accepted.

  • an empty Priority should default (documentation) to INT_MAX, but actually did not (0).

Here one has to decide if the documentation or the behavior has to be fixed. I have not a preference, @MyDeveloperDay
may have.

  • Also, from the documentations view, omitting one of them was never intended.

Nevertheless it is accepted and thus there is most likely a configuration which uses it.

The only effect this change will have for old configurations, is, that it breaks "wrong" configurations. It's like abusing undefined behav.: You can't be sure it works after a compiler update.

Changing what happens in such cases is one thing (but also not desired), just completely turning clang-format off, through erroring on the configuration parsing is in my view not acceptable.

Febbe updated this revision to Diff 497687.Feb 15 2023, 8:16 AM
Febbe marked 3 inline comments as done.
  • Removed some, not required changes.
  • Reverted mapOptional -> mapRequired change, since it might break existing configs
Febbe updated this revision to Diff 497701.Feb 15 2023, 9:07 AM

Added requested tests:

  • since we now support priorities, equal to 0 (the main include priority) sorting should work now for those.
  • multiple files can be main-includes now, the tests expects that now.
  • negative priorities do not override the main include priority anymore, which is also tested
Febbe marked 2 inline comments as done.Feb 15 2023, 9:22 AM
Febbe added inline comments.
clang/lib/Format/Format.cpp
1381

I removed all changes to the tests, which do not change the behavior.

One test required a change. It assumed, that only one file can be a main include, which is, in my opinion, not correct. I've made the change, that all files, which are matched as main include are ordered with priority 0 (this especially means includes introduced by IncludeIsMainRegex and IncludeIsMainSourceRegex). The previous implementation ignored those, if another main file was found earlier. It could also result in unpredictable resorting.

Looks good to me, apart from that one issue.

clang/lib/Tooling/Inclusions/IncludeStyle.cpp
23

Please move this into the mapping function.

clang/unittests/Format/SortIncludesTest.cpp
11

Don't need that, or?

548

While I may accept there are more than one main header, this should not happen in my opinion.

557

End comments with full stop.

Febbe added inline comments.Feb 16 2023, 7:07 AM
clang/unittests/Format/SortIncludesTest.cpp
548

Yes, that is a conflict of interests. There is no perfect way of implementing this, without the help of clang-tidy / the clang language server:

To arguably define a file as a main include, all declarations must be analyzed, whether they are predeclared in the include or not.
But we don't have the help of the clang AST.

My expectation is, that when I randomly shuffle the includes, that they are always sorted to the same result.
Currently, this is not the case, and depending on the rules it can furthermore happen, that a main include "a.h" is exchanged with the "llvm/a.h".
This should also not happen, but it does, and it is not covered by the tests.

I consider the false negative of a correct main include worse than a false positive.
Therefore, I changed it.
In addition, my approach has the advantage that all includes are sorted in the same way, regardless of the order.

But if you want, we could introduce a new Option like: enum FindMainIncludes{FMI_First, FMI_All, FMI_Off = 0};
With First to match the current behavior, All for the new behavior.
But then matched includes with a negative priority would be swapped with the other possible main_include at each run.
I don't know how to prevent this.
The best solution I can imagine, is still a comment of the programmer, that the respective include is or is not a main include.
E.g. // clang-format pragma: no_main_include

Another idea would be, to insert all main includes into a list with the matchers' priority, and then take the last negative or the first positive, but only if it is not partially sortable with the neighbors.
This might be a good heuristic, whether the include was previously detected as main include.
But I don't know if this is implementable with the current algorithm.

clang/unittests/Format/SortIncludesTest.cpp
548

It is very unfortunate that llvm/a.h would be detected as a main header, but would currently the relative order be kept and thus a.h always be the main header?

I don't think you will find someone (of the usual reviewers) to be in favor of the pragma, and I certainly don't want such a thing in my code.

A heuristic seems to be a good way. I'd say from all the candidates we take the one without any directories in it.

Febbe added inline comments.Feb 16 2023, 2:23 PM
clang/unittests/Format/SortIncludesTest.cpp
548

but would currently the relative order be kept and thus a.h always be the main header?

No, unfortunately, clang-format never sorted the path after its dept, std::filesystem::path would do that, but clang-format uses the std::less operator of StringRef which is a lexicographical comparison.
Therefore, m.h will always be ordered after llvm/m.h. Changing that would reorder nearly all headers in all projects using clang-format. But I could do this only for the main-include candidates.

Nevertheless, I don't think this would be a good heuristic either, because me, and many others are using an extra include folder for public headers (API) and the src directory for private headers:

my-lib/
├─ include/
│  ├─ public_headers.md
│  ├─ a.h
├─ src/
│  ├─ private_headers_and_source.md
│  ├─ a_priv.h
│  ├─ a.cpp

With your proposed solution, the primary main header is ordered after the main include.
Let me implement my heuristic first, maybe it will work well enough to prevent the reordering, when the main header is sorted by hand (just like before, but now it also should work for negative priorities).

Otherwise, we must reconsider which tradeoff we choose (maybe a completely new one).