This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Remove all include duplicates not only those in the same block
Needs RevisionPublic

Authored by Febbe on Feb 12 2023, 5:14 PM.

Details

Summary

This change aims to remove all duplicate include directives,
instead of only those, which are in the same block.

This change might be a bit controversial, since it will also remove includes, which are in custom splitted blocks like:

#include "a.h"
/* some code */

// following include must stay!:
#include "a.h"

This is a follow up patch for D143691, but can be freestanding.

Diff Detail

Event Timeline

Febbe created this revision.Feb 12 2023, 5:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2023, 5:14 PM
Febbe requested review of this revision.Feb 12 2023, 5:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2023, 5:14 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
Febbe added a project: Restricted Project.Feb 12 2023, 5:15 PM
HazardyKnusperkeks requested changes to this revision.Feb 13 2023, 12:10 PM

I can see that this is maybe useful, but that have to be behind a new option, which is turned off by default. And a big no to changing the existing tests, you may add new stuff.

This revision now requires changes to proceed.Feb 13 2023, 12:10 PM
Febbe added a comment.Feb 13 2023, 1:20 PM

@HazardyKnusperkeks thank you for the review, I would add another option, but I don't know a good name. I would propose a

boolean IncludeDeduplicateInAllBlocks which defaults to zero.

First an Include, to keep include-sorting related options together in the documentation (sorted by name not by category)
DeduplicateInAllBlocks self explaining name.

But IncludeDeduplication with the enum states IDD_SameBlock(default) and IDD_ALL would also an option.
The latter has the advantage, that an expansion of this option would not require the adjustment of tests.

What's your preference?

@HazardyKnusperkeks thank you for the review, I would add another option, but I don't know a good name. I would propose a

boolean IncludeDeduplicateInAllBlocks which defaults to zero.

First an Include, to keep include-sorting related options together in the documentation (sorted by name not by category)
DeduplicateInAllBlocks self explaining name.

But IncludeDeduplication with the enum states IDD_SameBlock(default) and IDD_ALL would also an option.
The latter has the advantage, that an expansion of this option would not require the adjustment of tests.

What's your preference?

The enum is the better variant, I'd add the Off/None variant in the same breath. But the name has to be better, you don't need to start it with Include, since D138446 we can link to other options and that should happen. So a more natural name would be DeduplicateIncludeDirectives.

MyDeveloperDay requested changes to this revision.EditedApr 17 2023, 12:40 AM

I know it might not seem an obvious use case but there really isn't a requirement to not include header files more than once.. imagine if I have

#define ARCH "win32"
#include "MyDataStructThatContainsPlaformSpecificNamesUsingARCH.h"
#define ARCH "win64"
#include "MyDataStructThatContainsPlaformSpecificNamesUsingARCH.h"

Its not nice, but just because I include it twice doesn't mean its wrong? This change would break code written that way, No?

clang/unittests/Format/SortIncludesTest.cpp
927

Please don't change existing tests

Febbe added a comment.Apr 17 2023, 2:32 AM

I know it might not seem an obvious use case but there really isn't a requirement to not include header files more than once.. imagine if I have

#define ARCH "win32"
#include "MyDataStructThatContainsPlaformSpecificNamesUsingARCH.h"
#define ARCH "win64"
#include "MyDataStructThatContainsPlaformSpecificNamesUsingARCH.h"

Its not nice, but just because I include it twice doesn't mean its wrong? This change would break code written that way, No?

Mhh, I need to test that. But you are right, if it removes the header in one of the conditional preprocessor blocks, it would be very bad.
Regarding normal blocks, the behavior was already inconsistent, as soon to reorder blocks was activated:
Blocks are merged if possible, then de-duplicated and split again. So I just made the behavior more consistent here.

Regarding the comment that I must not change existing tests: I think this rule is too strict, because those tests are mostly regression tests.
But a regression tests does not test for correctness. So if a test had already a wrong assumption, it must be changeable.
Also, when a behavior is now undesirable, it should be possible to adapt them.
I would say in this case I should replace the test with your case, to reflect that includes must not be removed, when they are in different conditional preprocessor blocks.
That the different "modi" should be more consistent is in my opinion desirable.

Last but not least, I currently have nearly zero free time, so I would like to pause this and my other phab until my thesis is done.

Regarding the comment that I must not change existing tests: I think this rule is too strict, because those tests are mostly regression tests.
But a regression tests does not test for correctness. So if a test had already a wrong assumption, it must be changeable.

I don't agree, you are making the assumption that the default behaviour should now be different from what is was before, we have 100,000+ of users did you canvas them to ask if we wanted your suggested correct behaviour? (especially when someone has split their includes into a separate include group? I assume for a reason)

What would be their recourse if they didn't want the duplicate removed across groups, I don't see one? are you expecting them to // clang-format on/off

Ultimately we try not to change the default behaviour, if we think the behaviour is useful (and this could be), we ask that its put behind a new "Option" and that by default its off.

SortIncludes was latterly considered a contentious addition and has been proved to alter code in a well publicized example, in hindsight most people think it should have been off by default. I don't like making assumptions that we should turn something on, that others might not want/need or desire. This feels like one of those and something that could break code.

I personally follow a Beyoncé rule when it comes to unit tests... "If I like it I should have put a test on it", if the test is there, I need to be persuaded the test is very wrong before I'm happy to let it be changed to a new behaviour.

FWIW, these are not just regression tests, they assert that the behaviour is as the author desired and that is considered correctness until proven otherwise, any new author needs to respect that, or prove its a genuine bug and not just a matter of style/opinion.

Febbe added a comment.Apr 19 2023, 6:12 AM

Actually, I already wanted to add an enum, controlling this behavior.
But even with an enum, this check should only remove includes in consecutive blocks, which could be relocated (are not split by #defines or a comment)
Or I should add the options to DeduplicateIncludeDirectives: {DID_InSameBlock (default), DID_Never, DID_Allways, DID_OnlyConsecutiveBlocks}

Note, that I use DID_InSameBlock as default, because that's the current behavior, even many people would like to have it off by default.

Regarding the breaking of current features: What would you do, if a break is necessary, to implement a way more useful feature or to fix a huge issue/bug?
Would you just infer a new version number? (I personally don't like the idea of having version numbers for specific tests, but that would be the easiest way to keep compatibility with old configurations and codebases).
Of course, the assumptions must be proven, as you said.