This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Modified SortIncludes and IncludeCategories to priority for sorting #includes within the Group Category.
ClosedPublic

Authored by Manikishan on Jul 13 2019, 5:30 AM.

Details

Summary

This new Style rule is made as a part of adding support for NetBSD KNF in clang-format. NetBSD have it's own priority of includes which should be followed while formatting NetBSD code. This style sorts the Cpp Includes according to the priorities of NetBSD, as mentioned in the Style Guide
The working of this Style rule shown below:

Configuration:
This revision introduces a new field under IncludeCategories named SortPriority which defines the priority of ordering the #includes and the Priority will define the categories for grouping the #include blocks.

Here is an example how this Style sorts cpp includes according to NetBSD KNF.
Before Formatting:

#include <sys/param.h>		
#include <sys/types.h>		
#include <sys/ioctl.h>		
#include <net/if_dl.h>
#include <net/route.h>
#include <netinet/in.h>
#include <sys/socket.h>		
#include <sys/stat.h>
#include <sys/wait.h>		
#include <net/if.h>
#include <protocols/rwhod.h>
#include <assert.h>
#include <paths.h>
#include "pathnames.h"		
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

After Formatting:

#include <sys/param.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/wait.h>

#include <net/if.h>
#include <net/if_dl.h>
#include <net/route.h>
#include <netinet/in.h>
#include <protocols/rwhod.h>

#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

#include <paths.h>

#include "pathnames.h"

Diff Detail

Repository
rL LLVM

Event Timeline

Manikishan created this revision.Jul 13 2019, 5:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2019, 5:30 AM
Herald added a subscriber: krytarowski. · View Herald Transcript

This is the squashed commit of both Adding NetBSD Style and adding sortNetBSDIncludes

Manikishan edited the summary of this revision. (Show Details)Jul 13 2019, 5:37 AM

Do you think there is anything about this algorithm that could be parameterized so that other projects could utilize it? I guess its not completely clear how this differs from just using the IncludeCategories?

Do you think there is anything about this algorithm that could be parameterized so that other projects could utilize it?

I think we can take input from users the header files which should be grouped together and pass them to regex searches which then sets up priority.
But I dob the feasibility of this approach. Will come up with one.

I guess its not completely clear how this differs from just using the IncludeCategories?

This style works only via NetBSD includes order instead of alphabetical .

There also seems like alot of duplication between the existing sortCppIncludes

I think the only difference here is really just

SmallVector<unsigned, 16> Indices;
  SmallVector<unsigned, 16> Includes_p;
  for (unsigned i = 0, e = Includes.size(); i != e; ++i) {
    unsigned pl = getNetBSDIncludePriority(Includes[i].Filename);
    Includes_p.push_back(pl);
    Indices.push_back(i);
  }

vs the original

SmallVector<unsigned, 16> Indices;
for (unsigned i = 0, e = Includes.size(); i != e; ++i)
  Indices.push_back(i);

plus way the sorting is performed, are we sure we couldn't have just made the original sorting more powerful based on style settings?

lib/Format/Format.cpp
1878 ↗(On Diff #209685)

_p? I don't understand what it stands for?

IncludesPriority?

Manikishan marked an inline comment as done.EditedJul 14 2019, 10:07 AM

There also seems like alot of duplication between the existing sortCppIncludes

I think the only difference here is really just

SmallVector<unsigned, 16> Indices;
  SmallVector<unsigned, 16> Includes_p;
  for (unsigned i = 0, e = Includes.size(); i != e; ++i) {
    unsigned pl = getNetBSDIncludePriority(Includes[i].Filename);
    Includes_p.push_back(pl);
    Indices.push_back(i);
  }

vs the original

SmallVector<unsigned, 16> Indices;
for (unsigned i = 0, e = Includes.size(); i != e; ++i)
  Indices.push_back(i);

plus way the sorting is performed, are we sure we couldn't have just made the original sorting more powerful based on style settings?

Does it mean that adding the priority to sort based on style?
like this:

if (Style== NetBSD)
   // set priority to netbsd's priority
else if(Style == X)
  // set X's priority

I didn't want to mess up the original sorting and made up this patch, if we have parameterise this solution, I will go for it.

lib/Format/Format.cpp
1878 ↗(On Diff #209685)

Yes , it means priority

There also seems like alot of duplication between the existing sortCppIncludes

I think the only difference here is really just

SmallVector<unsigned, 16> Indices;
  SmallVector<unsigned, 16> Includes_p;
  for (unsigned i = 0, e = Includes.size(); i != e; ++i) {
    unsigned pl = getNetBSDIncludePriority(Includes[i].Filename);
    Includes_p.push_back(pl);
    Indices.push_back(i);
  }

vs the original

SmallVector<unsigned, 16> Indices;
for (unsigned i = 0, e = Includes.size(); i != e; ++i)
  Indices.push_back(i);

plus way the sorting is performed, are we sure we couldn't have just made the original sorting more powerful based on style settings?

Does it mean that adding the priority to sort based on style?
like this:

if (Style== NetBSD)
   // set priority to netbsd's priority
else if(Style == X)
  // set X's priority

I didn't want to mess up the original sorting and made up this patch, if we have parameterise this solution, I will go for it.

We should be able to make changes without fear of breaking other code, if we can't then we don't have enough tests

I guess my comment was that you patche introduces a lot of code duplication, could the small amount of code you added be say put into a lambda and past in based on style?

lib/Format/Format.cpp
1878 ↗(On Diff #209685)

I don't think Includes_p matches Clang naming conventions so I recommend changing it.

MyDeveloperDay added inline comments.Jul 15 2019, 1:26 AM
unittests/Format/SortIncludesTest.cpp
709 ↗(On Diff #209685)

should you add a test which has groups defined already, I'm unclear as to how your algorithm works across groups?

Manikishan marked an inline comment as done.
Manikishan added inline comments.
unittests/Format/SortIncludesTest.cpp
709 ↗(On Diff #209685)

The include groups in NetBSD files are made depending on the use, for example
Kernel headers, File systems headers, Network and protocol headers are grouped together.
My algorithm is to detect the include type by regex and assign a priority to them to sort. And to regroup I used IncludeCategories.

Manikishan marked 2 inline comments as done.Jul 15 2019, 1:48 AM

I guess my comment was that you patche introduces a lot of code duplication, could the small amount of code you added be say put into a lambda and past in based on style?

Yes, I will add that and submit a patch soon.

Is this change that was meant?

I am not quite sure why this change is required to sort the headers for NetBSD, you can set the priorities via IncludeStyle.IncludeCategories. Is that not sufficient?

Manikishan added a comment.EditedJul 15 2019, 8:57 AM

I am not quite sure why this change is required to sort the headers for NetBSD, you can set the priorities via IncludeStyle.IncludeCategories. Is that not sufficient?

It can be done by setting priorities in IncludeCategories, but here we have nearly 40+ cases and categories to hardcode due to complex interdependencies between their headers. So, I have added this style reducing the cases using regex. And if this is fully parameterised any other related project can add their own header priorities.

I am not quite sure why this change is required to sort the headers for NetBSD, you can set the priorities via IncludeStyle.IncludeCategories. Is that not sufficient?

It can be done by setting priorities in IncludeCategories, but here we have nearly 40+ cases and categories to hardcode due to complex interdependencies between their headers. So, I have added this style reducing the cases using regex. And if this is fully parameterised any OS related project can add their own header priorities.

Note that IncludeCategories is already a regex - https://clang.llvm.org/docs/ClangFormatStyleOptions.html

Manikishan added a comment.EditedJul 15 2019, 9:31 AM

I am not quite sure why this change is required to sort the headers for NetBSD, you can set the priorities via IncludeStyle.IncludeCategories. Is that not sufficient?

It can be done by setting priorities in IncludeCategories, but here we have nearly 40+ cases and categories to hardcode due to complex interdependencies between their headers. So, I have added this style reducing the cases using regex. And if this is fully parameterised any OS related project can add their own header priorities.

Note that IncludeCategories is already a regex - https://clang.llvm.org/docs/ClangFormatStyleOptions.html

Sorry, my mistake I meant that I have added Regex for priorities while sorting and If I am not wrong I think IncludeCategories are used while Regrouping after sorting the Includes. In addition to that in my case I have to sort the includes In a particular order then grouping them in different
For example:

#include <sys/param.h>		/* <sys/param.h> first, */
#include <sys/types.h>		/*   <sys/types.h> next, */
#include <sys/ioctl.h>		/*   and then the rest, */
#include <uvm/*.h>
#include <dev/*.h>

#include <net/if.h>
#include <net/if_dl.h>
#include <net/route.h>
#include <netinet/in.h>
#include <protocols/rwhod.h>

As shown in the above example <uvm> should follow <sys> then <dev> but while regrouping they should be in the same group.

Sorry, my mistake I meant that I have added Regex for priorities while sorting and If I am not wrong I think IncludeCategories are used while Regrouping after sorting the Includes. In addition to that in my case I have to sort the includes In a particular order then grouping them in different
For example:

#include <sys/param.h>		/* <sys/param.h> first, */
#include <sys/types.h>		/*   <sys/types.h> next, */
#include <sys/ioctl.h>		/*   and then the rest, */
#include <uvm/*.h>
#include <dev/*.h>

#include <net/if.h>
#include <net/if_dl.h>
#include <net/route.h>
#include <netinet/in.h>
#include <protocols/rwhod.h>

As shown in the above example <uvm> should follow <sys> then <dev> but while regrouping they should be in the same group.

OK. I think I understand. Since <sys/param.h> and <sys/types.h> have to come first we would need to set the regex and priorities for those individually. But then those will be treated as separate block and grouped like:

#include <sys/param.h>

#include <sys/types.h>

#include <sys/ioctl.h>
more <sys/*> includes

I wonder if a better approach would be to extend IncludeCatalog to allow the grouping to specified independent of the regex?

E.g., { "{<sys/param.h>", 0, 1 }, // priority 1, group 1

    {<sys/types.h>, 1, 1, }, // different priority, but same group
   {<sys/*>, 2, 1}  // more general headers with lower priority, but will still be grouped in group 1.
}

I appreciate what you've done to make this patch less duplicated code..but now you've almost got to the point where being able to control it all via configuration.

I'm with @rdwampler I think you should be able to extend the include categories...and then this becomes a very powerful capability. There are many people who use clang-format for a "non" standard style source trees, and they might like to use such a capability themselves without having to inherit from NetBSD type.

I agree it will be complex, but that is a good justification for adding hard coded a NetBSD style.

lib/Format/Format.cpp
1905 ↗(On Diff #209873)

remove

Manikishan marked 2 inline comments as done.

I appreciate what you've done to make this patch less duplicated code..but now you've almost got to the point where being able to control it all via configuration.

I'm with @rdwampler I think you should be able to extend the include categories...and then this becomes a very powerful capability. There are many people who use clang-format for a "non" standard style source trees, and they might like to use such a capability themselves without having to inherit from NetBSD type.

I agree it will be complex, but that is a good justification for adding hard coded a NetBSD style.

Thanks, I will try to implement it as suggested.
This is my proposal algorithm:

  1. Modify Include.Categories by adding one more field for grouping priority.
  2. Add support for this third field in sortIncludes.

Am I missing anything?

In the following patch I have modified Include categories by adding a new field "sortInlcudes" which defines the priority of the sort. And priority field will now only be used for grouping.

Thanks! Can you update the documentation too?

lib/Tooling/Inclusions/HeaderIncludes.cpp
211 ↗(On Diff #210218)

I think you can drop the else block and just return immediately from the for loop.

Thanks! Can you update the documentation too?

Thanks!
So, is this Implementation fine? But there is a thing, The values which we give two Priority does not really matter they need to be different to be in different groups. Is that ok? and should I Rename any options?.

Is there sufficient test coverage as to what happens if SortPriority is not set?

lib/Format/Format.cpp
455–456 ↗(On Diff #210218)

Unrelated change

613–614 ↗(On Diff #210218)

Unrelated change

692–693 ↗(On Diff #210218)

Unrelated change

Manikishan marked 4 inline comments as done.

Is there sufficient test coverage as to what happens if SortPriority is not set?

If SortPriority is not set, the Includes will be grouped without sorting,

For example:

#include <sys/param.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <net/if_dl.h>
#include <net/route.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <net/if.h>
#include <protocols/rwhod.h>
#include <assert.h>
#include <paths.h>
#include "pathnames.h"
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

will be grouped as

#include "pathnames.h"

#include <assert.h>
#include <errno.h>
#include <inttypes.h>

#include <net/if.h>
#include <net/if_dl.h>
#include <net/route.h>
#include <netinet/in.h>

#include <paths.h>

#include <protocols/rwhod.h>

#include <stdio.h>
#include <stdlib.h>

#include <sys/ioctl.h>
#include <sys/param.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>

Can we add a test case for this and mention that users should set SortPriority, or handle this case some how?
One way can be when the values are not set their Priority value is set as SortPriority, but this is also a problem
when users give SortPriority as "0". Any comments on this issue?

Is there sufficient test coverage as to what happens if SortPriority is not set?

If SortPriority is not set, the Includes will be grouped without sorting,

Let me rephrase - for the exiting .clang-formats, that don't currently specify SortPriority,
this introduction of SortPriority should not change the header handling.
Is that the case, and if so is there sufficient test coverage for that?

Is there sufficient test coverage as to what happens if SortPriority is not set?

If SortPriority is not set, the Includes will be grouped without sorting,

Let me rephrase - for the exiting .clang-formats, that don't currently specify SortPriority,
this introduction of SortPriority should not change the header handling.
Is that the case, and if so is there sufficient test coverage for that?

I got your idea now.
No, there is no test coverage for that case, and with the current patch they have to add SortPriority.
To avoid this shall I set SortPriority as Priority as default if it is not defined? I think that will fix the issue.

Is there sufficient test coverage as to what happens if SortPriority is not set?

If SortPriority is not set, the Includes will be grouped without sorting,

Let me rephrase - for the exiting .clang-formats, that don't currently specify SortPriority,
this introduction of SortPriority should not change the header handling.
Is that the case, and if so is there sufficient test coverage for that?

I got your idea now.
No, there is no test coverage for that case, and with the current patch they have to add SortPriority.
To avoid this shall I set SortPriority as Priority as default if it is not defined? I think that will fix the issue.

any reviews on it ?

Is there sufficient test coverage as to what happens if SortPriority is not set?

If SortPriority is not set, the Includes will be grouped without sorting,

Let me rephrase - for the exiting .clang-formats, that don't currently specify SortPriority,
this introduction of SortPriority should not change the header handling.
Is that the case, and if so is there sufficient test coverage for that?

I got your idea now.
No, there is no test coverage for that case, and with the current patch they have to add SortPriority.
To avoid this shall I set SortPriority as Priority as default if it is not defined? I think that will fix the issue.

any reviews on it ?

That's sounds like it will work. Can you add some additional test cases around this in SortIncludesTest.cpp. Also, adding a test case specifically for sorting the NetBSD headers would be good.

Is there sufficient test coverage as to what happens if SortPriority is not set?

If SortPriority is not set, the Includes will be grouped without sorting,

Let me rephrase - for the exiting .clang-formats, that don't currently specify SortPriority,
this introduction of SortPriority should not change the header handling.
Is that the case, and if so is there sufficient test coverage for that?

I got your idea now.
No, there is no test coverage for that case, and with the current patch they have to add SortPriority.
To avoid this shall I set SortPriority as Priority as default if it is not defined? I think that will fix the issue.

any reviews on it ?

That's sounds like it will work. Can you add some additional test cases around this in SortIncludesTest.cpp. Also, adding a test case specifically for sorting the NetBSD headers would be good.

Sorry for the delay, I am facing issues with "NoCrash_Bug34236" will update the patch once I am able to fix it.

Added Tests for using and not using SortPriority Field, Now, even if the SortPriorityis not set it will be set to the value of Category by default

Assuming this works and the other unit tests don't show issues then this LGTM. Please consider running this on your NetBSD code base before committing, if possible please also run on clang code based to ensure existing sorted headers aren't sorted unexpectedly.

I do feel like there could be a documentation change missing really to explain to people how this really works and what if anything they have to change in their existing .clang-format files

MyDeveloperDay accepted this revision.Aug 5 2019, 2:45 AM
This revision is now accepted and ready to land.Aug 5 2019, 2:45 AM
Manikishan retitled this revision from [clang-format] Added new style rule: SortNetBSDIncludes to [clang-format] Modified SortIncludes and IncludeCategories to priority for sorting #includes within the Group Category..Aug 21 2019, 12:22 PM

Assuming this works and the other unit tests don't show issues then this LGTM.

Please consider running this on your NetBSD code base before committing, if possible please also run on clang code based to ensure existing sorted headers aren't sorted unexpectedly.

Before this gets committed, i feel like echoing this - did you ensure that this doesn't regress existing code/configs?

I do feel like there could be a documentation change missing really to explain to people how this really works and what if anything they have to change in their existing .clang-format files

Oh, indeed, docs are completely missing. I think it is best to resolve this right away rather than afterwards.

Assuming this works and the other unit tests don't show issues then this LGTM. Please consider running this on your NetBSD code base before committing, if possible please also run on clang code based to ensure existing sorted headers aren't sorted unexpectedly.

I do feel like there could be a documentation change missing really to explain to people how this really works and what if anything they have to change in their existing .clang-format files

I wIll update the documentation soon, within a day. And I will also run it over both the clang, and netbsd code to make sure nothing goes wrong.

Manikishan edited the summary of this revision. (Show Details)

I have removed the introduction to the NetBSDStyle, as there are some styles that are needed to implemented before launching the NetBSD style.

Manikishan edited the summary of this revision. (Show Details)Aug 22 2019, 3:46 AM
Manikishan edited the summary of this revision. (Show Details)
Manikishan edited the summary of this revision. (Show Details)

Does this need landing? given that you have a number of patches in flight perhaps it would be good to request commit access

Manikishan added a comment.EditedSep 12 2019, 6:36 AM

>>! In D64695#1667696, @MyDeveloperDay wrote:

Does this need landing? given that you have a number of patches in flight perhaps it would be good to request commit access

Yes, it would be good if it is landed. And can I know the procedure for getting commit access

Yes, it would be good if it is landed. And can I know the procedure for getting commit access

https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

MyDeveloperDay added a project: Restricted Project.Sep 25 2019, 1:18 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 25 2019, 1:32 PM