This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Add option to control the space at the front of a line comment
ClosedPublic

Authored by HazardyKnusperkeks on Nov 27 2020, 9:24 PM.

Details

Summary

Adding a minimum and a maximum number of allowed spaces in the line comment prefix.
The both values are needed for what I intend (a maximum of 0), and keeping the old behavior (only a minimum of 1).

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptNov 27 2020, 9:24 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
HazardyKnusperkeks requested review of this revision.Nov 27 2020, 9:24 PM
MyDeveloperDay added inline comments.Nov 28 2020, 2:20 AM
clang/docs/ClangFormatStyleOptions.rst
2727

I'm personally not a massive fan of stuffing -1 into an unsigned but I understand why its there, if this was an signed and it was actually -1 would the algorithm be substantially worse in your view?

clang/lib/Format/BreakableToken.cpp
797

my assumption is when Maximum is -1 this is a very large -ve number correct? is that defined behavior for drop_back()

clang/docs/ClangFormatStyleOptions.rst
2727

I'm no fan if unsigned in any way, but that seems to be the way in clang-format.
Making it signed would require a few more checks when to use it, but I don't see any problem in that.
I just would also make the Minimum signed then just to be consistent.

While parsing the style I would add checks to ensure Minimum is never negative and Maximum is always greater or equal to -1, should that print any warnings? Is there a standard way of doing so? Or should it be just silently corrected?

clang/lib/Format/BreakableToken.cpp
797

Since we check beforehand SpacesInPrefix is larger than Maximum there is no problem.

MyDeveloperDay added a comment.EditedNov 30 2020, 2:59 AM

I think fundamentally from my perspective this seem ok, out of interest can I ask what drove you to require it?

My assumption is that some people write comments like

// Free comment without space

and you want to be able to consistently format it to be (N spaces, as clang-format already does 1 space correct?)

//  Free comment without space

is that correct? is there a common style guide asking for that? what is the rationale

clang/lib/Format/BreakableToken.cpp
790

is this case covered by a unit test at all? sorry can you explain why you are looking for "##"?

797

yep sorry didn't see that.

HazardyKnusperkeks marked 2 inline comments as done.Nov 30 2020, 12:28 PM

I think fundamentally from my perspective this seem ok, out of interest can I ask what drove you to require it?

My assumption is that some people write comments like

// Free comment without space

and you want to be able to consistently format it to be (N spaces, as clang-format already does 1 space correct?)

//  Free comment without space

is that correct? is there a common style guide asking for that? what is the rationale

I will go for {0,0}, so no space between // and the text, I don't know about a style guide asking for it, other than my own. (Which I can dictate in my company.) :)
I have just recently started using clang-format and it does not everything the the way I want to, on some aspects I have adapted, but on others I try to "fix" it, you can expect some more changes from me in the next time.

clang/lib/Format/BreakableToken.cpp
790

It is covered by multiple tests, that's how I was made aware of it. :)
If you look at the code before it only adds a space if the old prefix is "#" not "" which is also found by `getLineCommentIndentPrefix`. As it seems in `TextProto` "" should not be touched. I can of course add a test in my test function.

Now I see a change, in the code before "#" was only accepted when the language is TextProto, now it is always. But I think for that to happen the parser (or lexer?) should have assigned something starting with"#" as comment, right? But I can change that.

clang/lib/Format/BreakableToken.cpp
790

Okay # # is formatted, I try again:
If you look at the code before it only adds a space if the old prefix is "#" not "##" which is also found by getLineCommentIndentPrefix. As it seems in TextProto "##" should not be touched.

MyDeveloperDay added inline comments.Dec 1 2020, 3:25 AM
clang/docs/ClangFormatStyleOptions.rst
2710

Is this change generated? with clang/doc/tools/dump_style.py or did you hand craft it?

ClangFormatStyleOptions.rst is always autogenerated from running dump_style.py, any text you want in the rst needs to be present in Format.h

clang/docs/ClangFormatStyleOptions.rst
2710

Yes it is generated, after you told me what I have to do on D91507 I have (and will in the future) always run that script, I've not touched the file directly.

But I also have not really looked at it, because it is generated. If it is a bit odd I can retake a look on other options with nested entries.

MyDeveloperDay accepted this revision.Dec 3 2020, 12:07 PM

This LGTM, I'm not sure if others have any comments

This revision is now accepted and ready to land.Dec 3 2020, 12:07 PM
curdeius added inline comments.
clang/lib/Format/Format.cpp
963

I don't know precisely the LLVM style but does it allow more than one space (as Maximum would suggest)?
Are there any tests covering that?
And what about other styles, no need to set min/max for them?

clang/lib/Format/Format.cpp
963

The part with the LLVM Style from my test case did run exactly so without any modification, so yes it allows more than one space.

Since there was no option before (that I'm aware of) all other styles behaved exactly like that. I did not check the style guides if they say anything about that, I just preserved the old behavior when nothing is configured.

curdeius accepted this revision.Dec 3 2020, 12:46 PM

LGTM

clang/lib/Format/Format.cpp
963

Ok. Great

Can I assume you need someone to land this for you?

Can I assume you need someone to land this for you?

Yes I do. But I have a question, my last change is commited in your name, that means git blame would blame it on you, right?

You can set me as author:
Björn Schäpers <bjoern@hazardy.de>
My Github Account is also called HazardyKnusperkeks.

Can I assume you need someone to land this for you?

Yes I do. But I have a question, my last change is commited in your name, that means git blame would blame it on you, right?

You can set me as author:
Björn Schäpers <bjoern@hazardy.de>
My Github Account is also called HazardyKnusperkeks.

The process is that you add (https://llvm.org/docs/Contributing.html)

Patch By: HazardyKnusperkeks

to the commit message if the user doesn't have commit access, if you want your name against the blame then I recommend applying for commit access yourself.

let me know if you still want me to land this

HazardyKnusperkeks edited the summary of this revision. (Show Details)Dec 6 2020, 5:40 AM

Can I assume you need someone to land this for you?

Yes I do. But I have a question, my last change is commited in your name, that means git blame would blame it on you, right?

You can set me as author:
Björn Schäpers <bjoern@hazardy.de>
My Github Account is also called HazardyKnusperkeks.

The process is that you add (https://llvm.org/docs/Contributing.html)

Patch By: HazardyKnusperkeks

to the commit message if the user doesn't have commit access, if you want your name against the blame then I recommend applying for commit access yourself.

let me know if you still want me to land this

Updated. :)
Yes please land this.

Can I assume you need someone to land this for you?

Yes I do. But I have a question, my last change is commited in your name, that means git blame would blame it on you, right?

You can set me as author:
Björn Schäpers <bjoern@hazardy.de>
My Github Account is also called HazardyKnusperkeks.

The process is that you add (https://llvm.org/docs/Contributing.html)

Patch By: HazardyKnusperkeks

to the commit message if the user doesn't have commit access, if you want your name against the blame then I recommend applying for commit access yourself.

That is incorrect and does not represent the nowadays reality, i suggest that you look up the docs.

let me know if you still want me to land this

krasimir added inline comments.Dec 6 2020, 10:46 AM
clang/docs/ClangFormatStyleOptions.rst
2727

I find it confusing why we have 2, Minimum and Maximum, instead of a single one.
I'm not convinced that Maximum is useful.
Conceptually I'd prefer a single integer option, say LineCommentContentIndent that would indicate the default indent used for the content of line comments. I'd naively expect LineCommentContentIndent = :

  • 0 would produce //comment
  • 1 would produce // comment (current default)
  • 2 would produce // comment, etc.

and this will work with if the input is any of //comment, // comment, or // comment, etc.

An additional consideration is that line comment sections often contain additional indentation, e.g. when there is a bullet list, paragraphs, etc. and so we can't guarantee that the indent of each line comment will be less than Maximum in general. I'd expect this feature to not adjust extra indent in comments, e.g.,

// Lorem ipsum dolor sit amet,
//  consectetur adipiscing elit,
//  ...

after reformatting with LineCommentContentIndent=0 to produce

//Lorem ipsum dolor sit amet,
// consectetur adipiscing elit,
// ...

(and vice-versa, after reformatting with LineCommentContentIndent=1).
This may well be handled by code, I just wasn't sure by looking at the code and test examples.

clang/lib/Format/BreakableToken.cpp
790

Thanks for the analysis!
I wrote the text proto comment detection. I believe the current clang-format is buggy in that it should transform ##comment into ## comment for text proto (and similarly for all other KnownTextProtoPrefixes in getLineCommentIndentCommentPrefix), so this substr(0, 2) != "##" is unnecessary and I should go ahead and update and add tests for that.

clang/unittests/Format/FormatTestComments.cpp
3405

This is desired, AFAIK, and due to the normalization behavior while reflowing: when a comment line exceeds the comment limit and is broken up into a new line, the full range of blanks is replaced with a newline. (https://github.com/llvm/llvm-project/blob/ddb002d7c74c038b64dd9d3c3e4a4b58795cf1a6/clang/lib/Format/BreakableToken.cpp#L66).
Note that reflowing copies the extra indent of the line, e.g.,

// line limit  V
// heading
// *line is
//   long long long long

get reformatted as

// line limit  V
// heading
// *line is
//   long long
//   long long

so if for ranges of blanks longer of size S>1 we copied the (S-1) blanks at the beginning of the next line, we would have cascading comment reflows undesired with longer and longer indents.

I tend to agree with @krasimir I don't see where you really use Maximum to mean anything, the nested configuration seems perhaps unnecessarily confusing?

HazardyKnusperkeks marked 6 inline comments as done.Dec 7 2020, 12:39 PM

Can I assume you need someone to land this for you?

Yes I do. But I have a question, my last change is commited in your name, that means git blame would blame it on you, right?

You can set me as author:
Björn Schäpers <bjoern@hazardy.de>
My Github Account is also called HazardyKnusperkeks.

The process is that you add (https://llvm.org/docs/Contributing.html)

Patch By: HazardyKnusperkeks

to the commit message if the user doesn't have commit access, if you want your name against the blame then I recommend applying for commit access yourself.

That is incorrect and does not represent the nowadays reality, i suggest that you look up the docs.

let me know if you still want me to land this

What/Where are the docs? I read https://llvm.org/docs/Contributing.html before hand and just now https://llvm.org/docs/CodeReview.html


I will update the patch, but that won't happen before the weekend.

clang/docs/ClangFormatStyleOptions.rst
2727

I was actually going for only one value, but while writing the tests I came to the conclusion that before my change is only enforced a minimum of 1. But that very well may be because of what you call line comment sections, I did not consider that. That's why I chose a minimum and maximum.

I will modify the patch to one value and will also add tests for the sections. But for that I need to remember if I added or removed spaces, right? Is there already infrastructure for that? Or is there any documentation of the various steps clang-format takes to parse and format code? Until now I tried to understand what's going on through single stepping with the debugger (quite time consuming).

clang/lib/Format/BreakableToken.cpp
790

In that case I will remove that check, but as said there were many tests which failed without it, I will have to adapt them too.

clang/unittests/Format/FormatTestComments.cpp
3405

Okay, I mean the spaced between sit and amet, while the spaces between Lorem and ipsum, and dolor and sit is kept.

Can I assume you need someone to land this for you?

Yes I do. But I have a question, my last change is commited in your name, that means git blame would blame it on you, right?

You can set me as author:
Björn Schäpers <bjoern@hazardy.de>
My Github Account is also called HazardyKnusperkeks.

The process is that you add (https://llvm.org/docs/Contributing.html)

Patch By: HazardyKnusperkeks

to the commit message if the user doesn't have commit access, if you want your name against the blame then I recommend applying for commit access yourself.

That is incorrect and does not represent the nowadays reality, i suggest that you look up the docs.

let me know if you still want me to land this

What/Where are the docs? I read https://llvm.org/docs/Contributing.html before hand and just now https://llvm.org/docs/CodeReview.html

My comment was addressed at @MyDeveloperDay
https://llvm.org/docs/DeveloperPolicy.html#commit-messages

If you’re not the original author, ensure the ‘Author’ property of the commit
is set to the original author and the ‘Committer’ property is set to yourself.
You can use a command similar to git commit --amend --author="John Doe <jdoe@llvm.org>"
to correct the author property if it is incorrect.
See Attribution of Changes for more information including the method
we used for attribution before the project migrated to git.

IOW @HazardyKnusperkeks's request was correct.


I will update the patch, but that won't happen before the weekend.

Can I assume you need someone to land this for you?

Yes I do. But I have a question, my last change is commited in your name, that means git blame would blame it on you, right?

You can set me as author:
Björn Schäpers <bjoern@hazardy.de>
My Github Account is also called HazardyKnusperkeks.

The process is that you add (https://llvm.org/docs/Contributing.html)

Patch By: HazardyKnusperkeks

to the commit message if the user doesn't have commit access, if you want your name against the blame then I recommend applying for commit access yourself.

That is incorrect and does not represent the nowadays reality, i suggest that you look up the docs.

let me know if you still want me to land this

Yes I agree I hadn’t seen that the process had changed,

This is one reason why I don’t like landing patches for others, this just confirms that in the future I will generally request people apply for commit access themselves.

MyDeveloperDay requested changes to this revision.Dec 7 2020, 11:17 PM

Could we consider dropping the maximum?

This revision now requires changes to proceed.Dec 7 2020, 11:17 PM
HazardyKnusperkeks marked an inline comment as done.Dec 8 2020, 1:19 AM

Can I assume you need someone to land this for you?

Yes I do. But I have a question, my last change is commited in your name, that means git blame would blame it on you, right?

You can set me as author:
Björn Schäpers <bjoern@hazardy.de>
My Github Account is also called HazardyKnusperkeks.

The process is that you add (https://llvm.org/docs/Contributing.html)

Patch By: HazardyKnusperkeks

to the commit message if the user doesn't have commit access, if you want your name against the blame then I recommend applying for commit access yourself.

That is incorrect and does not represent the nowadays reality, i suggest that you look up the docs.

let me know if you still want me to land this

Yes I agree I hadn’t seen that the process had changed,

This is one reason why I don’t like landing patches for others, this just confirms that in the future I will generally request people apply for commit access themselves.

And where do I do that? Also I did not think I would not have a chance of getting the access so early.

And where do I do that? Also I did not think I would not have a chance of getting the access so early.

https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

Could we consider dropping the maximum?

While rewriting the tests with the unmodified clang-format I just confirmed that currently only the minimum of 1 is enforced, there is no maximum. I.e.

//x
int x;
//   y
int y;

will be formatted as

// x
int x;
//   y
int y;

So for what I want to do I can:

  1. Enforce the Minimum (for LLVM Style 1)
  2. Enforce an optional Maximum (but what if the Maximum is set to 0, what I want to do, given that the Minimum is 1?)
  3. Stay with the current design of a Minimum/Maximum Pair with (practically unbounded Maximum)

I would choose #3, in that case I only have to add a test for line comment sections (and most likely adapt the implementation).

/// A List:
///  * Foo
///  * Bar

This didn't really address the comments, what is the point of the maximum? what if the maximum is > the ColumnLimit?

This didn't really address the comments, what is the point of the maximum?

My goal is to remove all spaces between // and the content (with the exception of comment sections, as I have learned here), and do not break the current behavior in any way, and currently it seems to work with an unlimited maximum, and a minimum of 1.

what if the maximum is > the ColumnLimit?

Most likely what ever happens now if the space in the comment is larger than the ColumnLimit. But one could just remove spaces as needed.
A more interesting question would be what happens if the minimum is larger than the ColumnLimit?. For that one had to decide which is more important, I would go with the ColumnLimit and reduce the minimum, but maybe that could be handled with the penalties? Although I have to admit that I don't understand where and how they are used.

HazardyKnusperkeks marked an inline comment as done.
HazardyKnusperkeks edited the summary of this revision. (Show Details)

I'm back!
I've reworked the change to correctly(*) work with line comment sections.

*: That is to be discussed, currently there is one change in behavior which is not covered through previous tests and one test which is failing. I will highlight that in inline comments.

HazardyKnusperkeks marked an inline comment as done.Dec 27 2020, 1:29 AM

This didn't really address the comments, what is the point of the maximum? what if the maximum is > the ColumnLimit?

Current behavior if there are more spaces than ColumnLimit: Do not format the comment at all. Now if the minimum is larger than the ColumnLimit we will obey the limit and then normal behavior kicks in, this is also covered in the tests.

clang/docs/ClangFormatStyleOptions.rst
2727

I find it confusing why we have 2, Minimum and Maximum, instead of a single one.
I'm not convinced that Maximum is useful.
Conceptually I'd prefer a single integer option, say LineCommentContentIndent that would indicate the default indent used for the content of line comments. I'd naively expect LineCommentContentIndent = :

  • 0 would produce //comment
  • 1 would produce // comment (current default)
  • 2 would produce // comment, etc.

and this will work with if the input is any of //comment, // comment, or // comment, etc.

An additional consideration is that line comment sections often contain additional indentation, e.g. when there is a bullet list, paragraphs, etc. and so we can't guarantee that the indent of each line comment will be less than Maximum in general. I'd expect this feature to not adjust extra indent in comments, e.g.,

// Lorem ipsum dolor sit amet,
//  consectetur adipiscing elit,
//  ...

after reformatting with LineCommentContentIndent=0 to produce

//Lorem ipsum dolor sit amet,
// consectetur adipiscing elit,
// ...

(and vice-versa, after reformatting with LineCommentContentIndent=1).
This may well be handled by code, I just wasn't sure by looking at the code and test examples.

clang/unittests/Format/FormatTestComments.cpp
3135–3141

Here the test fails, because commen1 gets a space added and commen3 belongs to the same section, thus also gets an additional space.
I see three options:

  1. The whole keeping indentation in a section is wrong.
  2. Disable the mechanic for text proto.
  3. Adapt the test.
3393

Here is the difference. Before this would have been formatted as

// if (ret1) {
//  return 2;
//}

So only one space added for the if, it did not keep the indentation of the return and not adding a space to }.
I think this is much better and also basically what @krasimir requested.

HazardyKnusperkeks edited the summary of this revision. (Show Details)Dec 27 2020, 1:34 AM

My assumption is that you want to stick with the minimum and maximum is that correct?

My assumption is that you want to stick with the minimum and maximum is that correct?

Otherwise I have to make a breaking change, or not achieve at all what I want. So either abandon this or we need a minimum and maximum.
Although right now I have changed an existing test because in that case the behavior changed (as noted inline) and a test case still to decide how to advance.

If that little break is not acceptable I see no further base to pursue this and have to decide to drop it totally or only apply it locally.

And thanks for bringing this up again, currently I have very little time to work on clang-format and only react on the mails. Even while I have many things I want to add/change or cases where it is currently misformatted.

MyDeveloperDay accepted this revision.Jan 17 2021, 3:20 AM
This revision is now accepted and ready to land.Jan 17 2021, 3:20 AM
  • Rebased
  • Fixed(?) the last UnitTest, please take a look @krasimir
krasimir added inline comments.Jan 29 2021, 1:06 AM
clang/unittests/Format/FormatTestComments.cpp
3135–3141

This test change looks OK.

The previous one broke a (format) test in polly. This lead me to change the one breaking behavior, no it is not breaking anymore.
A comment starting with } will only be indented if it is in a comment section which will get an indention. Test case is adapted.

I will wait a few days if there is no negative feedback I will push it again.

This revision is now accepted and ready to land.Jan 29 2021, 1:06 PM
curdeius accepted this revision.Jan 30 2021, 1:45 AM

LGTM. Could you please give us a link to the failing test in Polly? May be GitHub or buildbot.

LGTM. Could you please give us a link to the failing test in Polly? May be GitHub or buildbot.

No problem:

http://lab.llvm.org:8011/#builders/10/builds/2294

I have a script that runs clang-format -n on various directories in clang
that are clang format clean, polly is one of them because they have clang
format as a unit test

I use this to ensure I don’t regress behaviour

Maybe we should formalise this with some sort of clang-format-check cmake
rule

Mydeveloperday

I have a script that runs clang-format -n on various directories in clang
that are clang format clean, polly is one of them because they have clang
format as a unit test

I use this to ensure I don’t regress behaviour

Maybe we should formalise this with some sort of clang-format-check cmake
rule

Mydeveloperday

That would be ok for me.

This revision was landed with ongoing or failed builds.Feb 1 2021, 1:49 PM
This revision was automatically updated to reflect the committed changes.

Hi guys,i found SpacesInLineCommentPrefix does not support other encoding such as utf8 ,
I am curious why there is a isAlphanumeric limit in BreakableLineCommentSection::BreakableLineCommentSection() ?
I want to make some contribution to make it support utf8, what should i do ?

Hi guys,i found SpacesInLineCommentPrefix does not support other encoding such as utf8 ,
I am curious why there is a isAlphanumeric limit in BreakableLineCommentSection::BreakableLineCommentSection() ?
I want to make some contribution to make it support utf8, what should i do ?

see https://llvm.org/docs/Contributing.html but in essence:

  1. open a bug at https://bugs.llvm.org/
  2. clone the llvm repo from gitub
  3. build the repo
  4. add unit tests in clang/unittests/Format that show the problem
  5. add the code that fixes the issue
  6. upload a diff of the patch to reviews.llvm.org
  7. add clang-format project tag and at least me as a reviewer and I can help fill in the rest

This sounds like a great new contributor idea.. go for it! I'll support this.

I missed the most important step!

  1. Add LLVM contributor to your CV.

No seriously I mean it. I interview people all the time, if I saw that on a CV, it would immediately start a conversation about what/who/why you did it. (allowing me to look up your contribution)

As an interviewer, Contributing to open source is a great thing!

Hi guys,i found SpacesInLineCommentPrefix does not support other encoding such as utf8 ,
I am curious why there is a isAlphanumeric limit in BreakableLineCommentSection::BreakableLineCommentSection() ?
I want to make some contribution to make it support utf8, what should i do ?

The isAlphanumeric is there to not break doxygen like comments for example.

I'm very interested in how you want to tackle that problem. :)

ksyx added a subscriber: ksyx.EditedJan 17 2022, 3:17 PM

Hi guys,i found SpacesInLineCommentPrefix does not support other encoding such as utf8 ,
I am curious why there is a isAlphanumeric limit in BreakableLineCommentSection::BreakableLineCommentSection() ?
I want to make some contribution to make it support utf8, what should i do ?

The isAlphanumeric is there to not break doxygen like comments for example.

I'm very interested in how you want to tackle that problem. :)

Sorry for digging out this months long thing but I encountered this in one project full of non-alphanumeric comment. If I understood the problem correctly, is it more like avoiding symbols like @ / * rather than avoiding non-ASCII characters like CJK's? In this case, would just not changing leading spaces when it is started with symbols make this option usable for wider amount of comment content? :)

Hi guys,i found SpacesInLineCommentPrefix does not support other encoding such as utf8 ,
I am curious why there is a isAlphanumeric limit in BreakableLineCommentSection::BreakableLineCommentSection() ?
I want to make some contribution to make it support utf8, what should i do ?

The isAlphanumeric is there to not break doxygen like comments for example.

I'm very interested in how you want to tackle that problem. :)

Sorry for digging out this months long thing but I encountered this in one project full of non-alphanumeric comment. If I understood the problem correctly, is it more like avoiding symbols like @ / * rather than avoiding non-ASCII characters like CJK's? In this case, would just not changing leading spaces when it is started with symbols make this option usable for wider amount of comment content? :)

So we basically have to choose between using this simple trick, but excluding non latin characters, or creating a list where we do not add spaces, which is possibly incomplete.

But please create a patch and we will evaluate it.