Adding a minimum and a maximum number of allowed spaces in the line comment prefix.
The both values are needed for what I intend (a maximum of 0), and keeping the old behavior (only a minimum of 1).
Details
- Reviewers
MyDeveloperDay klimek krasimir curdeius - Commits
- rG772eb24e0062: [clang-format] Add option to control the spaces in a line comment
rG4ad41f1daf0f: Revert "[clang-format] Add option to control the spaces in a line comment"
rG078f30e04d1f: [clang-format] Add option to control the spaces in a line comment
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/docs/ClangFormatStyleOptions.rst | ||
---|---|---|
2727 | I'm personally not a massive fan of stuffing -1 into an unsigned but I understand why its there, if this was an signed and it was actually -1 would the algorithm be substantially worse in your view? | |
clang/lib/Format/BreakableToken.cpp | ||
797 | my assumption is when Maximum is -1 this is a very large -ve number correct? is that defined behavior for drop_back() |
clang/docs/ClangFormatStyleOptions.rst | ||
---|---|---|
2727 | I'm no fan if unsigned in any way, but that seems to be the way in clang-format. While parsing the style I would add checks to ensure Minimum is never negative and Maximum is always greater or equal to -1, should that print any warnings? Is there a standard way of doing so? Or should it be just silently corrected? | |
clang/lib/Format/BreakableToken.cpp | ||
797 | Since we check beforehand SpacesInPrefix is larger than Maximum there is no problem. |
I think fundamentally from my perspective this seem ok, out of interest can I ask what drove you to require it?
My assumption is that some people write comments like
// Free comment without space
and you want to be able to consistently format it to be (N spaces, as clang-format already does 1 space correct?)
// Free comment without space
is that correct? is there a common style guide asking for that? what is the rationale
clang/lib/Format/BreakableToken.cpp | ||
---|---|---|
790 | is this case covered by a unit test at all? sorry can you explain why you are looking for "##"? | |
797 | yep sorry didn't see that. |
I will go for {0,0}, so no space between // and the text, I don't know about a style guide asking for it, other than my own. (Which I can dictate in my company.) :)
I have just recently started using clang-format and it does not everything the the way I want to, on some aspects I have adapted, but on others I try to "fix" it, you can expect some more changes from me in the next time.
clang/lib/Format/BreakableToken.cpp | ||
---|---|---|
790 | It is covered by multiple tests, that's how I was made aware of it. :) Now I see a change, in the code before "#" was only accepted when the language is TextProto, now it is always. But I think for that to happen the parser (or lexer?) should have assigned something starting with"#" as comment, right? But I can change that. |
clang/lib/Format/BreakableToken.cpp | ||
---|---|---|
790 | Okay # # is formatted, I try again: |
clang/docs/ClangFormatStyleOptions.rst | ||
---|---|---|
2710 | Is this change generated? with clang/doc/tools/dump_style.py or did you hand craft it? ClangFormatStyleOptions.rst is always autogenerated from running dump_style.py, any text you want in the rst needs to be present in Format.h |
clang/docs/ClangFormatStyleOptions.rst | ||
---|---|---|
2710 | Yes it is generated, after you told me what I have to do on D91507 I have (and will in the future) always run that script, I've not touched the file directly. But I also have not really looked at it, because it is generated. If it is a bit odd I can retake a look on other options with nested entries. |
clang/lib/Format/Format.cpp | ||
---|---|---|
963 | I don't know precisely the LLVM style but does it allow more than one space (as Maximum would suggest)? |
clang/lib/Format/Format.cpp | ||
---|---|---|
963 | The part with the LLVM Style from my test case did run exactly so without any modification, so yes it allows more than one space. Since there was no option before (that I'm aware of) all other styles behaved exactly like that. I did not check the style guides if they say anything about that, I just preserved the old behavior when nothing is configured. |
Yes I do. But I have a question, my last change is commited in your name, that means git blame would blame it on you, right?
You can set me as author:
Björn Schäpers <bjoern@hazardy.de>
My Github Account is also called HazardyKnusperkeks.
The process is that you add (https://llvm.org/docs/Contributing.html)
Patch By: HazardyKnusperkeks
to the commit message if the user doesn't have commit access, if you want your name against the blame then I recommend applying for commit access yourself.
let me know if you still want me to land this
That is incorrect and does not represent the nowadays reality, i suggest that you look up the docs.
let me know if you still want me to land this
clang/docs/ClangFormatStyleOptions.rst | ||
---|---|---|
2727 | I find it confusing why we have 2, Minimum and Maximum, instead of a single one.
and this will work with if the input is any of //comment, // comment, or // comment, etc. An additional consideration is that line comment sections often contain additional indentation, e.g. when there is a bullet list, paragraphs, etc. and so we can't guarantee that the indent of each line comment will be less than Maximum in general. I'd expect this feature to not adjust extra indent in comments, e.g., // Lorem ipsum dolor sit amet, // consectetur adipiscing elit, // ... after reformatting with LineCommentContentIndent=0 to produce //Lorem ipsum dolor sit amet, // consectetur adipiscing elit, // ... (and vice-versa, after reformatting with LineCommentContentIndent=1). | |
clang/lib/Format/BreakableToken.cpp | ||
790 | Thanks for the analysis! | |
clang/unittests/Format/FormatTestComments.cpp | ||
3405 | This is desired, AFAIK, and due to the normalization behavior while reflowing: when a comment line exceeds the comment limit and is broken up into a new line, the full range of blanks is replaced with a newline. (https://github.com/llvm/llvm-project/blob/ddb002d7c74c038b64dd9d3c3e4a4b58795cf1a6/clang/lib/Format/BreakableToken.cpp#L66). // line limit V // heading // *line is // long long long long get reformatted as // line limit V // heading // *line is // long long // long long so if for ranges of blanks longer of size S>1 we copied the (S-1) blanks at the beginning of the next line, we would have cascading comment reflows undesired with longer and longer indents. |
I tend to agree with @krasimir I don't see where you really use Maximum to mean anything, the nested configuration seems perhaps unnecessarily confusing?
What/Where are the docs? I read https://llvm.org/docs/Contributing.html before hand and just now https://llvm.org/docs/CodeReview.html
I will update the patch, but that won't happen before the weekend.
clang/docs/ClangFormatStyleOptions.rst | ||
---|---|---|
2727 | I was actually going for only one value, but while writing the tests I came to the conclusion that before my change is only enforced a minimum of 1. But that very well may be because of what you call line comment sections, I did not consider that. That's why I chose a minimum and maximum. I will modify the patch to one value and will also add tests for the sections. But for that I need to remember if I added or removed spaces, right? Is there already infrastructure for that? Or is there any documentation of the various steps clang-format takes to parse and format code? Until now I tried to understand what's going on through single stepping with the debugger (quite time consuming). | |
clang/lib/Format/BreakableToken.cpp | ||
790 | In that case I will remove that check, but as said there were many tests which failed without it, I will have to adapt them too. | |
clang/unittests/Format/FormatTestComments.cpp | ||
3405 | Okay, I mean the spaced between sit and amet, while the spaces between Lorem and ipsum, and dolor and sit is kept. |
My comment was addressed at @MyDeveloperDay
https://llvm.org/docs/DeveloperPolicy.html#commit-messages
If you’re not the original author, ensure the ‘Author’ property of the commit is set to the original author and the ‘Committer’ property is set to yourself. You can use a command similar to git commit --amend --author="John Doe <jdoe@llvm.org>" to correct the author property if it is incorrect. See Attribution of Changes for more information including the method we used for attribution before the project migrated to git.
IOW @HazardyKnusperkeks's request was correct.
I will update the patch, but that won't happen before the weekend.
Yes I agree I hadn’t seen that the process had changed,
This is one reason why I don’t like landing patches for others, this just confirms that in the future I will generally request people apply for commit access themselves.
And where do I do that? Also I did not think I would not have a chance of getting the access so early.
And where do I do that? Also I did not think I would not have a chance of getting the access so early.
https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access
While rewriting the tests with the unmodified clang-format I just confirmed that currently only the minimum of 1 is enforced, there is no maximum. I.e.
//x int x; // y int y;
will be formatted as
// x int x; // y int y;
So for what I want to do I can:
- Enforce the Minimum (for LLVM Style 1)
- Enforce an optional Maximum (but what if the Maximum is set to 0, what I want to do, given that the Minimum is 1?)
- Stay with the current design of a Minimum/Maximum Pair with (practically unbounded Maximum)
I would choose #3, in that case I only have to add a test for line comment sections (and most likely adapt the implementation).
/// A List: /// * Foo /// * Bar
This didn't really address the comments, what is the point of the maximum? what if the maximum is > the ColumnLimit?
My goal is to remove all spaces between // and the content (with the exception of comment sections, as I have learned here), and do not break the current behavior in any way, and currently it seems to work with an unlimited maximum, and a minimum of 1.
Most likely what ever happens now if the space in the comment is larger than the ColumnLimit. But one could just remove spaces as needed.
A more interesting question would be what happens if the minimum is larger than the ColumnLimit?. For that one had to decide which is more important, I would go with the ColumnLimit and reduce the minimum, but maybe that could be handled with the penalties? Although I have to admit that I don't understand where and how they are used.
I'm back!
I've reworked the change to correctly(*) work with line comment sections.
*: That is to be discussed, currently there is one change in behavior which is not covered through previous tests and one test which is failing. I will highlight that in inline comments.
Current behavior if there are more spaces than ColumnLimit: Do not format the comment at all. Now if the minimum is larger than the ColumnLimit we will obey the limit and then normal behavior kicks in, this is also covered in the tests.
clang/docs/ClangFormatStyleOptions.rst | ||
---|---|---|
2727 |
| |
clang/unittests/Format/FormatTestComments.cpp | ||
3141–3147 | Here the test fails, because commen1 gets a space added and commen3 belongs to the same section, thus also gets an additional space.
| |
3399 | Here is the difference. Before this would have been formatted as // if (ret1) { // return 2; //} So only one space added for the if, it did not keep the indentation of the return and not adding a space to }. |
My assumption is that you want to stick with the minimum and maximum is that correct?
Otherwise I have to make a breaking change, or not achieve at all what I want. So either abandon this or we need a minimum and maximum.
Although right now I have changed an existing test because in that case the behavior changed (as noted inline) and a test case still to decide how to advance.
If that little break is not acceptable I see no further base to pursue this and have to decide to drop it totally or only apply it locally.
And thanks for bringing this up again, currently I have very little time to work on clang-format and only react on the mails. Even while I have many things I want to add/change or cases where it is currently misformatted.
clang/unittests/Format/FormatTestComments.cpp | ||
---|---|---|
3141–3147 | This test change looks OK. |
The previous one broke a (format) test in polly. This lead me to change the one breaking behavior, no it is not breaking anymore.
A comment starting with } will only be indented if it is in a comment section which will get an indention. Test case is adapted.
I will wait a few days if there is no negative feedback I will push it again.
LGTM. Could you please give us a link to the failing test in Polly? May be GitHub or buildbot.
I have a script that runs clang-format -n on various directories in clang
that are clang format clean, polly is one of them because they have clang
format as a unit test
I use this to ensure I don’t regress behaviour
Maybe we should formalise this with some sort of clang-format-check cmake
rule
Mydeveloperday
Hi guys,i found SpacesInLineCommentPrefix does not support other encoding such as utf8 ,
I am curious why there is a isAlphanumeric limit in BreakableLineCommentSection::BreakableLineCommentSection() ?
I want to make some contribution to make it support utf8, what should i do ?
see https://llvm.org/docs/Contributing.html but in essence:
- open a bug at https://bugs.llvm.org/
- clone the llvm repo from gitub
- build the repo
- add unit tests in clang/unittests/Format that show the problem
- add the code that fixes the issue
- upload a diff of the patch to reviews.llvm.org
- add clang-format project tag and at least me as a reviewer and I can help fill in the rest
This sounds like a great new contributor idea.. go for it! I'll support this.
I missed the most important step!
- Add LLVM contributor to your CV.
No seriously I mean it. I interview people all the time, if I saw that on a CV, it would immediately start a conversation about what/who/why you did it. (allowing me to look up your contribution)
As an interviewer, Contributing to open source is a great thing!
The isAlphanumeric is there to not break doxygen like comments for example.
I'm very interested in how you want to tackle that problem. :)
Sorry for digging out this months long thing but I encountered this in one project full of non-alphanumeric comment. If I understood the problem correctly, is it more like avoiding symbols like @ / * rather than avoiding non-ASCII characters like CJK's? In this case, would just not changing leading spaces when it is started with symbols make this option usable for wider amount of comment content? :)
So we basically have to choose between using this simple trick, but excluding non latin characters, or creating a list where we do not add spaces, which is possibly incomplete.
But please create a patch and we will evaluate it.
is this case covered by a unit test at all? sorry can you explain why you are looking for "##"?