Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I'm not sure whether we should consider this a breaking change, there were no tests at all verifying this behaviour.
I think that not adding a space in comments starting with a punctuation is a mistake unless for some special comments like ///, //! etc. (which are handled separately anyway).
This appears to have broken a relatively common pattern we see in text proto comments where sections use a style like this:
################################################################################ # Big section name ################################################################################
After this patch, this gets broken up (in text protos we recognize up to 4 #-s as a comment leader: https://github.com/llvm/llvm-project/blob/e725e2afe02e18398525652c9bceda1eb055ea64/clang/lib/Format/BreakableToken.cpp#L46):
#### ############################################################################ # Big section name #### ############################################################################
IMO, this is analogous to, e.g., using many /s in C++ source, which clang-format leaves alone.
////////////////////// // section name //////////////////////
I'll be happy to work on a fix and add a regression test; wanted to this with you first. Also, do we now still develop clang-format's patches & discuss here first, or over at github via PRs?
That's a pity.
Do you think that just doing a different thing (what we had before) for Proto (and if possible other languages where # is used in a similar way) should be enough?
I was thinking about something a bit more sophisticated, where if the first character looks like a continuation of the comment leader we keep them together (effectively generalizing whatever keeps //// together), but have to look how this is actually handled.
This is getting more fun: the reason why /// isn't broken up is because the first character / after the comment leader // is considered punctuation per isPunctuation's CHAR_RAWDEL, which includes all these {}[]#<>%:;?*+-/^&|~!=,"':
- https://github.com/llvm/llvm-project/blob/73a15ad567071c13e761fd6d593e2b66164865e2/clang/lib/Format/BreakableToken.cpp#L790
- https://github.com/llvm/llvm-project/blob/73a15ad567071c13e761fd6d593e2b66164865e2/clang/include/clang/Basic/CharInfo.h#L139
- https://github.com/llvm/llvm-project/blob/73a15ad567071c13e761fd6d593e2b66164865e2/clang/include/clang/Basic/CharInfo.h#L31
With this patch, we got stuff like:
% cat test.cc //# status //. status //; status //{ status /// status //, status //: status % ~/llvm/llvm-project/build/bin/clang-format test.cc // # status //. status //; status //{ status /// status //, status //: status
This makes me think that we should really consider not adding indent if the first character is punctuation WAI, with rationale:
- comments starting with such characters often have special meaning, clang-format errors out on the conservative side to not touch them. Example is stuff like //! in Doxygen blocks: https://www.doxygen.nl/manual/docblocks.html (although that's special-handled elsewhere).
- it's possible to work around this by changing the source code to use // #; clang-format will respect indentation in that case (also for other punctuation characters).
WDYAT?
That was the case before this patch.
- comments starting with such characters often have special meaning, clang-format errors out on the conservative side to not touch them. Example is stuff like //! in Doxygen blocks: https://www.doxygen.nl/manual/docblocks.html (although that's special-handled elsewhere).
I'm aware of these special characters, but I wasn't aware of the fact that # is a special case too (I don't know proto too much).
- it's possible to work around this by changing the source code to use // #; clang-format will respect indentation in that case (also for other punctuation characters).
Again, that was the case before this patch.
What I wanted to achieve was to only treat a limited list of special punctuation characters, not all of them.
As you mentioned, we have at least ///, //<, //! for Doxygen comments. There are more of course.
I think that every (group of) language(s) should have a different way of defining whether a given character is special and so should be kept with the leading //.
I hope it makes sense given that C-language family is pretty different from Proto where it comes to comments.
So I'd rather keep the old behaviour for # in Proto but not in C-like languages.
Is that acceptable?
So I'd rather keep the old behaviour for # in Proto but not in C-like languages. Is that acceptable?
Sounds good. I'll work on a patch doing that.
I'm a little uncomfortable with
//#<preprocessor directive>
becoming
// #<preprocessor directive>
At least without an option for it to not make changes
Don't you think that //#PPdirective should be indented the same way as other code?
E.g. in:
#include <header.h> int something;
You comment the whole block and it becomes (with this patch):
// #include <header.h> // // int something;
whereas it was:
//#include <header.h> // // int something;
which seems inconsistent at least.
A fun wrinkle here is is that if you originally wrote (note no blank line):
//#include <header.h> //int something;
and ran old clang-format, you get (and checked in)
//#include <header.h> // int something;
and now if you run new clang-format, the second line gets an *extra* space to preserve the indent:
// #include <header.h> // int something;
I don't think this is worth fixing, had me scratching my head though :-)