This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Add space to comments starting with '#'.
ClosedPublic

Authored by curdeius on Mar 11 2022, 1:32 AM.

Diff Detail

Event Timeline

curdeius created this revision.Mar 11 2022, 1:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 11 2022, 1:32 AM
curdeius requested review of this revision.Mar 11 2022, 1:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 11 2022, 1:32 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I'm not sure whether we should consider this a breaking change, there were no tests at all verifying this behaviour.
I think that not adding a space in comments starting with a punctuation is a mistake unless for some special comments like ///, //! etc. (which are handled separately anyway).

MyDeveloperDay accepted this revision.Mar 11 2022, 1:49 AM
This revision is now accepted and ready to land.Mar 11 2022, 1:49 AM

Is this conforming to SpacesInLineCommentPrefix

curdeius updated this revision to Diff 414669.Mar 11 2022, 8:28 AM

Add tests for SpacesInLineCommentPrefix.

Is this conforming to SpacesInLineCommentPrefix

Yes it is, nothing changed in this regard, I added tests anyway.

owenpan accepted this revision.Mar 11 2022, 2:30 PM
MyDeveloperDay accepted this revision.Mar 12 2022, 1:23 AM
This revision was automatically updated to reflect the committed changes.

This appears to have broken a relatively common pattern we see in text proto comments where sections use a style like this:

################################################################################
# Big section name
################################################################################

After this patch, this gets broken up (in text protos we recognize up to 4 #-s as a comment leader: https://github.com/llvm/llvm-project/blob/e725e2afe02e18398525652c9bceda1eb055ea64/clang/lib/Format/BreakableToken.cpp#L46):

#### ############################################################################
# Big section name
#### ############################################################################

IMO, this is analogous to, e.g., using many /s in C++ source, which clang-format leaves alone.

//////////////////////
// section name
//////////////////////

I'll be happy to work on a fix and add a regression test; wanted to this with you first. Also, do we now still develop clang-format's patches & discuss here first, or over at github via PRs?

This appears to have broken a relatively common pattern we see in text proto comments where sections use a style like this:

################################################################################
# Big section name
################################################################################

That's a pity.
Do you think that just doing a different thing (what we had before) for Proto (and if possible other languages where # is used in a similar way) should be enough?

This appears to have broken a relatively common pattern we see in text proto comments where sections use a style like this:

################################################################################
# Big section name
################################################################################

That's a pity.
Do you think that just doing a different thing (what we had before) for Proto (and if possible other languages where # is used in a similar way) should be enough?

I was thinking about something a bit more sophisticated, where if the first character looks like a continuation of the comment leader we keep them together (effectively generalizing whatever keeps //// together), but have to look how this is actually handled.

krasimir added a comment.EditedMar 21 2022, 3:06 AM

This is getting more fun: the reason why /// isn't broken up is because the first character / after the comment leader // is considered punctuation per isPunctuation's CHAR_RAWDEL, which includes all these {}[]#<>%:;?*+-/^&|~!=,"':

With this patch, we got stuff like:

% cat test.cc
//# status
//. status
//; status
//{ status
/// status
//, status
//: status
% ~/llvm/llvm-project/build/bin/clang-format test.cc
// # status
//. status
//; status
//{ status
/// status
//, status
//: status

This makes me think that we should really consider not adding indent if the first character is punctuation WAI, with rationale:

  • comments starting with such characters often have special meaning, clang-format errors out on the conservative side to not touch them. Example is stuff like //! in Doxygen blocks: https://www.doxygen.nl/manual/docblocks.html (although that's special-handled elsewhere).
  • it's possible to work around this by changing the source code to use // #; clang-format will respect indentation in that case (also for other punctuation characters).

WDYAT?

This makes me think that we should really consider not adding indent if the first character is punctuation WAI, with rationale:

That was the case before this patch.

  • comments starting with such characters often have special meaning, clang-format errors out on the conservative side to not touch them. Example is stuff like //! in Doxygen blocks: https://www.doxygen.nl/manual/docblocks.html (although that's special-handled elsewhere).

I'm aware of these special characters, but I wasn't aware of the fact that # is a special case too (I don't know proto too much).

  • it's possible to work around this by changing the source code to use // #; clang-format will respect indentation in that case (also for other punctuation characters).

Again, that was the case before this patch.

What I wanted to achieve was to only treat a limited list of special punctuation characters, not all of them.
As you mentioned, we have at least ///, //<, //! for Doxygen comments. There are more of course.
I think that every (group of) language(s) should have a different way of defining whether a given character is special and so should be kept with the leading //.
I hope it makes sense given that C-language family is pretty different from Proto where it comes to comments.
So I'd rather keep the old behaviour for # in Proto but not in C-like languages.
Is that acceptable?

So I'd rather keep the old behaviour for # in Proto but not in C-like languages. Is that acceptable?

Sounds good. I'll work on a patch doing that.

MyDeveloperDay added a comment.EditedMar 23 2022, 2:53 AM

I'm a little uncomfortable with

//#<preprocessor directive>

becoming

// #<preprocessor directive>

At least without an option for it to not make changes

curdeius added a comment.EditedMar 23 2022, 4:46 AM

I'm a little uncomfortable with

//#<preprocessor directive>

becoming

// #<preprocessor directive>

At least without an option for it to not make changes

Don't you think that //#PPdirective should be indented the same way as other code?
E.g. in:

#include <header.h>

int something;

You comment the whole block and it becomes (with this patch):

// #include <header.h>
//
// int something;

whereas it was:

//#include <header.h>
//
// int something;

which seems inconsistent at least.

I'm a little uncomfortable with

//#<preprocessor directive>

becoming

// #<preprocessor directive>

At least without an option for it to not make changes

Don't you think that //#PPdirective should be indented the same way as other code?
E.g. in:

#include <header.h>

int something;

You comment the whole block and it becomes (with this patch):

// #include <header.h>
//
// int something;

whereas it was:

//#include <header.h>
//
// int something;

which seems inconsistent at least.

That makes sense..

You comment the whole block and it becomes (with this patch):

// #include <header.h>
//
// int something;

whereas it was:

//#include <header.h>
//
// int something;

which seems inconsistent at least.

A fun wrinkle here is is that if you originally wrote (note no blank line):

//#include <header.h>
//int something;

and ran old clang-format, you get (and checked in)

//#include <header.h>
// int something;

and now if you run new clang-format, the second line gets an *extra* space to preserve the indent:

// #include <header.h>
//  int something;

I don't think this is worth fixing, had me scratching my head though :-)