This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Non-latin comment prefix whitespace
ClosedPublic

Authored by ksyx on Feb 2 2022, 7:35 PM.

Details

Summary

This commit changes the condition of requiring comment to start with
alphabet characters to make no change only for a certain set of
characters, currently horizontal whitespace and punctuation characters.

Diff Detail

Event Timeline

ksyx requested review of this revision.Feb 2 2022, 7:35 PM
ksyx created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2022, 7:35 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MyDeveloperDay added a project: Restricted Project.Feb 3 2022, 9:40 AM
This revision is now accepted and ready to land.Feb 3 2022, 11:52 AM
This revision was landed with ongoing or failed builds.Feb 3 2022, 1:49 PM
This revision was automatically updated to reflect the committed changes.

It appears that this caused a regression by adding an additional space of indentation to line comments in some cases:

% cat test.cc  # (before)
// Comment
int i;
% clang-format -style=google test.cc
//  Comment
int i;

@ksyx could you please take a look?

Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 1:41 AM
ksyx added a comment.Mar 4 2022, 4:47 AM

It appears that this caused a regression by adding an additional space of indentation to line comments in some cases:

% cat test.cc  # (before)
// Comment
int i;
% clang-format -style=google test.cc
//  Comment
int i;

@ksyx could you please take a look?

I did not reproduce it in latest main branch. This problem may in relate to this issue which have recently been fixed: https://github.com/llvm/llvm-project/issues/53844

Thank you!
Turns out the original source wasn't using an ascii space for the comment indentation but a unicode [c2 a0] no-break space:

2f 2f c2 a0 43 6f 6d 6d  65 6e 74 0a 69 6e 74 20  |//..Comment.int |

IMO we shouldn't aim to handle non-ascii spaces for indentation in clang-format; the new behavior on this example is fine.