This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Fix whitespace counting stuff
ClosedPublic

Authored by sstwcw on May 1 2022, 8:02 PM.

Details

Summary

The current way of counting whitespace would count backticks as
whitespace. For Verilog stuff we need backticks to be handled
correctly. For JavaScript the current way is to compare the entire
token text to see if it's a backtick. However, when the backtick is the
first token following an escaped newline, the escaped newline will be
part of the tok::unknown token. Verilog has macros and escaped newlines
unlike JavaScript. So we can't regard an entire tok::unknown token as
whitespace. Previously, the start of every token would be matched for
newlines. Now, it is all whitespace instead of just newlines.

The column counting problem has already been fixed for JavaScript in
e71b4cbdd140f059667f84464bd0ac0ebc348387 by counting columns elsewhere.

Diff Detail

Event Timeline

sstwcw created this revision.May 1 2022, 8:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 1 2022, 8:02 PM
sstwcw requested review of this revision.May 1 2022, 8:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 1 2022, 8:02 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
clang/lib/Format/FormatTokenLexer.cpp
843

Can you add some documentation?

872

I'd put (and search while reading) the comment after the case.

875–876

Please verify your assumption with an assert.

Is this patch NFC?

clang/lib/Format/FormatTokenLexer.cpp
863–871

I'd replace the switch with if (isspace(Cur[0])) ... else if (Cur[1] == '\n' ...) ....

913
922–923

\r\n is a newline on Windows.

932
949–961

Don't you need to look ahead for a ??/?

Is there any way you could add a unit test for this?

sstwcw updated this revision to Diff 429530.May 15 2022, 5:50 AM
  • address review comments
sstwcw marked 8 inline comments as done.May 15 2022, 6:00 AM

This patch is not NFC. But there would not be change in behavior if the
input is valid C as far as I know.

input:
#define LIST \
`ENTRY \
`ENTRY
output before, note extra line:
#define LIST                                                                   \
                                                                               \
  `ENTRY \
`ENTRY
output after:
#define LIST \
`ENTRY \
`ENTRY

I don't know how I would add a test for this.

The tests in D124749 depend on this though.

clang/lib/Format/FormatTokenLexer.cpp
843

Is this comment enough?

922–923

I added a comment to make it clear the code is correct.

949–961

It has to be ??/. Now there's an assertion.

clang/lib/Format/FormatTokenLexer.cpp
843

Maybe some reasoning?

Also I think NewLen shouldn't be bigger than FormatTok->TokenText.size() should it? Add an assert and a comment on the restriction.

sstwcw updated this revision to Diff 433626.Jun 1 2022, 7:17 PM
  • add comment
sstwcw marked an inline comment as done.Jun 1 2022, 7:22 PM

Can there be a test case, not related to Verilog? Or do we need to wait until you pump up the support for that?

This revision is now accepted and ready to land.Jun 2 2022, 11:40 AM
HazardyKnusperkeks added a project: Restricted Project.Jun 2 2022, 11:40 AM
This revision was automatically updated to reflect the committed changes.

@sstwcw @HazardyKnusperkeks : this change introduced a regression in one of the clang format unit tests. I fixed it in the revision below. Could anyone please take a look / approve this revision to unblock Windows debug builds? Thanks!

https://reviews.llvm.org/D128786