This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Handle CRLF correctly when formatting escaped newlines
ClosedPublic

Authored by peterbudai on Oct 11 2017, 8:55 AM.

Details

Summary

This patch fixes clang-format lexer token handling when parsing source file with CRLF line endings.

The original bug I discovered:

#define MACRO(x) \
private:         \
  int x(int a);

was formatted correctly with LF line endings and incorrectly with CRLF line endings. In the latter case a new line was inserted before private, because it was incorrectly recognized as an identifier instead of a keyword.

The relevant code section was fixed and the affected test was updated to include CRLF versions of the test inputs as well. These test failed before and run successfully now.

Event Timeline

peterbudai created this revision.Oct 11 2017, 8:55 AM
krasimir added inline comments.Oct 20 2017, 4:39 AM
lib/Format/FormatTokenLexer.cpp
539

Hm, this does more. Why not just keep the DOS \r\n and the Unix \n cases, as these are the cases we're testing for? Also, please clang-format this patch.

Removed untested \r\n and \r cases, applied clang-format to the patch.

peterbudai marked an inline comment as done.Oct 24 2017, 9:17 AM
peterbudai added inline comments.
lib/Format/FormatTokenLexer.cpp
539

My idea was to make it aligned to Lexer::getCharAndSizeSlow() and Lexer::getEscapedNewLineSize() that already handled any combinations or \r and \n similarly.

But I think you're right, it's no point in adding untested cases.

krasimir accepted this revision.Oct 26 2017, 12:30 PM

Looks good! Thank you!

This revision is now accepted and ready to land.Oct 26 2017, 12:30 PM

Do you need me to commit this for you?

peterbudai marked an inline comment as done.Oct 27 2017, 1:23 AM

Yes, that would be nice, thank you!

krasimir updated this revision to Diff 120822.Oct 30 2017, 7:37 AM
  • Rebase with master