This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] make header guard identification stricter (with Lexer).
ClosedPublic

Authored by ioeric on Jun 3 2016, 6:18 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

ioeric updated this revision to Diff 59539.Jun 3 2016, 6:18 AM
ioeric retitled this revision from to [clang-format] make header guard identification stricter (with Lexer)..
ioeric updated this object.
ioeric added a reviewer: djasper.
ioeric added a subscriber: cfe-commits.
djasper added inline comments.Jun 3 2016, 6:43 AM
lib/Format/Format.cpp
1441 ↗(On Diff #59539)

I'd modify this to:

while (Tok.is(comment))
  if (Lex.LexFromRawLexer(Tok))
    return;

So it skips tokens until it finds a non-comment token.

1452 ↗(On Diff #59539)

Store this value in a bool and call another Lex.LexFromRawLexer afterwards. Both calls do this next.

1501 ↗(On Diff #59539)

Move the Tok.is(hash) check into isDirectiveWithName and replace the assert there.

ioeric updated this revision to Diff 59549.Jun 3 2016, 7:10 AM
ioeric marked 3 inline comments as done.
  • Addressed reviewer's comments.
ioeric added a comment.Jun 6 2016, 1:16 AM

Friendly PING

djasper added inline comments.Jun 6 2016, 1:27 AM
lib/Format/Format.cpp
1491 ↗(On Diff #59549)

I think you should pull this part out into a separate function, possibly named getOffsetAfterHeaderGuardsAndComments(). This function is growing too much.

1500 ↗(On Diff #59549)

Can this go wrong? Maybe in an empty file?

ioeric updated this revision to Diff 59698.Jun 6 2016, 1:59 AM
ioeric marked 2 inline comments as done.
  • Addressed reviewer's comments. Added a test case for empty code.
lib/Format/Format.cpp
1500 ↗(On Diff #59549)

Added a test case for empty file.

If the code is empty, Tok will be tok::eof, which would leave MinInsertionOffset as Code.size().

But the program does fail when we insert at the end of code that does not end with '\n'. We always +1 when we calculate the offset for each line (in the for-loop on Lines); however, the last line does not necessarily end with '\n', which makes offset exceed code size. I will fix this in a follow-up patch.

djasper added inline comments.Jun 6 2016, 2:35 AM
lib/Format/Format.cpp
1466 ↗(On Diff #59698)

Don't store this. Just inline Code.size() below.

1471 ↗(On Diff #59698)

I'd probably just store the offset, not the entire token.

1474 ↗(On Diff #59698)

I wonder whether we should also skip comments after the header guard .. If so, we could probably just move the skipComments() call into checkAndConsumeDirectiveWithName.

1476 ↗(On Diff #59698)

Maybe just return here and remove the HeaderGuardFound variable?

1480 ↗(On Diff #59698)

Does getFileOffset not work on the eof token?

1484 ↗(On Diff #59698)

Seems like adding an std::min call would be shorter than this comment. Any reason not to just fix it now?

ioeric updated this revision to Diff 59700.Jun 6 2016, 3:18 AM
ioeric marked 6 inline comments as done.
  • Addressed reviewer's comments.
lib/Format/Format.cpp
1474 ↗(On Diff #59698)

Not sure if we want to skip those too...seems to me that comments following header guard do not usually belong to header guard...they would most often belong to the code after it I guess?

1480 ↗(On Diff #59698)

Yes, it does.

1484 ↗(On Diff #59698)

We actually need more than that. For example, adding '\n' after the last line etc. This is not quite trivial to me since '\n' could be inserted after header insertions with the same offset.

I thought it would be better to do the fix in a separate patch. But I'll fix the crash here and handle '\n' insertion in the future.

djasper accepted this revision.Jun 6 2016, 3:39 AM
djasper edited edge metadata.

Nice :-)

This revision is now accepted and ready to land.Jun 6 2016, 3:39 AM
This revision was automatically updated to reflect the committed changes.