This is an archive of the discontinued LLVM Phabricator instance.

clang-format: do not reflow bullet lists
ClosedPublic

Authored by Typz on May 17 2017, 8:59 AM.

Details

Summary

This patch prevents reflowing bullet lists in block comments.

It handles all lists supported by doxygen and markdown, e.g. bullet
lists starting with '-', '*', '+', as well as numbered lists starting
with -# or a number followed by a dot.

Diff Detail

Repository
rL LLVM

Event Timeline

Typz created this revision.May 17 2017, 8:59 AM
djasper added a subscriber: djasper.
krasimir added inline comments.May 18 2017, 12:26 AM
lib/Format/BreakableToken.cpp
313 ↗(On Diff #99312)

A problem with this is that sometimes you have a sentence ending with a number, like this one, in 2016. If this sentence also happens to just go over the column width, its last part would be reflown and during subsequent passes it will be seen as a numbered list, which is super unfortunate. I'd like us to come up with a more refined strategy of handling this case. Maybe we should look at how others are doing it?

315 ↗(On Diff #99312)

This builds an llvm::Regex on each invocation, which is wasteful.

unittests/Format/FormatTestComments.cpp
1663 ↗(On Diff #99312)

I'd also like to see tests where we correctly reflow lists with multiline entries.

Typz marked an inline comment as done.May 18 2017, 5:53 AM
Typz added inline comments.
lib/Format/BreakableToken.cpp
313 ↗(On Diff #99312)

Looking at doxygen, it seems there is no extra protection: just a number followed by a dot...
So it means:

  • We should never break before a such a sequence, to avoid the issue.
  • We may also limit the expression to limit the size of the number: I am not sure there are cases where bullet lists with hundreds of items are used, esp. with explicit values (uses the auto-numbering -# would be much simpler in that case). Maybe a limit of 2 digits? The same limit would be applied to prevent breaking before a number followed by a dot.

What do you think?

315 ↗(On Diff #99312)

I did this to keep the function re-entrant; but since the code is not multi-thread I can use a static variable instead.

Typz marked 2 inline comments as done.May 18 2017, 5:54 AM
Typz updated this revision to Diff 99427.May 18 2017, 5:55 AM
  • Use static regex to avoid recreating it each time
  • Add more tests
krasimir added inline comments.May 18 2017, 6:39 AM
lib/Format/BreakableToken.cpp
313 ↗(On Diff #99312)

I like the combination of the two options: let's limit to 2 digits and not break before a matching numbered list sequence followed by a fullstop. That would require also a little change to BreakableToken::getCommentSplit.

Typz marked 3 inline comments as done.May 18 2017, 7:58 AM
Typz added inline comments.
lib/Format/BreakableToken.cpp
313 ↗(On Diff #99312)

Done, but I could find a use-case where this would break subsequent passes, apart from re-running clang-format ; but in this case it is fine, since the comments are already formatted to fit, and will thus not be reflowed...

Typz updated this revision to Diff 99436.May 18 2017, 8:01 AM
Typz marked an inline comment as done.

Limit to 2 digits and not break before a matching numbered list sequence followed by a fullstop, to avoid interpreting numbers at the end of sentence as numbered bullets (and thus preventing reflow).

krasimir accepted this revision.May 22 2017, 5:39 AM

Looks great!

This revision is now accepted and ready to land.May 22 2017, 5:39 AM
This revision was automatically updated to reflect the committed changes.