This is an archive of the discontinued LLVM Phabricator instance.

Better trade-off for excess characters vs. staying within the column limits.
ClosedPublic

Authored by klimek on Nov 29 2017, 7:53 AM.

Details

Summary

When we break a long line like:
Column limit: 21

                    |
// foo foo foo foo foo foo foo foo foo foo foo foo

The local decision when to allow protruding vs. breaking can lead to this
outcome (2 excess characters, 2 breaks):

// foo foo foo foo foo
// foo foo foo foo foo
// foo foo

While strictly staying within the column limit leads to this strictly better
outcome (fully below the column limit, 2 breaks):

// foo foo foo foo
// foo foo foo foo
// foo foo foo foo

To get an optimal solution, we would need to consider all combinations of excess
characters vs. breaking for all lines, but that would lead to a significant
increase in the search space of the algorithm for little gain.

Instead, we blindly try both approches and·select the one that leads to the
overall lower penalty.

Diff Detail

Repository
rL LLVM

Event Timeline

klimek created this revision.Nov 29 2017, 7:53 AM
krasimir added inline comments.
lib/Format/ContinuationIndenter.cpp
1422 ↗(On Diff #124752)

The problem here is that we're calling breakProtrudingToken 3 times more than we used to. Seems like such a waste.
@djasper: do you think this is fine?

1519 ↗(On Diff #124752)

Why we return true here?

1773 ↗(On Diff #124752)

Shouldn't we also be messing up something here in this patch?

klimek added inline comments.Nov 30 2017, 6:16 AM
lib/Format/ContinuationIndenter.cpp
1422 ↗(On Diff #124752)

If we're in DryRun mode (all through the optimization phase) and we don't actually leave around excess characters, we call this exactly once per protruding token, so the vast majority of cases will see no change.

We'll always have an additional call in non-dry-run mode, but I think that completely doesn't matter, and we'll have a second call if we actually do leave around excess characters - note that this only happens when the excess character penalty is low enough, so it won't actually affect the normal LLVM / Google styles.

1519 ↗(On Diff #124752)

Argh, because I first had inverted the concept and changed it later - thanks for catching, will fix.

1773 ↗(On Diff #124752)

This just switches reflowing to false, which will keep the original line break. This basically figures out whether we will later be able to break. That might lead to a non-perfect solution later, as we might reflow ourselves into a situation where we have to leave excess characters around, which we could have avoided if we kept under the limit, but I do not think that matteres in an observable way.

klimek updated this revision to Diff 124940.Nov 30 2017, 7:52 AM

Fix incorrect return value leading to unnecessary computation.

krasimir added inline comments.Nov 30 2017, 9:14 AM
unittests/Format/FormatTest.cpp
10007 ↗(On Diff #124940)

Could you also add a test with line comments surrounded by stuff, like in:

int f() {
  int a = /* long block comment */ 42;
}
krasimir accepted this revision.Nov 30 2017, 9:20 AM
This revision is now accepted and ready to land.Nov 30 2017, 9:20 AM
klimek updated this revision to Diff 125094.Dec 1 2017, 3:04 AM
klimek marked an inline comment as done.

Add test.

This revision was automatically updated to reflect the committed changes.