Page MenuHomePhabricator

[clang-format] Fixed AfterEnum handling
Needs RevisionPublic

Authored by atirit on Dec 30 2020, 12:59 AM.

Details

Summary

Previously, the AfterEnum option of the BraceWrapping section was treated as always true, regardless of the format passed. These changes resolve this behaviour.

Diff Detail

Unit TestsFailed

TimeTest
10 msx64 debian > Clang-Unit.Format/_/FormatTests::FormatTest.FormatsTypedefEnum
Note: Google Test filter = FormatTest.FormatsTypedefEnum [==========] Running 1 test from 1 test case. [----------] Global test environment set-up.
40 msx64 windows > Clang-Unit.Format/_/FormatTests_exe::FormatTest.FormatsTypedefEnum
Note: Google Test filter = FormatTest.FormatsTypedefEnum [==========] Running 1 test from 1 test case.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I think we are missing some clarity in this bug as to what the actual problem is, I do agree the test looks wrong,

This seems to be that in "Attach" mode, then AllowShortEnumsOnASingleLine=false doesn't attach the brace.

I'm somewhat struggling to understand how the code change fixes that.. given that Style.BraceWrapping.AfterEnum should be true correct?

clang/lib/Format/UnwrappedLineParser.cpp
1305

surely this means always add a newline? but that isn't what is wanted from what I can tell.

curdeius added a comment.EditedJan 2 2021, 6:14 AM

I think we are missing some clarity in this bug as to what the actual problem is, I do agree the test looks wrong,

I agree on this. I'd like to see a more exhaustive test suite for all the combinations of AfterEnum and AllowShortEnumsOnASingleLine, not only fixing the single wrong test.

This seems to be that in "Attach" mode, then AllowShortEnumsOnASingleLine=false doesn't attach the brace.

That's my understanding as well.

I'm somewhat struggling to understand how the code change fixes that.. given that Style.BraceWrapping.AfterEnum should be true correct?

In Attach mode, AfterEnum is false. Cf. expandPresets https://github.com/llvm/llvm-project/blob/47877c9079c27f19a954b660201ea47717c82fec/clang/lib/Format/Format.cpp#L752.

Said all that, it *seems* to me that the fix is correct apart from the strangely looking if (!Style.isCpp()) { change that I don't really understand. Why should C++ be handled differently in this regard? What am I missing?

atirit added a comment.EditedJan 2 2021, 1:51 PM

This seems to be that in "Attach" mode, then AllowShortEnumsOnASingleLine=false doesn't attach the brace.

That is correct, but the main issue is that AfterEnum: false, which Attach mode implies, doesn't function correctly.

Said all that, it *seems* to me that the fix is correct apart from the strangely looking if (!Style.isCpp()) { change that I don't really understand. Why should C++ be handled differently in this regard? What am I missing?

That code was there before. Here is the code currently in release, including comments:

case tok::kw_enum:
  // Ignore if this is part of "template <enum ...".
  if (Previous && Previous->is(tok::less)) {
    nextToken();
    break;
  }

  // parseEnum falls through and does not yet add an unwrapped line as an
  // enum definition can start a structural element.
  if (!parseEnum())
    break;
  // This only applies for C++.
  if (!Style.isCpp()) {
    addUnwrappedLine();
    return;
  }
  break;

I modified this code in an attempt to fix this bug, but that broke styling in other languages. It turns out this section required no modification. The only changes my revisions at present introduce is the change for the unit test and the modification of the following function (ignoring the // NEW comments):

static bool ShouldBreakBeforeBrace(const FormatStyle &Style,
                                   const FormatToken &InitialToken) {
  if (InitialToken.isOneOf(tok::kw_namespace, TT_NamespaceMacro))
    return Style.BraceWrapping.AfterNamespace;
  if (InitialToken.is(tok::kw_class))
    return Style.BraceWrapping.AfterClass;
  if (InitialToken.is(tok::kw_union))
    return Style.BraceWrapping.AfterUnion;
  if (InitialToken.is(tok::kw_struct))
    return Style.BraceWrapping.AfterStruct;
  if (InitialToken.is(tok::kw_enum))         // NEW
    return Style.BraceWrapping.AfterEnum;    // NEW
  return false;
}

I can add some unit tests for the variations of AllowShortEnumsOnASingleLine and AfterEnum.

If there's anything I can explain better please let me know.

atirit added a comment.EditedJan 2 2021, 5:40 PM

After writing a unit test, I've found that a combination of AfterEnum: true and AllowShortEnumsOnASingleLine: true doesn't function properly even in release.. My next revision will include a fix for that alongside the unit test.

atirit added a comment.Jan 2 2021, 8:56 PM

Turns out the true/true bug goes quite deep. I've managed to resolve the first bit of it with a hack that I'm sure will warrant some criticism, but I haven't familiarised myself with this codebase enough to write a cleaner version.

The second issue I'm still resolving. I'll expand on what's going on with this bug once I push a revision.

atirit added a comment.EditedJan 3 2021, 12:55 AM

OK, I'm getting a few more failed unit tests and I'm really not sure what correct formatting behaviour is anymore, so I'll just ask.

Assuming the following settings:

BreakBeforeBraces: Custom
BraceWrapping: {
  AfterEnum: true
}
AllowShortEnumsOnASingleLine: true

Which of these is correct? (FormatTest.AllmanBraceBreaking)
A (expected in test):

enum X
{
  Y = 0
}

B:

enum X { Y = 0 }

Which of these is correct? (new test)
A:

enum
{
   A,
   B
} Enum1, Enum2

B:

enum
{
   A,
   B
} Enum1,
   Enum2

C:

enum { A, B } Enum1, Enum2

true/true results in option B of the "new test" section. The strange behaviour with Enum1 and Enum2 occurs when AllowShortEnumsOnASingleLine isn't allowed to shrink a line for whatever reason, be it a comma on the last case, a comment, too long an enum, or AfterEnum: true, which for whatever reason takes precedence.

The docs don't cover corner cases like this from what I can tell. It may be worth doing so in the future so that cases like this aren't implementation-defined.

atirit added a comment.Jan 3 2021, 2:42 AM

I think accepting a revision that includes only a fix for AfterEnum being ignored (not the corner case) and the new unit test would be the best way to go about this, since they're separate bugs. Then I can fix the corner case (and compatibility with the new unit test) in a separate differential.

However, as I've already mentioned, I'm new here, so I defer to the judgment of those more experienced.

I think accepting a revision that includes only a fix for AfterEnum being ignored (not the corner case) and the new unit test would be the best way to go about this, since they're separate bugs. Then I can fix the corner case (and compatibility with the new unit test) in a separate differential.

However, as I've already mentioned, I'm new here, so I defer to the judgment of those more experienced.

I'm also quite new, but if that are different issues they should receive their own commit (and thus review).

You should always be clear what is not working correctly and reflect that's working after your change with tests which illustrate the now working code.

You need to have these conversations by adding new unit tests that prove your point, I highly doubt I'll personally be willing to accept any revision without more unit tests than this one line change

MyDeveloperDay added inline comments.Jan 3 2021, 6:39 AM
clang/unittests/Format/FormatTest.cpp
1352–1386

EXPECT_FALSE(Style.BraceWrapping.AfterEnum)

1399

please add

verifyFormat("enum {\n"
                     "A,\n"
                     "B,\n"
                    "} ShortEnum",
                    Style);

You need to have these conversations by adding new unit tests that prove your point, I highly doubt I'll personally be willing to accept any revision without more unit tests than this one line change

And that's why I said:

I think accepting a revision that includes only a fix for AfterEnum being ignored (not the corner case) and the new unit test would be the best way to go about this.

Obviously the current revision isn't sufficient. I'll be submitting a new one shortly.

atirit updated this revision to Diff 314296.Jan 3 2021, 11:42 AM

Added unit test

The first test fails due to the aforementioned corner case.

we don't commit with failing tests so lets understand why it fails.

Can you add the tests without multiple names for the enum

atirit updated this revision to Diff 314299.Jan 3 2021, 12:22 PM

Removed multiple enum names from new test

AfterEnum: true currently overrides AllowShortEnumsOnASingleLine: true. If this is epxected behaviour then I'll modify the test to accomodate that, but otherwise, there's a separate issue.

MyDeveloperDay requested changes to this revision.Jan 3 2021, 12:36 PM

I'm sorry, but now you are missing the files from the review, I think you diffing against your own commits and not commit that are in the repo. This makes the review very difficult to understand.

This revision now requires changes to proceed.Jan 3 2021, 12:36 PM

I don't understand; should every commit I've made be squashed into one and then submitted here?

atirit updated this revision to Diff 314301.Jan 3 2021, 1:02 PM

Squashed commits

MyDeveloperDay added inline comments.Jan 3 2021, 1:27 PM
clang/unittests/Format/FormatTest.cpp
1347

keep this test, you should keep one with 1 name and 1 with 2 names

atirit added a comment.Jan 3 2021, 1:31 PM

The test still is there; Git is just showing the diff strangely. I didn't modify that test at all. The corner case bug doesn't affect FormatTest.ShortEnums because that test effectively has AfterEnum: false. Should I add cases for AfterEnum: true in that test too? I had figured the new test took care of that.

atirit added a comment.EditedMar 7 2021, 9:09 PM

Has anyone else had the chance to review this yet? (not trying to be pushy, just curious)

I hadn't had another look, because the CI still shows the test AfterEnum to be failing.
Ping me when it's green please.

atirit added a comment.Mar 8 2021, 8:43 PM

I hadn't had another look, because the CI still shows the test AfterEnum to be failing.

I know the test fails; it's due to a separate bug wherein AfterEnum: true and AllowShortEnumsOnASingleLine: true produces incorrect behaviour. The unit test has the correct expectation. I've also already mentioned this.

The first test fails due to the aforementioned corner case.

I could modify the unit test to expect the currently broken behaviour and then fix the test later alongside a fix for the corner case.

I hadn't had another look, because the CI still shows the test AfterEnum to be failing.

I know the test fails; it's due to a separate bug wherein AfterEnum: true and AllowShortEnumsOnASingleLine: true produces incorrect behaviour. The unit test has the correct expectation. I've also already mentioned this.

The first test fails due to the aforementioned corner case.

I could modify the unit test to expect the currently broken behaviour and then fix the test later alongside a fix for the corner case.

In my opinion you should then, either temporarily deactivate the test, or fix the bug first. A failing test blocks the pipeline and confuses everyone working on the project.

In my opinion you should then, either temporarily deactivate the test, or fix the bug first. A failing test blocks the pipeline and confuses everyone working on the project.

+1

I got confused about this. I know that there was some discussion about this failing test but I thought that the plan was to fix it (as it should). Also, that's what one expects in a revision called "Fixed AfterEnum handling" :).

+1 we are not going to land this with a failing or removed test

In my opinion you should then, either temporarily deactivate the test, or fix the bug first. A failing test blocks the pipeline and confuses everyone working on the project.

I got confused about this. I know that there was some discussion about this failing test but I thought that the plan was to fix it (as it should). Also, that's what one expects in a revision called "Fixed AfterEnum handling" :).

+1 we are not going to land this with a failing or removed test

There are two bugs being discussed here. The first is the one fixed here, where AfterEnum is treated as always true. The second I found while adding unit tests for the first bug. I was advised that I shouldn't fix two bugs with a single differential, especially since the second bug has more to do with AllowShortEnumsOnASingleLine and how it is overridden by AfterEnum. (I think a more appropriate title for a diff handling the second bug would be "Fixed AllowShortEnumsOnASingleLine handling".) However, if it is acceptable to do so, then I'll add commits to fix the second bug as well before asking for this to be merged.

I agree with @HazardyKnusperkeks that failing tests should not be merged, and I understand that this patch is not ready yet.

Please let me know what you think would be the best course of action here.

In my opinion you should then, either temporarily deactivate the test, or fix the bug first. A failing test blocks the pipeline and confuses everyone working on the project.

I got confused about this. I know that there was some discussion about this failing test but I thought that the plan was to fix it (as it should). Also, that's what one expects in a revision called "Fixed AfterEnum handling" :).

+1 we are not going to land this with a failing or removed test

There are two bugs being discussed here. The first is the one fixed here, where AfterEnum is treated as always true. The second I found while adding unit tests for the first bug. I was advised that I shouldn't fix two bugs with a single differential, especially since the second bug has more to do with AllowShortEnumsOnASingleLine and how it is overridden by AfterEnum. (I think a more appropriate title for a diff handling the second bug would be "Fixed AllowShortEnumsOnASingleLine handling".) However, if it is acceptable to do so, then I'll add commits to fix the second bug as well before asking for this to be merged.

I agree with @HazardyKnusperkeks that failing tests should not be merged, and I understand that this patch is not ready yet.

Please let me know what you think would be the best course of action here.

If the bugs are (very) similar, I could live with one fix for both. Otherwise you should fix the other bug first, if its blocking this change.

atirit added a comment.EditedMar 10 2021, 1:16 AM

If the bugs are (very) similar, I could live with one fix for both. Otherwise you should fix the other bug first, if its blocking this change.

The only thing linking the bugs is AfterEnum. Other than that I'd say they're separate. If the bug should be fixed first (instead of merging incorrect but passing tests) then I think it should be a separate diff.

My concern with that course of action is that whoever reviews that diff will all but certainly ask for unit tests, and if I wrote unit tests for AfterEnum and AllowShortEnumsOnASingleLine as I have here, they'll still be incorrect because this bug is not merged. Would you foresee this being an issue?

I'll start working on a fix regardless.

If the bugs are (very) similar, I could live with one fix for both. Otherwise you should fix the other bug first, if its blocking this change.

The only thing linking the bugs is AfterEnum. Other than that I'd say they're separate. If the bug should be fixed first (instead of merging incorrect but passing tests) then I think it should be a separate diff.

My concern with that course of action is that whoever reviews that diff will all but certainly ask for unit tests, and if I wrote unit tests for AfterEnum and AllowShortCaseLabelsOnASingleLine as I have here, they'll still be incorrect because this bug is not merged. Would you foresee this being an issue?

I'll start working on a fix regardless.

Than I would say it should be one change. Omitting a (needed) test just so it passes is not good, and failing tests are worse. Just fix both at the same time.

atirit updated this revision to Diff 330495.Mar 13 2021, 11:50 PM

Fixed AfterEnum's compatibility with AllowShortEnumsOnASingleLine

atirit updated this revision to Diff 330568.Mar 15 2021, 12:52 AM

Added comments for the previous commit's changes and cleaned up those changes

I would like to request that this commit be considered for merging.

atirit updated this revision to Diff 330570.EditedMar 15 2021, 12:56 AM

Build fix

curdeius added inline comments.Mar 15 2021, 1:34 AM
clang/lib/Format/TokenAnnotator.cpp
3668–3687

Please refactor this part so that the conditions can short-circuit, and start with checking for !Style.AllowShortEnumsOnASingleLine (the cheapest check) and end with the while loop (the apparently most expensive check).
You can do this by putting these conditions in a lambda.
Something like:

auto isAllowedByShortEnums = [&] () {
  if (!Style.AllowShortEnumsOnASingleLine) return false;

  // check isLineTooBig etc.

    const FormatToken *breakingSearchToken = &Right;
    while ((breakingSearchToken = breakingSearchToken->Next)) {
      bool hasBreakingComma = breakingSearchToken->is(tok::comma) &&
                              breakingSearchToken->Next->is(tok::r_brace);
      if (breakingSearchToken->isTrailingComment() || hasBreakingComma) {
        return true;
      }
    }

  return false;
};
return (isAllowedByAfterEnum && isAllowedByShortEnums()) || // ...
3682

strlen? Please use StringRef::size().

clang/unittests/Format/FormatTest.cpp
13340

That shouldn't be necessary here.

curdeius requested changes to this revision.Mar 15 2021, 1:34 AM
This revision now requires changes to proceed.Mar 15 2021, 1:34 AM
clang/lib/Format/TokenAnnotator.cpp
3664–3709

Maybe move that check to the top and return early (without running through the loop and strlen).

clang/unittests/Format/FormatTest.cpp
13340

Is it useful to test once with false and once with true?

atirit added inline comments.Mar 15 2021, 8:10 AM
clang/lib/Format/TokenAnnotator.cpp
3682

For FormatToken.TokenText, StringRef::size() is set to the length of the token, not of the stored text. Please see clang/lib/Format/FormatTokenLexer.cpp:1047. Changing that line breaks nearly every format test so I'm assuming that it is correct. A strlen or equivalent is necessary here.

atirit added inline comments.Mar 15 2021, 11:11 AM
clang/unittests/Format/FormatTest.cpp
13340

Please see the test at clang/unittests/Format/FormatTest.cpp:13416:

verifyFormat("enum X\n"
             "{\n"
             "  Y = 0\n"
             "}\n",
             AllmanBraceStyle);

The default LLVM style that this test bases its style on has AllowShortEnumsOnASingleLine set to true, which conflicts with this test. Based on what it expects is the resultant formatting, I think it's safe to assume it expects AllowShortEnumsOnASingleLine to be false.

atirit updated this revision to Diff 330738.Mar 15 2021, 11:13 AM

Implemented requested changes

atirit marked 3 inline comments as done.Mar 15 2021, 1:52 PM
atirit updated this revision to Diff 330860.Mar 15 2021, 7:20 PM

Fixed remote build

curdeius added inline comments.Mar 16 2021, 3:45 AM
clang/lib/Format/TokenAnnotator.cpp
3682

Then it should be something like the line's length, no? Using strlen will be very expensive on non-snippets, as it strlen will traverse the string until its end (so possibly until the end of file) for each invocation of mustBreakBefore (if it goes into this condition of course).

I only see one failing check in the test FormatTest.FormatsTypedefEnum when using TokenText.size():

verifyFormat("typedef enum\n"
             "{\n"
             "  ZERO = 0,\n"
             "  ONE = 1,\n"
             "  TWO = 2,\n"
             "  THREE = 3\n"
             "} LongEnum;",
             Style);

You might need to add more tests in AfterEnum to test the behaviour of this part if the line is just below/above the limit.

Also, that's just a hack, but I made all tests pass with:

assert(Line.Last);
assert(Line.Last->TokenText.data() >= Right.TokenText.data());
auto isAllowedByShortEnums = [&]() {
  if (!Style.AllowShortEnumsOnASingleLine ||
      (Line.Last->TokenText.data() - Right.TokenText.data() +
       Right.TokenText.size() + Right.OriginalColumn) >
          Style.ColumnLimit)

I haven't given it too much thought though and am unsure whether there are cases where the above assertions will fail.

clang/unittests/Format/FormatTest.cpp
13433–13442

I would then put AllmanBraceStyle.AllowShortEnumsOnASingleLine = ...; just before these enum tests and put the old value back afterwards.
Also, as @HazardyKnusperkeks suggested test both false and true.

curdeius requested changes to this revision.Mar 16 2021, 3:45 AM
This revision now requires changes to proceed.Mar 16 2021, 3:45 AM
atirit added inline comments.Mar 16 2021, 7:24 PM
clang/lib/Format/TokenAnnotator.cpp
3682

If I use TokenText.size(), the line length check will always claim that the line is short enough. I'll be adding a unit test for this. However, you're right that a strlen is a bad idea here. I hadn't realised it would consume the entire file. I'll try to figure out a more efficient method of checking the line length.

atirit updated this revision to Diff 331172.Mar 17 2021, 12:50 AM

Removed strlen call and added tests

curdeius requested changes to this revision.Mar 17 2021, 3:56 AM
curdeius added inline comments.
clang/lib/Format/TokenAnnotator.cpp
3679–3680

I don't really understand these conditions on spaces. Could you explain your intent, please?
You really need to add specific tests for that, playing with the value of ColumnLimit, adding spacing etc.

This revision now requires changes to proceed.Mar 17 2021, 3:56 AM
atirit added inline comments.Mar 17 2021, 8:17 AM
clang/lib/Format/TokenAnnotator.cpp
3679–3680

Repeated spaces, e.g. enum { A, B, C } SomeEnum; are removed during formatting. Since they wouldn't be part of the formatted line, they shouldn't be counted towards the column limit. Only one space need be considered. Removed spaces, e.g. enum{A,B,C}SomeEnum; are handled by the fact that clang-format runs multiple passes. On the first pass, spaces would be added. On the second pass, assuming the line is then too long, the above code would catch it and break up the enum.

I'll add unit tests to check if spaces are being handled correctly.

curdeius added inline comments.Mar 17 2021, 10:01 AM
clang/lib/Format/TokenAnnotator.cpp
3679–3680

Since you use == ' ' twice, remainingLineCharCount will count only consecutive spaces, right?
But you want to count other characters, no?
So, IIUC, the condition you want is rF[wI] != '\n' && !(rF[wI] == ' ' && rF[wI - 1] == ' ') (mind the negation). It will count characters other than a newline and it will only count a series of consecutive spaces as one char. WDYT?

3681

Nit.

I've found yet another bug. When AllowShortEnumsOnASingleLine is true and a multiline enum is forced (by line length, trailing comma, etc.), multiple names for the enum are placed on separate lines.

Example:

enum { A, B, C, } ShortEnum1, ShortEnum2;

Is refactored to

enum {
    A,
    B,
    C,
} ShortEnum1,
    ShortEnum2;

Instead of the expected

enum {
    A,
    B,
    C,
} ShortEnum1, ShortEnum2;

ColumnLimit is not causing this.
When AllowShortEnumsOnASingleLine is false, the expected behaviour occurs. This affects a unit test I added, so I'll be fixing this as well in this diff.

clang/lib/Format/TokenAnnotator.cpp
3679–3680

Ah yes, that's my bad. Must have made a typo. Fixed in the next commit.

enum {
    A,
    B,
    C,
} ShortEnum1,
    ShortEnum2;

I've seen this before maybe with regard to something else, but can't quite recall. (maybe a bug in the bug tracker)

I applied this fix locally to a branch based off llvm 11.x and the FormatTest.FormatsTypedefEnum test now fails.

I applied this fix locally to a branch based off llvm 11.x and the FormatTest.FormatsTypedefEnum test now fails.

So this test is failing:

TEST_F(FormatTest, LongEnum) {
  FormatStyle Style = getLLVMStyle();
  Style.AllowShortEnumsOnASingleLine = true;
  Style.ColumnLimit = 40;

  Style.BreakBeforeBraces = FormatStyle::BS_Custom;
  Style.BraceWrapping.AfterEnum = false;

  verifyFormat("enum {\n"
               "  ZERO = 0,\n"
               "  ONE = 1,\n"
               "  TWO = 2,\n"
               "  THREE = 3\n"
               "} LongEnum;",
               Style);
  Style.BreakBeforeBraces = FormatStyle::BS_Custom;
  Style.BraceWrapping.AfterEnum = true;
  verifyFormat("enum\n"
               "{\n"
               "  ZERO = 0,\n"
               "  ONE = 1,\n"
               "  TWO = 2,\n"
               "  THREE = 3\n"
               "} LongEnum;",
               Style);
}

It fails in the second case because we don't respect the 'AfterEnum = true' and collapse the brace. It appears there is buggy logic in the remainingLineCharCount stuff which others have already been commenting on

It fails in the second case because we don't respect the 'AfterEnum = true' and collapse the brace. It appears there is buggy logic in the remainingLineCharCount stuff which others have already been commenting on

Apologies, late to the party here... this fix suggested by curdeius does indeed fix the issue:

Since you use == ' ' twice, remainingLineCharCount will count only consecutive spaces, right?
But you want to count other characters, no?
So, IIUC, the condition you want is rF[wI] != '\n' && !(rF[wI] == ' ' && rF[wI - 1] == ' ') (mind the negation). It will count characters other than a newline and it will only count a series of consecutive spaces as one char. WDYT?

clang/lib/Format/TokenAnnotator.cpp
3679–3680

Since you use == ' ' twice, remainingLineCharCount will count only consecutive spaces, right?
But you want to count other characters, no?
So, IIUC, the condition you want is rF[wI] != '\n' && !(rF[wI] == ' ' && rF[wI - 1] == ' ') (mind the negation). It will count characters other than a newline and it will only count a series of consecutive spaces as one char. WDYT?

Since you use == ' ' twice, remainingLineCharCount will count only consecutive spaces, right?
But you want to count other characters, no?
So, IIUC, the condition you want is rF[wI] != '\n' && !(rF[wI] == ' ' && rF[wI - 1] == ' ') (mind the negation). It will count characters other than a newline and it will only count a series of consecutive spaces as one char. WDYT?

That whole while loop leaves me feeling cold, it feels like a bug waiting to happen, frankly I don't understand why its necessary. It would be better to move the loop into a function and properly unit test it if there is really no other way