Previously, the AfterEnum option of the BraceWrapping section was treated as always true, regardless of the format passed. These changes resolve this behaviour.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Please add and fix tests.
clang/lib/Format/UnwrappedLineParser.cpp | ||
---|---|---|
2485 ↗ | (On Diff #314085) | Why is this removed? |
From what I can tell the unit tests are broken. Take FormatTest.ShortEnums for example. It passes the following code:
enum { A, B, C } ShortEnum1, ShortEnum2;
And expects no changes to be made.
However, format unit tests use the BreakBeforeBraces: Attach option. This option should attach braces to the same line, as demonstrated in the docs:
BS_Attach (in configuration: Attach) Always attach braces to surrounding context.
namespace N { enum E { E1, E2, }; // ... }
I'm new to this project so please correct me if I'm wrong, but this appears to indicate broken tests. I'll push a commit with fixed tests.
The FormatTestJS.EnumDeclarations test in fact isn't broken, but FormatTest.ShortEnums and FormatTestCSharp.CSharpKeyWordEscaping are.
EDIT: My mistake, FormatTestCSharp uses a different format than FormatTest. That test appears to be correct. Re-adding the removed lines at L2485-6 fixes this test.
Fixed the FormatTest.ShortEnums unit test and fixed compatibility with FormatTestCSharp.CSharpKeyWordEscaping and FormatTestJS.EnumDeclarations
I think if you think you have a bug that you log it in the bug tracker and we track it with this issue.
Take the following example:
enum { A, B, C, D, E, F, G, H, I } Short;
And the following minimal .clang-format --- ColumnLimit: 10 BreakBeforeBraces: Custom BraceWrapping: AfterEnum: true
To use AfterEnum you must be using "Custom" for BreakBeforeBraces, the result is
enum { A, B, C, D, E, F, G, H, I } Short;
changer AfterEnum to false and you have
enum { A, B, C, D, E, F, G, H, I } Short;
Could you explain using this example what you think is wrong and why? what do you expect to see?
@MyDeveloperDay I expect to see exactly that. clang-format currently does not respect AfterEnum, treating it as always true. This is why I changed UnwrappedLineParser.cpp.
The unit test is incorrect as the style used for the clang-format C(++) unit tests uses BreakBeforeBraces: Attach, which should result in AfterEnum being treated as false, as indicated by the docs I quoted a few messages back. The unit test expects AfterEnum: true behaviour from a style that should result in AfterEnum: false behaviour. This is why I changed the unit test.
Additionally, this bug has already been reported: https://bugs.llvm.org/show_bug.cgi?id=46927
I think we are missing some clarity in this bug as to what the actual problem is, I do agree the test looks wrong,
This seems to be that in "Attach" mode, then AllowShortEnumsOnASingleLine=false doesn't attach the brace.
I'm somewhat struggling to understand how the code change fixes that.. given that Style.BraceWrapping.AfterEnum should be true correct?
clang/lib/Format/UnwrappedLineParser.cpp | ||
---|---|---|
1305 ↗ | (On Diff #314162) | surely this means always add a newline? but that isn't what is wanted from what I can tell. |
I agree on this. I'd like to see a more exhaustive test suite for all the combinations of AfterEnum and AllowShortEnumsOnASingleLine, not only fixing the single wrong test.
This seems to be that in "Attach" mode, then AllowShortEnumsOnASingleLine=false doesn't attach the brace.
That's my understanding as well.
I'm somewhat struggling to understand how the code change fixes that.. given that Style.BraceWrapping.AfterEnum should be true correct?
In Attach mode, AfterEnum is false. Cf. expandPresets https://github.com/llvm/llvm-project/blob/47877c9079c27f19a954b660201ea47717c82fec/clang/lib/Format/Format.cpp#L752.
Said all that, it *seems* to me that the fix is correct apart from the strangely looking if (!Style.isCpp()) { change that I don't really understand. Why should C++ be handled differently in this regard? What am I missing?
This seems to be that in "Attach" mode, then AllowShortEnumsOnASingleLine=false doesn't attach the brace.
That is correct, but the main issue is that AfterEnum: false, which Attach mode implies, doesn't function correctly.
Said all that, it *seems* to me that the fix is correct apart from the strangely looking if (!Style.isCpp()) { change that I don't really understand. Why should C++ be handled differently in this regard? What am I missing?
That code was there before. Here is the code currently in release, including comments:
case tok::kw_enum: // Ignore if this is part of "template <enum ...". if (Previous && Previous->is(tok::less)) { nextToken(); break; } // parseEnum falls through and does not yet add an unwrapped line as an // enum definition can start a structural element. if (!parseEnum()) break; // This only applies for C++. if (!Style.isCpp()) { addUnwrappedLine(); return; } break;
I modified this code in an attempt to fix this bug, but that broke styling in other languages. It turns out this section required no modification. The only changes my revisions at present introduce is the change for the unit test and the modification of the following function (ignoring the // NEW comments):
static bool ShouldBreakBeforeBrace(const FormatStyle &Style, const FormatToken &InitialToken) { if (InitialToken.isOneOf(tok::kw_namespace, TT_NamespaceMacro)) return Style.BraceWrapping.AfterNamespace; if (InitialToken.is(tok::kw_class)) return Style.BraceWrapping.AfterClass; if (InitialToken.is(tok::kw_union)) return Style.BraceWrapping.AfterUnion; if (InitialToken.is(tok::kw_struct)) return Style.BraceWrapping.AfterStruct; if (InitialToken.is(tok::kw_enum)) // NEW return Style.BraceWrapping.AfterEnum; // NEW return false; }
I can add some unit tests for the variations of AllowShortEnumsOnASingleLine and AfterEnum.
If there's anything I can explain better please let me know.
After writing a unit test, I've found that a combination of AfterEnum: true and AllowShortEnumsOnASingleLine: true doesn't function properly even in release.. My next revision will include a fix for that alongside the unit test.
Turns out the true/true bug goes quite deep. I've managed to resolve the first bit of it with a hack that I'm sure will warrant some criticism, but I haven't familiarised myself with this codebase enough to write a cleaner version.
The second issue I'm still resolving. I'll expand on what's going on with this bug once I push a revision.
OK, I'm getting a few more failed unit tests and I'm really not sure what correct formatting behaviour is anymore, so I'll just ask.
Assuming the following settings:
BreakBeforeBraces: Custom BraceWrapping: { AfterEnum: true } AllowShortEnumsOnASingleLine: true
Which of these is correct? (FormatTest.AllmanBraceBreaking)
A (expected in test):
enum X { Y = 0 }
B:
enum X { Y = 0 }
Which of these is correct? (new test)
A:
enum { A, B } Enum1, Enum2
B:
enum { A, B } Enum1, Enum2
C:
enum { A, B } Enum1, Enum2
true/true results in option B of the "new test" section. The strange behaviour with Enum1 and Enum2 occurs when AllowShortEnumsOnASingleLine isn't allowed to shrink a line for whatever reason, be it a comma on the last case, a comment, too long an enum, or AfterEnum: true, which for whatever reason takes precedence.
The docs don't cover corner cases like this from what I can tell. It may be worth doing so in the future so that cases like this aren't implementation-defined.
I think accepting a revision that includes only a fix for AfterEnum being ignored (not the corner case) and the new unit test would be the best way to go about this, since they're separate bugs. Then I can fix the corner case (and compatibility with the new unit test) in a separate differential.
However, as I've already mentioned, I'm new here, so I defer to the judgment of those more experienced.
I'm also quite new, but if that are different issues they should receive their own commit (and thus review).
You should always be clear what is not working correctly and reflect that's working after your change with tests which illustrate the now working code.
You need to have these conversations by adding new unit tests that prove your point, I highly doubt I'll personally be willing to accept any revision without more unit tests than this one line change
And that's why I said:
I think accepting a revision that includes only a fix for AfterEnum being ignored (not the corner case) and the new unit test would be the best way to go about this.
Obviously the current revision isn't sufficient. I'll be submitting a new one shortly.
we don't commit with failing tests so lets understand why it fails.
Can you add the tests without multiple names for the enum
AfterEnum: true currently overrides AllowShortEnumsOnASingleLine: true. If this is epxected behaviour then I'll modify the test to accomodate that, but otherwise, there's a separate issue.
I'm sorry, but now you are missing the files from the review, I think you diffing against your own commits and not commit that are in the repo. This makes the review very difficult to understand.
I don't understand; should every commit I've made be squashed into one and then submitted here?
clang/unittests/Format/FormatTest.cpp | ||
---|---|---|
1346–1347 | keep this test, you should keep one with 1 name and 1 with 2 names |
The test still is there; Git is just showing the diff strangely. I didn't modify that test at all. The corner case bug doesn't affect FormatTest.ShortEnums because that test effectively has AfterEnum: false. Should I add cases for AfterEnum: true in that test too? I had figured the new test took care of that.
Has anyone else had the chance to review this yet? (not trying to be pushy, just curious)
I hadn't had another look, because the CI still shows the test AfterEnum to be failing.
Ping me when it's green please.
I know the test fails; it's due to a separate bug wherein AfterEnum: true and AllowShortEnumsOnASingleLine: true produces incorrect behaviour. The unit test has the correct expectation. I've also already mentioned this.
I could modify the unit test to expect the currently broken behaviour and then fix the test later alongside a fix for the corner case.
In my opinion you should then, either temporarily deactivate the test, or fix the bug first. A failing test blocks the pipeline and confuses everyone working on the project.
+1
I got confused about this. I know that there was some discussion about this failing test but I thought that the plan was to fix it (as it should). Also, that's what one expects in a revision called "Fixed AfterEnum handling" :).
There are two bugs being discussed here. The first is the one fixed here, where AfterEnum is treated as always true. The second I found while adding unit tests for the first bug. I was advised that I shouldn't fix two bugs with a single differential, especially since the second bug has more to do with AllowShortEnumsOnASingleLine and how it is overridden by AfterEnum. (I think a more appropriate title for a diff handling the second bug would be "Fixed AllowShortEnumsOnASingleLine handling".) However, if it is acceptable to do so, then I'll add commits to fix the second bug as well before asking for this to be merged.
I agree with @HazardyKnusperkeks that failing tests should not be merged, and I understand that this patch is not ready yet.
Please let me know what you think would be the best course of action here.
If the bugs are (very) similar, I could live with one fix for both. Otherwise you should fix the other bug first, if its blocking this change.
The only thing linking the bugs is AfterEnum. Other than that I'd say they're separate. If the bug should be fixed first (instead of merging incorrect but passing tests) then I think it should be a separate diff.
My concern with that course of action is that whoever reviews that diff will all but certainly ask for unit tests, and if I wrote unit tests for AfterEnum and AllowShortEnumsOnASingleLine as I have here, they'll still be incorrect because this bug is not merged. Would you foresee this being an issue?
I'll start working on a fix regardless.
Than I would say it should be one change. Omitting a (needed) test just so it passes is not good, and failing tests are worse. Just fix both at the same time.
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
3669–3688 ↗ | (On Diff #330570) | Please refactor this part so that the conditions can short-circuit, and start with checking for !Style.AllowShortEnumsOnASingleLine (the cheapest check) and end with the while loop (the apparently most expensive check). auto isAllowedByShortEnums = [&] () { if (!Style.AllowShortEnumsOnASingleLine) return false; // check isLineTooBig etc. const FormatToken *breakingSearchToken = &Right; while ((breakingSearchToken = breakingSearchToken->Next)) { bool hasBreakingComma = breakingSearchToken->is(tok::comma) && breakingSearchToken->Next->is(tok::r_brace); if (breakingSearchToken->isTrailingComment() || hasBreakingComma) { return true; } } return false; }; return (isAllowedByAfterEnum && isAllowedByShortEnums()) || // ... |
3683 ↗ | (On Diff #330570) | strlen? Please use StringRef::size(). |
clang/unittests/Format/FormatTest.cpp | ||
13321 | That shouldn't be necessary here. |
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
3690–3691 ↗ | (On Diff #330570) | Maybe move that check to the top and return early (without running through the loop and strlen). |
clang/unittests/Format/FormatTest.cpp | ||
13321 | Is it useful to test once with false and once with true? |
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
3683 ↗ | (On Diff #330570) | For FormatToken.TokenText, StringRef::size() is set to the length of the token, not of the stored text. Please see clang/lib/Format/FormatTokenLexer.cpp:1047. Changing that line breaks nearly every format test so I'm assuming that it is correct. A strlen or equivalent is necessary here. |
clang/unittests/Format/FormatTest.cpp | ||
---|---|---|
13321 | Please see the test at clang/unittests/Format/FormatTest.cpp:13416: verifyFormat("enum X\n" "{\n" " Y = 0\n" "}\n", AllmanBraceStyle); The default LLVM style that this test bases its style on has AllowShortEnumsOnASingleLine set to true, which conflicts with this test. Based on what it expects is the resultant formatting, I think it's safe to assume it expects AllowShortEnumsOnASingleLine to be false. |
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
3683 ↗ | (On Diff #330570) | Then it should be something like the line's length, no? Using strlen will be very expensive on non-snippets, as it strlen will traverse the string until its end (so possibly until the end of file) for each invocation of mustBreakBefore (if it goes into this condition of course). I only see one failing check in the test FormatTest.FormatsTypedefEnum when using TokenText.size(): verifyFormat("typedef enum\n" "{\n" " ZERO = 0,\n" " ONE = 1,\n" " TWO = 2,\n" " THREE = 3\n" "} LongEnum;", Style); You might need to add more tests in AfterEnum to test the behaviour of this part if the line is just below/above the limit. Also, that's just a hack, but I made all tests pass with: assert(Line.Last); assert(Line.Last->TokenText.data() >= Right.TokenText.data()); auto isAllowedByShortEnums = [&]() { if (!Style.AllowShortEnumsOnASingleLine || (Line.Last->TokenText.data() - Right.TokenText.data() + Right.TokenText.size() + Right.OriginalColumn) > Style.ColumnLimit) I haven't given it too much thought though and am unsure whether there are cases where the above assertions will fail. |
clang/unittests/Format/FormatTest.cpp | ||
13412–13421 | I would then put AllmanBraceStyle.AllowShortEnumsOnASingleLine = ...; just before these enum tests and put the old value back afterwards. |
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
3683 ↗ | (On Diff #330570) | If I use TokenText.size(), the line length check will always claim that the line is short enough. I'll be adding a unit test for this. However, you're right that a strlen is a bad idea here. I hadn't realised it would consume the entire file. I'll try to figure out a more efficient method of checking the line length. |
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
3679–3680 ↗ | (On Diff #331172) | I don't really understand these conditions on spaces. Could you explain your intent, please? |
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
3679–3680 ↗ | (On Diff #331172) | Repeated spaces, e.g. enum { A, B, C } SomeEnum; are removed during formatting. Since they wouldn't be part of the formatted line, they shouldn't be counted towards the column limit. Only one space need be considered. Removed spaces, e.g. enum{A,B,C}SomeEnum; are handled by the fact that clang-format runs multiple passes. On the first pass, spaces would be added. On the second pass, assuming the line is then too long, the above code would catch it and break up the enum. I'll add unit tests to check if spaces are being handled correctly. |
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
3679–3680 ↗ | (On Diff #331172) | Since you use == ' ' twice, remainingLineCharCount will count only consecutive spaces, right? |
3681 ↗ | (On Diff #331172) | Nit. |
I've found yet another bug. When AllowShortEnumsOnASingleLine is true and a multiline enum is forced (by line length, trailing comma, etc.), multiple names for the enum are placed on separate lines.
Example:
enum { A, B, C, } ShortEnum1, ShortEnum2;
Is refactored to
enum { A, B, C, } ShortEnum1, ShortEnum2;
Instead of the expected
enum { A, B, C, } ShortEnum1, ShortEnum2;
ColumnLimit is not causing this.
When AllowShortEnumsOnASingleLine is false, the expected behaviour occurs. This affects a unit test I added, so I'll be fixing this as well in this diff.
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
3679–3680 ↗ | (On Diff #331172) | Ah yes, that's my bad. Must have made a typo. Fixed in the next commit. |
enum { A, B, C, } ShortEnum1, ShortEnum2;
I've seen this before maybe with regard to something else, but can't quite recall. (maybe a bug in the bug tracker)
I applied this fix locally to a branch based off llvm 11.x and the FormatTest.FormatsTypedefEnum test now fails.
So this test is failing:
TEST_F(FormatTest, LongEnum) { FormatStyle Style = getLLVMStyle(); Style.AllowShortEnumsOnASingleLine = true; Style.ColumnLimit = 40; Style.BreakBeforeBraces = FormatStyle::BS_Custom; Style.BraceWrapping.AfterEnum = false; verifyFormat("enum {\n" " ZERO = 0,\n" " ONE = 1,\n" " TWO = 2,\n" " THREE = 3\n" "} LongEnum;", Style); Style.BreakBeforeBraces = FormatStyle::BS_Custom; Style.BraceWrapping.AfterEnum = true; verifyFormat("enum\n" "{\n" " ZERO = 0,\n" " ONE = 1,\n" " TWO = 2,\n" " THREE = 3\n" "} LongEnum;", Style); }
It fails in the second case because we don't respect the 'AfterEnum = true' and collapse the brace. It appears there is buggy logic in the remainingLineCharCount stuff which others have already been commenting on
Apologies, late to the party here... this fix suggested by curdeius does indeed fix the issue:
Since you use == ' ' twice, remainingLineCharCount will count only consecutive spaces, right?
But you want to count other characters, no?
So, IIUC, the condition you want is rF[wI] != '\n' && !(rF[wI] == ' ' && rF[wI - 1] == ' ') (mind the negation). It will count characters other than a newline and it will only count a series of consecutive spaces as one char. WDYT?
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
3679–3680 ↗ | (On Diff #331172) |
|
Since you use == ' ' twice, remainingLineCharCount will count only consecutive spaces, right?
But you want to count other characters, no?
So, IIUC, the condition you want is rF[wI] != '\n' && !(rF[wI] == ' ' && rF[wI - 1] == ' ') (mind the negation). It will count characters other than a newline and it will only count a series of consecutive spaces as one char. WDYT?
That whole while loop leaves me feeling cold, it feels like a bug waiting to happen, frankly I don't understand why its necessary. It would be better to move the loop into a function and properly unit test it if there is really no other way
I'm running into this bug too.
typedef enum Blah { One = 1, } Blah;
becomes
typedef enum Blah { One = 1, } Blah;
with BraceWrapping.AfterEnum set to true or false, and AllowShortEnumsOnASingleLine set to false; but with AllowShortEnumsOnASingleLine set to true and BraceWrapping.AfterEnum set to true it works.
Only problem is, I never want short enums to be on a single line.
I feel like I ran into a similar bug when I contributed the patch to add BraceWrapping.AfterExternBlock; something else I don't recall off the top of my head, was messing with that setting too.
and the solution was to add an if statement around that call to addUnwrappedLine().
Not trying to take over this patch, because I've got a ton of patches I need to be finishing up myself.
but I think the problematic code is in UnwrappedLineParser.cpp:1900
if (FormatTok->Tok.getKind() == ClosingBraceKind) { if (IsEnum && !Style.AllowShortEnumsOnASingleLine) addUnwrappedLine(); nextToken(); return !HasError; }
I'd change that if statement to
if (FormatTok->Tok.getKind() == ClosingBraceKind) { if (IsEnum && (!Style.AllowShortEnumsOnASingleLine || !Style.BraceWrapping.AfterEnum)) addUnwrappedLine(); nextToken(); return !HasError; }
I haven't tried this code at all, just a gut feeling based on my experience fixing the extern block wrapping; hopefully it helps point you in the right direction.
keep this test, you should keep one with 1 name and 1 with 2 names