Page MenuHomePhabricator

[clang-format] [PR45218] Fix an issue where < and > and >> in a for loop gets incorrectly interpreted at a TemplateOpener/Closer
AbandonedPublic

Authored by MyDeveloperDay on May 2 2020, 9:36 AM.

Details

Summary

Fix to address https://bugs.llvm.org/show_bug.cgi?id=45218

The <> in the following code means the i < bits and v > in the following code gets incorrectly determined as a TemplateOpener/Closer (see debug output below)

While this became more prevalent after D66332: [clang-format] Fix the bug that joins template closer and > or >> this isn't the root cause, but instead as called out by PR45218 it's more to do with parseAngle() and how it labels < and >

`
for (unsigned i = 0; i < nbits; ++i, bIt.next(), v = v >> 1)
`

This can be seen by the following debug output for the line, where we can see that the <> have the wrong T value.

The tests pass and various directories within LLVM which are currently already clang-formatted don't show any new errors (although that far from exhaustive testing)

M=0 C=0 T=Unknown S=1 B=0 BK=0 P=0 Name=for L=3 PPK=2 FakeLParens= FakeRParens=0 II=0x1e0bf1b7c80 Text='for'
M=0 C=0 T=Unknown S=1 B=0 BK=0 P=23 Name=l_paren L=5 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
M=0 C=1 T=Unknown S=0 B=0 BK=0 P=1040 Name=unsigned L=13 PPK=2 FakeLParens=2/0/ FakeRParens=0 II=0x1e0bf1b7fc0 Text='unsigned'
M=0 C=1 T=StartOfName S=1 B=0 BK=0 P=240 Name=identifier L=15 PPK=2 FakeLParens= FakeRParens=0 II=0x1e0bf1f12a0 Text='i'
M=0 C=0 T=BinaryOperator S=1 B=0 BK=0 P=42 Name=equal L=17 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='='
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=44 Name=numeric_constant L=19 PPK=2 FakeLParens= FakeRParens=1 II=0x0 Text='0'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=43 Name=semi L=20 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=40 Name=identifier L=22 PPK=2 FakeLParens=10/ FakeRParens=0 II=0x1e0bf1f12a0 Text='i'
M=0 C=0 T=TemplateOpener S=0 B=0 BK=0 P=50 Name=less L=23 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='<'
M=0 C=1 T=Unknown S=0 B=0 BK=0 P=380 Name=identifier L=28 PPK=2 FakeLParens=0/ FakeRParens=0 II=0x1e0bf1f12d0 Text='nbits'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=283 Name=semi L=29 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
M=0 C=1 T=UnaryOperator S=1 B=0 BK=0 P=280 Name=plusplus L=32 PPK=2 FakeLParens=0/1/ FakeRParens=0 II=0x0 Text='++'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=340 Name=identifier L=33 PPK=2 FakeLParens= FakeRParens=1 II=0x1e0bf1f12a0 Text='i'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=281 Name=comma L=34 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=','
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=281 Name=identifier L=38 PPK=2 FakeLParens=0/ FakeRParens=0 II=0x1e0bf1f1300 Text='bIt'
M=0 C=1 T=Unknown S=0 B=0 BK=0 P=430 Name=period L=39 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='.'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=283 Name=identifier L=43 PPK=2 FakeLParens= FakeRParens=0 II=0x1e0bf1f1330 Text='next'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=283 Name=l_paren L=44 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=319 Name=r_paren L=45 PPK=2 FakeLParens= FakeRParens=1 II=0x0 Text=')'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=281 Name=comma L=46 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=','
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=281 Name=identifier L=48 PPK=2 FakeLParens=2/ FakeRParens=0 II=0x1e0bf1f1360 Text='v'
M=0 C=0 T=BinaryOperator S=1 B=0 BK=0 P=282 Name=equal L=50 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='='
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=284 Name=identifier L=52 PPK=2 FakeLParens= FakeRParens=3 II=0x1e0bf1f1360 Text='v'
M=0 C=0 T=TemplateCloser S=0 B=0 BK=0 P=290 Name=greater L=53 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='>'
M=0 C=0 T=BinaryOperator S=0 B=0 BK=0 P=50 Name=greater L=54 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='>'
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=50 Name=numeric_constant L=56 PPK=2 FakeLParens= FakeRParens=2 II=0x0 Text='1'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=43 Name=r_paren L=57 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'

This fix tries to avoid that issue by determining that a ";" between the < and > would not be a template (I couldn't think of an example where that would be the case, but I'm sure there are..

Afterwards the token are interpreted correctly for this line. (

M=0 C=0 T=Unknown S=1 B=0 BK=0 P=0 Name=for L=3 PPK=2 FakeLParens= FakeRParens=0 II=0x1c891ac41b0 Text='for'
M=0 C=0 T=Unknown S=1 B=0 BK=0 P=23 Name=l_paren L=5 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
M=0 C=1 T=Unknown S=0 B=0 BK=0 P=1040 Name=unsigned L=13 PPK=2 FakeLParens=2/0/ FakeRParens=0 II=0x1c891ac44f0 Text='unsigned'
M=0 C=1 T=StartOfName S=1 B=0 BK=0 P=240 Name=identifier L=15 PPK=2 FakeLParens= FakeRParens=0 II=0x1c891abce10 Text='i'
M=0 C=0 T=BinaryOperator S=1 B=0 BK=0 P=42 Name=equal L=17 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='='
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=44 Name=numeric_constant L=19 PPK=2 FakeLParens= FakeRParens=1 II=0x0 Text='0'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=43 Name=semi L=20 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=40 Name=identifier L=22 PPK=2 FakeLParens=10/ FakeRParens=0 II=0x1c891abce10 Text='i'
M=0 C=0 T=BinaryOperator S=1 B=0 BK=0 P=50 Name=less L=24 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='<'
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=50 Name=identifier L=30 PPK=2 FakeLParens= FakeRParens=1 II=0x1c891abce40 Text='nbits'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=43 Name=semi L=31 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=';'
M=0 C=1 T=UnaryOperator S=1 B=0 BK=0 P=40 Name=plusplus L=34 PPK=2 FakeLParens=0/1/ FakeRParens=0 II=0x0 Text='++'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=100 Name=identifier L=35 PPK=2 FakeLParens= FakeRParens=1 II=0x1c891abce10 Text='i'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=41 Name=comma L=36 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=','
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=41 Name=identifier L=40 PPK=2 FakeLParens=0/ FakeRParens=0 II=0x1c891abce70 Text='bIt'
M=0 C=1 T=Unknown S=0 B=0 BK=0 P=190 Name=period L=41 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='.'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=43 Name=identifier L=45 PPK=2 FakeLParens= FakeRParens=0 II=0x1c891abcea0 Text='next'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=43 Name=l_paren L=46 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=79 Name=r_paren L=47 PPK=2 FakeLParens= FakeRParens=1 II=0x0 Text=')'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=41 Name=comma L=48 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=','
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=41 Name=identifier L=50 PPK=2 FakeLParens=2/ FakeRParens=0 II=0x1c891abced0 Text='v'
M=0 C=0 T=BinaryOperator S=1 B=0 BK=0 P=42 Name=equal L=52 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='='
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=44 Name=identifier L=54 PPK=2 FakeLParens=10/ FakeRParens=0 II=0x1c891abced0 Text='v'
M=0 C=0 T=BinaryOperator S=1 B=0 BK=0 P=50 Name=greater L=56 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='>'
M=0 C=0 T=BinaryOperator S=0 B=0 BK=0 P=50 Name=greater L=57 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='>'
M=0 C=1 T=Unknown S=1 B=0 BK=0 P=50 Name=numeric_constant L=59 PPK=2 FakeLParens= FakeRParens=4 II=0x0 Text='1'
M=0 C=0 T=Unknown S=0 B=0 BK=0 P=43 Name=r_paren L=60 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'

Diff Detail

Event Timeline

MyDeveloperDay created this revision.May 2 2020, 9:36 AM
MyDeveloperDay edited the summary of this revision. (Show Details)

Update for pre-merge checks

Couple examples:

void foo(int x) {
for (unsigned int i = 0; i < x >> 1; i++) { }
}

and

int i = 0;

if (i < x >> 1) {}

}

Couple examples:

void foo(int x) {
for (unsigned int i = 0; i < x >> 1; i++) { }
}

and

int i = 0;

if (i < x >> 1) {}

}

I don't deny there are places where this will fail, but that is clang-format all over its a series of rules or the arrangement of token to produce the desired result

I think we also have to say.. "who writes code like this" ;-)

They're overly reduced examples ;) but embedded programmers tend to make heavy use of shift operators, including within for loops and if stmts

Add more test cases to try to ensure TemplateOpener/Closer type isn't assigned incorrectly.

This fix tries to avoid that issue by determining that a ";" between the < and > would not be a template (I couldn't think of an example where that would be the case, but I'm sure there are..

For what is worth, with lambdas in unevaluated context (C++20), you can get a semicolon between template opener/closer easily:

some_templated_type<decltype([](int i) { return i; })>

But I'm not sure if it's something that may break anything here.

Another idea in which a semicolon occurs between <> would be a macro X(;).

clang/lib/Format/TokenAnnotator.cpp
40

I know you haven't changed this, but...
Static function (internal linkage) in anonymous namespace?

46

Ditto (static).

1986

There should be no comment, it's the ending brace of a class.

MyDeveloperDay marked 3 inline comments as done.

Remove tok::semi check as its not actually needed now we label the < and > correctly

MyDeveloperDay marked an inline comment as done.May 4 2020, 12:40 AM

For what is worth, with lambdas in unevaluated context (C++20), you can get a semicolon between template opener/closer easily:

I actually no longer need the check for the semi if the TemplateOpener/Closer is correctly labelled

clang/lib/Format/TokenAnnotator.cpp
40

I believe the guideline is to not use anonymous namespaces for functions

https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

(not my personal preference mind)

I think these examples are too ambiguous for clang-format to try and format correctly.
Sadly for C++, correctly distinguishing between >>, the binary operator, and >>, the sequence of two closing template brackets, requires semantic analysis, I believe.
Such heuristics are very brittle, e.g. this patch fixes if (i < x >> 1) but regresses this:

% cat ~/test.cc
bool f() {
  if (w<u<v<x>>, 1>::t) return true;
}
% bin/clang-format ~/test.cc 
bool f() {
  if (w < u < v < x >>, 1 > ::t)
    return true;
}
%

We can of course enumerate a fixed set of patterns where we apply such fixes fixes, but I'm not sure how much is that worth.
I think clang-format's current approach to choose to interpret as a template, even if silly at times, is reasonable.
A way to deal with if (i < x >> 1) is to give clang-format a hint e.g. if (i < (x >> 1)). You can ~always surround an expression with braces, but if clang-format guesses incorrectly two template closers as a binary operator, I can't think of a good way to force it to stop.

MyDeveloperDay updated this revision to Diff 262727.EditedMay 7 2020, 12:26 PM

Handle addition case

if (w<u<v<x>>, 1>::t) return true;

I do tend to agree with you, but at least for this case I think it can also be handled. It could of course just be the tip of the iceberg, but let me at least update the patch here to cover those cases.

hans added a comment.Aug 18 2020, 6:44 AM

I suppose this is just piling on heuristics, but I guess it fixes the reported bug. Krasimir, what do you think?