This serves to augment the improvements made in https://reviews.llvm.org/D86581. It prevents clang-format from interpreting bitshift operators as template arguments in certain circumstances. This is an attempt at fixing https://bugs.llvm.org/show_bug.cgi?id=49868
Details
Diff Detail
Event Timeline
clang/unittests/Format/FormatTest.cpp | ||
---|---|---|
7652 | IMO you should use "test < a | b >> c;" as your test case here, to reassure the reader that it doesn't depend on the fact that ... 1; is visibly not a variable declaration. |
clang/unittests/Format/FormatTest.cpp | ||
---|---|---|
7652 | Btw, a much-bigger-scope way to fix this would be to teach clang-format about "input encoding" versus "output encoding." The only time clang-format should ever be inserting space in the middle of >> is if it's translating C++11-encoded input into C++03-encoded output. If the input is known to already be C++03-encoded, then breaking up an >> token into a pair of > > tokens is guaranteed to introduce a bug. |
clang/unittests/Format/FormatTest.cpp | ||
---|---|---|
7652 | Thanks for the feedback. Your 2 test suggestions make sense to me; I've updated the patch diff. I hadn't considered teaching clang-format input encoding, but that does sound like the preferable long term solution. This patch is intended to be a lightweight fix to fix a very narrow use case. |
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
125 ↗ | (On Diff #338648) | I don't really understand what we are saying here? |
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
125 ↗ | (On Diff #338648) | Effectively we are checking that, barring intervening whitespace, we are analyzing 2 consecutive '>' tokens. If so, we treat such sequence as a binary op in lieu of a closing template angle bracket. If there's another more straightforward way of accomplishing this check, I'm open to that, but this seemed to be the most straightforward way at the time. |
Additionally; barring any other feedback, I'll need someone to land this change as I do not have commit access.
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
125 ↗ | (On Diff #338648) | I'm worried that this may regress template code. How does this account for cases where two consecutive >-s are really two closing template brackets, e.g., In particular, one added test case is ambiguous: >> could really be two closing template brackets: I have to say that my general feeling about trying to disambiguate between bitshifts and template closers is: don't try too hard inside clang-format as the heuristics are generally quite brittle and make the code harder to maintain; in cases where clang-format wrongly detects bitshift as templates, users should add parens around the bitshift, which IMO improves readability. |
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
125 ↗ | (On Diff #338648) | As this patch currently stands, it does not disambiguate between bitshift '>>' operators and 2 closing template brackets, so in your snippet, we would no longer insert a space between the '>' characters (despite arguably being the better formatting decision in this case). I agree with your feeling that user guided disambiguation between bitshift operators and template closing brackets via parens is the ideal solution and also improves readability, but IMO the approach taken by clang-format to format the '>' token should be conservative in that any change made should be non-semantic altering, which is not presently the case. While the case you mentioned would regress, we would no longer potentially alter program semantics. Thinking about this more, would it make sense to modify the actual white-space change generation later on in the analysis to not break up >> sequences of characters in lieu of annotating the tokens differently as the proposed patch is currently doing? |
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
125 ↗ | (On Diff #338648) | I tried and can't make this misinterpret two consecutive template > as a bit shift, IMO because this check is guarded by the Left->ParentBracket != tok::less condition. Both std::vector<std::decay_t<int&>> v; and test<test<a | b>> c; below are handled correctly. |
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
125 ↗ | (On Diff #338648) | I had come to the same conclusion when modifying the conditional here; namely the ParentBracket predicate is what catches the case you were alluding to earlier. |
Friendly reminder that I need someone to land this for me as I do not have commit access.
IMO you should use "test < a | b >> c;" as your test case here, to reassure the reader that it doesn't depend on the fact that ... 1; is visibly not a variable declaration.
Personally I'd also like to see "test<test<a | b>> c;" tested on the very next line, to show off the intended difference between the two. (Assuming that I understand the intent of this patch correctly.)
(I also switched to a bitwise operator just for the heck of it; that makes this expression just a very tiny bit less implausible — but still highly implausible, to the point where I question why we're special-casing it.)