
Details

Reviewers

Saldivarcher
MyDeveloperDay
JakeMerdichAMD
krasimir

Commits

rG8fa56f7ededc: [clang-format] Prevent extraneous space insertion in bitshift operators

Summary

This serves to augment the improvements made in https://reviews.llvm.org/D86581. It prevents clang-format from interpreting bitshift operators as closing template brackets in certain circumstances. This is an attempt at fixing https://bugs.llvm.org/show_bug.cgi?id=49868.
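
To make the ambiguity concrete, here is a minimal sketch. It is not taken from the patch; the variable names and values are assumptions chosen purely for illustration. `test` is an ordinary variable, so `<` is a comparison and `>>` is a right shift, even though the token sequence resembles a template argument list.

```cpp
namespace shift_demo {
// Hypothetical values; only the spelling of the expression matters here.
constexpr int test = 1, a = 4, b = 8, c = 2;

// `test` is a variable, not a template, so this parses as
// (test < a) | (b >> c): shift binds tighter than `<`, which in turn
// binds tighter than `|`.
constexpr int shifted = test < a | b >> c;

// The same expression with explicit grouping, for comparison.
constexpr int grouped = (test < a) | (b >> c);

// If a formatter rewrote `>>` as `> >`, the line would no longer be a
// valid expression, because `>` has no unary form:
//   constexpr int broken = test < a | b > > c;  // does not compile
} // namespace shift_demo
```

Some compilers may warn about the unparenthesized comparison; the parenthesized form is the disambiguation reviewers recommend later in this thread.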

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

penagos requested review of this revision. Apr 19 2021, 11:17 AM

penagos created this revision.

Herald added a project: Restricted Project. Apr 19 2021, 11:17 AM

Herald added a subscriber: cfe-commits.

penagos edited the summary of this revision. Apr 19 2021, 11:21 AM

Quuxplusone added a subscriber: Quuxplusone. Apr 19 2021, 11:56 AM

Quuxplusone added inline comments.

clang/unittests/Format/FormatTest.cpp
Line 8010: IMO you should use `"test < a | b >> c;"` as your test case here, to reassure the reader that it doesn't depend on the fact that `... 1;` is visibly not a variable declaration. Personally I'd also like to see `"test<test<a | b>> c;"` tested on the very next line, to show off the intended difference between the two. (Assuming that I understand the intent of this patch correctly.) (I also switched to a bitwise operator just for the heck of it; that makes this expression just a very tiny bit less implausible, but still highly implausible, to the point where I question why we're special-casing it.)

Quuxplusone added inline comments. Apr 19 2021, 12:02 PM

clang/unittests/Format/FormatTest.cpp
Line 8010: Btw, a much-bigger-scope way to fix this would be to teach clang-format about "input encoding" versus "output encoding." The only time clang-format should ever be inserting space in the middle of `>>` is if it's translating C++11-encoded input into C++03-encoded output. If the input is known to already be C++03-encoded, then breaking up a `>>` token into a pair of `> >` tokens is guaranteed to introduce a bug. Right now, my impression is that clang-format has a concept of "output encoding" (i.e. "language mode") but has no way of knowing the "input encoding."

Harbormaster completed remote builds in B99518: Diff 338570. Apr 19 2021, 12:29 PM

penagos updated this revision to Diff 338641. Apr 19 2021, 2:20 PM

Harbormaster completed remote builds in B99565: Diff 338641. Apr 19 2021, 2:26 PM

penagos marked an inline comment as done. Apr 19 2021, 2:38 PM

penagos added inline comments.

clang/unittests/Format/FormatTest.cpp
Line 8010: Thanks for the feedback. Your 2 test suggestions make sense to me; I've updated the patch diff. I hadn't considered teaching clang-format input encoding, but that does sound like the preferable long-term solution. This patch is intended to be a lightweight fix for a very narrow use case.

Update Format test

Harbormaster completed remote builds in B99570: Diff 338648. Apr 19 2021, 3:46 PM

MyDeveloperDay accepted this revision. Apr 21 2021, 1:53 AM

MyDeveloperDay added inline comments.

clang/lib/Format/TokenAnnotator.cpp
Line 129: I don't really understand what we are saying here?

This revision is now accepted and ready to land. Apr 21 2021, 1:53 AM

penagos added inline comments. Apr 21 2021, 7:26 AM

clang/lib/Format/TokenAnnotator.cpp
Line 129: Effectively we are checking that, barring intervening whitespace, we are analyzing 2 consecutive '>' tokens. If so, we treat such a sequence as a binary op in lieu of a closing template angle bracket. If there's another more straightforward way of accomplishing this check, I'm open to that, but this seemed to be the most straightforward way at the time.

Additionally, barring any other feedback, I'll need someone to land this change as I do not have commit access.
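
The adjacency check described in the comment above can be sketched in isolation. This is an illustration only, not clang-format's code: `Tok` and `isAdjacentGreaterPair` are hypothetical names, and a plain integer offset stands in for the source location that the patch compares via `getStartOfNonWhitespace()` and `getLocWithOffset(-1)`.

```cpp
namespace adjacency_sketch {
// Hypothetical stand-in for a lexed token: its kind and the source offset
// of its first non-whitespace character. Not clang-format's FormatToken.
struct Tok {
  char kind;
  unsigned start;
};

// True when `cur` and `next` are both '>' and `next` begins exactly one
// character after `cur`, i.e. the input spelled them as ">>".
constexpr bool isAdjacentGreaterPair(Tok cur, Tok next) {
  return cur.kind == '>' && next.kind == '>' && cur.start + 1 == next.start;
}

// "b >> c": the two '>' sit at offsets 2 and 3.
static_assert(isAdjacentGreaterPair(Tok{'>', 2}, Tok{'>', 3}), "adjacent");
// "b > > c": the two '>' sit at offsets 2 and 4.
static_assert(!isAdjacentGreaterPair(Tok{'>', 2}, Tok{'>', 4}), "separated");
} // namespace adjacency_sketch
```

Comparing start offsets rather than token kinds alone is what lets the check distinguish `>>` as written from two `>` tokens that the author already separated.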

krasimir added inline comments. Apr 22 2021, 2:44 AM

clang/lib/Format/TokenAnnotator.cpp
Line 129: I'm worried that this may regress template code. How does this account for cases where two consecutive `>`-s are really two closing template brackets, e.g., `std::vector<std::decay_t<int& >> v;`? In particular, one added test case is ambiguous: `>>` could really be two closing template brackets: https://godbolt.org/z/v19hj9vKn

I have to say that my general feeling about trying to disambiguate between bitshifts and template closers is: don't try too hard inside clang-format as the heuristics are generally quite brittle and make the code harder to maintain; in cases where clang-format wrongly detects bitshift as templates, users should add parens around the bitshift, which IMO improves readability.

penagos added inline comments. Apr 22 2021, 1:47 PM

clang/lib/Format/TokenAnnotator.cpp
Line 129: As this patch currently stands, it does not disambiguate between bitshift '>>' operators and 2 closing template brackets, so in your snippet, we would no longer insert a space between the '>' characters (despite arguably being the better formatting decision in this case). I agree with your feeling that user-guided disambiguation between bitshift operators and template closing brackets via parens is the ideal solution and also improves readability, but IMO the approach taken by clang-format to format the '>' token should be conservative in that any change made should not alter semantics, which is not presently the case. While the case you mentioned would regress, we would no longer potentially alter program semantics.

Thinking about this more, would it make sense to modify the actual whitespace change generation later on in the analysis to not break up `>>` sequences of characters, in lieu of annotating the tokens differently as the proposed patch is currently doing?

krasimir added inline comments. Apr 26 2021, 2:33 AM

clang/lib/Format/TokenAnnotator.cpp
Line 129: I tried and can't make this misinterpret two consecutive template `>` as a bit shift, IMO because this check is guarded by the `Left->ParentBracket != tok::less` condition. Both `std::vector<std::decay_t<int&>> v;` and `test<test<a | b>> c;` below are handled correctly. I'm less worried about regressions in common template cases now. Thank you for pointing out altering program semantics, I agree. Please add a comment about this tradeoff and a bit of the reasoning behind it in code for future reference.
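
A simplified model of how the `ParentBracket` guard interacts with the adjacency check may help here. This is a sketch under stated assumptions: `Parent` and `bailOutAsShift` are hypothetical names, and the boolean parameters stand in for the real token queries in `parseAngle()`.

```cpp
namespace guard_sketch {
// Hypothetical simplification of the bracket that encloses the opening '<'.
enum class Parent { None, Paren, Less };

// Mirrors the shape of the patched condition: give up on treating the
// current '<' as a template opener (so ">>" stays a shift) only when that
// '<' is not itself nested inside another '<'.
constexpr bool bailOutAsShift(bool nextIsGreater, Parent parentOfLess,
                              bool lineStartsWithConditionKeyword,
                              bool greaterTokensAdjacent) {
  return nextIsGreater && parentOfLess != Parent::Less &&
         (lineStartsWithConditionKeyword || greaterTokensAdjacent);
}

// test < a | b >> c;    '<' has no '<' parent and ">>" is adjacent: shift.
static_assert(bailOutAsShift(true, Parent::None, false, true), "shift");
// test<test<a | b>> c;  the inner '<' is nested in '<': template closers.
static_assert(!bailOutAsShift(true, Parent::Less, false, true), "closers");
} // namespace guard_sketch
```

In this model the nested-template cases krasimir lists fall through to normal bracket matching because the inner `<` always has a `<` parent.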

Add justification comment for changes in parseAngle()

penagos added inline comments. Apr 26 2021, 2:47 PM

clang/lib/Format/TokenAnnotator.cpp
Line 129: I had come to the same conclusion when modifying the conditional here; namely the ParentBracket predicate is what catches the case you were alluding to earlier. I've added a brief comment to `parseAngle()` to document the need for the change, explaining the conservative nature of the change w.r.t. nested template cases; thank you for the suggestion.

Harbormaster completed remote builds in B101030: Diff 340656. Apr 26 2021, 4:09 PM

krasimir accepted this revision. Apr 27 2021, 1:26 AM

Friendly reminder that I need someone to land this for me as I do not have commit access.

@penagos, I'll submit this for you.

Closed by commit rG8fa56f7ededc: [clang-format] Prevent extraneous space insertion in bitshift operators (authored by penagos, committed by krasimir). May 4 2021, 3:29 AM

This revision was automatically updated to reflect the committed changes.

krasimir added a commit: rG8fa56f7ededc: [clang-format] Prevent extraneous space insertion in bitshift operators.

Backl1ght mentioned this in D140843: [clang-format] fix template closer followed by >. Jan 2 2023, 7:23 AM

Diff 342690

clang/lib/Format/TokenAnnotator.cpp

@@ bool parseAngle() {
[...]
     if (Style.Language == FormatStyle::LK_Java &&
         CurrentToken->is(tok::question))
       next();

     while (CurrentToken) {
       if (CurrentToken->is(tok::greater)) {
         // Try to do a better job at looking for ">>" within the condition of
-        // a statement.
+        // a statement. Conservatively insert spaces between consecutive ">"
+        // tokens to prevent splitting right bitshift operators and potentially
+        // altering program semantics. This check is overly conservative and
+        // will prevent spaces from being inserted in select nested template
+        // parameter cases, but should not alter program semantics.
         if (CurrentToken->Next && CurrentToken->Next->is(tok::greater) &&
             Left->ParentBracket != tok::less &&
-            isKeywordWithCondition(*Line.First))
+            (isKeywordWithCondition(*Line.First) ||
+             CurrentToken->getStartOfNonWhitespace() ==
+                 CurrentToken->Next->getStartOfNonWhitespace().getLocWithOffset(
+                     -1)))
           return false;
         Left->MatchingParen = CurrentToken;
         CurrentToken->MatchingParen = Left;
         // In TT_Proto, we must distignuish between:
         //   map<key, value>
         //   msg < item: data >
         //   msg: < item: data >
         // In TT_TextProto, map<key, value> does not occur.
[...]

clang/unittests/Format/FormatTest.cpp


@@ TEST_F(FormatTest, UnderstandsTemplateParameters) {
[...]
   verifyFormat("bool b = a<1> >= 1;");
   verifyFormat("int i = a<1> >> 1;");
   FormatStyle Style = getLLVMStyle();
   Style.SpaceBeforeAssignmentOperators = false;
   verifyFormat("bool b= a<1> == 1;", Style);
   verifyFormat("a<int> = 1;", Style);
   verifyFormat("a<int> >>= 1;", Style);

+  verifyFormat("test < a | b >> c;");
+  verifyFormat("test<test<a | b>> c;");
   verifyFormat("test >> a >> b;");
   verifyFormat("test << a >> b;");

   verifyFormat("f<int>();");
   verifyFormat("template <typename T> void f() {}");
   verifyFormat("struct A<std::enable_if<sizeof(T2) < sizeof(int32)>::type>;");
   verifyFormat("struct A<std::enable_if<sizeof(T2) ? sizeof(int32) : "
                "sizeof(char)>::type>;");
[...]

This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Prevent extraneous space insertion in bitshift operators
ClosedPublic
