This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] [PR36294] AlwaysBreakAfterReturnType works incorrectly for some operator functions
ClosedPublic

Authored by MyDeveloperDay on Oct 29 2019, 10:46 AM.

Details

Summary

https://bugs.llvm.org/show_bug.cgi?id=36294

Addressing bug related to returning after return type not being honoured for some operator types.

$ bin/clang-format --style="{BasedOnStyle: llvm, AlwaysBreakAfterReturnType: TopLevelDefinitions}" /tmp/foo.cpp
class Foo {
public:
  bool operator!() const;
  bool operator<(Foo const &) const;
  bool operator*() const;
  bool operator->() const;
  bool operator+() const;
  bool operator-() const;
  bool f() const;
};

bool Foo::operator!() const { return true; }
bool
Foo::operator<(Foo const &) const {
  return true;
}
bool Foo::operator*() const { return true; }
bool Foo::operator->() const { return true; }
bool
Foo::operator+() const {
  return true;
}
bool
Foo::operator-() const {
  return true;
}
bool
Foo::f() const {
  return true;
}

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptOct 29 2019, 10:46 AM
sammccall added inline comments.Oct 29 2019, 11:32 AM
clang/lib/Format/TokenAnnotator.cpp
1347–1349

This seems clearer as is semi || (is exclaim && !previous is operator), as operator; isn't relevant here.
&& !Current.is(TT_OverloadedOperator) would be even better of course

2092

This seems like it's supposed to be handled by the token annotator, and the same cause will result in bugs in other places - why aren't these tokens being marked as TT_OverloadedOperator?

I'd guess the deeper fix is in the kw_operator case in consumeToken(). The way it's written looks suspect, though I'm not an expert on any of this.

Trying to address review comments

MyDeveloperDay marked 4 inline comments as done.Oct 29 2019, 1:52 PM
MyDeveloperDay added inline comments.
clang/lib/Format/TokenAnnotator.cpp
1347–1349

rearranged the logic as suggested, unfortunately we can't use TT_LoverloadOperator at this point because its not yet been annotated (so ! has TT_Unknown still)

2092

The * from *() is already labelled as a TT_PointerOrRefernce so can't also be marked as an Overloaded operator so use that in the loop.

MyDeveloperDay marked 2 inline comments as done.Oct 30 2019, 4:02 AM
sammccall added inline comments.Oct 30 2019, 5:59 AM
clang/lib/Format/TokenAnnotator.cpp
2092

Right, the fact that it's incorrectly labeled as a PointerOrReference is a bug, is it not possible to fix that instead of working around it here?

In the consumeToken loop, it seems the * in operator * is labeled as PointerOrReference by logic that's trying to handle e.g. StringRef::operator char*(). However when the * is _immediately_ preceded by operator, it's always an overloaded operator name rather than a pointer, I think?

MyDeveloperDay marked 2 inline comments as done.

Extend this revision to cover additional https://bugs.llvm.org/show_bug.cgi?id=43783 issue (which has overlap)

New revision correctly formats the following code:

constexpr auto
operator*() const -> reference {
  bfexpects_when_compliant(m_i < m_t->size());
  return m_t->get()[m_i];
}

constexpr auto
operator->() const -> pointer {
  bfexpects_when_compliant(m_i < m_t->size());
  return &m_t->get()[m_i];
}

constexpr auto
operator[](index_type n) const -> reference {
  bfexpects_when_compliant(m_i < m_t->size());
  return m_t->get()[m_i];
}

constexpr auto
operator++() -> lra_iterator & {
  ++m_i;
  return *this;
}

constexpr auto
operator()() const -> reference {}

constexpr auto
operator++() const -> reference {}

constexpr auto
operator--() const -> reference {}

constexpr auto
operator*() const -> reference {}

constexpr auto
operator->() const -> reference {}

constexpr auto
operator>>() const -> reference {}

constexpr auto
operator<<() const -> reference {}

constexpr auto
operator[]() const -> reference {}

constexpr auto
operator void*() const -> reference {}

constexpr auto
operator char*() const -> reference {}

constexpr auto
operator Foo*() const -> reference {}

constexpr auto
operator!() const -> reference {}

class Foo {
public:
  bool operator!() const;
  bool operator<(Foo const &) const;
  bool operator*() const;
  bool operator->() const;
  bool operator+() const;
  bool operator-() const;
  bool f() const;
};

bool
Foo::operator!() const {
  return true;
}
bool
Foo::operator<(Foo const &) const {
  return true;
}
bool
Foo::operator*() const {
  return true;
}
bool
Foo::operator->() const {
  return true;
}
bool
Foo::operator+() const {
  return true;
}
bool
Foo::operator-() const {
  return true;
}
bool
Foo::f() const {
  return true;
}
sammccall added inline comments.Oct 30 2019, 7:05 AM
clang/lib/Format/TokenAnnotator.cpp
955

I'm confused about this: ISTM that where we were previously always treating star as a pointer, now we're always treating it as an operator name.
Whereas it's sometimes an operator name (immediately after the operator keyword) and sometimes a pointer (following a type name).

Maybe we should shift the OverloadedOperator labelling outside the loop (since AFAICT it should only apply to the first token) and then keep the loop to mark stars/amps elsewhere in the operator name as PointerOrReference?

2617

why?

clang/unittests/Format/FormatTest.cpp
6962

this looks like a regression at first glance, in LLVM style there's a space between type and *

MyDeveloperDay marked 2 inline comments as done.Oct 30 2019, 7:19 AM
MyDeveloperDay added inline comments.
clang/lib/Format/TokenAnnotator.cpp
955

Let me take another look..

2092

Treating * and -> as an TT_OverloadedOperator is probably ok, although this breaks unit test I highlight below, which I almost think is incorrect anyway.

I'm looking into a related issue https://bugs.llvm.org/show_bug.cgi?id=43783, which is changing this code again, I may roll both changes into the same fix

However this is made much more complex to deal with it just by using the TT_OverloadedOperator when the operator consists of 2 tokens say like [ and ] unless I combine them into a single token.

operator void*()
operator char*()
operator []()

hence the need for code to handle in this loop for handling.

operator new[]()
operator delete[]()
clang/unittests/Format/FormatTest.cpp
6962

Treating * as TT_OverloadedOperator will break this test, which perhaps is already wrong!

6962

yes, I was a little concerned about this change too, it's hard to know if operator void * was intended as it falls off the bottom of the spacesBetween function with a return true; but if you are concerned then perhaps its better to change it back.

Remove test regression

MyDeveloperDay marked 4 inline comments as done.Oct 30 2019, 8:20 AM
MyDeveloperDay added inline comments.
clang/lib/Format/TokenAnnotator.cpp
2617

LLVM style and google style is different based on pointer alignment

clang/unittests/Format/FormatTest.cpp
6962

reverted

MyDeveloperDay marked an inline comment as done.Oct 30 2019, 8:20 AM
MyDeveloperDay marked 2 inline comments as done.Oct 30 2019, 9:07 AM
MyDeveloperDay added inline comments.
clang/lib/Format/TokenAnnotator.cpp
955

@sammccall In trying to understand what you were suggesting I tried the following:

bool
Foo::operator*(void *) const {
  return true;
}

The second * will be as seen correctly below a PointerOrReference by the nature that we already hit the OverloadedOperatorLParen as the loop only goes until it see a ( ) or ;

I'm trying to think of a case where that perhaps wouldn't be the case? such that we might get 2 OverloadedOperators

M=0 C=0 T=Unknown S=1 B=0 BK=0 P=0 Name=bool L=4 PPK=2 FakeLParens= FakeRParens=0 II=0x1cea7c36c40 Text='bool'
 M=1 C=1 T=FunctionDeclarationName S=1 B=0 BK=0 P=80 Name=identifier L=84 PPK=2 FakeLParens= FakeRParens=0 II=0x1cea7c42298 Text='Foo'
 M=0 C=0 T=Unknown S=0 B=0 BK=0 P=23 Name=coloncolon L=86 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='::'
 M=0 C=1 T=Unknown S=0 B=0 BK=0 P=520 Name=operator L=94 PPK=2 FakeLParens= FakeRParens=0 II=0x1cea7c36ed8 Text='operator'
 M=0 C=0 T=OverloadedOperator S=0 B=0 BK=0 P=34 Name=star L=95 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='*'
 M=0 C=0 T=OverloadedOperatorLParen S=0 B=0 BK=0 P=34 Name=l_paren L=96 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
 M=0 C=1 T=Unknown S=0 B=0 BK=0 P=140 Name=void L=100 PPK=2 FakeLParens= FakeRParens=0 II=0x1cea7c368e0 Text='void'
 M=0 C=1 T=PointerOrReference S=1 B=0 BK=0 P=230 Name=star L=102 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='*'
 M=0 C=0 T=Unknown S=0 B=0 BK=0 P=54 Name=r_paren L=103 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
 M=0 C=1 T=TrailingAnnotation S=1 B=0 BK=0 P=170 Name=const L=109 PPK=2 FakeLParens= FakeRParens=0 II=0x1cea7c363e8 Text='const'
 M=0 C=0 T=FunctionLBrace S=1 B=0 BK=1 P=23 Name=l_brace L=111 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='{'
2617

I believe that now I'm no longer marking that * as a TT_PointerOrReference, one of the other spaceRequiredBetween rules is missing for this case and now we need to consider the pointer alignment.

Without adding this rule to handle this case one of the 2 items from the same UnderstandsOverloadedOperators test fail

verifyFormat("operator void *();");
...
verifyGoogleFormat("operator void*();");

its possible I could clarify this further make ensuring the star is also an overloaded operator e.g.

if (Right.is(tok::star)  && Right.is(TT_OverloadedOperator)
sammccall added inline comments.Oct 31 2019, 11:16 AM
clang/lib/Format/TokenAnnotator.cpp
955

I mean something like a conversion operator to pointer: bool Foo::operator void *();
The star appears before the operator arg list, but it's still a pointer.

So the testcases I'd try here would be:

Foo::operator*(); // should be OverloadedOperator
Foo::operator void *(); // should be PointerOrReference
Foo::operator()(void *); // should be PointerOrReference
Foo::operator*(void *); // should be OverloadedOperator, PointerOrReference
operator*(int(*)(), class Foo); // OverloadedOperator, PointerOrReference (this one is silly and doesn't need to work)
2107

I'm suspicious of code that handles star but not amp, though maybe I shouldn't be.

MyDeveloperDay set the repository for this revision to rG LLVM Github Monorepo.

Address review comments

MyDeveloperDay marked 5 inline comments as done.Nov 1 2019, 1:12 AM
MyDeveloperDay added inline comments.
clang/lib/Format/TokenAnnotator.cpp
2107

added those and mutated your test cases to cover that

Build result: pass - 59827 tests passed, 0 failed and 762 were skipped.
Log files: console-log.txt, CMakeCache.txt

Looking at the unit tests, it appears that you have successfully fixed the bug

MyDeveloperDay marked an inline comment as done.Nov 8 2019, 1:14 PM

gentle ping!

sammccall accepted this revision.Nov 11 2019, 3:39 AM

Thanks, looks great!
Sorry for the delay, I was out last week.

This revision is now accepted and ready to land.Nov 11 2019, 3:39 AM
This revision was automatically updated to reflect the committed changes.

@MyDeveloperDay @hans what about adding this to the release notes?
I was trying clang-format 10 on Firefox code base and I noticed this change which isn't documented in
https://prereleases.llvm.org/10.0.0/rc1/tools/clang/docs/ReleaseNotes.html#clang-format
(I can do it if you want)

hans added a comment.Mar 2 2020, 2:22 AM

@MyDeveloperDay @hans what about adding this to the release notes?
I was trying clang-format 10 on Firefox code base and I noticed this change which isn't documented in
https://prereleases.llvm.org/10.0.0/rc1/tools/clang/docs/ReleaseNotes.html#clang-format
(I can do it if you want)

Release notes are very welcome :-)

FYI, in Mozilla build of clang-format, we reverted this change.
It was causing more issues than fixes.
https://bugzilla.mozilla.org/show_bug.cgi?id=1629853

MyDeveloperDay added a comment.EditedApr 26 2020, 4:48 AM

Let me look into this a little, I'd prefer we fixed the corner cases than reverted the lot (https://bugs.llvm.org/show_bug.cgi?id=45357)

see D78879: [clang-format] [PR45357] Fix issue found with operator spacing for possible fix