This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Add a space between an overloaded operator and '>'
ClosedPublic

Authored by owenpan on Feb 10 2023, 9:20 AM.

Details

Summary

The token annotator doesn't annotate the template opener and closer as such if they enclose an overloaded operator. This causes the space between the operator and the closer to be removed, resulting
in invalid C++ code.

Fixes https://github.com/llvm/llvm-project/issues/58602.

Diff Detail

Event Timeline

owenpan created this revision.Feb 10 2023, 9:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2023, 9:20 AM
owenpan requested review of this revision.Feb 10 2023, 9:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2023, 9:20 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
vedgy added a subscriber: vedgy.Feb 10 2023, 9:42 AM

Hi @owenpan. Thank you for fixing this bug!
Have you noticed this paragraph in my bug report?

I believe clang_getTypeSpelling(), or more likely QualType::print() used by it, should insert a tab character between such tokens to pretty-print compilable code. The tab character is preferable to the space character here, because the users may rely on the fact that pretty-printed binary operators are surrounded by spaces to distinguish them from angle brackets.

KDevelop parses the result of clang_getTypeSpelling() when libclang API is lacking. Since this recent commit KDevelop's parsing relies on the empirical fact that only operators are surrounded by spaces to distinguish them from angle brackets. Does this revision introduce angle brackets surrounded by spaces? Can tab characters be used instead? If not, do you know how else such angle brackets can be distinguished from operators?

owenpan planned changes to this revision.Feb 10 2023, 10:03 PM
owenpan updated this revision to Diff 496670.Feb 11 2023, 1:43 AM

Added tryMergeGreaterGreater() to and fixed tryMergerLessLess() of FormatTokenLexer.

Hi @owenpan. Thank you for fixing this bug!
Have you noticed this paragraph in my bug report?

I believe clang_getTypeSpelling(), or more likely QualType::print() used by it, should insert a tab character between such tokens to pretty-print compilable code. The tab character is preferable to the space character here, because the users may rely on the fact that pretty-printed binary operators are surrounded by spaces to distinguish them from angle brackets.

Yes, but I'm not familiar with these functions.

KDevelop parses the result of clang_getTypeSpelling() when libclang API is lacking. Since this recent commit KDevelop's parsing relies on the empirical fact that only operators are surrounded by spaces to distinguish them from angle brackets. Does this revision introduce angle brackets surrounded by spaces? Can tab characters be used instead? If not, do you know how else such angle brackets can be distinguished from operators?

This patch should fix the reported invalid-code-generation bug by leaving a space between the overloaded operator and the closing angle bracket. clang-format doesn't insert tabs in the middle of a line (except when doing alignment if UseTab is set to Always).

This revision is now accepted and ready to land.Feb 14 2023, 3:03 AM
This revision was landed with ongoing or failed builds.Feb 16 2023, 8:25 PM
This revision was automatically updated to reflect the committed changes.

Hi @owenpan, this seems to be crashing for:

struct Foo { operator enum foo{} };

with stack trace:

$ ~/repos/llvm/build/bin/clang-format format_crash.cc --dry-run
clang-format: /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnnotator.cpp:1229: bool clang::format::(anonymous namespace)::AnnotatingParser::consumeToken(): Assertion `CurrentToken' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /usr/local/google/home/kadircet/repos/llvm/build/bin/clang-format format_crash.cc --dry-run
 #0 0x0000000000397687 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/Unix/Signals.inc:567:13
 #1 0x000000000039583e llvm::sys::RunSignalHandlers() /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/Signals.cpp:105:18
 #2 0x0000000000397faa SignalHandler(int) /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/Unix/Signals.inc:412:1
 #3 0x00007f49d145af90 (/lib/x86_64-linux-gnu/libc.so.6+0x3bf90)
 #4 0x00007f49d14a9ccc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007f49d145aef2 raise ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007f49d1445472 abort ./stdlib/abort.c:81:7
 #7 0x00007f49d1445395 _nl_load_domain ./intl/loadmsgcat.c:1177:9
 #8 0x00007f49d1453df2 (/lib/x86_64-linux-gnu/libc.so.6+0x34df2)
 #9 0x000000000043a7c1 parseBrace /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnnotator.cpp:867:9
#10 0x000000000043a7c1 clang::format::(anonymous namespace)::AnnotatingParser::consumeToken() /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnnotator.cpp:1170:12
#11 0x000000000042e1f8 clang::format::(anonymous namespace)::AnnotatingParser::parseLine() /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnnotator.cpp:1558:11
#12 0x000000000042dc37 clang::format::TokenAnnotator::annotate(clang::format::AnnotatedLine&) /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnnotator.cpp:2837:13
#13 0x000000000041b910 clang::format::TokenAnalyzer::process(bool) /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnalyzer.cpp:0:19
#14 0x00000000003d0576 ~TokenAnalyzer /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnalyzer.h:88:7
#15 0x00000000003d0576 operator() /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/Format.cpp:3492:9
#16 0x00000000003d0576 __invoke_impl<std::pair<clang::tooling::Replacements, unsigned int>, (lambda at /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/Format.cpp:3491:27) &, const clang::format::Environment &> /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:61:14
#17 0x00000000003d0576 __invoke_r<std::pair<clang::tooling::Replacements, unsigned int>, (lambda at /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/Format.cpp:3491:27) &, const clang::format::Environment &> /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:114:9
#18 0x00000000003d0576 std::_Function_handler<std::pair<clang::tooling::Replacements, unsigned int> (clang::format::Environment const&), clang::format::internal::reformat(clang::format::FormatStyle const&, llvm::StringRef, llvm::ArrayRef<clang::tooling::Range>, unsigned int, unsigned int, unsigned int, llvm::StringRef, clang::format::FormattingAttemptStatus*)::$_8>::_M_invoke(std::_Any_data const&, clang::format::Environment const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_function.h:290:9
#19 0x00000000003b8a18 _M_is_engaged /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/optional:471:58
#20 0x00000000003b8a18 operator bool /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/optional:985:22
#21 0x00000000003b8a18 clang::format::internal::reformat(clang::format::FormatStyle const&, llvm::StringRef, llvm::ArrayRef<clang::tooling::Range>, unsigned int, unsigned int, unsigned int, llvm::StringRef, clang::format::FormattingAttemptStatus*) /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/Format.cpp:3533:9
#22 0x00000000003b9c73 _Rb_tree_impl /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_tree.h:687:12
#23 0x00000000003b9c73 _Rb_tree /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_tree.h:954:7
#24 0x00000000003b9c73 set /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_set.h:231:7
#25 0x00000000003b9c73 Replacements /usr/local/google/home/kadircet/repos/llvm/clang/include/clang/Tooling/Core/Replacement.h:212:7
#26 0x00000000003b9c73 clang::format::reformat(clang::format::FormatStyle const&, llvm::StringRef, llvm::ArrayRef<clang::tooling::Range>, llvm::StringRef, clang::format::FormattingAttemptStatus*) /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/Format.cpp:3557:10
#27 0x000000000033da7f clang::format::format(llvm::StringRef) /usr/local/google/home/kadircet/repos/llvm/clang/tools/clang-format/ClangFormat.cpp:0:7
#28 0x000000000033c0a5 main /usr/local/google/home/kadircet/repos/llvm/clang/tools/clang-format/ClangFormat.cpp:631:14
#29 0x00007f49d144618a __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#30 0x00007f49d1446245 call_init ./csu/../csu/libc-start.c:128:20
#31 0x00007f49d1446245 __libc_start_main ./csu/../csu/libc-start.c:368:5
#32 0x000000000033ac71 _start (/usr/local/google/home/kadircet/repos/llvm/build/bin/clang-format+0x33ac71)
Aborted

@kadircet thanks for reporting the crash and reverting the commit. I will fix it and reland the patch.

clang/lib/Format/TokenAnnotator.cpp
1229

After next() or consumeToken(), CurrentToken would advance and could become null.