This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Improve Incomplete detection for (text) protos
ClosedPublic

Authored by krasimir on Mar 7 2018, 5:16 AM.

Details

Summary

This patch improves detection of incomplete code for protos and text protos.
This is especially important for text protos in raw string literals, since they
might be partial strings concatenated, and we'd like to disable formatting in
these cases.

Diff Detail

Repository
rC Clang

Event Timeline

krasimir created this revision.Mar 7 2018, 5:16 AM

I'm not familiar with clang-format internals, sorry if questions are dumb.

My first inclination is that doing something reasonable for incomplete inputs makes sense, but it doesn't seem to be worth spending complexity on things that are actually intended to not be protos - people should just not use R"pb in this case.

I recall you had a reason for wanting to try this out anyway though, but I can't remember what it is. (Can you add it to the description?)

lib/Format/TokenAnnotator.cpp
716

I don't really understand what this is doing.
It seems to be detecting some particular sequence of invalid tokens }:,]:,>:. Why are these special?

What's the contract of consumeToken()? It's hard to see from the callsites.

822–823

In what sense of "line" is this true?

krasimir updated this revision to Diff 141121.Apr 5 2018, 3:28 AM
krasimir marked 2 inline comments as done.
  • Address review comments
lib/Format/TokenAnnotator.cpp
716

Thank you very much! I guess I got carried away. These don't make sense. I'll update it to only keep dangling : as in key:.

Returning false from consumeToken has the desired effect that (eventually) marks the line as incomplete here:
https://github.com/llvm-mirror/clang/blob/master/lib/Format/TokenAnnotator.cpp#L1055

822–823

This is the UnwrappedLine concept of clang-format: roughly a maximal sequence of tokens that we could have put on the same line if there was no column limit.

sammccall accepted this revision.Apr 11 2018, 2:21 AM
sammccall added inline comments.
lib/Format/TokenAnnotator.cpp
822–823

I still don't understand why it's true for say C or JS then.
(kinda relevant here because I would like to understand why proto is an exception)

unittests/Format/FormatTestProto.cpp
21

DoNotCheck is never used, consider removing it

21

(nit: is the SC_ prefix really needed in tests?)

This revision is now accepted and ready to land.Apr 11 2018, 2:21 AM
krasimir updated this revision to Diff 142394.Apr 13 2018, 6:38 AM
krasimir marked an inline comment as done.
  • Address review comments
This revision was automatically updated to reflect the committed changes.
krasimir added inline comments.Apr 13 2018, 6:41 AM
lib/Format/TokenAnnotator.cpp
822–823

In C++ stuff like global functions or namespaces result in multiple unwrapped lines, like:

line 1: namespace a {
line 2: int x = 3;
line 3: }

In text protos, the top-level essentially can only contain key-value pairs, where the value can be a submessage, which is like an init list, which is put on a single unwrapped line, like:

line 1: key: { item: data }

I don't know too much more about why this is the case.

unittests/Format/FormatTestProto.cpp
21