This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Improve detection of Objective-C block types
ClosedPublic

Authored by benhamilton on Feb 28 2018, 1:44 PM.

Details

Summary

Previously, clang-format would detect the following as an
Objective-C block type:

FOO(^);

when it actually must be a C or C++ macro dealing with an XOR
statement or an XOR operator overload.

According to the Clang Block Language Spec:

https://clang.llvm.org/docs/BlockLanguageSpec.html

block types are of the form:

int (^)(char, float)

and block variables of block type are of the form:

void (^blockReturningVoidWithVoidArgument)(void);
int (^blockReturningIntWithIntAndCharArguments)(int, char);
void (^arrayOfTenBlocksReturningVoidWithIntArgument[10])(int);

This tightens up the detection so we don't unnecessarily detect
C macros which pass in the XOR operator.

Depends On D43904

Test Plan: New tests added. Ran tests with:

make -j12 FormatTests &&
./tools/clang/unittests/Format/FormatTests

Diff Detail

Repository
rC Clang

Event Timeline

benhamilton created this revision.Feb 28 2018, 1:44 PM

Would it be enough to only add the block type case? With the macro false positive, there won't be an open paren after the closing paren, right?

lib/Format/TokenAnnotator.cpp
153

This seems suspect. Does it have to be a numeric_constant?

unittests/Format/FormatTest.cpp
12133

I am somewhat skeptical about all these tests now being solely about guessLanguage. It might be the best choice for some of them, but also, there might be other things we do to detect ObjC at some point and then all of these tests become meaningless.

Does your change create a different formatting here?

krasimir added inline comments.Mar 5 2018, 10:53 AM
unittests/Format/FormatTest.cpp
12131

Please also add formatting tests. This might require changes to the formatting logic since I'd intuitively expect `int(^)(char) to be formatted as int (^)(char)`.

benhamilton marked 2 inline comments as done.

Would it be enough to only add the block type case? With the macro false positive, there won't be an open paren after the closing paren, right?

Are you asking to remove the block variable cases? I was concerned those would no longer be handled correctly if we don't explicitly check for them.

What change are you suggesting if we remove the block variable cases to handle those?

lib/Format/TokenAnnotator.cpp
153

Probably not, any constexpr would do, I suspect. What's the best way to parse that?

unittests/Format/FormatTest.cpp
12131

Added formatting tests. Tests passed with no changes to the formatting logic, and your intuition was correct.

12133

I hear you. I just want to make sure the false positive case doesn't regress, and we don't lose the functionality of detecting ObjC for block types.

Added formatting tests.

krasimir added inline comments.Mar 6 2018, 5:14 AM
unittests/Format/FormatTest.cpp
7692

This should be in FormatTestObjC.cpp.

djasper added inline comments.Mar 6 2018, 5:37 AM
lib/Format/TokenAnnotator.cpp
153

I think this is the same answer for both of your questions. If what you are trying to prevent "FOO(^)" to be parsed as a block, wouldn't it be enough to look for whether there is a "(" after the ")" or even only after "(^)", everything else is already correct IIUC? That would get you out of need to parse the specifics here, which will be hard.

Or thinking about it another way. Previously, every "(^" would be parsed as an ObjC block. There seems to be only a really rare corner case in which it isn't (macros). Thus, I'd just try to detect that corner case. Instead you are completely inverting the defaults (defaulting to "^" is not a block) and then try to exactly parse ObjC where there might be many cases and edge cases that you won't even think of now.

benhamilton added inline comments.Mar 6 2018, 8:01 AM
lib/Format/TokenAnnotator.cpp
153

Hmm. Well, it's not just FOO(^); that isn't a block:

#define FOO(X) operator X

SomeType FOO(^)(int x, const SomeType& y) { ... }

Obviously we can't get this perfect without a pre-processor, but it seems like our best bet is to only assign mark TT_ObjCBlockLParen when we are sure the syntax is a valid block type or block variable.

benhamilton added inline comments.Mar 6 2018, 8:26 AM
lib/Format/TokenAnnotator.cpp
153

I tried the suggestion to only treat (^)( as a block type, but it appears this is the primary place where we set TT_ObjCBlockLParen, so I think we really do need to handle the other cases here.

djasper added inline comments.Mar 6 2018, 11:34 AM
lib/Format/TokenAnnotator.cpp
153

I don't follow your logic. I'd like you to slowly change this as opposed to completely going the opposite way.

So currently, the only know real-live problem is "FOO(^);". So address this somehow, but still default/error to recognizing too much stuff as a block.

Have you actually seen

SomeType FOO(^)(int x, const SomeType& y) { ... }

in real code?

benhamilton added inline comments.Mar 6 2018, 12:13 PM
lib/Format/TokenAnnotator.cpp
153

I haven't seen that example, but I have seen FOO(^, OtherParam);.

I'm trying to change the parser to handle this now without explicitly parsing all block types, but it's tricky. The following should be ObjC block types:

(^)(int, char);
(^foo)(int, char);
(^foo[10])(int, char);
(^foo[kNum])(int, char);
(^foo[(kNum + kOtherNum)])(int, char);

but the following should not:

FOO(^);
FOO(^, int, char);

I'll make it work, it's just easy to make a mistake.

Right. So the difference is that for blocks, there is always a "(" after the "(^.....)". That should be easy to parse, no?

  • Simplify logic and assume we have an ObjC block whenever we see rparen at the end of (^...)(.

Right. So the difference is that for blocks, there is always a "(" after the "(^.....)". That should be easy to parse, no?

Yep, I've updated this diff to implement that as you recommended. I just had to handle nested parens (I couldn't find an existing way to deal with these so I wrote my own—let me know if I missed something.)

Fix comment.

djasper added inline comments.Mar 7 2018, 4:46 AM
lib/Format/TokenAnnotator.cpp
150

No. Don't implement yet another parenthesis counting. This function already iterates over all tokens until the closing paren. Just store a pointer to the caret here and mark it (or unmark it) when encountering the closing brace (line 271). There is already very similar logic there to set TT_FunctionTypeLParen (which is actually doing the exact same parsing now that I think of it).

  • Greatly clean up ObjC block type logic by re-using TT_FunctionTypeLParen type logic.
benhamilton added inline comments.Mar 7 2018, 7:54 AM
lib/Format/TokenAnnotator.cpp
150

Oh! Yes, re-using the logic for TT_FunctionTypeLParen is clearly the right thing to do, since it has exactly the same syntax (just * or & instead of ^).

Done, code is much cleaner now.

benhamilton marked an inline comment as done.Mar 7 2018, 7:54 AM
djasper accepted this revision.Mar 8 2018, 3:58 PM

Nice.. Removed a lot of complexity :). Let's see whether this heuristic is good enough.

lib/Format/TokenAnnotator.cpp
218–219

I'd suggest to put a comment here saying that this is for both ObjC blocks and Function types, because they look very similar in nature (maybe giving examples) and then not actually rename the variables. To me, the long names make the code harder to read.

But if you feel strongly the other way, I'd be ok with it.

This revision is now accepted and ready to land.Mar 8 2018, 3:58 PM
  • Restore short functionn type variable names and add clarifying comment.
benhamilton marked an inline comment as done.Mar 12 2018, 8:44 AM
benhamilton added inline comments.
lib/Format/TokenAnnotator.cpp
218–219

Restored old names and added comment.

This revision was automatically updated to reflect the committed changes.
benhamilton marked an inline comment as done.