Page MenuHomePhabricator

clang-format recognizes alternative binary operator keywords
ClosedPublic

Authored by bobmoretti on May 6 2014, 8:44 PM.

Details

Reviewers
djasper
Summary

clang-format does not lex the ISO 646 alternative operator keywords as operators; rather, it interprets them as normal identifiers.

This would result in code such as (a) and (b) getting reformatted as (a)and(b).

As a result of some discussion on cfe-dev, this patch changes the lexer option used by clang-format to interpret these keywords as operators and not identifiers.

Diff Detail

Event Timeline

bobmoretti updated this revision to Diff 9143.May 6 2014, 8:44 PM
bobmoretti retitled this revision from to clang-format recognizes alternative binary operator keywords.
bobmoretti updated this object.
bobmoretti edited the test plan for this revision. (Show Details)
bobmoretti set the repository for this revision to rL LLVM.
bobmoretti added a subscriber: Unknown Object (MLST).

NB: Phabricator choked on a diff with UTF8, so I replaced all non-ascii characters in the uploaded diff with spaces (which you will see if you expand the context enough).

nikola added a subscriber: nikola.May 6 2014, 9:17 PM

I don't know anything about this code but I'd really expect both LS_Cpp03 and LS_Cpp11 to mean C++. And this seems to be the case since LangOpts.CPlusPlus is always set to 1. I don't see the value in having the language kind parameter since LK_Cpp is used for all C family of languages.

My suggestion would be to just set CXXOperatorNames to 1, unless there are plans to handle C and C++ differently. But let's see what Manuel has to say.

I didn't add the LanguageKind parameter initially. However, without checking the LanguageKind parameter, a protocol buffers test fails.

Specifically, the and in the following test (FormatTestProto.cpp line 54):

verifyFormat("message SomeMessage {\n"
             "  optional really.really.long.and.qualified.type.aaaaaaa\n"
             "      fiiiiiiiiiiiiiiiiiiiiiiiiield = 1;\n"
             "  optional\n"
             "      really.really.long.and.qualified.type.aaaaaaa.aaaaaaaa\n"
             "          another_fiiiiiiiiiiiiiiiiiiiiield = 2;\n"
             "}");

The and gets misrepresented as an identifier, and the reformatter inserts a space before .qualified.

I don't know the first thing about protocol buffers. Is and a valid type identifier?

If it's not, I'd be happy to replace it in the test, and remove the LanguageKind parameter from getFormattingLangOpts.

nikola added a comment.May 6 2014, 9:55 PM

Theoretically yes but this is true for any C++ keyword as proto compiler turns message fields into data members with decorated names (appends underscore).

I'd say that this test is broken. If clang format is supposed to format proto files the same way it does C++ files than proto definitions can't use C++ tokens as field names. I'd say that this test needs to be modified. I think the same would happen if test declared message field like this 'optional int32 class'?

Yes, I think that would be a problem. However, this particular failing test case doesn't involve the field name; rather it involves the field's type. So I think a closer parallel would be

message class {
    optional int32 foo = 1;
}

message bar {
    optional class c = 1;
}

How would the proto compiler handle that?

nikola added a comment.EditedMay 6 2014, 10:11 PM

LOL, nope. Generates class class.

So to summarize, protbuf compiler is happy to generate invalid C++ code, and the author of proto definition should take care abot this. It just happens that 'and' is not always treated as a token the way 'class' is. I vote for changing the test but let's hear from Manuel first.

All right, so probably the proto test should not try to use a C++ reserved keyword for a type name. I will change that, and get rid of the LanguageKind parameter.

Getting a little off topic, I'm a little surprised that the same exact lexer is used for all of C, C++, ObjC, and javascript, but I am very naïve about such things. Was it an explicit design decision made previously in clang-format's history?

Oh, I missed your edit. I will wait for more feedback before changing.

Thanks for the info.

This is clang's lexer and as far as I know it was designed to handle all C languages from the start. Not sure how Javascript an Protobuf happened but I'm guessing this was added with clang-format.

djasper accepted this revision.May 7 2014, 8:31 AM
djasper added a reviewer: djasper.
djasper added a subscriber: djasper.

I'd be fine with recognizing these as keywords for any language (and changing the stupid proto test).

Otherwise looks good. Do you have commit access?

This revision is now accepted and ready to land.May 7 2014, 8:31 AM
bobmoretti updated this revision to Diff 9194.May 7 2014, 4:06 PM
bobmoretti edited edge metadata.

I updated the proto test to not include any c++ reserved keywords, and I changed the LangOpts returned by getFormattingLangOpts to set CXXOperatorNames to 1 unconditionally, as per review feedback

I do not have commit access.

nikola added a comment.May 7 2014, 5:13 PM

Done r208269. Thanks for working on this.

djasper closed this revision.May 22 2014, 2:56 AM