This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] [NFC] isCpp() is inconsistently used to mean both C++ and Objective C, add language specific isXXX() functions
AbandonedPublic

Authored by MyDeveloperDay on May 17 2020, 4:52 AM.

Details

Summary

This revision puts the Style.Lanauge==LK_???? behind convenience functions to begin to avoid potential issues regarding the use of Style.isCpp()

It's unclear in a number of places what people really mean when using isCpp() as such I have introduced a isCppOnly() to help identify those places where people where not checking both languages

Ideally I'd make isCpp() mean only C/C++ and use isCpp() || isObjectiveC() but so many of those places overlap I can see why we'd have one function, but perhaps a better name for isCpp() is isCStyleLanguages() then let isCppOnly() be renamed to isCpp() but I'm not comfortable doing that in one go.

Regardless I feel introducing isObjectiveC(),isJava(),JavaScript(),isProtoBuf(),isTextProtoBuf and isTableGen() helps the code read a little better

Diff Detail

Event Timeline

MyDeveloperDay created this revision.May 17 2020, 4:52 AM

Missed some cases

Remove erroneous changes

sammccall added inline comments.May 17 2020, 11:59 PM
clang/include/clang/Format/Format.h
1632

Just my 2c - I find the current meaning of isCpp easier to understand, and would prefer isObjectiveC to mean objective-C/C++. h if it exists.

Reasons:

  • this is consistent with LangOptions::CPlusPlus and LangOptions::ObjC
  • when checking for C++, also applying these rules to ObjC++ should be the common/default case, and excluding ObjC++ the special case that justifies more precise syntax (and honestly, I'd find isCpp && !isObjC to carry the clearest intent in that case). IOW, this seems like it will attract bugs.

perhaps a better name for isCpp() is isCStyleLanguages()

Clang uses the term "C family languages", and this includes C, C++, ObjC, ObjC++.
If you really want to avoid the conflict between C++ the boolean language option and C++ the precise language mode, I'd suggest isPlusPlus() and isObjective(). But I think consistency with LangOptions is worth more.

1635

These functions that don't *even in principle* do more than compare to an enum seem like extra indirection that hurts understanding of the code (have to look up what isObjectiveC() does, or have subtle bugs).

I suspect isCSharp() was added due to a misunderstanding of what isCpp() was for.

  • I personally find the current form (Style.Language == LK_XXX) very verbose and difficult to read, especially when its in the middle of a large clause.
  • Its clear to me that isCpp() is not understood, its name doesn't quite suggest what it does and as such there is code that effectively does if (isCpp() || isObjectiveC()) (which you can't easily see because of the verbose form)
  • I added isCSharp() (I knew what isCpp() did) but I added it so it was very clear where the C# handling code was, I didn't want to proliferate the Style.Language == LK_CSharp , and also its easier to search for all !Style.isCSharp() and Style.isCSharp() in one go rather than having to handle the !=/== cases

I think functions need to be named to say what they do, and isCpp() currently doesn't and it has/does already cause mistakes. I don't mind what its called I'm happy to change names, but I feel there is more clarity in the isXXX() calls.

This has been on of those refactorings I've wanted to do over the last year as I've been working on clang-format, I was expecting it to be controversial: ;-)

My 2p worth.

MyDeveloperDay marked 2 inline comments as done.May 18 2020, 1:55 AM
MyDeveloperDay added inline comments.
clang/lib/Format/TokenAnnotator.cpp
3443

Here is an example of where the language is checked to be both LK_cpp and LK_ObjC when just one isCpp() would have done.

3645

Again here, why not isCpp() here?

I like these changes.
I have mixed feelings about isCpp() & co.
As @MyDeveloperDay said, I'd like it mean C++ only. I find it confusing that it means C++ or ObjC (even if the latter is a superset of the former). I'd rather see it spelt as isCppOrObjC() even if it's verbose but at least removes all confusion IMO.

clang/include/clang/Format/Format.h
1632

I'd rather go for coherence with LanguageKind options and spell it isObjC().

1635

Ditto, maybe isProto and isTextProto?

As @MyDeveloperDay said, I'd like it mean C++ only. I find it confusing that it means C++ or ObjC (even if the latter is a superset of the former). I'd rather see it spelt as isCppOrObjC() even if it's verbose but at least removes all confusion IMO.

I'm kind of with you on isCppOrObjC()

Quuxplusone added inline comments.
clang/include/clang/Format/Format.h
1632

Peanut gallery says: Please be consistent! If you're going to do isCppOrObjC(), then you should also do isObjC(), not isObjectiveC(). (Or vice versa.)

What about Objective-C++? Are people using isCppOrObjectiveC() to mean "well actually it's Objective-C++ but we have no LanguageKind for that, oops?"

Re the term "C family languages," I've heard "curly-brace languages" — but that would include C#, Java, and JavaScript as well.

It seems to be that the members should be

bool isC() const { return Language == LK_Cpp; }  // no LK_C yet
bool isCpp() const { return Language == LK_Cpp; }
bool isObjectiveC() const { return Language == LK_ObjectiveC; }
bool isObjectiveCpp() const { return Language == LK_ObjectiveC; }  // no LK_ObjectiveCpp yet
bool includesCpp() const { return isCpp() || is ObjectiveCpp(); }

and that most of the callers you're talking about should be using includesCpp() instead of isCpp().

1635

If the name of the language is "Protobuf", then the name of the function should be isProtobuf().

MyDeveloperDay marked 2 inline comments as done.May 21 2020, 10:42 AM

I feel like there might something of a concencus forming.. If I take the time to redo following the suggestions @sammccall do you think you could live with it?

clang/include/clang/Format/Format.h
1632

So I think I'd refrain from adding the isC() and isObjectiveCpp() unless they were added to clang itself. But if they were then I think I'd be 100% with you.

I'm not really a fan of the IsCppOnly() this was really to minimize the initial changes to use a function, I actually think 'isCppOrObjC()' is a better choice,

but I think I'd do that transistion from isCpp -> isCppOrObjC() and isCppOnly() -> isCpp() in a separate revision.

1635

I'm good with changing it to use the correct for obj isObjC() and isProto() annd isTextProto()

I don't really mind what we call them I just don't like the extra visual complexity that the Style.Langauge == LK_XXX and Style.Langauge != LK_XXX brings

Switch to new function name suggestions

MyDeveloperDay marked 6 inline comments as done.May 22 2020, 2:19 AM

Just some thoughts.

I agree that Style.isJavaScript() is nicer than Style.Language == FormatStyle::LK_JavaScript, but overall the tradeoffs about adding convenience methods are not clear to me.

C++/ObjC/ObjC++ is special -- because ObjC++ exists I think it is useful for isCpp() by default to include LK_ObjC so C++ enhancements added to the formatter automatically also apply to ObjC++. This is also why the overall Clang language enum choice makes sense to me. I think we don't want to give an easy way for people to special case C++ excluding ObjC++. I agree that this makes Style.isCpp() confusing.

Proto/TextProto is another family that occurs frequently together (but much much less often than C/C++) (as the options format of .proto files is the text proto format).

I'd be inclined to only add convenience methods for the subsets of languages that we think are important together and where improvements to one member of the subset are most likely to benefit or introduce regressions in other members of the subset. I'd be confused if two such convenience methods shared a language in common.

Specifically, let's provide exactly these convenience methods:

likeCpp { LK_Cpp || LK_ObjC }
isCSharp { LK_CSharp }
isJava { LK_Java }
isJavaScript { LK_JavaScript }
isTableGen { LK_TableGen }
likeProto { LK_Proto || LK_TextProto }

and intentionally not provide: isC, isCpp, isCppOrObjC, isObjC, isObjCpp, isProto and isTextProto. In cases where those need to be distinguished, which I believe should be rare enough*, we can use Style.Language comparisons, somewhat documenting that what follows is non-standard, but also being very precise in each of those situations what exactly we mean by spelling out the comparisons in full.
The guideline for usages would be: use the convenience methods unless you have a good reason not to.

*I realize that probably the most comparisons left would be of the form Style.Language == FormatStyle::LK_ObjC (or !=), and that makes me sad, and maybe this is a good enough reason to add isObjC by itself, but then we would lose the orthogonality of the convenience methods, and we open the door for adding isXXX for every language kind, which will make it easier to add a feature for one language that ideally should automatically work by default also for another, so it sucks either way.

Sidenote. For "proto" vs "protobuf" -- I don't know whether this language has an official name. "protobuf" could refer to "the whole ecosystem" vs. "the definition language (describing types)" vs "a message in binary or text format". But .proto is *the* file extension for the definition language, so I think that's good enough.

I'm a fan of the 'like' helpers, but I'm not entirely convinced that having helpers for languages covered by the 'like' is a bad thing-- it just needs to be very explicit that you do mean only that language. For example, an 'isObjCOnly()' would hint to reviewers that ObjC is sometimes used in combination with other languages and there may be a more appropriate helper (of course, this should be clearly documented as well).

Ultimately, I don't really care one way or the other about this, except maybe for the current misnomer of isCpp(), but that's my 2c.

MyDeveloperDay planned changes to this revision.May 22 2020, 11:19 AM

I feel like there might something of a concencus forming.. If I take the time to redo following the suggestions @sammccall do you think you could live with it?

Yeah, LGTM. The alignment of the existing name with clang wasn't as close as I thought anyway. Sorry about adding friction here.

I feel like isCppOrObjC doesn't cover the "essence" of the thing being tested. But it's hard to come up with a great name because it's not clear whether this essence is about C or about C++ (we don't distinguish, and the answer varies depending on callsite).
Actually isCFamily might be a better name here - it's short, decides this is about C but clearly includes the others too. But I'm bikeshedding now, I can live with the current or any other option.

Regarding helpers vs no helpers for the trivial cases: I'm personally not a fan but this is just a question of taste and at this point your opinion is more important I think.

MyDeveloperDay abandoned this revision.Sep 27 2022, 4:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2022, 4:44 AM