Page MenuHomePhabricator

Variable names rule
Needs RevisionPublic

Authored by michaelplatings on Feb 7 2019, 7:48 AM.

Details

Reviewers
lattner
zturner
Summary

Following discussion and general agreement that the current naming rule for variables is not ideal, this patch switches the naming rule to make lowerCamelCase the standard, consistent with a prior RFC.

Given that over 450,000 variables are currently named in UpperCamelCase, the rule also permits using that form for consistency with existing code. I can't see a way to express that in .clang-tidy files.

Diff Detail

Event Timeline

michaelplatings created this revision.Feb 7 2019, 7:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2019, 7:48 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Pretty sure this patch should have gone to llvm-commits, not cfe-commits.

Pretty sure this patch should have gone to llvm-commits, not cfe-commits.

I just set the repository, Phabricator did the rest - apparently the magic isn't working so well.

Pretty sure this patch should have gone to llvm-commits, not cfe-commits.

I just set the repository, Phabricator did the rest - apparently the magic isn't working so well.

  1. Does clang-tidy warn on every single existing variable now?
  2. It might be best to give this more visibility, by submitting a mail to llvm-dev, with a noticeable subject, like "RFC: changing variable naming rules in LLVM codebase"
  1. It might be best to give this more visibility, by submitting a mail to llvm-dev, with a noticeable subject, like "RFC: changing variable naming rules in LLVM codebase"

+1. I know the discussion took place there originally, but "RFC" will catch more people's eyes. Also the prior discussion had some digressions (ahem) which a new RFC can be more focused.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2019, 10:14 AM

does the readability-identifier-naming check need to be changed to support multiple allowed case types?

- key:             readability-identifier-naming.VariableCase
    value:        camelBack,CamelBack

I am generally in favour of this direction.

lattner accepted this revision.Feb 7 2019, 10:04 PM
lattner added a subscriber: lattner.

I am very much +1 on this. That said, this isn't the sort of thing we just use patch review for. Please agitate a robust discussion about this on llvm-dev. :-)

This revision is now accepted and ready to land.Feb 7 2019, 10:04 PM
  1. Does clang-tidy warn on every single existing variable now?
  2. It might be best to give this more visibility, by submitting a mail to llvm-dev, with a noticeable subject, like "RFC: changing variable naming rules in LLVM codebase"
  1. Pretty much. Previously clang-tidy gave 56,787 unique warnings. After this patch it gives 361,382 unique warnings. I personally would be happy to change the settings from camelBack to aNy_CasE.
  1. Done: http://lists.llvm.org/pipermail/llvm-dev/2019-February/130083.html
MyDeveloperDay added a comment.EditedFeb 8 2019, 6:27 AM

I personally would be happy to change the settings from camelBack to aNy_CasE.

Should we come up with a new style? say UpperOrLowerCamelCase or camelBackOrCase , I don't mind going and doing that in the readability-identifier-naming check, given that I just wrote up all the Options for that check https://clang.llvm.org/extra/clang-tidy/checks/readability-identifier-naming.html in D56563: [clang-tidy] add options documentation to readability-identifier-naming checker

zturner added a subscriber: zturner.Feb 8 2019, 8:59 PM

Is this actually any better? Whereas before we can’t differentiate type names and variable names, under this proposal we can’t differentiate type names and function names. So it seems a bit of “6 of 1, half dozen of another”

Is this actually any better? Whereas before we can’t differentiate type names and variable names, under this proposal we can’t differentiate type names and function names. So it seems a bit of “6 of 1, half dozen of another”

Perhaps you mistyped? The proposal does not change the status quo of either type names nor function names. If you mean that we can't differentiate variable names and function names, then it seems worthwhile to point out that the actual letters (not just the case of said letters) matter too. Whereas the guidelines state that types and variables should have names that are nouns, the guidelines state that functions should have names that are verb phrases.

Should we come up with a new style? say UpperOrLowerCamelCase, I don't mind going and doing that in the readability-identifier-naming check, given that I just wrote up all the Options for that check https://clang.llvm.org/extra/clang-tidy/checks/readability-identifier-naming.html in D56563: [clang-tidy] add options documentation to readability-identifier-naming checker

Sounds good to me. I see that you've made D57966 a child of this issue, but we could swap the dependency around so that once your patch is applied I can update this patch to use camelBackOrCase.

MyDeveloperDay added a comment.EditedFeb 11 2019, 3:07 AM

Sounds good to me. I see that you've made D57966 a child of this issue, but we could swap the dependency around so that once your patch is applied I can update this patch to use camelBackOrCase.

I'm OK if we want to do that, but its very much a circular dependency, I don't want to land D57966: [clang-tidy] add camelBackOrCase casing style to readability-identifier-naming to support change to variable naming policy (if adopted), unless this whole variableName suggestion is accepted (which I think is a good idea).. I think your suggestion warrants being the driver, lets do the clang-tidy change and subsequent .clang-tidy changes on another revision post acceptance of both.

FWIW, I agree with the comments that the function name should be differentiated from the variable by the use of a verbs, I've spent too much time in my career grepping for the word type in code

Type type = type();

I think

Type type = getType();

or

Type objectType = getType();

adds some increased levels of clarity.

Is this actually any better? Whereas before we can’t differentiate type names and variable names, under this proposal we can’t differentiate type names and function names. So it seems a bit of “6 of 1, half dozen of another”

Perhaps you mistyped? The proposal does not change the status quo of either type names nor function names. If you mean that we can't differentiate variable names and function names, then it seems worthwhile to point out that the actual letters (not just the case of said letters) matter too. Whereas the guidelines state that types and variables should have names that are nouns, the guidelines state that functions should have names that are verb phrases.

There is still overlap, e.g. "process" can be a noun (a Linux process) or a verb (to process something)

I think it should also be pointed out there is not zero overhead -- it's not a lot (at least for native English speakers, which many LLVM developers are not), but determining if a word is a verb or a noun is harder than looking at the casing. Small, but worth observing.

A different convention, e.g. lower_case, avoids this. Personally, I'd prefer that, but I'm also fine with lowerCamelCase just so we can stop using UpperCamelCase.

llvm/docs/CodingStandards.rst
1194

It would be nice for this section to be expanded a bit, just to avoid inevitable code review churn, e.g. if I'm adding 50 lines to a 200 line file, am I allowed to change the existing var names elsewhere in the file or method, or is that outside the scope of my change? If I'm reviewing that patch, do I tell the author they have to be consistent and revert other changes? etc.

Is there any plan to use clang-tidy to do a global cleanup, or is this going to be a totally ad-hoc migration -- variables use the new scheme only when the code is updated?

Update .clang-tidy files to use aNy_CasE until camelBackOrCase is available.
Add more guidance around acronyms.
Add more guidance around consistency with existing CamelCase variable names.
Change other code examples to camelBack.

michaelplatings marked an inline comment as done.Feb 18 2019, 8:50 AM
michaelplatings added inline comments.
llvm/docs/CodingStandards.rst
1194

I've had a go at expanding it. Please let me know if you have other suggestions.

Is there any plan to use clang-tidy to do a global cleanup, or is this going to be a totally ad-hoc migration -- variables use the new scheme only when the code is updated?

The latter. Given that the code doesn't keep to the existing .clang-tidy rules I'm not optimistic that we could persuade code owners to start now. That's not to say it couldn't happen eventually, but my aim at this point in time is to make it easier to use good variable names and I don't want perfect to be the enemy of better.

Changed recommendation for acronyms from lower case to upper case, as suggested by several responses to the RFC.

miyuki added a subscriber: miyuki.Feb 19 2019, 3:40 AM
miyuki added inline comments.
llvm/docs/CodingStandards.rst
1065–1068

signed is a keyword, it can't be used as a variable name

michaelplatings marked an inline comment as done.

Changed recommendation for acronyms from lower case to upper case, as suggested by several responses to the RFC.

I haven't been following the discussion closely - why is this the preferred direction? I don't think that things like "Basicblock *bb" or "MachineInstr *mi" will be confusing, and going towards a consistently leading lower case letter seems simple and preferable.

-Chris

Changed recommendation for acronyms from lower case to upper case, as suggested by several responses to the RFC.

I haven't been following the discussion closely - why is this the preferred direction? I don't think that things like "Basicblock *bb" or "MachineInstr *mi" will be confusing, and going towards a consistently leading lower case letter seems simple and preferable.

Maybe I misunderstood you (http://lists.llvm.org/pipermail/llvm-dev/2019-February/130223.html):

> Maybe there should be an exception that variable names that start with an acronym still should start with an upper case letter?
That would also be fine with me - it could push such a debate down the road, and is definitely important for a transition period anyway.

Also Chandler (http://lists.llvm.org/pipermail/llvm-dev/2019-February/130313.html):

FWIW, I suspect separating the transition of our acronyms from the transition of identifiers with non-acronym words may be an effective way to chip away at the transition cost... Definitely an area that people who really care about this should look at carefully.

And Sanjoy (kind of) (http://lists.llvm.org/pipermail/llvm-dev/2019-February/130304.html):

maybe a gradual transition plan could be to allow these upper case acronyms for specific classes?

I agree that lower case acronyms would ultimately be more consistent, but given where we are it seems more achievable to only change the rule for non-acronyms.

Changed recommendation for acronyms from lower case to upper case, as suggested by several responses to the RFC.

I haven't been following the discussion closely - why is this the preferred direction? I don't think that things like "Basicblock *bb" or "MachineInstr *mi" will be confusing, and going towards a consistently leading lower case letter seems simple and preferable.

-Chris

I don’t think we should use this review as evidence of consensus. For example, I’m going to be against any change that doesn’t bring us closer to LLDB’s style of lower_case simply on the grounds that a move which brings us farther away from global consistency is strictly worse than one which brings us closer, despite ones personal aesthetic preferences.

And so far, I don’t really see that addressed here (or in the thread)

zturner requested changes to this revision.Feb 19 2019, 7:15 AM

Since someone already accepted this, I suppose I should mark require changes to formalize my dissent

This revision now requires changes to proceed.Feb 19 2019, 7:15 AM

Since someone already accepted this, I suppose I should mark require changes to formalize my dissent

As it was Chris @lattner who accepted it, is your request for changes just based on the fact that it doesn't fit LLDB style?

I was trying to find where the LLDB coding style was documented but could only find this https://llvm.org/svn/llvm-project/lldb/branches/release_36/www/lldb-coding-conventions.html, seemingly this file has been move/removed around release 3.9.

But reading that link its seems unlikely to find a concencous between either naming conventions or formatting style between LLDB and the rest of LLVM, unless of course the solution would be to adopt LLDB style completely (for which I'd suspect there would be counter objections)

If that isn't a reality is blocking the rest of the LLVM community from relieving some of their eye strain an acceptable solution?

Since someone already accepted this, I suppose I should mark require changes to formalize my dissent

As it was Chris @lattner who accepted it, is your request for changes just based on the fact that it doesn't fit LLDB style?

(Side note, but I think everyones' opinions hold the same weight with regards to issues like this, and that is in part why changes like this are so difficult to move forward with. Because it takes a lot of consensus, not just one person, to drive a change.)

To answer your question: In a way, yes. To be clear, I don't actually care what the style we end up with is and I think arguing over which specific style we end up adopting is a silly argument. No style is going to be aesthetically pleasing to everyone, and I conjecture that any style we choose will have just as many people who dislike it as there are that like it. A coding style should serve exactly two purposes (in this order of importance): 1) Consistency across codebase, and 2) Visually distinguish semantically names that refer to semantically different things.

As long as it satisfies those two things, the specific choice of style is almost incosequential.

My objection is based on the fact adopting LLDB's style makes #1 significantly better at literally no incremental cost, while maintaining #2. So, the benefit of changing to literally any other style would be dwarfed by the benefit of changing to this particular style, because we would get instant consistency across a large swath of code.

If someone wants to propose a mass change of LLDB's names, I would actually be fine with that, but I suspect that will be just as difficult to drive, and so the path of least resistance here is to just use it and move on with our lives.

I was trying to find where the LLDB coding style was documented but could only find this https://llvm.org/svn/llvm-project/lldb/branches/release_36/www/lldb-coding-conventions.html, seemingly this file has been move/removed around release 3.9.

But reading that link its seems unlikely to find a concencous between either naming conventions or formatting style between LLDB and the rest of LLVM, unless of course the solution would be to adopt LLDB style completely (for which I'd suspect there would be counter objections)

If there are counter objections, I'd like to hear them. "I'm not a fan of that style" is not really a strong counter-objection in my opinion, because if we require a unanimous consensus on the most aesthetically pleasing style, I'm pretty sure nothing will ever happen. After all, I'm not a huge fan of LLDB's style myself. But as with any coding standard, you just deal with it.

If that isn't a reality is blocking the rest of the LLVM community from relieving some of their eye strain an acceptable solution?

Inconsistency is worse than eye strain, because it *causes* eye strain, as well as discourages people from contributing to the code at all. Anyone who has worked on both LLDB and LLVM can attest to how jarring the shift is moving back and forth between them, and that is a much more serious problem than a subset of developers who don't like something and another subset who do.

MyDeveloperDay added a comment.EditedFeb 21 2019, 1:24 PM

...

I can't argue with anything you say.. but I guess to reinforce your point introducing what is effectively a 3rd style would likely cause even more jarring...

The funny thing is this so reminds me of those religious bracketing style debates we had for decades.. then clang-format came along and I feel I stopped caring, I just let it do it for me on saving.. we need clang-tidy to do the same but for naming conventions.

I can't argue with anything you say.. but I guess to reinforce your point introducing what is effectively a 3rd style would likely cause even more jarring...

Zach isn't introducing a new style, the style already exists and is consistently used by what I think is our 3rd largest subproject. It happens not to be used at all by the two largest subprojects, but those subprojects already aren't consistent with themselves.
I would not mind a more concerted effort to migrate to whatever style we pick, which was notably lacking last time around. Then the jarring inconsistencies would go away, and we could all get back to complaining about content and not style.

I can't argue with anything you say.. but I guess to reinforce your point introducing what is effectively a 3rd style would likely cause even more jarring...

Zach isn't introducing a new style, the style already exists and is consistently used by what I think is our 3rd largest subproject. It happens not to be used at all by the two largest subprojects, but those subprojects already aren't consistent with themselves.
I would not mind a more concerted effort to migrate to whatever style we pick, which was notably lacking last time around. Then the jarring inconsistencies would go away, and we could all get back to complaining about content and not style.

If I read the post correctly, it was actually agreeing with me (because it said "to reinforce your point...". Meaning that something such as lowerCaseCamel would be the third style being referred to, while my proposal keeps the number of styles to 2. But, maybe I read it wrong. If I read it right, then obviously I agree :)

If I read the post correctly, it was actually agreeing with me (because it said "to reinforce your point...". Meaning that something such as lowerCaseCamel would be the third style being referred to

Correct! just acknowledging your point from a different perspective.

If I read the post correctly, it was actually agreeing with me (because it said "to reinforce your point...". Meaning that something such as lowerCaseCamel would be the third style being referred to

Correct! just acknowledging your point from a different perspective.

Doh! Sorry for the noise.

It looks like the RFC thread has mostly turned into a transition-plan debate, so should we work on the actual convention description here? Extracting the naming conventions from the 3.6-era link mentioned above, we have:

  • types and classes are UpperCamelCase (this is unchanged from current LLVM style)
  • methods are UpperCamelCase (this is also the old LLVM style IIRC)
  • variables are snake_case
  • static data members add "g_" prefix to variable style (although I see a proposal for "s_" instead)
  • nonstatic data members add "m_" prefix to variable style

Did I miss anything really important?

I can understand Zach's position here, but LLDB has historically never conformed to the general LLVM naming or other conventions due to its heritage. It should not be taken as precedent that the rest of the project should follow.

In any case, I think that it is best for this discussion to take place on the llvm-dev list where it is likely to get the most visibility. Would you mind moving comments and discussions there?

I can understand Zach's position here, but LLDB has historically never conformed to the general LLVM naming or other conventions due to its heritage. It should not be taken as precedent that the rest of the project should follow.

In any case, I think that it is best for this discussion to take place on the llvm-dev list where it is likely to get the most visibility. Would you mind moving comments and discussions there?

Hey! Random Clang developer is here after this topic became a little-bit dead as not everyone subbed to LLVM dev-list. I think the best solution for every difficult question is to let the users decide their own future among all the projects. I would announce polls (https://reviews.llvm.org/vote/) and announce them on every dev-list.

I do not see any better solution to decide if we would code like DRE, VD versus expr, decl as @lattner would code. And I am not sure if everyone happy with this_new_styling as @chandlerc and @zturner would code. E.g. I am not happy because I know my if-statements would take two lines of code instead of one and it would be super-duper-mega ugly and difficult to read. Here is an example:

static Optional<const llvm::APSInt *>
getConcreteIntegerValue(const Expr *CondVarExpr, const ExplodedNode *N) {
//...
  if (const auto *DRE = dyn_cast_or_null<DeclRefExpr>(CondVarExpr)) {
    if (const auto *VD = dyn_cast_or_null<VarDecl>(DRE->getDecl())) {
//...
}

would be:

static Optional<const llvm::APSInt *>                                           |
getConcreteIntegerValue(const Expr *cond_var_expr, const ExplodedNode *node) {  |
//...                                                                           |
  if (const auto *decl_ref_expr = dyn_cast_or_null<DeclRefExpr>(cond_var_expr)) {
    if (const auto *var_decl = dyn_cast_or_null<VarDecl>(decl_ref_expr->getDecl())) {
//...                                                                           |
}                                                             whoops column-81 ~^

Hungarian notation on members and globals are cool idea. However, the notation is made without the _ part, so I think mMember is better than m_member as we used to 80-column standard and it is waste of space and hurts your C-developer eyes. I would recommend b prefix to booleans as Unreal Engine 4 styling is used to do that (bIsCoolStyle) and it is handy. It is useful because booleans usually has multiple prefixes: has, have, is and you would list all the booleans together in autocompletion. Yes, there is a problem: if the notation is not capital like the pure Hungarian notation then it is problematic to list and we are back to the BIsCapitalLetter and MMember CamelCase-world where we started (except one project). I think @lattner could say if it is useful as all the Apple projects based on those notations and could be annoying.

static Optional<const llvm::APSInt *>
getConcreteIntegerValue(const Expr *CondVarExpr, const ExplodedNode *N) {
//...

if (const auto *DRE = dyn_cast_or_null<DeclRefExpr>(CondVarExpr)) {
  if (const auto *VD = dyn_cast_or_null<VarDecl>(DRE->getDecl())) {

//...
}

would be:

static Optional<const llvm::APSInt *> |
getConcreteIntegerValue(const Expr *cond_var_expr, const ExplodedNode *node) { |
//... |

if (const auto *decl_ref_expr = dyn_cast_or_null<DeclRefExpr>(cond_var_expr)) {
  if (const auto *var_decl = dyn_cast_or_null<VarDecl>(decl_ref_expr->getDecl())) {

//... |
} whoops column-81 ~^

Hungarian notation on members and globals are cool idea. However, the notation is made without the `_` part, so I think `mMember` is better than `m_member` as we used to 80-column standard and it is waste of space and hurts your C-developer eyes. I would recommend `b` prefix to booleans as Unreal Engine 4 styling is used to do that (`bIsCoolStyle`) and it is handy. It is useful because booleans usually has multiple prefixes: `has, have, is` and you would list all the booleans together in autocompletion. Yes, there is a problem: if the notation is not capital like the pure Hungarian notation then it is problematic to list and we are back to the `BIsCapitalLetter` and `MMember` CamelCase-world where we started (except one project). I think @lattner could say if it is useful as all the Apple projects based on those notations and could be annoying.

FWIW, my suggestion is *not* to expand names like DRE to decl_ref_expr, I agree that doesn't add clarity to the code. Two possibilities: "dre", or "decl" which is what I would write today.

FWIW, my suggestion is *not* to expand names like DRE to decl_ref_expr, I agree that doesn't add clarity to the code. Two possibilities: "dre", or "decl" which is what I would write today.

I totally agree with you, wherever I can I write that out for clarification. Please note that in my example there is two Expr and that is why I pointed out we need acronyms so we cannot really use expr and acronyms usually capital, that is why we went back to the default CamelCase standard. It was a little brainstorming and ping for you guys because I believe you would put those polls out and create a better code-base.