Page MenuHomePhabricator

[clang-tidy] Add new case type to check variables with Hungarian notation
Needs ReviewPublic

Authored by dougpuob on Aug 26 2020, 7:00 PM.

Details

Summary

Add IdentifierNamingCheck::CaseType, CT_HungarianNotation, supporting naming check with Hungarian notation.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Please no worry to give me your suggestions and feedback.

clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp
185

OK~ I will fix it. Thank you.

186

OK~ I will fix it. Thank you.

208

It is a good question.

Microsoft redefines them, so I would like to add them as part of mapping table, HungarianNotationTable. That means the map include primitive types and partial windows data types.

// Windows Data Type
{"BOOL",            "b"},   // typedef int BOOL;
{"BOOLEAN",         "b"},   // typedef BYTE BOOLEAN;
{"BYTE",            "by"},  // typedef unsigned char BYTE;
{"WORD",            "w"},   // typedef unsigned short WORD;
{"DWORD",           "dw"},  // typedef unsigned long DWORD;

The result will like the following,

WORD wVid = 0x8086;
DWORD dwVidDid = 0x80861234;
211

OK~ I will add them. Thank you. The void* and ptrdiff_t starts with p.

222

I will add them too. Thank you.

288

Nice consideration, thank you. I will add the feature. function pointers will start with fn.

I will add a test like this,

typedef void (*FUNC_PTR_HELLO)();
FUNC_PTR_HELLO Hello = NULL;
// CHECK-MESSAGES: :[[@LINE-1]]:16: warning: invalid case style for variable 'Hello' [readability-identifier-naming]
// CHECK-FIXES: {{^}}FUNC_PTR_HELLO fnHello = NULL;
310

I will check it again. Thank you.

314

I will check it again. Thank you.

384

Oops! I will fix it later.

478

I will fix too.

479

Make sense. If it was you, will you check it at the caller or in the callee? and could I know the reason?

483

OK, I will fix it later. Thank you.

485

OK. But I am curious about it. Is it a rule in this project? or it is a flaw?

563

OK, I will fix it later.

clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst
21

Agree with you consideration.

  1. I will add more detail about the table, HungarianNotationTable .
  1. I would like treat it as a default table for this moment. Because I don't know does it make sense that make user can customized it in .clang-tidy file. The priority in config is higher than default table. Show my idea as the following,
Checks: '-*,readability-identifier-naming'
CheckOptions:
  - { key: readability-identifier-naming.VariableCase                   , value: szHungarianNotion }
  - { key: readability-identifier-naming.HungarianWordList.uint8        , value: u8                }
  - { key: readability-identifier-naming.HungarianWordList.uint16       , value: u16               }
  - { key: readability-identifier-naming.HungarianWordList.uint32       , value: u32               }
  - { key: readability-identifier-naming.HungarianWordList.uint64       , value: u64               }
  - { key: readability-identifier-naming.HungarianWordList.unsigned_long, value: ul                }
  - { key: readability-identifier-naming.HungarianWordList.long_long    , value: ll                }

How about it?

clang-tools-extra/test/clang-tidy/checkers/readability-identifier-naming-hungarian-notation.cpp
1

OK, make sense. I will fix it later. Thank you.

aaron.ballman added inline comments.Sep 5 2020, 7:24 AM
clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp
208

I think this would be user-friendly behavior, though I'm curious if we'll run into any conflicting notations this way.

479

I'd handle it in getHungarianNotationTypePrefix() along with the check for an empty type name as both seem like they're checking for a degenerate case.

485

Not really a flaw (there's nothing wrong with top-level const on value types), but sort of a rule. Our coding style basically says to try to match the local styles used by the project. We've never really done top-level const anywhere else in the project, so any time someone introduces it, I usually ask for it to be removed. Also, as much as I love const correctness, many of our APIs are not super const correct (we're getting much better though), so it can also make code somewhat awkward when you hit an older API that accepts something non-const.

clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst
21

I think allowing the prefixes to be configurable is a good idea and I agree that we should have a sensible default table (what you've proposed so far seems like a reasonable default to me). One thing we should be sure we support (and have tests for) is the case where the user wants to use the default table but change only one prefix from it. e.g., they like the defaults but change long_long from ll to super_.

dougpuob updated this revision to Diff 290099.EditedSep 5 2020, 10:03 AM

Addressed comments by @aaron.ballman

Improved suggestions of code review.

  1. Changed variables named with Decl to InputDecl.
  2. Changed TypeName.length() --> !TypeName.empty().
  3. Added partial Microsft data types to the HungarianNotationTable.
  4. Added long long, long double, ptrdiff_t to the HungarianNotationTable.
  5. Added char8_t, char16_t, char32_t to the HungarianNotationTable. Variables name with char8_t* type start with pc8, Eg. char8_t *pc8Value = 0;
  6. Supported name of function pointer start with fn.
  7. Supported to remove the restrict keyword in IdentifierNamingCheck::getDeclTypeName().
  8. Removed const_cast from Keywords list.

About making it can be configurable in .clang-tidy file, I'd like to separate it to another patch.

Did you upload this incorrectly again, context is missing and seems to be a relative diff from a previous version of this patch?

Did you upload this incorrectly again, context is missing and seems to be a relative diff from a previous version of this patch?

Sorry for it, I think it's my bad. It is possible that I manually merged the last master(github) with changes then updated them both via web interface ...

Can I fix it if switch back to the base (14948a0) then merge all my changes, then update the diff again via web interface? Or do you have any better suggestion?

I am curious about how do you know this mistake? You got error messages with arc patch D86671 ?

Did you upload this incorrectly again, context is missing and seems to be a relative diff from a previous version of this patch?

Sorry for it, I think it's my bad. It is possible that I manually merged the last master(github) with changes then updated them both via web interface ...

Can I fix it if switch back to the base (14948a0) then merge all my changes, then update the diff again via web interface? Or do you have any better suggestion?

I am curious about how do you know this mistake? You got error messages with arc patch D86671 ?

The no context is easy to spot as phab says context not available. Its easy to find knowing that there is no mention of hungarian notation in the trunk version of IdentifierNamingCheck.cpp, yet there is mention of that in the before diff.

The way I do my patches is I create a branch from the current master. Then all commits go into that branch. When its time to update the PR I can just do a diff from <feature_branch> to <master>.
Though I do use arcanist for my patches

arc diff master

arcanist will check to see if the current branch has tags for PR and automatically update that PR. Otherwise it will create a new PR.
If it goes to create a new PR instead of updating an existing one you can pass update

arc diff master --update D86671
dougpuob updated this revision to Diff 290483.Sep 8 2020, 7:30 AM

This is a test with arc diff master --update D86671 command.

dougpuob updated this revision to Diff 290726.Sep 9 2020, 6:43 AM
  • Fixed lint warnings and regression test failures on Windows.
dougpuob updated this revision to Diff 291371.Sep 11 2020, 7:53 PM

Fixed crash on Windows when run regression test (llvm-lit for readability-identifier-naming.cpp file).

This is because over range parameter made ctor of std::string copying out of range memory from source. The over range parameter came from SourceManager::getCharacterData() function with the SourceLocation returning value of ValDecl->getLocation().

Find the terminated char insteads of calling ValDecl->getLocation() function.

Did you upload this incorrectly again, context is missing and seems to be a relative diff from a previous version of this patch?

Sorry for it, I think it's my bad. It is possible that I manually merged the last master(github) with changes then updated them both via web interface ...

Can I fix it if switch back to the base (14948a0) then merge all my changes, then update the diff again via web interface? Or do you have any better suggestion?

I am curious about how do you know this mistake? You got error messages with arc patch D86671 ?

The no context is easy to spot as phab says context not available. Its easy to find knowing that there is no mention of hungarian notation in the trunk version of IdentifierNamingCheck.cpp, yet there is mention of that in the before diff.

The way I do my patches is I create a branch from the current master. Then all commits go into that branch. When its time to update the PR I can just do a diff from <feature_branch> to <master>.
Though I do use arcanist for my patches

arc diff master

arcanist will check to see if the current branch has tags for PR and automatically update that PR. Otherwise it will create a new PR.
If it goes to create a new PR instead of updating an existing one you can pass update

arc diff master --update D86671

@njames93, thank you. I updated three updates with the way you told me, seems work fine.

aaron.ballman added inline comments.Sep 14 2020, 10:56 AM
clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp
564

unsigned long long -> ull

573

ULONG -> ul
HANDLE -> h

584

I'm not certain how valid it is to look at just the type and decide that it's a null-terminated string. For instance, the following is not an uncommon pattern: void something(const char *buffer, size_t length); and it would be a bit strange to change that into szBuffer as that would give an indication that the buffer is null terminated. You could look at surrounding code for additional information though, like other parameters in a function declaration. As an aside, this sort of heuristic search may also allow you to change length into cbLength instead of nLength for conventions like the Microsoft one.

However, for local variables, I don't see how you could even come up with a heuristic like you could with parameters, so I'm not certain what the right answer is here.

657

No need for this if statement, the for loop won't run anyway if PtrCount == 0.

667

ND instead of Decl.

The function name doesn't really help me to understand why you'd call this as opposed to getting the type information as a string from the NamedDecl itself. I'm a bit worried about maintaining this code as the language evolves -- Clang will get new keywords, and someone will have to remember to come update this code. Could you get away with using Decl->getType()->getAsString() and working with that rather than going back to the original source code and trying to parse manually?

668

const auto * since the type is spelled out in the initialization.

1004

const auto * because the type is spelled out in the initialization.

clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.h
39

Can you go with ND (or something else) instead of Decl since that's a type name?

Replied comments by @aaron.ballman

clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp
564

OK.

573

OK.

584

OK (size_t nLength --> cbLength)

About the void something(const char *buffer, size_t length) pattern. char is a special type which can express signed char and unsigned char. One can express C string(ASCII), another expresses general data buffer(in hex). I think you are mentioning when it is a general data buffer. I agree with you if it changed the name from buffer to szBuffer for general data buffer is strange. I prefer to use uint8_t or unsigned char instead.

How about adding a new config option for it? like, - { key: readability-identifier-naming.HungarainNotation.DontChangeCharPointer , value: 1 } Users can make decision for their projects. For consistency with HN, the default will still be changed to szBuffer in your case.

If I add the option, does it make sense to you from your side?

657

OK! redundant code.

667

OK, I should do it. (ND instead of Decl.)

Use Decl->getType()->getAsString() is a good way. But HN is a strongly-typed notation, if users haven't specific the include directories, the checking result may look messy (it will be changed to unexpected type). So I parse a string with SourceLocation, just for user-friendly.

I understand your consideration, maintaining-friendly is also important. I can implement it with Decl->getType()->getAsString(), if my explanation can't convince you still. It is also fine to me. Or you think it just need a better appropriate function name?

668

OK.

1004

OK.

clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.h
39

It's my mistake, you mentioned last time. I will check them all.

Hi @aaron.ballman
About changing size_t nLength to cbLength. I searched MSVC folders with size_t, many names of variable start with n, or i in MFC related files. So I prefer to keep it starts with n. Another side to name starts with cb, I found variables like cbXxx are defined with ULONG, DWORD, or UCHAR type.

Maybe make them as options in config (.clang-tidy) is more practical. The prefixes in the default table follows its abbreviation or acronym of data type.

Hi @aaron.ballman
About changing size_t nLength to cbLength. I searched MSVC folders with size_t, many names of variable start with n, or i in MFC related files. So I prefer to keep it starts with n. Another side to name starts with cb, I found variables like cbXxx are defined with ULONG, DWORD, or UCHAR type.

I think the important point is that cb is used for APIs that specify a count of bytes, whereas i, n, and l are used for integers (there is no specific prefix for size_t that I'm aware of or that MSDN documents, so the lack of consistency there is not too surprising). That's why I mentioned that using a heuristic approach may allow you to identify the parameters that are intended to be a count of bytes so that you can use the more specific prefix if there's more specific contextual information available.

clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp
584

I'm not certain that approach would make sense because the decision needs to be made on a case-by-case basis. Taking the C standard library as an example set of APIs, a function like char *strncpy(char * restrict s1, const char * restrict s2, size_t n); is one where this functionality should not change s1 or s2 to be szS1 or szS2 because the sz modifier indicates a null-terminated string, but these strings do not have to be null terminated. However, an API like char *strncat(char * restrict s1, const char * restrict s2, size_t n); should change s1 to be szS1 because that string is required to be null terminated. In both cases, the declarations use char.

uint8_t and friends are newer datatypes (added in C99) and are not required to be present (so they're not portable to all implementations), which may account for buffer-based APIs still using char.

I think adding some heuristics to the check may help some of these situations pick the correct prefix, but I'm worried that cases like the two above are going to be almost intractable for the check. I don't have a good idea of how prevalent that kind of issue will be, though. One thing that would help me feel more comfortable would be to look at some data about how the check performs on some various code bases. e.g., pick some large or popular libraries that use C, C++, or both (Win32 SDK, LLVM headers, OpenSSL, etc), run the check over it, and check the results to see how often the prefix is right compared to how often it's wrong. If we find it's mostly right and only gets it wrong a small percentage of the time, then we've likely found a good compromise for the check's behavior. If we find it gets it wrong too often, then the kinds of tweaks needed may be more obvious.

Btw, the reason I think it's important to be careful about false positives for this particular check is because Hungarian notation is intended to convey semantic information in some cases (like whether a buffer is null terminated or not) and that can have security implications. This is the sort of check where it's easy for a person to change all the identifiers in their program assuming they're correct and only find out about the issue much later when someone else gets the semantics wrong because of an incorrect prefix.

667

But HN is a strongly-typed notation, if users haven't specific the include directories, the checking result may look messy (it will be changed to unexpected type).

I'm not certain I understand what issue you're worried about here. Do you have a code example that might help clarify?

Hi @aaron.ballman
About changing size_t nLength to cbLength. I searched MSVC folders with size_t, many names of variable start with n, or i in MFC related files. So I prefer to keep it starts with n. Another side to name starts with cb, I found variables like cbXxx are defined with ULONG, DWORD, or UCHAR type.

I think the important point is that cb is used for APIs that specify a count of bytes, whereas i, n, and l are used for integers (there is no specific prefix for size_t that I'm aware of or that MSDN documents, so the lack of consistency there is not too surprising). That's why I mentioned that using a heuristic approach may allow you to identify the parameters that are intended to be a count of bytes so that you can use the more specific prefix if there's more specific contextual information available.

Hi @aaron.ballman

  1. Making sz as a prefix to the char* parameters of strncpy() and strncat() functions. I read the code of strncpy function, seems null-terminated to parameters is essential. I thought those parameters to strXxx() are also able to the parameter of strlen function knowing the length of a C string(it can be passed to be a parameter or assigned to be a variable, those point to the identical memory. strXxx functions tell the end by null character in the memory). Moreover, I searched source code of OpenSSL, I found the project uses most of char* as C string, and data buffer to retrieve data with unsigned char*. unsigned char is a primitive type in C89 (is portable than uint8_t). Maybe current mechanism is a good way to hint users that unsigned char* is more explicit than char* if they want to treat it as a buffer to retrieve data.
  1. Adding heuristics to pick the correct prefix. Do you mean that use the heuristics approach to give naming suggestion as warning message, or correct it with the --fix option? Is there any existing in this project can be a sample for me? If my thought about the heuristics is correct. The information of parameters to functions I can query its type, name, and location. As we have discussed that the declaration type is insufficient to tell sz for char*, or cb for count of bytes instead i, n, or l, so the name and location provide more information for comparing names with string mapping tables and location relationship between parameters/variables. Take FILE * fopen ( const char * filename, const char * mode ); for example(C String). It will change filename to szFilename if the name string is in the mapping table, change mode to pcMode if mode string is not in the mapping table. Take size_t fread ( void * ptr, size_t size, size_t count, FILE * stream ); for example(count of bytes). It will change size to cbSize, because its previous parameter is a void pointer.

    I can smell that there is always exceptional cases, and not good for maintainability.
  1. Changing i, n, and l to cb for APIs that specify a count of bytes. I have no idea how to distinguish when should add the cb instead of integer types for heuristics. If I was the user, I wish clang-tidy don't change integer types from cbXxx to iCbXxx or nCbXxx. Keep cb with default or an option in config file, like IgnoreParameterPrefix, IgnoreVariablePrefix.
  1. Why not use Decl->getType()->getAsString() I printed the differences as a log for your reference. (https://gist.github.com/dougpuob/e5a76db6e2c581ba003afec3236ab6ac#file-clang-tidy-readability-identifier-naming-hungarian-notation-log-L191) Previous I mentioned "if users haven't specific the include directories, the checking result may look messy". This concern will not happened because the clang-diagnostic-error, if users didn't specific header directories, clang-tidy will show the error (L414).

Not strictly relevant here, but this does open up the idea of enforcing the style where an enum constant is prefixed by the initials of the enum name.

clang-tools-extra/test/clang-tidy/checkers/readability-identifier-naming-hungarian-notation.cpp
25

Class names shouldn't use hungarian notation.

Not strictly relevant here, but this does open up the idea of enforcing the style where an enum constant is prefixed by the initials of the enum name.

I like this idea. There is a case when EnumConstantPrefix and EnumConstantCase=szHungarianNotation options are set, it may take similar affect(which will be the first) or be overwritten? I can have it a try later.

Showing my conception as the following:

// [Before]
typedef enum {
    RevValid             = -1,
    RevNone              = 0, 
    RevCrlReason         = 1, 
    RevHold              = 2, 
    RevKeyCompromise     = 3, 
    RevCaCompromise      = 4  
} REVINFO_TYPE;

// [After]
typedef enum {
    rtRevValid             = -1
    rtRevNone              = 0,
    rtRevCrlReason         = 1,
    rtRevHold              = 2,
    rtRevKeyCompromise     = 3,
    rtRevCaCompromise      = 4 
} REVINFO_TYPE;

// [After] EnumConstantPrefix first case
// EnumConstantCase=snHungarianNotation
// EnumConstantPrefix=pre_
typedef enum {
    pre_rtRevValid             = -1
    pre_rtRevNone              = 0,
    pre_rtRevCrlReason         = 1,
    pre_rtRevHold              = 2,
    pre_rtRevKeyCompromise     = 3,
    pre_rtRevCaCompromise      = 4 
} REVINFO_TYPE;
clang-tools-extra/test/clang-tidy/checkers/readability-identifier-naming-hungarian-notation.cpp
25

OK~ I have classified CheckOptions, and all test cases one by one in the next diff.

// RUN:   -config='{ CheckOptions: [ \
// RUN:     { key: readability-identifier-naming.ClassMemberCase              , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.ConstantCase                 , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.ConstantMemberCase           , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.ConstantParameterCase        , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.ConstantPointerParameterCase , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.ConstexprVariableCase        , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.GlobalConstantCase           , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.GlobalConstantPointerCase    , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.GlobalVariableCase           , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.LocalConstantCase            , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.LocalConstantPointerCase     , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.LocalPointerCase             , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.LocalVariableCase            , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.MemberCase                   , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.ParameterCase                , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.PointerParameterCase         , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.PrivateMemberCase            , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.StaticConstantCase           , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.StaticVariableCase           , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.StructCase                   , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.UnionCase                    , value: szHungarianNotation }, \
// RUN:     { key: readability-identifier-naming.VariableCase                 , value: szHungarianNotation }  \
// RUN:   ]}'
aaron.ballman added inline comments.Fri, Oct 2, 7:15 AM
clang-tools-extra/test/clang-tidy/checkers/readability-identifier-naming-hungarian-notation.cpp
25

Class names shouldn't use hungarian notation.

That may be debatable as I've definitely seen C used as a prefix for class names and I used as a prefix for pure virtual class names (interfaces). Doing a quick search on Google brings up evidence that this isn't uncommon.

dougpuob added inline comments.Sat, Oct 10, 9:46 PM
clang-tools-extra/test/clang-tidy/checkers/readability-identifier-naming-hungarian-notation.cpp
25

Agree both of you because I saw them in different projects. I will add this feature as an option (default is off, user can enable it in .clang-tidy).

dougpuob updated this revision to Diff 297443.Sat, Oct 10, 11:11 PM
  • Support to add Class prefix for Hungarian Notation.
  • Support to add Enum prefix for Hungarian Notation.
  • Support unsigned long long, ULONG, and HANDLE types and others.
  • Support options for Hungarian Notation in config file.
  • Added more test cases.
dougpuob updated this revision to Diff 297459.EditedSun, Oct 11, 6:40 AM

Previous arc diff got problem(failed on builtbot), do it again with arc diff master --update D86671.

Now the diff is correct with above command, and I can do arc patch D86671, it passed to build and regression test. But it still failed on builtbot, one error showed clang-tools-extra/test/clang-tidy/checkers/readability-identifier-naming-hungarian-notation.cpp, I have renamed the file, now the file should not exist. Does anyone know what is going on?

INFO    creating branch phab-diff-297459 at bdb193a6ed32edbedbcbe2e9e34d2c7d23ae7815
INFO    Base branch revision is bdb193a6ed32edbedbcbe2e9e34d2c7d23ae7815
INFO    git reset, git cleanup...
INFO    Analyzing D86671
INFO    applying original diff
INFO    Applying diff 297459 for revision D86671...
ERROR   exception: Applying patch failed:
error: clang-tools-extra/test/clang-tidy/checkers/readability-identifier-naming-hungarian-notation.cpp: No such file or directory  <-- the file should not exist !!!
error: patch failed: clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst:34
error: clang-tools-extra/docs/clang-tidy/checks/readability-identifier-naming.rst: patch does not apply
error: patch failed: clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.h:36
error: clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.h: patch does not apply
error: patch failed: clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp:180
error: clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp: patch does not apply
dougpuob updated this revision to Diff 297820.Tue, Oct 13, 4:09 AM

Merged with the master then do arc diff master --update D86671 again. (Trying to resolve "No such file or directory" error from the scripts/phabtalk/apply_patch2.py!_apply_diff() function)

dougpuob updated this revision to Diff 298001.Tue, Oct 13, 5:45 PM

Rename file name, "readability-identifier-naming-hungarian-notation-default.cpp" to "readability-identifier-naming-hungarian-notation.cpp".

Hi @aaron.ballman and @njames93,
I addressed your code review suggestions and supported Hungarian Notation prefix for decl of enum and class(option) at latest patch. Unfortunately, I encountered a problem that new patch failed on remote BuildBot, it showed the "No such file or directory" error message when it was trying to call apply_patch2.py!_apply_diff(). Do you have any idea what is going on? Do you suggest I create a new Diff(new diff id) for it ?

dougpuob updated this revision to Diff 298037.Tue, Oct 13, 10:47 PM

Merged all commits as one single commit. (Trying to resolve "No such file or directory" error from the scripts/phabtalk/apply_patch2.py!_apply_diff() function)

Is this diff been created incorrectly again?

Taking a step back, Is Hungarian notation really a case style, Seems to me its mainly about the prefix and a user may want DWORD dwUPPER_CASE, Right now there is no way of adopting that.
Maybe extend the options for hungarian type decls to

<Type>Case
<Type>Prefix
<Type>Suffix
<Type>HungarianPrefix

<Type>HungarianPrefix would likely be a bool and if enabled, it would be prepended to <Type>Prefix
I could see a situation like this

[Options]
// VariableCase: UPPER_CASE
// VariablePrefix: PRE_
// VariableSuffix: _POST
// VariableHungarianPrefix: true

DWORD MyVariable; -> DWORD dwPRE_MY_VARIABLE_POST;

This would give users full control of exactly how they want their declarations with no hidden surprises.

Granted this approach would require a little rework but it would satisfy more users.

Is this diff been created incorrectly again?

Taking a step back, Is Hungarian notation really a case style, Seems to me its mainly about the prefix and a user may want DWORD dwUPPER_CASE, Right now there is no way of adopting that.
Maybe extend the options for hungarian type decls to

<Type>Case
<Type>Prefix
<Type>Suffix
<Type>HungarianPrefix

<Type>HungarianPrefix would likely be a bool and if enabled, it would be prepended to <Type>Prefix
I could see a situation like this

[Options]
// VariableCase: UPPER_CASE
// VariablePrefix: PRE_
// VariableSuffix: _POST
// VariableHungarianPrefix: true

DWORD MyVariable; -> DWORD dwPRE_MY_VARIABLE_POST;

This would give users full control of exactly how they want their declarations with no hidden surprises.

Granted this approach would require a little rework but it would satisfy more users.

Hi @njames93,

It's a smart idea, the rework for it is worth. There is a special case if lowercase name with Hungarian prefix, it possibly makes variable ambiguous, like the Case1. How about separating them and aNy_CasE with an underscore, like Case2 ?

// Case1
bool bRIGHT_LEVEL;     // UPPER_CASE
bool bRightLevel;      // CamelCase
bool bRight_Level;     // Camel_Snake_Case
bool baNy_CasE;        // aNy_CasE
bool bright_level;     // lower_case
bool brightLevel;      // camelBack
bool bright_Level;     // camel_Snake_Back
.....^^^^^^ <-- right? bright?

// Case2
bool bRIGHT_LEVEL;     // UPPER_CASE
bool bRightLevel;      // CamelCase
bool bRight_Level;     // Camel_Snake_Case
bool b_aNy_CasE;       // aNy_CasE
bool b_right_level;    // lower_case
bool b_rightLevel;     // camelBack
bool b_right_Level;    // camel_Snake_Back
.....^^^^^^^ <-- add an underscore

Hi @njames93,

It's a smart idea, the rework for it is worth. There is a special case if lowercase name with Hungarian prefix, it possibly makes variable ambiguous, like the Case1. How about separating them and aNy_CasE with an underscore, like Case2 ?

// Case1
bool bRIGHT_LEVEL;     // UPPER_CASE
bool bRightLevel;      // CamelCase
bool bRight_Level;     // Camel_Snake_Case
bool baNy_CasE;        // aNy_CasE
bool bright_level;     // lower_case
bool brightLevel;      // camelBack
bool bright_Level;     // camel_Snake_Back
.....^^^^^^ <-- right? bright?

// Case2
bool bRIGHT_LEVEL;     // UPPER_CASE
bool bRightLevel;      // CamelCase
bool bRight_Level;     // Camel_Snake_Case
bool b_aNy_CasE;       // aNy_CasE
bool b_right_level;    // lower_case
bool b_rightLevel;     // camelBack
bool b_right_Level;    // camel_Snake_Back
.....^^^^^^^ <-- add an underscore

That still has hidden surprises. Maybe instead of a bool, an enum is used for controlling hungarian prefix (Off|On|...).
Can't think of a good name for the third option but it would do the inserting of '_' (bright_level ->b_right_level) or capitalising the first word of the identifier (brightLevel -> bRightLevel).

dougpuob added a comment.EditedTue, Oct 20, 4:38 PM

Hi @njames93,

It's a smart idea, the rework for it is worth. There is a special case if lowercase name with Hungarian prefix, it possibly makes variable ambiguous, like the Case1. How about separating them and aNy_CasE with an underscore, like Case2 ?

// Case1
bool bRIGHT_LEVEL;     // UPPER_CASE
bool bRightLevel;      // CamelCase
bool bRight_Level;     // Camel_Snake_Case
bool baNy_CasE;        // aNy_CasE
bool bright_level;     // lower_case
bool brightLevel;      // camelBack
bool bright_Level;     // camel_Snake_Back
.....^^^^^^ <-- right? bright?

// Case2
bool bRIGHT_LEVEL;     // UPPER_CASE
bool bRightLevel;      // CamelCase
bool bRight_Level;     // Camel_Snake_Case
bool b_aNy_CasE;       // aNy_CasE
bool b_right_level;    // lower_case
bool b_rightLevel;     // camelBack
bool b_right_Level;    // camel_Snake_Back
.....^^^^^^^ <-- add an underscore

That still has hidden surprises. Maybe instead of a bool, an enum is used for controlling hungarian prefix (Off|On|...).
Can't think of a good name for the third option but it would do the inserting of '_' (bright_level ->b_right_level) or capitalising the first word of the identifier (brightLevel -> bRightLevel).

Maybe it doesn't need a new name, how about (Off|On|lower_case|camelBack) or (Off|On|sz_lower_case|szszCamelCase)?

dougpuob updated this revision to Diff 299847.EditedWed, Oct 21, 7:45 PM
  • Changed Hungarian Notation as an prefix style (not a case type).
  • Changed XxxHungarianPrefix as an enumeration from bool.
  • Updated documents.

Hi @njames93,
As our discussion, I updated my change. Including the enum, now it looks like VariableHungarianPrefix: (Off|On|sz_lower_case|szCamelCase).

aaron.ballman added inline comments.Thu, Oct 22, 8:23 AM
clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp
237

It seems like this function should take HNOption as a reference rather than a shared_ptr.

clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.h
93

I'd like to avoid using a shared_ptr here if we can avoid it -- do we expect this to be super expensive to copy by value (I would imagine it'll be an expensive copy but we don't make copies all that often, but maybe my intuition is wrong)?

Hi @aaron.ballman,

Thank you for the suggestion.

clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp
237

OK!

clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.h
93

Good idea. A reference object is good enough. I will change it at next commit.

dougpuob updated this revision to Diff 300293.Fri, Oct 23, 7:52 AM
  • Changed the type of HNOption from std::shared_ptr to reference.