This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Handle C variables with name that matches c++ access specifier
ClosedPublic

Authored by psigillito on Jan 15 2022, 8:23 PM.

Diff Detail

Event Timeline

psigillito requested review of this revision.Jan 15 2022, 8:23 PM
psigillito created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJan 15 2022, 8:23 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
psigillito added a project: Restricted Project.

Sorry, this is my first time using Phabricator. I am not sure if it matters that .arclint is changed in this review, it is not part of my commit.

This issue only appears to impact the accessSpecifier keywords. For example, a c variable declared with the name 'private' will be treated as an access specifier.

I tried to make parsing the identifiers more restrictive by requiring either an @ prefix for obj-c or a colon after the keyword for cpp. Unfortunately, that does not allow an incorrect access specifier (i.e without a colon) to be identified as an accessor. I think a missing colon is probably a common typo so I dont want to remove identifying a access specifier with a missing colon correctly.

I think the correct approach is to check if the var named 'private' is followed by an operator or symbol indicating it is variable.

curdeius retitled this revision from Handle C variables with name that matches c++ access specifier to [clang-format] Handle C variables with name that matches c++ access specifier.Jan 16 2022, 1:11 AM
MyDeveloperDay requested changes to this revision.Jan 16 2022, 1:52 AM
MyDeveloperDay added inline comments.
.arclint
5 ↗(On Diff #400340)

remove this change

clang/lib/Format/UnwrappedLineFormatter.cpp
102–103

Style.isCpp()

clang/lib/Format/UnwrappedLineParser.cpp
2511

elide {

clang/unittests/Format/FormatTest.cpp
3034

This isn't enough of a test, if you are going to add this CppOperatorsFollowingVar then you'd better test every case

This revision now requires changes to proceed.Jan 16 2022, 1:52 AM
curdeius requested changes to this revision.Jan 16 2022, 7:19 AM

Thanks for having a try on this.
However, I don't like this approach too much. You add many changes and a single test. That's not sufficient.
Also, handling C++ keywords in all cases (e.g. delete as a function name) *may* need to distinguish whether we format a C file or a C++ file. It's probably impossible to do this without user input (.h extension is used in both languages for example).
We'd maybe need to add C as language option and let the user specify the language (-x c?).
That in turn may be painful (because not automatic).
But, you may have a better solution.
My 2 cents.

.arclint
5 ↗(On Diff #400340)

Unrelated. Please revert.

clang/lib/Format/UnwrappedLineFormatter.cpp
102–103

Please refactor to e.g. a lambda to ease the understanding. And use a series of ifs and returns.

Thanks for having a try on this.
However, I don't like this approach too much. You add many changes and a single test. That's not sufficient.
Also, handling C++ keywords in all cases (e.g. delete as a function name) *may* need to distinguish whether we format a C file or a C++ file. It's probably impossible to do this without user input (.h extension is used in both languages for example).
We'd maybe need to add C as language option and let the user specify the language (-x c?).
That in turn may be painful (because not automatic).
But, you may have a better solution.
My 2 cents.

Thanks, I am just starting to understand some of the code base so my changes should be taken with a grain of salt. The more I think about this, I do not think my change is correct. As you pointed out with the 'delete' keyword, my changes would not correctly handle a struct named 'delete' i.e.

struct delete foo = {0};
delete.val;

I think adding a cmd line argument specifying you are using C code is more correct. If the user does not specify they are using C, clang-format would format the code as it does now.

The only thing I don't like about this is that it might be kind of clumsy to only be able to specify your language is C. I can take a stab at implementing this if this is the route we want to go.

Thanks for having a try on this.
However, I don't like this approach too much. You add many changes and a single test. That's not sufficient.
Also, handling C++ keywords in all cases (e.g. delete as a function name) *may* need to distinguish whether we format a C file or a C++ file. It's probably impossible to do this without user input (.h extension is used in both languages for example).
We'd maybe need to add C as language option and let the user specify the language (-x c?).
That in turn may be painful (because not automatic).
But, you may have a better solution.
My 2 cents.

I second just adding a new language.

MyDeveloperDay added inline comments.Jan 17 2022, 6:40 AM
clang/lib/Format/UnwrappedLineParser.cpp
2505–2512

I get what your trying to do but what happens in this scenario

private::mynamespace::g_value -1;

Removed old approach and started initial changes to add C language

Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2022, 9:09 PM

revert changes

Hopefully this revision works

Include multiple commits in review

  • undo delete
  • annoying arc changes

These are just initial changes, there is still a lot of work and test cases to write. I figured I would put this out there to see if this is the direction we were thinking.

clang/lib/Format/UnwrappedLineParser.cpp
2505–2512

Yea, it doesnt parse correctly and is split to a new line after private:. I am going to revert all the changes in this code review and start looking at implementing C as its own language.

MyDeveloperDay requested changes to this revision.Jan 17 2022, 11:57 PM

I'm not a fan of this approach of adding a "C" language, mainly because of the .h problem so ultimately it doesn't solve your problem.

I think this is overkill for what is basically the subtle handling of "public/private/protected/try/delete/new" being used as variables (which frankly is at least in my view not a bright idea in the first place!)

I think ultimately we can try to catch those corner cases but IMHO it doesn't need the addition of a whole new language for that, (we are not at that much on an impasse)

This review contains .arclint that needs removing, this review contains no tests

This revision now requires changes to proceed.Jan 17 2022, 11:57 PM

I actually thought you were on the right lines with your change to parseAccessSpecifier() surely if you see "private:" "public:" or "protected:" but not public:: that's when you call parseAccessSpecifier()

I think mostly I believe clang-format is a 96% done deal in terms of code, a lot of what we do here in my view is about closing those final "corner cases", sometimes our fixes are quite specific,

Something in the back of my head says this is a sledge hammer to crack a nut but that is just my view.. I'm just a little uncomfortable here with this change although it may open doors for other C specific changes, I'm not convinced this solves the problem I just think it moves it a little and maybe adds to complexity for what is likely a very small number of cases.

clang/lib/Format/TokenAnnotator.cpp
3238 ↗(On Diff #400718)

I wonder how many people using C do this?

#define new(x) malloc(x)
#define delete(x) free(x)

Some at least

https://github.com/jbf/tonsai-scheme/blob/83d2a022f60fc17f9913c017fbde29e19cdf5b2b/src/memory.h

clang/lib/Format/UnwrappedLineFormatter.cpp
100

isC() these large OR/AND conditions become unreadable

clang/lib/Format/UnwrappedLineParser.cpp
1221

isC()

1604

I feel everywhere you put isCpp you'll end up doing isCpp() || is C()

This goes back to the arguments for a review I tried before D80079: [clang-format] [NFC] isCpp() is inconsistently used to mean both C++ and Objective C, add language specific isXXX() functions its really isCStyle() if we WERE going to pursue this, I'd want that and not have clauses added everywhere

clang/tools/clang-format/ClangFormat.cpp
124 ↗(On Diff #400718)

Is this required for this patch or is this a nice to have, can we have a separate review for this, it may have merit in its own right but it needs it own tests

467 ↗(On Diff #400718)

I don't understand what you are doing here or why? is this just for C it feels wrong

469 ↗(On Diff #400718)

we don't use braces on single line if

wouldn't you use a tolower or toupper here?

MyDeveloperDay added a comment.EditedJan 18 2022, 12:21 AM

For me the push back on adding the "C" language and having to pass it on the command line for the .h case is about VS Code, Vim, Visual Studio, Emacs having to recognize what language they are in and having to pass the -language=XXX flag, now we are pushing the interpretation of what language we are to them and not to us, which mean we are now out of control.

Each editor will make different assessments as to what language it is (likely only 98% correctly) and we'll get the blame when they say its C# and not C because they made the wrong decision

Personally, If we are going to add C as a language we need something that can control it without external intervention

something like..

// clang-format language='C'

or something that reads the header (like we used to have in Makefile)

---*- C++ -*-===//

especially if there is a sudo standard for that, and uses that as a language hint to say this is a "C" .h header

But I think it might be a mistake for us to rely on the editors to provide us that, and even adding the option could open a tsunami of incorrect language issues which we cannot solve.

Isn't the reality that developers using delete/new/try/catch/interface/public/protected/private/async/var/let as variables the real problem? I'm slightly averse to pandering to them, and I definitely don't want to make my world more complex just to accommodate their poor choice of variable name.

I know if someone used any of those words as variables in LLVM surely we'd tell them to change it. Honestly how much of a problem are we really talking about here that can't be handled by a few corner cases.

As MyDeveloperDay justly pointed out, adding a new language enum LK_C doesn't seem to be the way to go, as you mostly duplicate all the conditions (probably missing some of them).
For the usability, it would be really painful to specify the language for each file that should be considered as C.
So, sorry for this forth and back, but now at least we see that your initial approach was actually better (but please add tests!).

We'd maybe need to add C as language option and let the user specify the language (-x c?).
That in turn may be painful (because not automatic).

Yeah, I should have been more explicit here.

.arclint
15–16 ↗(On Diff #400718)

Please don't change unrelated files.

MyDeveloperDay added inline comments.Jan 18 2022, 1:12 AM
.arclint
15–16 ↗(On Diff #400718)

I don't know about others but I personally don't use arcanist...

I stage my files, make a patch and upload manually

git add <files being modified>  (or `git ls files -m | xargs git add` )
git clang-format 
git diff --cached -U4 > patch_to_submitt.diff

I use the Create Diff/Update Diff functionality via the web interface (I feel that gives me more control)

Just my 2c

curdeius added inline comments.Jan 18 2022, 1:57 AM
.arclint
15–16 ↗(On Diff #400718)

I do, but I never had to modify .arclint or whatever config file arcanist uses.
It just works as a charm.

Create and send a patch (if sending all changes not in main branch):

arc diff main --reviewers ...

Update a patch:

arc diff main --update --revision D123456789

I'm not a fan of this approach of adding a "C" language, mainly because of the .h problem so ultimately it doesn't solve your problem.

I think this is overkill for what is basically the subtle handling of "public/private/protected/try/delete/new" being used as variables (which frankly is at least in my view not a bright idea in the first place!)

I think ultimately we can try to catch those corner cases but IMHO it doesn't need the addition of a whole new language for that, (we are not at that much on an impasse)

This review contains .arclint that needs removing, this review contains no tests

I was actually thinking that adding C as language is the right way, and still am. We have stuff only for arcane C variants, why not push these in the language C and keep C++ free from it?

But I'm fine with either solution.

clang/lib/Format/UnwrappedLineFormatter.cpp
100

I'm more of a switch fan and not your isXXX(), but that's preference. ;)

clang/lib/Format/UnwrappedLineParser.cpp
1604

I think one could use isC(), isCpp(), isObjC(), and what is now isCpp(): isCish().

Ok, as I was starting to add a new language, the scope of changes just continued to grow.

If you think it is worthwhile, I think I can fix this edge case for accessSpecifiers by cleaning up my old approach and adding some tests. I dont like having to add a big set of operators to check against for handling the case where there is a typo and the colon is missing i.e.

class foo { 
private
  bool jim;
public:
  bool bob;
};

I think this is probably the most common error so I think we should support it.

I don't think the delete issue in https://github.com/llvm/llvm-project/issues/46915 is worth the added complexity. Without specifying the language, it is too hard to interpret the programmer's intention. For example, these are totally valid as either a delete or a function call:

delete(foo) // foo is a pointer being deleted 
delete(bar) // bar is a parameter to a function

@HazardyKnusperkeks (its not without merits to add a "C" Language the more i think about it, but I'm not sure quite in the form presented here)

If we DID add a "C" language then we need to add something like you suggested isCpp() already means more than we think.

bool isCStyle() {
    return isCpp() || isC();
}

Personally I really don't want to see the below change or it will change the world, so it would be better if the change was just from !Style.isCpp() to !Style.isCStyle()

!Style.isCpp() && !Style.isC()

The ClangFormat.cpp changes are really unrelated and need to be removed (or resubmitted/discussed separately), and we really must have extensive unit tests if we are adding a new language (which this review doesn't have any) - (that means a new FormatCTests.cpp)

I'd personally want to see really limited use of isCStyle() and ONLY what's needed to fix the exact example/tests we'd add. (its all to easy to find/replace thinking you are doing the right thing when actually we may not, it needs to be targetted changes)

But bottom line, I just don't see how we fix the .h problem? other than if the .clang-format section doesn't have a Language: Cpp then we don't assume its C++ (problem is it already does)

(actually this is a request that users can limit the guessing of languages to only those languages present in the .clang-format file)

This would be good for "C" projects but not much use for C++ or C++/C projects.

It wouldn't solve the delete(foo); issue this can still be a function or a delete of a bracketed variable, or a macro... so I don't see it helps

And bottom line the Lexer will give it to you as tok::kw_private and not as tok::identifier regardless of your language and so you'll need to perhaps do something in FormatTokenLexer to transform it back to an identifier (which is possible) checkout tryMergeLessLess() etc...

  • revert to corner case handling with private
  • revert bad changes and arclint
MyDeveloperDay accepted this revision.Jan 22 2022, 4:02 AM

new nits but basically I think this looks like it might handle this ok? LGTM

We need to run this across a large code base to check

clang/lib/Format/UnwrappedLineFormatter.cpp
106

remove extraneous {} braces

clang/lib/Format/FormatToken.h
125
clang/lib/Format/UnwrappedLineFormatter.cpp
104

In clang-format all lambdas I've seen start with a capital letter.

107

Please just return. And then drop the else.

clang/lib/Format/UnwrappedLineParser.cpp
2497
HazardyKnusperkeks requested changes to this revision.Jan 23 2022, 9:40 PM
This revision now requires changes to proceed.Jan 23 2022, 9:40 PM
psigillito marked 6 inline comments as done.

Review Changes

undo arclint change

Base diff on upstream

  • revert bad arclint changes
  • revert of .arclint file delete
curdeius requested changes to this revision.Jan 25 2022, 12:17 AM

Last nits, apart from this clean up, I'm OK.

clang/lib/Format/FormatToken.h
125

Not done it seems.
Please rename and use a different type. Maybe ImmutableSet? Or just a sorted std::vector?

clang/lib/Format/UnwrappedLineFormatter.cpp
109

Nit here and elsewhere: full stop at the end of comments.

113
116

Please use contains instead of count if using ImmutableSet.

clang/lib/Format/UnwrappedLineParser.cpp
2497
2506–2512
2507

Ditto. Use contains.

This revision now requires changes to proceed.Jan 25 2022, 12:17 AM
psigillito marked an inline comment as done.Jan 26 2022, 8:33 PM
psigillito added inline comments.
clang/lib/Format/FormatToken.h
125

ok updated to sorted std::vector

clang/lib/Format/UnwrappedLineFormatter.cpp
104

Updated.

Have you updated the diff? I still see std::set and not a sorted vector.

psigillito marked 8 inline comments as done.
psigillito marked 2 inline comments as done.
  • revert auto formatting comments
clang/lib/Format/UnwrappedLineFormatter.cpp
119–121

Drop Braces

122–126

Use std::binary_search instead of std::lower_bound. That should simplify the following if.

curdeius requested changes to this revision.Jan 28 2022, 1:36 PM
curdeius added inline comments.
clang/lib/Format/FormatToken.h
124
clang/lib/Format/UnwrappedLineFormatter.cpp
122–129

Please do something along these lines. The idea is to put a cheaper check first.

Note. I haven't tested nor formatted these lines.

845

Revert.

clang/lib/Format/UnwrappedLineParser.cpp
2503–2505

Please use binary_search and put it inside the else branch to avoid it if the first condition is satisfied.
Something like:

if (FormatTok->Tok.is(tok::colon)) {
...
} else if (!binary_search(...) {
} else if (...) {
}

Also, this code and the code in UnwrappedLineFormatter are pretty much similar.
Can we remove this duplication by e.g. setting the token kind here and checking it in the formatter?

This revision now requires changes to proceed.Jan 28 2022, 1:36 PM
psigillito marked 7 inline comments as done.Jan 29 2022, 10:28 AM
psigillito added inline comments.
clang/lib/Format/UnwrappedLineParser.cpp
2503–2505

Yes, since we now set the kind to 'identifier' here, the binary_search check in the formatter is unnecessary. Updated the formatter to no longer do the check.

psigillito marked an inline comment as done.
  • review changes, lower_bound to binary_search
curdeius accepted this revision.Jan 29 2022, 11:43 AM

LGTM but please add an assert(is_sorted).

clang/lib/Format/FormatToken.h
125

Is there a place anywhere that you verify it's sorted?
If no, please add an assert.

+1 for assert

clang/lib/Format/FormatToken.h
125

Could initialize from a lambda in which the assert is located, thus it is only executed once.

static const std::vector<clang::tok::TokenKind> COperatorsFollowingVar = []{
  std::vector<clang::tok::TokenKind> Ret...;
  assert(...);
  return Ret;
}();
This revision is now accepted and ready to land.Jan 29 2022, 12:50 PM
  • wrap vector in function
psigillito marked 2 inline comments as done.
  • unnecessary whitespace

Have you commit access, or do you need some one to commit it for you?

@HazardyKnusperkeks I do not have commit access. This is my first commit to the project. Do I just need to issue an 'arc land' command?

@HazardyKnusperkeks I do not have commit access. This is my first commit to the project. Do I just need to issue an 'arc land' command?

I've never used arc, so I can't tell you anything about that.

You need to apply for commit access, that's no big deal, you can find the address in the LLVM documentation. Or you post name and email for the commit here and someone will commit it on your behalf.

@HazardyKnusperkeks
Ok thanks, someone else can commit on my behalf while I wait for commit access.

name: Philip Sigillito
email: psigillito@gmail.com

Do I need to care about these pre-build checks failing?

Do I need to care about these pre-build checks failing?

The checks passed, the build on linux failed. But as far as I can see it's something in openmp. I assume you did run all the format tests, then everything should be fine.

Do I need to care about these pre-build checks failing?

The checks passed, the build on linux failed. But as far as I can see it's something in openmp. I assume you did run all the format tests, then everything should be fine.

Yes, I ran the format tests.

Thanks!

This revision was landed with ongoing or failed builds.Jan 30 2022, 11:57 AM
This revision was automatically updated to reflect the committed changes.