Page MenuHomePhabricator

[clang-format] Handle C variables with name that matches c++ access specifier
ClosedPublic

Authored by psigillito on Jan 15 2022, 8:23 PM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
curdeius added inline comments.Jan 18 2022, 1:57 AM
.arclint
15–16 ↗(On Diff #400718)

I do, but I never had to modify .arclint or whatever config file arcanist uses.
It just works as a charm.

Create and send a patch (if sending all changes not in main branch):

arc diff main --reviewers ...

Update a patch:

arc diff main --update --revision D123456789

I'm not a fan of this approach of adding a "C" language, mainly because of the .h problem so ultimately it doesn't solve your problem.

I think this is overkill for what is basically the subtle handling of "public/private/protected/try/delete/new" being used as variables (which frankly is at least in my view not a bright idea in the first place!)

I think ultimately we can try to catch those corner cases but IMHO it doesn't need the addition of a whole new language for that, (we are not at that much on an impasse)

This review contains .arclint that needs removing, this review contains no tests

I was actually thinking that adding C as language is the right way, and still am. We have stuff only for arcane C variants, why not push these in the language C and keep C++ free from it?

But I'm fine with either solution.

clang/lib/Format/UnwrappedLineFormatter.cpp
100

I'm more of a switch fan and not your isXXX(), but that's preference. ;)

clang/lib/Format/UnwrappedLineParser.cpp
1678

I think one could use isC(), isCpp(), isObjC(), and what is now isCpp(): isCish().

Ok, as I was starting to add a new language, the scope of changes just continued to grow.

If you think it is worthwhile, I think I can fix this edge case for accessSpecifiers by cleaning up my old approach and adding some tests. I dont like having to add a big set of operators to check against for handling the case where there is a typo and the colon is missing i.e.

class foo { 
private
  bool jim;
public:
  bool bob;
};

I think this is probably the most common error so I think we should support it.

I don't think the delete issue in https://github.com/llvm/llvm-project/issues/46915 is worth the added complexity. Without specifying the language, it is too hard to interpret the programmer's intention. For example, these are totally valid as either a delete or a function call:

delete(foo) // foo is a pointer being deleted 
delete(bar) // bar is a parameter to a function

@HazardyKnusperkeks (its not without merits to add a "C" Language the more i think about it, but I'm not sure quite in the form presented here)

If we DID add a "C" language then we need to add something like you suggested isCpp() already means more than we think.

bool isCStyle() {
    return isCpp() || isC();
}

Personally I really don't want to see the below change or it will change the world, so it would be better if the change was just from !Style.isCpp() to !Style.isCStyle()

!Style.isCpp() && !Style.isC()

The ClangFormat.cpp changes are really unrelated and need to be removed (or resubmitted/discussed separately), and we really must have extensive unit tests if we are adding a new language (which this review doesn't have any) - (that means a new FormatCTests.cpp)

I'd personally want to see really limited use of isCStyle() and ONLY what's needed to fix the exact example/tests we'd add. (its all to easy to find/replace thinking you are doing the right thing when actually we may not, it needs to be targetted changes)

But bottom line, I just don't see how we fix the .h problem? other than if the .clang-format section doesn't have a Language: Cpp then we don't assume its C++ (problem is it already does)

(actually this is a request that users can limit the guessing of languages to only those languages present in the .clang-format file)

This would be good for "C" projects but not much use for C++ or C++/C projects.

It wouldn't solve the delete(foo); issue this can still be a function or a delete of a bracketed variable, or a macro... so I don't see it helps

And bottom line the Lexer will give it to you as tok::kw_private and not as tok::identifier regardless of your language and so you'll need to perhaps do something in FormatTokenLexer to transform it back to an identifier (which is possible) checkout tryMergeLessLess() etc...

  • revert to corner case handling with private
  • revert bad changes and arclint
MyDeveloperDay accepted this revision.Jan 22 2022, 4:02 AM

new nits but basically I think this looks like it might handle this ok? LGTM

We need to run this across a large code base to check

clang/lib/Format/UnwrappedLineFormatter.cpp
106

remove extraneous {} braces

clang/lib/Format/FormatToken.h
127
clang/lib/Format/UnwrappedLineFormatter.cpp
104

In clang-format all lambdas I've seen start with a capital letter.

107

Please just return. And then drop the else.

clang/lib/Format/UnwrappedLineParser.cpp
2711
HazardyKnusperkeks requested changes to this revision.Jan 23 2022, 9:40 PM
This revision now requires changes to proceed.Jan 23 2022, 9:40 PM
psigillito marked 6 inline comments as done.

Review Changes

undo arclint change

Base diff on upstream

  • revert bad arclint changes
  • revert of .arclint file delete
curdeius requested changes to this revision.Jan 25 2022, 12:17 AM

Last nits, apart from this clean up, I'm OK.

clang/lib/Format/FormatToken.h
127

Not done it seems.
Please rename and use a different type. Maybe ImmutableSet? Or just a sorted std::vector?

clang/lib/Format/UnwrappedLineFormatter.cpp
109

Nit here and elsewhere: full stop at the end of comments.

113
116

Please use contains instead of count if using ImmutableSet.

clang/lib/Format/UnwrappedLineParser.cpp
2711
2720–2726
2721

Ditto. Use contains.

This revision now requires changes to proceed.Jan 25 2022, 12:17 AM
psigillito marked an inline comment as done.Jan 26 2022, 8:33 PM
psigillito added inline comments.
clang/lib/Format/FormatToken.h
127

ok updated to sorted std::vector

clang/lib/Format/UnwrappedLineFormatter.cpp
104

Updated.

Have you updated the diff? I still see std::set and not a sorted vector.

psigillito marked 8 inline comments as done.
psigillito marked 2 inline comments as done.
  • revert auto formatting comments
clang/lib/Format/UnwrappedLineFormatter.cpp
119–121

Drop Braces

122–126

Use std::binary_search instead of std::lower_bound. That should simplify the following if.

curdeius requested changes to this revision.Jan 28 2022, 1:36 PM
curdeius added inline comments.
clang/lib/Format/FormatToken.h
126
clang/lib/Format/UnwrappedLineFormatter.cpp
122–129

Please do something along these lines. The idea is to put a cheaper check first.

Note. I haven't tested nor formatted these lines.

884

Revert.

clang/lib/Format/UnwrappedLineParser.cpp
2717–2719

Please use binary_search and put it inside the else branch to avoid it if the first condition is satisfied.
Something like:

if (FormatTok->Tok.is(tok::colon)) {
...
} else if (!binary_search(...) {
} else if (...) {
}

Also, this code and the code in UnwrappedLineFormatter are pretty much similar.
Can we remove this duplication by e.g. setting the token kind here and checking it in the formatter?

This revision now requires changes to proceed.Jan 28 2022, 1:36 PM
psigillito marked 7 inline comments as done.Jan 29 2022, 10:28 AM
psigillito added inline comments.
clang/lib/Format/UnwrappedLineParser.cpp
2717–2719

Yes, since we now set the kind to 'identifier' here, the binary_search check in the formatter is unnecessary. Updated the formatter to no longer do the check.

psigillito marked an inline comment as done.
  • review changes, lower_bound to binary_search
curdeius accepted this revision.Jan 29 2022, 11:43 AM

LGTM but please add an assert(is_sorted).

clang/lib/Format/FormatToken.h
127

Is there a place anywhere that you verify it's sorted?
If no, please add an assert.

+1 for assert

clang/lib/Format/FormatToken.h
127

Could initialize from a lambda in which the assert is located, thus it is only executed once.

static const std::vector<clang::tok::TokenKind> COperatorsFollowingVar = []{
  std::vector<clang::tok::TokenKind> Ret...;
  assert(...);
  return Ret;
}();
This revision is now accepted and ready to land.Jan 29 2022, 12:50 PM
  • wrap vector in function
psigillito marked 2 inline comments as done.
  • unnecessary whitespace

Have you commit access, or do you need some one to commit it for you?

@HazardyKnusperkeks I do not have commit access. This is my first commit to the project. Do I just need to issue an 'arc land' command?

@HazardyKnusperkeks I do not have commit access. This is my first commit to the project. Do I just need to issue an 'arc land' command?

I've never used arc, so I can't tell you anything about that.

You need to apply for commit access, that's no big deal, you can find the address in the LLVM documentation. Or you post name and email for the commit here and someone will commit it on your behalf.

@HazardyKnusperkeks
Ok thanks, someone else can commit on my behalf while I wait for commit access.

name: Philip Sigillito
email: psigillito@gmail.com

Do I need to care about these pre-build checks failing?

Do I need to care about these pre-build checks failing?

The checks passed, the build on linux failed. But as far as I can see it's something in openmp. I assume you did run all the format tests, then everything should be fine.

Do I need to care about these pre-build checks failing?

The checks passed, the build on linux failed. But as far as I can see it's something in openmp. I assume you did run all the format tests, then everything should be fine.

Yes, I ran the format tests.

Thanks!

This revision was landed with ongoing or failed builds.Jan 30 2022, 11:57 AM
This revision was automatically updated to reflect the committed changes.