Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Sorry, this is my first time using Phabricator. I am not sure if it matters that .arclint is changed in this review, it is not part of my commit.
This issue only appears to impact the accessSpecifier keywords. For example, a c variable declared with the name 'private' will be treated as an access specifier.
I tried to make parsing the identifiers more restrictive by requiring either an @ prefix for obj-c or a colon after the keyword for cpp. Unfortunately, that does not allow an incorrect access specifier (i.e without a colon) to be identified as an accessor. I think a missing colon is probably a common typo so I dont want to remove identifying a access specifier with a missing colon correctly.
I think the correct approach is to check if the var named 'private' is followed by an operator or symbol indicating it is variable.
.arclint | ||
---|---|---|
5 ↗ | (On Diff #400340) | remove this change |
clang/lib/Format/UnwrappedLineFormatter.cpp | ||
102–103 | Style.isCpp() | |
clang/lib/Format/UnwrappedLineParser.cpp | ||
2728 | elide { | |
clang/unittests/Format/FormatTest.cpp | ||
3127 | This isn't enough of a test, if you are going to add this CppOperatorsFollowingVar then you'd better test every case |
Thanks for having a try on this.
However, I don't like this approach too much. You add many changes and a single test. That's not sufficient.
Also, handling C++ keywords in all cases (e.g. delete as a function name) *may* need to distinguish whether we format a C file or a C++ file. It's probably impossible to do this without user input (.h extension is used in both languages for example).
We'd maybe need to add C as language option and let the user specify the language (-x c?).
That in turn may be painful (because not automatic).
But, you may have a better solution.
My 2 cents.
.arclint | ||
---|---|---|
5 ↗ | (On Diff #400340) | Unrelated. Please revert. |
clang/lib/Format/UnwrappedLineFormatter.cpp | ||
102–103 | Please refactor to e.g. a lambda to ease the understanding. And use a series of ifs and returns. |
Thanks, I am just starting to understand some of the code base so my changes should be taken with a grain of salt. The more I think about this, I do not think my change is correct. As you pointed out with the 'delete' keyword, my changes would not correctly handle a struct named 'delete' i.e.
struct delete foo = {0}; delete.val;
I think adding a cmd line argument specifying you are using C code is more correct. If the user does not specify they are using C, clang-format would format the code as it does now.
The only thing I don't like about this is that it might be kind of clumsy to only be able to specify your language is C. I can take a stab at implementing this if this is the route we want to go.
clang/lib/Format/UnwrappedLineParser.cpp | ||
---|---|---|
2722–2731 | I get what your trying to do but what happens in this scenario private::mynamespace::g_value -1; |
These are just initial changes, there is still a lot of work and test cases to write. I figured I would put this out there to see if this is the direction we were thinking.
clang/lib/Format/UnwrappedLineParser.cpp | ||
---|---|---|
2722–2731 | Yea, it doesnt parse correctly and is split to a new line after private:. I am going to revert all the changes in this code review and start looking at implementing C as its own language. |
I'm not a fan of this approach of adding a "C" language, mainly because of the .h problem so ultimately it doesn't solve your problem.
I think this is overkill for what is basically the subtle handling of "public/private/protected/try/delete/new" being used as variables (which frankly is at least in my view not a bright idea in the first place!)
I think ultimately we can try to catch those corner cases but IMHO it doesn't need the addition of a whole new language for that, (we are not at that much on an impasse)
This review contains .arclint that needs removing, this review contains no tests
I actually thought you were on the right lines with your change to parseAccessSpecifier() surely if you see "private:" "public:" or "protected:" but not public:: that's when you call parseAccessSpecifier()
I think mostly I believe clang-format is a 96% done deal in terms of code, a lot of what we do here in my view is about closing those final "corner cases", sometimes our fixes are quite specific,
Something in the back of my head says this is a sledge hammer to crack a nut but that is just my view.. I'm just a little uncomfortable here with this change although it may open doors for other C specific changes, I'm not convinced this solves the problem I just think it moves it a little and maybe adds to complexity for what is likely a very small number of cases.
clang/lib/Format/TokenAnnotator.cpp | ||
---|---|---|
3238 ↗ | (On Diff #400718) | I wonder how many people using C do this? #define new(x) malloc(x) #define delete(x) free(x) Some at least https://github.com/jbf/tonsai-scheme/blob/83d2a022f60fc17f9913c017fbde29e19cdf5b2b/src/memory.h |
clang/lib/Format/UnwrappedLineFormatter.cpp | ||
100 | isC() these large OR/AND conditions become unreadable | |
clang/lib/Format/UnwrappedLineParser.cpp | ||
1295 | isC() | |
1678 | I feel everywhere you put isCpp you'll end up doing isCpp() || is C() This goes back to the arguments for a review I tried before D80079: [clang-format] [NFC] isCpp() is inconsistently used to mean both C++ and Objective C, add language specific isXXX() functions its really isCStyle() if we WERE going to pursue this, I'd want that and not have clauses added everywhere | |
clang/tools/clang-format/ClangFormat.cpp | ||
124 ↗ | (On Diff #400718) | Is this required for this patch or is this a nice to have, can we have a separate review for this, it may have merit in its own right but it needs it own tests |
467 ↗ | (On Diff #400718) | I don't understand what you are doing here or why? is this just for C it feels wrong |
469 ↗ | (On Diff #400718) | we don't use braces on single line if wouldn't you use a tolower or toupper here? |
For me the push back on adding the "C" language and having to pass it on the command line for the .h case is about VS Code, Vim, Visual Studio, Emacs having to recognize what language they are in and having to pass the -language=XXX flag, now we are pushing the interpretation of what language we are to them and not to us, which mean we are now out of control.
Each editor will make different assessments as to what language it is (likely only 98% correctly) and we'll get the blame when they say its C# and not C because they made the wrong decision
Personally, If we are going to add C as a language we need something that can control it without external intervention
something like..
// clang-format language='C'
or something that reads the header (like we used to have in Makefile)
---*- C++ -*-===//
especially if there is a sudo standard for that, and uses that as a language hint to say this is a "C" .h header
But I think it might be a mistake for us to rely on the editors to provide us that, and even adding the option could open a tsunami of incorrect language issues which we cannot solve.
Isn't the reality that developers using delete/new/try/catch/interface/public/protected/private/async/var/let as variables the real problem? I'm slightly averse to pandering to them, and I definitely don't want to make my world more complex just to accommodate their poor choice of variable name.
I know if someone used any of those words as variables in LLVM surely we'd tell them to change it. Honestly how much of a problem are we really talking about here that can't be handled by a few corner cases.
As MyDeveloperDay justly pointed out, adding a new language enum LK_C doesn't seem to be the way to go, as you mostly duplicate all the conditions (probably missing some of them).
For the usability, it would be really painful to specify the language for each file that should be considered as C.
So, sorry for this forth and back, but now at least we see that your initial approach was actually better (but please add tests!).
We'd maybe need to add C as language option and let the user specify the language (-x c?).
That in turn may be painful (because not automatic).
Yeah, I should have been more explicit here.
.arclint | ||
---|---|---|
15–16 ↗ | (On Diff #400718) | Please don't change unrelated files. |
.arclint | ||
---|---|---|
15–16 ↗ | (On Diff #400718) | I don't know about others but I personally don't use arcanist... I stage my files, make a patch and upload manually git add <files being modified> (or `git ls files -m | xargs git add` ) git clang-format git diff --cached -U4 > patch_to_submitt.diff I use the Create Diff/Update Diff functionality via the web interface (I feel that gives me more control) Just my 2c |
.arclint | ||
---|---|---|
15–16 ↗ | (On Diff #400718) | I do, but I never had to modify .arclint or whatever config file arcanist uses. Create and send a patch (if sending all changes not in main branch): arc diff main --reviewers ... Update a patch: arc diff main --update --revision D123456789 |
I was actually thinking that adding C as language is the right way, and still am. We have stuff only for arcane C variants, why not push these in the language C and keep C++ free from it?
But I'm fine with either solution.
clang/lib/Format/UnwrappedLineFormatter.cpp | ||
---|---|---|
100 | I'm more of a switch fan and not your isXXX(), but that's preference. ;) | |
clang/lib/Format/UnwrappedLineParser.cpp | ||
1678 | I think one could use isC(), isCpp(), isObjC(), and what is now isCpp(): isCish(). |
Ok, as I was starting to add a new language, the scope of changes just continued to grow.
If you think it is worthwhile, I think I can fix this edge case for accessSpecifiers by cleaning up my old approach and adding some tests. I dont like having to add a big set of operators to check against for handling the case where there is a typo and the colon is missing i.e.
class foo { private bool jim; public: bool bob; };
I think this is probably the most common error so I think we should support it.
I don't think the delete issue in https://github.com/llvm/llvm-project/issues/46915 is worth the added complexity. Without specifying the language, it is too hard to interpret the programmer's intention. For example, these are totally valid as either a delete or a function call:
delete(foo) // foo is a pointer being deleted delete(bar) // bar is a parameter to a function
@HazardyKnusperkeks (its not without merits to add a "C" Language the more i think about it, but I'm not sure quite in the form presented here)
If we DID add a "C" language then we need to add something like you suggested isCpp() already means more than we think.
bool isCStyle() { return isCpp() || isC(); }
Personally I really don't want to see the below change or it will change the world, so it would be better if the change was just from !Style.isCpp() to !Style.isCStyle()
!Style.isCpp() && !Style.isC()
The ClangFormat.cpp changes are really unrelated and need to be removed (or resubmitted/discussed separately), and we really must have extensive unit tests if we are adding a new language (which this review doesn't have any) - (that means a new FormatCTests.cpp)
I'd personally want to see really limited use of isCStyle() and ONLY what's needed to fix the exact example/tests we'd add. (its all to easy to find/replace thinking you are doing the right thing when actually we may not, it needs to be targetted changes)
But bottom line, I just don't see how we fix the .h problem? other than if the .clang-format section doesn't have a Language: Cpp then we don't assume its C++ (problem is it already does)
(actually this is a request that users can limit the guessing of languages to only those languages present in the .clang-format file)
This would be good for "C" projects but not much use for C++ or C++/C projects.
It wouldn't solve the delete(foo); issue this can still be a function or a delete of a bracketed variable, or a macro... so I don't see it helps
And bottom line the Lexer will give it to you as tok::kw_private and not as tok::identifier regardless of your language and so you'll need to perhaps do something in FormatTokenLexer to transform it back to an identifier (which is possible) checkout tryMergeLessLess() etc...
new nits but basically I think this looks like it might handle this ok? LGTM
We need to run this across a large code base to check
clang/lib/Format/UnwrappedLineFormatter.cpp | ||
---|---|---|
106 | remove extraneous {} braces |
clang/lib/Format/FormatToken.h | ||
---|---|---|
126 | And maybe choose a different container: https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc | |
clang/lib/Format/UnwrappedLineFormatter.cpp | ||
104 | In clang-format all lambdas I've seen start with a capital letter. | |
107 | Please just return. And then drop the else. | |
clang/lib/Format/UnwrappedLineParser.cpp | ||
2710 |
Last nits, apart from this clean up, I'm OK.
clang/lib/Format/FormatToken.h | ||
---|---|---|
126 | Not done it seems. | |
clang/lib/Format/UnwrappedLineFormatter.cpp | ||
109 | Nit here and elsewhere: full stop at the end of comments. | |
113 | ||
116 | Please use contains instead of count if using ImmutableSet. | |
clang/lib/Format/UnwrappedLineParser.cpp | ||
2710 | ||
2723–2729 | ||
2724 | Ditto. Use contains. |
- Merge branch 'main' of https://github.com/llvm/llvm-project
- code review syntax cleanup, sorted vector
clang/lib/Format/FormatToken.h | ||
---|---|---|
125 | ||
clang/lib/Format/UnwrappedLineFormatter.cpp | ||
122–129 | Please do something along these lines. The idea is to put a cheaper check first. Note. I haven't tested nor formatted these lines. | |
905 | Revert. | |
clang/lib/Format/UnwrappedLineParser.cpp | ||
2716–2718 | Please use binary_search and put it inside the else branch to avoid it if the first condition is satisfied. if (FormatTok->Tok.is(tok::colon)) { ... } else if (!binary_search(...) { } else if (...) { } Also, this code and the code in UnwrappedLineFormatter are pretty much similar. |
clang/lib/Format/UnwrappedLineParser.cpp | ||
---|---|---|
2716–2718 | Yes, since we now set the kind to 'identifier' here, the binary_search check in the formatter is unnecessary. Updated the formatter to no longer do the check. |
LGTM but please add an assert(is_sorted).
clang/lib/Format/FormatToken.h | ||
---|---|---|
126 | Is there a place anywhere that you verify it's sorted? |
+1 for assert
clang/lib/Format/FormatToken.h | ||
---|---|---|
126 | Could initialize from a lambda in which the assert is located, thus it is only executed once. static const std::vector<clang::tok::TokenKind> COperatorsFollowingVar = []{ std::vector<clang::tok::TokenKind> Ret...; assert(...); return Ret; }(); |
@HazardyKnusperkeks I do not have commit access. This is my first commit to the project. Do I just need to issue an 'arc land' command?
I've never used arc, so I can't tell you anything about that.
You need to apply for commit access, that's no big deal, you can find the address in the LLVM documentation. Or you post name and email for the commit here and someone will commit it on your behalf.
@HazardyKnusperkeks
Ok thanks, someone else can commit on my behalf while I wait for commit access.
name: Philip Sigillito
email: psigillito@gmail.com
The checks passed, the build on linux failed. But as far as I can see it's something in openmp. I assume you did run all the format tests, then everything should be fine.