This patch should implement https://bugs.llvm.org/show_bug.cgi?id=43465 . I didn't try too hard to handle as many cases as GCC does, but I expect this should do in practice, especially given that this is a deprecated technique.
Details
Diff Detail
Event Timeline
Thank you for working on this!
clang/lib/Sema/AnalysisBasedWarnings.cpp | ||
---|---|---|
1216 | You should make sure all the variables follow the usual naming conventions in (https://llvm.org/docs/CodingStandards.html), like SourceData or LineStart. Please be careful not to repeat type names though (so don't turn const Stmt *stmt into const Stmt *Stmt). | |
1227 | Should we handle whitespace-only lines and ignore them? Also, comments should be full sentences with capitalization and punctuation. | |
1231–1236 | Is there a need for this to be a pointer? I would imagine the regex could just be a regular data member that is initialized by the class constructor (unless I'm missing something). | |
clang/test/Sema/fallthrough-comment.c | ||
1 | Is there a reason this test needs to be -std=c11 explicitly? |
clang/lib/Sema/AnalysisBasedWarnings.cpp | ||
---|---|---|
1231–1236 | It's not a need, but it's an optimization to make it on-demand. This should be rarely needed and presumably compiling a regex costs much more than an allocation. | |
clang/test/Sema/fallthrough-comment.c | ||
1 | As can be seen in fallthrough-attr.c, the exact wording of the warning depends on selected -std variant. |
clang/lib/Sema/AnalysisBasedWarnings.cpp | ||
---|---|---|
1231–1236 | For two randomly selected "normal" LibreOffice source files FallthroughMapper gets instantiated 500-600 times, and std::chrono::steady_clock clocks the creation at 50-80us, so per source file this is 40-60ms, or 3-4% of the total compilation time. The percent cost is a bit skewed by using a massive PCH, but still this is a bit hefty for essentially saving one line. |
LGTM!
clang/lib/Sema/AnalysisBasedWarnings.cpp | ||
---|---|---|
1231–1236 | Thank you for doing the measurements -- that seems significant enough to be worth the effort over. I am fine with the current approach. |
I think giving comments a semantic meaning is a bad idea. Do we really have to do this?
In addition to this making comments have semantic meaning, the meaning is different from gcc.
I don't think we should support this.
clang/lib/Sema/AnalysisBasedWarnings.cpp | ||
---|---|---|
1239 | Also, this adds a regex match for every comment line, yes? Isn't this terrible for build performance? Did you do any benchmarking of this? |
I also jumped when I saw that this now makes certain comments "load bearing". That doesn't seem like a great idea to me.
The warning may be all right for C++ code, which has an attribute to suppress it, but C code does not normally use such attributes, and has no standard syntax for them. I think it would be better if the warning was off by default for C code. Those C projects that wish could opt-in to it and jump through the hoops of applying attributes to silence the warning.
I think that ship has sailed, for instance, see // NOLINT comments that are also supported to silence diagnostics. Also, the original bug report mentions that flex generates code using these constructs, so there are real world libraries using this construct.
In addition to this making comments have semantic meaning, the meaning is different from gcc.
We're covering the cases GCC does that make sense within Clang -- or have we deviated from GCC in a way that is significant to our users?
I don't think we should support this.
I think we should -- C did not have any other way to resolve this issue without C2x support or compiler extensions, and fallthrough is definitely an issue in C. While it's not my favorite solution to silencing the diagnostics, it solves a real problem that C programmers are hitting.
That is not an automatic reason to revert a patch, especially one that has been accepted by a code owner.
clang/lib/Sema/AnalysisBasedWarnings.cpp | ||
---|---|---|
1239 |
That's for linters, not for compilers.
Also, the original bug report mentions that flex generates code using these constructs, so there are real world libraries using this construct.
So what? If we went down this road, we'd now have /*OVERRIDE*/ instead of override, /*FINAL*/ instead of final, etc. Real world libraries can be updated. That's what we already did everywhere in C++ mode. No reason it can't happen in C mode too.
The usual progression is:
- compiler-specific extensions (we now have __attribute__((fallthrough)) attached to an empty statement)
- eventually, if it has legs, standardization
This disrupts this path.
In addition to this making comments have semantic meaning, the meaning is different from gcc.
We're covering the cases GCC does that make sense within Clang -- or have we deviated from GCC in a way that is significant to our users?
We give a not-well-specified and not documented subset of comments some meaning. We do this in a way that's different from GCC.
Also, as mentioned above, as-is I'd expect this patch to make builds measurably slower. That alone is revert reason enough.
It's an idea we already have in the project though with NOLINT comments, though that is a clang-tidy approach. So this does add load bearing comments in the frontend, but with plenty of precedence (both with clang-tidy and with GCC).
The warning may be all right for C++ code, which has an attribute to suppress it, but C code does not normally use such attributes, and has no standard syntax for them.
That's not quite true. C2x has [[attr]] attributes, but that requires enabling a custom compiler extension for non-C2x mode. Given that there are reasonably popular libraries like flex which already using comments, this is supporting a real use case that is also supported by GCC.
I think it would be better if the warning was off by default for C code. Those C projects that wish could opt-in to it and jump through the hoops of applying attributes to silence the warning.
The warning is off by default already for both C and C++. The issue being solved here is projects that enable the option in C.
Yeah, I realized that later that we only support that in clang-tidy, so this is introducing a new concept to the frontend.
Also, the original bug report mentions that flex generates code using these constructs, so there are real world libraries using this construct.
So what? If we went down this road, we'd now have /*OVERRIDE*/ instead of override, /*FINAL*/ instead of final, etc. Real world libraries can be updated. That's what we already did everywhere in C++ mode. No reason it can't happen in C mode too.
This is a red herring. The feature we have is a comment that disables a diagnostic and has no semantics. You are equating that to features with semantic requirements like override or final. I don't think that's a particularly compelling argument.
The usual progression is:
- compiler-specific extensions (we now have __attribute__((fallthrough)) attached to an empty statement)
- eventually, if it has legs, standardization
This disrupts this path.
Fair points. We also add features to Clang that exist in GCC and solve real problems as part of the usual progression.
In addition to this making comments have semantic meaning, the meaning is different from gcc.
We're covering the cases GCC does that make sense within Clang -- or have we deviated from GCC in a way that is significant to our users?
We give a not-well-specified and not documented subset of comments some meaning. We do this in a way that's different from GCC.
We can (and should!) fix up the documentation to make this more obvious. I am still not convinced we deviate from GCC in a meaningful way, but if it would make you more comfortable if we exactly match the GCC behavior identically, that seems reasonable.
Also, as mentioned above, as-is I'd expect this patch to make builds measurably slower. That alone is revert reason enough.
As mentioned above, the author did the timing measurements at reviewer request and it did not make it measurably slower. If we have evidence that this really does add a sufficient performance regression, I totally agree that we should revert (I had the same worries). Thus far, no one has provided actual evidence that this is the case and I would not want to see a feature rejected on the hunch that it might be a performance regression.
All this said, I am comfortable reverting this back out of the 10.0 branch while we consider it harder. It does not seem so critical that we can't take time to discuss it further.
Ah, this may not be in the 10.0 branch yet anyway. I mostly would prefer to avoid churn where we put it in, pull it out, and put it back in, if there's a way to salvage the feature by addressing issues post-commit as we typically do. Obviously, if there's sufficient justification to pull it out and not put it back in, that's the solution we should pick.
I think that this patch is needed since we need to deal with this situation somehow because fallthru comments are used in many real world code.
But I am personally against extending it more and use comments to disable warnings generally.
That is a Phabricator problem. https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface says to select 'Clang' as the repository, but after the monorepo switch that repository no longer exists and Phabricator says 'Inactive' for it. So I (presumably, I don't remember) selected the monorepo and tagged the issue with 'Clang'. If it's still required to use 'Clang' as the repository, then it shouldn't be marked as inactive, or alternatively cfe-commits should be added on the 'Clang' tag.
clang/lib/Sema/AnalysisBasedWarnings.cpp | ||
---|---|---|
1239 | As said above, it's on-demand, and in code without unannotated fallthough it'll be triggered exactly zero times. |
Doesn't GCC also support multiple different levels of this warning for all kinds of different spellings?
https://developers.redhat.com/blog/2017/03/10/wimplicit-fallthrough-in-gcc-7/
I have concerns about the performance of this warning. We're going to parse every comment?
I'm also pretty concerned about using comments to drive warning behavior. We discussed this when first adding our -Wimplicit-fallthrough warning and its suppression mechanism and made an explicit decision that we did not want comments to affect our behaviour. I don't think anything has changed in that regard.
Per http://llvm.org/PR43465#c37, this patch has not had proper review and we have not established consensus that we want this. Please revert for now.
That may be true, but that is not what the bugreport is about, it explicitly mentions flex. There are codebases that explicitly use -Wimplicit-fallthrough, and flex generates the same code even for C++. So either Clang needs to handle it, or flex needs to change (which may be non-trivial if https://bugs.llvm.org/show_bug.cgi?id=43465#c24 is right), or codebases using flex will need to either special-case Clang or live with the warning.
... or turn off the warning for generated code. As noted on that bug, -Wimplicit-fallthrough enforces a particular coding style, so users should expect to need to not enable it in generated code that doesn't follow that style. You might need to turn off (eg) -Windent or -Wparentheses in flex-generated code too.
In any case, none of this has any bearing on whether this patch has had sufficient review. It hasn't, so it needs to be reverted for now.
FWIW, the Clang repository here in Phabricator is still 'Inactive', so even though https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface does say to use it, reviews may again not get sent to the correct commits list if somebody else gets confused by that too (especially for projects other than LLVM itself or Clang, as the page doesn't even mention what the mailing lists for those are).
I know this has been reverted, but for the record, the implementation has a few shortcomings that makes it hard to use it for projects that rely on fallthrough comments:
- The comment is not recognized if it is on the same line as the last statement, e.g.
case 2: n++; // fallthrough case 3: // ...
- The implementation only scans until the next non-empty line and checks it for a comment, so there can't be a comment between the fallthrough comment and the last statement, making the following case not work:
case 2: if (some_bool) n++; // otherwise, don't increment n // fallthrough case 3: // ...
- If the last statement ends in a macro, this implementation checks the source code at the definition of the macro, so this breaks:
#define SOME_MACRO 1 switch(c) { case 0: n += SOME_MACRO; //fallthrough case 1: // ... }
- I'm not 100% why this is the case, but for more complex if statements, the second branch will start the comment search in the line of the if statement:
case 0: if (n == 3 && p == 4) n++; // fallthrough case1: //...
They aren't hard to fix, but I think it would generally make more sense to search for a fallthrough comment upwards from a case statement instead of downwards. That way the "pick the first non-empty line matching the regex" logic should work, too.
You should make sure all the variables follow the usual naming conventions in (https://llvm.org/docs/CodingStandards.html), like SourceData or LineStart. Please be careful not to repeat type names though (so don't turn const Stmt *stmt into const Stmt *Stmt).