Page MenuHomePhabricator

[clang] detect switch fallthrough marked by a comment (PR43465)
ClosedPublic

Authored by llunak on Feb 2 2020, 11:18 AM.

Details

Summary

This patch should implement https://bugs.llvm.org/show_bug.cgi?id=43465 . I didn't try too hard to handle as many cases as GCC does, but I expect this should do in practice, especially given that this is a deprecated technique.

Diff Detail

Event Timeline

llunak created this revision.Feb 2 2020, 11:18 AM
llunak updated this revision to Diff 241938.Feb 2 2020, 11:43 AM

Fix possible crash, cache regex.

Thank you for working on this!

clang/lib/Sema/AnalysisBasedWarnings.cpp
1216

You should make sure all the variables follow the usual naming conventions in (https://llvm.org/docs/CodingStandards.html), like SourceData or LineStart. Please be careful not to repeat type names though (so don't turn const Stmt *stmt into const Stmt *Stmt).

1227

Should we handle whitespace-only lines and ignore them?

Also, comments should be full sentences with capitalization and punctuation.

1231–1236

Is there a need for this to be a pointer? I would imagine the regex could just be a regular data member that is initialized by the class constructor (unless I'm missing something).

clang/test/Sema/fallthrough-comment.c
2

Is there a reason this test needs to be -std=c11 explicitly?

llunak updated this revision to Diff 242074.Feb 3 2020, 7:39 AM

Updated according to comments.

llunak marked 4 inline comments as done.Feb 3 2020, 7:43 AM
llunak added inline comments.
clang/lib/Sema/AnalysisBasedWarnings.cpp
1231–1236

It's not a need, but it's an optimization to make it on-demand. This should be rarely needed and presumably compiling a regex costs much more than an allocation.

clang/test/Sema/fallthrough-comment.c
2

As can be seen in fallthrough-attr.c, the exact wording of the warning depends on selected -std variant.

aaron.ballman added inline comments.Feb 3 2020, 7:47 AM
clang/lib/Sema/AnalysisBasedWarnings.cpp
1231–1236

Unless we have some performance numbers to suggest this is necessary, I'd prefer the more straight-forward member variable.

clang/test/Sema/fallthrough-comment.c
2

Ah, thank you for the reminder!

llunak marked an inline comment as done.Feb 3 2020, 9:08 AM
llunak added inline comments.
clang/lib/Sema/AnalysisBasedWarnings.cpp
1231–1236

For two randomly selected "normal" LibreOffice source files FallthroughMapper gets instantiated 500-600 times, and std::chrono::steady_clock clocks the creation at 50-80us, so per source file this is 40-60ms, or 3-4% of the total compilation time. The percent cost is a bit skewed by using a massive PCH, but still this is a bit hefty for essentially saving one line.
But I can make it an on-demand initialized member variable, if that helps you.

llunak updated this revision to Diff 242104.Feb 3 2020, 9:09 AM
aaron.ballman accepted this revision.Feb 3 2020, 10:06 AM

LGTM!

clang/lib/Sema/AnalysisBasedWarnings.cpp
1231–1236

Thank you for doing the measurements -- that seems significant enough to be worth the effort over. I am fine with the current approach.

This revision is now accepted and ready to land.Feb 3 2020, 10:06 AM
This revision was automatically updated to reflect the committed changes.

I think giving comments a semantic meaning is a bad idea. Do we really have to do this?

In addition to this making comments have semantic meaning, the meaning is different from gcc.
I don't think we should support this.

thakis added inline comments.Feb 12 2020, 5:30 AM
clang/lib/Sema/AnalysisBasedWarnings.cpp
1239

Also, this adds a regex match for every comment line, yes? Isn't this terrible for build performance? Did you do any benchmarking of this?

Now this patch should be reverted.
This patch also omitted cfe-commits lists.

hans added a subscriber: hans.Feb 12 2020, 5:40 AM

I also jumped when I saw that this now makes certain comments "load bearing". That doesn't seem like a great idea to me.

The warning may be all right for C++ code, which has an attribute to suppress it, but C code does not normally use such attributes, and has no standard syntax for them. I think it would be better if the warning was off by default for C code. Those C projects that wish could opt-in to it and jump through the hoops of applying attributes to silence the warning.

Also, rG398b wasn't sent to cfe-dev lists.

I think giving comments a semantic meaning is a bad idea. Do we really have to do this?

I think that ship has sailed, for instance, see // NOLINT comments that are also supported to silence diagnostics. Also, the original bug report mentions that flex generates code using these constructs, so there are real world libraries using this construct.

In addition to this making comments have semantic meaning, the meaning is different from gcc.

We're covering the cases GCC does that make sense within Clang -- or have we deviated from GCC in a way that is significant to our users?

I don't think we should support this.

I think we should -- C did not have any other way to resolve this issue without C2x support or compiler extensions, and fallthrough is definitely an issue in C. While it's not my favorite solution to silencing the diagnostics, it solves a real problem that C programmers are hitting.

Now this patch should be reverted.
This patch also omitted cfe-commits lists.

That is not an automatic reason to revert a patch, especially one that has been accepted by a code owner.

clang/lib/Sema/AnalysisBasedWarnings.cpp
1239

I think giving comments a semantic meaning is a bad idea. Do we really have to do this?

I think that ship has sailed, for instance, see // NOLINT comments that are also supported to silence diagnostics.

That's for linters, not for compilers.

Also, the original bug report mentions that flex generates code using these constructs, so there are real world libraries using this construct.

So what? If we went down this road, we'd now have /*OVERRIDE*/ instead of override, /*FINAL*/ instead of final, etc. Real world libraries can be updated. That's what we already did everywhere in C++ mode. No reason it can't happen in C mode too.

The usual progression is:

  • compiler-specific extensions (we now have __attribute__((fallthrough)) attached to an empty statement)
  • eventually, if it has legs, standardization

This disrupts this path.

In addition to this making comments have semantic meaning, the meaning is different from gcc.

We're covering the cases GCC does that make sense within Clang -- or have we deviated from GCC in a way that is significant to our users?

We give a not-well-specified and not documented subset of comments some meaning. We do this in a way that's different from GCC.

Also, as mentioned above, as-is I'd expect this patch to make builds measurably slower. That alone is revert reason enough.

I also jumped when I saw that this now makes certain comments "load bearing". That doesn't seem like a great idea to me.

It's an idea we already have in the project though with NOLINT comments, though that is a clang-tidy approach. So this does add load bearing comments in the frontend, but with plenty of precedence (both with clang-tidy and with GCC).

The warning may be all right for C++ code, which has an attribute to suppress it, but C code does not normally use such attributes, and has no standard syntax for them.

That's not quite true. C2x has [[attr]] attributes, but that requires enabling a custom compiler extension for non-C2x mode. Given that there are reasonably popular libraries like flex which already using comments, this is supporting a real use case that is also supported by GCC.

I think it would be better if the warning was off by default for C code. Those C projects that wish could opt-in to it and jump through the hoops of applying attributes to silence the warning.

The warning is off by default already for both C and C++. The issue being solved here is projects that enable the option in C.

I think giving comments a semantic meaning is a bad idea. Do we really have to do this?

I think that ship has sailed, for instance, see // NOLINT comments that are also supported to silence diagnostics.

That's for linters, not for compilers.

Yeah, I realized that later that we only support that in clang-tidy, so this is introducing a new concept to the frontend.

Also, the original bug report mentions that flex generates code using these constructs, so there are real world libraries using this construct.

So what? If we went down this road, we'd now have /*OVERRIDE*/ instead of override, /*FINAL*/ instead of final, etc. Real world libraries can be updated. That's what we already did everywhere in C++ mode. No reason it can't happen in C mode too.

This is a red herring. The feature we have is a comment that disables a diagnostic and has no semantics. You are equating that to features with semantic requirements like override or final. I don't think that's a particularly compelling argument.

The usual progression is:

  • compiler-specific extensions (we now have __attribute__((fallthrough)) attached to an empty statement)
  • eventually, if it has legs, standardization

    This disrupts this path.

Fair points. We also add features to Clang that exist in GCC and solve real problems as part of the usual progression.

In addition to this making comments have semantic meaning, the meaning is different from gcc.

We're covering the cases GCC does that make sense within Clang -- or have we deviated from GCC in a way that is significant to our users?

We give a not-well-specified and not documented subset of comments some meaning. We do this in a way that's different from GCC.

We can (and should!) fix up the documentation to make this more obvious. I am still not convinced we deviate from GCC in a meaningful way, but if it would make you more comfortable if we exactly match the GCC behavior identically, that seems reasonable.

Also, as mentioned above, as-is I'd expect this patch to make builds measurably slower. That alone is revert reason enough.

As mentioned above, the author did the timing measurements at reviewer request and it did not make it measurably slower. If we have evidence that this really does add a sufficient performance regression, I totally agree that we should revert (I had the same worries). Thus far, no one has provided actual evidence that this is the case and I would not want to see a feature rejected on the hunch that it might be a performance regression.

All this said, I am comfortable reverting this back out of the 10.0 branch while we consider it harder. It does not seem so critical that we can't take time to discuss it further.

All this said, I am comfortable reverting this back out of the 10.0 branch while we consider it harder. It does not seem so critical that we can't take time to discuss it further.

Ah, this may not be in the 10.0 branch yet anyway. I mostly would prefer to avoid churn where we put it in, pull it out, and put it back in, if there's a way to salvage the feature by addressing issues post-commit as we typically do. Obviously, if there's sufficient justification to pull it out and not put it back in, that's the solution we should pick.

I think that this patch is needed since we need to deal with this situation somehow because fallthru comments are used in many real world code.

But I am personally against extending it more and use comments to disable warnings generally.

llunak marked an inline comment as done.Feb 12 2020, 7:52 AM

This patch also omitted cfe-commits lists.

That is a Phabricator problem. https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface says to select 'Clang' as the repository, but after the monorepo switch that repository no longer exists and Phabricator says 'Inactive' for it. So I (presumably, I don't remember) selected the monorepo and tagged the issue with 'Clang'. If it's still required to use 'Clang' as the repository, then it shouldn't be marked as inactive, or alternatively cfe-commits should be added on the 'Clang' tag.

clang/lib/Sema/AnalysisBasedWarnings.cpp
1239

As said above, it's on-demand, and in code without unannotated fallthough it'll be triggered exactly zero times.

Doesn't GCC also support multiple different levels of this warning for all kinds of different spellings?
https://developers.redhat.com/blog/2017/03/10/wimplicit-fallthrough-in-gcc-7/

I have concerns about the performance of this warning. We're going to parse every comment?

rsmith added a subscriber: rsmith.EditedMar 2 2020, 8:37 AM

I'm also pretty concerned about using comments to drive warning behavior. We discussed this when first adding our -Wimplicit-fallthrough warning and its suppression mechanism and made an explicit decision that we did not want comments to affect our behaviour. I don't think anything has changed in that regard.

rsmith added a comment.Mar 2 2020, 1:03 PM

Per http://llvm.org/PR43465#c37, this patch has not had proper review and we have not established consensus that we want this. Please revert for now.

llunak added a comment.Mar 2 2020, 1:06 PM

We shouldn't enable the warning under -Wextra in language modes where there's no standard way to suppress it.

That may be true, but that is not what the bugreport is about, it explicitly mentions flex. There are codebases that explicitly use -Wimplicit-fallthrough, and flex generates the same code even for C++. So either Clang needs to handle it, or flex needs to change (which may be non-trivial if https://bugs.llvm.org/show_bug.cgi?id=43465#c24 is right), or codebases using flex will need to either special-case Clang or live with the warning.

rsmith added a comment.Mar 2 2020, 2:11 PM

We shouldn't enable the warning under -Wextra in language modes where there's no standard way to suppress it.

That may be true, but that is not what the bugreport is about, it explicitly mentions flex. There are codebases that explicitly use -Wimplicit-fallthrough, and flex generates the same code even for C++. So either Clang needs to handle it, or flex needs to change (which may be non-trivial if https://bugs.llvm.org/show_bug.cgi?id=43465#c24 is right), or codebases using flex will need to either special-case Clang or live with the warning.

... or turn off the warning for generated code. As noted on that bug, -Wimplicit-fallthrough enforces a particular coding style, so users should expect to need to not enable it in generated code that doesn't follow that style. You might need to turn off (eg) -Windent or -Wparentheses in flex-generated code too.

In any case, none of this has any bearing on whether this patch has had sufficient review. It hasn't, so it needs to be reverted for now.

In any case, none of this has any bearing on whether this patch has had sufficient review. It hasn't, so it needs to be reverted for now.

FYI: it was reverted by Luboš in c61401b89742f230b7e6a2cc0548fbf7442e906d

rsmith added a comment.Mar 2 2020, 3:12 PM

FYI: it was reverted by Luboš in c61401b89742f230b7e6a2cc0548fbf7442e906d

Thank you, Luboš, and sorry for the process problems here.

llunak added a comment.Mar 3 2020, 1:56 AM

Thank you, Luboš, and sorry for the process problems here.

FWIW, the Clang repository here in Phabricator is still 'Inactive', so even though https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface does say to use it, reviews may again not get sent to the correct commits list if somebody else gets confused by that too (especially for projects other than LLVM itself or Clang, as the page doesn't even mention what the mailing lists for those are).

I know this has been reverted, but for the record, the implementation has a few shortcomings that makes it hard to use it for projects that rely on fallthrough comments:

  • The comment is not recognized if it is on the same line as the last statement, e.g.
case 2:
  n++; // fallthrough
case 3:
  // ...
  • The implementation only scans until the next non-empty line and checks it for a comment, so there can't be a comment between the fallthrough comment and the last statement, making the following case not work:
case 2:
  if (some_bool)
    n++;
  // otherwise, don't increment n

// fallthrough
case 3:
  // ...
  • If the last statement ends in a macro, this implementation checks the source code at the definition of the macro, so this breaks:
#define SOME_MACRO 1

switch(c) {
  case 0:
    n += SOME_MACRO;
 //fallthrough
  case 1:
  // ...
}
  • I'm not 100% why this is the case, but for more complex if statements, the second branch will start the comment search in the line of the if statement:
case 0:
  if (n == 3 && p == 4)
    n++;

// fallthrough
case1:
  //...

They aren't hard to fix, but I think it would generally make more sense to search for a fallthrough comment upwards from a case statement instead of downwards. That way the "pick the first non-empty line matching the regex" logic should work, too.