Download Raw Diff

Details

Reviewers

Commits

rG398b4ed87d48: [clang] detect switch fallthrough marked by a comment (PR43465)

Summary

This patch should implement https://bugs.llvm.org/show_bug.cgi?id=43465 . I didn't try too hard to handle as many cases as GCC does, but I expect this should do in practice, especially given that this is a deprecated technique.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

llunak created this revision.Feb 2 2020, 11:18 AM

nickdesaulniers added a subscriber: nickdesaulniers.Feb 2 2020, 11:26 AM

Fix possible crash, cache regex.

Thank you for working on this!

clang/lib/Sema/AnalysisBasedWarnings.cpp
1216	You should make sure all the variables follow the usual naming conventions in (https://llvm.org/docs/CodingStandards.html), like `SourceData` or `LineStart`. Please be careful not to repeat type names though (so don't turn `const Stmt stmt` into `const Stmt Stmt`).
1227	Should we handle whitespace-only lines and ignore them? Also, comments should be full sentences with capitalization and punctuation.
1231–1236	Is there a need for this to be a pointer? I would imagine the regex could just be a regular data member that is initialized by the class constructor (unless I'm missing something).
clang/test/Sema/fallthrough-comment.c
2	Is there a reason this test needs to be `-std=c11` explicitly?

Updated according to comments.

llunak marked 4 inline comments as done.Feb 3 2020, 7:43 AM

llunak added inline comments.

clang/lib/Sema/AnalysisBasedWarnings.cpp
1231–1236	It's not a need, but it's an optimization to make it on-demand. This should be rarely needed and presumably compiling a regex costs much more than an allocation.
clang/test/Sema/fallthrough-comment.c
2	As can be seen in fallthrough-attr.c, the exact wording of the warning depends on selected -std variant.

aaron.ballman added inline comments.Feb 3 2020, 7:47 AM

clang/lib/Sema/AnalysisBasedWarnings.cpp
1231–1236	Unless we have some performance numbers to suggest this is necessary, I'd prefer the more straight-forward member variable.
clang/test/Sema/fallthrough-comment.c
2	Ah, thank you for the reminder!

llunak marked an inline comment as done.Feb 3 2020, 9:08 AM

llunak added inline comments.

clang/lib/Sema/AnalysisBasedWarnings.cpp
1231–1236	For two randomly selected "normal" LibreOffice source files FallthroughMapper gets instantiated 500-600 times, and std::chrono::steady_clock clocks the creation at 50-80us, so per source file this is 40-60ms, or 3-4% of the total compilation time. The percent cost is a bit skewed by using a massive PCH, but still this is a bit hefty for essentially saving one line. But I can make it an on-demand initialized member variable, if that helps you.

llunak updated this revision to Diff 242104.Feb 3 2020, 9:09 AM

LGTM!

clang/lib/Sema/AnalysisBasedWarnings.cpp
1231–1236	Thank you for doing the measurements -- that seems significant enough to be worth the effort over. I am fine with the current approach.

This revision is now accepted and ready to land.Feb 3 2020, 10:06 AM

Closed by commit rG398b4ed87d48: [clang] detect switch fallthrough marked by a comment (PR43465) (authored by llunak). · Explain WhyFeb 3 2020, 10:41 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: cfe-commits. · View Herald TranscriptFeb 3 2020, 10:41 AM

I think giving comments a semantic meaning is a bad idea. Do we really have to do this?

In addition to this making comments have semantic meaning, the meaning is different from gcc.
I don't think we should support this.

thakis added inline comments.Feb 12 2020, 5:30 AM

clang/lib/Sema/AnalysisBasedWarnings.cpp
1239	Also, this adds a regex match for every comment line, yes? Isn't this terrible for build performance? Did you do any benchmarking of this?

Now this patch should be reverted.
This patch also omitted cfe-commits lists.

I also jumped when I saw that this now makes certain comments "load bearing". That doesn't seem like a great idea to me.

The warning may be all right for C++ code, which has an attribute to suppress it, but C code does not normally use such attributes, and has no standard syntax for them. I think it would be better if the warning was off by default for C code. Those C projects that wish could opt-in to it and jump through the hoops of applying attributes to silence the warning.

Also, rG398b wasn't sent to cfe-dev lists.

In D73852#1872000, @thakis wrote:

I think giving comments a semantic meaning is a bad idea. Do we really have to do this?

I think that ship has sailed, for instance, see // NOLINT comments that are also supported to silence diagnostics. Also, the original bug report mentions that flex generates code using these constructs, so there are real world libraries using this construct.

In addition to this making comments have semantic meaning, the meaning is different from gcc.

We're covering the cases GCC does that make sense within Clang -- or have we deviated from GCC in a way that is significant to our users?

I don't think we should support this.

I think we should -- C did not have any other way to resolve this issue without C2x support or compiler extensions, and fallthrough is definitely an issue in C. While it's not my favorite solution to silencing the diagnostics, it solves a real problem that C programmers are hitting.

In D73852#1872013, @lebedev.ri wrote:

Now this patch should be reverted.
This patch also omitted cfe-commits lists.

That is not an automatic reason to revert a patch, especially one that has been accepted by a code owner.

clang/lib/Sema/AnalysisBasedWarnings.cpp
1239	https://reviews.llvm.org/D73852#inline-671309

In D73852#1872044, @aaron.ballman wrote:

In D73852#1872000, @thakis wrote:

I think giving comments a semantic meaning is a bad idea. Do we really have to do this?

I think that ship has sailed, for instance, see // NOLINT comments that are also supported to silence diagnostics.

That's for linters, not for compilers.

Also, the original bug report mentions that flex generates code using these constructs, so there are real world libraries using this construct.

So what? If we went down this road, we'd now have /*OVERRIDE*/ instead of override, /*FINAL*/ instead of final, etc. Real world libraries can be updated. That's what we already did everywhere in C++ mode. No reason it can't happen in C mode too.

The usual progression is:

compiler-specific extensions (we now have __attribute__((fallthrough)) attached to an empty statement)
eventually, if it has legs, standardization

This disrupts this path.

In addition to this making comments have semantic meaning, the meaning is different from gcc.

We're covering the cases GCC does that make sense within Clang -- or have we deviated from GCC in a way that is significant to our users?

We give a not-well-specified and not documented subset of comments some meaning. We do this in a way that's different from GCC.

Also, as mentioned above, as-is I'd expect this patch to make builds measurably slower. That alone is revert reason enough.

In D73852#1872019, @hans wrote:

I also jumped when I saw that this now makes certain comments "load bearing". That doesn't seem like a great idea to me.

It's an idea we already have in the project though with NOLINT comments, though that is a clang-tidy approach. So this does add load bearing comments in the frontend, but with plenty of precedence (both with clang-tidy and with GCC).

The warning may be all right for C++ code, which has an attribute to suppress it, but C code does not normally use such attributes, and has no standard syntax for them.

That's not quite true. C2x has [[attr]] attributes, but that requires enabling a custom compiler extension for non-C2x mode. Given that there are reasonably popular libraries like flex which already using comments, this is supporting a real use case that is also supported by GCC.

I think it would be better if the warning was off by default for C code. Those C projects that wish could opt-in to it and jump through the hoops of applying attributes to silence the warning.

The warning is off by default already for both C and C++. The issue being solved here is projects that enable the option in C.

In D73852#1872064, @thakis wrote:

In D73852#1872044, @aaron.ballman wrote:

In D73852#1872000, @thakis wrote:

I think giving comments a semantic meaning is a bad idea. Do we really have to do this?

I think that ship has sailed, for instance, see // NOLINT comments that are also supported to silence diagnostics.

That's for linters, not for compilers.

Yeah, I realized that later that we only support that in clang-tidy, so this is introducing a new concept to the frontend.

Also, the original bug report mentions that flex generates code using these constructs, so there are real world libraries using this construct.

So what? If we went down this road, we'd now have /*OVERRIDE*/ instead of override, /*FINAL*/ instead of final, etc. Real world libraries can be updated. That's what we already did everywhere in C++ mode. No reason it can't happen in C mode too.

This is a red herring. The feature we have is a comment that disables a diagnostic and has no semantics. You are equating that to features with semantic requirements like override or final. I don't think that's a particularly compelling argument.

The usual progression is:

compiler-specific extensions (we now have __attribute__((fallthrough)) attached to an empty statement)

eventually, if it has legs, standardization

This disrupts this path.

Fair points. We also add features to Clang that exist in GCC and solve real problems as part of the usual progression.

In addition to this making comments have semantic meaning, the meaning is different from gcc.

We're covering the cases GCC does that make sense within Clang -- or have we deviated from GCC in a way that is significant to our users?

We give a not-well-specified and not documented subset of comments some meaning. We do this in a way that's different from GCC.

We can (and should!) fix up the documentation to make this more obvious. I am still not convinced we deviate from GCC in a meaningful way, but if it would make you more comfortable if we exactly match the GCC behavior identically, that seems reasonable.

Also, as mentioned above, as-is I'd expect this patch to make builds measurably slower. That alone is revert reason enough.

As mentioned above, the author did the timing measurements at reviewer request and it did not make it measurably slower. If we have evidence that this really does add a sufficient performance regression, I totally agree that we should revert (I had the same worries). Thus far, no one has provided actual evidence that this is the case and I would not want to see a feature rejected on the hunch that it might be a performance regression.

All this said, I am comfortable reverting this back out of the 10.0 branch while we consider it harder. It does not seem so critical that we can't take time to discuss it further.

In D73852#1872104, @aaron.ballman wrote:

All this said, I am comfortable reverting this back out of the 10.0 branch while we consider it harder. It does not seem so critical that we can't take time to discuss it further.

Ah, this may not be in the 10.0 branch yet anyway. I mostly would prefer to avoid churn where we put it in, pull it out, and put it back in, if there's a way to salvage the feature by addressing issues post-commit as we typically do. Obviously, if there's sufficient justification to pull it out and not put it back in, that's the solution we should pick.

I think that this patch is needed since we need to deal with this situation somehow because fallthru comments are used in many real world code.

But I am personally against extending it more and use comments to disable warnings generally.

In D73852#1872013, @lebedev.ri wrote:

This patch also omitted cfe-commits lists.

That is a Phabricator problem. https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface says to select 'Clang' as the repository, but after the monorepo switch that repository no longer exists and Phabricator says 'Inactive' for it. So I (presumably, I don't remember) selected the monorepo and tagged the issue with 'Clang'. If it's still required to use 'Clang' as the repository, then it shouldn't be marked as inactive, or alternatively cfe-commits should be added on the 'Clang' tag.

clang/lib/Sema/AnalysisBasedWarnings.cpp
1239	As said above, it's on-demand, and in code without unannotated fallthough it'll be triggered exactly zero times.

Doesn't GCC also support multiple different levels of this warning for all kinds of different spellings?
https://developers.redhat.com/blog/2017/03/10/wimplicit-fallthrough-in-gcc-7/

I have concerns about the performance of this warning. We're going to parse every comment?

I'm also pretty concerned about using comments to drive warning behavior. We discussed this when first adding our -Wimplicit-fallthrough warning and its suppression mechanism and made an explicit decision that we did not want comments to affect our behaviour. I don't think anything has changed in that regard.

Per http://llvm.org/PR43465#c37, this patch has not had proper review and we have not established consensus that we want this. Please revert for now.

In D73852#1901186, @rsmith wrote:

We shouldn't enable the warning under -Wextra in language modes where there's no standard way to suppress it.

That may be true, but that is not what the bugreport is about, it explicitly mentions flex. There are codebases that explicitly use -Wimplicit-fallthrough, and flex generates the same code even for C++. So either Clang needs to handle it, or flex needs to change (which may be non-trivial if https://bugs.llvm.org/show_bug.cgi?id=43465#c24 is right), or codebases using flex will need to either special-case Clang or live with the warning.

In D73852#1901699, @llunak wrote:

In D73852#1901186, @rsmith wrote:

We shouldn't enable the warning under -Wextra in language modes where there's no standard way to suppress it.

That may be true, but that is not what the bugreport is about, it explicitly mentions flex. There are codebases that explicitly use -Wimplicit-fallthrough, and flex generates the same code even for C++. So either Clang needs to handle it, or flex needs to change (which may be non-trivial if https://bugs.llvm.org/show_bug.cgi?id=43465#c24 is right), or codebases using flex will need to either special-case Clang or live with the warning.

... or turn off the warning for generated code. As noted on that bug, -Wimplicit-fallthrough enforces a particular coding style, so users should expect to need to not enable it in generated code that doesn't follow that style. You might need to turn off (eg) -Windent or -Wparentheses in flex-generated code too.

In any case, none of this has any bearing on whether this patch has had sufficient review. It hasn't, so it needs to be reverted for now.

In D73852#1901793, @rsmith wrote:

In any case, none of this has any bearing on whether this patch has had sufficient review. It hasn't, so it needs to be reverted for now.

FYI: it was reverted by Luboš in c61401b89742f230b7e6a2cc0548fbf7442e906d

In D73852#1901836, @aaron.ballman wrote:

FYI: it was reverted by Luboš in c61401b89742f230b7e6a2cc0548fbf7442e906d

Thank you, Luboš, and sorry for the process problems here.

In D73852#1901895, @rsmith wrote:

Thank you, Luboš, and sorry for the process problems here.

FWIW, the Clang repository here in Phabricator is still 'Inactive', so even though https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface does say to use it, reviews may again not get sent to the correct commits list if somebody else gets confused by that too (especially for projects other than LLVM itself or Clang, as the page doesn't even mention what the mailing lists for those are).

I know this has been reverted, but for the record, the implementation has a few shortcomings that makes it hard to use it for projects that rely on fallthrough comments:

The comment is not recognized if it is on the same line as the last statement, e.g.

case 2:
  n++; // fallthrough
case 3:
  // ...

The implementation only scans until the next non-empty line and checks it for a comment, so there can't be a comment between the fallthrough comment and the last statement, making the following case not work:

case 2:
  if (some_bool)
    n++;
  // otherwise, don't increment n

// fallthrough
case 3:
  // ...

If the last statement ends in a macro, this implementation checks the source code at the definition of the macro, so this breaks:

#define SOME_MACRO 1

switch(c) {
  case 0:
    n += SOME_MACRO;
 //fallthrough
  case 1:
  // ...
}

I'm not 100% why this is the case, but for more complex if statements, the second branch will start the comment search in the line of the if statement:

case 0:
  if (n == 3 && p == 4)
    n++;

// fallthrough
case1:
  //...

They aren't hard to fix, but I think it would generally make more sense to search for a fallthrough comment upwards from a case statement instead of downwards. That way the "pick the first non-empty line matching the regex" logic should work, too.

Diff 242125

clang/lib/Sema/AnalysisBasedWarnings.cpp

Show First 20 Lines • Show All 1,142 Lines • ▼ Show 20 Lines	bool checkFallThroughIntoBlock(const CFGBlock &B, int &AnnotatedCnt,

if (!LastStmt) { // This block contains no executable statements.		if (!LastStmt) { // This block contains no executable statements.
// Traverse its predecessors.		// Traverse its predecessors.
std::copy(P->pred_begin(), P->pred_end(),		std::copy(P->pred_begin(), P->pred_end(),
std::back_inserter(BlockQueue));		std::back_inserter(BlockQueue));
continue;		continue;
}		}

		if (isFollowedByFallThroughComment(LastStmt)) {
		++AnnotatedCnt;
		continue; // Fallthrough comment, good.
		}

++UnannotatedCnt;		++UnannotatedCnt;
}		}
return !!UnannotatedCnt;		return !!UnannotatedCnt;
}		}

// RecursiveASTVisitor setup.		// RecursiveASTVisitor setup.
bool shouldWalkTypesOfTypeLocs() const { return false; }		bool shouldWalkTypesOfTypeLocs() const { return false; }

▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	static const Stmt *getLastStmt(const CFGBlock &B) {
// case X: ; case Y:		// case X: ; case Y:
if (const SwitchCase *SW = dyn_cast_or_null<SwitchCase>(B.getLabel()))		if (const SwitchCase *SW = dyn_cast_or_null<SwitchCase>(B.getLabel()))
if (!isa<SwitchCase>(SW->getSubStmt()))		if (!isa<SwitchCase>(SW->getSubStmt()))
return SW->getSubStmt();		return SW->getSubStmt();

return nullptr;		return nullptr;
}		}

		bool isFollowedByFallThroughComment(const Stmt *Statement) {
		aaron.ballmanUnsubmitted Done Reply Inline Actions You should make sure all the variables follow the usual naming conventions in (https://llvm.org/docs/CodingStandards.html), like `SourceData` or `LineStart`. Please be careful not to repeat type names though (so don't turn `const Stmt stmt` into `const Stmt Stmt`). aaron.ballman: You should make sure all the variables follow the usual naming conventions in (https://llvm.
		// Try to detect whether the fallthough is marked by a comment like
		// /FALLTHOUGH/.
		bool Invalid;
		const char *SourceData = S.getSourceManager().getCharacterData(
		Statement->getEndLoc(), &Invalid);
		if (Invalid)
		return false;
		const char *LineStart = SourceData;
		for (;;) {
		LineStart = strchr(LineStart, '\n');
		if (LineStart == nullptr)
		aaron.ballmanUnsubmitted Done Reply Inline Actions Should we handle whitespace-only lines and ignore them? Also, comments should be full sentences with capitalization and punctuation. aaron.ballman: Should we handle whitespace-only lines and ignore them? Also, comments should be full…
		return false;
		++LineStart; // Start of next line.
		const char *LineEnd = strchr(LineStart, '\n');
		StringRef Line(LineStart,
		LineEnd ? LineEnd - LineStart : strlen(LineStart));
		if (LineStart == LineEnd \|\|
		Line.find_first_not_of(" \t\r") == StringRef::npos)
		continue; // Whitespace-only line.
		if (!FallthroughRegex.isValid())
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions Is there a need for this to be a pointer? I would imagine the regex could just be a regular data member that is initialized by the class constructor (unless I'm missing something). aaron.ballman: Is there a need for this to be a pointer? I would imagine the regex could just be a regular…
		llunakAuthorUnsubmitted Done Reply Inline Actions It's not a need, but it's an optimization to make it on-demand. This should be rarely needed and presumably compiling a regex costs much more than an allocation. llunak: It's not a need, but it's an optimization to make it on-demand. This should be rarely needed…
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions Unless we have some performance numbers to suggest this is necessary, I'd prefer the more straight-forward member variable. aaron.ballman: Unless we have some performance numbers to suggest this is necessary, I'd prefer the more…
		llunakAuthorUnsubmitted Done Reply Inline Actions For two randomly selected "normal" LibreOffice source files FallthroughMapper gets instantiated 500-600 times, and std::chrono::steady_clock clocks the creation at 50-80us, so per source file this is 40-60ms, or 3-4% of the total compilation time. The percent cost is a bit skewed by using a massive PCH, but still this is a bit hefty for essentially saving one line. But I can make it an on-demand initialized member variable, if that helps you. llunak: For two randomly selected "normal" LibreOffice source files FallthroughMapper gets instantiated…
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions Thank you for doing the measurements -- that seems significant enough to be worth the effort over. I am fine with the current approach. aaron.ballman: Thank you for doing the measurements -- that seems significant enough to be worth the effort…
		FallthroughRegex =
		llvm::Regex("(/\\[ \\t]fall(s \| \|-)?thr(ough\|u)\\.?[ \\t]\\/)"
		"\|(//[ \\t]fall(s \| \|-)?thr(ough\|u)\\.?[ \\t])",
		thakisUnsubmitted Not Done Reply Inline Actions Also, this adds a regex match for every comment line, yes? Isn't this terrible for build performance? Did you do any benchmarking of this? thakis: Also, this adds a regex match for every comment line, yes? Isn't this terrible for build…
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions https://reviews.llvm.org/D73852#inline-671309 aaron.ballman: https://reviews.llvm.org/D73852#inline-671309
		llunakAuthorUnsubmitted Done Reply Inline Actions As said above, it's on-demand, and in code without unannotated fallthough it'll be triggered exactly zero times. llunak: As said above, it's on-demand, and in code without unannotated fallthough it'll be triggered…
		llvm::Regex::IgnoreCase);
		assert(FallthroughRegex.isValid());
		return FallthroughRegex.match(Line);
		}
		}

bool FoundSwitchStatements;		bool FoundSwitchStatements;
AttrStmts FallthroughStmts;		AttrStmts FallthroughStmts;
Sema &S;		Sema &S;
llvm::SmallPtrSet<const CFGBlock *, 16> ReachableBlocks;		llvm::SmallPtrSet<const CFGBlock *, 16> ReachableBlocks;
		llvm::Regex FallthroughRegex;
};		};
} // anonymous namespace		} // anonymous namespace

static StringRef getFallthroughAttrSpelling(Preprocessor &PP,		static StringRef getFallthroughAttrSpelling(Preprocessor &PP,
SourceLocation Loc) {		SourceLocation Loc) {
TokenValue FallthroughTokens[] = {		TokenValue FallthroughTokens[] = {
tok::l_square, tok::l_square,		tok::l_square, tok::l_square,
PP.getIdentifierInfo("fallthrough"),		PP.getIdentifierInfo("fallthrough"),
▲ Show 20 Lines • Show All 1,066 Lines • Show Last 20 Lines

clang/test/Sema/fallthrough-comment.c

This file was added.

				// RUN: %clang_cc1 -fsyntax-only -std=c11 -verify -Wimplicit-fallthrough %s

				aaron.ballmanUnsubmitted Not Done Reply Inline Actions Is there a reason this test needs to be `-std=c11` explicitly? aaron.ballman: Is there a reason this test needs to be `-std=c11` explicitly?
				llunakAuthorUnsubmitted Done Reply Inline Actions As can be seen in fallthrough-attr.c, the exact wording of the warning depends on selected -std variant. llunak: As can be seen in fallthrough-attr.c, the exact wording of the warning depends on selected -std…
				aaron.ballmanUnsubmitted Not Done Reply Inline Actions Ah, thank you for the reminder! aaron.ballman: Ah, thank you for the reminder!
				int fallthrough_comment(int n) {
				switch (n) {
				case 0:
				n++;
				// FALLTHROUGH
				case 1:
				n++;

				/fall-through./

				case 2:
				n++;
				case 3: // expected-warning{{unannotated fall-through between switch labels}} expected-note{{insert '__attribute__((fallthrough));' to silence this warning}} expected-note{{insert 'break;' to avoid fall-through}}
				n++;
				break;
				}
				return n;
				}

This is an archive of the discontinued LLVM Phabricator instance.

[clang] detect switch fallthrough marked by a comment (PR43465)
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 242125

clang/lib/Sema/AnalysisBasedWarnings.cpp

clang/test/Sema/fallthrough-comment.c

This is an archive of the discontinued LLVM Phabricator instance.

[clang] detect switch fallthrough marked by a comment (PR43465)ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 242125

clang/lib/Sema/AnalysisBasedWarnings.cpp

clang/test/Sema/fallthrough-comment.c

[clang] detect switch fallthrough marked by a comment (PR43465)
ClosedPublic