This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] find/fix unneeded semicolon after switch
Needs RevisionPublic

Authored by trixirt on Oct 26 2020, 11:28 AM.

Details

Summary

Cleaning up -Wextra-semi-stmt in the linux kernel shows a high
incidence of 'switch() {} ;' The ';' is not needed.

This is special case of the Parser::ConsumeExtraSemi() fixer.
This fixer allows the fixes to be batched and takes care of the
formatting issue of having an empty line when the ';' is removed.

Diff Detail

Event Timeline

trixirt created this revision.Oct 26 2020, 11:28 AM
trixirt requested review of this revision.Oct 26 2020, 11:28 AM
tmroeder added inline comments.Oct 26 2020, 12:17 PM
clang-tools-extra/clang-tidy/linuxkernel/LinuxKernelTidyModule.cpp
25

Please fix the linter error, here and in the other cases.

clang-tools-extra/clang-tidy/linuxkernel/SwitchSemiCheck.cpp
35

nit: I could use a short comment about the reason for this check for hasLeadingEmptyMacro.

clang-tools-extra/test/clang-tidy/checkers/linuxkernel-switch-semi.c
5–10

Maybe also a test that shows that it doesn't apply to a normal switch statement (without the extra semicolon)?

Is this not already handled by -Wextra-semi. If it isn't I feel like it should be handled there rather than in clang-tidy

Is this not already handled by -Wextra-semi. If it isn't I feel like it should be handled there rather than in clang-tidy

That was my initial thought too, but I think the value of this CL is in automating the Fix it, IIUC.

trixirt updated this revision to Diff 300830.Oct 26 2020, 4:49 PM

Comment on the matcher
Comment explaining need to look for empty macro
Add test for normal switch
Add test for switch with empty macro
Fix formatting

Yes, this automates the fixit to something precise.
There are about 10,000 problems in the kernel with -Wextra-semi, all of those fixes would be too many and diverse to practically submit.
Also the -Wextra-semi fix removes just the semi and so it's fix will fail the kernel formatting check for an empty line.

With adding a -fix to the 'make clang-tidy' logic, this will fix automagically 100 problems in 50 files for x86_64.

Taking a step back, clang-tidy checks are supposed to enforce guidelines for the specific module they live in.
If there are 10'000 occurrences of a semi directly after a switch closing brace in the linux kernel code base it could be argued that its a style guideline of the code base (or something they don't feel too strongly about).
In any case this check wouldn't be enforcing any guideline for the linux kernel code base.

Having said that, given it occurs so many times in one specific code base, suggests it may appear many more times in the wild.
In which case a specific warning should be emitted, just not in the linux-kernel module (or clang-tidy???)

trixirt updated this revision to Diff 300965.Oct 27 2020, 5:46 AM

precheckin lint fixes

The 10,000 problems are for all the extra semicolon problems.
Those after switch are a small subset, chosen because is ok to automagically fix them.

The checker is in the linuxkernel/ because it is being used to fix the linux kernel now.
And to enforce it stays fixed.

Enforcement here is necessary because the linux kernel's enforcer checkpatch is
a line checker, not an ast checker.

If folks think this is a general fixer, I do not mind moving it to some other checker dir.

I think CLang should do this job, since it has two related warnings already.

Is this not already handled by -Wextra-semi. If it isn't I feel like it should be handled there rather than in clang-tidy

Strong +1 -- this is already handled by -Wextra-semi-stmt and should not be duplicated in a clang-tidy check. In fact, there's already a fix-it produced by Clang for this: https://godbolt.org/z/xhbfc9

trixirt updated this revision to Diff 301059.Oct 27 2020, 11:11 AM

fix precheckin lint issues with auto vs const auto

Discussion of change on LKML
https://lkml.org/lkml/2020/10/27/2538

On why the existing clang fixit is not practical.
tl;dr
10,000 changes to look for a treewide fix
The fixit has whitespace issue

Discussion of change on LKML
https://lkml.org/lkml/2020/10/27/2538

On why the existing clang fixit is not practical.
tl;dr
10,000 changes to look for a treewide fix
The fixit has whitespace issue

By why not to improve Clang's fix-it code?

Discussion of change on LKML
https://lkml.org/lkml/2020/10/27/2538

On why the existing clang fixit is not practical.
tl;dr
10,000 changes to look for a treewide fix
The fixit has whitespace issue

When you pass -fix to clang-tidy, it will apply fix-its from the compiler as well. See https://godbolt.org/z/7vzM4b and try adding a semicolon to the end of the switch statement and note how it's removed when recompilation happens. What's more, the frontend will generate fix-its on some compile errors as well, so passing -fix may generate more changes than just removing semicolons even if you disable all frontend warnings. So if I understand your use case properly, you'd want to disable all frontend warnings, run clang-tidy with just this check enabled and pass -fix to be able to automatically catch and fix *only* the case where there's a semicolon after a switch (and just sort of hope that fixits from compile errors are reasonable to look at). Is that any more practical of a scenario? While we have a module for the Linux kernel and it's reasonable to argue that this check certainly applies there, I also think it's defensible to argue that this is a somewhat odd case that's better handled by an out-of-tree script rather than having the community carry around functionality that duplicates what the frontend already supports.

Btw, I'd say that the whitespace is not really something we'd consider an issue -- we generally expect the user to format their code changes after applying fix-its.

trixirt marked 2 inline comments as done.Oct 31 2020, 5:59 AM

I am trying to address problems treewide so around 100 changes per fix.
Doing that requires consensus that the fix is generally ok for every subsystem in the code base.
So while clang will fix
switch () {} ;
it will also fix
for () {

// tbd : add something here
;

}
consensus needs to be found on as narrow a fix as possible.
so there will be a specialized fixer for each problem consensus is reached on.

As for leaving whitespace fixing as an exercise for the user.
No change will be accepted with a whitespace problem.
When there are 100 fixes, there are 100 files to open, find the line, fix the line.
This is slow and error prone.

Another slow part is manually commit and submitting the fixes and doing a review cycle.
But if your fixer can automagically do everything you can hook it into a c/i and and
it will keep the codebase free of its set of fixers while we all do something more
interesting.

i am working on the kernel build to do this now.
It would be helpful if this and future fixers could live in clang-tidy's linuxkernel/
which is already part of the kernel build so i don't have to reinvent how to find them.

I am trying to address problems treewide so around 100 changes per fix.
Doing that requires consensus that the fix is generally ok for every subsystem in the code base.
So while clang will fix
switch () {} ;
it will also fix
for () {

// tbd : add something here
;

}
consensus needs to be found on as narrow a fix as possible.
so there will be a specialized fixer for each problem consensus is reached on.

Interesting approach, thank you for sharing it.

As for leaving whitespace fixing as an exercise for the user.
No change will be accepted with a whitespace problem.
When there are 100 fixes, there are 100 files to open, find the line, fix the line.
This is slow and error prone.

It's also not really required. Apply the automated fixes, run the results through clang-format-diff.py (http://clang.llvm.org/docs/ClangFormat.html#script-for-patch-reformatting), and commit the results. If this is error-prone, we've got a pretty big issue (IMO).

Another slow part is manually commit and submitting the fixes and doing a review cycle.
But if your fixer can automagically do everything you can hook it into a c/i and and
it will keep the codebase free of its set of fixers while we all do something more
interesting.

i am working on the kernel build to do this now.
It would be helpful if this and future fixers could live in clang-tidy's linuxkernel/
which is already part of the kernel build so i don't have to reinvent how to find them.

Given that we already have a module for Linux kernel-specific clang-tidy checks for this to live in, I can hold my nose on the fact that it's poorly replicating the frontend's work because other users won't be surprised by it (in theory). So I'll retract my disapproval; you have compelling rationale for why you want to do it this way.

Since the plan is to produce multiple distinct checks for each extraneous semicolon situation, then I think this check should probably be renamed to something more generic and given options to control which distinct scenarios to check for. This will reduce the amount of compilation overhead for the clang-tidy project over time by not needing to introduce a new check (with new boilerplate) for each scenario but should hopefully still allow you to do what you need (with config files perhaps) in your CI. WDYT?

When you pass -fix to clang-tidy, it will apply fix-its from the compiler as well. See https://godbolt.org/z/7vzM4b and try adding a semicolon to the end of the switch statement and note how it's removed when recompilation happens. What's more, the frontend will generate fix-its on some compile errors as well, so passing -fix may generate more changes than just removing semicolons even if you disable all frontend warnings. So if I understand your use case properly, you'd want to disable all frontend warnings, run clang-tidy with just this check enabled and pass -fix to be able to automatically catch and fix *only* the case where there's a semicolon after a switch (and just sort of hope that fixits from compile errors are reasonable to look at). Is that any more practical of a scenario?

Doesn't -W disable all warnings? If -W -Wextra-semi-stmt would only warn/produce fixits related to -Wextra-semi-stmt that might be feasible, but it's not as easy to integrate into the Linux kernel's build system (parts of the tree enable additional warnings), as compared to our current clang-tidy integration (which "just works").

While we have a module for the Linux kernel and it's reasonable to argue that this check certainly applies there, I also think it's defensible to argue that this is a somewhat odd case that's better handled by an out-of-tree script rather than having the community carry around functionality that duplicates what the frontend already supports.

I'm happy to take ownership of linuxkernel scoped clang-tidy checks, as the Linux kernel maintainer for LLVM support. Even after using this for a tree-wide change, it's still useful IMO for us to automate clang-tidy runs to prevent this pattern from recurring. And anything we can do to promote LLVM within the Linux kernel community, such as leveraging clang-tidy, is a huge win for both projects. And I wouldn't mind carrying checks for a brief period and removing them if we find they no longer provide value. Worst case, just having the patch up on phab lets future travels find what was used to make large sweeping changes and can be linked to from kernel commit messages.

Btw, I'd say that the whitespace is not really something we'd consider an issue -- we generally expect the user to format their code changes after applying fix-its.

Heh, clang-tidy is another tool we're...hoping to promote the use of more. That's it's own distinct battle.

Given that we already have a module for Linux kernel-specific clang-tidy checks for this to live in, I can hold my nose on the fact that it's poorly replicating the frontend's work because other users won't be surprised by it (in theory). So I'll retract my disapproval; you have compelling rationale for why you want to do it this way.

Since the plan is to produce multiple distinct checks for each extraneous semicolon situation, then I think this check should probably be renamed to something more generic and given options to control which distinct scenarios to check for.

That's fair feedback. Maybe something to note that this is more stylistic and doesn't necessarily fix bugs?

This will reduce the amount of compilation overhead for the clang-tidy project over time by not needing to introduce a new check (with new boilerplate) for each scenario but should hopefully still allow you to do what you need (with config files perhaps) in your CI. WDYT?

I don't see how renaming the check changes "compilation overhead" or why we think "compilation overhead" of clang tidy is a concern in this case? Maybe clarifying what you would prefer to see the check called and whether it would be in the linuxkernel namespace of checks or something else would help, @aaron.ballman ?

When you pass -fix to clang-tidy, it will apply fix-its from the compiler as well. See https://godbolt.org/z/7vzM4b and try adding a semicolon to the end of the switch statement and note how it's removed when recompilation happens. What's more, the frontend will generate fix-its on some compile errors as well, so passing -fix may generate more changes than just removing semicolons even if you disable all frontend warnings. So if I understand your use case properly, you'd want to disable all frontend warnings, run clang-tidy with just this check enabled and pass -fix to be able to automatically catch and fix *only* the case where there's a semicolon after a switch (and just sort of hope that fixits from compile errors are reasonable to look at). Is that any more practical of a scenario?

Doesn't -W disable all warnings? If -W -Wextra-semi-stmt would only warn/produce fixits related to -Wextra-semi-stmt that might be feasible, but it's not as easy to integrate into the Linux kernel's build system (parts of the tree enable additional warnings), as compared to our current clang-tidy integration (which "just works").

Thanks for the further explanation -- I was thinking of something along the lines of -W -Wextra-semi-stmt but wasn't aware of the build system particulars.

While we have a module for the Linux kernel and it's reasonable to argue that this check certainly applies there, I also think it's defensible to argue that this is a somewhat odd case that's better handled by an out-of-tree script rather than having the community carry around functionality that duplicates what the frontend already supports.

I'm happy to take ownership of linuxkernel scoped clang-tidy checks, as the Linux kernel maintainer for LLVM support. Even after using this for a tree-wide change, it's still useful IMO for us to automate clang-tidy runs to prevent this pattern from recurring. And anything we can do to promote LLVM within the Linux kernel community, such as leveraging clang-tidy, is a huge win for both projects. And I wouldn't mind carrying checks for a brief period and removing them if we find they no longer provide value. Worst case, just having the patch up on phab lets future travels find what was used to make large sweeping changes and can be linked to from kernel commit messages.

Thank you for volunteering! Having a primary point of contact who knows Linux kernel development for this module would be really helpful given the specific domain it's in.

Btw, I'd say that the whitespace is not really something we'd consider an issue -- we generally expect the user to format their code changes after applying fix-its.

Heh, clang-tidy is another tool we're...hoping to promote the use of more. That's it's own distinct battle.

Given that we already have a module for Linux kernel-specific clang-tidy checks for this to live in, I can hold my nose on the fact that it's poorly replicating the frontend's work because other users won't be surprised by it (in theory). So I'll retract my disapproval; you have compelling rationale for why you want to do it this way.

Since the plan is to produce multiple distinct checks for each extraneous semicolon situation, then I think this check should probably be renamed to something more generic and given options to control which distinct scenarios to check for.

That's fair feedback. Maybe something to note that this is more stylistic and doesn't necessarily fix bugs?

I don't have strong opinions on stylistic vs not.

This will reduce the amount of compilation overhead for the clang-tidy project over time by not needing to introduce a new check (with new boilerplate) for each scenario but should hopefully still allow you to do what you need (with config files perhaps) in your CI. WDYT?

I don't see how renaming the check changes "compilation overhead" or why we think "compilation overhead" of clang tidy is a concern in this case?

I meant that if we had distinct checks linuxkernel-switch-semi, linuxkernel-for-loop-semi, linuxkernel-middle-of-nowhere-semi, etc that each one of those checks would require their own header file, source file, test files, documentation, etc. whereas if we had a single check, we'd reduce that overhead by only having one header, one source, one documentation, etc using config options, which makes fetching or building clang-tidy go ever-so-slightly faster.

Maybe clarifying what you would prefer to see the check called and whether it would be in the linuxkernel namespace of checks or something else would help, @aaron.ballman ?

I definitely think the check should live in the linuxkernel module. How about linuxkernel-spurious-semi, linuxkernel-extra-semi, linuxkernel-remove-useless-semi-colons, or anything that makes you happy and sounds similarly generic?

This will reduce the amount of compilation overhead for the clang-tidy project over time by not needing to introduce a new check (with new boilerplate) for each scenario but should hopefully still allow you to do what you need (with config files perhaps) in your CI. WDYT?

I don't see how renaming the check changes "compilation overhead" or why we think "compilation overhead" of clang tidy is a concern in this case?

I meant that if we had distinct checks linuxkernel-switch-semi, linuxkernel-for-loop-semi, linuxkernel-middle-of-nowhere-semi, etc that each one of those checks would require their own header file, source file, test files, documentation, etc. whereas if we had a single check, we'd reduce that overhead by only having one header, one source, one documentation, etc using config options, which makes fetching or building clang-tidy go ever-so-slightly faster.

Ah, so you're recommending that future checks related to additional/extraneous semicolons also be placed in this check, rather than their own? I don't have a problem with that.

Maybe clarifying what you would prefer to see the check called and whether it would be in the linuxkernel namespace of checks or something else would help, @aaron.ballman ?

I definitely think the check should live in the linuxkernel module. How about linuxkernel-spurious-semi, linuxkernel-extra-semi, linuxkernel-remove-useless-semi-colons, or anything that makes you happy and sounds similarly generic?

It's currently called "linuxkernel-switch-semi", so if you'd like it to be slightly more generic, @trixirt would you mind renaming this check to something slightly more generic like one of the above (or something similar)?

This will reduce the amount of compilation overhead for the clang-tidy project over time by not needing to introduce a new check (with new boilerplate) for each scenario but should hopefully still allow you to do what you need (with config files perhaps) in your CI. WDYT?

I don't see how renaming the check changes "compilation overhead" or why we think "compilation overhead" of clang tidy is a concern in this case?

I meant that if we had distinct checks linuxkernel-switch-semi, linuxkernel-for-loop-semi, linuxkernel-middle-of-nowhere-semi, etc that each one of those checks would require their own header file, source file, test files, documentation, etc. whereas if we had a single check, we'd reduce that overhead by only having one header, one source, one documentation, etc using config options, which makes fetching or building clang-tidy go ever-so-slightly faster.

Ah, so you're recommending that future checks related to additional/extraneous semicolons also be placed in this check, rather than their own? I don't have a problem with that.

Yup!

This revision now requires changes to proceed.Nov 19 2020, 12:50 PM