Page MenuHomePhabricator

[clang-tidy][WIP] Do not run perfect alias checks
Needs ReviewPublic

Authored by carlosgalvezp on Nov 20 2021, 6:01 AM.

Details

Summary

Find "perfect aliases" and remove them from the list of checks to run.
On large codebases, this can save up to 30% runtime, which is not negligible.

A "perfect alias" is an alias that:

  • Is of the same check class as the primary check.
  • Has all configuration options the same as the primary check.

NOTE TO REVIEWERS:

This is a WIP, quick and dirty PoC, so the code is not pretty, well structured
nor optimal.

The intention is to make this feature opt-in via config/CLI.
I.e. turned off by default -> should not break things.

Before polishing the patch I'd like to get feedback about whether this
is the desired direction, and in general if I'm putting the pieces in the right places.

DESIGN GOALS:

  • No impact on check developers, other than having to register the check via the new macro instead of the regular function.
  • Only perfect aliases to be removed.

MISSING BITS/REQUIREMENTS FROM THE RFC:

  • It would be desirable to still keep the diagnostic/fixit for the alias checks. I don't know how to implement this in practice. It would require each check to call the diag() function in a loop for all alias checks. This would be very disruptive for check developers. I cannot implement this in the ClangTidyCheck class because the diag() function returns an object, so I can't put a loop there.
  • Ideally, it would be nice to show "CACHED - 0 seconds" or "SKIPPED - 0 seconds" in the profiling output (currently, the alias check just doesn't show up). Again, I don't know how to implement this feature in a non-intrusive way that makes sense.

Suggestions are very welcome for these two bits! Otherwise I am personally
happy to skip them.

Current working test:

clang-tidy -checks=-*,*-c-arrays,*-in-classes

The result is:

  • Only one of the 3 "c-arrays" check is run (the primary check), because the other 2 checks are perfect aliases.
  • Both of the "-in-classes" checks (cppcoreguidelines, misc) are run, because they have different configuration options.

Diff Detail

Event Timeline

carlosgalvezp created this revision.Nov 20 2021, 6:01 AM
carlosgalvezp requested review of this revision.Nov 20 2021, 6:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 20 2021, 6:01 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

What would you say are the key differences between this patch differ and previous attempts, e.g. https://reviews.llvm.org/D72566? How does this patch address the concerns raised in the previous reviews?

carlosgalvezp added a comment.EditedNov 23 2021, 1:12 PM

What would you say are the key differences between this patch differ and previous attempts, e.g. https://reviews.llvm.org/D72566? How does this patch address the concerns raised in the previous reviews?

The key differences are:

  • It has been previously discussed and agreed upon via RFC in the mailing list, as requested as a comment in the previous patch.
  • Check developers do not need to manually specify which check is aliasing which check. It's automated and transparent, thus less burden on check developers and less possibility for errors.
  • Only aliases with the same configuration (perfect aliases) are removed. The patch you mention would run only one instance with it's configuration, so "aliases with different configuration" would incorrectly not be run (if I read the commit message correctly).

I don't have a strong opinion as to which patch to use to continue this work, I just would like to work on making it happen. It's been a standing issue for quite a long time :)

Thank you for looking into it!

What would you say are the key differences between this patch differ and previous attempts, e.g. https://reviews.llvm.org/D72566? How does this patch address the concerns raised in the previous reviews?

The key differences are:

  • It has been previously discussed and agreed upon via RFC in the mailing list, as requested as a comment in the previous patch.
  • Check developers do not need to manually specify which check is aliasing which check. It's automated and transparent, thus less burden on check developers and less possibility for errors.
  • Only aliases with the same configuration (perfect aliases) are removed. The patch you mention would run only one instance with it's configuration, so "aliases with different configuration" would incorrectly not be run (if I read the commit message correctly).

The leftover caveat of this approach is that it does not help the checks with configuration options,
because it is likely that the options were tuned, but most likely only for a single check.
I still maintain that there should be a single opt-in option to completely disable all aliasee checks.

I don't have a strong opinion as to which patch to use to continue this work, I just would like to work on making it happen. It's been a standing issue for quite a long time :)

it is likely that the options were tuned, but most likely only for a single check.

Yes, if the alias check has a different configuration than the primary check then they are considered as "different" checks (as they should be, since they produce potentially different outputs and some people might lose coverage). Perhaps it would be interesting to add something to the "--explain-config" flag so the user understands why some checks are/aren't enabled.

I still maintain that there should be a single opt-in option to completely disable all aliasee checks.

By "aliasee" do you mean "alias" checks (as opposed to "primary" check)? I'm a bit confused by the terminology :) In order words, are you proposing to opt-in to disable all alias checks, leaving only the primary check (regardless of configuration)? I suppose this should be fairly easy to implement, just don't check the configuration when removing aliases. We can support both options for people having different use cases.

it is likely that the options were tuned, but most likely only for a single check.

Yes, if the alias check has a different configuration than the primary check then they are considered as "different" checks (as they should be, since they produce potentially different outputs and some people might lose coverage).

Yep.

Perhaps it would be interesting to add something to the "--explain-config" flag so the user understands why some checks are/aren't enabled.

I still maintain that there should be a single opt-in option to completely disable all aliasee checks.

By "aliasee" do you mean "alias" checks (as opposed to "primary" check)? I'm a bit confused by the terminology :)

Yes.

In order words, are you proposing to opt-in to disable all alias checks, leaving only the primary check (regardless of configuration)? I suppose this should be fairly easy to implement, just don't check the configuration when removing aliases. We can support both options for people having different use cases.

IMO that would be best, otherwise this still seems like a half a solution :/ (not that this is worse than currently)

carlosgalvezp added a comment.EditedNov 24 2021, 1:25 AM

this still seems like a half a solution

I see, thanks for the input! I took configuration into account due to the concerns raised in previous patch attempts and in the RFC. I believe both use cases are valid so it makes sense to me to support both as independent opt-in flags. What do you think, @aaron.ballman ?

Out of the scope of this patch, I think it's important to document what exactly we mean by "alias". Users currently think that they are "interchangeable" and they are not aware that they are configured with different options by default, producing different results. For example, cppcoreguidelines-non-private-members-in-classes is less strict than misc-non-private-members-in-classes.

As I was writing this patch, I was thinking that another possibility is to add an option for clang-tidy to print out the alias checks, if we don't want to disable them. As a user, my main goal is to get an answer to the question: "what checks can I safely disable to speed up clang-tidy without losing coverage"? If clang-tidy can print this list for me, I can copy-paste it into my .clang-tidy file and be happy. I don't want to manually look at the HTML docs (which are manually-written and thus error-prone) nor at the source code to find this out, each time I update clang-tidy.

carlosgalvezp edited the summary of this revision. (Show Details)Nov 27 2021, 9:15 AM
carlosgalvezp added a reviewer: whisperity.

Kind ping to reviewers, I mostly want to know if you agree with the general direction so I can go ahead and implement the rest of the patch. It's a big change (replace all check registrations with the macro) so I'd like to avoid extra work if you are strongly against it.

@aaron.ballman I understand you are super busy and don't want to take any more time than needed from you. I'd just love to hear your thoughts about the two missing points that we discussed in the RFC. Nobody else seems to have raised similar concerns about it.

Really appreciate your time!

aaron.ballman added a reviewer: alexfh.EditedDec 16 2021, 6:11 AM
aaron.ballman added a subscriber: alexfh.

Kind ping to reviewers, I mostly want to know if you agree with the general direction so I can go ahead and implement the rest of the patch. It's a big change (replace all check registrations with the macro) so I'd like to avoid extra work if you are strongly against it.

@aaron.ballman I understand you are super busy and don't want to take any more time than needed from you. I'd just love to hear your thoughts about the two missing points that we discussed in the RFC. Nobody else seems to have raised similar concerns about it.

Really appreciate your time!

Sorry about the lack of response! Taking (effectively) two weeks off put me pretty far behind on reviews (I'm down to "only" ~30 reviews in my queue as of this morning), and I head out for another vacation starting tomorrow until the end of the year. So please anticipate further delays from me, sorry!

It would be desirable to still keep the diagnostic/fixit for the alias checks. I don't know how to implement this in practice. It would require each check to call the diag() function in a loop for all alias checks. This would be very disruptive for check developers. I cannot implement this in the ClangTidyCheck class because the diag() function returns an object, so I can't put a loop there.

I'd like to make sure I'm on the same page as you. What you mean here is that you want the diagnostic to be something like: error: forgot to frobble the quux [bugprone-frobble-quux, cert-quux45-cpp] instead of error: forgot to frobble the quux [bugprone-frobble-quux]? Or do you mean something else?

Ideally, it would be nice to show "CACHED - 0 seconds" or "SKIPPED - 0 seconds" in the profiling output (currently, the alias check just doesn't show up). Again, I don't know how to implement this feature in a non-intrusive way that makes sense.

That would be handy, but I think is totally reasonable for a follow-up.

I see, thanks for the input! I took configuration into account due to the concerns raised in previous patch attempts and in the RFC. I believe both use cases are valid so it makes sense to me to support both as independent opt-in flags. What do you think, @aaron.ballman ?

Eh, I'm less convinced of this, but I'm also not certain I'm strongly opposed (mostly weakly opposed at this point). Aliases with different configuration options are positioned to users as being separate checks. For example, we could have one check that decides to convert all punctuators it can to be alternate tokens (and instead of &&, etc) and there could be an alias check that converts all alternative tokens back into punctuators but is implemented as an alias. The fact that it's an alias is an implementation detail that should not matter to the user, and being able to disable *all* checks regardless of their configuration strikes me as making these kinds of situations confusing to users and I think it closes a development door we don't really want to close. I'd want to have buy-in from @alexfh as code owner for that option. However, I think it'd be good to ensure that the design can support such an option in the future (at least, support as much as reasonable) so we can do it as follow-up work without extra effort.

clang-tools-extra/clang-tidy/ClangTidyModule.h
25–26

I'm not 100% sold that this macro carries its weight. I almost think the registration code is easier to read without it, but not enough to suggest a change yet. Curious if others have feelings here.

me pretty far behind on reviews

Wow, that's a lot to review! I will keep it in mind and hopefully only ping you when I have a nice final design ready to go. Really appreciate your early feedback!

I'd like to make sure I'm on the same page as you.

Yes, exactly, from the RFC we agreed that the desired outcome of this patch would be that the same diagnostics and fixits are emitted, regardless of the alias checks actually running. I.e. this patch should not have any visible effects towards the user (other than faster runtime). The patch, as it is right now _will_ eliminate those diagnostics (since the alias checks are removed from the list), which to my understanding are not desirable. In order to implement this, the primary check needs to emit copies of the same diagnostic as if it were the alias check.

carlosgalvezp added inline comments.Dec 16 2021, 7:59 AM
clang-tools-extra/clang-tidy/ClangTidyModule.h
25–26

I'm not loving it either, to be honest, but I couldn't think of anything better that wouldn't cause developer friction. I've looked into RTTI (banned in LLVM) and hand-made LLVM RRTI, which I think would be quite a burden for check developers since they'd need to add a new enum to the list, implement a getClass() method, etc (IIUC).

Previous attempts as this problem created a different register function like:

Factory.registerCheck("foo-check", misc::PrimaryCheck, "misc-check")

Which I don't like much because the developer needs to spend time digging through the code and docs to see what is the name of the check it's aliasing, which is error-prone and can easily get out of sync in the future.

With this macro, all of this is automated and always in sync. But again, if someone has some cleverer magic that would work here I'd love to bring it in :)

Since replacing existing lines with the macro is a pretty big change, I'll wait until I get OK from reviewers to avoid spending time if in the end we go for another solution.

me pretty far behind on reviews

Wow, that's a lot to review! I will keep it in mind and hopefully only ping you when I have a nice final design ready to go. Really appreciate your early feedback!

Happy to help and I really appreciate your patience!

I'd like to make sure I'm on the same page as you.

Yes, exactly, from the RFC we agreed that the desired outcome of this patch would be that the same diagnostics and fixits are emitted, regardless of the alias checks actually running. I.e. this patch should not have any visible effects towards the user (other than faster runtime). The patch, as it is right now _will_ eliminate those diagnostics (since the alias checks are removed from the list), which to my understanding are not desirable. In order to implement this, the primary check needs to emit copies of the same diagnostic as if it were the alias check.

Okay, thank you for verifying! I keep reading "copies of the same diagnostic" as though you wanted:

error: forgot to frobble the quux [bugprone-frobble-quux]
error: forgot to frobble the quux [cert-quux45-cpp]

and going "no, I don't think that's quite what we want." :-)

If it turns out that producing output like error: forgot to frobble the quux [bugprone-frobble-quux, cert-quux45-cpp] is really difficult, I don't see it as being a deal breaker to only emit the primary check name. (Same goes for things like LINT comments naming the perfect alias vs naming the primary.)

The idea I had for supporting this was: we already register which check is the primary and all the names of the aliases, and so we don't need to really run the alias check to get its diagnostic output, we could thread some way to go from the diagnostics engine back to our registered lists of aliases, and then print the alias names manually when emitting the diagnostic for the primary. However, I've not tried this idea out and who knows what details I'm missing.

clang-tools-extra/clang-tidy/ClangTidyModule.h
25–26

With this macro, all of this is automated and always in sync.

Yup, and this is the primary reason I'm not suggesting a change yet. This solves the problem we needed to solve, it's just a bit... crufty, maybe? I dunno. I would not spend time finding a replacement yet; if someone has a great idea, we can consider it then.

no, I don't think that's quite what we want." :-)

Sorry for not being specific enough :) What we want is to keep this (current user-facing behavior today):

error: forgot to frobble the quux [bugprone-frobble-quux, cert-quux45-cpp]

As opposed to:

error: forgot to frobble the quux [bugprone-frobble-quux]

(the users wants CERT enabled, and yet they don't see diagnostics coming from cert (even though they effectively are enabled via bugprone)).

I'll try to dig a bit deeper and see if I can figure this out.

Jeroen added a subscriber: Jeroen.Sat, Jan 8, 9:38 AM
Jeroen added inline comments.
clang-tools-extra/clang-tidy/ClangTidy.cpp
442

I'm not sure if you have this in for testing purpose or not, though I really like to have this message. If I would be having 2 aliases active with different options, it would be nice to know about that. So please keep this in!
Preferably, we would only configure it once, though that's dreaming.

carlosgalvezp added inline comments.Sat, Jan 8, 10:24 AM
clang-tools-extra/clang-tidy/ClangTidy.cpp
442

Yep, all printouts are just for my own testing, but good to hear it's useful info for production! I'll see if I get some time and clean this up.