This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add infrastructure support for running on project-level information
Needs ReviewPublic

Authored by bahramib on Apr 26 2022, 4:50 AM.

Details

Summary

This patch adds the necessary infrastructure bindings and overloads that
allow checks to implement collecting and emitting per-TU data to a
directory from which the diagnosis mode can read it and decide whether
to diagnose.

What checks emit and how they transform the data is a per-check
implementation detail.

Diff Detail

Event Timeline

bahramib created this revision.Apr 26 2022, 4:50 AM
Herald added a project: Restricted Project. ยท View Herald TranscriptApr 26 2022, 4:50 AM

Very good, this is a great initiative! Also, the quality of this patch is high, docs are great, even the release notes are updated. I would accept if I were that confident in Clang-tidy.

Thanks, I think this is a neat idea! I haven't had the chance to review thoroughly yet, but wanted to point out that precommit CI is showing a fair amount of failures that I believe are related to your patch.

precommit CI is showing a fair amount of failures that I believe are related to your patch.

It was me who helped @bahramib to rebase and split a much larger and more convoluted development history into broken up patches for review, and I have no idea what is causing the CI issues, because all the CI issues are related to unit test libraries removing const from fixits and such(??) which this patch (or any in the current patch set) doesn't even touch, at all. I did run the test targets locally... it's likely that simply the rebase and the push happened against an unclean/breaking main branch...

We can investigate later, but I think the review can be done either way. If something is related, it has to be tangential, we strived to make the entire new infrastructure fully backwards compatible.

The only reason I can think of is that some LangOpts might not be forwarded correctly somewhere, will definitely look into this.

(Also, full disclosure, I am the B.Sc. thesis supervisor of @bahramib so I'm only here to keep an eye on (and fix up the "very LLVM-y" parts) of the check, I cannot impartially review and accept this as I was involved in the full creation process. ๐Ÿ˜‡)

precommit CI is showing a fair amount of failures that I believe are related to your patch.

I have no idea what is causing the CI issues, because all the CI issues are related to unit test libraries removing const from fixits and such(??) which this patch (or any in the current patch set) doesn't even touch, at all. I did run the test targets locally... it's likely that simply the rebase and the push happened against an unclean/breaking main branch...

Rebase & rerun against a current main branch produces the issues locally for me, too. So far I have no idea what might be causing this, but I'll keep on digging. However, let's continue with discussing the general approach meanwhile. ๐Ÿ™‚

precommit CI is showing a fair amount of failures that I believe are related to your patch.

I have no idea what is causing the CI issues, because all the CI issues are related to unit test libraries removing const from fixits and such(??) which this patch (or any in the current patch set) doesn't even touch, at all. I did run the test targets locally... it's likely that simply the rebase and the push happened against an unclean/breaking main branch...

Rebase & rerun against a current main branch produces the issues locally for me, too. So far I have no idea what might be causing this, but I'll keep on digging. However, let's continue with discussing the general approach meanwhile. ๐Ÿ™‚

Actually, it turned out to be easy after all! The implementation is backwards compatible on the command-line but was not for the "machine-driven" tests that are implemented as C++ code and execution scaffold. Adding the below commented change to the diff completely solved the failing tests for me locally (turns out none of the checks were actually running the check() callback!), but I cannot update a "Revision" that I do not own.

clang-tools-extra/clang-tidy/ClangTidyOptions.h
60

โš  The crucial fix against the unit-test failures!

In general, I think this is looking pretty close to good. Whether clang-tidy should get this functionality in this form or not is a bit less clear to me. *I* think it's helpful functionality and the current approach is reasonable, but I think it might be worthwhile to have a community RFC to see if others agree. CC @alexfh in case he wants to make a code owner decision instead.

clang-tools-extra/clang-tidy/ClangTidy.cpp
321โ€“324

I'd add the braces for readability despite them not being strictly required.

347โ€“348
clang-tools-extra/clang-tidy/ClangTidyCheck.cpp
54

I'm on the fence about whether this should be an unreachable or an assert. This seems like it's an assertion situation (if you reach that case, the programmer made a mistake; it's not that you can never reach this case).

clang-tools-extra/clang-tidy/ClangTidyCheck.h
118โ€“119
122

I'm not certain what "to the right" means in this context, so some extra clarification here would be reasonable. I think it's saying that it's a post order traversal?

clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
242โ€“243

I think this would be expressed better via an assert.

246

Why is this a reference?

clang-tools-extra/clang-tidy/tool/ClangTidyMain.cpp
552โ€“553

This looks like a TOCTOU issue; create_directory() takes an argument for whether you should ignore existing or not, that should suffice here, right?

The idea looks great in general, I didn't get a chance to look at all the details but this also creates some concerns for integrations of clang-tidy checks in other environments (like clangd or tidy running in distributed systems) as the workflow actually needs to be run twice and be able to write to disk.

So I'd actually ask for changing the way the outputting is performed. It'd be better to have an interface where checks can perform this finding reporting operation, that way clangd and other means of running clang-tidy can implement that interface as they see fit (put everything into a database, collect in memory, etc.) and clang-tidy binary by default can implement writing to disk (through VFS rather than physical FS operations if possible).
In addition to that it'd be nice to have a flag on checks to indicate whether they can work in a certain phase, and make sure instantiation only happens for checks that can run at a certain phase. This ensures checks can have assumptions about the phase they're being run at and also drivers can assume checks won't perform invalid operations. That way we should get to reduce complexity on both sides. We can by default say that checks work in diagnose phase and doesn't work in other phases, then relevant checks can override these bits.

As a more general comment to the overall approach, the "compact" phase probably doesn't require any tidy specific infrastructure. Only pieces that can benefit re-use is outputting edits, but this is already common infra across all clang-tools. So another direction overall here could be to actually change the way clang-tidy outputs findings into two forms:

  • a machine readable format, that can later on be used by other tools to perform more analysis
  • a human readable format, the usual thing today.

Have we considered this? Any reason for not going down this path?

clang-tools-extra/clang-tidy/ClangTidyCheck.cpp
98

filename and contents might not be enough here. as clang-tidy is usually run over a compilation database and there might be multiple entries for the same file with different compile flags. i think we also want to include a hash of the flags in the filename.

ymandel added a subscriber: ymandel.Jul 8 2022, 5:52 AM

In general, I think this is looking pretty close to good. Whether clang-tidy should get this functionality in this form or not is a bit less clear to me. *I* think it's helpful functionality and the current approach is reasonable, but I think it might be worthwhile to have a community RFC to see if others agree. CC @alexfh in case he wants to make a code owner decision instead.

+1 to a need for the RFC here, there are ecosystem questions that I think should be addressed first but aren't in scope of this patch.
Changing the "plugin signature" of checks in a widely-used plugin system like clang-tidy shouldn't be taken lightly.

In particular:

  • what is a motivating example of the problem you're solving? (sorry if I missed this)
  • which checks do you plan to contribute/modify that will make use of this functionality?
  • for existing users of clang-tidy that don't adopt the new workflow (clang-tidy without --multipass-phase, code review systems, clangd and other IDEs), how do they interact with such checks?
  • will you be maintaining/extending the framework functionality & design long-term, or for a particular time span?
  • do you have plans for how to deploy this to users? (e.g. running the CLI commands by hand, or integration into a code review tool)
  • what are the expectations on check owners for understanding/supporting this mode?
  • is this extensible to distributed execution like clang-tidy currently is? (this patch appears to assume coordination through a shared local filesystem, and "compact" appears to be a large monolithic step)
  • what alternative designs were considered?

I really would like to see these addressed in an RFC rather than here - I think it will attract wider discussion.
(I'm sure I'll have an opinion, but mostly hoping to flesh out the plan to evolve clang-tidy so others can react)

In general, I think this is looking pretty close to good. Whether clang-tidy should get this functionality in this form or not is a bit less clear to me. *I* think it's helpful functionality and the current approach is reasonable, but I think it might be worthwhile to have a community RFC to see if others agree. CC @alexfh in case he wants to make a code owner decision instead.

+1 to a need for the RFC here, there are ecosystem questions that I think should be addressed first but aren't in scope of this patch.

I will attempt to write a concise response to this today (and tomorrow)! Sorry, I was away travelling to and doing post-action bureaucracy of conferences the past few weeks.

I will attempt to write a concise response to this today (and tomorrow)! Sorry, I was away travelling to and doing post-action bureaucracy of conferences the past few weeks.

@aaron.ballman @kadircet @sammccall, FYI, I have posted it here: http://discourse.llvm.org/t/rfc-enhancing-clang-tidy-with-project-level-knowledge/63960