This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][NFC] Move the data structures from CheckerRegistry to the Core library
ClosedPublic

Authored by Szelethus on Jun 25 2020, 10:45 AM.

Details

Summary

If you were around the analyzer for a while now, you must've seen a lot of patches that awkwardly puts code from one library to the other:

  • D75360 moves the constructors of CheckerManager, which lies in the Core library, to the Frontend library. Most the patch itself was a struggle along the library lines.
  • D78126 had to be reverted because dependency information would be utilized in the Core library, but the actual data lied in the frontend. D78126#inline-751477 touches on this issue as well.

This stems from the often mentioned problem: the Frontend library depends on Core and Checkers, Checkers depends on Core. The checker registry functions (registerMallocChecker, etc) lie in the Checkers library in order to keep each checker its own module. What this implies is that checker registration cannot take place in the Core, but the Core might still want to use the data that results from it (which checker/package is enabled, dependencies, etc).

D54436 was the patch that initiated this. Back in the days when CheckerRegistry was super dumb and buggy, it implemented a non-documented solution to this problem by keeping the data in the Core, and leaving the logic in the Frontend. At the time when the patch landed, the merger to the Frontend made sense, because the data hadn't been utilized anywhere, and the whole workaround without any documentation made little sense to me.

So, lets put the data back where it belongs, in the Core library. This patch introduces CheckerRegistryData, and turns CheckerRegistry into a short lived wrapper around this data that implements the logic of checker registration. The data is tied to CheckerManager because it is required to parse it.

Side note: I can't help but cringe at the fact how ridiculously awkward the library lines are. I feel like I'm thinking too much inside the box, but I guess this is just the price of keeping the checkers so modularized.

edit:

In detail:

  • Move all data structures, dumps, -help flag printers as well as a getter function from CheckerRegistry to CheckerRegistryData. CheckerRegistry can only be constructed now with by passing a data object as a constructor, so it acts as a glorified initialization function.
  • CheckerManager now owns an instance of CheckerRegistryData, and confines the lifetime of the CheckerRegistry to the constructor's body.
  • Rename InitializationFunction to RegisterCheckerFn.

Diff Detail

Event Timeline

Szelethus created this revision.Jun 25 2020, 10:45 AM
Szelethus edited the summary of this revision. (Show Details)Jun 25 2020, 10:53 AM

Good job, massive props to you for such an overhaul.

clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
156–157

placer -> place

clang/include/clang/StaticAnalyzer/Core/CheckerRegistryData.h
17

querried -> queried

84

Could you please explain why use zero for the small-vector element count? My first thought would be that literally anything other than 0 would be more beneficial because of non-dynamic storage, even if we have use a totally ad-hoc value here (like 8 or 42 or whatever). Why not std::vector if we do not want static storage? Maybe there is something that I'm just not aware of :D

clang/lib/StaticAnalyzer/Core/CheckerRegistryData.cpp
180

Its -> It's

clang/lib/StaticAnalyzer/Frontend/CheckerRegistry.cpp
471–472

A bit confused here. It is not immediately clear for me from which namespace are we importing CmdLineOption. Is it possible to use a fully qualified type name on the RHS of the using statement?

Szelethus marked 2 inline comments as done.Jun 30 2020, 3:20 AM
Szelethus added inline comments.
clang/include/clang/StaticAnalyzer/Core/CheckerRegistryData.h
84

https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h

SmallVector has grown a few other minor advantages over std::vector, causing SmallVector<Type, 0> to be preferred over std::vector<Type>.

clang/lib/StaticAnalyzer/Frontend/CheckerRegistry.cpp
471–472

This looks stupid because it is :^) Accidentally left it there.

It looks like that the dependency manipulation and computation functions, and the checker and option add functions can be member of CheckerRegistryData. It would be a bit more simple (no Data. call), and these belong naturally to there.

clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
156–157

placer -> place

clang/include/clang/StaticAnalyzer/Core/CheckerRegistryData.h
200

couting -> counting

Szelethus updated this revision to Diff 274445.EditedJun 30 2020, 6:20 AM

Fixes according to reviewer comments!

edit: from @gamesh411.

Szelethus marked 6 inline comments as done.Jun 30 2020, 6:53 AM

It looks like that the dependency manipulation and computation functions

I guess there are two ways of looking at this:

  • CheckerRegistry only contains the logic CheckerRegistryData cannot due to the library layout, but it is otherwise minimal
  • CheckerRegistryData is data only, and all logic is placed in CheckerRegistry

I would prefer the second option -- dependency resolving is one of the most crucial, and very non-trivial task of CheckerRegistry, and while moving it to CheckerRegistryData wouldn't violate the library layout, it would certainly go outside of what I'd expect a data-centric class to do. I also would like to keep the interface of it minimal.

and the checker and option add functions can be member of CheckerRegistryData. It would be a bit more simple (no Data. call), and these belong naturally to there.

I don't want to encourage the modification of the data by anyone other then CheckerRegistry.

On another note, this would break all plugin code. We haven't shied away from that in the past, but historically speaking, plugins and the unit test files only interacted with CheckerRegistry, and seeing how it is the logic behind checker registration, I think its fitting.

Szelethus edited the summary of this revision. (Show Details)Jul 3 2020, 2:49 AM
Szelethus set the repository for this revision to rG LLVM Github Monorepo.

Basically this looks good in the current way too, but I can only assume that the code was copied correctly. A rule could be that the CheckerRegistryData is manipulated only by CheckerRegistry but get or print functions can be in CheckerRegistryData.

clang/unittests/StaticAnalyzer/CallEventTest.cpp
13 ↗(On Diff #274445)

This line seems to be not needed here.

clang/include/clang/StaticAnalyzer/Core/CheckerRegistryData.h
217

Why not store AnOpts in CheckerRegistryData? It would save the need to pass it to every function separately.

clang/lib/StaticAnalyzer/Core/CheckerRegistryData.cpp
167

Why do we need the new AnOpts parameter here? It does not seem to be used at all.

Szelethus updated this revision to Diff 275378.Jul 3 2020, 5:53 AM
Szelethus marked 6 inline comments as done.

Small fixes according to reviewer comments.

clang/include/clang/StaticAnalyzer/Core/CheckerRegistryData.h
217

This is definitely a point where I chose a well defined interface over convenience. These print functions only have a single user, CompilerInstance, which must juggle all sorts of analyzer managers anyways, so it wouldn't have made the code any more readable in my opinion.

The idea was to constrain all data this class has to be strictly about what was processed by CheckerRegistry. While mildly inconvenient, the only way to retrieve this object would be from CheckerManager, that also has a getter to AnalyzerOptions, so in case someone were to use this for debug purposes or something, they certainly can.

It looks OK.

This revision is now accepted and ready to land.Jul 3 2020, 8:43 AM
This revision was automatically updated to reflect the committed changes.

This broke the Green Dragon build for Clang: http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/

FAILED: bin/diagtool 
: && /Users/buildslave/jenkins/workspace/lldb-cmake/host-compiler/bin/clang++  -Wdocumentation -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -fmodules -fmodules-cache-path=/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/module.cache -fcxx-modules -Xclang -fmodules-local-submodule-visibility -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3  -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -Wl,-search_paths_first -Wl,-headerpad_max_install_names -Wl,-dead_strip tools/clang/tools/diagtool/CMakeFiles/diagtool.dir/diagtool_main.cpp.o tools/clang/tools/diagtool/CMakeFiles/diagtool.dir/DiagTool.cpp.o tools/clang/tools/diagtool/CMakeFiles/diagtool.dir/DiagnosticNames.cpp.o tools/clang/tools/diagtool/CMakeFiles/diagtool.dir/FindDiagnosticID.cpp.o tools/clang/tools/diagtool/CMakeFiles/diagtool.dir/ListWarnings.cpp.o tools/clang/tools/diagtool/CMakeFiles/diagtool.dir/ShowEnabledWarnings.cpp.o tools/clang/tools/diagtool/CMakeFiles/diagtool.dir/TreeView.cpp.o  -o bin/diagtool  -Wl,-rpath,@loader_path/../lib  lib/libLLVMSupport.a  lib/libclangBasic.a  lib/libclangFrontend.a  lib/libclangDriver.a  lib/libclangParse.a  lib/libclangSerialization.a  lib/libclangSema.a  lib/libclangEdit.a  lib/libclangAnalysis.a  lib/libclangASTMatchers.a  lib/libclangAST.a  lib/libclangLex.a  lib/libclangBasic.a  lib/libLLVMFrontendOpenMP.a  lib/libLLVMTransformUtils.a  lib/libLLVMAnalysis.a  lib/libLLVMObject.a  lib/libLLVMMCParser.a  lib/libLLVMMC.a  lib/libLLVMDebugInfoCodeView.a  lib/libLLVMDebugInfoMSF.a  lib/libLLVMTextAPI.a  lib/libLLVMBitReader.a  lib/libLLVMOption.a  lib/libLLVMProfileData.a  lib/libLLVMCore.a  lib/libLLVMBinaryFormat.a  lib/libLLVMRemarks.a  lib/libLLVMBitstreamReader.a  lib/libLLVMSupport.a  -lz  -lcurses  -lm  lib/libLLVMDemangle.a && :
Undefined symbols for architecture x86_64:
  "clang::ento::PackageInfo::dumpToStream(llvm::raw_ostream&) const", referenced from:
      clang::ento::PackageInfo::dump() const in ShowEnabledWarnings.cpp.o
      clang::ento::PackageInfo::dump() const in libclangFrontend.a(CompilerInstance.cpp.o)
      clang::ento::PackageInfo::dump() const in libclangFrontend.a(CompilerInvocation.cpp.o)
      clang::ento::PackageInfo::dump() const in libclangFrontend.a(CreateInvocationFromCommandLine.cpp.o)
      clang::ento::PackageInfo::dump() const in libclangFrontend.a(TextDiagnosticBuffer.cpp.o)
      clang::ento::PackageInfo::dump() const in libclangFrontend.a(FrontendAction.cpp.o)
      clang::ento::PackageInfo::dump() const in libclangFrontend.a(FrontendOptions.cpp.o)
      ...
  "clang::ento::CmdLineOption::dumpToStream(llvm::raw_ostream&) const", referenced from:
      clang::ento::CmdLineOption::dump() const in ShowEnabledWarnings.cpp.o
      clang::ento::CmdLineOption::dump() const in libclangFrontend.a(CompilerInstance.cpp.o)
      clang::ento::CmdLineOption::dump() const in libclangFrontend.a(CompilerInvocation.cpp.o)
      clang::ento::CmdLineOption::dump() const in libclangFrontend.a(CreateInvocationFromCommandLine.cpp.o)
      clang::ento::CmdLineOption::dump() const in libclangFrontend.a(TextDiagnosticBuffer.cpp.o)
      clang::ento::CmdLineOption::dump() const in libclangFrontend.a(FrontendAction.cpp.o)
      clang::ento::CmdLineOption::dump() const in libclangFrontend.a(FrontendOptions.cpp.o)
      ...
  "clang::ento::CheckerInfo::dumpToStream(llvm::raw_ostream&) const", referenced from:
      clang::ento::CheckerInfo::dump() const in ShowEnabledWarnings.cpp.o
      clang::ento::CheckerInfo::dump() const in libclangFrontend.a(CompilerInstance.cpp.o)
      clang::ento::CheckerInfo::dump() const in libclangFrontend.a(CompilerInvocation.cpp.o)
      clang::ento::CheckerInfo::dump() const in libclangFrontend.a(CreateInvocationFromCommandLine.cpp.o)
      clang::ento::CheckerInfo::dump() const in libclangFrontend.a(TextDiagnosticBuffer.cpp.o)
      clang::ento::CheckerInfo::dump() const in libclangFrontend.a(FrontendAction.cpp.o)
      clang::ento::CheckerInfo::dump() const in libclangFrontend.a(FrontendOptions.cpp.o)
      ...
ld: symbol(s) not found for architecture x86_64

Thank you so much! Kind of ironic considering that this entire patch was all about fixing them in the first place :)