This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][NFC] Demonstrate a move from the analyzer-configs `.def` file to a TableGen file
Needs ReviewPublic

Authored by Szelethus on Jun 16 2021, 11:30 PM.

Details

Summary

As of now, analyzer configs are specified in the preprocessor file clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def, something like this:

#define ANALYZER_OPTION(TYPE, NAME, CMDFLAG, DESC, DEFAULT_VAL)

ANALYZER_OPTION(bool, ShouldIncludeImplicitDtorsInCFG, "cfg-implicit-dtors",
                "Whether or not implicit destructors for C++ objects "
                "should be included in the CFG.",
                true)

ANALYZER_OPTION(
    unsigned, AlwaysInlineSize, "ipa-always-inline-size",
    "The size of the functions (in basic blocks), which should be considered "
    "to be small enough to always inline.",
    3)

ANALYZER_OPTION(
    StringRef, RawSilencedCheckersAndPackages, "silence-checkers",
    "A semicolon separated list of checker and package names to silence. "
    "Silenced checkers will not emit reports, but the modeling remain enabled.",
    "")

ANALYZER_OPTION(
    StringRef, CXXMemberInliningMode, "c++-inlining",
    "Controls which C++ member functions will be considered for inlining. "
    "Value: \"constructors\", \"destructors\", \"methods\".",
    "destructors")

This patch proposes to move to a TableGen format, seen in the diff below. The file generated from it (build/tools/clang/include/clang/StaticAnalyzer/Core/Config.inc) looks like this:

// This file is automatically generated. Do not edit this file by hand.

#ifdef BOOLEAN_OPTIONS
BOOLEAN_OPTION(bool, ShouldIncludeImplicitDtorsInCFG, "cfg-implicit-dtors",
               "Whether or not implicit destructors for C++ objects should be "
               "included in the CFG.",
               true)
#endif // BOOLEAN_OPTIONS

#ifdef INTEGER_OPTIONS
INTEGER_OPTION(int, AlwaysInlineSize, "ipa-always-inline-size",
               "The size of the functions (in basic blocks), which should be "
               "considered to be small enough to always inline.",
               3)
#endif // INTEGER_OPTIONS

#ifdef STRING_OPTIONS
STRING_OPTION(
    StringRef, RawSilencedCheckersAndPackages, "silence-checkers",
    "A semicolon separated list of checker and package names to silence. "
    "Silenced checkers will not emit reports, but the modeling remain enabled.",
    "")
#endif // STRING_OPTIONS

#ifdef ENUM_OPTIONS
ENUM_OPTION(
    CXXMemberInliningModeKind, CXXMemberInliningMode, "c++-inlining",
    "Controls which C++ member functions will be considered for inlining.",
    "destructors",
    "none,"
    "constructors,"
    "destructors,"
    "methods,")
#endif // ENUM_OPTIONS

#ifdef ENUMS
enum class CXXMemberInliningModeKind {
  CIMK_None = 0,
  CIMK_Constructors = 1,
  CIMK_Destructors = 2,
  CIMK_Memberfunctions = 3,
}

#endif // ENUMS

So, we're changing from writing a preprocessor file to generating it -- it'd be a stretch to say that all the trouble synonymous with the preprocessor are now gone. Its also rather easy and intuitive to add a new config to the existing format, and TableGen isn't universally liked in the analyzer. So, here is my pros and cons for why this is would be a good change:

Pros:

  • Any changes to the current analyzer config format is miserable.
    • Might affect every macro, or needs manual #define shenanigans. Making changes to the TableGen emitter file are rather easy and not as invasive.
    • We may want to add features like what we're already seen in the checker TableGen file: development status (whether its in alpha, developer, or are okay for the end user to tinker with) and documentation uri.
    • As arguments aren't strongly typed (there are 3 string arguments one after the other), its easy to make a mistake in them. TableGen doesn't entirely negate this issue -- care still needs to be taken when using the preprocessor file it generates, but leading up to that point is far safer.
    • As there are a lot of entries in that file already, unless you compile with a single job, compilation errors can become really long and confusing, and thats true even if you do compile on a single job. TableGen again doesn't negate this entirely, but gets rid of them at least to the point of writing the configs. Errors are caught while generating the file, and easy to digest errors are emitted.
  • There are in fact 2 .def files for configuring the analyzer: the other is clang/include/clang/StaticAnalyzer/Core/Analyses.def. The latter contains values acceptable values for some of the string configs, for example:
#ifndef ANALYSIS_STORE
#define ANALYSIS_STORE(NAME, CMDFLAG, DESC, CREATFN)
#endif

ANALYSIS_STORE(RegionStore, "region", "Use region-based analyzer store",
               CreateRegionStoreManager)
  • (cont.) In general, we don't have a good handle on string option accepting only a set of values. In the help text, its redundantly hardcoded in the text. We can't just generate it into the help text like this:
#define ANALYSIS_STORE(NAME, CMDFLAG, DESC, CREATFN) #CMDFLAG ","
ANALYZER_OPTION(StringRef, AnalyzerStoreStr, "analyzer-store",
                "The Store implementation of the analyzer. The store is "
                "a mapping from variables or regions to values. Note: The "
                "analyzer was designed with its store being a replaceable "
                "component, but over the years it became increasingly unlikely "
                "that another kind of it will be developed. Value:"
#include "clang/StaticAnalyzer/Core/Analyses.def"
                ,
                "region")
#undef ANALYSIS_STORE
  • (cont.) because such an #include is not portable, not to mention that it is rather ugly. So that redundancy is unavoidable, but thats not an issue if we generate that with TableGen.
  • TableGen allows for more thorough, and arguably less cheesy ways to validate whether the specified analyzer config is valid.
  • Unless you use /*comments=*/, arguments aren't as obvious. I argue that this is far more readable in the attached tablegen file.

Cons:

  • Making changes to the TableGen emitter file might be okay, but still requires TableGen specific knowledge. However, making changes to the .td file itself is a nightmare unless you have some experience with it. In fact, it can be a bit troublesome even if you know a bit of it! Its usually only a few lines of class definition code, but even that can be a bit of a struggle.
  • TableGen helps a lot on the issues faced by working with the preprocessor, but it doesn't negate them. At the end of the day, the generated file still needs to be included in the same, or similar way as before.
  • Likely a bit slower to compile. Not that much, but its something.

It is my personal opinion that the pros outweigh the cons. The inspiration came from that I wanted to move some options (like -analyzer_constraints) from frontend flags one level lower into -analyzer-configs (in a backward compatible manner), and realized I'd need to redundantly hardcore the acceptable values for each of the enum-like configs.


So, about this patch. This serves as little more than a demonstration of how the TableGen file looks like, and what it could generate. The biggest item to highlight is how we can easily define enum configs:

  • We can put all the acceptable values in the help text AND generate the enum class WITHOUT any redundancy.
  • While not demonstrated, it'd be easy to add help texts to the enum values as well.
  • You can see that in the generated file, all the values are squashed into a single string, and would need a string split -- thats just the way it goes, it seems like, as other tablegen files seem to do that as well. Maybe I could use variadic macro arguments...
  • Automatic initialization and validation of the user input can be done by marshalling, but its so strongly tied to regular llvm/clang flags, I'd rather use it as a source of inspiration.

Diff Detail

Event Timeline

Szelethus created this revision.Jun 16 2021, 11:30 PM
Szelethus requested review of this revision.Jun 16 2021, 11:30 PM

Ping.

The main question with this patch really is whether we want TableGen at all or not, as for what we generate precisely and how we'd utilize it is a discussion for later.

Ping.

The main question with this patch really is whether we want TableGen at all or not, as for what we generate precisely and how we'd utilize it is a discussion for later.

I don't really have strong opinions here. On one hand, I'd prefer TableGen to a purely macro-based solution. On the other hand, something, let's say, C++-native (like cl::opt) seems better than TableGen.

steakhal added subscribers: jansvoboda11, myhsu.

We can put all the acceptable values in the help text AND generate the enum class WITHOUT any redundancy.

+1

While not demonstrated, it'd be easy to add help texts to the enum values as well.

Ah yes! We should take it one step further. Generate user-facing documentation from the descriptions embedded in the `td file.
We should really have a doc page for analyzer configs as well, similarly to the one we have for checkers.
That being said, I would vote for generating the checker's user-facing docs as well, from the Checkers.td file. That way they would be always in sync and we could easily check if an option/checker is not documented.`

Automatic initialization and validation of the user input can be done by marshaling, but its so strongly tied to regular llvm/clang flags, I'd rather use it as a source of inspiration.

I think we can introduce marshaling and cl::opts in a subsequent patch if we need to, but I cannot immediately foresee the impact of any of these.

Let me invite @myhsu for Tablegen and @jansvoboda11 for marshaling.

clang/include/clang/StaticAnalyzer/AnalyzerFlagsBase.td
9

typo

clang/utils/TableGen/ClangSAConfigsEmitter.cpp
19

Where do you use std::map?

35–38
43–49

How does this return a sorted data structure? SortedRecords aliases llvm::StringMap<const Record *>.

114

typo

Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2022, 6:43 AM
myhsu added a comment.May 1 2022, 11:40 PM

Just curious: Have you considered using llvm::opt or even llvm::cl infrastructure? What are the pros & cons?

clang/utils/TableGen/ClangSAConfigsEmitter.cpp
24
32

I'm not an expert in clang-analyzer, but can DefaultVal be optional in an analyzer config? It you want it to be optional -- which, I think, also makes the TableGen file more concise -- you can use Record::getValueasOptionalString instead.

57

Please use llvm::emitSourceFileHeader (in llvm/TableGen/TableGenBackend.h)

76

IIRC usually we will also write /*IsDefaultValString=*/

140

if R->isValueUnset("Values") returns true, the resulting code will have a trailing comma just right before the right brace:

ENUM_OPTION(Foo, Foo, "bar", "...",)

which I don't think is valid for C/C++ macro.
Please use llvm::ListSeparator(in StringExtras.h) to print commas instead.