This is an archive of the discontinued LLVM Phabricator instance.

Allow direct navigation to static analysis checker documentation through SARIF exports
ClosedPublic

Authored by aaron.ballman on Dec 17 2018, 2:38 PM.

Details

Summary

The static analysis checkers are mostly listed on two pages, one for alpha checks and one for released checks (there are also a considerable number of undocumented analyzer checkers). This patch adds anchors to all of the documented checks so that you can directly link to a check by a stable name. This is useful because the SARIF file format has a field for specifying a URI to documentation for a rule and some viewers, like CodeSonar, make use of this information.

This patch exposes those anchors through the SARIF exporter, which requires an automated way to determine the check URIs. Checkers.td now requires each checker to be annotated with an extensible list of documentation kinds (listed in CheckerBase.td) that map to physical URI paths on the static analyzer web page if the check has documentation (mapping is done by the table generator), which is exposed via the CHECKER macro. When exporting to SARIF, the checker name itself is used as the anchor target on the page (and I verified that checker names, including dots and dashes, will be valid anchors) and the corresponding edits were made to each HTML file to add the anchors.

This likely has some overlap with D54429, so I've added those reviewers and author to this review.

Diff Detail

Event Timeline

aaron.ballman created this revision.Dec 17 2018, 2:38 PM

Thank you so much for working on this! The lack of a proper documentation is indeed a weak points of the project.

I guess it'd be better to generate the entire URI in the tblgen file, and also allow plugins to register their own through CheckerRegistry. Sarif lacking the support for them is one thing, but adding a new field to the Checker class in CheckerBase.td I think should be accompanied by extending CheckerRegistry::CheckerInfo and CheckerRegistry::getChecker.

While I dislike the use of <a id="some.Checker"></a>, until Sphinx documentation becomes a thing, I guess we can't do much about it.

But the overall direction of this patch is great.

lib/StaticAnalyzer/Core/SarifDiagnostics.cpp
270–275

As per usual, this won't handle plugin checkers, but the solution is in an arms reach -- in the last couple months, I changed a lot around checker registration, and once things settle down a bit.

One idea is to store the list of registered checkers (now, in this case, registered means that the analyzer even knows about its existence, not that it's enabled) in AnalyzerOptions, however it'd force you to put an extra parameter to almost all functions here. What do you think?

284–286

Why not do this in ClangSACheckersEmitter.cpp?

utils/TableGen/ClangSACheckersEmitter.cpp
97–98

Cheers!

aaron.ballman marked 3 inline comments as done.Dec 18 2018, 5:51 AM

Thank you so much for working on this! The lack of a proper documentation is indeed a weak points of the project.

Yeah, the more we can couple the documentation to the check definitions, the more likely we are to see the documentation remain up to date as checks are added (at least, that's been the case with attributes).

I guess it'd be better to generate the entire URI in the tblgen file, and also allow plugins to register their own through CheckerRegistry. Sarif lacking the support for them is one thing, but adding a new field to the Checker class in CheckerBase.td I think should be accompanied by extending CheckerRegistry::CheckerInfo and CheckerRegistry::getChecker.

Good point about the checker registry, I'll make those changes. However, I'm a bit less certain about generating the entire URI in the tablegen file -- that adds a lot of string literals to the binary when I think we don't have to. However, perhaps that's not a huge concern and having the information in its final form in the .inc file is a better approach. WDYT?

While I dislike the use of <a id="some.Checker"></a>, until Sphinx documentation becomes a thing, I guess we can't do much about it.

But the overall direction of this patch is great.

Thanks, I'm glad to hear it!

lib/StaticAnalyzer/Core/SarifDiagnostics.cpp
270–275

As per usual, this won't handle plugin checkers, but the solution is in an arms reach -- in the last couple months, I changed a lot around checker registration, and once things settle down a bit.

I knew this wouldn't do good things for plugins yet, but I'm glad to hear things are moving forward on that front!

One idea is to store the list of registered checkers (now, in this case, registered means that the analyzer even knows about its existence, not that it's enabled) in AnalyzerOptions, however it'd force you to put an extra parameter to almost all functions here. What do you think?

I could always wrap these methods up to be private methods of the SarifDiagnostics class so they'd have access to the options, so I think there's a decent path forward.

284–286

In part so there will be less string literals, basically. By supplying only the .html target in the .inc file, we have one of three string literals there. I could move the http:// part in to the .inc file as well (we'd still only have the same three string literals), but if I moved the anchor targets as well, we now have N string literals baked into the binary when we don't need to use that much space.

utils/TableGen/ClangSACheckersEmitter.cpp
97–98

You make that mistake *one time* and then you add comments to the top of the auto-generated file. ;-)

Updated based on review feedback.

Now generating the full HTML link from tablegen instead of piecemeal. Also, updated CheckerRegistry and CheckerInfo to expose this information for checkers written as plugins.

Global nit: I guess having both DESC and DOCS as variable/macro arg names is confusing. Can we instead use DOCSURI/DocsUri?

Thank you so much for working on this! The lack of a
I guess it'd be better to generate the entire URI in the tblgen file, and also allow plugins to register their own through CheckerRegistry. Sarif lacking the support for them is one thing, but adding a new field to the Checker class in CheckerBase.td I think should be accompanied by extending CheckerRegistry::CheckerInfo and CheckerRegistry::getChecker.

Good point about the checker registry, I'll make those changes. However, I'm a bit less certain about generating the entire URI in the tablegen file -- that adds a lot of string literals to the binary when I think we don't have to. However, perhaps that's not a huge concern and having the information in its final form in the .inc file is a better approach. WDYT?

Well, we already store checker- and the quite lengthy -analyzer-config descriptions... but I'm not sure thats a good point.
However, plugin checkers being able to specify uris such as "mycompany.internal.supersecret.idk/dontshare/checkers/tradesecrets.html" would be neat.

One more thing to consider is that if Sphinx lands after 8.0.0., the uri will most probably change. We can always redirect, but thats a thing to consider. I'm not sure whether there's any likelihood of it being completed until the new release branch is created. But I don't think there's any change needed for this patch just because of this.

lib/StaticAnalyzer/Core/SarifDiagnostics.cpp
270–275

Ah, touchpad messed up my inline, but I guess it was understandable.

Okay, thanks, one more concern out of the way then!

Global nit: I guess having both DESC and DOCS as variable/macro arg names is confusing. Can we instead use DOCSURI/DocsUri?

Sure, I'll make that change.

Thank you so much for working on this! The lack of a
I guess it'd be better to generate the entire URI in the tblgen file, and also allow plugins to register their own through CheckerRegistry. Sarif lacking the support for them is one thing, but adding a new field to the Checker class in CheckerBase.td I think should be accompanied by extending CheckerRegistry::CheckerInfo and CheckerRegistry::getChecker.

Good point about the checker registry, I'll make those changes. However, I'm a bit less certain about generating the entire URI in the tablegen file -- that adds a lot of string literals to the binary when I think we don't have to. However, perhaps that's not a huge concern and having the information in its final form in the .inc file is a better approach. WDYT?

Well, we already store checker- and the quite lengthy -analyzer-config descriptions... but I'm not sure thats a good point.
However, plugin checkers being able to specify uris such as "mycompany.internal.supersecret.idk/dontshare/checkers/tradesecrets.html" would be neat.

One more thing to consider is that if Sphinx lands after 8.0.0., the uri will most probably change. We can always redirect, but thats a thing to consider. I'm not sure whether there's any likelihood of it being completed until the new release branch is created. But I don't think there's any change needed for this patch just because of this.

Yeah, I wound up pushing all of the data into the table generator as suggested. The plugin registration was what did it for me -- when registering a checker, we should have the full link to the help URI rather than try to build it from pieces that may not be correct for plugins.

Updating based on review feedback.

Changed the macro name from DOCS to DOC_URI to differentiate it from DESC.

Szelethus accepted this revision.EditedDec 18 2018, 10:17 AM

This is awesome. Please wait for either @NoQ or @george.karpenkov to have the final say.

Edit: Maybe couple days I guess. I've been digging through these files quite a lot in the last 2 months, and I don't see anything wrong with this patch.

This revision is now accepted and ready to land.Dec 18 2018, 10:17 AM

This is awesome. Please wait for either @NoQ or @george.karpenkov to have the final say.

Edit: Maybe couple days I guess. I've been digging through these files quite a lot in the last 2 months, and I don't see anything wrong with this patch.

@dkrupp, @NoQ, @george.karpenkov -- any comments? I'll probably commit early tomorrow if there isn't further feedback (and I'm happy to deal with anything post-commit if it arises).

NoQ accepted this revision.Dec 20 2018, 10:56 AM

Looks great, no objections on my end!

This is amazing, huge thanks for doing this!

Thanks for the reviews! I've commit in r349812.

I'm afraid we are seeing a build failure here on our local Windows checking MSVC build. Unfortunately I cannot find a public buildbot that uses the exact configuration that causes the assertion. The assertion we are seeing is

Assertion failed: isa<X>(Val) && "cast<Ty>() argument of incompatible type!", file F:\merge3\o\llvm\include\llvm/Support/Casting.h, line 255

It would appear it will not allow a cast from 'Init' to 'BitInit'. There is a very similar routine in the file X86EX2VEXTTablesEmitter.cpp as follows

static inline uint64_t getValueFromBitsInit(const BitsInit *B) {
uint64_t Value = 0;
for (unsigned i = 0, e = B->getNumBits(); i != e; ++i) {

if (BitInit *Bit = dyn_cast<BitInit>(B->getBit(i)))
  Value |= uint64_t(Bit->getValue()) << i;
else
  PrintFatalError("Invalid VectSize bit");

}
return Value;
}

Which appears to serve the same purpose but uses a dynamic cast. If I replace your routine with this technique our build succeeds.

regards

Phil Camp, SIE