This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Add support for displaying cross-file diagnostic paths in HTML output
ClosedPublic

Authored by vlad.tsyrklevich on Feb 27 2017, 7:21 AM.

Details

Summary

While looking at checker output with D30289 applied I realized I was missing results. After some digging I found that it's because the Linux kernel makes heavy use of inlined functions included in header files, and the resulting diagnostic path now included cross-file diagnostics which are unsupported. This change adds support for cross-file diagnostic paths. If the diagnostic path is not cross-file, there is no change in the output.

Diff Detail

Repository
rL LLVM

Event Timeline

Here's an example 3 file report from the Linux source: https://cdn.rawgit.com/vlad902/73bd32181151f2f0b07fa501a0f0b627/raw/0106f09cdb0d045b757de3d71ffffddc58072d33/report2.html

It's not as immediately clear this is a multi-file output. Ideas to improve this could include anchor tags in the header for every file included, or augmenting the line count to something like '$FILE:$LINE' instead. I'm open to suggestions on this front.

I think this is great. We've been hearing a lot of complaints on the mailing lists recently about that problem.

Did you check that scan-build properly de-duplicates cross-file reports that originate from different clang runs but point to the same header? With your approach i think it should work out of the box, but i'd rather be sure.

It's not as immediately clear this is a multi-file output.

I'm still to have a closer look at the actual code, sorry for the delays.

I'm not having problems with that, i think. If you want to fix that, maybe a list of files at the header, with links to the beginning of each file, would be enough. Or maybe modify arrows between diagnostic pieces to highlight cases when they cross file boundaries: <- 5. Calling foo() -> (into file foo.c).

zaks.anna edited edge metadata.Mar 1 2017, 1:07 PM

No multi-file support is a long outstanding limitation of scan-build html output. Great to see the patch!! Thank you for working on it!

It's not as immediately clear this is a multi-file output.

In addition to Artem's suggestions, you might want to insert multiple lines of padding to make the distinction on the border more clear. I think it would help especially when scrolling a large report like in the link for the Linux source.

Also, could you put this behind an option or introduce a new format like -analyzer-output=plist-multi-file but for html? Just in case someone is relying on a single file output format, we'd want to have an option to give them to turn it on.

vlad.tsyrklevich edited the summary of this revision. (Show Details)

Updated the formatting to make the file split more obvious (padding/line break height) and added simple navigation across files, example here: https://rawgit.com/vlad902/4aaa4e5e7a777b7098337370791352d7/raw/fc88cef667935e1eeae492b15590c18516e645dd/report.html

@NoQ: I had not thought about deduplication. I looked at scan-build and it looks like it currently de-dups based on MD5 matches of the HTML files. The simple case of multi-file reports with the same MD5 (e.g. because of repeated compilations of a single file during a build) being deduplicated still works correctly.

I also looked at what would happen if two separate C files include a common header file that only generates a report with -analyzer-opt-analyze-headers specified (e.g. if the path originates in the header file and doesn’t traverse the main C file at all.) This generated zero reports! My logic assumed the main file would always be included in the path and threw away -analyzer-opt-analyze-headers reports incorrectly. I was able to fix the logic such that these reports are correctly generated and also properly deduplicated by structuring the reports such that the main C file source is not included if it’s not traversed in the path.

@zaks.anna: I’ve added the single file output option but I would like to keep the multi-file option default—I suspect there are very few users parsing the HTML manually when plist is available and I’d like to ensure people don’t miss any results or have to dig into the source to understand they needed something other than the default output setting. What do you think?

lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
211 ↗(On Diff #90417)

Is there a cleaner way to do these two comparisons?

I’ve added the single file output option but I would like to keep the multi-file option default

This sounds good to me! I agree that this is a very useful addition.

Updates to cleanly rebase on a recent clang master. Ran all tests with ASan to ensure I avoided a broken build post-merge as with https://reviews.llvm.org/D30909

vlad.tsyrklevich marked an inline comment as done.

After reviewing this patch again last night I:

  1. Updated some "%clang_cc1 -analyze" calls with "%clang_analyze_cc1" due to 2cfd901321423a96edd8513afc7c7c2bb0d18b2e
  2. Isolated HTML generation into a separate function to simplify HTMLDiagnostics::ReportDiag even further
zaks.anna accepted this revision.Jun 16 2017, 10:48 PM

LGTM. Thank you!

(Not sure if @NoQ wants to do a final review.)

Do you have commit access or should we commit on your behalf?

This revision is now accepted and ready to land.Jun 16 2017, 10:48 PM

Please commit on my behalf.

This revision was automatically updated to reflect the committed changes.
cfe/trunk/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp