This is an archive of the discontinued LLVM Phabricator instance.

[scan-build-py] Update scan-build-py to allow outputing as SARIF.
ClosedPublic

Authored by isthismyaccount on Jan 7 2021, 11:28 AM.

Details

Summary

clang static analysis reports can be generated in html, plist, or sarif format.
This updates scan-build-py to be able to specify SARIF as the desired output format, as previously it only support plist and html formats.

Diff Detail

Event Timeline

isthismyaccount requested review of this revision.Jan 7 2021, 11:28 AM
whisperity added a subscriber: NoQ.
This revision is now accepted and ready to land.Jan 12 2021, 5:40 PM

Update to support running multiple TUs at once and merging the resulting sarif file into a single sarif file with multiple runs.

Support embedded links in results, codeflows, and threadflows. Does not support embedded links in other message properties (eg: location, since these generally do not refer to sarif runs indices)

Cleanup related to merge_sarif_files to be more readable, update the merge sarif test.

aabbaabb added inline comments.Feb 2 2021, 5:24 PM
clang/tools/scan-build-py/libscanbuild/report.py
313 ↗(On Diff #320523)

arr not used?

Change to:

arr = [update_sarif_object(entry, runs_count_offset) for entry in sarif_object[key]]
sarif_object[key] = arr

337 ↗(On Diff #320523)

I think loop in a reverse order like
for i in range(len(matches)-1, -1, -1):

makes more sense.

347 ↗(On Diff #320523)

pattern only used above, suggest move it up.

370 ↗(On Diff #320523)

Here you are assuming all the run index in one sarif file is sequential. (I believe this should be the case)
However, a more general approach is not to pass in a runs_count_offset, but rather, a runs_count
and then runs_count++ after every run in every sarif file. This way, we no longer rely on the original run index.

aabbaabb requested changes to this revision.Feb 2 2021, 5:24 PM
This revision now requires changes to proceed.Feb 2 2021, 5:24 PM
isthismyaccount marked 3 inline comments as done.Feb 3 2021, 11:28 AM
isthismyaccount added inline comments.
clang/tools/scan-build-py/libscanbuild/report.py
370 ↗(On Diff #320523)

From my understanding, in a single sarif file, runs can point to any index in the set of contained runs. Ie we could have something like:

sarif:

{ 
'runs': [
   { 'message': { 'text': 'this is a link to a future run [link](sarif:/runs/1/result/0)'}},
   { 'message': { 'text': 'this is a link to another future run [link](sarif:/runs/2/result/0'}},
   { 'message': { 'text': 'this is a link to the first run [link](sarif:/runs/0/result/0'}}
]
}

In this case, when merging this set of runs into another set of runs, embedded links would be updated to be relative to the "base" run index, such that and index of 0 becomes base_run_index, 1 becomes base_run_index+1, etc. I'm unsure of whether or not I can achieve this by iteratively increasing runs_count, since run indices are initially 0-indexed.

From
https://docs.oasis-open.org/sarif/sarif/v2.0/csprd02/sarif-v2.0-csprd02.html#_Toc10127658

Because the "sarif" URI scheme uses JSON pointer [RFC6901], which locates array elements by their array index, these URIs are potentially fragile if the SARIF log file is transformed by a post-processor.

EXAMPLE 2: If a post-processor concatenates two runs into a single log file, the links within the run at index 1 will be incorrect, and will need to be updated from "sarif:/runs/0/…" to "sarif:/runs/1/…".

Update based on aabbaabb's comments.

aabbaabb accepted this revision.Feb 4 2021, 2:15 PM
aabbaabb added inline comments.
clang/tools/scan-build-py/libscanbuild/report.py
370 ↗(On Diff #320523)

I see.

This revision is now accepted and ready to land.Feb 4 2021, 2:15 PM
haowei accepted this revision.Feb 5 2021, 4:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2021, 6:26 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript