This is an archive of the discontinued LLVM Phabricator instance.

Differential D53814

Allow the analyzer to output to a SARIF file
Closed, Public

Authored by aaron.ballman on Oct 29 2018, 6:33 AM.

Details

Reviewers

dcoughlin
zaks.anna
george.karpenkov

Summary

SARIF (https://github.com/oasis-tcs/sarif-spec) is a new draft standard interchange format for static analysis results that allows result viewers to be decoupled from the tool producing the analysis results. This patch allows users to specify SARIF as the output from the clang static analyzer so that the results can be read in by other tools. There are several such tools for consuming SARIF, such as extensions to Visual Studio and VSCode, as well as static analyzers like CodeSonar.

SARIF is JSON-based and the latest provisional specification can be found at: https://github.com/oasis-tcs/sarif-spec/blob/master/Documents/ProvisionalDrafts/sarif-v2.0-csd02-provisional.docx. GrammaTech sponsored the work to produce this patch and we will make any necessary changes if the draft standard changes before publication.
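
For readers who want to consume the resulting log from another tool, here is a minimal Python 3 sketch; it is not part of the patch. It assumes the provisional 2.0.0 layout visible in the sample log quoted later in this review (`runs`, `results`, `locations`, `physicalLocation`). The function name `summarize_results` and the trimmed sample log are hypothetical, and field names may differ in later drafts of the specification.

```python
import json


def summarize_results(sarif_log):
    """Return (ruleId, message, uri, startLine) tuples for every result."""
    rows = []
    for run in sarif_log.get("runs", []):
        for result in run.get("results", []):
            message = result.get("message", {}).get("text", "")
            for loc in result.get("locations", []):
                phys = loc.get("physicalLocation", {})
                uri = phys.get("fileLocation", {}).get("uri", "")
                line = phys.get("region", {}).get("startLine")
                rows.append((result.get("ruleId"), message, uri, line))
    return rows


# Hypothetical, heavily trimmed log used only to exercise the function.
SAMPLE = json.loads("""
{
  "version": "2.0.0-beta.2018-09-26",
  "runs": [{
    "tool": {"name": "clang"},
    "results": [{
      "ruleId": "debug.TaintTest",
      "message": {"text": "tainted"},
      "locations": [{
        "physicalLocation": {
          "fileLocation": {"uri": "file:///tmp/test.c"},
          "region": {"startLine": 115, "startColumn": 11}
        }
      }]
    }]
  }]
}
""")

for row in summarize_results(SAMPLE):
    print(row)
```

Every lookup uses `.get` with a default because much of the data in a SARIF log is optional.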

Event Timeline

aaron.ballman created this revision. Oct 29 2018, 6:33 AM

Herald added a reviewer: george.karpenkov. Oct 29 2018, 6:33 AM

Herald added subscribers: dkrupp, donat.nagy, Szelethus and 2 others.

ZaMaZaN4iK added a subscriber: ZaMaZaN4iK. Oct 29 2018, 9:13 AM

ZaMaZaN4iK added inline comments.

StaticAnalyzer/Core/CMakeLists.txt
43	Sort alphabetically
StaticAnalyzer/Core/SarifDiagnostics.cpp
89	I don't know the const-correctness policy in LLVM, but I would prefer `const char C` here.
95	This piece of code would probably be better written as a separate function.

Patch context is missing.

Analysis/diagnostics/sarif-check.py
23	Wow, this is super neat! Since you are quite active in LLVM community, would you think it's better to have this tool in `llvm/utils` next to FileCheck? The concept is very general, and I think a lot of people really feel FileCheck limitations.
Analysis/diagnostics/sarif-diagnostics-taint-test.c
19	Would it make more sense to just use `diff` + a JSON pretty-formatter to write a test? With this test I can't even quite figure out what the output should look like.
StaticAnalyzer/Core/SarifDiagnostics.cpp
75	Nitpicking style, but I don't see why a for-each loop, preferably with a range wrapping the iterators, would not be more readable.
183	"Note" pieces are notes which do not have to be attached to a particular path element.
244	I like closures, but what's wrong with just using a `for` loop here?
255	Usually we overwrite the file and note that on stderr in such cases.

Minor style notes + context missing.

I think using diff would be better than a custom python tool.

This revision now requires changes to proceed. Oct 29 2018, 9:20 AM

aaron.ballman added inline comments. Oct 29 2018, 10:07 AM

Analysis/diagnostics/sarif-check.py
23	The concept was pulled from test\TableGen\JSON-check.py, so it likely could be generalized. However, I suspect that each JSON test may want to expose different helper capabilities, so perhaps it won't be super general? I don't know enough about good Python design to know.
Analysis/diagnostics/sarif-diagnostics-taint-test.c
19	I'm not super comfortable with that approach, but perhaps I'm thinking of something different than what you're proposing. The reason I went with this approach is because diff would be fragile (depends heavily on field ordering, which the JSON support library doesn't give control over) and the physical layout of the file isn't what needs to be tested anyway. SARIF has a fair amount of optional data that can be provided as well, so using a purely textual diff worried me that exporting additional optional data in the future would require extensive unrelated changes to all SARIF diffs in the test directory. The goal for this test was to demonstrate that we can validate that the interesting bits of information are present in the output without worrying about the details. Also, the python approach allows us to express relationships between data items more easily than a textual diff tool would. I've not used that here, but I could imagine a test where we want to check that each code location has a corresponding file entry in another list.
StaticAnalyzer/Core/SarifDiagnostics.cpp
75	I tend to prefer using algorithms when the logic is simple -- it makes it more clear that the loop is an unimportant detail. I don't have strong opinions on this particular loop, however.
89	We typically do not put top-level `const` on locals, so I'd prefer to leave it off here rather than be inconsistent.
183	Good to know!
244	Same as above: clarity of exposition. This one I'd feel pretty strongly about keeping as an algorithm given how trivial the loop body is.
255	We took the decision internally to overwrite as well, but the SARIF format allows for multiple runs within the same output file (so you can compare analysis results for the same project over time). I think this will eventually have to be user-controlled because I can see some users wanting to append to the run and others wanting to overwrite. However, these log files can become quite large in practice (GBs of data), so "read in the JSON and add to it" may be implausible, hence why I punted for now. I'll update the comment so it's not a FIXME.
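
The append-versus-overwrite trade-off in the line 255 comment can be sketched in Python 3 as follows. This is an illustration only, not how the patch behaves (the patch overwrites). The function name `merge_run` and the sample data are hypothetical, and the sketch assumes only that a SARIF log keeps its runs in a top-level `runs` array, as the sample output in this review does.

```python
import json


def merge_run(existing_text, new_run, append=True):
    """Return the text of a log holding `new_run`, optionally keeping old runs.

    Appending requires parsing the whole existing log first, which is the
    cost the comment worries about for multi-gigabyte files.
    """
    if append and existing_text:
        log = json.loads(existing_text)
        log.setdefault("runs", []).append(new_run)
    else:
        # Overwrite: start a fresh log containing only the new run.
        # The version string is copied from the sample output in this review.
        log = {"version": "2.0.0-beta.2018-09-26", "runs": [new_run]}
    return json.dumps(log, indent=2, sort_keys=True)


# Hypothetical data: one earlier run on disk and one freshly produced run.
old = json.dumps({"version": "2.0.0-beta.2018-09-26",
                  "runs": [{"tool": {"name": "clang"}, "results": []}]})
new_run = {"tool": {"name": "clang"},
           "results": [{"ruleId": "debug.TaintTest"}]}

appended = json.loads(merge_run(old, new_run, append=True))
overwritten = json.loads(merge_run(old, new_run, append=False))
print(len(appended["runs"]), len(overwritten["runs"]))
```

A user-controlled option, as suggested in the comment, would amount to exposing the `append` parameter.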

Updated based on initial review feedback, and added more context to the patch.

I don't think a new PathGenerationScheme is needed, unless you plan changes to BugReporter.cpp.

The code is fine otherwise, but could we try to use diff for testing?

Analysis/diagnostics/sarif-diagnostics-taint-test.c
19	Using a sample file + diff would have the advantage of being easier to read (just glance at "Inputs/blah.sarif" to see a sample output), and would be consistent with how we already do checking for plists. "depends heavily on field ordering": Is it an issue in practice though? I would assume that the JSON support library would not switch field ordering too often (and even if it does, we can have a Python wrapper testing that). "SARIF has a fair amount of optional data": Would diff `--ignore-matching-lines` be enough for those?
StaticAnalyzer/Core/SarifDiagnostics.cpp
69	+1, I would use this in other consumers.
clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h
128	Do you actually need a new generation scheme here? I'm pretty sure that using "Minimal" would give you the same effect.
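
For anyone wanting to reuse the percent-encoding helper discussed in the line 69 comment outside of C++, here is a rough Python 3 counterpart of `percentEncodeURICharacter` from SarifDiagnostics.cpp. It is a sketch under stated assumptions: the set of characters written out directly is copied from the patch, the uppercase hex digits assume the default behavior of `llvm::toHex`, and the names `percent_encode_byte` and `encode_path_component` are hypothetical.

```python
# Characters the patch writes out directly, in addition to ASCII
# letters and digits (copied from percentEncodeURICharacter).
UNRESERVED_PATH_CHARS = "-._~:@!$&'()*+,;="


def percent_encode_byte(b):
    """Encode one byte (an int in 0..255) the way the C++ helper treats a char."""
    ch = chr(b)
    if b < 128 and (ch.isalnum() or ch in UNRESERVED_PATH_CHARS):
        return ch
    # Assumption: uppercase hex, matching llvm::toHex's default.
    return "%{:02X}".format(b)


def encode_path_component(component):
    """Percent-encode every byte of a UTF-8 encoded path component."""
    return "".join(percent_encode_byte(b) for b in component.encode("utf-8"))


print(encode_path_component("my file#1.c"))
print(encode_path_component("a+b.c"))
```

Working byte by byte mirrors the C++ loop over `char`, so multi-byte UTF-8 characters come out as one escape per byte.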

This revision now requires changes to proceed. Oct 29 2018, 2:57 PM

MTC added a subscriber: MTC. Oct 29 2018, 7:48 PM

aaron.ballman marked 9 inline comments as done. Oct 30 2018, 6:20 AM

aaron.ballman added inline comments.

Analysis/diagnostics/sarif-diagnostics-taint-test.c
19	Diff testing was what I originally tried and I abandoned it because it was not viable. The optional data cannot be handled by ignoring matching lines because the lines won't match from system to system. For instance, there are absolute URIs to files included in the output, clang git hashes and version numbers that will change from revision to revision, etc. What you see as an advantage, I see as more of a distraction -- the actual layout of the information in the file isn't that important so long as it validates as valid SARIF (unfortunately, there are no cross-platform command line tools to validate SARIF yet). What is important are just a few pieces of the content information (where are the diagnostics, what source ranges and paths are involved, etc) compared to the overall whole. I can give diff testing another shot, but I was unable to find a way to make `-I` work so that I could ignore everything that needs to be ignored due to natural variance. Do you have ideas there?

To give you an idea of the ultimate form of the output:

    {
      "$schema": "http://json.schemastore.org/sarif-2.0.0",
      "runs": [
        {
          "files": {
            "file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c": {
              "fileLocation": { "uri": "file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c" },
              "length": 3850,
              "mimeType": "text/plain",
              "roles": [ "resultFile" ]
            }
          },
          "results": [
            {
              "codeFlows": [
                {
                  "threadFlows": [
                    {
                      "locations": [
                        {
                          "importance": "essential",
                          "location": {
                            "message": { "text": "Calling 'f'" },
                            "physicalLocation": {
                              "fileLocation": { "uri": "file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c" },
                              "region": { "endColumn": 5, "endLine": 119, "startColumn": 3, "startLine": 119 }
                            }
                          },
                          "step": 1
                        },
                        {
                          "importance": "essential",
                          "location": {
                            "message": { "text": "tainted" },
                            "physicalLocation": {
                              "fileLocation": { "uri": "file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c" },
                              "region": { "endColumn": 17, "endLine": 115, "startColumn": 11, "startLine": 115 }
                            }
                          },
                          "step": 2
                        }
                      ]
                    }
                  ]
                }
              ],
              "locations": [
                {
                  "physicalLocation": {
                    "fileLocation": { "uri": "file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c" },
                    "region": { "endColumn": 17, "endLine": 115, "startColumn": 11, "startLine": 115 }
                  }
                }
              ],
              "message": { "text": "tainted" },
              "ruleId": "debug.TaintTest"
            }
          ],
          "tool": {
            "fullName": "clang static analyzer",
            "language": "en-US",
            "name": "clang",
            "version": "clang version 8.0.0 (https://github.com/llvm-project/clang.git a5ccb257a7a70928ede717a7c282f5fc8cbed310) (https://github.com/llvm-mirror/llvm.git 73cebd79c512f7129eca16b0f3a7abd21d2881e8)"
          }
        }
      ],
      "version": "2.0.0-beta.2018-09-26"
    }

In this file, the variable things are: the file URIs in multiple places, the clang version information, and to a lesser extent, the SARIF version information (we care that it says 2.0.0 but don't care about the -beta stuff). Other pieces of information are likely to become variable in the future (like the language field). This log file is ~100 lines long for displaying information about one diagnostic with two interesting locations on the code path, which is why I think the diff testing is a distraction. I believe that more complex examples will result in even larger output files. I might be able to use FileCheck more directly, e.g.,

    CHECK: "threadFlows":
    CHECK-NEXT: "locations":
    CHECK-NEXT: "{"
    CHECK-NEXT: "importance": "essential"
    ...

But I feel like that's just a more verbose version of what's being done in the patch with less flexibility.
StaticAnalyzer/Core/SarifDiagnostics.cpp
69	Not certain I understand this comment.
clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h
128	I don't think it's currently needed; I've removed it and updated the comment.

Updated based on review feedback -- removed the Sarif path generation scheme as it isn't currently needed, and replaced a FIXME with a better comment.

Testing remains an open question, however.

Switched to using diff for testing.

aaron.ballman added inline comments. Oct 30 2018, 8:39 AM

Analysis/diagnostics/sarif-diagnostics-taint-test.c
19	I was finally able to convince `diff` to do what I needed. You have to list a fair number of things to ignore on the RUN line and there's a disconnect between the input and the expected output. One odd behavior I noticed was that splitting this into two RUN lines (so diff is its own RUN line) fails on my system because `diff` then does not understand the `-I` option! Using the pipe syntax seems to work for me. I think lit is somehow finding two different diff programs. I'm still not convinced this is an improvement over the previous style of testing, but at least we have something working for comparison purposes now.
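
To make the idea of ignoring volatile lines concrete, here is a small Python 3 sketch that approximates `diff -I` using the standard `difflib` module. It is a simplification and an assumption on my part, not what the test does: GNU diff ignores a change only when every changed line in it matches the pattern, whereas this sketch drops matching lines from both inputs before comparing. The function name `diff_ignoring` and the sample text are hypothetical.

```python
import difflib
import re


def diff_ignoring(expected, actual, ignore_patterns):
    """Unified diff of two texts after dropping lines matching any pattern."""
    regexes = [re.compile(p) for p in ignore_patterns]

    def keep(lines):
        return [l for l in lines if not any(r.search(l) for r in regexes)]

    return list(difflib.unified_diff(keep(expected.splitlines()),
                                     keep(actual.splitlines()),
                                     "expected", "actual", lineterm=""))


# Hypothetical excerpts that differ only in the machine-specific URI.
expected = '"uri": "file:///build/a/test.c"\n"ruleId": "debug.TaintTest"\n'
actual = '"uri": "file:///C:/Users/me/test.c"\n"ruleId": "debug.TaintTest"\n'

# With the URI lines ignored the two excerpts compare equal.
print(diff_ignoring(expected, actual, [r'"uri": "file:']))
# Without any ignore pattern the URI difference is reported.
print("\n".join(diff_ignoring(expected, actual, [])))
```

Each kind of volatile field needs its own pattern, which is the maintenance burden of the `-I` flags described above.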

I much prefer this version.
We've had the same problem with diffing plist output.
One thing we have learned is using a FileCheck was definitely a bad idea, as it leads to unreadable, unmaintainable, and very hard to update tests,
so either diff or your custom tool is way better.

As for the ultimate solution, I'm still not sure: I agree that maintaining those -I flags is annoying.
One option is having an extra flag to produce "stable" output, which does not include absolute URLs/versions/etc.
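
The "stable" output idea can also be approximated on the test side. The Python 3 sketch below post-processes a parsed log instead of adding an analyzer flag, so it is an alternative to the proposal above rather than an implementation of it. The key names treated as volatile are assumptions drawn from the fields called out as variable in this review, and `stabilize`, `stable_dump`, and `make_log` are hypothetical names.

```python
import json

PLACEHOLDER = "<volatile>"
# Assumption: these keys hold machine- or revision-specific values.
# Redacting "version" also hides the SARIF version, which a real test
# may want to check separately.
VOLATILE_KEYS = {"uri", "version"}


def stabilize(node):
    """Recursively replace volatile values with a fixed placeholder."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key in VOLATILE_KEYS:
                out[key] = PLACEHOLDER
            elif key == "files" and isinstance(value, dict):
                # The files table is keyed by URI, so its keys vary too.
                out[key] = {"file-%d" % i: stabilize(value[uri])
                            for i, uri in enumerate(sorted(value))}
            else:
                out[key] = stabilize(value)
        return out
    if isinstance(node, list):
        return [stabilize(item) for item in node]
    return node


def stable_dump(sarif_log):
    # sort_keys removes any dependence on field ordering.
    return json.dumps(stabilize(sarif_log), indent=2, sort_keys=True)


def make_log(root, clang_version):
    """Build a tiny hypothetical log rooted at `root` (illustration only)."""
    uri = root + "/test.c"
    return {
        "version": "2.0.0-beta.2018-09-26",
        "runs": [{
            "files": {uri: {"fileLocation": {"uri": uri}, "length": 3850}},
            "results": [{"ruleId": "debug.TaintTest",
                         "message": {"text": "tainted"}}],
            "tool": {"name": "clang", "version": clang_version},
        }],
    }


linux = make_log("file:///home/user", "clang version 8.0.0 (abc)")
windows = make_log("file:///C:/Users/user", "clang version 8.0.0 (def)")
print(stable_dump(linux) == stable_dump(windows))
```

The normalized text could then be compared with a plain `diff` against a checked-in expected file, with no `-I` flags.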

This revision is now accepted and ready to land. Oct 30 2018, 9:07 AM

In D53814#1280581, @george.karpenkov wrote:

> I much prefer this version.
> We've had the same problem with diffing plist output.
> One thing we have learned is using a FileCheck was definitely a bad idea, as it leads to unreadable, unmaintainable, and very hard to update tests,
> so either diff or your custom tool is way better.
>
> As for the ultimate solution, I'm still not sure: I agree that maintaining those -I flags is annoying.

We can go with this approach until we need something more complicated. I suspect that as we add SARIF features, we may want to bring back the Python script for handling things like "Does every file in the 'files' list appear only once and do the files listed correspond exactly to ones in the diagnostic locations?". Diff definitely won't handle that sort of thing.

> One option is having an extra flag to produce "stable" output, which does not include absolute URLs/versions/etc.

Worth thinking about. SARIF has the ability to output relative paths as well as absolute paths. It also has the notion of redacted paths so that you can remove sensitive information from analysis reports. So there's plenty of room for changes here.
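
The kind of relational check mentioned in this reply, whether the files listed correspond exactly to the ones in the diagnostic locations, is easy to express in Python 3 and hard to express with a textual diff. The following is a hedged sketch rather than part of the patch: `location_uris` and `check_files_table` are hypothetical helpers, and the layout (a `files` object keyed by URI, `fileLocation` objects carrying a `uri`) is taken from the sample log earlier in the review.

```python
def location_uris(node):
    """Yield every fileLocation URI found anywhere under `node`."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "fileLocation" and isinstance(value, dict):
                if "uri" in value:
                    yield value["uri"]
            else:
                yield from location_uris(value)
    elif isinstance(node, list):
        for item in node:
            yield from location_uris(item)


def check_files_table(run):
    """Return (missing, unused) URI lists for one run of a SARIF log."""
    listed = set(run.get("files", {}))
    referenced = set(location_uris(run.get("results", [])))
    return sorted(referenced - listed), sorted(listed - referenced)


# Hypothetical run: b.c is referenced by a result but absent from "files".
RUN = {
    "files": {"file:///src/a.c": {"mimeType": "text/plain"}},
    "results": [{
        "locations": [
            {"physicalLocation": {"fileLocation": {"uri": "file:///src/a.c"}}},
            {"physicalLocation": {"fileLocation": {"uri": "file:///src/b.c"}}},
        ],
    }],
}

missing, unused = check_files_table(RUN)
print("missing from files:", missing)
print("listed but unused:", unused)
```

Because `files` is a JSON object keyed by URI in this draft, a parsed log cannot contain the same key twice, so the "appears only once" half of the question would need a check on the raw text instead.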

Thank you for the reviews! I've committed in r345628.

Revision Contents

Path                                                            Size
Analysis/diagnostics/sarif-check.py                             61 lines
Analysis/diagnostics/sarif-diagnostics-taint-test.c             32 lines
lib/StaticAnalyzer/Core/CMakeLists.txt                          5 lines
lib/StaticAnalyzer/Core/SarifDiagnostics.cpp                    267 lines
include/clang/StaticAnalyzer/Core/Analyses.def                  1 line
include/clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h  3 lines

Diff 171545

Analysis/diagnostics/sarif-check.py

This file was added.

    #!/usr/bin/env python

    import sys
    import subprocess
    import traceback
    import json

    testfile = sys.argv[1]
    with open(sys.argv[2]) as datafh:
        data = json.load(datafh)

    prefix = "CHECK: "

    def sarifLogFirstResult(idx):
        return data['runs'][0]['results'][idx]

    def threadFlow(idx):
        return data['runs'][0]['results'][0]['codeFlows'][0]['threadFlows'][idx]

    fails = 0
    passes = 0
    with open(testfile) as testfh:
        lineno = 0
        for line in iter(testfh.readline, ""):
            lineno += 1
            line = line.rstrip("\r\n")
            try:
                prefix_pos = line.index(prefix)
            except ValueError:
                continue
            check_expr = line[prefix_pos + len(prefix):]

            try:
                exception = None
                result = eval(check_expr, {"sarifLog":data,
                                           "result":sarifLogFirstResult,
                                           "flow":threadFlow})
            except Exception:
                result = False
                exception = traceback.format_exc().splitlines()[-1]

            if exception is not None:
                sys.stderr.write(
                    "{file}:{line:d}: check threw exception: {expr}\n"
                    "{file}:{line:d}: exception was: {exception}\n".format(
                        file=testfile, line=lineno,
                        expr=check_expr, exception=exception))
                fails += 1
            elif not result:
                sys.stderr.write(
                    "{file}:{line:d}: check returned False: {expr}\n".format(
                        file=testfile, line=lineno, expr=check_expr))
                fails += 1
            else:
                passes += 1

    if fails != 0:
        sys.exit("{} checks failed".format(fails))
    else:
        sys.stdout.write("{} checks passed\n".format(passes))

Analysis/diagnostics/sarif-diagnostics-taint-test.c

This file was added.

    // RUN: %clang_analyze_cc1 -analyzer-checker=alpha.security.taint,debug.TaintTest %s -verify -analyzer-output=sarif -o %t.sarif | %python %S/sarif-check.py %s %t.sarif
    #include "../Inputs/system-header-simulator.h"

    int atoi(const char *nptr);

    void f(void) {
      char s[80];
      scanf("%s", s);
      int d = atoi(s); // expected-warning {{tainted}}
    }

    int main(void) {
      f();
      return 0;
    }

    // Test the basics for sanity.
    // CHECK: sarifLog['version'].startswith("2.0.0")
    // CHECK: sarifLog['runs'][0]['tool']['fullName'] == "clang static analyzer"
    // CHECK: sarifLog['runs'][0]['tool']['name'] == "clang"
    // CHECK: sarifLog['runs'][0]['tool']['language'] == "en-US"

    // Test the specifics of this taint test.
    // CHECK: len(result(0)['codeFlows'][0]['threadFlows'][0]['locations']) == 2
    // CHECK: flow(0)['locations'][0]['step'] == 1
    // CHECK: flow(0)['locations'][0]['importance'] == "essential"
    // CHECK: flow(0)['locations'][0]['location']['message']['text'] == "Calling 'f'"
    // CHECK: flow(0)['locations'][0]['location']['physicalLocation']['region']['startLine'] == 13
    // CHECK: flow(0)['locations'][1]['step'] == 2
    // CHECK: flow(0)['locations'][1]['importance'] == "essential"
    // CHECK: flow(0)['locations'][1]['location']['message']['text'] == "tainted"
    // CHECK: flow(0)['locations'][1]['location']['physicalLocation']['region']['startLine'] == 9

StaticAnalyzer/Core/CMakeLists.txt

    [34 lines not shown]  add_clang_library(clangStaticAnalyzerCore
       FunctionSummary.cpp
       HTMLDiagnostics.cpp
       IssueHash.cpp
       LoopUnrolling.cpp
       LoopWidening.cpp
       MemRegion.cpp
       PathDiagnostic.cpp
       PlistDiagnostics.cpp
       ProgramState.cpp
       RangeConstraintManager.cpp
       RangedConstraintManager.cpp
       RegionStore.cpp
       RetainSummaryManager.cpp
    -  SValBuilder.cpp
    -  SVals.cpp
    +  SarifDiagnostics.cpp
       SimpleConstraintManager.cpp
       SimpleSValBuilder.cpp
       Store.cpp
       SubEngine.cpp
    +  SValBuilder.cpp
    +  SVals.cpp
       SymbolManager.cpp
       WorkList.cpp
       Z3ConstraintManager.cpp

       LINK_LIBS
       clangAST
       clangASTMatchers
       clangAnalysis
    [13 lines not shown]

StaticAnalyzer/Core/SarifDiagnostics.cpp

This file was added.

    //===--- SarifDiagnostics.cpp - Sarif Diagnostics for Paths -----*- C++ -*-===//
    //
    // The LLVM Compiler Infrastructure
    //
    // This file is distributed under the University of Illinois Open Source
    // License. See LICENSE.TXT for details.
    //
    //===----------------------------------------------------------------------===//
    //
    // This file defines the SarifDiagnostics object.
    //
    //===----------------------------------------------------------------------===//

    #include "clang/Basic/Version.h"
    #include "clang/Lex/Preprocessor.h"
    #include "clang/StaticAnalyzer/Core/AnalyzerOptions.h"
    #include "clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h"
    #include "clang/StaticAnalyzer/Core/PathDiagnosticConsumers.h"
    #include "llvm/ADT/STLExtras.h"
    #include "llvm/Support/JSON.h"
    #include "llvm/Support/Path.h"

    using namespace llvm;
    using namespace clang;
    using namespace ento;

    namespace {
    class SarifDiagnostics : public PathDiagnosticConsumer {
      std::string OutputFile;

    public:
      SarifDiagnostics(AnalyzerOptions &, const std::string &Output)
          : OutputFile(Output) {}
      ~SarifDiagnostics() override = default;

      void FlushDiagnosticsImpl(std::vector<const PathDiagnostic *> &Diags,
                                FilesMade *FM) override;

      StringRef getName() const override { return "SarifDiagnostics"; }
      PathGenerationScheme getGenerationScheme() const override { return Sarif; }
      bool supportsLogicalOpControlFlow() const override { return true; }
      bool supportsCrossFileDiagnostics() const override { return true; }
    };
    } // end anonymous namespace

    void ento::createSarifDiagnosticConsumer(AnalyzerOptions &AnalyzerOpts,
                                             PathDiagnosticConsumers &C,
                                             const std::string &Output,
                                             const Preprocessor &) {
      C.push_back(new SarifDiagnostics(AnalyzerOpts, Output));
    }

    static StringRef getFileName(const FileEntry &FE) {
      StringRef Filename = FE.tryGetRealPathName();
      if (Filename.empty())
        Filename = FE.getName();
      return Filename;
    }

    static std::string percentEncodeURICharacter(char C) {
      // RFC 3986 claims alpha, numeric, and this handful of
      // characters are not reserved for the path component and
      // should be written out directly. Otherwise, percent
      // encode the character and write that out instead of the
      // reserved character.
      if (llvm::isAlnum(C) ||
          StringRef::npos != StringRef("-._~:@!$&'()*+,;=").find(C))
        return std::string(&C, 1);
      return "%" + llvm::toHex(StringRef(&C, 1));
    }

    static std::string fileNameToURI(StringRef Filename) {
      llvm::SmallString<32> Ret = "file://";

      // Get the root name to see if it has a URI authority.
      StringRef Root = sys::path::root_name(Filename);
      if (Root.startswith("//")) {
        // There is an authority, so add it to the URI.
        Ret += Root.drop_front(2).str();
      } else {
        // There is no authority, so end the component and add the root to the URI.
        Ret += Twine("/" + Root).str();
      }

      // Add the rest of the path components, encoding any reserved characters.
      std::for_each(std::next(sys::path::begin(Filename)), sys::path::end(Filename),
                    [&Ret](StringRef Component) {
                      // For reasons unknown to me, we may get a backslash with
                      // Windows native paths for the initial backslash following
                      // the drive component, which we need to ignore as a URI path
                      // part.
                      if (Component == "\\")
                        return;

                      // Add the separator between the previous path part and the
                      // one being currently processed.
                      Ret += "/";

                      // URI encode the part.
                      for (char C : Component) {
                        Ret += percentEncodeURICharacter(C);
                      }
                    });

      return Ret.str().str();
    }

    static json::Object createFileLocation(const FileEntry &FE) {
      return json::Object{{"uri", fileNameToURI(getFileName(FE))}};
    }

    static json::Object createFile(const FileEntry &FE) {
      return json::Object{{"fileLocation", createFileLocation(FE)},
                          {"roles", json::Array{"resultFile"}},
                          {"length", FE.getSize()},
                          {"mimeType", "text/plain"}};
    }

    static json::Object createFileLocation(const FileEntry &FE,
                                           json::Object &Files) {
      std::string FileURI = fileNameToURI(getFileName(FE));
      if (!Files.get(FileURI))
        Files[FileURI] = createFile(FE);

      return json::Object{{"uri", FileURI}};
    }

    static json::Object createTextRegion(SourceRange R, const SourceManager &SM) {
      return json::Object{
          {"startLine", SM.getExpansionLineNumber(R.getBegin())},
          {"endLine", SM.getExpansionLineNumber(R.getEnd())},
          {"startColumn", SM.getExpansionColumnNumber(R.getBegin())},
          {"endColumn", SM.getExpansionColumnNumber(R.getEnd())}};
    }

    static json::Object createPhysicalLocation(SourceRange R, const FileEntry &FE,
                                               const SourceManager &SMgr,
                                               json::Object &Files) {
      return json::Object{{{"fileLocation", createFileLocation(FE, Files)},
                           {"region", createTextRegion(R, SMgr)}}};
    }

    enum class Importance { Important, Essential, Unimportant };

    static StringRef importanceToStr(Importance I) {
      switch (I) {
      case Importance::Important:
        return "important";
      case Importance::Essential:
        return "essential";
      case Importance::Unimportant:
        return "unimportant";
      }
      llvm_unreachable("Fully covered switch is not so fully covered");
    }

    static json::Object createThreadFlowLocation(int Step, json::Object &&Location,
                                                 Importance I) {
      return json::Object{{"step", Step},
                          {"location", std::move(Location)},
                          {"importance", importanceToStr(I)}};
    }

    static json::Object createMessage(StringRef Text) {
      return json::Object{{"text", Text.str()}};
    }

    static json::Object createLocation(json::Object &&PhysicalLocation,
                                       StringRef Message = "") {
      json::Object Ret{{"physicalLocation", std::move(PhysicalLocation)}};
      if (!Message.empty())
        Ret.insert({"message", createMessage(Message)});
      return Ret;
    }

    static Importance calculateImportance(const PathDiagnosticPiece &Piece) {
      StringRef PieceStr = Piece.getString();

      switch (Piece.getKind()) {
      case PathDiagnosticPiece::Kind::Call:
      case PathDiagnosticPiece::Kind::Macro:
      case PathDiagnosticPiece::Kind::Note:
        // FIXME: What should be reported here?
				george.karpenkov: "Note" pieces are notes that do not have to be attached to a particular path element.
				aaron.ballman: Good to know!
				break;
				case PathDiagnosticPiece::Kind::Event:
				return Piece.getTagStr() == "ConditionBRVisitor" ? Importance::Important
				: Importance::Essential;
				case PathDiagnosticPiece::Kind::ControlFlow:
				return Importance::Unimportant;
				}
				return Importance::Unimportant;
				}

				static json::Object createThreadFlow(const PathPieces &Pieces,
				json::Object &Files) {
				const SourceManager &SMgr = Pieces.front()->getLocation().getManager();
				int Step = 1;
				json::Array Locations;
				for (const auto &Piece : Pieces) {
				const PathDiagnosticLocation &P = Piece->getLocation();
				Locations.push_back(createThreadFlowLocation(
				Step++,
				createLocation(createPhysicalLocation(P.asRange(),
				*P.asLocation().getFileEntry(),
				SMgr, Files),
				Piece->getString()),
				calculateImportance(*Piece)));
				}
				return json::Object{{"locations", std::move(Locations)}};
				}

				static json::Object createCodeFlow(const PathPieces &Pieces,
				json::Object &Files) {
				return json::Object{
				{"threadFlows", json::Array{createThreadFlow(Pieces, Files)}}};
				}

				static json::Object createTool() {
				return json::Object{{"name", "clang"},
				{"fullName", "clang static analyzer"},
				{"language", "en-US"},
				{"version", getClangFullVersion()}};
				}

				static json::Object createResult(const PathDiagnostic &Diag,
				json::Object &Files) {
				const PathPieces &Path = Diag.path.flatten(false);
				const SourceManager &SMgr = Path.front()->getLocation().getManager();

				return json::Object{
				{"message", createMessage(Diag.getVerboseDescription())},
				{"codeFlows", json::Array{createCodeFlow(Path, Files)}},
				{"locations",
				json::Array{createLocation(createPhysicalLocation(
				Diag.getLocation().asRange(),
				*Diag.getLocation().asLocation().getFileEntry(), SMgr, Files))}},
				{"ruleId", Diag.getCheckName()}};
				}

				static json::Object createRun(std::vector<const PathDiagnostic *> &Diags) {
				json::Array Results;
				json::Object Files;

				llvm::for_each(Diags, [&](const PathDiagnostic *D) {
				george.karpenkov: I like closures, but what's wrong with just using a `for` loop here?
				aaron.ballman: Same as above: clarity of exposition. This one I'd feel pretty strongly about keeping as an algorithm given how trivial the loop body is.
				Results.push_back(createResult(*D, Files));
				});

				return json::Object{{"tool", createTool()},
				{"results", std::move(Results)},
				{"files", std::move(Files)}};
				}

				void SarifDiagnostics::FlushDiagnosticsImpl(
				std::vector<const PathDiagnostic *> &Diags, FilesMade *) {
				// FIXME: if the file already exists, do we overwrite it with a single run,
				george.karpenkov: Usually we overwrite the file and note that on stderr in such cases.
				aaron.ballman: We took the decision internally to overwrite as well, but the SARIF format allows for multiple runs within the same output file (so you can compare analysis results for the same project over time). I think this will eventually have to be user-controlled because I can see some users wanting to append to the run and others wanting to overwrite. However, these log files can become quite large in practice (GBs of data), so "read in the JSON and add to it" may be implausible, which is why I punted for now. I'll update the comment so it's not a FIXME.
				// or do we append a run into the file if it's a valid SARIF log?
				std::error_code EC;
				llvm::raw_fd_ostream OS(OutputFile, EC, llvm::sys::fs::F_Text);
				if (EC) {
				llvm::errs() << "warning: could not create file: " << EC.message() << '\n';
				return;
				}
				json::Object Sarif{{"$schema", "http://json.schemastore.org/sarif-2.0.0"},
				{"version", "2.0.0-beta.2018-09-26"},
				{"runs", json::Array{createRun(Diags)}}};
				OS << llvm::formatv("{0:2}", json::Value(std::move(Sarif)));
				}
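
The per-component encoding loop in fileNameToURI above, which a reviewer suggested pulling out into its own function, could look roughly like the following standard-library-only sketch. Everything here is an assumption for illustration: `percentEncode`, `appendEncodedComponent`, and `componentsToURI` are hypothetical names, the `file://` prefix is assumed, and the set of characters left unencoded (only the RFC 3986 "unreserved" set) may be stricter than what the patch's own percentEncodeURICharacter allows.

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the patch's percentEncodeURICharacter():
// RFC 3986 "unreserved" characters pass through, everything else is
// emitted as %XX. The real helper may accept a larger character set.
static std::string percentEncode(char C) {
  unsigned char U = static_cast<unsigned char>(C);
  if (std::isalnum(U) || C == '-' || C == '.' || C == '_' || C == '~')
    return std::string(1, C);
  char Buf[4];
  std::snprintf(Buf, sizeof(Buf), "%%%02X", static_cast<unsigned>(U));
  return Buf;
}

// Hypothetical extracted helper: append "/" plus the encoded component,
// skipping the lone backslash that can follow a Windows drive component.
static void appendEncodedComponent(std::string &Ret,
                                   const std::string &Component) {
  if (Component == "\\")
    return;
  Ret += "/";
  for (char C : Component)
    Ret += percentEncode(C);
}

// Build a URI from already-split path components (the "file://" prefix
// is an assumption of this sketch).
static std::string componentsToURI(const std::vector<std::string> &Parts) {
  std::string Ret = "file://";
  for (const std::string &Part : Parts)
    appendEncodedComponent(Ret, Part);
  return Ret;
}
```

With these assumptions, the components `C:`, `\`, `my dir`, `a.c` would produce `file:///C%3A/my%20dir/a.c`: the backslash component is dropped and the colon and space are escaped.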

clang/StaticAnalyzer/Core/Analyses.def

	#define ANALYSIS_DIAGNOSTICS(NAME, CMDFLAG, DESC, CREATEFN)
	#endif

	ANALYSIS_DIAGNOSTICS(HTML, "html", "Output analysis results using HTML", createHTMLDiagnosticConsumer)
	ANALYSIS_DIAGNOSTICS(HTML_SINGLE_FILE, "html-single-file", "Output analysis results using HTML (not allowing for multi-file bugs)", createHTMLSingleFileDiagnosticConsumer)
	ANALYSIS_DIAGNOSTICS(PLIST, "plist", "Output analysis results using Plists", createPlistDiagnosticConsumer)
	ANALYSIS_DIAGNOSTICS(PLIST_MULTI_FILE, "plist-multi-file", "Output analysis results using Plists (allowing for multi-file bugs)", createPlistMultiFileDiagnosticConsumer)
	ANALYSIS_DIAGNOSTICS(PLIST_HTML, "plist-html", "Output analysis results using HTML wrapped with Plists", createPlistHTMLDiagnosticConsumer)
				ANALYSIS_DIAGNOSTICS(SARIF, "sarif", "Output analysis results in a SARIF file", createSarifDiagnosticConsumer)
	ANALYSIS_DIAGNOSTICS(TEXT, "text", "Text output of analysis results", createTextPathDiagnosticConsumer)

	#ifndef ANALYSIS_PURGE
	#define ANALYSIS_PURGE(NAME, CMDFLAG, DESC)
	#endif

	ANALYSIS_PURGE(PurgeStmt, "statement", "Purge symbols, bindings, and constraints before every statement")
	ANALYSIS_PURGE(PurgeBlock, "block", "Purge symbols, bindings, and constraints before every basic block")
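
Analyses.def follows the X-macro pattern: each ANALYSIS_DIAGNOSTICS entry is a macro invocation that the including file expands however it likes, which is why registering SARIF takes a single added line. The sketch below is a self-contained miniature of that pattern, not the analyzer's actual expansion sites; `DEMO_DIAGNOSTICS_LIST`, `DemoDiagKind`, and `parseDemoDiagKind` are hypothetical names, and the list is held in a macro rather than a separate .def file so the example stays in one translation unit.

```cpp
#include <cstring>

// Hypothetical miniature of a .def file: one ENTRY(...) per output format.
#define DEMO_DIAGNOSTICS_LIST(ENTRY)                                  \
  ENTRY(HTML, "html", "Output analysis results using HTML")           \
  ENTRY(SARIF, "sarif", "Output analysis results in a SARIF file")    \
  ENTRY(TEXT, "text", "Text output of analysis results")

// First expansion: one enumerator per entry.
enum class DemoDiagKind {
#define DEMO_AS_ENUM(NAME, CMDFLAG, DESC) NAME,
  DEMO_DIAGNOSTICS_LIST(DEMO_AS_ENUM)
#undef DEMO_AS_ENUM
  NumKinds
};

// Second expansion: map a command-line flag string to its enumerator.
static DemoDiagKind parseDemoDiagKind(const char *Flag) {
#define DEMO_AS_PARSE(NAME, CMDFLAG, DESC)                            \
  if (std::strcmp(Flag, CMDFLAG) == 0)                                \
    return DemoDiagKind::NAME;
  DEMO_DIAGNOSTICS_LIST(DEMO_AS_PARSE)
#undef DEMO_AS_PARSE
  return DemoDiagKind::NumKinds; // Unknown flag.
}
```

Adding a new format to the list updates both the enum and the flag parser without touching either expansion site, which is the property the one-line SARIF registration relies on.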

clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h

enum PathGenerationScheme {
/// Only runs visitors, no output generated.
None,

/// Used for HTML and text output.
Minimal,

/// Used for plist output, used for "arrows" generation.
Extensive,

		/// Used for SARIF output.
		Sarif,
		george.karpenkov: Do you actually need a new generation scheme here? I'm pretty sure that using "Minimal" would give you the same effect.
		aaron.ballman: I don't think it's currently needed; I've removed it and updated the comment.
};

virtual PathGenerationScheme getGenerationScheme() const { return Minimal; }
virtual bool supportsLogicalOpControlFlow() const { return false; }

/// Return true if the PathDiagnosticConsumer supports individual
/// PathDiagnostics that span multiple files.
virtual bool supportsCrossFileDiagnostics() const { return false; }
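
The reviewer's point about reusing Minimal rests on the default shown above: getGenerationScheme() returns Minimal unless a consumer overrides it. A minimal sketch of that behaviour follows, using simplified stand-in types; `DemoConsumer`, `DemoPlistConsumer`, and `DemoSarifConsumer` are hypothetical and are not the real PathDiagnosticConsumer hierarchy.

```cpp
// Hypothetical, simplified stand-in for the consumer base class.
struct DemoConsumer {
  enum PathGenerationScheme { None, Minimal, Extensive };
  virtual ~DemoConsumer() = default;
  // Default scheme, mirroring the header's "return Minimal".
  virtual PathGenerationScheme getGenerationScheme() const { return Minimal; }
};

// A consumer that opts in to a richer scheme by overriding the default.
struct DemoPlistConsumer : DemoConsumer {
  PathGenerationScheme getGenerationScheme() const override {
    return Extensive;
  }
};

// A consumer that overrides nothing simply inherits Minimal, so it needs
// no enumerator of its own.
struct DemoSarifConsumer : DemoConsumer {};
```

Under these assumptions, a consumer that is content with the Minimal path needs neither a new enumerator nor an override, which matches the outcome of the review thread.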