This is an archive of the discontinued LLVM Phabricator instance.

[clang] Enable output of SARIF diagnostics
ClosedPublic

Authored by abrahamcd on Aug 10 2022, 4:44 PM.

Details

Summary

Enables Clang to emit diagnostics in SARIF format when
-fdiagnostics-format=sarif. Adds a new DiagnosticConsumer named
SARIFDiagnosticPrinter and a new DiagnosticRenderer named SARIFDiagnostic
to constuct and emit a SARIF object containing the run's basic diagnostic info.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2022, 4:44 PM
Herald added a subscriber: mgorny. · View Herald Transcript
abrahamcd requested review of this revision.Aug 10 2022, 4:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2022, 4:44 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
abrahamcd updated this revision to Diff 452833.Aug 15 2022, 3:56 PM
abrahamcd retitled this revision from [WIP] Enable SARIF Diagnostics to [clang] Enable output of SARIF diagnostics.
abrahamcd edited the summary of this revision. (Show Details)

Removed unused remaining parts of traditional text diagnostics.

abrahamcd updated this revision to Diff 452840.Aug 15 2022, 4:21 PM

Minor cleanup.

abrahamcd updated this revision to Diff 452843.Aug 15 2022, 4:33 PM

Naming fixes.

abrahamcd added inline comments.Aug 15 2022, 4:58 PM
clang/lib/Frontend/SARIFDiagnostic.cpp
76

@vaibhav.y , just wanted to confirm, the SarifDiagnosticWriter can only add locations that include row/column numbers, correct?

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
72

Does anyone know specific kinds of diagnostics that might be covered here in traditional diagnostics? I'm having trouble thinking of how we can support a simpler path for these kinds of diagnostics in SARIF since the Writer requires a source manager.

clang/test/Frontend/sarif-diagnostics.cpp
4

In regards to testing this, my understanding is that we are moving away from the unit test GTest testing, correct? I included a FileCheck implementation as well, but I wasn't sure how effective just checking the whole SARIF object like this would be.

abrahamcd updated this revision to Diff 453126.Aug 16 2022, 2:12 PM

Fixed FileCheck test case and added multiple source range error.

vaibhav.y added inline comments.Aug 16 2022, 2:14 PM
clang/lib/Frontend/SARIFDiagnostic.cpp
76

Yes, this is currently not supported.

Removed Clang name from FileCheck test.

abrahamcd marked 2 inline comments as not done.Aug 18 2022, 10:04 AM
vaibhav.y added inline comments.Aug 18 2022, 11:23 AM
clang/lib/Frontend/SARIFDiagnostic.cpp
156

Does this need an llvm_unreachable after the switch?

178
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
72

The location maybe if the diagnostic's source is located in the scratch buffer. Likely for macro expansions where token pasting is involved. Another case would be errors on the command line.

I'm not entirely sure how the SARIF spec would handle this case, it might require an extension.

A few ways that might work could be:

Using the logicalLocations property to specify (logicalLocation object), this might need an extension for kind: "macro", another case that might need extension is diagnostics about invalid command line flags which are also diagnostics without a valid

The parentIndex for these logical locations could be set to the physical location that produced them.

I think this definitely warrants some discussion since the spec doesn't provide a clear way to express these cases.

WDYT @aaron.ballman @cjdb @denik

vaibhav.y added inline comments.Aug 18 2022, 11:34 AM
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
72

The spec does say for "kind":

If none of those strings accurately describes the construct, kind MAY contain any value specified by the analysis tool.

So an extension might not be necessary, but might be worth discussing.

Precommit CI found a relevant failure.

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
72

From looking through the spec, I think logicalLocations is probably the right choice and we'd want to make up our own kind for things like the scratch buffer or the command line. I think an extension would be worth discussing.

abrahamcd updated this revision to Diff 453788.Aug 18 2022, 2:28 PM

Commented out unfinished test case

Precommit CI found a relevant failure.

I think this was just from the unfinished unit test. I can go ahead and remove clang/unittests/Frontend/SARIFDiagnosticTest.cpp entirely if we don't plan to use unit tests and the FileCheck test in clang/test/Frontend/sarif-diagnostics.cpp is enough.

denik added a comment.Aug 19 2022, 3:02 PM

Thanks Abraham!

Please mention in the summary that some part of SARIF Diag implementation was copied from Text Diag and further refactoring is needed.

Also, like we discussed:

  • drop the unittest;
  • replace strict match of json output in FileCheck with partial matches of the fields that we are interested in (if possible);
  • rebase and incorporate setDefaultConfiguration from D131084.
  • on the logicalLocations I have to take a look if we can add it in this patch or defer it and add a warning that this diagnostic is currently unsupported.
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
41

std::unique_ptr<SarifDocumentWriter> Writer;

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
49

Value(Writer->createDocument());

abrahamcd updated this revision to Diff 454113.Aug 19 2022, 3:06 PM
abrahamcd marked 2 inline comments as done.

Added diagnostic level and default configuration to SARIF output.
Removed unit test file.

abrahamcd edited the summary of this revision. (Show Details)Aug 19 2022, 3:19 PM
denik added inline comments.Aug 19 2022, 3:20 PM
clang/lib/Frontend/SARIFDiagnostic.cpp
156

I guess no, because of breaks. But...
Although case DiagnosticsEngine::Ignored unreachable was copied from TextDiagnostic it looks a bit confusing to me.
Do we want to store Ignored with SarifResultLevel::None or keep it unreachable?
With the latter I would replace case DiagnosticsEngine::Ignored with default to cover new unhandled types.

abrahamcd updated this revision to Diff 454129.Aug 19 2022, 3:41 PM
abrahamcd edited the summary of this revision. (Show Details)

Addressed review style comments.

abrahamcd marked 2 inline comments as done.Aug 19 2022, 3:41 PM
aaron.ballman added inline comments.Aug 22 2022, 10:45 AM
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
43

There's not a lot of benefit to using bit-fields here yet, so I'd make this field a bool instead for the time being. If we add more fields here where packing them into a bit-field helps, then we can revisit at that time.

46
64–66

This interface seems dangerous to me -- the caller has no idea that the pointer will be stashed off as a unique_ptr and so the caller might expect to still be able to use its (now not unique) pointer value.

clang/lib/Frontend/SARIFDiagnostic.cpp
44
44–48
60–62
91–93
106–108
128
147

This is reachable and it's a programmer mistake if we get here.

163–164
170–174
175

Please spell out the type since it's not spelled out in the initialization. Also, the next line is checking for a file -- what if File is null?

190

I'm not certain we want a difference in behavior here. For starters, Windows also has hardlinks and symlinks (they're not exactly the same as on *nix systems though), so I think the assumption these paths point to the same place is valid. But also, platform-specific behavior should be exceedingly rare in most parts of the compiler. Is there significant harm in unifying the behavior across platforms?

192

Note: this is not a particularly small string when it requires 4k by default.

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
28–31

It's ugly but it's the pattern we use all over the codebase (and it corrects the naming convention issues).

57–58
72
cjdb added a comment.Aug 22 2022, 11:34 AM

This looks great, thank you so much @abrahamcd!

clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
32

Please make OS a pointer instead of a reference, since member references make usage very unpleasant.

32–43

Nit: please move all private members below the public interface, where possible.

43

Can we make this a bool instead? Unless I'm mistaken, there's 31 bits of padding for OwnsOutputStream, as opposed to a bool's (almost certainly) eight bits.

53

We should be passing in a StringRef rather than a string, since this will unnecessarily construct a string for C string literals.

clang/lib/Frontend/FrontendAction.cpp
726–729

We shouldn't have raw owning pointers, since they're vulnerable to memory leaks. Even though LLVM doesn't throw exceptions, it's still good practice to avoid them.

clang/lib/Frontend/SARIFDiagnostic.cpp
31–32

Please replace this using-directive with a namespace scope.

79

I think it would be good to file an issue on GitHub and change to FIXME(llvm-project/<issue>). @aaron.ballman WDYT?

87–89

I haven't seen any use of RI or RE in this loop, except to dereference RI, so we should change this to a range-for loop.

163
211–224

If these aren't called, then they should probably assert and say that they're not implemented.

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
25–26

Please replace with a namespace scope.

72

We should defer this to a future CL, so that Abraham isn't blocked by our decision-making (and so we can make the right decision). I can start a GitHub issue to get the discussion in a good spot?

clang/test/Frontend/sarif-diagnostics.cpp
4

In regards to testing this, my understanding is that we are moving away from the unit test GTest testing, correct?

Yep, that's correct.

I included a FileCheck implementation as well, but I wasn't sure how effective just checking the whole SARIF object like this would be.

Isn't this the FileCheck test?

aaron.ballman added inline comments.Aug 22 2022, 12:31 PM
clang/lib/Frontend/SARIFDiagnostic.cpp
79

I'd be fine with that.

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
72

SGTM (I don't consider this to be blocking for this patch, FWIW).

denik added inline comments.Aug 22 2022, 12:54 PM
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
72

I agree that logicalLocations is a good use for the mentioned cases.

In addition, the spec mentions that location object can miss both physical and logical location properties in which cases it has to include the message property (see 3.28.1). I think this also could be a good use case for the command line flag errors.

To unblock this change we can leave result.locations empty. It's allowed by the spec (see 3.27.12). From what I see, Sarif.cpp doesn't use the source manager if Locations is empty.
But locations still has to be present with the empty list. This might need a fix in https://clang.llvm.org/doxygen/Sarif_8cpp_source.html#l00387.

abrahamcd updated this revision to Diff 454652.Aug 22 2022, 4:55 PM
abrahamcd marked 26 inline comments as done.

Addressed review comments on formatting, style, C++ best practices.

clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
64–66

Would it be best to store as a regular pointer instead of a unique pointer in this case then?

clang/lib/Frontend/SARIFDiagnostic.cpp
163

This creates -Wcovered-switch-default warning so I just removed the unreachable.

175

If I am interpreting it correctly (this is adapted from TextDiagnostic.cpp), I believe the passed-in Filename will just be returned unaltered. This is probably one of those functions that would be good to pull out during refactoring.

190

There seems to be Windows-specific tests that rely on this behavior, but I'm not too sure what the harm in unifying it would be. There's some discussion on it here: D59415.

abrahamcd added inline comments.Aug 22 2022, 4:56 PM
clang/test/Frontend/sarif-diagnostics.cpp
4

Yes, I meant "this FileCheck implementation" (back then I had the unit test implementation in here as well but I have since removed it).

aaron.ballman added inline comments.Aug 23 2022, 5:55 AM
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
64–66

My preference is to use a unique_ptr whenever possible so that ownership semantics are clear. So my recommendation is to have the user construct the unique_ptr and hand it off through this interface with a move.

clang/lib/Frontend/SARIFDiagnostic.cpp
190

Then let's skip this for now -- if there's something we want to address, we can do it in a follow-up.

abrahamcd marked 4 inline comments as done.

Modified Printer's interface to use unique pointers when setting the SARIF writer.

abrahamcd updated this revision to Diff 454926.EditedAug 23 2022, 12:22 PM

Documented issue with representing PresumedLocs modified by #line directives.

abrahamcd added inline comments.Aug 23 2022, 1:04 PM
clang/lib/Frontend/SARIFDiagnostic.cpp
117

I noticed that when processing additional source ranges, regular source locations are used instead of the presumed locations. Is this an intentional difference from single location diagnostics? (This issue is present in the current TextDiagnostics as well.)

130–133

There's an issue I encountered with adding presumed locations that have been modified by a #line directive to SARIF. To add a location, the Writer requires a CharSourceRange, which uses regular SourceLocations. If the presumed location has a modified line number, the source manager will still interpret it as a regular line number in the source file and can cause the wrong column number to be added to the SARIF object.

@vaibhav.y Do you have an idea of how involved it would be to add an option to the document writer that adds locations without first creating a CharSourceRange?

cjdb added inline comments.Aug 23 2022, 1:08 PM
clang/lib/Frontend/FrontendAction.cpp
728–729

Now that the interface is using unique_ptr, we should replace with std::make_unique.

clang/lib/Frontend/SARIFDiagnostic.cpp
130

Please assign this a GitHub issue. You can assign the issue to me.

192

There's a bug either here or in the function's interface: the function returns a StringRef to a stack object. Do we really need ownership here?

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
36–37

This makes me very uncomfortable. @aaron.ballman do we have a smart-pointer that works like variant<T*, unique_ptr<T>>?

44

Please replace with make_unique.

abrahamcd updated this revision to Diff 454950.Aug 23 2022, 1:47 PM
abrahamcd marked 2 inline comments as done.

Added use of std::make_unique.

cjdb added inline comments.Aug 23 2022, 3:45 PM
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
42

We don't need to move a StringRef.

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
44

This won't work because .get() only returns an observer, so the object will be deleted when this statement completes. We'll need to do

SARIFDiag = std::make_unique<SARIFDiagnostic>(*OS, LO, &*DiagOpts, &*Writer);
abrahamcd updated this revision to Diff 455005.Aug 23 2022, 4:23 PM

Removed use of pointer reset in Printer.

abrahamcd marked 2 inline comments as done.Aug 23 2022, 4:23 PM
cjdb accepted this revision.Aug 24 2022, 12:32 PM

Open comments notwithstanding, I'm happy with this patch. Thank you for working on this, it's a huge step towards getting more helpful diagnostics for C++!

@aaron.ballman, @vaibhav.y, do you folks have any further concerns? (@denik has some commentary incoming)

clang/include/clang/Frontend/SARIFDiagnostic.h
25–27

These can go in the private section below.

clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
73

This can't ever happen, because there's no constructor that doesn't set OwnsOutputStream.

clang/lib/Frontend/SARIFDiagnostic.cpp
87–89

It seems @aaron.ballman has finally trained me :(

This revision is now accepted and ready to land.Aug 24 2022, 12:32 PM
aaron.ballman added inline comments.Aug 24 2022, 2:30 PM
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
32

Please make OS a pointer instead of a reference, since member references make usage very unpleasant.

Err... I disagree (and just deleted a comment above asking for OS to turn into a reference rather than a pointer because it can never be null). I think we want to use a reference here because 1) it conveys the correct "I can never be null" semantics so maintainers don't have to ask when that can happen, and 2) copies of this class are a bad idea to begin with (we should probably delete the copy and move operations).

We use reference members elsewhere: https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Sema/Sema.h#L407 (not that I would hold Sema up as an example of best practices, lol).

aaron.ballman added inline comments.Aug 24 2022, 2:32 PM
clang/lib/Frontend/SARIFDiagnostic.cpp
87–89

Macro ohno:

My work here is done. ;-)

abrahamcd updated this revision to Diff 455394.Aug 24 2022, 3:13 PM
abrahamcd marked 4 inline comments as done.

Deleted copy/move from renderer and returned OS to reference.

denik added inline comments.Aug 24 2022, 4:27 PM
clang/include/clang/Frontend/SARIFDiagnostic.h
25–27

In addition, I think it's worth putting a comment here about Writer's ownership and why it's a raw pointer here (like we discussed offline).

clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
3

Please adjust the length to fit it in one line.

clang/lib/Frontend/SARIFDiagnostic.cpp
3

Same here.

39

Could you please add FIXME with a pointer to llvm-project/issues/57323 to refactor Diagnostic classes?

59

I think we should add a test case when Loc or/and Source Manager is not set.
If Loc is not set SarifDocumentWriter might not create the locations property which is required by the spec.
If this is the case we need to fix it. But I'm not sure if emitDiagnosticMessage() is going to be called when Loc.isInvalid() ).

79

@abrahamcd please file the llvm-project/issues and link it here.

117

Put FIXME here (if it's not already somewhere else).

192

Note: this is not a particularly small string when it requires 4k by default.

This still has to be addressed.

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
3

Same as above. One line.

aaron.ballman added inline comments.Aug 25 2022, 4:51 AM
clang/include/clang/Frontend/SARIFDiagnostic.h
31–34

Formatting nits.

46–48

Move this implementation to live with the rest and give it an assertion?

clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
73

This can go away, this never owns the output stream.

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
36–37

No, but we shouldn't need it -- this object *never* owns the output stream, so I think we should remove the member variable and this code (we can default the dtor then).

43

Should we also assert that we don't already have a valid SARIFDiag object (in case someone calls BeginSourceFile() twice)?

54

Should we be invalidating that SARIFDiag object here so the printer cannot be used after the source file has ended?

cjdb added inline comments.Aug 25 2022, 11:46 AM
clang/lib/Frontend/SARIFDiagnostic.cpp
192

There's a bug either here or in the function's interface: the function returns a StringRef to a stack object. Do we really need ownership here?

Disregard, I misread TmpFilename as Filename and thought we were returning a dangling reference. That isn't the case, so there is no bug.

abrahamcd updated this revision to Diff 455728.Aug 25 2022, 3:00 PM
abrahamcd marked 15 inline comments as done.

Addressed some refactoring and reformatting comments.

clang/include/clang/Frontend/SARIFDiagnostic.h
46–48

This function is currently being called from the DiagnosticRenderer class that both Text and SARIFDiagnostics inherit from. Maybe this could be part of the refactoring, making sure that any text-specific function calls are moved to TextDiagnostic rather than being in the general Renderer base class?

clang/lib/Frontend/SARIFDiagnostic.cpp
59

I believe if Loc is invalid, it goes to one of those special cases I mentioned in this review.

cjdb added inline comments.Aug 25 2022, 3:01 PM
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
54

I suggested that we don't do this, because the next run of BeginSourceFile can take care of that when making the new SARIFDiagnostic. However, if we were to put an assert at the top of every member function other than BeginSourceFile to ensure that SARIFDiag isn't null (and one at the top of BeginSourceFile to ensure that it _is_), I think there might be good value in re-adding this.

abrahamcd updated this revision to Diff 455729.Aug 25 2022, 3:04 PM

Ran clang-format.

denik accepted this revision.Aug 25 2022, 3:32 PM
denik added inline comments.
clang/lib/Frontend/SARIFDiagnostic.cpp
59

If it does then "if" is redundant here.

aaron.ballman accepted this revision.Aug 26 2022, 4:41 AM

LGTM!

clang/include/clang/Frontend/SARIFDiagnostic.h
46–48

Ah, good call, the assert definitely would be wrong then. Let's leave it alone for now, it's fine as-is.

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
54

SGTM, thanks for weighing in!

abrahamcd marked 10 inline comments as done.

Resolved pending comments.

clang/lib/Frontend/SARIFDiagnostic.cpp
59

Looked closer and the base Renderer calls this function in both cases when Loc is valid and when it is invalid, so the extra check is necessary. In the case that Loc is invalid, you're right that results will not have locations. This might be something to add to the Writer, I couldn't find a way to add an empty locations object.

192

Made it smaller for now, but this should be revisited during refactoring, since text diag still uses 4096

Deleted copy and move for Printer and added asserts.

Reset SARIFDiag during EndSourceFile().

This revision was landed with ongoing or failed builds.Aug 26 2022, 11:50 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Aug 26 2022, 12:30 PM

This adds warnings:

[498/568] CXX obj/clang/lib/Frontend/Frontend.SARIFDiagnostic.o
../../clang/lib/Frontend/SARIFDiagnostic.cpp:74:25: warning: unused variable 'Filename' [-Wunused-variable]
        llvm::StringRef Filename =
                        ^
In file included from ../../clang/lib/Frontend/SARIFDiagnostic.cpp:9:
../../clang/include/clang/Frontend/SARIFDiagnostic.h:58:16: warning: private field 'OS' is not used [-Wunused-private-field]
  raw_ostream &OS;
               ^
dyung added a subscriber: dyung.Aug 26 2022, 10:33 PM
dyung added inline comments.
clang/test/Frontend/sarif-diagnostics.cpp
8

Can this CHECK line be broken into smaller pieces? This test is failing on our internal bot, and I'm going crosseyed trying to figure out what the difference is because the line is 3741 characters long!

dyung added inline comments.Aug 26 2022, 10:41 PM
clang/test/Frontend/sarif-diagnostics.cpp
8

Is there any significance to the "ruleId" and "Id" values in this test? If not, can we just use a regex to check for a number? Our internal build with private changes is failing due to slightly different numbers in some of the "ruleId" fields

uabelho added inline comments.Aug 29 2022, 6:22 AM
clang/test/Frontend/sarif-diagnostics.cpp
8

+1 on this. We also get different "ruleId" and "Id" values downstream and it's quite painful trying to figure out where the diffs are in a +3700 character CHECK.

aaron.ballman added inline comments.Aug 29 2022, 6:43 AM
clang/test/Frontend/sarif-diagnostics.cpp
8

I ran into this problem this morning as well. I've improved the situation somewhat in b345be177d03166add391f090fd0288a23413934, but I think we really could use some better diff tooling than FileCheck for diffing between two JSON files.

The test can still be made less fragile despite the changes I made, which were just enough to get the test passing for me locally again, but running considerably faster (it turns out long FileCheck lines take a while to process).

tstellar added inline comments.
clang/test/Frontend/sarif-diagnostics.cpp
8

This check line assumes the clang directory is called clang, which is not true for the release tarballs that we distribute. Need to drop the clang/ directory from the path name.

aaron.ballman added inline comments.Jan 24 2023, 12:34 PM
clang/test/Frontend/sarif-diagnostics.cpp
8

Good catch, I'll fix that up shortly.

aaron.ballman added inline comments.Jan 24 2023, 12:38 PM
clang/test/Frontend/sarif-diagnostics.cpp
8

This should be fixed by 0647f4a77a2f9aab84c961982193bea1a9eb1585; please let me know if there's still a problem.