Page MenuHomePhabricator

[clang] Enable output of SARIF diagnostics
ClosedPublic

Authored by abrahamcd on Aug 10 2022, 4:44 PM.

Details

Summary

Enables Clang to emit diagnostics in SARIF format when
-fdiagnostics-format=sarif. Adds a new DiagnosticConsumer named
SARIFDiagnosticPrinter and a new DiagnosticRenderer named SARIFDiagnostic
to constuct and emit a SARIF object containing the run's basic diagnostic info.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
abrahamcd added inline comments.Aug 22 2022, 4:56 PM
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
64–66

Would it be best to store as a regular pointer instead of a unique pointer in this case then?

clang/lib/Frontend/SARIFDiagnostic.cpp
163

This creates -Wcovered-switch-default warning so I just removed the unreachable.

175

If I am interpreting it correctly (this is adapted from TextDiagnostic.cpp), I believe the passed-in Filename will just be returned unaltered. This is probably one of those functions that would be good to pull out during refactoring.

190

There seems to be Windows-specific tests that rely on this behavior, but I'm not too sure what the harm in unifying it would be. There's some discussion on it here: D59415.

clang/test/Frontend/sarif-diagnostics.cpp
4

Yes, I meant "this FileCheck implementation" (back then I had the unit test implementation in here as well but I have since removed it).

aaron.ballman added inline comments.Aug 23 2022, 5:55 AM
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
64–66

My preference is to use a unique_ptr whenever possible so that ownership semantics are clear. So my recommendation is to have the user construct the unique_ptr and hand it off through this interface with a move.

clang/lib/Frontend/SARIFDiagnostic.cpp
190

Then let's skip this for now -- if there's something we want to address, we can do it in a follow-up.

abrahamcd marked 4 inline comments as done.

Modified Printer's interface to use unique pointers when setting the SARIF writer.

abrahamcd updated this revision to Diff 454926.EditedAug 23 2022, 12:22 PM

Documented issue with representing PresumedLocs modified by #line directives.

abrahamcd added inline comments.Aug 23 2022, 1:04 PM
clang/lib/Frontend/SARIFDiagnostic.cpp
117

I noticed that when processing additional source ranges, regular source locations are used instead of the presumed locations. Is this an intentional difference from single location diagnostics? (This issue is present in the current TextDiagnostics as well.)

130–133

There's an issue I encountered with adding presumed locations that have been modified by a #line directive to SARIF. To add a location, the Writer requires a CharSourceRange, which uses regular SourceLocations. If the presumed location has a modified line number, the source manager will still interpret it as a regular line number in the source file and can cause the wrong column number to be added to the SARIF object.

@vaibhav.y Do you have an idea of how involved it would be to add an option to the document writer that adds locations without first creating a CharSourceRange?

cjdb added inline comments.Aug 23 2022, 1:08 PM
clang/lib/Frontend/FrontendAction.cpp
728–729

Now that the interface is using unique_ptr, we should replace with std::make_unique.

clang/lib/Frontend/SARIFDiagnostic.cpp
130

Please assign this a GitHub issue. You can assign the issue to me.

192

There's a bug either here or in the function's interface: the function returns a StringRef to a stack object. Do we really need ownership here?

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
36–37

This makes me very uncomfortable. @aaron.ballman do we have a smart-pointer that works like variant<T*, unique_ptr<T>>?

44

Please replace with make_unique.

abrahamcd updated this revision to Diff 454950.Aug 23 2022, 1:47 PM
abrahamcd marked 2 inline comments as done.

Added use of std::make_unique.

cjdb added inline comments.Aug 23 2022, 3:45 PM
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
42

We don't need to move a StringRef.

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
44

This won't work because .get() only returns an observer, so the object will be deleted when this statement completes. We'll need to do

SARIFDiag = std::make_unique<SARIFDiagnostic>(*OS, LO, &*DiagOpts, &*Writer);
abrahamcd updated this revision to Diff 455005.Aug 23 2022, 4:23 PM

Removed use of pointer reset in Printer.

abrahamcd marked 2 inline comments as done.Aug 23 2022, 4:23 PM
cjdb accepted this revision.Aug 24 2022, 12:32 PM

Open comments notwithstanding, I'm happy with this patch. Thank you for working on this, it's a huge step towards getting more helpful diagnostics for C++!

@aaron.ballman, @vaibhav.y, do you folks have any further concerns? (@denik has some commentary incoming)

clang/include/clang/Frontend/SARIFDiagnostic.h
25–27

These can go in the private section below.

clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
73

This can't ever happen, because there's no constructor that doesn't set OwnsOutputStream.

clang/lib/Frontend/SARIFDiagnostic.cpp
87–89

It seems @aaron.ballman has finally trained me :(

This revision is now accepted and ready to land.Aug 24 2022, 12:32 PM
aaron.ballman added inline comments.Aug 24 2022, 2:30 PM
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
32

Please make OS a pointer instead of a reference, since member references make usage very unpleasant.

Err... I disagree (and just deleted a comment above asking for OS to turn into a reference rather than a pointer because it can never be null). I think we want to use a reference here because 1) it conveys the correct "I can never be null" semantics so maintainers don't have to ask when that can happen, and 2) copies of this class are a bad idea to begin with (we should probably delete the copy and move operations).

We use reference members elsewhere: https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Sema/Sema.h#L407 (not that I would hold Sema up as an example of best practices, lol).

aaron.ballman added inline comments.Aug 24 2022, 2:32 PM
clang/lib/Frontend/SARIFDiagnostic.cpp
87–89

Macro ohno:

My work here is done. ;-)

abrahamcd updated this revision to Diff 455394.Aug 24 2022, 3:13 PM
abrahamcd marked 4 inline comments as done.

Deleted copy/move from renderer and returned OS to reference.

denik added inline comments.Aug 24 2022, 4:27 PM
clang/include/clang/Frontend/SARIFDiagnostic.h
25–27

In addition, I think it's worth putting a comment here about Writer's ownership and why it's a raw pointer here (like we discussed offline).

clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
3

Please adjust the length to fit it in one line.

clang/lib/Frontend/SARIFDiagnostic.cpp
3

Same here.

39

Could you please add FIXME with a pointer to llvm-project/issues/57323 to refactor Diagnostic classes?

59

I think we should add a test case when Loc or/and Source Manager is not set.
If Loc is not set SarifDocumentWriter might not create the locations property which is required by the spec.
If this is the case we need to fix it. But I'm not sure if emitDiagnosticMessage() is going to be called when Loc.isInvalid() ).

79

@abrahamcd please file the llvm-project/issues and link it here.

117

Put FIXME here (if it's not already somewhere else).

192

Note: this is not a particularly small string when it requires 4k by default.

This still has to be addressed.

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
3

Same as above. One line.

aaron.ballman added inline comments.Aug 25 2022, 4:51 AM
clang/include/clang/Frontend/SARIFDiagnostic.h
31–34

Formatting nits.

46–48

Move this implementation to live with the rest and give it an assertion?

clang/include/clang/Frontend/SARIFDiagnosticPrinter.h
73

This can go away, this never owns the output stream.

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
36–37

No, but we shouldn't need it -- this object *never* owns the output stream, so I think we should remove the member variable and this code (we can default the dtor then).

43

Should we also assert that we don't already have a valid SARIFDiag object (in case someone calls BeginSourceFile() twice)?

54

Should we be invalidating that SARIFDiag object here so the printer cannot be used after the source file has ended?

cjdb added inline comments.Aug 25 2022, 11:46 AM
clang/lib/Frontend/SARIFDiagnostic.cpp
192

There's a bug either here or in the function's interface: the function returns a StringRef to a stack object. Do we really need ownership here?

Disregard, I misread TmpFilename as Filename and thought we were returning a dangling reference. That isn't the case, so there is no bug.

abrahamcd updated this revision to Diff 455728.Aug 25 2022, 3:00 PM
abrahamcd marked 15 inline comments as done.

Addressed some refactoring and reformatting comments.

clang/include/clang/Frontend/SARIFDiagnostic.h
46–48

This function is currently being called from the DiagnosticRenderer class that both Text and SARIFDiagnostics inherit from. Maybe this could be part of the refactoring, making sure that any text-specific function calls are moved to TextDiagnostic rather than being in the general Renderer base class?

clang/lib/Frontend/SARIFDiagnostic.cpp
59

I believe if Loc is invalid, it goes to one of those special cases I mentioned in this review.

cjdb added inline comments.Aug 25 2022, 3:01 PM
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
54

I suggested that we don't do this, because the next run of BeginSourceFile can take care of that when making the new SARIFDiagnostic. However, if we were to put an assert at the top of every member function other than BeginSourceFile to ensure that SARIFDiag isn't null (and one at the top of BeginSourceFile to ensure that it _is_), I think there might be good value in re-adding this.

abrahamcd updated this revision to Diff 455729.Aug 25 2022, 3:04 PM

Ran clang-format.

denik accepted this revision.Aug 25 2022, 3:32 PM
denik added inline comments.
clang/lib/Frontend/SARIFDiagnostic.cpp
59

If it does then "if" is redundant here.

aaron.ballman accepted this revision.Aug 26 2022, 4:41 AM

LGTM!

clang/include/clang/Frontend/SARIFDiagnostic.h
46–48

Ah, good call, the assert definitely would be wrong then. Let's leave it alone for now, it's fine as-is.

clang/lib/Frontend/SARIFDiagnosticPrinter.cpp
54

SGTM, thanks for weighing in!

abrahamcd marked 10 inline comments as done.

Resolved pending comments.

clang/lib/Frontend/SARIFDiagnostic.cpp
59

Looked closer and the base Renderer calls this function in both cases when Loc is valid and when it is invalid, so the extra check is necessary. In the case that Loc is invalid, you're right that results will not have locations. This might be something to add to the Writer, I couldn't find a way to add an empty locations object.

192

Made it smaller for now, but this should be revisited during refactoring, since text diag still uses 4096

Deleted copy and move for Printer and added asserts.

Reset SARIFDiag during EndSourceFile().

This revision was landed with ongoing or failed builds.Aug 26 2022, 11:50 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Aug 26 2022, 12:30 PM

This adds warnings:

[498/568] CXX obj/clang/lib/Frontend/Frontend.SARIFDiagnostic.o
../../clang/lib/Frontend/SARIFDiagnostic.cpp:74:25: warning: unused variable 'Filename' [-Wunused-variable]
        llvm::StringRef Filename =
                        ^
In file included from ../../clang/lib/Frontend/SARIFDiagnostic.cpp:9:
../../clang/include/clang/Frontend/SARIFDiagnostic.h:58:16: warning: private field 'OS' is not used [-Wunused-private-field]
  raw_ostream &OS;
               ^
dyung added a subscriber: dyung.Aug 26 2022, 10:33 PM
dyung added inline comments.
clang/test/Frontend/sarif-diagnostics.cpp
8

Can this CHECK line be broken into smaller pieces? This test is failing on our internal bot, and I'm going crosseyed trying to figure out what the difference is because the line is 3741 characters long!

dyung added inline comments.Aug 26 2022, 10:41 PM
clang/test/Frontend/sarif-diagnostics.cpp
8

Is there any significance to the "ruleId" and "Id" values in this test? If not, can we just use a regex to check for a number? Our internal build with private changes is failing due to slightly different numbers in some of the "ruleId" fields

uabelho added inline comments.Aug 29 2022, 6:22 AM
clang/test/Frontend/sarif-diagnostics.cpp
8

+1 on this. We also get different "ruleId" and "Id" values downstream and it's quite painful trying to figure out where the diffs are in a +3700 character CHECK.

aaron.ballman added inline comments.Aug 29 2022, 6:43 AM
clang/test/Frontend/sarif-diagnostics.cpp
8

I ran into this problem this morning as well. I've improved the situation somewhat in b345be177d03166add391f090fd0288a23413934, but I think we really could use some better diff tooling than FileCheck for diffing between two JSON files.

The test can still be made less fragile despite the changes I made, which were just enough to get the test passing for me locally again, but running considerably faster (it turns out long FileCheck lines take a while to process).