Enables Clang to emit diagnostics in SARIF format when
-fdiagnostics-format=sarif. Adds a new DiagnosticConsumer named
SARIFDiagnosticPrinter and a new DiagnosticRenderer named SARIFDiagnostic
to constuct and emit a SARIF object containing the run's basic diagnostic info.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
---|---|---|
76 | @vaibhav.y , just wanted to confirm, the SarifDiagnosticWriter can only add locations that include row/column numbers, correct? | |
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
72 | Does anyone know specific kinds of diagnostics that might be covered here in traditional diagnostics? I'm having trouble thinking of how we can support a simpler path for these kinds of diagnostics in SARIF since the Writer requires a source manager. | |
clang/test/Frontend/sarif-diagnostics.cpp | ||
4 | In regards to testing this, my understanding is that we are moving away from the unit test GTest testing, correct? I included a FileCheck implementation as well, but I wasn't sure how effective just checking the whole SARIF object like this would be. |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
---|---|---|
76 | Yes, this is currently not supported. |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
---|---|---|
156 | Does this need an llvm_unreachable after the switch? | |
178 | ||
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
72 | The location maybe if the diagnostic's source is located in the scratch buffer. Likely for macro expansions where token pasting is involved. Another case would be errors on the command line. I'm not entirely sure how the SARIF spec would handle this case, it might require an extension. A few ways that might work could be: Using the logicalLocations property to specify (logicalLocation object), this might need an extension for kind: "macro", another case that might need extension is diagnostics about invalid command line flags which are also diagnostics without a valid The parentIndex for these logical locations could be set to the physical location that produced them. I think this definitely warrants some discussion since the spec doesn't provide a clear way to express these cases. WDYT @aaron.ballman @cjdb @denik |
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
---|---|---|
72 | The spec does say for "kind":
So an extension might not be necessary, but might be worth discussing. |
Precommit CI found a relevant failure.
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
---|---|---|
72 | From looking through the spec, I think logicalLocations is probably the right choice and we'd want to make up our own kind for things like the scratch buffer or the command line. I think an extension would be worth discussing. |
I think this was just from the unfinished unit test. I can go ahead and remove clang/unittests/Frontend/SARIFDiagnosticTest.cpp entirely if we don't plan to use unit tests and the FileCheck test in clang/test/Frontend/sarif-diagnostics.cpp is enough.
Thanks Abraham!
Please mention in the summary that some part of SARIF Diag implementation was copied from Text Diag and further refactoring is needed.
Also, like we discussed:
- drop the unittest;
- replace strict match of json output in FileCheck with partial matches of the fields that we are interested in (if possible);
- rebase and incorporate setDefaultConfiguration from D131084.
- on the logicalLocations I have to take a look if we can add it in this patch or defer it and add a warning that this diagnostic is currently unsupported.
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h | ||
---|---|---|
41 | std::unique_ptr<SarifDocumentWriter> Writer; | |
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
49 | Value(Writer->createDocument()); |
Added diagnostic level and default configuration to SARIF output.
Removed unit test file.
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
---|---|---|
156 | I guess no, because of breaks. But... |
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h | ||
---|---|---|
43 | There's not a lot of benefit to using bit-fields here yet, so I'd make this field a bool instead for the time being. If we add more fields here where packing them into a bit-field helps, then we can revisit at that time. | |
46 | ||
64–66 | This interface seems dangerous to me -- the caller has no idea that the pointer will be stashed off as a unique_ptr and so the caller might expect to still be able to use its (now not unique) pointer value. | |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
44 | ||
44–48 | ||
60–62 | ||
91–93 | ||
106–108 | ||
128 | ||
147 | This is reachable and it's a programmer mistake if we get here. | |
163–164 | ||
170–174 | ||
175 | Please spell out the type since it's not spelled out in the initialization. Also, the next line is checking for a file -- what if File is null? | |
190 | I'm not certain we want a difference in behavior here. For starters, Windows also has hardlinks and symlinks (they're not exactly the same as on *nix systems though), so I think the assumption these paths point to the same place is valid. But also, platform-specific behavior should be exceedingly rare in most parts of the compiler. Is there significant harm in unifying the behavior across platforms? | |
192 | Note: this is not a particularly small string when it requires 4k by default. | |
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
28–31 | It's ugly but it's the pattern we use all over the codebase (and it corrects the naming convention issues). | |
57–58 | ||
72 |
This looks great, thank you so much @abrahamcd!
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h | ||
---|---|---|
32 | Please make OS a pointer instead of a reference, since member references make usage very unpleasant. | |
32–43 | Nit: please move all private members below the public interface, where possible. | |
43 | Can we make this a bool instead? Unless I'm mistaken, there's 31 bits of padding for OwnsOutputStream, as opposed to a bool's (almost certainly) eight bits. | |
53 | We should be passing in a StringRef rather than a string, since this will unnecessarily construct a string for C string literals. | |
clang/lib/Frontend/FrontendAction.cpp | ||
726–729 | We shouldn't have raw owning pointers, since they're vulnerable to memory leaks. Even though LLVM doesn't throw exceptions, it's still good practice to avoid them. | |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
31–32 | Please replace this using-directive with a namespace scope. | |
79 | I think it would be good to file an issue on GitHub and change to FIXME(llvm-project/<issue>). @aaron.ballman WDYT? | |
87–89 | I haven't seen any use of RI or RE in this loop, except to dereference RI, so we should change this to a range-for loop. | |
163 | ||
211–224 | If these aren't called, then they should probably assert and say that they're not implemented. | |
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
25–26 | Please replace with a namespace scope. | |
72 | We should defer this to a future CL, so that Abraham isn't blocked by our decision-making (and so we can make the right decision). I can start a GitHub issue to get the discussion in a good spot? | |
clang/test/Frontend/sarif-diagnostics.cpp | ||
4 |
Yep, that's correct.
Isn't this the FileCheck test? |
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
---|---|---|
72 | I agree that logicalLocations is a good use for the mentioned cases. In addition, the spec mentions that location object can miss both physical and logical location properties in which cases it has to include the message property (see 3.28.1). I think this also could be a good use case for the command line flag errors. To unblock this change we can leave result.locations empty. It's allowed by the spec (see 3.27.12). From what I see, Sarif.cpp doesn't use the source manager if Locations is empty. |
Addressed review comments on formatting, style, C++ best practices.
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h | ||
---|---|---|
64–66 | Would it be best to store as a regular pointer instead of a unique pointer in this case then? | |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
163 | This creates -Wcovered-switch-default warning so I just removed the unreachable. | |
175 | If I am interpreting it correctly (this is adapted from TextDiagnostic.cpp), I believe the passed-in Filename will just be returned unaltered. This is probably one of those functions that would be good to pull out during refactoring. | |
190 | There seems to be Windows-specific tests that rely on this behavior, but I'm not too sure what the harm in unifying it would be. There's some discussion on it here: D59415. |
clang/test/Frontend/sarif-diagnostics.cpp | ||
---|---|---|
4 | Yes, I meant "this FileCheck implementation" (back then I had the unit test implementation in here as well but I have since removed it). |
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h | ||
---|---|---|
64–66 | My preference is to use a unique_ptr whenever possible so that ownership semantics are clear. So my recommendation is to have the user construct the unique_ptr and hand it off through this interface with a move. | |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
190 | Then let's skip this for now -- if there's something we want to address, we can do it in a follow-up. |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
---|---|---|
117 | I noticed that when processing additional source ranges, regular source locations are used instead of the presumed locations. Is this an intentional difference from single location diagnostics? (This issue is present in the current TextDiagnostics as well.) | |
130–133 | There's an issue I encountered with adding presumed locations that have been modified by a #line directive to SARIF. To add a location, the Writer requires a CharSourceRange, which uses regular SourceLocations. If the presumed location has a modified line number, the source manager will still interpret it as a regular line number in the source file and can cause the wrong column number to be added to the SARIF object. @vaibhav.y Do you have an idea of how involved it would be to add an option to the document writer that adds locations without first creating a CharSourceRange? |
clang/lib/Frontend/FrontendAction.cpp | ||
---|---|---|
728–729 | Now that the interface is using unique_ptr, we should replace with std::make_unique. | |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
130 | Please assign this a GitHub issue. You can assign the issue to me. | |
192 | There's a bug either here or in the function's interface: the function returns a StringRef to a stack object. Do we really need ownership here? | |
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
36–37 | This makes me very uncomfortable. @aaron.ballman do we have a smart-pointer that works like variant<T*, unique_ptr<T>>? | |
44 | Please replace with make_unique. |
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h | ||
---|---|---|
42 | We don't need to move a StringRef. | |
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
44 | This won't work because .get() only returns an observer, so the object will be deleted when this statement completes. We'll need to do SARIFDiag = std::make_unique<SARIFDiagnostic>(*OS, LO, &*DiagOpts, &*Writer); |
Open comments notwithstanding, I'm happy with this patch. Thank you for working on this, it's a huge step towards getting more helpful diagnostics for C++!
@aaron.ballman, @vaibhav.y, do you folks have any further concerns? (@denik has some commentary incoming)
clang/include/clang/Frontend/SARIFDiagnostic.h | ||
---|---|---|
25–27 | These can go in the private section below. | |
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h | ||
73 | This can't ever happen, because there's no constructor that doesn't set OwnsOutputStream. | |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
87–89 | It seems @aaron.ballman has finally trained me :( |
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h | ||
---|---|---|
32 |
Err... I disagree (and just deleted a comment above asking for OS to turn into a reference rather than a pointer because it can never be null). I think we want to use a reference here because 1) it conveys the correct "I can never be null" semantics so maintainers don't have to ask when that can happen, and 2) copies of this class are a bad idea to begin with (we should probably delete the copy and move operations). We use reference members elsewhere: https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Sema/Sema.h#L407 (not that I would hold Sema up as an example of best practices, lol). |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
---|---|---|
87–89 | My work here is done. ;-) |
clang/include/clang/Frontend/SARIFDiagnostic.h | ||
---|---|---|
25–27 | In addition, I think it's worth putting a comment here about Writer's ownership and why it's a raw pointer here (like we discussed offline). | |
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h | ||
3 | Please adjust the length to fit it in one line. | |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
3 | Same here. | |
39 | Could you please add FIXME with a pointer to llvm-project/issues/57323 to refactor Diagnostic classes? | |
59 | I think we should add a test case when Loc or/and Source Manager is not set. | |
79 | @abrahamcd please file the llvm-project/issues and link it here. | |
117 | Put FIXME here (if it's not already somewhere else). | |
192 |
This still has to be addressed. | |
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
3 | Same as above. One line. |
clang/include/clang/Frontend/SARIFDiagnostic.h | ||
---|---|---|
31–34 | Formatting nits. | |
46–48 | Move this implementation to live with the rest and give it an assertion? | |
clang/include/clang/Frontend/SARIFDiagnosticPrinter.h | ||
73 | This can go away, this never owns the output stream. | |
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
36–37 | No, but we shouldn't need it -- this object *never* owns the output stream, so I think we should remove the member variable and this code (we can default the dtor then). | |
43 | Should we also assert that we don't already have a valid SARIFDiag object (in case someone calls BeginSourceFile() twice)? | |
54 | Should we be invalidating that SARIFDiag object here so the printer cannot be used after the source file has ended? |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
---|---|---|
192 |
Disregard, I misread TmpFilename as Filename and thought we were returning a dangling reference. That isn't the case, so there is no bug. |
Addressed some refactoring and reformatting comments.
clang/include/clang/Frontend/SARIFDiagnostic.h | ||
---|---|---|
46–48 | This function is currently being called from the DiagnosticRenderer class that both Text and SARIFDiagnostics inherit from. Maybe this could be part of the refactoring, making sure that any text-specific function calls are moved to TextDiagnostic rather than being in the general Renderer base class? | |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
59 | I believe if Loc is invalid, it goes to one of those special cases I mentioned in this review. |
clang/lib/Frontend/SARIFDiagnosticPrinter.cpp | ||
---|---|---|
54 | I suggested that we don't do this, because the next run of BeginSourceFile can take care of that when making the new SARIFDiagnostic. However, if we were to put an assert at the top of every member function other than BeginSourceFile to ensure that SARIFDiag isn't null (and one at the top of BeginSourceFile to ensure that it _is_), I think there might be good value in re-adding this. |
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
---|---|---|
59 | If it does then "if" is redundant here. |
Resolved pending comments.
clang/lib/Frontend/SARIFDiagnostic.cpp | ||
---|---|---|
59 | Looked closer and the base Renderer calls this function in both cases when Loc is valid and when it is invalid, so the extra check is necessary. In the case that Loc is invalid, you're right that results will not have locations. This might be something to add to the Writer, I couldn't find a way to add an empty locations object. | |
192 | Made it smaller for now, but this should be revisited during refactoring, since text diag still uses 4096 |
This adds warnings:
[498/568] CXX obj/clang/lib/Frontend/Frontend.SARIFDiagnostic.o ../../clang/lib/Frontend/SARIFDiagnostic.cpp:74:25: warning: unused variable 'Filename' [-Wunused-variable] llvm::StringRef Filename = ^ In file included from ../../clang/lib/Frontend/SARIFDiagnostic.cpp:9: ../../clang/include/clang/Frontend/SARIFDiagnostic.h:58:16: warning: private field 'OS' is not used [-Wunused-private-field] raw_ostream &OS; ^
clang/test/Frontend/sarif-diagnostics.cpp | ||
---|---|---|
8 | Can this CHECK line be broken into smaller pieces? This test is failing on our internal bot, and I'm going crosseyed trying to figure out what the difference is because the line is 3741 characters long! |
clang/test/Frontend/sarif-diagnostics.cpp | ||
---|---|---|
8 | Is there any significance to the "ruleId" and "Id" values in this test? If not, can we just use a regex to check for a number? Our internal build with private changes is failing due to slightly different numbers in some of the "ruleId" fields |
clang/test/Frontend/sarif-diagnostics.cpp | ||
---|---|---|
8 | +1 on this. We also get different "ruleId" and "Id" values downstream and it's quite painful trying to figure out where the diffs are in a +3700 character CHECK. |
clang/test/Frontend/sarif-diagnostics.cpp | ||
---|---|---|
8 | I ran into this problem this morning as well. I've improved the situation somewhat in b345be177d03166add391f090fd0288a23413934, but I think we really could use some better diff tooling than FileCheck for diffing between two JSON files. The test can still be made less fragile despite the changes I made, which were just enough to get the test passing for me locally again, but running considerably faster (it turns out long FileCheck lines take a while to process). |
clang/test/Frontend/sarif-diagnostics.cpp | ||
---|---|---|
8 | This check line assumes the clang directory is called clang, which is not true for the release tarballs that we distribute. Need to drop the clang/ directory from the path name. |
clang/test/Frontend/sarif-diagnostics.cpp | ||
---|---|---|
8 | Good catch, I'll fix that up shortly. |
clang/test/Frontend/sarif-diagnostics.cpp | ||
---|---|---|
8 | This should be fixed by 0647f4a77a2f9aab84c961982193bea1a9eb1585; please let me know if there's still a problem. |
These can go in the private section below.