This is an archive of the discontinued LLVM Phabricator instance.

Paths

clang/include/clang/Basic/Sarif.h
clang/lib/Basic/CMakeLists.txt
clang/lib/Basic/Sarif.cpp
clang/unittests/Basic/CMakeLists.txt
clang/unittests/Basic/SarifTest.cpp
Differential D109701

[clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface
Closed, Public

Authored by vaibhav.y on Sep 13 2021, 9:05 AM.

Details

Reviewers

aaron.ballman
jranieri-grammatech
tra

Commits

rG4b03ad650645: [clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface
rG69fcf4fd5a01: Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface
rG329fae7103d3: [clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface
rG6546fdbe36fd: [clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface

Summary

Create an interface for writing SARIF documents from within clang:

The primary intent of this change is to introduce the interface clang::SarifDocumentWriter, which allows incrementally adding diagnostic data to a JSON backed document. The proposed interface is not yet connected to the compiler internals, which will be covered in future work. As such this change will not change the input/output interface of clang.

Previous discussions:

RFC for this change: https://lists.llvm.org/pipermail/cfe-dev/2021-March/067907.html
https://lists.llvm.org/pipermail/cfe-dev/2021-July/068480.html

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

There are a very large number of changes, so older changes are hidden.

Format code using patch from buildkite

Harbormaster completed remote builds in B133251: Diff 385813. Nov 9 2021, 8:24 AM

Ping.

clang-format is the only check failing as of now (it attempts to reformat the <a> links in doc comments).

Thanks for your patience with this review! I'm currently in WG14 meetings this week and out on vacation next week, so my review availability is a bit limited at the moment (sorry for that).

I think this is heading in the right direction, but there are some lifetime issues we need to figure out how to resolve. I pointed out a few such places, but I did not do an exhaustive search to catch them all.

clang/include/clang/Basic/Sarif.h
81–83	One thing that's worth calling out is that `StringRef` is non-owning which means that the argument passed to create the `SarifArtifactLocation` has to outlive the returned object to avoid dangling references. This makes the class a bit more dangerous to use because `Twine` or automatic `std::string` objects may cause lifetime concerns. Should these classes be storing a `std::string` so that the memory is owned by SARIF?
117	`Loc` is already a `SarifArtifactLocation`, did you mean `SarifArtifact` by any chance? (Note, this suggests to me we should mark the ctor's here as `explicit`.)
230–231
244–246	This comment looks incorrect to me.
368	It'd be good to explain why `createRule()` returns a value here and above.
379
clang/include/clang/Basic/SourceLocation.h
441–456 ↗	(On Diff #385813)	Looks like unrelated formatting changes; feel free to submit these as an NFC change if you'd like, but the changes should be reverted from this patch for clarity.
clang/lib/Basic/Sarif.cpp
200	This seems like it'll be a use-after-free because the local `std::string` will be destroyed before the lifetime of the `SarifArtifactLocation` ends.
208–210	No worries! Our style is not... all that typical... so it can be hard to remember.
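The lifetime concern raised on Sarif.h lines 81–83 can be sketched without any LLVM headers. This is only an illustration: `std::string_view` stands in for `llvm::StringRef`, and `ViewingLocation`, `OwningLocation` and `makeLocation` are hypothetical names that do not appear in the patch.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for a location type that only *views* its URI.
// std::string_view plays the role of llvm::StringRef here.
struct ViewingLocation {
  std::string_view URI; // Non-owning: valid only while the source string lives.
};

// Hypothetical stand-in that owns its URI, as suggested in the review.
struct OwningLocation {
  std::string URI; // Owning: the data lives as long as the object does.
  explicit OwningLocation(std::string U) : URI(std::move(U)) {}
};

// Builds a location from a string that is local to this function.
// Returning an OwningLocation is safe. Returning a ViewingLocation that
// refers to `Tmp` would dangle, because `Tmp` is destroyed on return.
inline OwningLocation makeLocation(const std::string &Dir,
                                   const std::string &File) {
  std::string Tmp = "file://" + Dir + "/" + File;
  return OwningLocation(std::move(Tmp));
}
```

The owning variant costs a copy or move per field, which is the trade-off behind the `SmallString` question that follows.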

Rebase on upstream/main:

[clangBasic] Format code
[clangBasic] Mark all constructors taking single values explicit
[clangBasic] Convert StringRef to std::string fields that own the data
[clangBasic] Fixup outdated comments

clang/include/clang/Basic/Sarif.h
81–83	Good point! Will change it to `std::string` to start with. Some fields, such as MimeType and RuleID, would probably do better with `SmallString`, but I haven't been able to find any good measurements of how the lengths of those two are distributed. Can it be made into `SmallString<N>` in a future PR?
117	Agree, I've marked all constructors taking a single parameter as explicit.
244–246	Ack, fixed that as well. The right rationale for `uint32_t` is that it is the largest non-negative type that can be safely promoted to `int64_t` (which is what LLVM's json encoder supports).
clang/include/clang/Basic/SourceLocation.h
441–456 ↗	(On Diff #385813)	Ack, these are definitely a result of a bad rebase.
clang/lib/Basic/Sarif.cpp
200	Will run it through a pass of ASan & MSan. Is the best way to add `-fsanitize=memory -fsanitize=address` to the test CMakeLists.txt and run them? I've changed all strings to `std::string`, so this one should no longer be a problem, but I wonder if there are any others I have missed as well.
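The `uint32_t` rationale given for lines 244–246 can be checked at compile time. A minimal sketch, assuming only the standard fixed-width integer types; `toJsonInteger` and `fitsInJsonInteger` are hypothetical helper names, not part of the patch or of LLVM's JSON library.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Every uint32_t value is representable as an int64_t.
static_assert(static_cast<std::uint64_t>(
                  std::numeric_limits<std::uint32_t>::max()) <=
                  static_cast<std::uint64_t>(
                      std::numeric_limits<std::int64_t>::max()),
              "every uint32_t value fits in int64_t");

// Widening conversion: never loses information.
inline std::int64_t toJsonInteger(std::uint32_t V) {
  return static_cast<std::int64_t>(V);
}

// By contrast, uint64_t values above INT64_MAX have no int64_t
// representation, so a uint64_t field would need a range check.
inline bool fitsInJsonInteger(std::uint64_t V) {
  return V <=
         static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
}
```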

Harbormaster completed remote builds in B136478: Diff 390396. Nov 29 2021, 11:57 AM

Rename enum members

Harbormaster completed remote builds in B136520: Diff 390448. Nov 29 2021, 2:41 PM

Mark completed comments as done

ping: This is ready for review now.

Thanks for your patience with the review as well!

ping: Requesting review

Thank you for your patience! I finally had the chance to go through this a bit more. I identified a bunch of tiny nits (the review feedback may look scary, but most should be trivial changes), but a few larger things about the design as well that are worth thinking about.

One thing that I'm still a bit worried about is validating the output document. The unit tests are a good start and the sort of thing we need for this functionality, but I'm also worried we won't find substantial design issues until we finally get SARIF results out of Clang or the static analyzer so we can see whether tools can actually *import* the SARIF we produce. I don't think we need in-tree tests for that sort of thing, but it'd definitely make me feel a lot more comfortable if we had external evidence that we're producing valid SARIF given that this is an exchange format. That said, I also don't want this to become a cumbersome patch for you to work on or for us to review, so I'm not certain what the best answer is here yet.

clang/include/clang/Basic/Sarif.h
47	Having thought about this a bit more, I think this line should be removed because it's within a header file, so all includes of this header file will be impacted and likely without knowing it. I'm very sorry for the churn, but I think this should go and the `llvm::` prefixes should be brought back within the header file. Within the source file, it's fine to use the `using`.
81–83	"Some fields such as MimeType, RuleID would probably do better with SmallString, I haven't been able to find any good measurements on how the lengths of those two are distributed, can it be made into SmallString<N> in a future PR?" Yeah, I think that can be done in a follow-up if we find a performance benefit from it.
113	Should we delete the default ctor?
163	One question this raises for me is whether we should be enforcing invariants from the SARIF spec as part of this interface or not. Currently, you can create a thread flow that has no importance or a rule without a name/id, etc. That means it's pretty easy to create SARIF that won't validate properly. One possible way to alleviate this would be for the `create()` methods/ctors to require these be set when creating the objects. However, I can imagine there will be times when that is awkward due to following the builder pattern with the interface. Another option would be to have some validation methods on each of the interfaces and the whole tree gets validated after construction, but this could have performance impacts. What are your thoughts?
249
340
341	Once you use the default ctor, there's no way to associate language options with the document writer. Is it wise to expose this constructor? (If we didn't, then we could store the `LangOptions` as a const reference instead of making a copy in the other constructor. Given that we never mutate the options, I think that's a win.)
343
345
347
401
407
408
clang/lib/Basic/Sarif.cpp
55–57
73–75
114–121
200	"Will run it through a pass of asan & msan, is the best way to add: -fsanitize=memory -fsanitize=address to the test CMakeLists.txt & run them?" Yup! The critical part will be test coverage -- code paths that aren't executed won't get issues reported with them.
220
236–237
249	(At this point, I'll stop commenting on these -- can you go through the patch and make sure that all comments have appropriate terminating punctuation?)

Fixup comments
Mark clang::FullSourceRange as returning const & to its locs
rebase on upstream main

Herald added a project: Restricted Project. Mar 11 2022, 1:53 PM

Hi,

Apologies for the long delay! I was on a break to focus on other projects. I have some changes in mind such as:

Creating the SarifLog object to represent the top-level document. Currently we store this as a JSON object which ends up rather mucky. Having a separate structure can release internal state in SarifDocumentWriter. That way the writer will end up only dealing with the serialization of a SarifLog.

Regarding how to validate documents, I think having a SarifLog::validate() that checks everything underneath is the way to go. Ideally I'd prefer handling on the interface level, but I'm uncertain what would be a good approach. What do you think?

I'll comb through the previous comments again to make sure I'm not missing punctuation, and will signal when this is ready for review again.

Thanks!

clang/include/clang/Basic/Sarif.h
113	Agreed, there's no reason for it to be callable. Will do the same for `SarifArtifactLocation`.
341	That's a good observation, I will delete this constructor and expose `SarifDocumentWriter(const LangOptions &LangOpts)` instead.
clang/lib/Basic/Sarif.cpp
249	Ack, sincere apologies again!
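The constructor changes agreed on above (no default constructor, `explicit` single-argument constructors, options held by const reference) could look roughly like the following. This is a sketch under stated assumptions: `Options` is a hypothetical stand-in for `clang::LangOptions` and `Writer` for `SarifDocumentWriter`; neither mirrors the real class layout.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for clang::LangOptions.
struct Options {
  std::string Standard;
};

// Hypothetical stand-in for the document writer.
class Writer {
public:
  // There is no way to attach options after construction, so forbid this.
  Writer() = delete;

  // `explicit` prevents an Options object from silently converting to a
  // Writer at a call site.
  explicit Writer(const Options &Opts) : Opts(Opts) {}

  const std::string &standard() const { return Opts.Standard; }

private:
  // Non-owning and never mutated: the caller must keep the Options object
  // alive for as long as the Writer is used.
  const Options &Opts;
};

static_assert(!std::is_default_constructible<Writer>::value,
              "Writer cannot be default constructed");
static_assert(!std::is_convertible<const Options &, Writer>::value,
              "Writer cannot be created by implicit conversion");
```

Holding a reference avoids the copy, but it reintroduces the kind of lifetime obligation discussed earlier for `StringRef`, this time on the caller that owns the options.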

Harbormaster completed remote builds in B153838: Diff 414743. Mar 11 2022, 2:28 PM

In D109701#3376127, @vaibhav.y wrote:

Hi,

Apologies for the long delay! I was on a break to focus on other projects.

Not a problem at all!

I have some changes in mind such as:

Creating the SarifLog object to represent the top-level document. Currently we store this as a JSON object which ends up rather mucky. Having a separate structure can release internal state in SarifDocumentWriter. That way the writer will end up only dealing with the serialization of a SarifLog.

That seems like an interesting idea (though I think it could be done in a follow-up).

Regarding how to validate documents, I think having a SarifLog::validate() that checks everything underneath is the way to go. Ideally I'd prefer handling on the interface level, but I'm uncertain what would be a good approach. What do you think?

I think that's a reasonable idea for assert builds to let us know if there are issues with the internal consistency of the data; we do something similar for IR verification in LLVM. But the kind of validating that will tell us whether this is successful is validating against another tool (the whole point to SARIF is to be an exchange format between tools, so self-testing only gets you so much information about how well the implementation works). However, there's no good way to automate that kind of testing in our test suite, so this is more of a request to try the output in tools that can consume SARIF to see if they're successful, and if they're not, see if we can come up with unit tests for those cases.

I'll comb through the previous comments again to make sure I'm not missing punctuation, and will signal when this is ready for review again.

Thanks!

clang/lib/Basic/Sarif.cpp
249	No worries! These sort of nits are really easy to miss, it happens to me too. :-)

Update tests to check serialization as well

SARIF text generated is validated externally against [Microsoft's online validator][1]

[1]: https://sarifweb.azurewebsites.net/Validation

@aaron.ballman Would it be possible that I add ::validate through a follow-up PR?

I'm currently checking the JSON output from the writer using Microsoft's online validator, and it is passing. Though it tends to complain about things outside of the spec (e.g. the spec doesn't constrain the toolComponent version to be numeric but the tool requires it to be).

Harbormaster completed remote builds in B168293: Diff 434802. Jun 7 2022, 8:23 AM

https://discourse.llvm.org/t/rfc-improving-clang-s-diagnostics/62584

There is an ongoing RFC similar to the work here, worth noting.

dexonsmith removed a subscriber: dexonsmith. Jun 7 2022, 9:12 PM

vaibhav.y added a subscriber: cjdb. Jun 9 2022, 6:12 AM

Rebase on main branch from upstream git

Harbormaster completed remote builds in B169157: Diff 436030. Jun 10 2022, 3:28 PM

Fix bug detected from ASAN run

Passes -fsanitize=address

Harbormaster completed remote builds in B169215: Diff 436110. Jun 10 2022, 11:18 PM

Aside from some minor nits and the open question about validation, I think this is getting pretty close. I'm curious about the next steps with this though, given that there are no in-tree uses for it currently. You had mentioned you planned to work on an adapter so that we could eventually start to remove the existing SARIF implementation work at: https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Core/SarifDiagnostics.cpp. Are you still intending to work on that, or has the work from @cjdb changed your plans for the next steps? (I mostly am trying to figure out whether we should land this now because there's enough momentum that it will start being used "Real Soon Now" or whether we should wait for a bit until there's a patch to make use of this and then land the whole patch stack.)

clang/include/clang/Basic/Sarif.h
83	We don't typically use top-level `const` like this (same applies elsewhere) unless it's on a pointee/reference.
163	Are you still intending to add a `validate()` interface to take care of this, or are you still thinking about how to enforce invariants during construction (or some combination of the two)?
303–306	FWIW, this use of top-level `const` is fine (we do use it for data member variables).

Thanks, will push changes to address the comments soon.

As I understood from our discussion the work @cjdb has planned would create a new DiagnosticsConsumer, it can be started in parallel but would need the changes in D109701 to complete. I have other work planned for SARIF as well, some notes here: https://gist.github.com/envp/6abc3230dcc5043416c86aefb3d24419 (@cjdb plans to address the "BRIDGE" component)

For validation, I think having a hybrid approach is best here. As much as possible we should try to be correct by construction, but we certainly need to have validation before we serialize on the data.

My plan is to iterate on the interface as follows:

To decompose SarifDocumentWriter, into: SarifLog & other builders, along with a SarifLogSerializer whose sole responsibility is to construct the json::Value. This should improve the number of properties that are correct by construction.
Try to use this newer interface with https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Core/SarifDiagnostics.cpp, and incorporate any new features I think might be useful

Do you think it would be better to use the changes here in libStaticAnalyzer before they land and then iterate on them? I don't have a strong preference for either approach, so I'd defer to your experience.

I'm also unfamiliar with the workflow for stacking patches in phabricator, is there documentation I can refer to for this? My guess is: I make a git branch whose base is that of D109701, and then arc diff D109701..OTHER_BRANCH to create the phabricator that would use these changes?

clang/include/clang/Basic/Sarif.h
163	Definitely! I plan to add that in a follow-up patch. The goal would be to have as much as possible be correct by construction (through small builders having a limited field set), but we'll still need a `validate()`.
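The hybrid approach described here (small types that are correct by construction, plus a `validate()` pass before serialization) might be sketched as follows. All names are hypothetical, and the two checks shown (non-empty and unique rule IDs) are illustrative only; they are not a statement of what the SARIF specification requires or of what a later patch implements.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rule type. The ID is a constructor argument, so a rule can
// never be built without one being supplied; optional fields use setters.
class Rule {
public:
  explicit Rule(std::string RuleId) : Id(std::move(RuleId)) {}

  Rule &setName(std::string N) {
    Name = std::move(N);
    return *this;
  }

  const std::string &id() const { return Id; }
  const std::string &name() const { return Name; }

private:
  std::string Id;
  std::string Name;
};

// Hypothetical top-level log object with a whole-tree validation pass.
struct Log {
  std::vector<Rule> Rules;

  // Returns one message per problem found; an empty vector means no
  // problems were detected by these (illustrative) checks.
  std::vector<std::string> validate() const {
    std::vector<std::string> Problems;
    std::set<std::string> Seen;
    for (const Rule &R : Rules) {
      if (R.id().empty())
        Problems.push_back("rule with empty id");
      else if (!Seen.insert(R.id()).second)
        Problems.push_back("duplicate rule id: " + R.id());
    }
    return Problems;
  }
};
```

The constructor rules out a missing ID structurally, while properties that span several objects (such as uniqueness) can only be checked once the whole log exists.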

Discard top-level const specifier where target isn't pointee/ref

I think this CL (which is quite large) might be worth just getting the functionality in, and deferring the in-tree usage to a second CL. That second CL probably should just lift the existing diagnostics so that they're in the message component if the SARIF flag is enabled: nothing fancy. I also expect this to be fairly non-intrusive with respect to the compiler, but I expect that the diagnostic engine will need some teaching here.

We're intending @abrahamcd to pilot proper true integration, by getting overload resolution failure diagnostics to be SARIF-operational. That can be done in parallel to the aforementioned CL 2, or at least in a staged pipeline after CL 2 is up for review.

Harbormaster completed remote builds in B171375: Diff 439096. Jun 22 2022, 11:58 AM

Fix formatting in Sarif.h

Harbormaster completed remote builds in B171419: Diff 439153. Jun 22 2022, 3:15 PM

Okay, thanks both for your perspective on this. This LGTM as-is and we can handle validation and integration in subsequent patches. Thank you for this, @vaibhav.y!

This revision is now accepted and ready to land. Jun 23 2022, 11:06 AM

Thanks for your patience with the review as well!

Just noticed that I need to add a link to revision in the commit messages as well: (https://www.llvm.org/docs/Phabricator.html#committing-a-change)

I will update the commit messages, but I cannot commit to trunk.

Regarding further work on validation and integration, I couldn't find any documentation on how to work with stacked changes. Could you please direct me to some documentation for that?

In D109701#3606042, @vaibhav.y wrote:

Thanks for your patience with the review as well!

Likewise!

Just noticed that I need to add a link to revision in the commit messages as well: (https://www.llvm.org/docs/Phabricator.html#committing-a-change)

Yup, you'd link the commit message back to this review, like: Differential Revision: https://reviews.llvm.org/D109701

I will update the commit messages, but I cannot commit to the github repo.

Ah, thank you for letting me know. I can land the changes on your behalf. What name and email address would you like me to use for patch attribution? (I probably won't land it until tomorrow at this point though.)

Regarding further work on validation and integration, I couldn't find any documentation on how to work with stacked changes. Could you please direct me to some documentation for that?

You shouldn't need to use a patch stack once we land the changes in this review (because they'll then be on the main branch). As for documentation on stacked changes, I don't think we have any community specific documentation for it, so I'd recommend using your favorite search engine. If you still don't find much on it, we can discuss it more later.

Reword commit messages

Prefix [tag] specifying which components are affected
Append link to D109701 in commit message per commit message guidelines

I will update the commit messages, but I cannot commit to the github repo.

Ah, thank you for letting me know. I can land the changes on your behalf. What name and email address would you like me to use for patch attribution? (I probably won't land it until tomorrow at this point though.)

Please use:

Name: Vaibhav Yenamandra
email address: vyenamandra@bloomberg.net

Harbormaster completed remote builds in B171681: Diff 439507. Jun 23 2022, 1:50 PM

This revision was landed with ongoing or failed builds. Jun 24 2022, 4:18 AM

Closed by commit rG6546fdbe36fd: [clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface (authored by vaibhav.y, committed by aaron.ballman).

This revision was automatically updated to reflect the committed changes.

aaron.ballman added a commit: rG6546fdbe36fd: [clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface.

aaron.ballman added a reverting change: rG7a3918b540c3: Revert "[clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter`…. Jun 24 2022, 4:33 AM

Unfortunately, I had to roll this back in 7a3918b540c30cc630aaae9124c67e5e4db123c2 because there's a layering violation. I've commented on it in the review.

clang/lib/Basic/Sarif.cpp
162	I didn't catch this during the review -- but this is a layering violation that caused link errors on some of the build bots. Lexer can call into Basic, but Basic cannot call into Lexer. So we'll need to find a different way to handle this.

This revision is now accepted and ready to land. Jun 24 2022, 4:36 AM

Marking as needing changes so it's clear this shouldn't be re-landed yet.

This revision now requires changes to proceed. Jun 24 2022, 4:36 AM

vaibhav.y added inline comments. Jun 24 2022, 6:23 AM

clang/lib/Basic/Sarif.cpp
162	Would moving the code to Support, having it depend on Basic & Lex work?

vaibhav.y marked an inline comment as not done. Jun 24 2022, 6:23 AM

aaron.ballman added inline comments. Jun 24 2022, 8:44 AM

clang/lib/Basic/Sarif.cpp
162	I don't think so -- Support is supposed to be a pretty low-level interface; it currently only relies on LLVM's Support library. I think the layering is supposed to be: Support -> Basic -> Lex. As I see it, there are a couple of options as to where this could live. It could live in the Frontend library, as that's where all the diagnostic consumer code in Clang lives. But that library might be a bit "heavy" to pull into other tools (maybe? I don't know). It could also live in AST -- that already links in Basic and Lex. But that feels like a somewhat random place for this to live as this has very little to do with the AST itself. Another approach, which might be better, is to require the user of this interface to pass in the token length calculation themselves in the places where it's necessary. e.g., `json::Object whatever(SourceLocation Start, SourceLocation End, unsigned EndLen)` and then you can remove the reliance on the lexer entirely while keeping the interface in Basic. I'm not certain how obnoxious this suggestion is, but I think it's my preferred approach for the moment (but not a strongly held position yet). WDYT of this approach?

Factor dependency on Lexer::MeasureTokenLength into externally provided functor

Introduces a type: TokenLengthMetric which measures the length of a token
starting at the given SLoc

vaibhav.y added inline comments. Jun 27 2022, 9:41 AM

clang/lib/Basic/Sarif.cpp
162	I think the approach of injecting the function is better here. I've tried to make the smallest change possible by passing in a function whose interface is almost identical to `Lexer::MeasureTokenLength`. The intent was to hint at this being the "canonical metric" for token lengths (with an example in the tests for the same). I tried passing start, end locs but couldn't find a strong use case yet, since `end` would likely always be: `Lexer::getLocForEndOfToken(start, 0)`

Discard unused includes

aaron.ballman added inline comments. Jun 27 2022, 11:22 AM

clang/include/clang/Basic/Sarif.h
297–298	I worry about the performance aspects of using a callback for this. Calls across DSO boundaries can be slower (due to having to call through a thunk), and it necessitates calling back to the user to calculate information that the user could simply pass in directly. That's on top of `std::function` already being pretty heavy-weight.
clang/lib/Basic/Sarif.cpp
162	I'm not convinced that the less obtrusive change is a good design in this case. But I also agree that we should not use start/end locations either. `SourceLocation` traditionally points to the start of a token, so it would be super easy to get the `end` location wrong by forgetting to pass the location for the end of the token. My suggestion was to continue to pass the start of the starting token, the start of the ending token, and the length of the ending token. With the callback approach, you have to call through the callback to eventually call `Lexer::MeasureTokenLength()`; with the direct approach, you skip needing to call through a callback (which means at least one less function call on every source location operation).
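The two designs being weighed here can be put side by side in a small sketch. It assumes a source location is just a character offset; `Offset`, `TokenLengthFn`, `endViaCallback` and `endViaLength` are hypothetical names and do not correspond to functions in the patch.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a source location: a plain character offset.
using Offset = unsigned;

// Callback style: the library calls back into user code to measure the
// last token, in the spirit of wrapping Lexer::MeasureTokenLength.
using TokenLengthFn = std::function<unsigned(Offset)>;

inline Offset endViaCallback(Offset LastTokenStart,
                             const TokenLengthFn &Measure) {
  // One extra indirect call per location, through std::function.
  return LastTokenStart + Measure(LastTokenStart);
}

// Direct style: the caller already knows the length and passes it in, so
// the library needs neither the lexer nor a callback.
inline Offset endViaLength(Offset LastTokenStart, unsigned LastTokenLen) {
  return LastTokenStart + LastTokenLen;
}
```

Both compute the same end position; the difference is who performs the measurement and how many calls sit between the caller and the result.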

Harbormaster completed remote builds in B172238: Diff 440279. Jun 27 2022, 11:48 AM

vaibhav.y added inline comments. Jun 27 2022, 11:59 AM

clang/lib/Basic/Sarif.cpp
162	Ah, I think I misunderstood your initial suggestion: `json::Object whatever(SourceLocation Start, SourceLocation End, unsigned EndLen)` seemed like a function call to me, when it seems the suggested change was to pass in an object? Apologies, will fix that up.

Comment isn't done.

aaron.ballman added inline comments. Jun 27 2022, 12:24 PM

clang/lib/Basic/Sarif.cpp
162	Sorry for the confusion! Just to make sure we're on the same page -- my suggestion was to change the function interfaces like `SarifDocumentWriter::createPhysicalLocation()` so that they would take an additional `unsigned EndLen` parameter. However, now that I dig around a bit, it seems like `CharSourceRange` is what you'd want to use there -- then you can assert that what you're given is a char range and not a token range. So you won't need the `unsigned EndLen` parameter after all!

vaibhav.y added inline comments. Jun 27 2022, 12:45 PM

clang/lib/Basic/Sarif.cpp
162	Interesting! Asking for my understanding: If a `CharSourceRange` is a valid character range, then the `End` SLoc points to the last character in the range (even if it is mid token)? (Unlike slocs, where it is the first character of the last token.)

vaibhav.y marked an inline comment as not done. Jun 27 2022, 12:59 PM

Use CharSourceRange instead of FullSourceRange so we no longer need to compute
the length of the last token. This should already be encoded in the end location
of the CharSourceRange by the caller

TODO(@vaibhav.y): Once accepted, drop the commit that introduces: clang::FullSourceRange

Discard include: clang/Lex/Lexer.h

Harbormaster completed remote builds in B172307: Diff 440374. Jun 27 2022, 3:49 PM

Thanks, I think this is heading in the right direction!

clang/include/clang/Basic/Sarif.h
168	Should we assert this source range is not a token range?
281	In an asserts build, should we additionally have a loop to assert that each location is a char range rather than a token range?
clang/lib/Basic/Sarif.cpp
162	Your understanding is correct.
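The difference between a token range and a character range, and the assertion requested on lines 168 and 281, can be modeled with plain offsets into a string. This is a simplified sketch, not the clang types: `Range`, `toCharRange` and `textOf` are hypothetical, and the sketch assumes a half-open convention in which a character range's `End` is one past the last character covered.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for clang::CharSourceRange, using string offsets.
struct Range {
  unsigned Begin;
  unsigned End;
  // true:  End is the offset where the *last token starts*.
  // false: End already marks the character end of the range
  //        (assumed here to be one past the last character).
  bool IsTokenRange;
};

// Converting a token range needs the length of the last token, which is
// the piece of information only a lexer (or the caller) can supply.
inline Range toCharRange(Range R, unsigned LastTokenLen) {
  if (!R.IsTokenRange)
    return R;
  return Range{R.Begin, R.End + LastTokenLen, false};
}

// A consumer that insists on character ranges can extract the covered
// text without knowing anything about tokens.
inline std::string textOf(const std::string &Src, Range R) {
  assert(!R.IsTokenRange && "expected a character range");
  return Src.substr(R.Begin, R.End - R.Begin);
}
```

For the source `int x = 42;`, the token range from `x` to `42` is `{4, 8, true}`; with a last-token length of 2 it becomes the character range `{4, 10, false}`.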

vaibhav.y added inline comments. Jun 29 2022, 11:47 AM

clang/include/clang/Basic/Sarif.h
168	I don't have a strong opinion here (since these are validated downstream in `createPhysicalLocation`) but it makes sense to be defensive & assert early. I'll preserve the downstream one as well in case a new type gets added that feeds data into `createPhysicalLocation` as well.
281	Will do, I had the asserts in `createPhysicalLocation` initially. Adding them at the site of creation makes sense as well.

Assert that CharSourceRanges passed to Threadflow, SarifResult are
character ranges.

Harbormaster completed remote builds in B172850: Diff 441130. Jun 29 2022, 2:26 PM

LGTM! Thank you for the fixes to this! I'll land this again on your behalf with the same information as before.

This revision is now accepted and ready to land. Jun 30 2022, 10:12 AM

This revision was landed with ongoing or failed builds. Jun 30 2022, 10:26 AM

Closed by commit rG329fae7103d3: [clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface (authored by vaibhav.y, committed by aaron.ballman).

This revision was automatically updated to reflect the committed changes.

aaron.ballman added a commit: rG329fae7103d3: [clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface.

aaron.ballman added a reverting change: rGb46ad1b5be69: Revert "[clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter`…. Jun 30 2022, 10:40 AM

I had to roll it back because of failures with test bots:

https://lab.llvm.org/buildbot/#/builders/91/builds/11328

So this was reverted in b46ad1b5be694feefabd4c6cd112cbbd04a7b3a7, can you take a look when you get the chance?

aaron.ballman reopened this revision. Jun 30 2022, 10:41 AM

This revision is now accepted and ready to land. Jun 30 2022, 10:41 AM

In D109701#3622983, @aaron.ballman wrote:

I had to roll it back because of failures with test bots:

https://lab.llvm.org/buildbot/#/builders/91/builds/11328

So this was reverted in b46ad1b5be694feefabd4c6cd112cbbd04a7b3a7, can you take a look when you get the chance?

Odd! I'll try to reproduce that at the earliest. check-clang-unit was passing on my local machine last I checked. I'll try to build this on a RHEL system to be closer to what the buildbot reports.

Apologies for having to revert again!

Discard most likely culprit in test code causing unexpected crash.

@aaron.ballman I was unable to reproduce the issue on my end using the buildbot instructions and ASAN, UBSAN.

Is it possible for you to trigger a pre-merge check on the LLVM cluster?

I have suspicions that it was the SarifResult && type in the test so I changed it to a const SarifResult &.

I've tried running it on a RHEL 7, Darwin on my end.

In D109701#3642561, @vaibhav.y wrote:

Discard most likely culprit in test code causing unexpected crash.

@aaron.ballman I was unable to reproduce the issue on my end using the buildbot instructions and ASAN, UBSAN.

Is it possible for you to trigger a pre-merge check on the LLVM cluster?

Unfortunately, there isn't.

I have suspicions that it was the SarifResult && type in the test so I changed it to a const SarifResult &.

I've tried running it on a RHEL 7, Darwin on my end.

If you think you've got it fixed, I think the best we can do is to re-commit and watch the bots to see how they react. I'll commit again for you and watch them.

I have suspicions that it was the SarifResult && type in the test so I changed it to a const SarifResult &.

I've tried running it on a RHEL 7, Darwin on my end.

If you think you've got it fixed, I think the best we can do is to re-commit and watch the bots to see how they react. I'll commit again for you and watch them.

Thank you! Let us try that.

This revision was landed with ongoing or failed builds. Jul 11 2022, 9:19 AM

Closed by commit rG69fcf4fd5a01: Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface (authored by vaibhav.y, committed by aaron.ballman).

This revision was automatically updated to reflect the committed changes.

aaron.ballman added a commit: rG69fcf4fd5a01: Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface.

aaron.ballman added a reverting change: rGc8a28ae214c0: Revert "Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface". Jul 11 2022, 9:28 AM

Unfortunately, it still seems to be causing failures (I had to revert again):

https://lab.llvm.org/buildbot/#/builders/91/builds/11840

It looks to be the same failure as before (https://lab.llvm.org/buildbot/#/builders/91/builds/11328). :-(

This revision is now accepted and ready to land. Jul 11 2022, 9:31 AM

In D109701#3642748, @aaron.ballman wrote:

Unfortunately, it still seems to be causing failures (I had to revert again):

https://lab.llvm.org/buildbot/#/builders/91/builds/11840

It looks to be the same failure as before (https://lab.llvm.org/buildbot/#/builders/91/builds/11328). :-(

Thanks, this seems to be reproducible. Perhaps there's something special about my environment making it work. I'll try to pare down more (possibly try to build inside a container).

The buildbot properties say it's Linux aurora. Is there an official container / build spec I can use to mimic the environment?

Harbormaster completed remote builds in B174681: Diff 443654. Jul 11 2022, 10:07 AM

In D109701#3642786, @vaibhav.y wrote:

In D109701#3642748, @aaron.ballman wrote:

Unfortunately, it still seems to be causing failures (I had to revert again):

https://lab.llvm.org/buildbot/#/builders/91/builds/11840

It looks to be the same failure as before (https://lab.llvm.org/buildbot/#/builders/91/builds/11328). :-(

Thanks, this seems to be reproducible. Perhaps there's something special about my environment making it work. I'll try to pare down more (possibly try to build inside a container).

Hopefully!

The buildbot properties say it's Linux aurora. Is there an official container / build spec I can use to mimic the environment?

I don't know that there is such a thing, unfortunately. (Many of the bots are bots hosted by various different companies with different views on access to servers, I'd imagine.)

cjdb mentioned this in D129538: [clang] adds prototype for being able to alternate diagnostic formats. Jul 11 2022, 10:00 PM

Hi! I'm interning with @cjdb and @denik this summer and I was working on adding a -fdiagnostics-format=sarif option to start off my project, but I just found that a previous abandoned version of this change (D109697) was intending to add it. Seeing as the flag is no longer included in this version of the change, is it okay for me to go on and add it myself, or are you still planning on adding it here? Thanks!

In D109701#3646856, @abrahamcd wrote:

Hi! I'm interning with @cjdb and @denik this summer and I was working on adding a -fdiagnostics-format=sarif option to start off my project, but I just found that a previous abandoned version of this change (D109697) was intending to add it. Seeing as the flag is no longer included in this version of the change, is it okay for me to go on and add it myself, or are you still planning on adding it here? Thanks!

Sure, feel free to use D109697 as you see fit!

In D109701#3648168, @vaibhav.y wrote:

In D109701#3646856, @abrahamcd wrote:

Hi! I'm interning with @cjdb and @denik this summer and I was working on adding a -fdiagnostics-format=sarif option to start off my project, but I just found that a previous abandoned version of this change (D109697) was intending to add it. Seeing as the flag is no longer included in this version of the change, is it okay for me to go on and add it myself, or are you still planning on adding it here? Thanks!

Sure, feel free to use D109697 as you see fit!

Great, thank you!

@aaron.ballman

The culprit turned out to be the difference in release flags on the build server vs my environment. I had unfortunately run the configuration command once in Debug mode, and hadn't re-configured. Not a bright moment :)

The ASSERT_DEATH tests that I had written weren't gated by NDEBUG or GTEST_HAS_DEATH_TEST (other unit tests use some combination of the two). This was causing them to pass on my machine but fail pre-merge (which is RelWithDebInfo).

I will gate the tests similar to what https://github.com/llvm/llvm-project/blob/main/clang/unittests/Serialization/InMemoryModuleCacheTest.cpp#L48-L52 does, but force a skip instead.

Running another clean build on my end to ensure I have the right culprit. Thank you for your patience!
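The gating described in this comment can be sketched without GoogleTest as follows. This is only an illustration: `checkedDivide` and `deathTestSkipReason` are hypothetical names, and the `GTEST_HAS_DEATH_TEST` fallback definition exists solely so the snippet compiles on its own.

```cpp
#include <cassert>
#include <string>

// Stand-in so the sketch compiles without GoogleTest. In a real test file the
// macro is provided by <gtest/gtest.h>.
#ifndef GTEST_HAS_DEATH_TEST
#define GTEST_HAS_DEATH_TEST 0
#endif

// Hypothetical function under test: it only aborts on bad input while
// assert() is active, i.e. while NDEBUG is not defined.
inline int checkedDivide(int Num, int Den) {
  assert(Den != 0 && "division by zero");
  return Den == 0 ? 0 : Num / Den;
}

// Hypothetical helper: returns why a death test must be skipped in this
// build, or an empty string if it can run.
inline std::string deathTestSkipReason() {
#if defined(NDEBUG)
  return "assertions are compiled out (NDEBUG is defined)";
#elif !GTEST_HAS_DEATH_TEST
  return "death tests are unavailable in this configuration";
#else
  return "";
#endif
}
```

In a GoogleTest body, a non-empty reason would typically be reported with `GTEST_SKIP() << Reason;` before any `ASSERT_DEATH` is reached, so a build with assertions compiled out reports the test as skipped rather than failed.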

Gate death tests on NDEBUG and availability of GTEST_HAS_DEATH_TEST

This should fix recent post-merge failures

vaibhav.y edited the summary of this revision. Jul 14 2022, 12:38 PM

vaibhav.y edited the summary of this revision. Jul 14 2022, 12:49 PM

Harbormaster completed remote builds in B175482: Diff 444764. Jul 14 2022, 5:10 PM

Undo test case renames

LGTM assuming precommit CI comes back happy with it, thank you! I'll land it once I see things are green.

Harbormaster completed remote builds in B175623: Diff 444960. Jul 15 2022, 7:15 AM

abrahamcd added a child revision: D129886: [clang] Add -fdiagnostics-format=sarif option for future SARIF output. Jul 15 2022, 11:34 AM

This revision was landed with ongoing or failed builds. Jul 18 2022, 5:38 AM

Closed by commit rG4b03ad650645: [clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface (authored by vaibhav.y, committed by aaron.ballman).

This revision was automatically updated to reflect the committed changes.

aaron.ballman added a commit: rG4b03ad650645: [clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface.

Revision Contents

Path                                    Size
clang/include/clang/Basic/Sarif.h       440 lines
clang/lib/Basic/CMakeLists.txt          1 line
clang/lib/Basic/Sarif.cpp               389 lines
clang/unittests/Basic/CMakeLists.txt    1 line
clang/unittests/Basic/SarifTest.cpp     325 lines

Diff 445466

clang/include/clang/Basic/Sarif.h

This file was added.

//== clang/Basic/Sarif.h - SARIF Diagnostics Object Model -------*- C++ -*--==//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

/// \file

/// Defines clang::SarifDocumentWriter, clang::SarifRule, clang::SarifResult.

///

/// The document built can be accessed as a JSON Object.

/// Several value semantic types are also introduced which represent properties

/// of the SARIF standard, such as 'artifact', 'result', 'rule'.

///

/// A SARIF (Static Analysis Results Interchange Format) document is a JSON

/// document that describes in detail the results of running static analysis

/// tools on a project. Each (non-trivial) document consists of at least one

/// "run", which are themselves composed of details such as:

/// * Tool: The tool that was run

/// * Rules: The rules applied during the tool run, represented by

/// \c reportingDescriptor objects in SARIF

/// * Results: The matches for the rules applied against the project(s) being

/// evaluated, represented by \c result objects in SARIF

///

/// Reference:

/// 1. <a href="https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html">The SARIF standard</a>

/// 2. <a href="https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317836">SARIF<pre>reportingDescriptor</pre></a>

/// 3. <a href="https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317638">SARIF<pre>result</pre></a>

//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_BASIC_SARIF_H

#define LLVM_CLANG_BASIC_SARIF_H

#include "clang/Basic/SourceLocation.h"

#include "clang/Basic/Version.h"

#include "llvm/ADT/ArrayRef.h"

#include "llvm/ADT/Optional.h"

#include "llvm/ADT/SmallVector.h"

#include "llvm/ADT/StringMap.h"

#include "llvm/ADT/StringRef.h"

#include "llvm/Support/JSON.h"

#include <cassert>

#include <cstddef>

#include <cstdint>

#include <initializer_list>

#include <string>

aaron.ballmanUnsubmitted

Done

Having thought about this a bit more, I think this line should be removed because it's within a header file, so all includes of this header file will be impacted and likely without knowing it. I'm very sorry for the churn, but I think this should go and the llvm:: prefixes should be brought back within the header file. Within the source file, it's fine to use the using.
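A small sketch of the convention being asked for, using a made-up namespace `lib` in place of `llvm` (all names here are hypothetical): header-style code spells out the namespace, while a using-directive is confined to source-file-style code, where it affects only that translation unit.

```cpp
// Stand-in for a library namespace such as llvm (hypothetical contents).
namespace lib {
struct Optional {
  bool HasValue = false;
  unsigned Value = 0;
};
} // namespace lib

// Header-style code: every library name is fully qualified, so including
// this code never changes name lookup for the includer.
namespace tool {
class Location {
  lib::Optional Index;

public:
  void setIndex(unsigned Idx) {
    Index.HasValue = true;
    Index.Value = Idx;
  }
  bool hasIndex() const { return Index.HasValue; }
};
} // namespace tool

// Source-file-style code: a using-directive placed here only affects the one
// translation unit that contains it.
using namespace lib;
inline Optional makeEmptyIndex() { return Optional{}; }
```

Had the using-directive been placed in the header part, every file including it would silently gain unqualified access to everything in `lib`, which is the effect the comment warns about.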

namespace clang {

class SarifDocumentWriter;

class SourceManager;

namespace detail {

/// \internal

/// An artifact location is SARIF's way of describing the complete location

/// of an artifact encountered during analysis. The \c artifactLocation object

/// typically consists of a URI, and/or an index to reference the artifact it

/// locates.

///

/// This builder makes an additional assumption: that every artifact encountered

/// by \c clang will be a physical, top-level artifact. Which is why the static

/// creation method \ref SarifArtifactLocation::create takes a mandatory URI

/// parameter. The official standard states that either a \c URI or \c Index

/// must be available in the object, \c clang picks the \c URI as a reasonable

/// default, because it intends to deal in physical artifacts for now.

///

/// Reference:

/// 1. <a href="https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317427">artifactLocation object</a>

/// 2. \ref SarifArtifact

vaibhav.yAuthorUnsubmitted

Done

I'm not sure how to deal with overlength links in docs directly; turning clang-format off and on around comments also seems counter-productive. Would it be okay to add an alias in the doxyfile for the root page of SARIF docs?

E.g.:

ALIASES = sarifDocs="https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html"

aaron.ballmanUnsubmitted

Done

I think it's fine to ignore the clang-format warnings here (without adding comment markers). It's easier for people reading the links to see the full URL here than adding an alias in the doxyfile, IMO.

class SarifArtifactLocation {

private:

friend class clang::SarifDocumentWriter;

llvm::Optional<uint32_t> Index;

aaron.ballmanUnsubmitted

Done

friend class clang::SarifDocumentWriter;

- llvm::Optional<uint32_t> Index;

+ Optional<uint32_t> Index;

StringRef URI;

You have a using namespace llvm; at the top of the file, so all these llvm:: nested name specifiers can be removed.

std::string URI;

SarifArtifactLocation() = delete;

aaron.ballmanUnsubmitted

Done

StringRef URI;

- SarifArtifactLocation(const StringRef &URI) : Index(), URI(URI) {}

+ explicit SarifArtifactLocation(StringRef URI) : URI(URI) {}

public:

StringRef is a non-owning reference object anyway, so there's really no gain from passing a const ref to it -- we typically pass it by value. Same suggestion applies elsewhere in the patch.

Also, no need to explicitly init things with default constructors that will be run, like Index.

explicit SarifArtifactLocation(const std::string &URI) : URI(URI) {}

public:

static SarifArtifactLocation create(llvm::StringRef URI) {

return SarifArtifactLocation{URI.str()};

aaron.ballmanUnsubmitted

Done

One thing that's worth calling out is that StringRef is non-owning which means that the argument passed to create the SarifArtifactLocation has to outlive the returned object to avoid dangling references. This makes the class a bit more dangerous to use because Twine or automatic std::string objects may cause lifetime concerns.

Should these classes be storing a std::string so that the memory is owned by SARIF?
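The lifetime concern can be illustrated with a minimal sketch in which `std::string_view` stands in for `llvm::StringRef` (both are cheap, non-owning views). `ArtifactLocation` and `makeFromTemporary` are hypothetical names, not the classes from the patch.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for an artifact-location builder. The view is used
// only for parameter passing; the member is an owning std::string.
class ArtifactLocation {
  std::string URI; // owning copy, so the object cannot dangle

  explicit ArtifactLocation(std::string U) : URI(std::move(U)) {}

public:
  // The view is taken by value and copied into owned storage immediately.
  static ArtifactLocation create(std::string_view U) {
    return ArtifactLocation{std::string(U)};
  }

  const std::string &uri() const { return URI; }
};

// The temporary std::string built here is destroyed at the end of the return
// statement, yet the returned object stays valid because it holds a copy.
inline ArtifactLocation makeFromTemporary() {
  return ArtifactLocation::create(std::string("file:///tmp/") + "main.cpp");
}
```

If the member were a view instead, `makeFromTemporary()` would return an object referring to freed memory. The cost of the owning design is one string copy per object.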

vaibhav.yAuthorUnsubmitted

Done

Good point! Will change it to std::string to start with.

Some fields, such as MimeType and RuleID, would probably do better with SmallString. I haven't been able to find any good measurements of how the lengths of those two are distributed; can it be made into SmallString<N> in a future PR?

aaron.ballmanUnsubmitted

Done

Some fields, such as MimeType and RuleID, would probably do better with SmallString. I haven't been able to find any good measurements of how the lengths of those two are distributed; can it be made into SmallString<N> in a future PR?

Yeah, I think that can be done in a follow-up if we find a performance benefit from it.

aaron.ballmanUnsubmitted

Done

public:

- static SarifArtifactLocation create(const llvm::StringRef URI) {

+ static SarifArtifactLocation create(llvm::StringRef URI) {

return SarifArtifactLocation{URI.str()};

We don't typically use top-level const like this (same applies elsewhere) unless it's on a pointee/reference.

}

SarifArtifactLocation setIndex(uint32_t Idx) {

aaron.ballmanUnsubmitted

Done

SarifArtifactLocation &setIndex(uint32_t Idx) {

- this->Index = Idx;

+ Index = Idx;

return *this;

Local style is to not use this-> unless required, so I'd recommend removing all of the uses (and potentially renaming some parameters so there's not a shadowing issue).

Index = Idx;

return *this;

}

};

/// \internal

/// An artifact in SARIF is any object (a sequence of bytes) addressable by

/// a URI (RFC 3986). The most common type of artifact for clang's use-case

/// would be source files. SARIF's artifact object is described in detail in

/// section 3.24.

aaron.ballmanUnsubmitted

Done

/// section 3.24.

- /// Since every in clang artifact MUST have a location (there being no nested

+ /// Since everything in clang artifact MUST have a location (there being no nested

/// artifacts), the creation method \ref SarifArtifact::create requires a

How do we expect to handle artifact locations that don't correspond directly to a file? For example, the user can specify macros on the command line and those macros could have a diagnostic result associated with them. Can we handle that sort of scenario?

vaibhav.yAuthorUnsubmitted

Done

I hadn't considered -D macros. Definitely a valid case to handle. I will respond with more information once I read through the related portions of the SARIF spec. A (sort of hacky?) solution comes to mind after a quick glance:

Setting the artifact URI to: data:text/plain:<CLI_ARG> with an offset, and saying that its role is "referencedOnCommandLine". It seems strange since we need to copy over the command line, instead of referencing it directly. What do you think: https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317617

Dropping a TODO comment so I can update later.

TODO(envp)

aaron.ballmanUnsubmitted

Done

From my reading of SARIF, I think that sounds plausible. At least, I didn't see anything else that would work.
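As a rough sketch of the idea discussed in this thread, the helper below wraps a command-line argument in a `data:` URI. It is not part of the patch: `makeDataURI` is a hypothetical name, and the sketch follows RFC 2397, which separates the media type from the payload with a comma, and percent-encodes every byte outside the RFC 3986 unreserved set.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: embeds Text (e.g. a -D argument) in a data URI.
inline std::string makeDataURI(std::string_view Text) {
  std::string Out = "data:text/plain,";
  for (unsigned char C : Text) {
    // RFC 3986 unreserved characters may appear literally.
    bool Unreserved = (C >= 'A' && C <= 'Z') || (C >= 'a' && C <= 'z') ||
                      (C >= '0' && C <= '9') || C == '-' || C == '.' ||
                      C == '_' || C == '~';
    if (Unreserved) {
      Out.push_back(static_cast<char>(C));
    } else {
      // Everything else is written as %XX with uppercase hex digits.
      char Buf[4];
      std::snprintf(Buf, sizeof(Buf), "%%%02X", static_cast<unsigned>(C));
      Out += Buf;
    }
  }
  return Out;
}
```

For instance, `makeDataURI("-DFOO=1")` yields `data:text/plain,-DFOO%3D1`. How such a URI would be combined with an offset and the "referencedOnCommandLine" role is left open here, as it is in the discussion.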

/// Since every clang artifact MUST have a location (there being no nested

aaron.ballmanUnsubmitted

Done

/// artifacts), the creation method \ref SarifArtifact::create requires a

- /// \ref SarifArtifactLocation object

+ /// \ref SarifArtifactLocation object.

///

/// Reference:

/// artifacts), the creation method \ref SarifArtifact::create requires a

/// \ref SarifArtifactLocation object.

///

/// Reference:

/// 1. <a href="https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317611">artifact object</a>

class SarifArtifact {

private:

friend class clang::SarifDocumentWriter;

llvm::Optional<uint32_t> Offset;

llvm::Optional<size_t> Length;

std::string MimeType;

aaron.ballmanUnsubmitted

Done

What size would you like this SmallVector to have?

vaibhav.yAuthorUnsubmitted

Done

Glancing through valid values for roles: my expectation is that each artifact will have a _small_ number of roles associated, so 4 seems like a good threshold here. I don't really have a strong opinion on the size, so I'm open to changing this.

SarifArtifactLocation Location;

llvm::SmallVector<std::string, 4> Roles;

aaron.ballmanUnsubmitted

Done

SarifArtifact(const SarifArtifactLocation &Loc)

- : Offset(), Length(), MimeType(), Location(Loc), Roles() {}

+ : Location(Loc) {}

public:

aaron.ballmanUnsubmitted

Done

Should we delete the default ctor?

vaibhav.yAuthorUnsubmitted

Done

Agreed, there's no reason for it to be callable. Will do the same for SarifArtifactLocation.

SarifArtifact() = delete;

explicit SarifArtifact(const SarifArtifactLocation &Loc) : Location(Loc) {}

aaron.ballmanUnsubmitted

Done

Loc is already a SarifArtifactLocation, did you mean SarifArtifact by any chance?

(Note, this suggests to me we should mark the ctors here as explicit.)

vaibhav.yAuthorUnsubmitted

Done

Agree, I've marked all constructors taking a single parameter as explicit.
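The effect of `explicit` on a single-parameter constructor can be shown with a short sketch. `Location`, `Artifact` and `describe` are hypothetical stand-ins, not the types from the patch.

```cpp
#include <string>
#include <type_traits>
#include <utility>

struct Location {
  std::string URI;
};

class Artifact {
  Location Loc;

public:
  // explicit: a Location never silently turns into an Artifact.
  explicit Artifact(Location L) : Loc(std::move(L)) {}
  const std::string &uri() const { return Loc.URI; }
};

inline std::string describe(const Artifact &A) {
  return "artifact at " + A.uri();
}

// Direct construction is allowed, implicit conversion is not.
static_assert(std::is_constructible<Artifact, Location>::value,
              "an Artifact can be built from a Location on request");
static_assert(!std::is_convertible<Location, Artifact>::value,
              "but a Location does not convert implicitly");
```

With this in place, `describe(Location{"file:///a.c"})` fails to compile, while `describe(Artifact{Location{"file:///a.c"}})` is accepted, so passing the wrong one of two closely related types is caught at the call site.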

public:

static SarifArtifact create(const SarifArtifactLocation &Loc) {

return SarifArtifact{Loc};

}

SarifArtifact setOffset(uint32_t ArtifactOffset) {

Offset = ArtifactOffset;

return *this;

}

SarifArtifact setLength(size_t NumBytes) {

Length = NumBytes;

return *this;

aaron.ballmanUnsubmitted

Done

return *this;

}

- SarifArtifact &setRoles(const std::initializer_list<StringRef> &Roles) {

+ SarifArtifact &setRoles(std::initializer_list<StringRef> Roles) {

this->Roles.assign(Roles);

This is another lightweight nonowning wrapper type that we don't usually pass as a const ref.

}

SarifArtifact setRoles(std::initializer_list<llvm::StringRef> ArtifactRoles) {

Roles.assign(ArtifactRoles.begin(), ArtifactRoles.end());

return *this;

aaron.ballmanUnsubmitted

Done

return *this;

}

- SarifArtifact &setMimeType(const StringRef &MimeType) {

+ SarifArtifact &setMimeType(StringRef MimeType) {

this->MimeType = MimeType;

}

SarifArtifact setMimeType(llvm::StringRef ArtifactMimeType) {

MimeType = ArtifactMimeType.str();

return *this;

}

};

} // namespace detail

enum class ThreadFlowImportance { Important, Essential, Unimportant };

/// A thread flow is a sequence of code locations that specify a possible path

/// through a single thread of execution.

/// A thread flow in SARIF is related to a code flow which describes

/// the progress of one or more programs through one or more thread flows.

///

/// Reference:

/// 1. <a href="https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317744">threadFlow object</a>

/// 2. <a href="https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317740">codeFlow object</a>

class ThreadFlow {

friend class SarifDocumentWriter;

CharSourceRange Range;

ThreadFlowImportance Importance;

std::string Message;

ThreadFlow() = default;

aaron.ballmanUnsubmitted

Not Done

One question this raises for me is whether we should be enforcing invariants from the SARIF spec as part of this interface or not. Currently, you can create a thread flow that has no importance or a rule without a name/id, etc. That means it's pretty easy to create SARIF that won't validate properly. One possible way to alleviate this would be for the create() methods/ctors to require these be set when creating the objects. However, I can imagine there will be times when that is awkward due to following the builder pattern with the interface. Another option would be to have some validation methods on each of the interfaces and the whole tree gets validated after construction, but this could have performance impacts.

What are your thoughts?

aaron.ballmanUnsubmitted

Not Done

Are you still intending to add a validate() interface to take care of this, or are you still thinking about how to enforce invariants during construction (or some combination of the two)?

vaibhav.yAuthorUnsubmitted

Done

Definitely! I plan to add that in a follow-up patch. The goal would be to have as much as possible be correct by construction (through small builders having a limited field set), but we'll still need a validate().
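One possible shape for combining the two approaches from this thread is sketched below. It is purely illustrative: `Rule`, its fields, and the choice of which fields `validate()` checks are assumptions of the sketch, and no `validate()` interface exists in this revision.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical builder: the id is required at construction, while the name is
// set later and checked by validate().
class Rule {
  std::string Id;
  std::string Name;

  explicit Rule(std::string RuleId) : Id(std::move(RuleId)) {}

public:
  static Rule create(std::string RuleId) { return Rule{std::move(RuleId)}; }

  Rule &setName(std::string RuleName) {
    Name = std::move(RuleName);
    return *this;
  }

  // Returns one message per violated invariant; empty means valid.
  std::vector<std::string> validate() const {
    std::vector<std::string> Problems;
    if (Id.empty())
      Problems.push_back("rule id must not be empty");
    if (Name.empty())
      Problems.push_back("rule name must not be empty");
    return Problems;
  }
};
```

Returning a list of problems rather than asserting keeps the check usable in release builds, at the cost of walking the object once more after construction, which is the performance trade-off raised above.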

public:

static ThreadFlow create() { return {}; }

ThreadFlow setRange(const CharSourceRange &ItemRange) {

aaron.ballmanUnsubmitted

Not Done

Should we assert this source range is not a token range?

vaibhav.yAuthorUnsubmitted

Done

I don't have a strong opinion here (since these are validated downstream in createPhysicalLocation) but it makes sense to be defensive & assert early.

I'll preserve the downstream one as well in case a new type gets added that feeds data into createPhysicalLocation as well.

assert(ItemRange.isCharRange() &&

"ThreadFlows require a character granular source range!");

Range = ItemRange;

return *this;

}

ThreadFlow setImportance(const ThreadFlowImportance &ItemImportance) {

Importance = ItemImportance;

return *this;

}

ThreadFlow setMessage(llvm::StringRef ItemMessage) {

Message = ItemMessage.str();

return *this;

}

};

/// A SARIF rule (\c reportingDescriptor object) contains information that

/// describes a reporting item generated by a tool. A reporting item is

/// either a result of analysis or notification of a condition encountered by

/// the tool. Rules are arbitrary but are identifiable by a hierarchical

/// rule-id.

///

/// This builder provides an interface to create SARIF \c reportingDescriptor

/// objects via the \ref SarifRule::create static method.

///

/// Reference:

/// 1. <a href="https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317836">reportingDescriptor object</a>

class SarifRule {

friend class clang::SarifDocumentWriter;

std::string Name;

std::string Id;

std::string Description;

std::string HelpURI;

SarifRule() = default;

public:

static SarifRule create() { return {}; }

SarifRule setName(llvm::StringRef RuleName) {

Name = RuleName.str();

return *this;

}

SarifRule setRuleId(llvm::StringRef RuleId) {

Id = RuleId.str();

return *this;

}

SarifRule setDescription(llvm::StringRef RuleDesc) {

Description = RuleDesc.str();

return *this;

}

SarifRule setHelpURI(llvm::StringRef RuleHelpURI) {

HelpURI = RuleHelpURI.str();

return *this;

}

};

aaron.ballmanUnsubmitted

Done

No idea what clang-format wants done here, but there's a stray The at the end of the line.

/// A SARIF result (also called a "reporting item") is a unit of output

aaron.ballmanUnsubmitted

Done

/// used to create an empty shell onto which attributes can be added using the

- /// \c setX(...) methods. The

+ /// \c setX(...) methods.

///

/// For example:

/// produced when one of the tool's \c reportingDescriptor encounters a match

/// on the file being analysed by the tool.

///

/// This builder provides a \ref SarifResult::create static method that can be

/// used to create an empty shell onto which attributes can be added using the

/// \c setX(...) methods.

///

/// For example:

/// \code{.cpp}

/// SarifResult result = SarifResult::create(...)

/// .setRuleId(...)

/// .setDiagnosticMessage(...);

/// \endcode

///

/// Reference:

aaron.ballmanUnsubmitted

Done

This comment looks incorrect to me.

vaibhav.yAuthorUnsubmitted

Done

Ack, fixed that as well. The right rationale for uint32_t is that it is the largest non-negative type that can be safely promoted to int64_t (which is what LLVM's JSON encoder supports).
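The range argument can be checked at compile time. The sketch below uses only the standard fixed-width types; `toJSONInteger` is a hypothetical helper name, not a function from LLVM's JSON library.

```cpp
#include <cstdint>
#include <limits>

// Every uint32_t value is representable as an int64_t ...
static_assert(static_cast<uint64_t>(std::numeric_limits<uint32_t>::max()) <=
                  static_cast<uint64_t>(std::numeric_limits<int64_t>::max()),
              "uint32_t always fits in int64_t");

// ... whereas the next wider unsigned type, uint64_t, can overflow it.
static_assert(std::numeric_limits<uint64_t>::max() >
                  static_cast<uint64_t>(std::numeric_limits<int64_t>::max()),
              "uint64_t can exceed int64_t");

// Hypothetical helper: widen a rule index to a signed 64-bit integer without
// any possibility of changing its value.
inline int64_t toJSONInteger(uint32_t Idx) { return static_cast<int64_t>(Idx); }
```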

/// 1. <a href="https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317638">SARIF<pre>result</pre></a>

class SarifResult {

friend class clang::SarifDocumentWriter;

aaron.ballmanUnsubmitted

Done

// chosen because it has to be non-negative, and because the JSON encoder

- // used requires this be a type that can be safely promoted to \c int64_t

+ // used requires this be a type that can be safely promoted to \c int64_t.

uint32_t RuleIdx;

// NOTE:

aaron.ballmanUnsubmitted

Done

A default constructed SarifResult will have an uninitialized RuleIdx -- are you okay with that?

vaibhav.yAuthorUnsubmitted

Done

Hrm, I had overlooked that case (since appendResult took an index and a rule). I'll make it take a rule index upon construction. This will also make it so that appendResult takes just the SarifResult, and not an (Idx, Result), which is definitely an artifact of before I added RuleIdx to SarifResult.

// This type cannot fit all possible indexes representable by JSON, but is

// chosen because it is the largest unsigned type that can be safely

// converted to an \c int64_t.

uint32_t RuleIdx;

std::string RuleId;

std::string DiagnosticMessage;

llvm::SmallVector<CharSourceRange, 8> Locations;

llvm::SmallVector<ThreadFlow, 8> ThreadFlows;

SarifResult() = delete;

explicit SarifResult(uint32_t RuleIdx) : RuleIdx(RuleIdx) {}

public:

static SarifResult create(uint32_t RuleIdx) { return SarifResult{RuleIdx}; }

SarifResult setIndex(uint32_t Idx) {

RuleIdx = Idx;

return *this;

}

SarifResult setRuleId(llvm::StringRef Id) {

RuleId = Id.str();

return *this;

}

aaron.ballmanUnsubmitted

Done

return *this;

}

- SarifResult &setLocations(const ArrayRef<FullSourceRange> &DiagLocs) {

+ SarifResult &setLocations(ArrayRef<FullSourceRange> DiagLocs) {

this->Locations = DiagLocs;

return *this;

}

- SarifResult &setThreadFlows(const ArrayRef<ThreadFlow> &ThreadFlows) {

+ SarifResult &setThreadFlows(ArrayRef<ThreadFlow> ThreadFlows) {

this->ThreadFlows = ThreadFlows;

Also a nonowning reference type that's meant to be passed by value.

SarifResult setDiagnosticMessage(llvm::StringRef Message) {

DiagnosticMessage = Message.str();

return *this;

}

aaron.ballmanUnsubmitted

Not Done

In an asserts build, should we additionally have a loop to assert that each location is a char range rather than a token range?

vaibhav.yAuthorUnsubmitted

Not Done

Will do, I had the asserts in createPhysicalLocation initially. Adding them at the site of creation makes sense as well.

SarifResult setLocations(llvm::ArrayRef<CharSourceRange> DiagLocs) {

#ifndef NDEBUG

for (const auto &Loc : DiagLocs) {

assert(Loc.isCharRange() &&

"SARIF Results require character granular source ranges!");

aaron.ballmanUnsubmitted

Done

/// must ensure that \ref SarifDocumentWriter::createRun is called before

- /// anyother methods.

+ /// any other methods.

/// 2. If SarifDocumentWriter::endRun is called, callers MUST call

}

#endif

Locations.assign(DiagLocs.begin(), DiagLocs.end());

return *this;

}

SarifResult setThreadFlows(llvm::ArrayRef<ThreadFlow> ThreadFlowResults) {

ThreadFlows.assign(ThreadFlowResults.begin(), ThreadFlowResults.end());

return *this;

}

};

/// This class handles creating a valid SARIF document given various input

aaron.ballmanUnsubmitted

Not Done

I worry about the performance aspects of using a callback for this. Calls across DSO boundaries can be slower (due to having to call through a thunk), and it necessitates calling back to the user to calculate information that the user could simply pass in directly. That's on top of std::function already being pretty heavy-weight.

/// attributes. However, it requires an ordering among certain method calls:

aaron.ballmanUnsubmitted

Done

/// \internal

- /// Return a pointer to the current tool. If no run exists, this will

- /// crash.

+ /// Return a pointer to the current tool. Asserts that a run exists.

json::Object *getCurrentTool();

vaibhav.yAuthorUnsubmitted

Done

Thanks for rewording. I'll also make this return a reference, since the pointer returned cannot be null.

///

/// 1. Because every SARIF document must contain at least 1 \c run, callers

/// must ensure that \ref SarifDocumentWriter::createRun is called before

/// any other methods.

aaron.ballmanUnsubmitted

Done

/// \internal

- /// Checks if there is a run associated with this document

+ /// Checks if there is a run associated with this document.

///

/// \return true on success

/// 2. If SarifDocumentWriter::endRun is called, callers MUST call

/// SarifDocumentWriter::createRun, before invoking any of the result

/// aggregation methods such as SarifDocumentWriter::appendResult etc.
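The two ordering rules in the comment above can be modelled with a toy class. This is a sketch of the contract only, not the real `SarifDocumentWriter`: `RunWriter` and its members are hypothetical, results are plain strings, and the `Closed` flag mirrors the one visible in the quoted constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the call-ordering contract.
class RunWriter {
  bool Closed = true; // no run is open until createRun() is called
  std::vector<std::vector<std::string>> Runs;

public:
  // Rule 1: a run must be created before anything else is added.
  void createRun() {
    Runs.emplace_back();
    Closed = false;
  }

  void endRun() {
    assert(!Closed && "endRun() called without an open run");
    Closed = true;
  }

  // Rule 2: after endRun(), createRun() must be called again first.
  void appendResult(std::string Message) {
    assert(!Closed && "appendResult() requires an open run");
    Runs.back().push_back(std::move(Message));
  }

  bool hasOpenRun() const { return !Closed; }
  std::size_t numRuns() const { return Runs.size(); }
  std::size_t numResultsInCurrentRun() const {
    return Runs.empty() ? 0 : Runs.back().size();
  }
};
```

A valid sequence is `createRun()`, `appendResult(...)`, `endRun()`, `createRun()`, `appendResult(...)`. Calling `appendResult` directly after `endRun()` trips the assertion in builds where assertions are enabled.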

aaron.ballmanUnsubmitted

Not Done

FWIW, this use of top-level const is fine (we do use it for data member variables).

class SarifDocumentWriter {

private:

const llvm::StringRef SchemaURI{

"https://docs.oasis-open.org/sarif/sarif/v2.1.0/cos02/schemas/"

aaron.ballmanUnsubmitted

Done

/// Reset portions of the internal state so that the document is ready to

- /// recieve data for a new run

+ /// receive data for a new run.

void reset();

"sarif-schema-2.1.0.json"};

const llvm::StringRef SchemaVersion{"2.1.0"};

/// \internal

/// Return a reference to the current tool. Asserts that a run exists.

llvm::json::Object &getCurrentTool();

aaron.ballmanUnsubmitted

Done

/// \brief Return a mutable pointer to the current run, if it exists.

///

- /// \note If a run does not exist in the SARIF document, calling this will

- /// trigger undefined behaviour

+ /// \note It is undefined behavior to call this if a run does not

+ /// exist in the SARIF document.

json::Object *currentRun();

/// \internal

/// Checks if there is a run associated with this document.

///

/// \return true on success

bool hasRun() const;

/// \internal

aaron.ballmanUnsubmitted

Done

/// See \link ThreadFlow \endlink

///

- /// \note If a run does not exist in the SARIF document, calling this will

- /// trigger undefined behaviour

+ /// \note It is undefined behavior to call this if a run does not

+ /// exist in the SARIF document.

json::Object createCodeFlow(const ArrayRef<ThreadFlow> &ThreadFlows);

/// Reset portions of the internal state so that the document is ready to

aaron.ballmanUnsubmitted

Done

/// trigger undefined behaviour

- json::Object createCodeFlow(const ArrayRef<ThreadFlow> &ThreadFlows);

+ json::Object createCodeFlow(ArrayRef<ThreadFlow> ThreadFlows);

/// Add the given threadflows to the ones this SARIF document knows about

/// receive data for a new run.

void reset();

aaron.ballmanUnsubmitted

Done

json::Object createCodeFlow(const ArrayRef<ThreadFlow> &ThreadFlows);

- /// Add the given threadflows to the ones this SARIF document knows about

+ /// Add the given thread flows to the ones this SARIF document knows about.

json::Array createThreadFlows(const ArrayRef<ThreadFlow> &ThreadFlows);

aaron.ballman:

aaron.ballmanUnsubmitted

Done

/// Add the given threadflows to the ones this SARIF document knows about

- json::Array createThreadFlows(const ArrayRef<ThreadFlow> &ThreadFlows);

+ json::Array createThreadFlows(ArrayRef<ThreadFlow> ThreadFlows);

/// Add the given \ref FullSourceRange to the SARIF document as a physical

aaron.ballman:

/// \internal

/// Return a mutable reference to the current run, after asserting it exists.

///

aaron.ballmanUnsubmitted

Done

/// Add the given \ref FullSourceRange to the SARIF document as a physical

- /// location, with it's corresponding artifact

+ /// location, with its corresponding artifact.

json::Object createPhysicalLocation(const FullSourceRange &R);

Same suggestions apply elsewhere in the patch.


/// \note It is undefined behavior to call this if a run does not exist in

/// the SARIF document.

llvm::json::Object &getCurrentRun();

/// Create a code flow object for the given threadflows.

/// See \ref ThreadFlow.

///

/// \note It is undefined behavior to call this if a run does not exist in

/// the SARIF document.

aaron.ballmanUnsubmitted

Done

public:

- /// Create a new empty SARIF document

+ /// Create a new empty SARIF document.

SarifDocumentWriter() : Closed(true){};

aaron.ballman:

llvm::json::Object

aaron.ballmanUnsubmitted

Done

/// Create a new empty SARIF document

- SarifDocumentWriter() : Closed(true){};

+ SarifDocumentWriter() = default;

/// Create a new empty SARIF document with the given language options

Once you use the default ctor, there's no way to associate language options with the document writer. Is it wise to expose this constructor? (If we didn't, then we could store the LangOptions as a const reference instead of making a copy in the other constructor. Given that we never mutate the options, I think that's a win.)


vaibhav.yAuthorUnsubmitted

Done

That's a good observation, I will delete this constructor and expose SarifDocumentWriter(const LangOptions &LangOpts) instead.
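The trade-off raised in this thread can be illustrated with a small standalone sketch. `Options` and `Writer` below are hypothetical stand-ins, not the clang types: deleting the default constructor means every writer is built from caller-provided options, which in turn allows the writer to hold a const reference instead of a copy. The cost, assumed here, is that the caller must keep the options object alive for as long as the writer is in use.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for an options type such as LangOptions.
struct Options {
  std::string Standard = "c++17";
};

// Hypothetical writer that cannot exist without options.
class Writer {
public:
  Writer() = delete;
  explicit Writer(const Options &Opts) : Opts(Opts) {}

  const std::string &standard() const { return Opts.Standard; }

private:
  // Stored by reference: no copy is made, but the referenced object must
  // outlive the writer.
  const Options &Opts;
};

static_assert(!std::is_default_constructible<Writer>::value,
              "a Writer must always be given options");
```

Because the member is a reference, later changes to the caller's options object are visible through the writer; storing a copy would instead freeze the options at construction time.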


createCodeFlow(const llvm::ArrayRef<ThreadFlow> ThreadFlows);

aaron.ballmanUnsubmitted

Done

SarifDocumentWriter() : Closed(true){};

- /// Create a new empty SARIF document with the given language options

+ /// Create a new empty SARIF document with the given language options.

SarifDocumentWriter(const LangOptions &LangOpts)

aaron.ballman:

/// Add the given threadflows to the ones this SARIF document knows about.

llvm::json::Array

aaron.ballmanUnsubmitted

Done

SarifDocumentWriter(const LangOptions &LangOpts)

- : LangOpts(LangOpts), Closed(true) {}

+ : LangOpts(LangOpts) {}

/// Release resources held by this SARIF document

aaron.ballman:

createThreadFlows(const llvm::ArrayRef<ThreadFlow> ThreadFlows);

aaron.ballmanUnsubmitted

Done

: LangOpts(LangOpts), Closed(true) {}

- /// Release resources held by this SARIF document

+ /// Release resources held by this SARIF document.

~SarifDocumentWriter() = default;

aaron.ballman:

/// Add the given \ref CharSourceRange to the SARIF document as a physical

/// location, with its corresponding artifact.

llvm::json::Object createPhysicalLocation(const CharSourceRange &R);

public:

SarifDocumentWriter() = delete;

/// Create a new empty SARIF document with the given source manager.

SarifDocumentWriter(const SourceManager &SourceMgr) : SourceMgr(SourceMgr) {}

/// Release resources held by this SARIF document.

~SarifDocumentWriter() = default;

/// Create a new run with which any upcoming analysis will be associated.

/// Each run requires specifying the tool that is generating reporting items.

void createRun(const llvm::StringRef ShortToolName,

const llvm::StringRef LongToolName,

const llvm::StringRef ToolVersion = CLANG_VERSION_STRING);

/// If there is a current run, end it.

///

aaron.ballmanUnsubmitted

Done

StringRef HelpURI = "");

- /// Associate the given rule with the current run

+ /// Associate the given rule with the current run.

///

/// \pre

It'd be good to explain why createRule() returns a value here and above.


/// This method collects various book-keeping required to clear and close

/// resources associated with the current run, but may also allocate some

/// for the next run.

///

/// Calling \ref endRun before associating a run through \ref createRun leads

/// to undefined behaviour.

void endRun();

/// Associate the given rule with the current run.

///

/// Returns an integer rule index for the created rule that is unique within

aaron.ballmanUnsubmitted

Done

/// There must be a run associated with the document, failing to do so will

- /// cause undefined behaviour

+ /// cause undefined behaviour.

/// \pre

aaron.ballman:

/// the current run, which can then be used to create a \ref SarifResult

/// to add to the current run. Note that a rule must exist before being

/// referenced by a result.

///

/// \pre

/// There must be a run associated with the document, failing to do so will

/// cause undefined behaviour.

size_t createRule(const SarifRule &Rule);

/// Append a new result to the currently in-flight run.

///

aaron.ballmanUnsubmitted

Done

I think this should be a value type rather than a possibly null pointer type -- this way, the document can always rely on there being valid language options to check, and if the user provides no custom language options, the default LangOptions suffice. Alternatively, it seems reasonable to expect the user to have to pass in a valid language options object in order to create a SARIF document. WDYT?


vaibhav.yAuthorUnsubmitted

Done

I agree with that, will change it to store an owned value.

I think leaving the two constructors as is is fine as long as the default variant will also leave LangOpts in a valid state.


/// \pre

/// There must be a run associated with the document, failing to do so will

/// cause undefined behaviour.

/// \pre

/// \c RuleIdx used to create the result must correspond to a rule known by

/// the SARIF document. It must be the value returned by a previous call

/// to \ref createRule.

void appendResult(const SarifResult &SarifResult);
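The contract documented above (a rule is registered first, and its returned index is what a later result refers to) can be sketched without any clang dependencies. `Rule`, `Result`, and `RunSketch` are hypothetical simplifications; the real `SarifRule` and `SarifResult` types carry more fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for SarifRule and SarifResult.
struct Rule {
  std::string Id;
};
struct Result {
  std::size_t RuleIdx;
  std::string Message;
};

class RunSketch {
public:
  // Registers a rule and returns its index within this run.
  std::size_t createRule(Rule R) {
    Rules.push_back(std::move(R));
    return Rules.size() - 1;
  }

  // Appends a result; the index must come from an earlier createRule() call.
  void appendResult(Result R) {
    assert(R.RuleIdx < Rules.size() && "result references an unknown rule");
    Results.push_back(std::move(R));
  }

  // Looks up the rule id that a stored result refers to.
  const std::string &ruleIdFor(std::size_t ResultIdx) const {
    return Rules[Results[ResultIdx].RuleIdx].Id;
  }

private:
  std::vector<Rule> Rules;
  std::vector<Result> Results;
};
```

Returning the index from `createRule()` gives callers a cheap handle to store in each result, and the assertion catches a result that names a rule the run has never seen.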

/// Return the SARIF document in its current state.

/// Calling this will trigger a copy of the internal state including all

aaron.ballmanUnsubmitted

Done

private:

- /// Langauge options to use for the current SARIF document

+ /// Language options to use for the current SARIF document.

const LangOptions LangOpts;

aaron.ballman:

/// reported diagnostics, resulting in an expensive call.

llvm::json::Object createDocument();

private:

/// Source Manager to use for the current SARIF document.

const SourceManager &SourceMgr;

aaron.ballmanUnsubmitted

Done

/// This could be a document that is freshly created, or has recently

- /// finished writing to a previous run

+ /// finished writing to a previous run.

bool Closed;

aaron.ballman:

aaron.ballmanUnsubmitted

Done

/// finished writing to a previous run

- bool Closed;

+ bool Closed = true;

/// A sequence of SARIF runs.

aaron.ballman:

/// Flag to track the state of this document:

/// A closed document is one on which a new run must be created.

/// This could be a document that is freshly created, or has recently

/// finished writing to a previous run.

bool Closed = true;

/// A sequence of SARIF runs.

/// Each run object describes a single run of an analysis tool and contains

/// the output of that run.

///

/// Reference: <a href="https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317484">run object</a>

llvm::json::Array Runs;

/// The list of rules associated with the most recent active run. These are

/// defined using the diagnostics passed to the SarifDocument. Each rule

/// need not be unique through the result set. E.g. there may be several

/// 'syntax' errors throughout code under analysis, each of which has its

/// own specific diagnostic message (and consequently, RuleId). Rules are

/// also known as "reportingDescriptor" objects in SARIF.

///

/// Reference: <a href="https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317556">rules property</a>

llvm::SmallVector<SarifRule, 32> CurrentRules;

/// The list of artifacts that have been encountered on the most recent active

/// run. An artifact is defined in SARIF as a sequence of bytes addressable

/// by a URI. A common example for clang's case would be files named by

/// filesystem paths.

llvm::StringMap<detail::SarifArtifact> CurrentArtifacts;

};

} // namespace clang

#endif // LLVM_CLANG_BASIC_SARIF_H

clang/lib/Basic/CMakeLists.txt

[58 lines not shown; context: add_clang_library(clangBasic]
ObjCRuntime.cpp
OpenCLOptions.cpp
OpenMPKinds.cpp
OperatorPrecedence.cpp
ProfileList.cpp
NoSanitizeList.cpp
SanitizerSpecialCaseList.cpp
Sanitizers.cpp
+ Sarif.cpp
SourceLocation.cpp
SourceManager.cpp
Stack.cpp
TargetID.cpp
TargetInfo.cpp
Targets.cpp
Targets/AArch64.cpp
Targets/AMDGPU.cpp
[37 lines not shown]

clang/lib/Basic/Sarif.cpp

This file was added.

//===-- clang/Basic/Sarif.cpp - SarifDocumentWriter class definition ------===//

lattnerUnsubmitted

Done

This needs the standard header boilerplate per the coding standards doc.

vaibhav.yAuthorUnsubmitted

Done

Ack, didn't grok the "all source files" part.


// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

///

/// \file

/// This file contains the definition of the SarifDocumentWriter class, and

/// associated builders such as:

/// - \ref SarifArtifact

/// - \ref SarifArtifactLocation

/// - \ref SarifRule

/// - \ref SarifResult

//===----------------------------------------------------------------------===//

#include "clang/Basic/Sarif.h"

#include "clang/Basic/SourceLocation.h"

#include "clang/Basic/SourceManager.h"

#include "llvm/ADT/ArrayRef.h"

#include "llvm/ADT/STLExtras.h"

#include "llvm/ADT/StringMap.h"

#include "llvm/ADT/StringRef.h"

#include "llvm/Support/ConvertUTF.h"

#include "llvm/Support/JSON.h"

#include "llvm/Support/Path.h"

#include <string>

vaibhav.yAuthorUnsubmitted

Done

A lot of these are copied from SarifDiagnostics.cpp

vaibhav.y: A lot of these are copied from [[ https://github.com/llvm/llvm…

lattnerUnsubmitted

Done

Please use static methods instead of anonymous namespaces, per the coding standards doc.


vaibhav.yAuthorUnsubmitted

Done

Ack.


#include <utility>

using namespace clang;

using namespace llvm;

using clang::detail::SarifArtifact;

using clang::detail::SarifArtifactLocation;

static StringRef getFileName(const FileEntry &FE) {

StringRef Filename = FE.tryGetRealPathName();

if (Filename.empty())

Filename = FE.getName();

return Filename;

}

/// \name URI

/// @{

/// \internal

/// \brief

/// Return the RFC3986 encoding of the input character.

///

/// \param C Character to encode to RFC3986.

///

/// \return The RFC3986 representation of \c C.

static std::string percentEncodeURICharacter(char C) {

// RFC 3986 claims alpha, numeric, and this handful of

// characters are not reserved for the path component and

// should be written out directly. Otherwise, percent

// encode the character and write that out instead of the

aaron.ballmanUnsubmitted

Done

/// Return the RFC3986 encoding of the input character.

///

- /// \param C Character to encode to RFC3986

+ /// \param C Character to encode to RFC3986.

///

- /// \return The RFC3986 representation of \c C

+ /// \return The RFC3986 representation of \c C.

static std::string percentEncodeURICharacter(char C) {

aaron.ballman:

// reserved character.

if (llvm::isAlnum(C) ||

StringRef::npos != StringRef("-._~:@!$&'()*+,;=").find(C))

return std::string(&C, 1);

return "%" + llvm::toHex(StringRef(&C, 1));

}
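The per-character rule used by the function above can be tried in isolation with the following standalone sketch. It uses only the standard library; `isUnescapedInPath` and `percentEncode` are hypothetical names, the ASCII-only alphanumeric check is a simplification, and emitting uppercase hex digits is a choice made for this sketch.

```cpp
#include <cassert>
#include <string>

// Returns true for characters this sketch writes out unescaped: ASCII letters,
// digits, and the same punctuation set used in the patch above.
static bool isUnescapedInPath(char C) {
  bool IsAlnum = (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
                 (C >= '0' && C <= '9');
  return IsAlnum ||
         std::string("-._~:@!$&'()*+,;=").find(C) != std::string::npos;
}

// Percent-encodes a single character as "%XX" (uppercase hex) when it is not
// in the unescaped set; otherwise returns the character unchanged.
static std::string percentEncode(char C) {
  if (isUnescapedInPath(C))
    return std::string(1, C);
  static const char Hex[] = "0123456789ABCDEF";
  unsigned char U = static_cast<unsigned char>(C);
  std::string Out = "%";
  Out += Hex[U >> 4];
  Out += Hex[U & 0xF];
  return Out;
}
```

Casting to `unsigned char` before shifting keeps bytes above 0x7F from being sign-extended on platforms where `char` is signed.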

/// \internal

/// \brief Return a URI representing the given file name.

///

/// \param Filename The filename to be represented as URI.

///

/// \return RFC3986 URI representing the input file name.

static std::string fileNameToURI(StringRef Filename) {

SmallString<32> Ret = StringRef("file://");

// Get the root name to see if it has a URI authority.

StringRef Root = sys::path::root_name(Filename);

aaron.ballmanUnsubmitted

Done

/// \brief Return a URI representing the given file name.

///

- /// \param Filename The filename to be represented as URI

+ /// \param Filename The filename to be represented as URI.

///

- /// \return RFC3986 URI representing the input file name

+ /// \return RFC3986 URI representing the input file name.

static std::string fileNameToURI(StringRef Filename) {

aaron.ballman:

if (Root.startswith("//")) {

// There is an authority, so add it to the URI.

Ret += Root.drop_front(2).str();

} else if (!Root.empty()) {

// There is no authority, so end the component and add the root to the URI.

Ret += Twine("/" + Root).str();

}

auto Iter = sys::path::begin(Filename), End = sys::path::end(Filename);

assert(Iter != End && "Expected there to be a non-root path component.");

// Add the rest of the path components, encoding any reserved characters;

// we skip past the first path component, as it was handled above.

std::for_each(++Iter, End, [&Ret](StringRef Component) {

// For reasons unknown to me, we may get a backslash with Windows native

// paths for the initial backslash following the drive component, which

// we need to ignore as a URI path part.

if (Component == "\\")

return;

// Add the separator between the previous path part and the one being

// currently processed.

Ret += "/";

// URI encode the part.

for (char C : Component) {

Ret += percentEncodeURICharacter(C);

}

});

return std::string(Ret);

}

/// @}
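As a rough illustration of the component-wise encoding performed by fileNameToURI, the sketch below handles only absolute POSIX paths and uses the standard library instead of `llvm::sys::path`. The names `encodeComponent` and `posixPathToFileURI` are hypothetical, the set of unescaped characters is deliberately narrower than in the patch, and Windows drive letters and `//host` authorities are out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal encoder for this sketch: keeps ASCII letters, digits and a few
// unreserved marks, and percent-encodes everything else.
static std::string encodeComponent(const std::string &Component) {
  static const char Hex[] = "0123456789ABCDEF";
  std::string Out;
  for (char C : Component) {
    bool Keep = (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
                (C >= '0' && C <= '9') || C == '-' || C == '.' || C == '_' ||
                C == '~';
    if (Keep) {
      Out += C;
    } else {
      unsigned char U = static_cast<unsigned char>(C);
      Out += '%';
      Out += Hex[U >> 4];
      Out += Hex[U & 0xF];
    }
  }
  return Out;
}

// Builds a file:// URI from an absolute POSIX path by encoding each
// '/'-separated component. Empty components (repeated slashes) are skipped.
static std::string posixPathToFileURI(const std::string &Path) {
  assert(!Path.empty() && Path[0] == '/' && "expected an absolute POSIX path");
  std::string URI = "file://";
  std::size_t Pos = 0;
  while (Pos < Path.size()) {
    std::size_t Next = Path.find('/', Pos);
    if (Next == std::string::npos)
      Next = Path.size();
    if (Next > Pos)
      URI += "/" + encodeComponent(Path.substr(Pos, Next - Pos));
    Pos = Next + 1;
  }
  return URI;
}
```

Encoding each component separately keeps the `/` separators literal while still escaping characters such as spaces inside a file or directory name.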

/// \brief Calculate the column position expressed in the number of UTF-8 code

/// points from column start to the source location

///

/// \param Loc The source location whose column needs to be calculated.

/// \param TokenLen Optional hint for when the token is multiple bytes long.

///

/// \return The column number as a UTF-8 aware byte offset from column start to

/// the effective source location.

static unsigned int adjustColumnPos(FullSourceLoc Loc,

unsigned int TokenLen = 0) {

assert(!Loc.isInvalid() && "invalid Loc when adjusting column position");

std::pair<FileID, unsigned> LocInfo = Loc.getDecomposedLoc();

aaron.ballmanUnsubmitted

Done

/// @}

/// \brief Calculate the column position expressed in the number of UTF-8 code

- /// points from column start to the source location

+ /// points from column start to the source location.

///

- /// \param Loc The source location whose column needs to be calculated

- /// \param TokenLen Optional hint for when the token is multiple bytes long

+ /// \param Loc The source location whose column needs to be calculated.

+ /// \param TokenLen Optional hint for when the token is multiple bytes long.

///

/// \return The column number as a UTF-8 aware byte offset from column start to

- /// the effective source location

+ /// the effective source location.

static unsigned int adjustColumnPos(FullSourceLoc Loc,

aaron.ballman:

Optional<MemoryBufferRef> Buf =

Loc.getManager().getBufferOrNone(LocInfo.first);

assert(Buf && "got an invalid buffer for the location's file");

assert(Buf->getBufferSize() >= (LocInfo.second + TokenLen) &&

"token extends past end of buffer?");

// Adjust the offset to be the start of the line, since we'll be counting

// Unicode characters from there until our column offset.

unsigned int Off = LocInfo.second - (Loc.getExpansionColumnNumber() - 1);

unsigned int Ret = 1;

while (Off < (LocInfo.second + TokenLen)) {

Off += getNumBytesForUTF8(Buf->getBuffer()[Off]);

Ret++;

}

return Ret;

}
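The counting loop above can be exercised on a plain string with the following sketch. It assumes the line is already available as a `std::string` and that the caller passes a byte offset from the start of that line; `utf8SequenceLength` and `utf8Column` are hypothetical helpers, not LLVM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence introduced by Lead. Bytes that cannot
// start a multi-byte sequence are treated as one byte long so the scan always
// advances.
static std::size_t utf8SequenceLength(unsigned char Lead) {
  if ((Lead & 0xE0) == 0xC0)
    return 2;
  if ((Lead & 0xF0) == 0xE0)
    return 3;
  if ((Lead & 0xF8) == 0xF0)
    return 4;
  return 1;
}

// Returns the 1-based column of the byte at ByteOffset within Line, counting
// each UTF-8 sequence before it as a single column.
static unsigned utf8Column(const std::string &Line, std::size_t ByteOffset) {
  assert(ByteOffset <= Line.size() && "offset past end of line");
  unsigned Column = 1;
  std::size_t Off = 0;
  while (Off < ByteOffset) {
    Off += utf8SequenceLength(static_cast<unsigned char>(Line[Off]));
    ++Column;
  }
  return Column;
}
```

For pure ASCII text the column equals the byte offset plus one; the two values diverge only after the first multi-byte sequence on the line.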

/// \name SARIF Utilities

/// @{

/// \internal

json::Object createMessage(StringRef Text) {

return json::Object{{"text", Text.str()}};

}

/// \internal

/// \pre CharSourceRange must be a token range

static json::Object createTextRegion(const SourceManager &SM,

const CharSourceRange &R) {

FullSourceLoc FirstTokenLoc{R.getBegin(), SM};

FullSourceLoc LastTokenLoc{R.getEnd(), SM};

json::Object Region{{"startLine", FirstTokenLoc.getExpansionLineNumber()},

{"startColumn", adjustColumnPos(FirstTokenLoc)},

{"endColumn", adjustColumnPos(LastTokenLoc)}};

if (FirstTokenLoc != LastTokenLoc) {

Region["endLine"] = LastTokenLoc.getExpansionLineNumber();

}

return Region;

}

aaron.ballmanUnsubmitted

Not Done

I didn't catch this during the review -- but this is a layering violation that caused link errors on some of the build bots. Lexer can call into Basic, but Basic cannot call into Lexer. So we'll need to find a different way to handle this.


vaibhav.yAuthorUnsubmitted

Not Done

Would moving the code to Support, having it depend on Basic & Lex work?


aaron.ballmanUnsubmitted

Not Done

I don't think so -- Support is supposed to be a pretty low-level interface; it currently only relies on LLVM's Support library. I think the layering is supposed to be: Support -> Basic -> Lex.

As I see it, there are a couple of options as to where this could live. It could live in the Frontend library, as that's where all the diagnostic consumer code in Clang lives. But that library might be a bit "heavy" to pull into other tools (maybe? I don't know). It could also live in AST -- that already links in Basic and Lex. But that feels like a somewhat random place for this to live as this has very little to do with the AST itself.

Another approach, which might be better, is to require the user of this interface to pass in the token length calculation themselves in the places where it's necessary. e.g., json::Object whatever(SourceLocation Start, SourceLocation End, unsigned EndLen) and then you can remove the reliance on the lexer entirely while keeping the interface in Basic. I'm not certain how obnoxious this suggestion is, but I think it's my preferred approach for the moment (but not a strongly held position yet). WDYT of this approach?


vaibhav.yAuthorUnsubmitted

Not Done

I think the approach to injecting the function is better here. I've tried to make the smallest change possible by passing in a function whose interface is almost identical to Lexer::MeasureTokenLength. The intent was to hint at this being the "canonical metric" for token lengths (with an example in the tests for the same).

I tried passing start, end locs but couldn't find a strong use case yet since end would likely always be: Lexer::getLocForEndOfToken(start, 0)


aaron.ballmanUnsubmitted

Not Done

I'm not convinced that the less obtrusive change is a good design in this case. But I also agree that we should not use start/end *locations* either. SourceLocation traditionally points to the *start* of a token, so it would be super easy to get the end location wrong by forgetting to pass the location for the end of the token.

My suggestion was to continue to pass the start of the starting token, the start of the ending token, and the length of the ending token. With the callback approach, you have to call through the callback to eventually call Lexer::MeasureTokenLength(); with the direct approach, you skip needing to call through a callback (which means at least one less function call on every source location operation).


vaibhav.yAuthorUnsubmitted

Not Done

Ah, I think I misunderstood your initial suggestion. json::Object whatever(SourceLocation Start, SourceLocation End, unsigned EndLen) seemed like a function call to me, when it seems the suggested change was to pass in an object? Apologies, will fix that up.


aaron.ballmanUnsubmitted

Not Done

Sorry for the confusion! Just to make sure we're on the same page -- my suggestion was to change the function interfaces like SarifDocumentWriter::createPhysicalLocation() so that they would take an additional unsigned EndLen parameter.

However, now that I dig around a bit, it seems like CharSourceRange is what you'd want to use there -- then you can assert that what you're given is a char range and not a token range. So you won't need the unsigned EndLen parameter after all!


vaibhav.yAuthorUnsubmitted

Not Done

Interesting!

Asking for my understanding: If a CharSourceRange is a valid character range, then the End SLoc points to the last character in the range (even if it is mid token)? (Unlike slocs, where it is the first character of the last token.)


aaron.ballmanUnsubmitted

Not Done

Your understanding is correct.
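The difference between a token range and a character range discussed in this thread can be made concrete with plain buffer offsets. Everything in the sketch below is hypothetical (`TokenRangeSketch`, `CharRangeSketch`, `toCharRange`, `textOf`); in particular, it assumes a half-open character range whose end is one past the last character, which should be checked against the documentation of whichever real range type is used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical offset-based ranges; these are not the clang types.
struct TokenRangeSketch {
  std::size_t Begin;        // Offset of the first token's first character.
  std::size_t LastTokBegin; // Offset of the last token's first character.
};
struct CharRangeSketch {
  std::size_t Begin; // Offset of the first character.
  std::size_t End;   // One past the last character (half-open, by assumption).
};

// Converts a token range to a character range. The caller supplies the length
// of the last token, so no lexer is needed here.
static CharRangeSketch toCharRange(TokenRangeSketch R, std::size_t LastTokLen) {
  return CharRangeSketch{R.Begin, R.LastTokBegin + LastTokLen};
}

// Extracts the text covered by a character range.
static std::string textOf(const std::string &Buffer, CharRangeSketch R) {
  assert(R.Begin <= R.End && R.End <= Buffer.size() && "range out of bounds");
  return Buffer.substr(R.Begin, R.End - R.Begin);
}
```

A consumer that accepts only character ranges never has to measure a token itself, which is what lets such code avoid depending on a lexer.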


static json::Object createLocation(json::Object &&PhysicalLocation,

StringRef Message = "") {

json::Object Ret{{"physicalLocation", std::move(PhysicalLocation)}};

if (!Message.empty())

Ret.insert({"message", createMessage(Message)});

return Ret;

}

static StringRef importanceToStr(ThreadFlowImportance I) {

switch (I) {

case ThreadFlowImportance::Important:

return "important";

case ThreadFlowImportance::Essential:

return "essential";

case ThreadFlowImportance::Unimportant:

return "unimportant";

}

llvm_unreachable("Fully covered switch is not so fully covered");

}

static json::Object

createThreadFlowLocation(json::Object &&Location,

const ThreadFlowImportance &Importance) {

return json::Object{{"location", std::move(Location)},

{"importance", importanceToStr(Importance)}};

}

/// @}

json::Object

SarifDocumentWriter::createPhysicalLocation(const CharSourceRange &R) {

assert(R.isValid() &&

"Cannot create a physicalLocation from invalid SourceRange!");

assert(R.isCharRange() &&

"Cannot create a physicalLocation from a token range!");

FullSourceLoc Start{R.getBegin(), SourceMgr};

const FileEntry *FE = Start.getExpansionLoc().getFileEntry();

assert(FE != nullptr && "Diagnostic does not exist within a valid file!");

aaron.ballmanUnsubmitted

Done

This seems like it'll be a use-after-free because the local std::string will be destroyed before the lifetime of the SarifArtifactLocation ends.


vaibhav.yAuthorUnsubmitted

Done

Will run it through a pass of asan & msan, is the best way to add: -fsanitize=memory -fsanitize=address to the test CMakeLists.txt & run them?

I've changed all strings to std::string, so this one should no longer be a problem but I wonder if there's any others I have missed as well.


aaron.ballmanUnsubmitted

Not Done

> Will run it through a pass of asan & msan, is the best way to add: -fsanitize=memory -fsanitize=address to the test CMakeLists.txt & run them?

Yup! The critical part will be test coverage -- code paths that aren't executed won't get issues reported with them.
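The lifetime concern in this thread, and the fix of switching to owned strings, can be sketched in isolation. `OwnedLocation`, `makeURI`, and `makeLocation` are hypothetical names; the point is only that a record built from a function-local string must own its text rather than hold a non-owning view of it.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a location record that owns its URI. Holding a
// std::string (rather than a non-owning view such as llvm::StringRef or
// std::string_view) keeps the text alive for as long as the record exists.
struct OwnedLocation {
  std::string URI;
};

// Hypothetical helper that builds a URI as a temporary string.
static std::string makeURI(const std::string &Path) {
  return "file://" + Path;
}

static OwnedLocation makeLocation(const std::string &Path) {
  std::string Local = makeURI(Path);
  // Moving into the record transfers ownership; a view of Local would dangle
  // once this function returns.
  return OwnedLocation{std::move(Local)};
}
```

The move avoids a second allocation, so owning the string costs little more than the view would have.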


const std::string &FileURI = fileNameToURI(getFileName(*FE));

auto I = CurrentArtifacts.find(FileURI);

if (I == CurrentArtifacts.end()) {

uint32_t Idx = static_cast<uint32_t>(CurrentArtifacts.size());

aaron.ballmanUnsubmitted

Done

uint32_t Idx = static_cast<uint32_t>(CurrentArtifacts.size());

- const SarifArtifactLocation &location =

+ const SarifArtifactLocation &Location =

SarifArtifactLocation::create(FileURI).setIndex(Idx);

- const SarifArtifact &artifact = SarifArtifact::create(location)

+ const SarifArtifact &Artifact = SarifArtifact::create(Location)

.setRoles({"resultFile"})

.setLength(FE->getSize())

.setMimeType("text/plain");

- auto statusIter = CurrentArtifacts.insert({FileURI, artifact});

+ auto StatusIter = CurrentArtifacts.insert({FileURI, Artifact});

// If inserted, ensure the original iterator points to the newly inserted

Same suggestion applies elsewhere in the patch regarding naming conventions.


const SarifArtifactLocation &Location =

SarifArtifactLocation::create(FileURI).setIndex(Idx);

const SarifArtifact &Artifact = SarifArtifact::create(Location)

.setRoles({"resultFile"})

.setLength(FE->getSize())

aaron.ballmanUnsubmitted

Done

// element, so it can be used downstream

- if (statusIter.second) {

+ if (statusIter.second)

I = statusIter.first;

- }

+ }

assert(I != CurrentArtifacts.end() && "Failed to insert new artifact");

Our usual coding style elides these too. Btw, you can find the coding style document at: https://llvm.org/docs/CodingStandards.html


vaibhav.yAuthorUnsubmitted

Done

Thanks, sorry there's so many of these! I definitely need to not auto-pilot with style.


aaron.ballmanUnsubmitted

Done

No worries! Our style is not... all that typical... so it can be hard to remember.


.setMimeType("text/plain");

auto StatusIter = CurrentArtifacts.insert({FileURI, Artifact});

// If inserted, ensure the original iterator points to the newly inserted

// element, so it can be used downstream.

if (StatusIter.second)

I = StatusIter.first;

}

assert(I != CurrentArtifacts.end() && "Failed to insert new artifact");

const SarifArtifactLocation &Location = I->second.Location;

uint32_t Idx = Location.Index.getValue();

aaron.ballmanUnsubmitted

Done

// If inserted, ensure the original iterator points to the newly inserted

- // element, so it can be used downstream

+ // element, so it can be used downstream.

if (StatusIter.second)

aaron.ballman:

return json::Object{{{"artifactLocation", json::Object{{{"index", Idx}}}},

{"region", createTextRegion(SourceMgr, R)}}};

}
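The find-or-insert pattern used above to give each file a stable artifact index can be reduced to a small standalone sketch. `ArtifactTableSketch` is hypothetical and uses `std::map` in place of `llvm::StringMap`; it stores only the index, not the full artifact record.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified artifact table keyed by URI.
class ArtifactTableSketch {
public:
  // Returns the index for URI, assigning the next free index on first use.
  std::uint32_t indexFor(const std::string &URI) {
    auto It = Indices.find(URI);
    if (It == Indices.end()) {
      std::uint32_t Idx = static_cast<std::uint32_t>(Indices.size());
      It = Indices.insert({URI, Idx}).first;
    }
    return It->second;
  }

  std::size_t size() const { return Indices.size(); }

private:
  std::map<std::string, std::uint32_t> Indices;
};
```

Using the current table size as the next index keeps indices dense and in first-seen order, so repeated diagnostics in the same file all refer to one artifact entry.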

aaron.ballmanUnsubmitted

Done

Should this return a reference rather than a pointer?


vaibhav.yAuthorUnsubmitted

Done

Makes sense to convert to a ref, the pointer returned can never be null anyway


json::Object &SarifDocumentWriter::getCurrentTool() {

assert(!Closed && "SARIF Document is closed. "

"Need to call createRun() before using getcurrentTool!");

// Since Closed = false here, expect there to be at least 1 Run, anything

// else is an invalid state.

assert(!Runs.empty() && "There are no runs associated with the document!");

return *Runs.back().getAsObject()->get("tool")->getAsObject();

aaron.ballmanUnsubmitted

Done

void SarifDocumentWriter::endRun() {

- if (!hasRun()) {

+ if (!hasRun())

return;

- }

// Flush all the rules

Is there a reason why we don't want to assert instead?


vaibhav.yAuthorUnsubmitted

Done

Creating a document requires ending any ongoing runs, and it is valid to create a document without any runs, so createDocument() calls endRun().

I guess having a flag to mark the status of the document, and only calling endRun() if an active run exists would likely be better. (hasRun() seems to have a rather broad responsibility, tracking both the availability & state of the current run)

I'm thinking of adding a Closed flag to the writer (default true), which is unset whenever createRun() is called, and endRun() will set the same flag. That way we only endRun() if there is something to end, like so:

Constructor makes writer with Closed = true
createRun() requires Closed == true, and sets it to false
endRun() requires Closed == false, and sets it to true
createDocument() requires Closed == true, and will call endRun() to ensure that

What do you think?


aaron.ballmanUnsubmitted

Done

I think that makes sense. Adding assertions where appropriate would also help users who misuse the interface catch their issues earlier rather than later.
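The Closed-flag protocol proposed in this thread can be sketched as a tiny state machine. `ClosedFlagSketch` is a hypothetical model that tracks only the flag and a list of tool names. It follows the four steps as proposed, except that createDocument() ends a run only when one is still open; the code later in this diff instead lets endRun() return early on an already closed document.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical writer that models only the Closed flag from the proposal.
class ClosedFlagSketch {
public:
  // Requires a closed document; opens a new run.
  void createRun(const std::string &ToolName) {
    assert(Closed && "previous run was not ended");
    Runs.push_back(ToolName);
    Closed = false;
  }

  // Requires an open run; closes it.
  void endRun() {
    assert(!Closed && "no run to end");
    Closed = true;
  }

  // Ends any run still in flight, then returns the recorded runs.
  std::vector<std::string> createDocument() {
    if (!Closed)
      endRun();
    return Runs;
  }

  bool isClosed() const { return Closed; }

private:
  bool Closed = true; // A freshly constructed writer has no open run.
  std::vector<std::string> Runs;
};
```

With the flag in place, each assertion states one precondition directly, so misuse such as ending a run twice fails at the call site instead of corrupting the document.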


}

void SarifDocumentWriter::reset() {

CurrentRules.clear();

aaron.ballmanUnsubmitted

Done

json::Object &Tool = *getCurrentTool();

- json::Array Rules{};

+ json::Array Rules;

for (const SarifRule &R : CurrentRules) {

aaron.ballman:

aaron.ballmanUnsubmitted

Done

"Need to call createRun() before using getcurrentTool!");

// Since Closed = false here, expect there to be at least 1 Run, anything

- // else is an invalid state

+ // else is an invalid state.

assert(!Runs.empty() && "There are no runs associated with the document!");

aaron.ballman:

CurrentArtifacts.clear();

}

void SarifDocumentWriter::endRun() {

// Exit early if trying to close a closed Document.

if (Closed) {

reset();

aaron.ballmanUnsubmitted

Done

{"fullDescription", R.Description}};

- if (!R.HelpURI.empty()) {

+ if (!R.HelpURI.empty())

theRule["helpUri"] = R.HelpURI;

- }

Rules.emplace_back(std::move(theRule));

Same suggestion applies elsewhere in the patch.


return;

}

// Since Closed = false here, expect there to be at least 1 Run, anything

// else is an invalid state.

aaron.ballmanUnsubmitted

Not Done

void SarifDocumentWriter::endRun() {

- // Exit early if trying to close a closed Document

+ // Exit early if trying to close a closed Document.

if (Closed) {

(At this point, I'll stop commenting on these -- can you go through the patch and make sure that all comments have appropriate terminating punctuation?)


vaibhav.yAuthorUnsubmitted

Done

Ack, sincere apologies again!

aaron.ballmanUnsubmitted

Not Done

No worries! These sorts of nits are really easy to miss; it happens to me too. :-)

assert(!Runs.empty() && "There are no runs associated with the document!");

// Flush all the rules.

json::Object &Tool = getCurrentTool();

json::Array Rules;

for (const SarifRule &R : CurrentRules) {

json::Object Rule{

{"name", R.Name},

{"id", R.Id},

{"fullDescription", json::Object{{"text", R.Description}}}};

if (!R.HelpURI.empty())

Rule["helpUri"] = R.HelpURI;

Rules.emplace_back(std::move(Rule));

}

json::Object &Driver = *Tool.getObject("driver");

Driver["rules"] = std::move(Rules);

// Flush all the artifacts.

json::Object &Run = getCurrentRun();

json::Array *Artifacts = Run.getArray("artifacts");

for (const auto &Pair : CurrentArtifacts) {

const SarifArtifact &A = Pair.getValue();

json::Object Loc{{"uri", A.Location.URI}};

if (A.Location.Index.hasValue()) {

Loc["index"] = static_cast<int64_t>(A.Location.Index.getValue());

}

json::Object Artifact;

Artifact["location"] = std::move(Loc);

if (A.Length.hasValue())

Artifact["length"] = static_cast<int64_t>(A.Length.getValue());

aaron.ballmanUnsubmitted

Done

json::Array SarifDocumentWriter::createThreadFlows(

- const ArrayRef<ThreadFlow> &ThreadFlows) {

+ ArrayRef<ThreadFlow> ThreadFlows) {

json::Object Ret{{"locations", json::Array{}}};

if (!A.Roles.empty())

Artifact["roles"] = json::Array(A.Roles);

aaron.ballmanUnsubmitted

Done

json::Object Ret{{"locations", json::Array{}}};

- json::Array Locs{};

+ json::Array Locs;

for (const auto &ThreadFlow : ThreadFlows) {

if (!A.MimeType.empty())

Artifact["mimeType"] = A.MimeType;

if (A.Offset.hasValue())

Artifact["offset"] = A.Offset;

Artifacts->push_back(json::Value(std::move(Artifact)));

}

// Clear, reset temporaries before next run.

reset();

// Mark the document as closed.

Closed = true;

aaron.ballmanUnsubmitted

Done

json::Object

- SarifDocumentWriter::createCodeFlow(const ArrayRef<ThreadFlow> &ThreadFlows) {

+ SarifDocumentWriter::createCodeFlow(ArrayRef<ThreadFlow> ThreadFlows) {

return json::Object{{"threadFlows", createThreadFlows(ThreadFlows)}};

}

json::Array

SarifDocumentWriter::createThreadFlows(ArrayRef<ThreadFlow> ThreadFlows) {

json::Object Ret{{"locations", json::Array{}}};

json::Array Locs;

for (const auto &ThreadFlow : ThreadFlows) {

aaron.ballmanUnsubmitted

Not Done

Is there a reason we don't want to assert that the caller has already ended a run before they created a new one?

vaibhav.yAuthorUnsubmitted

Not Done

Calling createDocument() also calls endRun() (so as to provide a "complete" view of the document under construction).

So endRun() amounts to a no-op when there is no run, because it is valid for createDocument() to return a document with no runs associated with it. What do you think?
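The behavior under discussion can be sketched in isolation. This is a minimal stand-in under stated assumptions, not the patch's code: `MiniWriter`, its string-based runs, and `PendingRules` are hypothetical names, and plain `std::string` values replace the `llvm::json` objects. It only illustrates why `endRun()` is safe to call when no run is open, which is what lets `createDocument()` call it unconditionally.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the writer; each run is a plain string here.
class MiniWriter {
  std::vector<std::string> Runs;
  std::vector<std::string> PendingRules;
  bool Closed = true; // No run is open until createRun() is called.

public:
  void createRun(const std::string &ToolName) {
    endRun(); // Close any run that is still in progress.
    Closed = false;
    Runs.push_back(ToolName + ":");
  }

  void createRule(const std::string &Id) { PendingRules.push_back(Id); }

  void endRun() {
    // No open run: drop temporaries and return, so repeated calls are no-ops.
    if (Closed) {
      PendingRules.clear();
      return;
    }
    // Flush the pending rules into the current run, then mark it closed.
    for (const std::string &R : PendingRules)
      Runs.back() += " " + R;
    PendingRules.clear();
    Closed = true;
  }

  std::vector<std::string> createDocument() {
    endRun(); // Safe even if there is no run or it was already ended.
    return Runs;
  }
};
```

With this shape, calling `createDocument()` on a fresh writer yields an empty list of runs, and calling `endRun()` twice in a row changes nothing the second time.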

json::Object PLoc = createPhysicalLocation(ThreadFlow.Range);

json::Object Loc = createLocation(std::move(PLoc), ThreadFlow.Message);

Locs.emplace_back(

createThreadFlowLocation(std::move(Loc), ThreadFlow.Importance));

}

Ret["locations"] = std::move(Locs);

return json::Array{std::move(Ret)};

}

json::Object

SarifDocumentWriter::createCodeFlow(ArrayRef<ThreadFlow> ThreadFlows) {

return json::Object{{"threadFlows", createThreadFlows(ThreadFlows)}};

}

void SarifDocumentWriter::createRun(StringRef ShortToolName,

aaron.ballmanUnsubmitted

Done

Runs.emplace_back(std::move(currentRun));

}

- bool SarifDocumentWriter::hasRun() const { return Runs.size() != 0; }

+ bool SarifDocumentWriter::hasRun() const { return !Runs.empty(); }

json::Object *SarifDocumentWriter::currentRun() {

StringRef LongToolName,

StringRef ToolVersion) {

aaron.ballmanUnsubmitted

Done

Should this return a reference as well?

vaibhav.yAuthorUnsubmitted

Done

That is reasonable. Will have currentTool() and currentRun() return references.
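As a rough illustration of the accessor shape agreed on here, the sketch below returns a reference and checks its preconditions with assertions rather than handing back a possibly-null pointer. The names `Run`, `RunStore`, `open()`, and `current()` are hypothetical and exist only for this example; the patch itself works on `llvm::json::Object`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record standing in for a JSON run object.
struct Run {
  std::string ToolName;
};

class RunStore {
  std::vector<Run> Runs;
  bool Closed = true;

public:
  void open(std::string ToolName) {
    Runs.push_back(Run{std::move(ToolName)});
    Closed = false;
  }

  // Returning a reference documents that a current run must exist.
  // In builds with assertions enabled, misuse fails here instead of
  // surfacing later as a null dereference at the call site.
  Run &current() {
    assert(!Closed && "Open a run before asking for the current one.");
    assert(!Runs.empty() && "There are no runs stored.");
    return Runs.back();
  }
};
```

Callers can then write `Store.current().ToolName` directly, without a null check at every use.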

// Clear resources associated with a previous run.

endRun();

// Signify a new run has begun.

Closed = false;

json::Object Tool{

{"driver",

json::Object{{"name", ShortToolName},

{"fullName", LongToolName},

{"language", "en-US"},

{"version", ToolVersion},

{"informationUri",

"https://clang.llvm.org/docs/UsersManual.html"}}}};

json::Object TheRun{{"tool", std::move(Tool)},

{"results", {}},

{"artifacts", {}},

{"columnKind", "unicodeCodePoints"}};

Runs.emplace_back(std::move(TheRun));

}

json::Object &SarifDocumentWriter::getCurrentRun() {

assert(!Closed &&

"SARIF Document is closed. "

"Can only getCurrentRun() if document is opened via createRun(), "

"create a run first");

// Since Closed = false here, expect there to be at least 1 Run, anything

// else is an invalid state.

assert(!Runs.empty() && "There are no runs associated with the document!");

return *Runs.back().getAsObject();

}

aaron.ballmanUnsubmitted

Done

{"ruleId", CurrentRules[RuleIdx].RuleId}};

- if (Result.Locations.size() != 0) {

+ if (!Result.Locations.empty()) {

json::Array Locs{};

aaron.ballmanUnsubmitted

Done

if (Result.Locations.size() != 0) {

- json::Array Locs{};

+ json::Array Locs;

for (auto &Range : Result.Locations) {

size_t SarifDocumentWriter::createRule(const SarifRule &Rule) {

size_t Ret = CurrentRules.size();

CurrentRules.emplace_back(Rule);

return Ret;

}

aaron.ballmanUnsubmitted

Done

Ret["locations"] = std::move(Locs);

}

- if (Result.ThreadFlows.size() != 0) {

+ if (!Result.ThreadFlows.empty()) {

Ret["codeFlows"] = json::Array{createCodeFlow(Result.ThreadFlows)};

void SarifDocumentWriter::appendResult(const SarifResult &Result) {

size_t RuleIdx = Result.RuleIdx;

assert(RuleIdx < CurrentRules.size() &&

"Trying to reference a rule that doesn't exist");

json::Object Ret{{"message", createMessage(Result.DiagnosticMessage)},

{"ruleIndex", static_cast<int64_t>(RuleIdx)},

{"ruleId", CurrentRules[RuleIdx].Id}};

if (!Result.Locations.empty()) {

json::Array Locs;

for (auto &Range : Result.Locations) {

Locs.emplace_back(createLocation(createPhysicalLocation(Range)));

}

Ret["locations"] = std::move(Locs);

}

if (!Result.ThreadFlows.empty())

Ret["codeFlows"] = json::Array{createCodeFlow(Result.ThreadFlows)};

json::Object &Run = getCurrentRun();

json::Array *Results = Run.getArray("results");

aaron.ballmanUnsubmitted

Done

{"version", SchemaVersion},

};

- if (Runs.size() > 0) {

+ if (!Runs.empty())

doc["runs"] = json::Array(Runs);

- }

return doc;

Results->emplace_back(std::move(Ret));

}

json::Object SarifDocumentWriter::createDocument() {

// Flush all temporaries to their destinations if needed.

endRun();

json::Object Doc{

{"$schema", SchemaURI},

{"version", SchemaVersion},

};

if (!Runs.empty())

Doc["runs"] = json::Array(Runs);

return Doc;

}

clang/unittests/Basic/CMakeLists.txt

  set(LLVM_LINK_COMPONENTS
    Support
    )

  add_clang_unittest(BasicTests
    CharInfoTest.cpp
    DarwinSDKInfoTest.cpp
    DiagnosticTest.cpp
    FileEntryTest.cpp
    FileManagerTest.cpp
    LineOffsetMappingTest.cpp
    SanitizersTest.cpp
+   SarifTest.cpp
    SourceManagerTest.cpp
    )

  clang_target_link_libraries(BasicTests
    PRIVATE
    clangAST
    clangBasic
    clangLex
    )

  target_link_libraries(BasicTests
    PRIVATE
    LLVMTestingSupport
    )

clang/unittests/Basic/SarifTest.cpp

This file was added.

				//===- unittests/Basic/SarifTest.cpp - Test writing SARIF documents -------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "clang/Basic/Sarif.h"
				#include "clang/Basic/DiagnosticIDs.h"
				#include "clang/Basic/DiagnosticOptions.h"
				#include "clang/Basic/FileManager.h"
				#include "clang/Basic/FileSystemOptions.h"
				#include "clang/Basic/LangOptions.h"
				#include "clang/Basic/SourceLocation.h"
				#include "clang/Basic/SourceManager.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Support/FormatVariadic.h"
				#include "llvm/Support/JSON.h"
				#include "llvm/Support/MemoryBuffer.h"
				#include "llvm/Support/VirtualFileSystem.h"
				#include "llvm/Support/raw_ostream.h"
				#include "gmock/gmock-matchers.h"
				#include "gtest/gtest-death-test.h"
				#include "gtest/gtest-matchers.h"
				#include "gtest/gtest.h"

				#include <algorithm>

				using namespace clang;

				namespace {

				using LineCol = std::pair<unsigned int, unsigned int>;

				static std::string serializeSarifDocument(llvm::json::Object &&Doc) {
				std::string Output;
				llvm::json::Value value(std::move(Doc));
				llvm::raw_string_ostream OS{Output};
				OS << llvm::formatv("{0}", value);
				OS.flush();
				return Output;
				}

				class SarifDocumentWriterTest : public ::testing::Test {
				protected:
				SarifDocumentWriterTest()
				: InMemoryFileSystem(new llvm::vfs::InMemoryFileSystem),
				FileMgr(FileSystemOptions(), InMemoryFileSystem),
				DiagID(new DiagnosticIDs()), DiagOpts(new DiagnosticOptions()),
				Diags(DiagID, DiagOpts.get(), new IgnoringDiagConsumer()),
				SourceMgr(Diags, FileMgr) {}

				IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem> InMemoryFileSystem;
				FileManager FileMgr;
				IntrusiveRefCntPtr<DiagnosticIDs> DiagID;
				IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts;
				DiagnosticsEngine Diags;
				SourceManager SourceMgr;
				LangOptions LangOpts;

				FileID registerSource(llvm::StringRef Name, const char *SourceText,
				bool IsMainFile = false) {
				std::unique_ptr<llvm::MemoryBuffer> SourceBuf =
				llvm::MemoryBuffer::getMemBuffer(SourceText);
				const FileEntry *SourceFile =
				FileMgr.getVirtualFile(Name, SourceBuf->getBufferSize(), 0);
				SourceMgr.overrideFileContents(SourceFile, std::move(SourceBuf));
				FileID FID = SourceMgr.getOrCreateFileID(SourceFile, SrcMgr::C_User);
				if (IsMainFile)
				SourceMgr.setMainFileID(FID);
				return FID;
				}

				CharSourceRange getFakeCharSourceRange(FileID FID, LineCol Begin,
				LineCol End) {
				auto BeginLoc = SourceMgr.translateLineCol(FID, Begin.first, Begin.second);
				auto EndLoc = SourceMgr.translateLineCol(FID, End.first, End.second);
				return CharSourceRange{SourceRange{BeginLoc, EndLoc}, /* ITR = */ false};
				}
				};

				TEST_F(SarifDocumentWriterTest, canCreateEmptyDocument) {
				// GIVEN:
				SarifDocumentWriter Writer{SourceMgr};

				// WHEN:
				const llvm::json::Object &EmptyDoc = Writer.createDocument();
				std::vector<StringRef> Keys(EmptyDoc.size());
				std::transform(EmptyDoc.begin(), EmptyDoc.end(), Keys.begin(),
				[](auto item) { return item.getFirst(); });

				// THEN:
				ASSERT_THAT(Keys, testing::UnorderedElementsAre("$schema", "version"));
				}

				// Test that a newly inserted run will associate correct tool names
				TEST_F(SarifDocumentWriterTest, canCreateDocumentWithOneRun) {
				// GIVEN:
				SarifDocumentWriter Writer{SourceMgr};
				const char *ShortName = "sariftest";
				const char *LongName = "sarif writer test";

				// WHEN:
				Writer.createRun(ShortName, LongName);
				Writer.endRun();
				const llvm::json::Object &Doc = Writer.createDocument();
				const llvm::json::Array *Runs = Doc.getArray("runs");

				// THEN:
				// A run was created
				ASSERT_THAT(Runs, testing::NotNull());

				// It is the only run
				ASSERT_EQ(Runs->size(), 1UL);

				// The tool associated with the run was the tool
				const llvm::json::Object *driver =
				Runs->begin()->getAsObject()->getObject("tool")->getObject("driver");
				ASSERT_THAT(driver, testing::NotNull());

				ASSERT_TRUE(driver->getString("name").hasValue());
				ASSERT_TRUE(driver->getString("fullName").hasValue());
				ASSERT_TRUE(driver->getString("language").hasValue());

				EXPECT_EQ(driver->getString("name").getValue(), ShortName);
				EXPECT_EQ(driver->getString("fullName").getValue(), LongName);
				EXPECT_EQ(driver->getString("language").getValue(), "en-US");
				}

				TEST_F(SarifDocumentWriterTest, addingResultsWillCrashIfThereIsNoRun) {
				#if defined(NDEBUG) || !GTEST_HAS_DEATH_TEST
				GTEST_SKIP() << "This death test is only available for debug builds.";
				#endif
				// GIVEN:
				SarifDocumentWriter Writer{SourceMgr};

				// WHEN:
				// A SarifDocumentWriter::createRun(...) was not called prior to
				// SarifDocumentWriter::appendResult(...)
				// But a rule exists
				auto RuleIdx = Writer.createRule(SarifRule::create());
				const SarifResult &EmptyResult = SarifResult::create(RuleIdx);

				// THEN:
				auto Matcher = ::testing::AnyOf(
				::testing::HasSubstr("create a run first"),
				::testing::HasSubstr("no runs associated with the document"));
				ASSERT_DEATH(Writer.appendResult(EmptyResult), Matcher);
				}

				// Test adding rule and result shows up in the final document
				TEST_F(SarifDocumentWriterTest, addingResultWithValidRuleAndRunIsOk) {
				// GIVEN:
				SarifDocumentWriter Writer{SourceMgr};
				const SarifRule &Rule =
				SarifRule::create()
				.setRuleId("clang.unittest")
				.setDescription("Example rule created during unit tests")
				.setName("clang unit test");

				// WHEN:
				Writer.createRun("sarif test", "sarif test runner");
				unsigned RuleIdx = Writer.createRule(Rule);
				const SarifResult &result = SarifResult::create(RuleIdx);

				Writer.appendResult(result);
				const llvm::json::Object &Doc = Writer.createDocument();

				// THEN:
				// A document with a valid schema and version exists
				ASSERT_THAT(Doc.get("$schema"), ::testing::NotNull());
				ASSERT_THAT(Doc.get("version"), ::testing::NotNull());
				const llvm::json::Array *Runs = Doc.getArray("runs");

				// A run exists on this document
				ASSERT_THAT(Runs, ::testing::NotNull());
				ASSERT_EQ(Runs->size(), 1UL);
				const llvm::json::Object *TheRun = Runs->back().getAsObject();

				// The run has slots for tools, results, rules and artifacts
				ASSERT_THAT(TheRun->get("tool"), ::testing::NotNull());
				ASSERT_THAT(TheRun->get("results"), ::testing::NotNull());
				ASSERT_THAT(TheRun->get("artifacts"), ::testing::NotNull());
				const llvm::json::Object *Driver =
				TheRun->getObject("tool")->getObject("driver");
				const llvm::json::Array *Results = TheRun->getArray("results");
				const llvm::json::Array *Artifacts = TheRun->getArray("artifacts");

				// The tool is as expected
				ASSERT_TRUE(Driver->getString("name").hasValue());
				ASSERT_TRUE(Driver->getString("fullName").hasValue());

				EXPECT_EQ(Driver->getString("name").getValue(), "sarif test");
				EXPECT_EQ(Driver->getString("fullName").getValue(), "sarif test runner");

				// The results are as expected
				EXPECT_EQ(Results->size(), 1UL);

				// The artifacts are as expected
				EXPECT_TRUE(Artifacts->empty());
				}

				TEST_F(SarifDocumentWriterTest, checkSerializingResults) {
				// GIVEN:
				const std::string ExpectedOutput =
				R"({"$schema":"https://docs.oasis-open.org/sarif/sarif/v2.1.0/cos02/schemas/sarif-schema-2.1.0.json","runs":[{"artifacts":[],"columnKind":"unicodeCodePoints","results":[{"message":{"text":""},"ruleId":"clang.unittest","ruleIndex":0}],"tool":{"driver":{"fullName":"sarif test runner","informationUri":"https://clang.llvm.org/docs/UsersManual.html","language":"en-US","name":"sarif test","rules":[{"fullDescription":{"text":"Example rule created during unit tests"},"id":"clang.unittest","name":"clang unit test"}],"version":"1.0.0"}}}],"version":"2.1.0"})";

				SarifDocumentWriter Writer{SourceMgr};
				const SarifRule &Rule =
				SarifRule::create()
				.setRuleId("clang.unittest")
				.setDescription("Example rule created during unit tests")
				.setName("clang unit test");

				// WHEN: A run contains a result
				Writer.createRun("sarif test", "sarif test runner", "1.0.0");
				unsigned ruleIdx = Writer.createRule(Rule);
				const SarifResult &Result = SarifResult::create(ruleIdx);
				Writer.appendResult(Result);
				std::string Output = serializeSarifDocument(Writer.createDocument());

				// THEN:
				ASSERT_THAT(Output, ::testing::StrEq(ExpectedOutput));
				}

				// Check that serializing artifacts from results produces valid SARIF
				TEST_F(SarifDocumentWriterTest, checkSerializingArtifacts) {
				// GIVEN:
				const std::string ExpectedOutput =
				R"({"$schema":"https://docs.oasis-open.org/sarif/sarif/v2.1.0/cos02/schemas/sarif-schema-2.1.0.json","runs":[{"artifacts":[{"length":40,"location":{"index":0,"uri":"file:///main.cpp"},"mimeType":"text/plain","roles":["resultFile"]}],"columnKind":"unicodeCodePoints","results":[{"locations":[{"physicalLocation":{"artifactLocation":{"index":0},"region":{"endColumn":14,"startColumn":14,"startLine":3}}}],"message":{"text":"expected ';' after top level declarator"},"ruleId":"clang.unittest","ruleIndex":0}],"tool":{"driver":{"fullName":"sarif test runner","informationUri":"https://clang.llvm.org/docs/UsersManual.html","language":"en-US","name":"sarif test","rules":[{"fullDescription":{"text":"Example rule created during unit tests"},"id":"clang.unittest","name":"clang unit test"}],"version":"1.0.0"}}}],"version":"2.1.0"})";

				SarifDocumentWriter Writer{SourceMgr};
				const SarifRule &Rule =
				SarifRule::create()
				.setRuleId("clang.unittest")
				.setDescription("Example rule created during unit tests")
				.setName("clang unit test");

				// WHEN: A result is added with valid source locations for its diagnostics
				Writer.createRun("sarif test", "sarif test runner", "1.0.0");
				unsigned RuleIdx = Writer.createRule(Rule);

				llvm::SmallVector<CharSourceRange, 1> DiagLocs;
				const char *SourceText = "int foo = 0;\n"
				"int bar = 1;\n"
				"float x = 0.0\n";

				FileID MainFileID =
				registerSource("/main.cpp", SourceText, /* IsMainFile = */ true);
				CharSourceRange SourceCSR =
				getFakeCharSourceRange(MainFileID, {3, 14}, {3, 14});

				DiagLocs.push_back(SourceCSR);

				const SarifResult &Result =
				SarifResult::create(RuleIdx).setLocations(DiagLocs).setDiagnosticMessage(
				"expected ';' after top level declarator");
				Writer.appendResult(Result);
				std::string Output = serializeSarifDocument(Writer.createDocument());

				// THEN: Assert that the serialized SARIF is as expected
				ASSERT_THAT(Output, ::testing::StrEq(ExpectedOutput));
				}

				TEST_F(SarifDocumentWriterTest, checkSerializingCodeflows) {
				// GIVEN:
				const std::string ExpectedOutput =
				R"({"$schema":"https://docs.oasis-open.org/sarif/sarif/v2.1.0/cos02/schemas/sarif-schema-2.1.0.json","runs":[{"artifacts":[{"length":27,"location":{"index":1,"uri":"file:///test-header-1.h"},"mimeType":"text/plain","roles":["resultFile"]},{"length":30,"location":{"index":2,"uri":"file:///test-header-2.h"},"mimeType":"text/plain","roles":["resultFile"]},{"length":28,"location":{"index":3,"uri":"file:///test-header-3.h"},"mimeType":"text/plain","roles":["resultFile"]},{"length":41,"location":{"index":0,"uri":"file:///main.cpp"},"mimeType":"text/plain","roles":["resultFile"]}],"columnKind":"unicodeCodePoints","results":[{"codeFlows":[{"threadFlows":[{"locations":[{"importance":"essential","location":{"message":{"text":"Message #1"},"physicalLocation":{"artifactLocation":{"index":1},"region":{"endColumn":8,"endLine":2,"startColumn":1,"startLine":1}}}},{"importance":"important","location":{"message":{"text":"Message #2"},"physicalLocation":{"artifactLocation":{"index":2},"region":{"endColumn":8,"endLine":2,"startColumn":1,"startLine":1}}}},{"importance":"unimportant","location":{"message":{"text":"Message #3"},"physicalLocation":{"artifactLocation":{"index":3},"region":{"endColumn":8,"endLine":2,"startColumn":1,"startLine":1}}}}]}]}],"locations":[{"physicalLocation":{"artifactLocation":{"index":0},"region":{"endColumn":8,"endLine":2,"startColumn":5,"startLine":2}}}],"message":{"text":"Redefinition of 'foo'"},"ruleId":"clang.unittest","ruleIndex":0}],"tool":{"driver":{"fullName":"sarif test runner","informationUri":"https://clang.llvm.org/docs/UsersManual.html","language":"en-US","name":"sarif test","rules":[{"fullDescription":{"text":"Example rule created during unit tests"},"id":"clang.unittest","name":"clang unit test"}],"version":"1.0.0"}}}],"version":"2.1.0"})";

				const char *SourceText = "int foo = 0;\n"
				"int foo = 1;\n"
				"float x = 0.0;\n";
				FileID MainFileID =
				registerSource("/main.cpp", SourceText, /* IsMainFile = */ true);
				CharSourceRange DiagLoc{getFakeCharSourceRange(MainFileID, {2, 5}, {2, 8})};

				SarifDocumentWriter Writer{SourceMgr};
				const SarifRule &Rule =
				SarifRule::create()
				.setRuleId("clang.unittest")
				.setDescription("Example rule created during unit tests")
				.setName("clang unit test");

				constexpr unsigned int NUM_CASES = 3;
				llvm::SmallVector<ThreadFlow, NUM_CASES> Threadflows;
				const char *HeaderTexts[NUM_CASES]{("#pragma once\n"
				"#include <foo>"),
				("#ifndef FOO\n"
				"#define FOO\n"
				"#endif"),
				("#ifdef FOO\n"
				"#undef FOO\n"
				"#endif")};
				const char *HeaderNames[NUM_CASES]{"/test-header-1.h", "/test-header-2.h",
				"/test-header-3.h"};
				ThreadFlowImportance Importances[NUM_CASES]{
				ThreadFlowImportance::Essential, ThreadFlowImportance::Important,
				ThreadFlowImportance::Unimportant};
				for (size_t Idx = 0; Idx != NUM_CASES; ++Idx) {
				FileID FID = registerSource(HeaderNames[Idx], HeaderTexts[Idx]);
				CharSourceRange &&CSR = getFakeCharSourceRange(FID, {1, 1}, {2, 8});
				std::string Message = llvm::formatv("Message #{0}", Idx + 1);
				ThreadFlow Item = ThreadFlow::create()
				.setRange(CSR)
				.setImportance(Importances[Idx])
				.setMessage(Message);
				Threadflows.push_back(Item);
				}

				// WHEN: A result containing code flows and diagnostic locations is added
				Writer.createRun("sarif test", "sarif test runner", "1.0.0");
				unsigned RuleIdx = Writer.createRule(Rule);
				const SarifResult &Result = SarifResult::create(RuleIdx)
				.setLocations({DiagLoc})
				.setDiagnosticMessage("Redefinition of 'foo'")
				.setThreadFlows(Threadflows);
				Writer.appendResult(Result);
				std::string Output = serializeSarifDocument(Writer.createDocument());

				// THEN: Assert that the serialized SARIF is as expected
				ASSERT_THAT(Output, ::testing::StrEq(ExpectedOutput));
				}

				} // namespace


Diff 445466

clang/include/clang/Basic/Sarif.h

clang/lib/Basic/CMakeLists.txt

clang/lib/Basic/Sarif.cpp

clang/unittests/Basic/CMakeLists.txt

clang/unittests/Basic/SarifTest.cpp
