Page MenuHomePhabricator

[WIP][clang] adds a way to provide user-oriented reasons
Needs ReviewPublic

Authored by cjdb on Nov 29 2022, 12:21 PM.

Details

Reviewers
aaron.ballman
erichkeane
shafik
njames93
vaibhav.y
Group Reviewers
Restricted Project
Summary

Part of the [revised diagnostic model][1] is to provide users with
diagnostics that explain what's going wrong from their perspective,
as opposed to from the compiler's perspective. This may be achieved
through a rewording of the reason, or by provididng more info, or both.

This commit also changes the diagnostic for an explicit template
parameter gone wrong to demonstrate how it works.

NOTE: this is a work-in-progress. Most of the work has been fleshed out, but the note needs to be provided with the kind of template argument it is given. I'm looking into doing this, but it seems to require a lot of tedious manual changes, and I'd prefer to get input on the changes that are currently in the diff before making those extra changes (which will distract from the current stuff, and may ultimately be unnecessary if the current direction is undesirable).

[1]: https://discourse.llvm.org/t/rfc-improving-clang-s-diagnostics/62584

Diff Detail

Event Timeline

cjdb created this revision.Nov 29 2022, 12:21 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
cjdb requested review of this revision.Nov 29 2022, 12:21 PM
Herald added projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project. · View Herald TranscriptNov 29 2022, 12:21 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript

A few comments. I don't mind the approach, though I find myself wondering if the "FormatDiagnostic" call should stay the same, and choose the legacy-reason only when a sarif reason doesn't exist? Or for some level of command line control?

The clang-side interface to this seems a touch clunky, and I fear it'll make continuing its use/generalizing its use less likely.

clang/include/clang/Basic/DiagnosticIDs.h
165

"Reason" is a part of the diagnostic itself, pre-rendered, right? Since these strings are literals, can this be an llvm::StringLiteral instead? (the constexpr-stringref type)?

Else, generating these is going to cost us at startup time.

clang/include/clang/Basic/DiagnosticSemaKinds.td
4520

Already kidna hate this format here. Is there any way we could make this be something more like:

DiagReason<Legacy<"whatever">, SARIF<"whatever">> ?

clang/lib/Format/TokenAnalyzer.cpp
47

This pair of calls has happened ~20 times by the time I've scrolled here. Can we do a single call here of some sort instead?

clang/test/Frontend/sarif-reason.cpp
16

This is definitely a case where I'd love the diagnostics formatted/arranged differently here. If you can use the #BOOKMARK style to make sure errors and notes are together, it would better illustrate what you're trying to do here.

cjdb added a comment.Nov 29 2022, 1:24 PM

I don't understand why test_demangle.pass.cpp was considered too big to upload. Here's the diff:

diff --git a/libcxxabi/test/test_demangle.pass.cpp b/libcxxabi/test/test_demangle.pass.cpp
index bbbbdce8e6c3..9da6fb7d2ad9 100644
--- a/libcxxabi/test/test_demangle.pass.cpp
+++ b/libcxxabi/test/test_demangle.pass.cpp
@@ -11451,7 +11451,9 @@ const char* cases[][2] =
     {"_ZN5clang16DiagnosticClientD1Ev", "clang::DiagnosticClient::~DiagnosticClient()"},
     {"_ZN5clang16DiagnosticClientD2Ev", "clang::DiagnosticClient::~DiagnosticClient()"},
     {"_ZN5clang16DiagnosticClient16HandleDiagnosticENS_10Diagnostic5LevelERKNS_14DiagnosticInfoE", "clang::DiagnosticClient::HandleDiagnostic(clang::Diagnostic::Level, clang::DiagnosticInfo const&)"},
-    {"_ZNK5clang14DiagnosticInfo16FormatDiagnosticERN4llvm15SmallVectorImplIcEE", "clang::DiagnosticInfo::FormatDiagnostic(llvm::SmallVectorImpl<char>&) const"},
+    {"_ZNK5clang14DiagnosticInfo13FormatSummaryERN4llvm15SmallVectorImplIcEE", "clang::DiagnosticInfo::FormatSummary(llvm::SmallVectorImpl<char>&) const"},
+    {"_ZNK5clang14DiagnosticInfo18FormatLegacyReasonERN4llvm15SmallVectorImplIcEE", "clang::DiagnosticInfo::FormatLegacyReason(llvm::SmallVectorImpl<char>&) const"},
+    {"_ZNK5clang14DiagnosticInfo17FormatSARIFReasonERN4llvm15SmallVectorImplIcEE", "clang::DiagnosticInfo::FormatSARIFReason(llvm::SmallVectorImpl<char>&) const"},
     {"_ZNK5clang14DiagnosticInfo16FormatDiagnosticEPKcS2_RN4llvm15SmallVectorImplIcEE", "clang::DiagnosticInfo::FormatDiagnostic(char const*, char const*, llvm::SmallVectorImpl<char>&) const"},
     {"_Z10ScanFormatPKcS0_c", "ScanFormat(char const*, char const*, char)"},
     {"_Z20HandlePluralModifierRKN5clang14DiagnosticInfoEjPKcjRN4llvm15SmallVectorImplIcEE", "HandlePluralModifier(clang::DiagnosticInfo const&, unsigned int, char const*, unsigned int, llvm::SmallVectorImpl<char>&)"},
cjdb added a comment.Nov 29 2022, 1:34 PM

The clang-side interface to this seems a touch clunky, and I fear it'll make continuing its use/generalizing its use less likely.

Is this w.r.t. the FormatDiagnostic being split up, or is it something else? If it's the former: I'll be changing that to FormatLegacyDiagnostic, which almost gets us back to where we started.

clang/include/clang/Basic/DiagnosticSemaKinds.td
4520

I like this suggestion.

cjdb added a comment.Nov 29 2022, 1:47 PM

though I find myself wondering if the "FormatDiagnostic" call should stay the same, and choose the legacy-reason only when a sarif reason doesn't exist? Or for some level of command line control?

Hmm... isn't this the point of the diagnostic consumers?

rymiel added a subscriber: rymiel.Nov 30 2022, 12:31 AM
rymiel added inline comments.
clang/include/clang/Frontend/DiagnosticRenderer.h
130 ↗(On Diff #478675)

Which parameter is this doxygen comment referring to?

clang/tools/clang-format/ClangFormat.cpp
397

I don't think this indent change was intended?

The clang-side interface to this seems a touch clunky, and I fear it'll make continuing its use/generalizing its use less likely.

Is this w.r.t. the FormatDiagnostic being split up, or is it something else? If it's the former: I'll be changing that to FormatLegacyDiagnostic, which almost gets us back to where we started.

Urgh, I was a bit afraid you'd ask that :D It is more of a feeling I guess, which is perhaps biased by this patch being particularly in the diagnostics-handling code. However, I suspect that over time, you're going to want to start switching all these uses of FormatLegacyReason over to FormatSarifReason, and I would want the 'easy way' to be the 'right' way in either case. Having it be a 2-liner, or over-specifying what is otherwise the 'default' behavior is a bit disconcerting.

though I find myself wondering if the "FormatDiagnostic" call should stay the same, and choose the legacy-reason only when a sarif reason doesn't exist? Or for some level of command line control?

Hmm... isn't this the point of the diagnostic consumers?

I don't have a great feeling of that? I haven't done much thinking into the diagnostics architecture over the years. That said, the use of when we'd choose one vs the other isn't completely clear to me yet.

clang/lib/Basic/Diagnostic.cpp
790–791

Comment no longer matches.

Maybe void FormatDiagnostic(SmallVectorImpl<char> &OutStr, DiagnosticMode mode)instead of void FormatDiagnostic(SmallVectorImpl<char> &OutStr)?
To make the transition easer and future proof.

Maybe void FormatDiagnostic(SmallVectorImpl<char> &OutStr, DiagnosticMode mode)instead of void FormatDiagnostic(SmallVectorImpl<char> &OutStr)?
To make the transition easer and future proof.

I like this idea. Perhaps with DiagnosticMode being a 3-way enum:

enum struct DiagnosticMode {
  Legacy,
  Sarif,  
  Default = Legacy
}

I like the idea in particular, since it makes a command line flag to modify "Default" to be whichever the user prefers pretty trivial.

cjdb marked 3 inline comments as done.Dec 1 2022, 11:10 AM

Maybe void FormatDiagnostic(SmallVectorImpl<char> &OutStr, DiagnosticMode mode)instead of void FormatDiagnostic(SmallVectorImpl<char> &OutStr)?
To make the transition easer and future proof.

I like this idea. Perhaps with DiagnosticMode being a 3-way enum:

enum struct DiagnosticMode {
  Legacy,
  Sarif,  
  Default = Legacy
}

I like the idea in particular, since it makes a command line flag to modify "Default" to be whichever the user prefers pretty trivial.

There's already a flag for this: -fdiagnostics-format=sarif. Why do we need a second diagnostic mode flag?

clang/tools/clang-format/ClangFormat.cpp
397

I find it ironic that clang-format misformatted ClangFormat.cpp.

Maybe void FormatDiagnostic(SmallVectorImpl<char> &OutStr, DiagnosticMode mode)instead of void FormatDiagnostic(SmallVectorImpl<char> &OutStr)?
To make the transition easer and future proof.

I like this idea. Perhaps with DiagnosticMode being a 3-way enum:

enum struct DiagnosticMode {
  Legacy,
  Sarif,  
  Default = Legacy
}

I like the idea in particular, since it makes a command line flag to modify "Default" to be whichever the user prefers pretty trivial.

There's already a flag for this: -fdiagnostics-format=sarif. Why do we need a second diagnostic mode flag?

Ah, oh... is the Sarif formatting being done with a new formatter? That seems unfortunate, since folks using the other formatters won't be able to use the user friendly formats.

Maybe void FormatDiagnostic(SmallVectorImpl<char> &OutStr, DiagnosticMode mode)instead of void FormatDiagnostic(SmallVectorImpl<char> &OutStr)?
To make the transition easer and future proof.

I like this idea. Perhaps with DiagnosticMode being a 3-way enum:

enum struct DiagnosticMode {
  Legacy,
  Sarif,  
  Default = Legacy
}

I like the idea in particular, since it makes a command line flag to modify "Default" to be whichever the user prefers pretty trivial.

There's already a flag for this: -fdiagnostics-format=sarif. Why do we need a second diagnostic mode flag?

Ah, oh... is the Sarif formatting being done with a new formatter? That seems unfortunate, since folks using the other formatters won't be able to use the user friendly formats.

I've been alerted offline that I am misunderstanding the Sarif proposal, and where this is going. I'll note that I wasn't present/invited at the calls where all of this was discussed, so I am admittedly not completely up to date. The above concern shouldn't stop others from reviewing this, particularly if you better understand the intent here.

cjdb updated this revision to Diff 479492.Dec 1 2022, 6:06 PM
cjdb marked 4 inline comments as done.

responds to feedback

cjdb added a comment.Dec 1 2022, 6:06 PM

Maybe void FormatDiagnostic(SmallVectorImpl<char> &OutStr, DiagnosticMode mode)instead of void FormatDiagnostic(SmallVectorImpl<char> &OutStr)?
To make the transition easer and future proof.

I like this idea. Perhaps with DiagnosticMode being a 3-way enum:

enum struct DiagnosticMode {
  Legacy,
  Sarif,  
  Default = Legacy
}

I like the idea in particular, since it makes a command line flag to modify "Default" to be whichever the user prefers pretty trivial.

There's already a flag for this: -fdiagnostics-format=sarif. Why do we need a second diagnostic mode flag?

Ah, oh... is the Sarif formatting being done with a new formatter? That seems unfortunate, since folks using the other formatters won't be able to use the user friendly formats.

I've been alerted offline that I am misunderstanding the Sarif proposal, and where this is going. I'll note that I wasn't present/invited at the calls where all of this was discussed, so I am admittedly not completely up to date. The above concern shouldn't stop others from reviewing this, particularly if you better understand the intent here.

I don't necessarily think you're the one misunderstanding here. We're definitely talking past one another, but I think it's equally likely that you're asking for something reasonable, and I'm mixing it up with something else.

clang/include/clang/Frontend/DiagnosticRenderer.h
130 ↗(On Diff #478675)

Thanks, that was from an old iteration of the CL.

clang/lib/Basic/Diagnostic.cpp
790–791

I've deleted the comment because it's already in the header, and risks going stale over and over.

clang/test/Frontend/sarif-reason.cpp
16

This is maybe done? I'm not sure if this is the #BOOKMARK style you're referring to, but it should capture the same intent. Lemme know if you had something else in mind and I'll happily change it 🙂

Then Sarif was a distraction. Still to reduce boilerplate and for A/B testing:

enum class DiagnosticMode {
  Legacy,
  UserOriented,
  Default = Legacy
}
cjdb added a comment.Dec 2 2022, 9:28 AM

Then Sarif was a distraction. Still to reduce boilerplate and for A/B testing:

enum class DiagnosticMode {
  Legacy,
  UserOriented,
  Default = Legacy
}

It took a fair bit of squinting and coffee, but I think I get it now. Having SARIF will be good for option 2: I hadn't realised this was at the Clang API level and not an enum for users to toggle on the command line! Thanks for your patience, I'll implement this now.

tschuett added a comment.EditedDec 2 2022, 9:38 AM

Maybe the kind/amount of information printed ( DiagnosticMode ) and the output device (console/sarif) are orthogonal issues.

Still it would nice to be able to toggle the diagnostic mode for users/testing / A/B.

cjdb added a comment.Dec 2 2022, 9:55 AM

Maybe the kind/amount of information printed ( DiagnosticMode ) and the output device (console/sarif) are orthogonal issues.

Still it would nice to be able to toggle the diagnostic mode for users/testing / A/B.

Hmm, are you asking for it to be possible for there to be legacy diagnostics with -fdiagnostics-format=sarif as well as user-oriented diagnostics? That should be doable, at least for now (I don't want to commit to this long-term in case it isn't viable at some point down the road, but it's hardly effort for what we currently support).

I do not ask you to do anything! I just noticed that you add a lot of FormatXXXDiagnostic functions. An alternativ design is to have one FormatDiagnostic function with a mode parameter. Then you can decide whether to print legacy or user-oriented reasons.

If next year you invent another diagnostic, you can extend the enum and the FormatDiagnostic function.

Sorry if this is derailing, but I wonder/worry about a few things here:

  1. Compounding structured output with phraseology - it seems like it might be worthwhile for these to be separate issues (doesn't mean the terminal output has to say exactly the same thing - as you've mentioned, we could have some fields that aren't rendered in some modes (eg: maybe a terminal only gets the summary, and not the reason - or perhaps you can ask for reasons to be provided if you don't mind the verbosity)). In the example case include here - describing the template parameter kind mismatch seems OK for Clang's current textual diagnostic experience & I don't think that would need to be limited to only the SARIF output?
  2. Could the new phraseology be included in a separate/parallel diagnostic file, essentially akin to a translation? Clang never did get support for multiple languages in its diagnostics, but maybe this is the time to do that work? And then have this new phrasing as the first "translation" of sorts?
  3. I'm not sure I'd commit to calling the current diagnostic experience "legacy" just yet & maybe this is my broadest feedback: It'd be great to avoid a two-system situation with the intent to merge things back in later.

I think what I'd like to see (though I'm far from the decider in any of this) would be that the existing underlying structured diagnostics system could have a new emission system, that produces SARIF instead of text on a terminal - same underlying structured representation, but providing that more directly (in SARIF/JSON/etc) so that it's not lossy in the conversion to an output format - essentially a machine-readable mode for clang's diagnostic output. (to a file, to the terminal, beside the existing terminal output or instead of it, whatever folks prefer) - this would, ideally, initially require no changes to diagnostic API usage across clang.
Diagnostic quality improvements, like the template parameter kind mismatch can be committed independently as desired.
Diagnostic infrastructure could be improved/expanded - adding the "reason" field, and a way to emit that (perhaps only in SARIF output, but maybe there's room for a flag to emit it in terminal diagnostics too).
Multiple diagnostic text files supported, and the ability to choose which diagnostic text - initially with the intent to support this newly proposed phrasing approach, but equally could be used for whole language translation.

Pretty much all of these features could be implemented somewhat in parallel, each feature would be of value independently to varying degrees, and we wouldn't end up with a deprecated/legacy/not ready yet situation (the only "not ready yet" part might be adding the new diagnostic phrasings - perhaps initially there'd be some way to fallback to the traditional diagnostics, so that the whole database didn't need to be translated wholesale - and once it's all translated and there's a lot of user feedback on the direction, we could consider changing the diagnostic database default, perhaps eventually deprecating the old database, and eventually removing it - without major API upheaval/renaming/addition/removal)

clang/lib/Basic/DiagnosticIDs.cpp
524

I don't think LLVM generally uses (xxx) in FIXMEs, or if we do it's probably only for bug numbers and maybe developer user names (though the latter's not ideal - people come and go, bug numbers are forever (kinda)) - not sure what the "spaceship" in there is meant to communicate (I'm aware of the C++20 spaceship operator, but I'd expect the thing in the () to be some list of known entities that we keep track of somewhere?)

Maybe // FIXME: Use spaceship operator in C++20?

812–825

Unrelated/accidental reformatting? (please commit separately if it's being committed)

cjdb added a comment.Dec 2 2022, 4:44 PM

I do not ask you to do anything! I just noticed that you add a lot of FormatXXXDiagnostic functions. An alternativ design is to have one FormatDiagnostic function with a mode parameter. Then you can decide whether to print legacy or user-oriented reasons.

If next year you invent another diagnostic, you can extend the enum and the FormatDiagnostic function.

Makes sense to me! Thank you for clarifying.

Sorry if this is derailing, but I wonder/worry about a few things here:

  1. Compounding structured output with phraseology - it seems like it might be worthwhile for these to be separate issues

I agree with you. The reason I've compounded the two is to demonstrate how this is supposed to be used (otherwise it really feels like a nothing burger where I've made change for the sake of change).

(doesn't mean the terminal output has to say exactly the same thing - as you've mentioned, we could have some fields that aren't rendered in some modes (eg: maybe a terminal only gets the summary, and not the reason - or perhaps you can ask for reasons to be provided if you don't mind the verbosity)). In the example case include here - describing the template parameter kind mismatch seems OK for Clang's current textual diagnostic experience & I don't think that would need to be limited to only the SARIF output?

@rsmith and I spent a fair amount of time chatting about giving Clang a verbose diagnostic mode, and we agreed that would have issues:

  1. If you wanted to use unstructured text in brief mode 90% of the time and then change when you need to clarify something, build systems are going to need to rebuild everything.
  2. Because scripts almost certainly exist that consume the current diagnostics, I'm reluctant to alter those in any meaningful way. I even note this in the RFC, where I explicitly state that there are no current plans to bin the existing model.
  1. Could the new phraseology be included in a separate/parallel diagnostic file, essentially akin to a translation? Clang never did get support for multiple languages in its diagnostics, but maybe this is the time to do that work? And then have this new phrasing as the first "translation" of sorts?

Yes and no. For strict a rephrasing, this would be possible, but reasons aren't necessarily going to be just a rephrasing (the one in this patch is not). I intend for the reason section to be a far richer field that contains information that we've never previously provided before: a translation file can't help there. However, using parallel diagnostic files might not be such a bad idea. How do we ensure the two don't get out of sync?

  1. I'm not sure I'd commit to calling the current diagnostic experience "legacy" just yet &

You mentioned "terminal diagnostics" below. I like that a lot more than the original "unstructured reason" and "traditional reason", so let's go with that for now.

maybe this is my broadest feedback: It'd be great to avoid a two-system situation with the intent to merge things back in later.

I don't know how we can without breaking the promise of backwards compatibility.

I think what I'd like to see (though I'm far from the decider in any of this) would be that the existing underlying structured diagnostics system could have a new emission system, that produces SARIF instead of text on a terminal - same underlying structured representation, but providing that more directly (in SARIF/JSON/etc) so that it's not lossy in the conversion to an output format - essentially a machine-readable mode for clang's diagnostic output. (to a file, to the terminal, beside the existing terminal output or instead of it, whatever folks prefer) - this would, ideally, initially require no changes to diagnostic API usage across clang.

I think you may be asking for what I mentioned here? That's doable and probably wise: I just need to add -fdiagnostics-format=sarif-compatibility (flag spelling to be bikeshed). (@tschuett it seems that this flag may happen regardless of whether you intended for it!)

Diagnostic quality improvements, like the template parameter kind mismatch can be committed independently as desired.

I'll coordinate with you next week to address my concerns here, but if we can achieve this, I'll be really happy. I'll be happier again if we can make it so both formats can have the change too :)

Pretty much all of these features could be implemented somewhat in parallel, each feature would be of value independently to varying degrees, and we wouldn't end up with a deprecated/legacy/not ready yet situation (the only "not ready yet" part might be adding the new diagnostic phrasings - perhaps initially there'd be some way to fallback to the traditional diagnostics, so that the whole database didn't need to be translated wholesale - and once it's all translated and there's a lot of user feedback on the direction, we could consider changing the diagnostic database default, perhaps eventually deprecating the old database, and eventually removing it - without major API upheaval/renaming/addition/removal)

Mostly agreement with the above wherever possible.

clang/lib/Basic/DiagnosticIDs.cpp
524

FIXME(spaceship) is really just to make it very easy to find. I can make a bug for it.

812–825

I do what git clang-format HEAD~ asks me to do.

erichkeane added inline comments.Dec 5 2022, 6:26 AM
clang/test/Frontend/sarif-reason.cpp
16

It isn't exactly (in that it is using line-numbers instead of bookmarks), but the ordering is fine for me.

The bookmarking is something like:

LineThatHasNote; // #NoteLine
...
LineThatCausesError;
// expected-error@-1 {{Some Error}}
// expected-note@#NoteLine {{The Note}}

However, what I REALLY care about is that the notes and errors are 'next' to eachother, since they are easier to read that way