This is an archive of the discontinued LLVM Phabricator instance.

[Clang Tablegen][RFC] Allow Early Textual Substitutions in `Diagnostic` messages.
ClosedPublic

Authored by EricWF on May 10 2018, 9:06 PM.

Details

Summary

There are cases where the same string or select is repeated verbatim in a lot of diagnostics. This can be a pain to maintain and update. Tablegen provides no way stash the common text somewhere and reuse it in the diagnostics, until now!

This patch allows diagnostic texts to contain %sub{<definition-name>}, where <definition-name> names a Tablegen record of type TextSubstitution. These substitutions are done early, before the diagnostic string is otherwise processed. All %sub modifiers will be replaced before the diagnostic definitions are emitted.

The substitution must specify all arguments used by the substitution, and modifier indexes in the substitution are re-numbered accordingly. For example:

def select_ovl_candidate : TextSubstitution<"%select{function|constructor}0%select{| template| %2}1">;

when used as

"candidate `%sub{select_ovl_candidate}3,2,1 not viable"

will act as if we wrote:

"candidate %select{function|constructor}3%select{| template| %1}2 not viable"

Diff Detail

Repository
rC Clang

Event Timeline

EricWF created this revision.May 10 2018, 9:06 PM
EricWF updated this revision to Diff 146279.May 10 2018, 9:39 PM

Fix copy paste error.

a.sidorin resigned from this revision.May 11 2018, 1:38 AM
rjmccall accepted this revision.May 11 2018, 10:41 AM

This seems like a good idea to me. I wonder if it would be better to take advantage of this to shrink the string tables by preserving the substitution structure until runtime, but that really doesn't need to be part of this first try.

utils/TableGen/ClangDiagnosticsEmitter.cpp
1209

I see why this has to be done separately, I think, but it should at least go in a helper function.

Also, please check for substitution-name conflicts.

This revision is now accepted and ready to land.May 11 2018, 10:41 AM

Thank you, I've been wanting this for ages but never got around to it :)

(I'd been thinking about doing this by adding custom enum -> text mappings, but I think your approach is at least somewhat more general.)

include/clang/Basic/DiagnosticSemaKinds.td
3626–3634

Why do some of these have trailing spaces and others not? That looks wrong to me.

3648

... and this is the reason. %1 is sometimes empty and sometimes non-empty, and we're "cunningly" passing that information through %0. Yuck.

It looks like this particular diagnostic refactoring can't work quite like this if we want it to be whitespace-correct (which I think we do), because some uses of your factored-out text also include a name and others don't. Perhaps adding another select on %0 that either adds a space or not would work, or two separate substitutions for the "include name" and "don't include name" cases?

More generally I wonder if we actually want to renumber %i's through a %sub, so that we could define the substitution as

TextSubstitution<
   "%select{function|function|constructor|function %1|function %1|constructor %1|...}0">

and then use it as, eg

"candidate %sub{select_ovl_candidate_kind_with_name}5,6 not viable"

and have it act as if we wrote

"candidate %select{function|function|constructor|function %6|function %6|constructor %6|...}5 not viable">

I expect we'll need that pretty much as soon as we have a substitution that takes more than one parameter, and I think we want a multi-parameter substitution for this refactoring :)

EricWF updated this revision to Diff 146467.May 12 2018, 2:19 AM

@rsmith How does this look?

Turns out we have to fully parse the diagnostic text in order to substitute modifier indexes, which is a bit of a complication. This patch does exactly that.

There is still some work done to produce useful error messages when the substitution is ill-formed, but hopefully the design of this patch is, in general, OK>

EricWF added inline comments.May 12 2018, 2:26 AM
test/TableGen/text-substitution.td
10

This test TBD. The emit-diag-docs.td test should hit all the major cases.

EricWF updated this revision to Diff 146470.May 12 2018, 4:57 AM
  • Update tests.

The last thing I want to tackle is improving parsing errors to at least name the diagnostic being parsed. Naming *why* it's invalid if we don't provide its name.

@rsmith Can select indexes be negative? What's the correct type to represent them? Existing code seems to use int, but the LLVM style seems to suggest unsigned is more appropriate.

utils/TableGen/ClangDiagnosticsEmitter.cpp
590

This is unused. Removing.

EricWF updated this revision to Diff 146491.May 12 2018, 5:28 PM
EricWF edited the summary of this revision. (Show Details)
  • Get argument renumbering work as @rsmith requested.
  • Make the changes to existing diagnostics whitespace correct.
  • Cleanup DiagnosticSemaKinds.td and SemaOverload.cpp to take advantage of this change.

@rsmith, @rjmccall: This change set is a lot larger now. Is it appropriate to commit the Sema cleanup with this patch?

EricWF requested review of this revision.May 12 2018, 5:29 PM
EricWF marked 4 inline comments as done.May 12 2018, 5:48 PM
EricWF updated this revision to Diff 146495.May 12 2018, 8:24 PM

Document %sub in InternalsManual.rst

Thanks, this is great.

Turns out we have to fully parse the diagnostic text in order to substitute modifier indexes, which is a bit of a complication. This patch does exactly that.

I like that you were able to reuse the existing parsing code for documentation generation here.

@rsmith Can select indexes be negative? What's the correct type to represent them? Existing code seems to use int, but the LLVM style seems to suggest unsigned is more appropriate.

No, select indexes must be in the range 0..<number of options-1>. I don't think we actually have a convention of using unsigned for integers that can't be negative, and I suspect that if we thought about a policy we'd pick the opposite one :)

@rsmith, @rjmccall: This change set is a lot larger now. Is it appropriate to commit the Sema cleanup with this patch?

I would ideally like to see one substitution added, and one set of diagnostics converted, as part of this patch, and for other diagnostics to be converted in separate changes after that.

docs/InternalsManual.rst
337

Missing the sub specifier here?

341–342

I don't think "Class:" makes sense here, since what this specifier consumes depends on its replacement string. "Class: varies" or similar might work, but maybe just drop it?

include/clang/Basic/DiagnosticSemaKinds.td
3526–3544

Is there a reason this one wasn't changed to use %sub?

test/TableGen/DiagnosticBase.inc
1–25

This is the wrong filename.

EricWF marked 4 inline comments as done.EditedMay 12 2018, 10:17 PM

Thanks, this is great.

Turns out we have to fully parse the diagnostic text in order to substitute modifier indexes, which is a bit of a complication. This patch does exactly that.

I like that you were able to reuse the existing parsing code for documentation generation here.

With heavy modifications :-)

But now that we have a full AST representation of diagnostics text, imagine what we can do next!

docs/InternalsManual.rst
341–342

Dropping it seems clearer.

include/clang/Basic/DiagnosticSemaKinds.td
3526–3544

It refers to implicit special members differently ("is the implicit default constructor" -> "constructor (the implicit default constructor"). I think I mistakenly thought transforming it would be grammatically incorrect. But looking further it seems fine.

EricWF updated this revision to Diff 146498.May 12 2018, 10:47 PM
EricWF marked 2 inline comments as done.

Address @rsmiths comments.

  • Prefer int to unsigned.
  • Substitute note_ovl_candidate.
  • Fix issues in docs.

Thanks, this is great.

Turns out we have to fully parse the diagnostic text in order to substitute modifier indexes, which is a bit of a complication. This patch does exactly that.

I like that you were able to reuse the existing parsing code for documentation generation here.

With heavy modifications :-)

EricWF updated this revision to Diff 146508.May 13 2018, 5:38 AM
EricWF marked an inline comment as done.

Misc cleanups.

  • proof read and correct documentation.
EricWF marked 2 inline comments as done.May 18 2018, 2:16 PM
EricWF added inline comments.
utils/TableGen/ClangDiagnosticsEmitter.cpp
1209

@rjmccall By substitution name conflicts do you mean substitution names which conflict with diagnostic names?

EricWF updated this revision to Diff 147581.May 18 2018, 2:16 PM
EricWF marked an inline comment as done.
  • Add error if a substitution and a diagnostic share the same name.

Any final comments on this?

rjmccall added inline comments.May 18 2018, 4:20 PM
utils/TableGen/ClangDiagnosticsEmitter.cpp
1209

I meant between substitutions, but I guess that's probably handled automatically by the base TableGen parser, sorry.

EricWF marked 2 inline comments as done.May 18 2018, 4:27 PM

Any final comments on this?

I would ideally like to see one substitution added, and one set of diagnostics converted, as part of this patch, and for other diagnostics to be converted in separate changes after that. Please pick one that doesn't also require changes to Sema, so this patch is separate from the refactoring of the Sema diagnostic emission.

EricWF updated this revision to Diff 147640.May 18 2018, 7:54 PM
  • Remove changes to Sema, and overload resolution diagnostics. Only change the special member function diagnostics as an example.

If there are no further comments, I'll commit this once I've run the test suite.

This revision was not accepted when it landed; it landed in state Needs Review.May 18 2018, 8:15 PM
This revision was automatically updated to reflect the committed changes.