This is an archive of the discontinued LLVM Phabricator instance.

[Syntax] Start to move trivial Node class definitions to TableGen. NFC
ClosedPublic

Authored by sammccall on Oct 31 2020, 5:05 PM.

Details

Summary

This defines two node archetypes with trivial class definitions:

  • Alternatives: the generated abstract classes are trivial as all functionality is via LLVM RTTI
  • Unconstrained: this is a placeholder, I think all of these are going to be Lists but today they have no special accessors etc, so we just say "could contain anything", and migrate them one-by-one to Sequence later.

Compared to Dmitri's prototype, Nodes.td looks more like a class hierarchy and
less like a grammar. (E.g. variants list the Alternatives parent rather than
vice versa).
The main reasons for this:

  • the hierarchy is an important part of the API we want direct control over.
    • e.g. we may introduce abstract bases like "loop" that the grammar doesn't care about in order to model is-a concepts that might make refactorings more expressive. This is less natural in a grammar-like idiom.
    • e.g. we're likely to have to model some alternatives as variants and others as class hierarchies, the choice will probably be based on natural is-a relationships.
  • it reduces the cognitive load of switching from editing *.td to working with code that uses the generated classes

Diff Detail

Event Timeline

sammccall created this revision.Oct 31 2020, 5:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 31 2020, 5:05 PM
sammccall requested review of this revision.Oct 31 2020, 5:05 PM

There are *lots* more nodes that can be converted here, I'll wait to review the pattern first :-)

The generated code is identical to the deleted code, except the constructor for abstract classes is protected now.

gribozavr2 accepted this revision.Nov 4 2020, 2:45 AM

e.g. we may introduce abstract bases like "loop" that the grammar doesn't care about in order to model is-a concepts that might make refactorings more expressive. This is less natural in a grammar-like idiom.

Loop can be an alternative rule in the grammar, I'm not sure what is less natural about it:

Statement ::=
  IfStatement | LoopStatement | BreakStatement | ...

LoopStatement ::=
  ForLoopStatement | WhileLoopStatement | ...
clang/include/clang/Tooling/Syntax/Syntax.td
35

I think it would be best to make documentation mandatory, and for nodes that are currently lacking it, add a FIXME so that we clearly see where the tech debt is.

47

+

It corresponds to an alternative rule in the grammar, for example:
Statement ::=
  IfStatement | ForStatement | ...

`Statement` should be modeled with `Alternatives<>`. `IfStatement` and `ForStatement` are derived from `Statement`.
``
clang/utils/TableGen/ClangSyntaxEmitter.cpp
149–150
This revision is now accepted and ready to land.Nov 4 2020, 2:45 AM

Compared to Dmitri's prototype, Nodes.td looks more like a class hierarchy and
less like a grammar. (E.g. variants list the Alternatives parent rather than
vice versa).

e.g. we may introduce abstract bases like "loop" that the grammar doesn't care about in order to model is-a concepts that might make refactorings more expressive. This is less natural in a grammar-like idiom.

Do you have a concrete example of such abstract base -- loop is in the grammar ? And in the case such an example appear, in my mind we would just change "our grammar" to have an alternative for this "loop" abstract base.

e.g. we're likely to have to model some alternatives as variants and others as class hierarchies, the choice will probably be based on natural is-a relationships.

I agree, alternatives and the two ways to model them are a tricky subject...

it reduces the cognitive load of switching from editing *.td to working with code that uses the generated classes

I think we should consider reading prior to editing.
A grammar is how we represent syntax, and syntax trees model syntax, as such I think we should rather consider the cognitive load of switching between the grammar and the definition of syntax trees.

I have a lot to improve in my writing skills, so I don't know if I could express all my thoughts. We could have a call if you want ;)

sammccall marked 3 inline comments as done.Nov 11 2020, 3:18 AM

Compared to Dmitri's prototype, Nodes.td looks more like a class hierarchy and
less like a grammar. (E.g. variants list the Alternatives parent rather than
vice versa).

e.g. we may introduce abstract bases like "loop" that the grammar doesn't care about in order to model is-a concepts that might make refactorings more expressive. This is less natural in a grammar-like idiom.

Do you have a concrete example of such abstract base -- loop is in the grammar ? And in the case such an example appear, in my mind we would just change "our grammar" to have an alternative for this "loop" abstract base.

Not really, I think you're right. This is a bad argument.

e.g. we're likely to have to model some alternatives as variants and others as class hierarchies, the choice will probably be based on natural is-a relationships.

I agree, alternatives and the two ways to model them are a tricky subject...

it reduces the cognitive load of switching from editing *.td to working with code that uses the generated classes

I think we should consider reading prior to editing.

Agree. I think the same load exists for reading though.

A grammar is how we represent syntax, and syntax trees model syntax, as such I think we should rather consider the cognitive load of switching between the grammar and the definition of syntax trees.

I don't think people reason about concrete grammar as a collection of productions though, rather as a collection of kinds of nodes and what they're composed of.
And we model it that way in the C++ API! I think the tablegen should follow that too...

sammccall updated this revision to Diff 304457.Nov 11 2020, 3:40 AM

Address comments, rebase

This revision was landed with ongoing or failed builds.Nov 11 2020, 3:41 AM
This revision was automatically updated to reflect the committed changes.