This is an archive of the discontinued LLVM Phabricator instance.

[Syntax] Add iterators over children of syntax trees.
ClosedPublic

Authored by sammccall on Oct 23 2020, 4:09 AM.

Details

Summary

This gives us slightly nicer syntax (foreach) for idioms currently expressed
as a loop, and the option to use range algorithms where it makes sense
(e.g. llvm::all_of et al encapsulate the needed flow control in a useful way).

It's also a building block for iteration over filtered views (e.g. iterate over
all Stmt children, with the right type):
for (const Statement &S : filter<Statement>(N.children()))

...

I realize the recent direction has been mostly towards strongly-typed
node-specific facilities, but I think it's important we have convenient
generic facilities too.

Diff Detail

Event Timeline

sammccall created this revision.Oct 23 2020, 4:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 23 2020, 4:09 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
sammccall requested review of this revision.Oct 23 2020, 4:09 AM

Thanks Sam! I learned a lot from your patch ^^

clang/include/clang/Tooling/Syntax/Tree.h
167–194

I think we should only have an iterator through Nodes, not a general one like this.

In general Tree has *heterogeneous* children:
For instance, 1+2 yields the syntax tree:

BinaryOperatorExpression
|-  IntegerLiteralExpression
|-  Leaf
`- IntegerLiteralExpression

Very rarely will syntax trees have all their children of the same kind, so I don't think it makes sense to try an provide an iterator template. That makes sense for lists , because generally they *have* the same kind. But in that case we provide special iterators for lists only.

211–228

TL;DR:
child_iterator -> ChildIterator
const_child_iterator -> ConstChildIterator
children -> getChildren

I see you followed the naming convention of the ClangAST, which makes loads of sense.

In syntax trees we follow the naming conventions of the conding standard -- just because we want consistency and we had to pick a guideline to follow -- according to that, functions should be verb like and names should be camel case.

If like us you chose const_child_iterator and children kind of "just because", could you change it to follow ConstChildIterator and getChildren respectively.

clang/lib/Tooling/Syntax/Tree.cpp
22–24

Hmm...
That looks funny, if the user uses a range-based for loop and forgets the &, then we would be copying the Node???

Also in many places we had to get the address to the node. That is not really consistent with how the syntax trees API's were designed, they generally take pointers instead of references. (that said we could obviously reconsider those design choices)

clang/unittests/Tooling/Syntax/TreeTest.cpp
129

Suggestion:
I would split this into Iterators and ConstIterators.

139

nit, feel free to ignore.

159–169

Is there a reason why you didn't use a range-based for loop over Children?

sammccall updated this revision to Diff 300674.Oct 26 2020, 7:54 AM
sammccall marked an inline comment as done.

Style changes

clang/include/clang/Tooling/Syntax/Tree.h
167–194

This does only iterate over nodes, note this class is private.
This is a common pattern when implementing iterators: you need iterator and const_iterator, they have almost identical implementations but operate on different types (const T vs T). Thus a private template.

Sometimes it's used directly (using iterator = iterator_base<Node>) but that means the template needs to be visible and also leads to worse compiler diagnostics when the canonical type is printed. Inheritance lets us prevent direct use of the base class and give the iterator classes distinct names.


That said, in future, I think we should (later) have a facility to iterate over children of a certain type. This would explicitly be a filtering operation, ignoring nodes not of that type.
Providing these typed operations for concrete subclasses of List makes sense, but in clangd outside of specific refactorings we've found generic facilities to often be more useful. When we designed syntax trees, providing a simple and generic data structure that could be used directly was an explicit goal.

211–228

In syntax trees we follow the naming conventions of the conding standard

I see that's been used in some of the newer classes, and that you changed this file to follow that style in 4c14ee61b73746b314d83e7c52e03d6527b78105. However it's not the original style we used here and elsewhere in libSyntax.
It's a bit frustrating to hear the argument about consistency, because we were quite careful and deliberate about this style, which was IMO fairly consistent prior to that change.

I'm willing to change these to match the local style in this file. However *_iterator, *_begin/end/range() is more common than FooIterator etc in LLVM overall (I think for consistency with iterator etc, which is endorsed by the style guide). So I don't think this change is particularly an improvement, even in terms of consistency.

clang/lib/Tooling/Syntax/Tree.cpp
22–24

That looks funny, if the user uses a range-based for loop and forgets the &, then we would be copying the Node???

It doesn't look funny to me - foreach usually iterates over values rather than pointers. This raises a good point - it looks like Node is copyable and shouldn't be. (The default copy operation violates the tree invariants, e.g. the copied node will not be a child of its parent). I'll send a patch...

(That said this is always the case with copyable objects in range-based for loops, and indeed using references - that doesn't usually mean we avoid using them)

Also in many places we had to get the address to the node. That is not really consistent with how the syntax trees API's were designed, they generally take pointers instead of references

I'm not convinced that taking the address is itself a burden, any more than using -> instead of . is a burden.
Sometimes the use of pointer vs reference intends to convey something about address stability, but here this goes with the type as all nodes are allocated on the arena. (IMO this is a part of the AST design that works well).
The other thing it conveys is nullability, and this seems valuable enough that IMO APIs that take *non-nullable* nodes by pointer should be reconsidered.

For the iterators specifically, the main alternative here I guess is having Tree::children() act as a container of *pointers to* children, instead of the children themselves. This *is* highly unconventional and surprising. Consider the ubiquitous for (const auto&N : T->children())... now the author/reader expects N to be a const reference to a child, instead it's a const reference to a non-const pointer to a child....

clang/unittests/Tooling/Syntax/TreeTest.cpp
129

Hmm, I'm not a big fan of duplicating all the setup bits. And where would we test that iterators can be converted to an equivalent const_iterator? (This doubles as our implicit test of the multi-pass guarantee, too...)

159–169

I want to be really explicit about the expectation that this yields three elements (as opposed to say zero).

The ElementsAre tests above should disallow that going wrong, but calling the methods once directly with the dumbest possible code seems worthwhile to me.

gribozavr2 accepted this revision.Oct 26 2020, 8:34 AM
gribozavr2 added a subscriber: gribozavr2.
gribozavr2 added inline comments.
clang/include/clang/Tooling/Syntax/Tree.h
192

"nodePointer()" ?

211–228

It can go either way:

As an exception, classes that mimic STL classes can have member names in STL’s style of lower-case words separated by underscores

219
clang/lib/Tooling/Syntax/Tree.cpp
22–24

That looks funny, if the user uses a range-based for loop and forgets the &, then we would be copying the Node???

Aren't they noncopyable?

This revision is now accepted and ready to land.Oct 26 2020, 8:34 AM
sammccall marked 2 inline comments as done.Oct 26 2020, 8:54 AM
sammccall added inline comments.
clang/include/clang/Tooling/Syntax/Tree.h
192

I find this a little confusing; it suggests to me there's also something *other* than the node here.
What aspect does it help with?

I can live with it, though currently would prefer pointer(), getPointer() or asPointer().

211–228

Yeah, though read narrowly that doesn't apply (Tree doesn't really mimic STL, and the rule applies to Tree::ChildIterator's members rather than the class itself).

I think it's rare enough to spell out iterator class names that those don't matter at all, and getChildren() is necessary given getParent() etc.

clang/lib/Tooling/Syntax/Tree.cpp
22–24

Noncopyable after D90163.

sammccall updated this revision to Diff 300691.Oct 26 2020, 8:55 AM
sammccall marked an inline comment as done.

Simplify ConstChildIterator cross-constructor

gribozavr2 added inline comments.Oct 26 2020, 9:03 AM
clang/include/clang/Tooling/Syntax/Tree.h
192

it suggests to me there's also something *other* than the node here.

Not in this case, but there often are other pointers -- LLVM types often have an "opaque pointer" representation, and that was the first thing that I thought about when I read asPointer(). However, it is strongly typed, so one can figure it out. Up to you.

eduucaldas accepted this revision.Oct 26 2020, 9:12 AM

Thanks for the instructive replies

This revision was landed with ongoing or failed builds.Oct 28 2020, 4:38 AM
This revision was automatically updated to reflect the committed changes.

@sammccall I've pushed rGc0053c62d9a0 to fix MSVC builds which were struggling to find a default constructor for ConstChildIterator

http://lab.llvm.org:8011/#/builders/83

Ah, thanks a lot, sorry for missing that!