This is an archive of the discontinued LLVM Phabricator instance.

[SyntaxTree] Provide iterators for Lists
Needs ReviewPublic

Authored by eduucaldas on Oct 26 2020, 8:26 AM.

Details

Reviewers
gribozavr2
Summary

Provide an iterator for List that iterates through elements and delimiters on pairs. This iterator is robust to missing elements.
For instance, in the following function call:
f(a b, , )
The iterator would emit the following elements and delimiters:
[("a", null), ("b", ","), (null, ","), (null, null) ]

This iterator abstracts out many corner cases when dealing with lists, thus providing a cleaner interface for mutation and traversal of Lists.

Note: This iterator has the memory footprint of one pointer.

Diff Detail

Event Timeline

eduucaldas created this revision.Oct 26 2020, 8:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2020, 8:26 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
eduucaldas requested review of this revision.Oct 26 2020, 8:26 AM
eduucaldas edited the summary of this revision. (Show Details)Oct 26 2020, 8:27 AM
eduucaldas added a reviewer: gribozavr2.

Could you add tests that verify the pairing of elements and delimiters?

clang/include/clang/Tooling/Syntax/Tree.h
214
eduucaldas marked an inline comment as done.Oct 26 2020, 9:20 AM

Could you add tests that verify the pairing of elements and delimiters?

Those are surfaced through the previous tests, via getElementsAsNodesAndDelimiters

clang/include/clang/Tooling/Syntax/Tree.h
442

What do you think?

I didn't change it, because we had agreed on ElementsAsNodesAndDelimiters.

Fix comment.

(I started to review this, but now the current diff has only your changes against the first diff, rather than against master - can you upload a new diff?)

clang/include/clang/Tooling/Syntax/Tree.h
231

this name is bulky and hard to read, leads to more unwieldy names later (e.g. getElementAndDelimiterBeforeBegin which despite its length is unclear that it's even an iterator).

Would there be actual confusion in calling this ElementIterator, and providing the associated delimiters in addition to the elements?

282

nit: const (and getDelimiter)

iterators, like pointers, don't propagate const to their pointees.

(This rarely matters, but it sometimes comes up with template goop)

Diff against master, sorry about that

This seems like more implementation and API complexity than is justified by the problems it's solving.
I do think it's possible to drastically simplify it (I've left some suggestions here) but if not, I think we should move it to its own header.

clang/include/clang/Tooling/Syntax/Tree.h
231

overall, this design seems to be focused on the size being a single pointer, at the cost of complexity of most of the operations.

Why is this important to optimize for, compared to doing something like:

private:
  Node *Elt, *Delim, *Next;
  void advance() {
    // read one element
    // set Elt and/or Delim
    // always advances Next by at least one
  }
public:
  Iterator(Node *Start) : Elt(nullptr), Delim(nullptr), Next(Start) {}
  operator++() { advance(); }
  Leaf *getDelimiter() { return Delim; }
  Node *getElement() { return Elt; }

(Performance isn't the biggest concern here, but I'd usually expect two extra pointers on the stack to be preferable to all the branches in the current impl)

231

how sure are we that the template and strong typing is worthwhile on the iterators? Can we try without it for a while and see how painful it is?

235

Are the before-begin iterators needed immediately?

My understanding is that before-begin iterators are used with nodes due to the single-linked-list implementation, but that's going away. Can we avoid adding more of these by sequencing those changes differently?

242

NotSentinel -> Element?

Logically, the iterator is pointing at an element in the list. (If the element happens to be missing, fine...)

272

If we're not willing for this to be an InputIterator (I'm still not clear *why* not) or a stashing iterator (ditto), then maybe we might as well define operator* to refer to Element so people only have to use explicit iterators if they care about delimiters?

285

this is a crash if the list contains elements that have a type that doesn't match the container's expected type.

If we really need the strongly-typed version of this for constrained lists, we should define behavior here - probably return nullptr, just like an accessor on Tree for a strongly-typed single child whose type doesn't match what's expected.

364

what's the use case for this? What does it return on a default-constructed iterator?

443

these names are way way too long unless this is is going to be very rarely used (in which case it shouldn't be members at all).
Can we have iterator_range<...> getDelimitedElements() or something? (FWIW I'd prefer just getElements() and have the vector version be collectElements() or so)

clang/lib/Tooling/Syntax/Tree.cpp
352

Let's avoid deliberately crashing on invalid code.

I'd suggest either treating over non-element-non-delimiters as if they don't exist, or treating them as elements by default.

eduucaldas marked 8 inline comments as done.Oct 27 2020, 8:57 AM

I left some points unanswered, I'll answer them tomorrow :)

clang/include/clang/Tooling/Syntax/Tree.h
231

how sure are we that the template and strong typing is worthwhile on the iterators? Can we try without it for a while and see how painful it is?

It avoids duplication of code on derived classes - CallArguments, ParameterDeclarationList.... I suggest that we try to simplify other things and then rediscuss this point.

  1. BeforeBegin.
  2. Trying to fit everything into a pointer.
  3. Templates and strong typing.
231

Would there be actual confusion in calling this ElementIterator, and providing the associated delimiters in addition to the elements?

This is a very interesting proposition! And it brought some interesting ideas. We'll further develop on them and answer to this inline tomorrow :)

235
242

Agreed

285

We want children of lists with role ListElement to have the expected C++ type.

Here is how an accessor on Tree is implemented.

syntax::Expression *syntax::BinaryOperatorExpression::getLhs() {   
  return cast_or_null<syntax::Expression>(                                   
  findChild(syntax::NodeRole::LeftHandSide)); 
}

Notice that if there is a Node with LeftHandSide it *must* be of the type Expression.

364

This is useful for the mutations API.
To remove a range of list elements, you may need to know the termination kind of this list, and thus you need to access the parent list.
Of course you could do it like getElement()->getParent() but that wouldn't work for sentinels and neither for the case of a missing last element on a separated list.

"a, b,"    <=> [("a" , ","), ("b" , "," ), (null, null)]

I confess I didn't think on default-constructed iterators. I don't see any use for those, however they *are* required for LegacyForwardIterator and onwards.

clang/lib/Tooling/Syntax/Tree.cpp
352

We thought about invalid code.
This llvm_unreachable states an invariant on List, that any children can only have one of the two roles.

This would imply that a tree builder - which for invalid code would be the pseudo parser - would assign ListDelimiter for everything that matches the expected delimiter, and try to parse other things into the expected ElementType.

Let's examine an example. In:
f(1, +, 2)
We have the List CallArguments for which ElementType = Expression.
We have some options:

  • + is a Leaf with role Unknown.(which breaks the current invariant)
  • + is part of an Expression, more specifically a BinaryOperatorExpression with lhs and rhs missing.

At the same time we could think of another example:
f(1, int, 2)
What would we do in this case?

In conclusion, I think this invariant is a bit constraining. We can consider relaxing it to allow Unknown roles. I think in general it would be good for us to have a common understanding of the plans for the pseudo parser.

eduucaldas marked 6 inline comments as done.
  • const on getElement and similar.
  • NotSentinel -> Element
sammccall added inline comments.Oct 27 2020, 9:47 AM
clang/lib/Tooling/Syntax/Tree.cpp
352

We thought about invalid code.
This llvm_unreachable states an invariant on List, that any children can only have one of the two roles.

I think this is a bigger topic we should separate from this code review (though it ties into the template question). Maybe we can meet later in the week?

There's a tension between the generic and typed interfaces when it comes to modification: the typed interfaces benefit from and can enforce strong semantic invariants, where generic interfaces cannot.
It looks like the current approach is "invariants are strong, generic mutation APIs don't exist".
I'm not sure this is the right long-term direction:

  • many candidate invariants are incompatible with rich representation of broken code, and pseudoparsing, it's going to make the model very hard to understand
  • it requires all refactorings to be built on "blessed" in-tree primitives that preserve invariants. This does not scale well, and does not give out-of-tree users the flexibility they need to solve their problems - many are stuck until the next LLVM release.
  • generic/uniform access is empirically useful for some problems, and making access read-only is a large restriction on that option. (Examples: pseudoparsing, de/serialization)
  • breaking invariants is sometimes the right thing to do. Consider a user dragging code around in an IDE - we'd adjust the selection (e.g. snap to complete nodes) before moving it, but we can't just reject the edit due to syntax.