This is an archive of the discontinued LLVM Phabricator instance.

Differential D63835

[Syntax] Add nodes for most common statements
ClosedPublic

Authored by ilya-biryukov on Jun 26 2019, 11:47 AM.

Details

Reviewers

sammccall

Commits

rG58fa50f43701: [Syntax] Add nodes for most common statements

Summary

Most of the statements mirror the ones provided by clang AST.
Major differences are:

expressions are wrapped into 'ExpressionStatement' instead of being a subclass of statement,
semicolons are always consumed by the leaf expressions (return, expression statement, etc.),
some clang statements are not handled yet; we wrap those into an UnknownStatement class, which is not present in clang.

We also define 'Expression' and 'UnknownExpression' classes in order
to produce 'ExpressionStatement' where needed. The actual implementation
of expressions is not yet ready; it will follow later.
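
As an illustration of the first two differences, here is a minimal sketch in which an expression is not a statement and is instead wrapped by a statement node that owns the trailing semicolon. The classes are toy stand-ins written for this example (including the `Text` field and `text()` helper, which are assumptions); they are not the actual `clang::syntax` classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Toy node kinds; a small hypothetical subset of the real enum.
enum class NodeKind { UnknownExpression, UnknownStatement, ExpressionStatement };

struct Node {
  explicit Node(NodeKind K) : Kind(K) {}
  virtual ~Node() = default;
  NodeKind Kind;
};

// Expressions derive from Node directly, not from Statement.
struct Expression : Node {
  explicit Expression(std::string T)
      : Node(NodeKind::UnknownExpression), Text(std::move(T)) {}
  std::string Text; // spelled source of the expression, without ';'
};

struct Statement : Node {
  explicit Statement(NodeKind K) : Node(K) {}
};

// An expression in statement position. The wrapper, not the expression,
// accounts for the trailing semicolon.
struct ExpressionStatement : Statement {
  explicit ExpressionStatement(std::unique_ptr<Expression> E)
      : Statement(NodeKind::ExpressionStatement), Expr(std::move(E)) {
    assert(Expr && "an expression statement wraps exactly one expression");
  }
  std::string text() const { return Expr->Text + ";"; }
  std::unique_ptr<Expression> Expr;
};

static_assert(!std::is_base_of<Statement, Expression>::value,
              "in this model an expression is never a statement");
```

With this shape, code that walks statements never encounters a bare expression; it sees an `ExpressionStatement` and reaches the expression through it.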

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 34705
Build 34704: arc lint + arc unit

Event Timeline

ilya-biryukov created this revision. Jun 26 2019, 11:47 AM

Herald added a project: Restricted Project. Jun 26 2019, 11:47 AM

ilya-biryukov added a parent revision: D61637: [Syntax] Introduce syntax trees. Jun 26 2019, 11:48 AM

Harbormaster completed remote builds in B33964: Diff 206721. Jun 26 2019, 11:48 AM

This change mostly aims to illustrate that TreeBuilder seems to be powerful enough to go beyond basic nodes.
But it also introduces enough nodes to make the syntax trees minimally useful for traversing statement nodes. Hopefully that could become a good basis to define other APIs (mutations, etc).

sammccall added inline comments. Jul 8 2019, 10:46 AM

clang/include/clang/Tooling/Syntax/Nodes.h
25	there are going to be many of these. I'd suggest either sorting them all, or breaking them into blocks (e.g. statements vs declarations vs leaf/tu/etc) and sorting those blocks.
130	Do you want to expose the statement-ending-semicolon here? (Not all statements have it, but common enough you may want it in the base class instead of all children)
175	The fact that this can be an arbitrary statement is kind of shocking. But apparently true! In the long run, we're probably going to be able to find the case statements somehow, even though they're not part of the immediate grammar. Not sure whether this should be via the regular AST or by adding links here. Anyway, problem for another day.
190	expression for the value?
191	syntactically, is it useful to model the body as a single statement? It's not a CompoundStmt as it has no braces. Seems like a sequence... Or is the idea that the first following statement is the body (might be nothing), and subsequent ones aren't part of the body? Why is this more useful than making the body a sibling?
207	might be handy to unify this with CaseStatement somehow (a base class, or make it literally a CaseStatement with a null body and a bool isDefaultCase() that looks at the keyword token) Mostly thinking about code that's going to iterate over case statements.
216	I guess the missing cond here (and similar below) are due to complexities around the variable declaring variants? Warrants a FIXME I think
224	I think throughout it's important to mark which of these are: (a) nullable in correct code, (b) nullable in code generated by recovery.
296	(any reason we can't already have the expr here?)

Submitting a few comments to start up the discussions.

The actual changes will follow.

clang/include/clang/Tooling/Syntax/Nodes.h
130	Yes, only "leaf" statements (i.e. the ones not having any statement children) have it. I was thinking about either (a) having a separate class for non-composite statements and providing an accessor there, or (b) providing an accessor in each of the leaf statements (would mean some duplication, but, arguably, better discoverability). But, from an offline conversation, we seem to disagree that inheritance is a proper way to model this. Would it be ok to do this in a follow-up? I'll add a FIXME for now.
191	This models the structure of the C++ grammar (and clang AST). Getting from a switch statement to all its `case` and `default` labels seems useful, but should be addressed by a separate API that traverses the corresponding syntax tree nodes. Marking as done; from an offline conversation we seem to agree here. Feel free to reopen if needed.
207	I would model this with a base class, but let's agree whether that's the right way to approach this.
224	I would suggest marking only the nodes that are nullable in correct code. For recovery, I would assume the following rule (please tell me if I'm wrong). On a construct whose parsing involved recovery: (a) if the node has an introducing token (`if`, `try`, etc.), the corresponding child cannot be null; (b) any other child can be null.
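
A rough sketch of what the proposed recovery rule means for client code: the introducing keyword can be relied on, while every other child is checked before use. The struct layout and the `describe` function are hypothetical and exist only for this example; the real accessors return `syntax::Leaf *` and `syntax::Statement *`.

```cpp
#include <cassert>
#include <string>

// Toy statement node; only its spelled text matters here.
struct Statement {
  std::string Text;
};

// Toy 'if' node with nullable children (hypothetical layout).
struct IfStatement {
  const std::string *IfKeyword = nullptr; // introducer: expected to be present
  const Statement *Then = nullptr;        // may be null after recovery
  const Statement *Else = nullptr;        // null in correct code without 'else'
};

// Renders the statement while tolerating any missing non-introducer child.
inline std::string describe(const IfStatement &S) {
  assert(S.IfKeyword && "a node with an introducing token keeps that token");
  std::string Out = *S.IfKeyword + " (...) ";
  Out += S.Then ? S.Then->Text : "<missing>";
  if (S.Else)
    Out += " else " + S.Else->Text;
  return Out;
}
```

The same defensive pattern applies to any accessor that is not tied to the introducing token.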

Rebase
Address comments
Restructure the roles
Remove the role from tree dumps for now. With too many roles it is annoying to update the test outputs on incremental changes. I tried using the symbolic role names there, but they end up being too verbose.

Harbormaster completed remote builds in B34687: Diff 208937. Jul 10 2019, 6:29 AM

ilya-biryukov added inline comments. Jul 10 2019, 6:29 AM

clang/include/clang/Tooling/Syntax/Nodes.h
216	Yes. Added a FIXME
296	Added a getter for it.

Mark groups of kinds for statements and expressions

Harbormaster completed remote builds in B34705: Diff 208995. Jul 10 2019, 9:48 AM

This is ready for another round

clang/include/clang/Tooling/Syntax/Nodes.h
25	I've added two blocks now - statements and expressions. Did not sort, though, I find the semantic grouping (loop statements close to each other) more useful, but hard to keep consistent.
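
The reason the blocks matter can be sketched as follows: when all statement kinds are contiguous in the enum, an `isa`-style check for the abstract base class reduces to a range comparison. This is a toy subset of the kinds, and the `FirstStatement`/`LastStatement` aliases and helper functions are hypothetical names for this example, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Toy subset of the node kinds, grouped into contiguous blocks.
enum class NodeKind : uint16_t {
  Leaf,
  TranslationUnit,
  // Expressions
  UnknownExpression,
  // Statements
  UnknownStatement,
  IfStatement,
  WhileStatement,
  CompoundStatement,
};

// Hypothetical aliases marking the bounds of the statement block.
constexpr NodeKind FirstStatement = NodeKind::UnknownStatement;
constexpr NodeKind LastStatement = NodeKind::CompoundStatement;

// Range check that is only correct while statement kinds stay contiguous.
constexpr bool isStatementKind(NodeKind K) {
  return FirstStatement <= K && K <= LastStatement;
}

// The expression block currently holds a single kind.
constexpr bool isExpressionKind(NodeKind K) {
  return K == NodeKind::UnknownExpression;
}

static_assert(isStatementKind(NodeKind::WhileStatement), "inside the block");
static_assert(!isStatementKind(NodeKind::UnknownExpression), "outside it");
```

Inserting a new statement kind outside the block would silently break the range check, which is why the grouping has to follow the class hierarchy.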

ilya-biryukov added a child revision: D64573: [Syntax] Allow to mutate syntax trees. Jul 11 2019, 9:44 AM

sammccall accepted this revision. Aug 5 2019, 4:20 AM

sammccall added inline comments.

clang/include/clang/Tooling/Syntax/Nodes.h
24	Can you add a comment here saying the ordering/blocks must correspond to the Node inheritance hierarchy? This is kind of common knowledge, but I think this is normally handled by tablegen.
79	As discussed offline, there are some options about how abstract/concrete these roles should be. e.g. for a list of function args, this could be FunctionOpenParen/FunctionArgExpr/FunctionArgComma/FunctionCloseParen (specific) <-> OpenParen/Arg/Comma/CloseParen <-> Open/Item/Separator/Close. The more specific ones are somewhat redundant with the parent/child type (but easy to assign systematically), and the more generic ones are more orthogonal (but require more design and may be hard to always make consistent). The concrete advantage of the generic roles is being able to write code like `getTrailingSemicolon(Tree)` or `findLoopBreak(Stmt)` or `removeListItem(Tree*, int)` in a fairly generic way, without resorting to adding a `Loop` base class or handling each case with separate code. This is up to you, though.
130	First: yes, let's not do this now. Second: I'm wary of using standard is-a inheritance to model more than alternation in the grammar. That is, ForStatement is-a Statement is fine, ForStatement is-a LoopyStatement is suspect. This is to do with the fact that LoopyStatement is-a Statement seems obvious, and we may end up with diamond-shaped inheritance and some conceptual confusion. This goes for all traits that aren't natural tree-shaped inheritance: HasTrailingSemicolon, LoopyStatement, ... I think there are two concerns here: (a) we want to be able to get the trailing-semicolon if it exists; (b) we want to be able to check if the trailing-semicolon is expected, including via its static type. One way to do this (not the only one...):
	// generic helper, or callers could even write this directly
	Optional<Leaf*> trailingSemi(Tree *Node) { return firstElement(Node->Children<Leaf>(NodeRole::TrailingSemi)); }
	// mixin for trailing semi support. Note: doesn't inherit Statement!
	// maybe need/want some CRTP magic
	class TrailingSemicolon { Optional<Leaf*> trailingSemi() const { return trailingSemi((const Node*)this); } }
	// Gets the trailingSemi() accessor.
	ExprStmt : public Statement, TrailingSemicolon { ... }
224	Agree with this strategy, and the fact that it doesn't need to be documented on every node/occurrence. But it should definitely be documented somewhere at a high level! (With clang AST, this sort of thing feels like tribal knowledge)
265	nullable, marked somehow. Optional<Expression*> is tempting as a systematic and hard-to-ignore way of documenting that. And it reflects the fact that there are three conceptual states for children: present, legally missing, brokenly missing. At the same time, I'm not sure how to feel about the fact that in practice this can't be present but null, and the fact that other non-optional pointers can be null.
clang/lib/Tooling/Syntax/BuildTree.cpp
99	since Expr : Stmt, we need to be a bit wary of overloading based on static type. It's tempting to say it's correct here: if we statically know E is an Expr, then maybe it's never correct to consume the semicolon. But is the converse true? e.g. if we're traversing using RAV and call getRange() in visitstmt... (The alternatives seem to be a) removing the expr version of the function, and having the stmt version take a `bool ConsumeSemi` or b) change the stmt version to have (dynamic) expr behave like the expr overload, and handle it specially when forming exprstmt. More verbose, genuinely conflicted here)
270	maybe group with corresponding `WalkUpFromCXXForRangeStmt`? (Could also group all `Traverse*` together if you prefer. Current ordering seems a little random)
272	nit: RAV?
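
The trailing-semicolon suggestion in the comment on line 130 of Nodes.h could look roughly like the sketch below: a generic role lookup plus a CRTP mixin that does not derive from `Statement`, so no diamond appears. The `Tree` here is a toy that stores leaf tokens as strings, and `findTrailingSemi`, `Leaves` and `findLeaf` are names invented for this example; none of it is the real `clang::syntax` API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum class NodeRole { Unknown, IntroducerKeyword, TrailingSemi };

// Toy tree: each node stores (role, token text) pairs for its leaf children.
struct Tree {
  virtual ~Tree() = default;
  std::vector<std::pair<NodeRole, std::string>> Leaves;

  // Returns the first leaf with role R, or nullptr if there is none.
  const std::string *findLeaf(NodeRole R) const {
    for (const auto &L : Leaves)
      if (L.first == R)
        return &L.second;
    return nullptr;
  }
};

// Generic helper: usable on any tree, regardless of its static type.
inline const std::string *findTrailingSemi(const Tree &T) {
  return T.findLeaf(NodeRole::TrailingSemi);
}

struct Statement : Tree {};

// CRTP mixin that adds the accessor. It deliberately does not derive from
// Statement, which avoids diamond-shaped inheritance.
template <typename Derived> struct TrailingSemicolon {
  const std::string *trailingSemi() const {
    return findTrailingSemi(static_cast<const Derived &>(*this));
  }
};

// Gets the trailingSemi() accessor through the mixin.
struct ExpressionStatement : Statement,
                             TrailingSemicolon<ExpressionStatement> {};

// Has no accessor; the generic helper still works and reports "none".
struct WhileStatement : Statement {};
```

The static type tells callers whether a semicolon is expected (the accessor exists), while the generic helper covers code that only has a `Tree`.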

This revision is now accepted and ready to land. Aug 5 2019, 4:20 AM

Group Traverse* and Walk* together
s/RAT/RAV
Add a comment about nullability of the accessors
Name function for consuming statements and expressions differently

clang/include/clang/Tooling/Syntax/Nodes.h
79	I definitely agree that writing generic functions is simpler with the proposed approach. However, I am aiming for safer APIs here, albeit less generic. E.g. we'll have something like `removeFunctionArgument(ArgumentList, int)` and `removeInitializer(InitializerList, int)` rather than `removeListItem(Tree*, int)` in the public API. Reasons are discoverability of the operations for particular node types. Generic functions might still make sense as an implementation detail to share the code. I'll keep as is for now, but will keep the suggestion in mind.
224	Added a corresponding comment to the file header.
265	Having `Optional<Expression*>` models the problem space better, but is much harder to use on the client side. I'd keep as is; the file comment explains that one should assume all accessors can return null. Updated the comment here to indicate both `return;` and `return <expr>;` are represented by this node.
clang/lib/Tooling/Syntax/BuildTree.cpp
99	Using two functions with different names now.
270	Went for grouping `Traverse` and `Walk` together. Normally would also choose to put related methods (i.e. `WalkUpFromX` and `TraverseX`) together, but they have totally different meaning here: `TraverseX` creates a workaround for suboptimal RAV traversals and `WalkUpFromX` actually builds the syntax tree.
272	Right, thanks. `RAT` was funnier, though. Even if incorrect...
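
The concern behind the comment on line 99 of BuildTree.cpp can be reproduced in a few lines: when `Expr` derives from `Stmt`, an overload set picks its function from the static type of the argument, so the same object can take either path. The toy classes and all four function names below are assumptions for illustration only and are not the functions in the patch.

```cpp
#include <cassert>
#include <string>

// Toy AST in which, as in clang, Expr derives from Stmt.
struct Stmt {
  virtual ~Stmt() = default;
};
struct Expr : Stmt {};

// Overloads are resolved on the STATIC type of the argument.
inline std::string consume(const Stmt *) { return "stmt+semi"; }
inline std::string consume(const Expr *) { return "expr"; }

// Distinct names make the caller state its intent explicitly, whatever the
// static type of the pointer happens to be.
inline std::string consumeStatement(const Stmt *) { return "stmt+semi"; }
inline std::string consumeExpression(const Expr *) { return "expr"; }

// Calls the overload set through a base-class pointer, as a visitor
// callback that receives a Stmt* would.
inline std::string consumeViaBase(const Stmt *S) { return consume(S); }
```

With the overloads, `consume(&E)` and `consumeViaBase(&E)` disagree for the same `Expr E`; with two differently named functions the choice is visible at every call site.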

Closed by commit rG58fa50f43701: [Syntax] Add nodes for most common statements (authored by ilya-biryukov). Nov 6 2019, 2:04 AM

This revision was automatically updated to reflect the committed changes.

Build result: fail - 59843 tests passed, 21 failed and 768 were skipped.

failed: lld.ELF/linkerscript/filename-spec.s
failed: Clang.Index/index-module-with-vfs.m
failed: Clang.Modules/double-quotes.m
failed: Clang.Modules/framework-public-includes-private.m
failed: Clang.VFS/external-names.c
failed: Clang.VFS/framework-import.m
failed: Clang.VFS/implicit-include.c
failed: Clang.VFS/include-mixed-real-and-virtual.c
failed: Clang.VFS/include-real-from-virtual.c
failed: Clang.VFS/include-virtual-from-real.c
failed: Clang.VFS/include.c
failed: Clang.VFS/incomplete-umbrella.m
failed: Clang.VFS/module-import.m
failed: Clang.VFS/module_missing_vfs.m
failed: Clang.VFS/real-path-found-first.m
failed: Clang.VFS/relative-path.c
failed: Clang.VFS/test_nonmodular.c
failed: Clang.VFS/umbrella-framework-import-skipnonexist.m
failed: Clang.VFS/vfsroot-include.c
failed: Clang.VFS/vfsroot-module.m
failed: Clang.VFS/vfsroot-with-overlay.c

Log files: console-log.txt, CMakeCache.txt

Harbormaster failed remote builds in B40564: Diff 228023! Nov 6 2019, 2:13 AM

Revision Contents

Path	Size
clang/include/clang/Tooling/Syntax/Nodes.h	229 lines
clang/lib/Tooling/Syntax/BuildTree.cpp	205 lines
clang/lib/Tooling/Syntax/Nodes.cpp	185 lines
clang/lib/Tooling/Syntax/Tree.cpp	11 lines
clang/unittests/Tooling/Syntax/TreeTest.cpp	317 lines

Diff 208995

clang/include/clang/Tooling/Syntax/Nodes.h

	#include "clang/Tooling/Syntax/Tree.h"			#include "clang/Tooling/Syntax/Tree.h"
	#include "llvm/ADT/ArrayRef.h"			#include "llvm/ADT/ArrayRef.h"
	#include "llvm/ADT/StringRef.h"			#include "llvm/ADT/StringRef.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"

	namespace clang {			namespace clang {
	namespace syntax {			namespace syntax {

	/// A kind of a syntax node, used for implementing casts.			/// A kind of a syntax node, used for implementing casts.
	enum class NodeKind : uint16_t {			enum class NodeKind : uint16_t {
	Leaf,			Leaf,
	TranslationUnit,			TranslationUnit,
	TopLevelDeclaration,			TopLevelDeclaration,

				// Expressions
				UnknownExpression,

				// Statements
				UnknownStatement,
				DeclarationStatement,
				EmptyStatement,
				SwitchStatement,
				CaseStatement,
				DefaultStatement,
				IfStatement,
				ForStatement,
				WhileStatement,
				ContinueStatement,
				BreakStatement,
				ReturnStatement,
				RangeBasedForStatement,
				ExpressionStatement,
	CompoundStatement			CompoundStatement
	};			};
	/// For debugging purposes.			/// For debugging purposes.
	llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, NodeKind K);			llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, NodeKind K);

	/// A relation between a parent and child node. Used for implementing accessors.			/// A relation between a parent and child node, e.g. 'left-hand-side of a binary
				/// expression'. Used for implementing accessors.
	enum class NodeRole : uint8_t {			enum class NodeRole : uint8_t {
	// A node without a parent.			// Roles common to multiple node kinds.
				/// A node without a parent
	Detached,			Detached,
	// Children of an unknown semantic nature, e.g. skipped tokens, comments.			/// Children of an unknown semantic nature, e.g. skipped tokens, comments.
	Unknown,			Unknown,
	// FIXME: should this be shared for all other nodes with braces, e.g. init			/// An opening parenthesis in argument lists and blocks, e.g. '{', '(', etc.
	// lists?			OpenParen,
	CompoundStatement_lbrace,			/// A closing parenthesis in argument lists and blocks, e.g. '}', ')', etc.
	CompoundStatement_rbrace			CloseParen,
				/// A keyword that introduces some grammar construct, e.g. 'if', 'try', etc.
				IntroducerKeyword,
				/// An inner statement for those that have only a single child of kind
				/// statement, e.g. loop body for while, for, etc; inner statement for case,
				/// default, etc.
				BodyStatement,

				// Roles specific to particular node kinds.
				CaseStatement_value,
				IfStatement_thenStatement,
				IfStatement_elseKeyword,
				IfStatement_elseStatement,
				ReturnStatement_value,
				ExpressionStatement_expression,
				CompoundStatement_statement
	};			};
				/// For debugging purposes.
				llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, NodeRole R);

	/// A root node for a translation unit. Parent is always null.			/// A root node for a translation unit. Parent is always null.
	class TranslationUnit final : public Tree {			class TranslationUnit final : public Tree {
	public:			public:
	TranslationUnit() : Tree(NodeKind::TranslationUnit) {}			TranslationUnit() : Tree(NodeKind::TranslationUnit) {}
	static bool classof(const Node *N) {			static bool classof(const Node *N) {
	return N->kind() == NodeKind::TranslationUnit;			return N->kind() == NodeKind::TranslationUnit;
	}			}
	};			};

	/// FIXME: this node is temporary and will be replaced with nodes for various			/// FIXME: this node is temporary and will be replaced with nodes for various
	/// 'declarations' and 'declarators' from the C/C++ grammar			/// 'declarations' and 'declarators' from the C/C++ grammar
	///			///
	/// Represents any top-level declaration. Only there to give the syntax tree a			/// Represents any top-level declaration. Only there to give the syntax tree a
	/// bit of structure until we implement syntax nodes for declarations and			/// bit of structure until we implement syntax nodes for declarations and
	/// declarators.			/// declarators.
	class TopLevelDeclaration final : public Tree {			class TopLevelDeclaration final : public Tree {
	public:			public:
	TopLevelDeclaration() : Tree(NodeKind::TopLevelDeclaration) {}			TopLevelDeclaration() : Tree(NodeKind::TopLevelDeclaration) {}
	static bool classof(const Node *N) {			static bool classof(const Node *N) {
	return N->kind() == NodeKind::TopLevelDeclaration;			return N->kind() == NodeKind::TopLevelDeclaration;
	}			}
	};			};

				/// A base class for all expressions. Note that expressions are not statements,
				/// even though they are in clang.
				class Expression : public Tree {
				public:
				Expression(NodeKind K) : Tree(K) {}
				static bool classof(const Node *N) {
				return NodeKind::UnknownExpression <= N->kind() &&
				N->kind() <= NodeKind::UnknownExpression;
				}
				};

				/// An expression of an unknown kind, i.e. one not currently handled by the
				/// syntax tree.
				class UnknownExpression final : public Expression {
				public:
				UnknownExpression() : Expression(NodeKind::UnknownExpression) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::UnknownExpression;
				}
				};

	/// An abstract node for C++ statements, e.g. 'while', 'if', etc.			/// An abstract node for C++ statements, e.g. 'while', 'if', etc.
				/// FIXME: add accessors for semicolon of statements that have it.
	class Statement : public Tree {			class Statement : public Tree {
	public:			public:
	Statement(NodeKind K) : Tree(K) {}			Statement(NodeKind K) : Tree(K) {}
	static bool classof(const Node *N) {			static bool classof(const Node *N) {
	return NodeKind::CompoundStatement <= N->kind() &&			return NodeKind::UnknownStatement <= N->kind() &&
	N->kind() <= NodeKind::CompoundStatement;			N->kind() <= NodeKind::CompoundStatement;
	}			}
	};			};

				/// A statement of an unknown kind, i.e. one not currently handled by the syntax
				/// tree.
				class UnknownStatement final : public Statement {
				public:
				UnknownStatement() : Statement(NodeKind::UnknownStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::UnknownStatement;
				}
				};

				/// E.g. 'int a, b = 10;'
				class DeclarationStatement final : public Statement {
				public:
				DeclarationStatement() : Statement(NodeKind::DeclarationStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::DeclarationStatement;
				}
				};

				/// The no-op statement, i.e. ';'.
				class EmptyStatement final : public Statement {
				public:
				EmptyStatement() : Statement(NodeKind::EmptyStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::EmptyStatement;
				}
				};

				/// switch (<cond>) <body>
				class SwitchStatement final : public Statement {
				public:
				SwitchStatement() : Statement(NodeKind::SwitchStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::SwitchStatement;
				}
				syntax::Leaf *switchKeyword();
				syntax::Statement *body();
				};

				/// case <value>: <body>
				class CaseStatement final : public Statement {
				public:
				CaseStatement() : Statement(NodeKind::CaseStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::CaseStatement;
				}
				syntax::Leaf *caseKeyword();
				syntax::Expression *value();
				syntax::Statement *body();
				};

				/// default: <body>
				class DefaultStatement final : public Statement {
				public:
				DefaultStatement() : Statement(NodeKind::DefaultStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::DefaultStatement;
				}
				syntax::Leaf *defaultKeyword();
				syntax::Statement *body();
				};

				/// if (cond) <then-statement> else <else-statement>
				/// FIXME: add condition that models 'expression or variable declaration'
				class IfStatement final : public Statement {
				public:
				IfStatement() : Statement(NodeKind::IfStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::IfStatement;
				}
				syntax::Leaf *ifKeyword();
				syntax::Statement *thenStatement();
				syntax::Leaf *elseKeyword();
				syntax::Statement *elseStatement();
				};

				/// for (<init>; <cond>; <increment>) <body>
				class ForStatement final : public Statement {
				public:
				ForStatement() : Statement(NodeKind::ForStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::ForStatement;
				}
				syntax::Leaf *forKeyword();
				syntax::Statement *body();
				};

				/// while (<cond>) <body>
				class WhileStatement final : public Statement {
				public:
				WhileStatement() : Statement(NodeKind::WhileStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::WhileStatement;
				}
				syntax::Leaf *whileKeyword();
				syntax::Statement *body();
				};

				/// continue;
				class ContinueStatement final : public Statement {
				public:
				ContinueStatement() : Statement(NodeKind::ContinueStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::ContinueStatement;
				}
				syntax::Leaf *continueKeyword();
				};

				/// break;
				class BreakStatement final : public Statement {
				public:
				BreakStatement() : Statement(NodeKind::BreakStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::BreakStatement;
				}
				syntax::Leaf *breakKeyword();
				};

				/// return <expr>;
				class ReturnStatement final : public Statement {
				public:
				ReturnStatement() : Statement(NodeKind::ReturnStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::ReturnStatement;
				}
				syntax::Leaf *returnKeyword();
				syntax::Expression *value();
				sammccall: nullable, marked somehow. `Optional<Expression*>` is tempting as a systematic and hard-to-ignore way of documenting that. And it reflects the fact that there are three conceptual states for children: present, legally missing, brokenly missing. At the same time, I'm not sure how to feel about the fact that in practice this can't be present but null, and the fact that other non-optional pointers can be null.
				ilya-biryukov (Author): Having `Optional<Expression*>` models the problem space better, but is much harder to use on the client side. I'd keep as is, the file comment explains that one should assume all accessors can return null. Update the comment here to indicate both `return;` and `return <expr>;` are represented by this node.
				};
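The trade-off discussed in the `ReturnStatement::value()` thread can be sketched in isolation. This is only an illustration under stated assumptions: it uses `std::optional` in place of `llvm::Optional`, a hypothetical `Expression` struct, and one possible encoding of the three states (empty optional for a legally missing child, engaged-but-null for a child lost to recovery). The thread itself does not fix that encoding, and the patch keeps the plain pointer.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for syntax::Expression.
struct Expression { std::string Text; };

// The three conceptual states named in the review.
enum class ChildState { Present, LegallyMissing, BrokenlyMissing };

// Option A (assumed encoding): an optional pointer can tell all three apart.
ChildState classify(std::optional<Expression *> Value) {
  if (!Value)
    return ChildState::LegallyMissing; // e.g. `return;`
  if (!*Value)
    return ChildState::BrokenlyMissing; // child expected but not recovered
  return ChildState::Present;
}

// Option B (what the patch keeps): a plain pointer is simpler to use, but
// null does not say why the child is missing.
bool hasValue(const Expression *Value) { return Value != nullptr; }
```

With option B a client writes a single null check, which is the ease-of-use argument made in the reply; the cost is that `return;` and a broken `return <expr>;` look the same through the accessor.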

				/// for (<decl> : <init>) <body>
				class RangeBasedForStatement final : public Statement {
				public:
				RangeBasedForStatement() : Statement(NodeKind::RangeBasedForStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::RangeBasedForStatement;
				}
				syntax::Leaf *forKeyword();
				syntax::Statement *body();
				};

				/// Expression in a statement position, e.g. function calls inside compound
				/// statements or inside a loop body.
				class ExpressionStatement final : public Statement {
				public:
				ExpressionStatement() : Statement(NodeKind::ExpressionStatement) {}
				static bool classof(const Node *N) {
				return N->kind() == NodeKind::ExpressionStatement;
				}
				syntax::Expression *expression();
				};

	/// { statement1; statement2; … }			/// { statement1; statement2; … }
	class CompoundStatement final : public Statement {			class CompoundStatement final : public Statement {
	public:			public:
	CompoundStatement() : Statement(NodeKind::CompoundStatement) {}			CompoundStatement() : Statement(NodeKind::CompoundStatement) {}
	static bool classof(const Node *N) {			static bool classof(const Node *N) {
	return N->kind() == NodeKind::CompoundStatement;			return N->kind() == NodeKind::CompoundStatement;
	}			}
				sammccall: (any reason we can't already have the expr here?)
				ilya-biryukov (Author): Added a getter for it.
	syntax::Leaf *lbrace();			syntax::Leaf *lbrace();
				/// FIXME: use custom iterator instead of 'vector'.
				std::vector<syntax::Statement *> statements();
	syntax::Leaf *rbrace();			syntax::Leaf *rbrace();
	};			};

	} // namespace syntax			} // namespace syntax
	} // namespace clang			} // namespace clang
	#endif			#endif

clang/lib/Tooling/Syntax/BuildTree.cpp

Show All 21 Lines
#include "llvm/Support/Allocator.h"		#include "llvm/Support/Allocator.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/FormatVariadic.h"		#include "llvm/Support/FormatVariadic.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <map>		#include <map>

using namespace clang;		using namespace clang;

		static bool isImplicitExpr(clang::Expr *E) { return E->IgnoreImplicit() != E; }

/// A helper class for constructing the syntax tree while traversing a clang		/// A helper class for constructing the syntax tree while traversing a clang
/// AST.		/// AST.
///		///
/// At each point of the traversal we maintain a list of pending nodes.		/// At each point of the traversal we maintain a list of pending nodes.
/// Initially all tokens are added as pending nodes. When processing a clang AST		/// Initially all tokens are added as pending nodes. When processing a clang AST
/// node, the clients need to:		/// node, the clients need to:
/// - create a corresponding syntax node,		/// - create a corresponding syntax node,
/// - assign roles to all pending child nodes with 'markChild' and		/// - assign roles to all pending child nodes with 'markChild' and
Show All 9 Lines	public:
TreeBuilder(syntax::Arena &Arena) : Arena(Arena), Pending(Arena) {}		TreeBuilder(syntax::Arena &Arena) : Arena(Arena), Pending(Arena) {}

llvm::BumpPtrAllocator &allocator() { return Arena.allocator(); }		llvm::BumpPtrAllocator &allocator() { return Arena.allocator(); }

/// Populate children for \p New node, assuming it covers tokens from \p		/// Populate children for \p New node, assuming it covers tokens from \p
/// Range.		/// Range.
void foldNode(llvm::ArrayRef<syntax::Token> Range, syntax::Tree *New);		void foldNode(llvm::ArrayRef<syntax::Token> Range, syntax::Tree *New);

		/// Mark the \p Child node with a corresponding \p Role. All marked children
		/// should be consumed by foldNode.
		/// (!) this overload should only be called for expressions in a statement
		/// position, it will wrap expressions into expression statement.
		void markChild(Stmt *Child, NodeRole Role);
		/// It is important to call this overload for expressions in non-statement
		/// position to avoid wrapping into expression statement.
		void markChild(Expr *Child, NodeRole Role);

/// Set role for a token starting at \p Loc.		/// Set role for a token starting at \p Loc.
void markChildToken(SourceLocation Loc, tok::TokenKind Kind, NodeRole R);		void markChildToken(SourceLocation Loc, tok::TokenKind Kind, NodeRole R);

/// Finish building the tree and consume the root node.		/// Finish building the tree and consume the root node.
syntax::TranslationUnit *finalize() && {		syntax::TranslationUnit *finalize() && {
auto Tokens = Arena.tokenBuffer().expandedTokens();		auto Tokens = Arena.tokenBuffer().expandedTokens();
// Build the root of the tree, consuming all the children.		// Build the root of the tree, consuming all the children.
Pending.foldChildren(Tokens,		Pending.foldChildren(Tokens,
Show All 12 Lines	llvm::ArrayRef<syntax::Token> getRange(SourceLocation First,
assert(Last.isValid());		assert(Last.isValid());
assert(First == Last \|\|		assert(First == Last \|\|
Arena.sourceManager().isBeforeInTranslationUnit(First, Last));		Arena.sourceManager().isBeforeInTranslationUnit(First, Last));
return llvm::makeArrayRef(findToken(First), std::next(findToken(Last)));		return llvm::makeArrayRef(findToken(First), std::next(findToken(Last)));
}		}
llvm::ArrayRef<syntax::Token> getRange(const Decl *D) const {		llvm::ArrayRef<syntax::Token> getRange(const Decl *D) const {
return getRange(D->getBeginLoc(), D->getEndLoc());		return getRange(D->getBeginLoc(), D->getEndLoc());
}		}
		llvm::ArrayRef<syntax::Token> getRange(const Expr *E) const {
		return getRange(E->getBeginLoc(), E->getEndLoc());
		}
		/// Find the adjusted range for the statement, consuming the trailing
		/// semicolon when needed.
llvm::ArrayRef<syntax::Token> getRange(const Stmt *S) const {		llvm::ArrayRef<syntax::Token> getRange(const Stmt *S) const {
		sammccall: Since Expr : Stmt, we need to be a bit wary of overloading based on static type. It's tempting to say it's correct here: if we statically know E is an Expr, then maybe it's never correct to consume the semicolon. But is the converse true? E.g. if we're traversing using RAV and call getRange() in VisitStmt... The alternatives seem to be (a) removing the Expr version of the function and having the Stmt version take a `bool ConsumeSemi`, or (b) changing the Stmt version to have a (dynamic) Expr behave like the Expr overload, and handling it specially when forming the expression statement. More verbose; genuinely conflicted here.
		ilya-biryukov (Author): Using two functions with different names now.
return getRange(S->getBeginLoc(), S->getEndLoc());		auto Tokens = getRange(S->getBeginLoc(), S->getEndLoc());
		if (isa<CompoundStmt>(S))
		return Tokens;

		// Some statements miss a trailing semicolon, e.g. 'return', 'continue' and
		// all statements that end with those. Consume this semicolon here.
		//
		// (!) statements never consume 'eof', so looking at the next token is ok.
		if (Tokens.back().kind() != tok::semi && Tokens.end()->kind() == tok::semi)
		return llvm::makeArrayRef(Tokens.begin(), Tokens.end() + 1);
		return Tokens;
}		}
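The concern raised in the review thread on `getRange(const Stmt *)` is about C++ overload resolution, which looks only at the static type of the argument. A minimal, self-contained sketch of the pitfall follows; `Stmt` and `Expr` here are toy stand-ins for the clang classes, and `rangeKind` and `rangeKindDynamic` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for clang::Stmt and clang::Expr (Expr derives from Stmt).
struct Stmt { virtual ~Stmt() = default; };
struct Expr : Stmt {};

// Overloads chosen by the STATIC type of the argument.
std::string rangeKind(const Stmt *) { return "statement"; }
std::string rangeKind(const Expr *) { return "expression"; }

// Alternative (b) from the thread: decide on the DYNAMIC type instead.
std::string rangeKindDynamic(const Stmt *S) {
  if (dynamic_cast<const Expr *>(S))
    return "expression";
  return "statement";
}
```

Given `Expr E; const Stmt *S = &E;`, the call `rangeKind(&E)` picks the `Expr` overload, but `rangeKind(S)` picks the `Stmt` overload even though `S` points at an `Expr`. That is the situation a visitor callback taking `Stmt *` would be in, and it is why the reply switches to two functions with different names.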

private:		private:
/// Finds a token starting at \p L. The token must exist.		/// Finds a token starting at \p L. The token must exist.
const syntax::Token *findToken(SourceLocation L) const;		const syntax::Token *findToken(SourceLocation L) const;

/// A collection of trees covering the input tokens.		/// A collection of trees covering the input tokens.
/// When created, each tree corresponds to a single token in the file.		/// When created, each tree corresponds to a single token in the file.
▲ Show 20 Lines • Show All 125 Lines • ▼ Show 20 Lines	bool WalkUpFromTranslationUnitDecl(TranslationUnitDecl *TU) {
// (!) we do not want to call VisitDecl(), the declaration for translation		// (!) we do not want to call VisitDecl(), the declaration for translation
// unit is built by finalize().		// unit is built by finalize().
return true;		return true;
}		}

bool WalkUpFromCompoundStmt(CompoundStmt *S) {		bool WalkUpFromCompoundStmt(CompoundStmt *S) {
using NodeRole = syntax::NodeRole;		using NodeRole = syntax::NodeRole;

Builder.markChildToken(S->getLBracLoc(), tok::l_brace,		Builder.markChildToken(S->getLBracLoc(), tok::l_brace, NodeRole::OpenParen);
NodeRole::CompoundStatement_lbrace);		for (auto *Child : S->body())
		Builder.markChild(Child, NodeRole::CompoundStatement_statement);
Builder.markChildToken(S->getRBracLoc(), tok::r_brace,		Builder.markChildToken(S->getRBracLoc(), tok::r_brace,
NodeRole::CompoundStatement_rbrace);		NodeRole::CloseParen);

Builder.foldNode(Builder.getRange(S),		Builder.foldNode(Builder.getRange(S),
new (allocator()) syntax::CompoundStatement);		new (allocator()) syntax::CompoundStatement);
return true;		return true;
}		}

		// Some statements are not yet handled by syntax trees.
		bool WalkUpFromStmt(Stmt *S) {
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::UnknownStatement);
		return true;
		}

		bool TraverseCXXForRangeStmt(CXXForRangeStmt *S) {
		sammccall: Maybe group with the corresponding `WalkUpFromCXXForRangeStmt`? (Could also group all `Traverse*` together if you prefer. Current ordering seems a little random.)
		ilya-biryukov (Author): Went for grouping `Traverse*` and `Walk*` together. Normally I would also choose to put related methods (i.e. `WalkUpFromX` and `TraverseX`) together, but they have totally different meanings here: `TraverseX` creates a workaround for suboptimal RAV traversals and `WalkUpFromX` actually builds the syntax tree.
		// We override to traverse range initializer as VarDecl.
		// RAT traverses it as a statement, we produce invalid node kinds in that
		sammccall: nit: RAV?
		ilya-biryukov (Author): Right, thanks. `RAT` was funnier, though. Even if incorrect...
		// case.
		// FIXME: should do this in RAT instead?
		if (S->getInit() && !TraverseStmt(S->getInit()))
		return false;
		if (S->getLoopVariable() && !TraverseDecl(S->getLoopVariable()))
		return false;
		if (S->getRangeInit() && !TraverseStmt(S->getRangeInit()))
		return false;
		if (S->getBody() && !TraverseStmt(S->getBody()))
		return false;
		return true;
		}

		// Some expressions are not yet handled by syntax trees.
		bool WalkUpFromExpr(Expr *E) {
		assert(!isImplicitExpr(E) && "should be handled by TraverseStmt");
		Builder.foldNode(Builder.getRange(E),
		new (allocator()) syntax::UnknownExpression);
		return true;
		}

		bool TraverseStmt(Stmt *S) {
		if (auto *E = llvm::dyn_cast_or_null<Expr>(S)) {
		// (!) do not recurse into subexpressions.
		// we do not have syntax trees for expressions yet, so we only want to see
		// the first top-level expression.
		return WalkUpFromExpr(E->IgnoreImplicit());
		}
		return RecursiveASTVisitor::TraverseStmt(S);
		}

		// The code below is very regular, it could even be generated with some
		// preprocessor magic. We merely assign roles to the corresponding children
		// and fold resulting nodes.
		bool WalkUpFromDeclStmt(DeclStmt *S) {
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::DeclarationStatement);
		return true;
		}

		bool WalkUpFromNullStmt(NullStmt *S) {
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::EmptyStatement);
		return true;
		}

		bool WalkUpFromSwitchStmt(SwitchStmt *S) {
		Builder.markChildToken(S->getSwitchLoc(), tok::kw_switch,
		syntax::NodeRole::IntroducerKeyword);
		Builder.markChild(S->getBody(), syntax::NodeRole::BodyStatement);
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::SwitchStatement);
		return true;
		}

		bool WalkUpFromCaseStmt(CaseStmt *S) {
		Builder.markChildToken(S->getKeywordLoc(), tok::kw_case,
		syntax::NodeRole::IntroducerKeyword);
		Builder.markChild(S->getLHS(), syntax::NodeRole::CaseStatement_value);
		Builder.markChild(S->getSubStmt(), syntax::NodeRole::BodyStatement);
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::CaseStatement);
		return true;
		}

		bool WalkUpFromDefaultStmt(DefaultStmt *S) {
		Builder.markChildToken(S->getKeywordLoc(), tok::kw_default,
		syntax::NodeRole::IntroducerKeyword);
		Builder.markChild(S->getSubStmt(), syntax::NodeRole::BodyStatement);
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::DefaultStatement);
		return true;
		}

		bool WalkUpFromIfStmt(IfStmt *S) {
		Builder.markChildToken(S->getIfLoc(), tok::kw_if,
		syntax::NodeRole::IntroducerKeyword);
		Builder.markChild(S->getThen(),
		syntax::NodeRole::IfStatement_thenStatement);
		Builder.markChildToken(S->getElseLoc(), tok::kw_else,
		syntax::NodeRole::IfStatement_elseKeyword);
		Builder.markChild(S->getElse(),
		syntax::NodeRole::IfStatement_elseStatement);
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::IfStatement);
		return true;
		}

		bool WalkUpFromForStmt(ForStmt *S) {
		Builder.markChildToken(S->getForLoc(), tok::kw_for,
		syntax::NodeRole::IntroducerKeyword);
		Builder.markChild(S->getBody(), syntax::NodeRole::BodyStatement);
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::ForStatement);
		return true;
		}

		bool WalkUpFromWhileStmt(WhileStmt *S) {
		Builder.markChildToken(S->getWhileLoc(), tok::kw_while,
		syntax::NodeRole::IntroducerKeyword);
		Builder.markChild(S->getBody(), syntax::NodeRole::BodyStatement);
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::WhileStatement);
		return true;
		}

		bool WalkUpFromContinueStmt(ContinueStmt *S) {
		Builder.markChildToken(S->getContinueLoc(), tok::kw_continue,
		syntax::NodeRole::IntroducerKeyword);
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::ContinueStatement);
		return true;
		}

		bool WalkUpFromBreakStmt(BreakStmt *S) {
		Builder.markChildToken(S->getBreakLoc(), tok::kw_break,
		syntax::NodeRole::IntroducerKeyword);
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::BreakStatement);
		return true;
		}

		bool WalkUpFromReturnStmt(ReturnStmt *S) {
		Builder.markChildToken(S->getReturnLoc(), tok::kw_return,
		syntax::NodeRole::IntroducerKeyword);
		Builder.markChild(S->getRetValue(),
		syntax::NodeRole::ReturnStatement_value);
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::ReturnStatement);
		return true;
		}

		bool WalkUpFromCXXForRangeStmt(CXXForRangeStmt *S) {
		Builder.markChildToken(S->getForLoc(), tok::kw_for,
		syntax::NodeRole::IntroducerKeyword);
		Builder.markChild(S->getBody(), syntax::NodeRole::BodyStatement);
		Builder.foldNode(Builder.getRange(S),
		new (allocator()) syntax::RangeBasedForStatement);
		return true;
		}

private:		private:
/// A small helper to save some typing.		/// A small helper to save some typing.
llvm::BumpPtrAllocator &allocator() { return Builder.allocator(); }		llvm::BumpPtrAllocator &allocator() { return Builder.allocator(); }

syntax::TreeBuilder &Builder;		syntax::TreeBuilder &Builder;
const LangOptions &LangOpts;		const LangOptions &LangOpts;
};		};
} // namespace		} // namespace

void syntax::TreeBuilder::foldNode(llvm::ArrayRef<syntax::Token> Range,		void syntax::TreeBuilder::foldNode(llvm::ArrayRef<syntax::Token> Range,
syntax::Tree *New) {		syntax::Tree *New) {
Pending.foldChildren(Range, New);		Pending.foldChildren(Range, New);
}		}

void syntax::TreeBuilder::markChildToken(SourceLocation Loc,		void syntax::TreeBuilder::markChildToken(SourceLocation Loc,
tok::TokenKind Kind, NodeRole Role) {		tok::TokenKind Kind, NodeRole Role) {
if (Loc.isInvalid())		if (Loc.isInvalid())
return;		return;
Pending.assignRole(*findToken(Loc), Role);		Pending.assignRole(*findToken(Loc), Role);
}		}

		void syntax::TreeBuilder::markChild(Stmt *Child, NodeRole Role) {
		if (!Child)
		return;

		auto Range = getRange(Child);
		// This is an expression in a statement position, consume the trailing
		// semicolon and form an 'ExpressionStatement' node.
		if (auto *E = dyn_cast<Expr>(Child)) {
		Pending.assignRole(getRange(E), NodeRole::ExpressionStatement_expression);
		// (!) 'getRange(Stmt)' ensures this already covers a trailing semicolon.
		Pending.foldChildren(Range, new (allocator()) syntax::ExpressionStatement);
		}
		Pending.assignRole(Range, Role);
		}

		void syntax::TreeBuilder::markChild(Expr *Child, NodeRole Role) {
		Pending.assignRole(getRange(Child), Role);
		}

const syntax::Token *syntax::TreeBuilder::findToken(SourceLocation L) const {		const syntax::Token *syntax::TreeBuilder::findToken(SourceLocation L) const {
auto Tokens = Arena.tokenBuffer().expandedTokens();		auto Tokens = Arena.tokenBuffer().expandedTokens();
auto &SM = Arena.sourceManager();		auto &SM = Arena.sourceManager();
auto It = llvm::partition_point(Tokens, [&](const syntax::Token &T) {		auto It = llvm::partition_point(Tokens, [&](const syntax::Token &T) {
return SM.isBeforeInTranslationUnit(T.location(), L);		return SM.isBeforeInTranslationUnit(T.location(), L);
});		});
assert(It != Tokens.end());		assert(It != Tokens.end());
assert(It->location() == L);		assert(It->location() == L);
Show All 9 Lines

clang/lib/Tooling/Syntax/Nodes.cpp

	Show All 12 Lines
	llvm::raw_ostream &syntax::operator<<(llvm::raw_ostream &OS, NodeKind K) {			llvm::raw_ostream &syntax::operator<<(llvm::raw_ostream &OS, NodeKind K) {
	switch (K) {			switch (K) {
	case NodeKind::Leaf:			case NodeKind::Leaf:
	return OS << "Leaf";			return OS << "Leaf";
	case NodeKind::TranslationUnit:			case NodeKind::TranslationUnit:
	return OS << "TranslationUnit";			return OS << "TranslationUnit";
	case NodeKind::TopLevelDeclaration:			case NodeKind::TopLevelDeclaration:
	return OS << "TopLevelDeclaration";			return OS << "TopLevelDeclaration";
				case NodeKind::UnknownExpression:
				return OS << "UnknownExpression";
				case NodeKind::UnknownStatement:
				return OS << "UnknownStatement";
				case NodeKind::DeclarationStatement:
				return OS << "DeclarationStatement";
				case NodeKind::EmptyStatement:
				return OS << "EmptyStatement";
				case NodeKind::SwitchStatement:
				return OS << "SwitchStatement";
				case NodeKind::CaseStatement:
				return OS << "CaseStatement";
				case NodeKind::DefaultStatement:
				return OS << "DefaultStatement";
				case NodeKind::IfStatement:
				return OS << "IfStatement";
				case NodeKind::ForStatement:
				return OS << "ForStatement";
				case NodeKind::WhileStatement:
				return OS << "WhileStatement";
				case NodeKind::ContinueStatement:
				return OS << "ContinueStatement";
				case NodeKind::BreakStatement:
				return OS << "BreakStatement";
				case NodeKind::ReturnStatement:
				return OS << "ReturnStatement";
				case NodeKind::RangeBasedForStatement:
				return OS << "RangeBasedForStatement";
				case NodeKind::ExpressionStatement:
				return OS << "ExpressionStatement";
	case NodeKind::CompoundStatement:			case NodeKind::CompoundStatement:
	return OS << "CompoundStatement";			return OS << "CompoundStatement";
	}			}
	llvm_unreachable("unknown node kind");			llvm_unreachable("unknown node kind");
	}			}

				llvm::raw_ostream &syntax::operator<<(llvm::raw_ostream &OS, NodeRole R) {
				switch (R) {
				case syntax::NodeRole::Detached:
				return OS << "Detached";
				case syntax::NodeRole::Unknown:
				return OS << "Unknown";
				case syntax::NodeRole::OpenParen:
				return OS << "OpenParen";
				case syntax::NodeRole::CloseParen:
				return OS << "CloseParen";
				case syntax::NodeRole::IntroducerKeyword:
				return OS << "IntroducerKeyword";
				case syntax::NodeRole::BodyStatement:
				return OS << "BodyStatement";
				case syntax::NodeRole::CaseStatement_value:
				return OS << "CaseStatement_value";
				case syntax::NodeRole::IfStatement_thenStatement:
				return OS << "IfStatement_thenStatement";
				case syntax::NodeRole::IfStatement_elseKeyword:
				return OS << "IfStatement_elseKeyword";
				case syntax::NodeRole::IfStatement_elseStatement:
				return OS << "IfStatement_elseStatement";
				case syntax::NodeRole::ReturnStatement_value:
				return OS << "ReturnStatement_value";
				case syntax::NodeRole::ExpressionStatement_expression:
				return OS << "ExpressionStatement_expression";
				case syntax::NodeRole::CompoundStatement_statement:
				return OS << "CompoundStatement_statement";
				}
				llvm_unreachable("invalid role");
				}

				syntax::Leaf *syntax::SwitchStatement::switchKeyword() {
				return llvm::cast_or_null<syntax::Leaf>(
				findChild(syntax::NodeRole::IntroducerKeyword));
				}

				syntax::Statement *syntax::SwitchStatement::body() {
				return llvm::cast_or_null<syntax::Statement>(
				findChild(syntax::NodeRole::BodyStatement));
				}

				syntax::Leaf *syntax::CaseStatement::caseKeyword() {
				return llvm::cast_or_null<syntax::Leaf>(
				findChild(syntax::NodeRole::IntroducerKeyword));
				}

				syntax::Expression *syntax::CaseStatement::value() {
				return llvm::cast_or_null<syntax::Expression>(
				findChild(syntax::NodeRole::CaseStatement_value));
				}

				syntax::Statement *syntax::CaseStatement::body() {
				return llvm::cast_or_null<syntax::Statement>(
				findChild(syntax::NodeRole::BodyStatement));
				}

				syntax::Leaf *syntax::DefaultStatement::defaultKeyword() {
				return llvm::cast_or_null<syntax::Leaf>(
				findChild(syntax::NodeRole::IntroducerKeyword));
				}

				syntax::Statement *syntax::DefaultStatement::body() {
				return llvm::cast_or_null<syntax::Statement>(
				findChild(syntax::NodeRole::BodyStatement));
				}

				syntax::Leaf *syntax::IfStatement::ifKeyword() {
				return llvm::cast_or_null<syntax::Leaf>(
				findChild(syntax::NodeRole::IntroducerKeyword));
				}

				syntax::Statement *syntax::IfStatement::thenStatement() {
				return llvm::cast_or_null<syntax::Statement>(
				findChild(syntax::NodeRole::IfStatement_thenStatement));
				}

				syntax::Leaf *syntax::IfStatement::elseKeyword() {
				return llvm::cast_or_null<syntax::Leaf>(
				findChild(syntax::NodeRole::IfStatement_elseKeyword));
				}

				syntax::Statement *syntax::IfStatement::elseStatement() {
				return llvm::cast_or_null<syntax::Statement>(
				findChild(syntax::NodeRole::IfStatement_elseStatement));
				}

				syntax::Leaf *syntax::ForStatement::forKeyword() {
				return llvm::cast_or_null<syntax::Leaf>(
				findChild(syntax::NodeRole::IntroducerKeyword));
				}

				syntax::Statement *syntax::ForStatement::body() {
				return llvm::cast_or_null<syntax::Statement>(
				findChild(syntax::NodeRole::BodyStatement));
				}

				syntax::Leaf *syntax::WhileStatement::whileKeyword() {
				return llvm::cast_or_null<syntax::Leaf>(
				findChild(syntax::NodeRole::IntroducerKeyword));
				}

				syntax::Statement *syntax::WhileStatement::body() {
				return llvm::cast_or_null<syntax::Statement>(
				findChild(syntax::NodeRole::BodyStatement));
				}

				syntax::Leaf *syntax::ContinueStatement::continueKeyword() {
				return llvm::cast_or_null<syntax::Leaf>(
				findChild(syntax::NodeRole::IntroducerKeyword));
				}

				syntax::Leaf *syntax::BreakStatement::breakKeyword() {
				return llvm::cast_or_null<syntax::Leaf>(
				findChild(syntax::NodeRole::IntroducerKeyword));
				}

				syntax::Leaf *syntax::ReturnStatement::returnKeyword() {
				return llvm::cast_or_null<syntax::Leaf>(
				findChild(syntax::NodeRole::IntroducerKeyword));
				}

				syntax::Expression *syntax::ReturnStatement::value() {
				return llvm::cast_or_null<syntax::Expression>(
				findChild(syntax::NodeRole::ReturnStatement_value));
				}

				syntax::Leaf *syntax::RangeBasedForStatement::forKeyword() {
				return llvm::cast_or_null<syntax::Leaf>(
				findChild(syntax::NodeRole::IntroducerKeyword));
				}

				syntax::Statement *syntax::RangeBasedForStatement::body() {
				return llvm::cast_or_null<syntax::Statement>(
				findChild(syntax::NodeRole::BodyStatement));
				}

				syntax::Expression *syntax::ExpressionStatement::expression() {
				return llvm::cast_or_null<syntax::Expression>(
				findChild(syntax::NodeRole::ExpressionStatement_expression));
				}

	syntax::Leaf *syntax::CompoundStatement::lbrace() {			syntax::Leaf *syntax::CompoundStatement::lbrace() {
	return llvm::cast_or_null<syntax::Leaf>(			return llvm::cast_or_null<syntax::Leaf>(
	findChild(NodeRole::CompoundStatement_lbrace));			findChild(syntax::NodeRole::OpenParen));
				}

				std::vector<syntax::Statement *> syntax::CompoundStatement::statements() {
				std::vector<syntax::Statement *> Children;
				for (auto *C = firstChild(); C; C = C->nextSibling()) {
				if (C->role() == syntax::NodeRole::CompoundStatement_statement)
				Children.push_back(llvm::cast<syntax::Statement>(C));
				}
				return Children;
	}			}

	syntax::Leaf *syntax::CompoundStatement::rbrace() {			syntax::Leaf *syntax::CompoundStatement::rbrace() {
	return llvm::cast_or_null<syntax::Leaf>(			return llvm::cast_or_null<syntax::Leaf>(
	findChild(NodeRole::CompoundStatement_rbrace));			findChild(syntax::NodeRole::CloseParen));
	}			}

clang/lib/Tooling/Syntax/Tree.cpp

Show First 20 Lines • Show All 79 Lines • ▼ Show 20 Lines	if (T.kind() == tok::eof) {
continue;		continue;
}		}
OS << T.text(SM);		OS << T.text(SM);
}		}
}		}

static void dumpTree(llvm::raw_ostream &OS, const syntax::Node *N,		static void dumpTree(llvm::raw_ostream &OS, const syntax::Node *N,
const syntax::Arena &A, std::vector<bool> IndentMask) {		const syntax::Arena &A, std::vector<bool> IndentMask) {
if (N->role() != syntax::NodeRole::Unknown) {
// FIXME: print the symbolic name of a role.
if (N->role() == syntax::NodeRole::Detached)		if (N->role() == syntax::NodeRole::Detached)
OS << "*: ";		OS << "*: ";
else		// FIXME: find a nice way to print other roles.
OS << static_cast<int>(N->role()) << ": ";
}
if (auto *L = llvm::dyn_cast<syntax::Leaf>(N)) {		if (auto *L = llvm::dyn_cast<syntax::Leaf>(N)) {
dumpTokens(OS, *L->token(), A.sourceManager());		dumpTokens(OS, *L->token(), A.sourceManager());
OS << "\n";		OS << "\n";
return;		return;
}		}

auto *T = llvm::cast<syntax::Tree>(N);		auto *T = llvm::cast<syntax::Tree>(N);
OS << T->kind() << "\n";		OS << T->kind() << "\n";
▲ Show 20 Lines • Show All 47 Lines • Show Last 20 Lines

clang/unittests/Tooling/Syntax/TreeTest.cpp

Show First 20 Lines • Show All 130 Lines • ▼ Show 20 Lines	)cpp",
R"txt(		R"txt(
*: TranslationUnit		*: TranslationUnit
\|-TopLevelDeclaration		\|-TopLevelDeclaration
\| \|-int		\| \|-int
\| \|-main		\| \|-main
\| \|-(		\| \|-(
\| \|-)		\| \|-)
\| `-CompoundStatement		\| `-CompoundStatement
\| \|-2: {		\| \|-{
\| `-3: }		\| `-}
\|-TopLevelDeclaration		\|-TopLevelDeclaration
\| \|-void		\| \|-void
\| \|-foo		\| \|-foo
\| \|-(		\| \|-(
\| \|-)		\| \|-)
\| `-CompoundStatement		\| `-CompoundStatement
\| \|-2: {		\| \|-{
\| `-3: }		\| `-}
`-<eof>		`-<eof>
)txt"},		)txt"},
};		// if.
		{
		R"cpp(
		int main() {
		if (true) {}
		if (true) {} else if (false) {}
		}
		)cpp",
		R"txt(
		*: TranslationUnit
		\|-TopLevelDeclaration
		\| \|-int
		\| \|-main
		\| \|-(
		\| \|-)
		\| `-CompoundStatement
		\| \|-{
		\| \|-IfStatement
		\| \| \|-if
		\| \| \|-(
		\| \| \|-UnknownExpression
		\| \| \| `-true
		\| \| \|-)
		\| \| `-CompoundStatement
		\| \| \|-{
		\| \| `-}
		\| \|-IfStatement
		\| \| \|-if
		\| \| \|-(
		\| \| \|-UnknownExpression
		\| \| \| `-true
		\| \| \|-)
		\| \| \|-CompoundStatement
		\| \| \| \|-{
		\| \| \| `-}
		\| \| \|-else
		\| \| `-IfStatement
		\| \| \|-if
		\| \| \|-(
		\| \| \|-UnknownExpression
		\| \| \| `-false
		\| \| \|-)
		\| \| `-CompoundStatement
		\| \| \|-{
		\| \| `-}
		\| `-}
		`-<eof>
		)txt"},
		// for.
		{R"cpp(
		void test() {
		for (;;) {}
		}
		)cpp",
		R"txt(
		*: TranslationUnit
		\|-TopLevelDeclaration
		\| \|-void
		\| \|-test
		\| \|-(
		\| \|-)
		\| `-CompoundStatement
		\| \|-{
		\| \|-ForStatement
		\| \| \|-for
		\| \| \|-(
		\| \| \|-;
		\| \| \|-;
		\| \| \|-)
		\| \| `-CompoundStatement
		\| \| \|-{
		\| \| `-}
		\| `-}
		`-<eof>
		)txt"},
		// declaration statement.
		{"void test() { int a = 10; }",
		R"txt(
		*: TranslationUnit
		\|-TopLevelDeclaration
		\| \|-void
		\| \|-test
		\| \|-(
		\| \|-)
		\| `-CompoundStatement
		\| \|-{
		\| \|-DeclarationStatement
		\| \| \|-int
		\| \| \|-a
		\| \| \|-=
		\| \| \|-10
		\| \| `-;
		\| `-}
		`-<eof>
		)txt"},
		{"void test() { ; }", R"txt(
		*: TranslationUnit
		\|-TopLevelDeclaration
		\| \|-void
		\| \|-test
		\| \|-(
		\| \|-)
		\| `-CompoundStatement
		\| \|-{
		\| \|-EmptyStatement
		\| \| `-;
		\| `-}
		`-<eof>
		)txt"},
		// switch, case and default.
		{R"cpp(
		void test() {
		switch (true) {
		case 0:
		default:;
		}
		}
		)cpp",
		R"txt(
		*: TranslationUnit
		\|-TopLevelDeclaration
		\| \|-void
		\| \|-test
		\| \|-(
		\| \|-)
		\| `-CompoundStatement
		\| \|-{
		\| \|-SwitchStatement
		\| \| \|-switch
		\| \| \|-(
		\| \| \|-UnknownExpression
		\| \| \| `-true
		\| \| \|-)
		\| \| `-CompoundStatement
		\| \| \|-{
		\| \| \|-CaseStatement
		\| \| \| \|-case
		\| \| \| \|-UnknownExpression
		\| \| \| \| `-0
		\| \| \| \|-:
		\| \| \| `-DefaultStatement
		\| \| \| \|-default
		\| \| \| \|-:
		\| \| \| `-EmptyStatement
		\| \| \| `-;
		\| \| `-}
		\| `-}
		`-<eof>
		)txt"},
		// while.
		{R"cpp(
		void test() {
		while (true) { continue; break; }
		}
		)cpp",
		R"txt(
		*: TranslationUnit
		\|-TopLevelDeclaration
		\| \|-void
		\| \|-test
		\| \|-(
		\| \|-)
		\| `-CompoundStatement
		\| \|-{
		\| \|-WhileStatement
		\| \| \|-while
		\| \| \|-(
		\| \| \|-UnknownExpression
		\| \| \| `-true
		\| \| \|-)
		\| \| `-CompoundStatement
		\| \| \|-{
		\| \| \|-ContinueStatement
		\| \| \| \|-continue
		\| \| \| `-;
		\| \| \|-BreakStatement
		\| \| \| \|-break
		\| \| \| `-;
		\| \| `-}
		\| `-}
		`-<eof>
		)txt"},
		// return.
		{R"cpp(
		int test() { return 1; }
		)cpp",
		R"txt(
		*: TranslationUnit
		\|-TopLevelDeclaration
		\| \|-int
		\| \|-test
		\| \|-(
		\| \|-)
		\| `-CompoundStatement
		\| \|-{
		\| \|-ReturnStatement
		\| \| \|-return
		\| \| \|-UnknownExpression
		\| \| \| `-1
		\| \| `-;
		\| `-}
		`-<eof>
		)txt"},
		// Range-based for.
		{R"cpp(
		void test() {
		  int a[3];
		  for (int x : a) ;
		}
		)cpp",
		R"txt(
		*: TranslationUnit
		|-TopLevelDeclaration
		| |-void
		| |-test
		| |-(
		| |-)
		| `-CompoundStatement
		|   |-{
		|   |-DeclarationStatement
		|   | |-int
		|   | |-a
		|   | |-[
		|   | |-3
		|   | |-]
		|   | `-;
		|   |-RangeBasedForStatement
		|   | |-for
		|   | |-(
		|   | |-int
		|   | |-x
		|   | |-:
		|   | |-UnknownExpression
		|   | | `-a
		|   | |-)
		|   | `-EmptyStatement
		|   |   `-;
		|   `-}
		`-<eof>
		)txt"},
		// Unhandled statements should end up as 'unknown statement'.
		// This example uses a 'label statement', which does not yet have a syntax
		// counterpart.
		{"void main() { foo: return 100; }", R"txt(
		*: TranslationUnit
		|-TopLevelDeclaration
		| |-void
		| |-main
		| |-(
		| |-)
		| `-CompoundStatement
		|   |-{
		|   |-UnknownStatement
		|   | |-foo
		|   | |-:
		|   | `-ReturnStatement
		|   |   |-return
		|   |   |-UnknownExpression
		|   |   | `-100
		|   |   `-;
		|   `-}
		`-<eof>
		)txt"},
		// Expressions should be wrapped in 'ExpressionStatement' when they appear
		// in a statement position.
		{R"cpp(
		void test() {
		  test();
		  if (true) test(); else test();
		}
		)cpp",
		R"txt(
		*: TranslationUnit
		|-TopLevelDeclaration
		| |-void
		| |-test
		| |-(
		| |-)
		| `-CompoundStatement
		|   |-{
		|   |-ExpressionStatement
		|   | |-UnknownExpression
		|   | | |-test
		|   | | |-(
		|   | | `-)
		|   | `-;
		|   |-IfStatement
		|   | |-if
		|   | |-(
		|   | |-UnknownExpression
		|   | | `-true
		|   | |-)
		|   | |-ExpressionStatement
		|   | | |-UnknownExpression
		|   | | | |-test
		|   | | | |-(
		|   | | | `-)
		|   | | `-;
		|   | |-else
		|   | `-ExpressionStatement
		|   |   |-UnknownExpression
		|   |   | |-test
		|   |   | |-(
		|   |   | `-)
		|   |   `-;
		|   `-}
		`-<eof>
		)txt"}};

		  for (const auto &T : Cases) {
		    auto *Root = buildTree(T.first);
		    // Trim both strings so that leading and trailing newlines in the raw
		    // string literals do not affect the comparison.
		    std::string Expected = llvm::StringRef(T.second).trim().str();
		    std::string Actual = llvm::StringRef(Root->dump(*Arena)).trim().str();
		    EXPECT_EQ(Expected, Actual) << "the resulting dump is:\n" << Actual;
		  }
		}
		} // namespace
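
The expected dumps in the test cases above share one layout: each child is printed with `|-`, or with a backtick-dash marker if it is the last child, and each ancestor level contributes either `| ` or two spaces depending on whether that ancestor still has siblings to print. Below is a minimal standalone sketch of that layout. It is not the code from Tree.cpp; `Node`, `dumpChildren`, `dump` and `exampleTree` are hypothetical names used only for illustration, and the real dump works on clang::syntax nodes and an Arena instead.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a syntax tree node (not the clang::syntax API).
// Leaves play the role of tokens, inner nodes the role of tree kinds.
struct Node {
  std::string Label;
  std::vector<Node> Children;
};

// Prints the children of N. Mask holds one entry per ancestor level:
// true if that level still has siblings to print (draw "| "), false
// otherwise (draw two spaces).
void dumpChildren(std::ostream &OS, const Node &N, std::vector<bool> &Mask) {
  for (std::size_t I = 0; I < N.Children.size(); ++I) {
    const bool Last = I + 1 == N.Children.size();
    for (bool Filled : Mask)
      OS << (Filled ? "| " : "  ");
    OS << (Last ? "`-" : "|-") << N.Children[I].Label << "\n";
    Mask.push_back(!Last);
    dumpChildren(OS, N.Children[I], Mask);
    Mask.pop_back();
  }
}

// Dumps the whole tree; the root is prefixed with "*: " as in the
// expected strings of the test cases.
std::string dump(const Node &Root) {
  std::ostringstream OS;
  OS << "*: " << Root.Label << "\n";
  std::vector<bool> Mask;
  dumpChildren(OS, Root, Mask);
  return OS.str();
}

// Hand-built tree mirroring the expected dump for "void test() { ; }".
Node exampleTree() {
  return Node{
      "TranslationUnit",
      {Node{"TopLevelDeclaration",
            {Node{"void"}, Node{"test"}, Node{"("}, Node{")"},
             Node{"CompoundStatement",
                  {Node{"{"}, Node{"EmptyStatement", {Node{";"}}},
                   Node{"}"}}}}},
       Node{"<eof>"}}};
}
```

Calling `dump(exampleTree())` should reproduce the expected text of the `void test() { ; }` case, which makes it easier to see why lines under the last child of a node are indented with spaces instead of a vertical bar.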

Revision Contents

Diff 208995

clang/include/clang/Tooling/Syntax/Nodes.h

clang/lib/Tooling/Syntax/BuildTree.cpp

clang/lib/Tooling/Syntax/Nodes.cpp

clang/lib/Tooling/Syntax/Tree.cpp

clang/unittests/Tooling/Syntax/TreeTest.cpp