This is an archive of the discontinued LLVM Phabricator instance.

Differential D130550

[pseudo] Start rules are `_ := start-symbol EOF`, improve recovery.
Closed, Public

Authored by sammccall on Jul 25 2022, 11:46 PM.

Details

Reviewers

hokein

Commits

rGbd5cc6575bdb: [pseudo] Start rules are `_ := start-symbol EOF`, improve recovery.

Summary

Previously we were calling glrRecover() ad-hoc at the end of input.
Two main problems with this:

- glrRecover() on two separate code paths is inelegant
- We may have to recover several times in succession (e.g. to exit from nested scopes), so we need a loop at end-of-file

Having an actual shift action for an EOF terminal allows us to handle
both concerns in the main shift/recover/reduce loop.
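The effect of making EOF an ordinary terminal can be illustrated with a deliberately tiny, hypothetical model. This is not clang-pseudo code: `ToyParse` and `parseWithEofTerminal` are made up for illustration. The parser state is just the number of open `{` scopes, `$` stands for the EOF terminal and can only be shifted at depth 0, and each recovery step closes one scope without consuming a token. Because EOF is in the token list, repeated recovery at end-of-file happens in the same loop that handles every other token.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not clang-pseudo code. Only '{' and '}' are
// meaningful input tokens; '$' stands for the EOF terminal.
struct ToyParse {
  bool Accepted = false;
  int Recoveries = 0;
};

ToyParse parseWithEofTerminal(std::vector<char> Tokens) {
  Tokens.push_back('$'); // Append an explicit EOF terminal.
  ToyParse Result;
  int Depth = 0; // The whole "parser state": number of open scopes.
  std::size_t I = 0;
  while (I < Tokens.size()) {
    char Tok = Tokens[I];
    bool CanShift = Tok == '{' || (Tok == '}' && Depth > 0) ||
                    (Tok == '$' && Depth == 0);
    if (CanShift) {
      if (Tok == '$') {
        Result.Accepted = true; // Shifting EOF completes the parse.
        return Result;
      }
      Depth += (Tok == '{') ? 1 : -1;
      ++I; // A shift consumes the token.
      continue;
    }
    if (Depth == 0)
      return Result; // Nothing left to recover from: give up.
    // Recovery: act as if a missing '}' closed one scope. The cursor does not
    // advance, so the loop retries the same token and may recover again.
    --Depth;
    ++Result.Recoveries;
  }
  return Result;
}
```

In this model's terms, the old design stopped the loop before EOF and attempted recovery once afterwards, so an input such as `{{{`, which needs three recoveries in a row, could not be accepted.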

This revealed a recovery design bug where recovery could enter a loop by
repeatedly choosing the same parent to identically recover from.
Addressed this by allowing each node to be used as a recovery base once.
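A minimal sketch of the "use each node as a recovery base at most once" rule follows. It assumes a hypothetical `ToyRecoveryNode` in place of `GSS::Node`, and a recovery step that never advances the cursor, which is the case that could previously loop. The flag is `mutable` as in the patch, presumably because recovery only sees GSS nodes through const pointers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for GSS::Node, reduced to the fields used here.
struct ToyRecoveryNode {
  int State;
  // Mirrors the patch's field: set once the node has been used as a
  // recovery base. `mutable` so it can be set through a const pointer.
  mutable bool Recovered = false;
};

// Returns the first candidate not yet used for recovery and marks it used,
// or nullptr if every candidate has already been used.
const ToyRecoveryNode *
pickRecoveryBase(const std::vector<const ToyRecoveryNode *> &Candidates) {
  for (const ToyRecoveryNode *N : Candidates) {
    if (N->Recovered)
      continue; // Don't recover the same way twice!
    N->Recovered = true;
    return N;
  }
  return nullptr;
}

// Simulates recovery that never advances the cursor: every round offers the
// same candidates again. Returns how many rounds ran before recovery ran out
// of unused bases (or MaxRounds was reached).
int countRecoveryRounds(const std::vector<const ToyRecoveryNode *> &Candidates,
                        int MaxRounds) {
  int Rounds = 0;
  while (Rounds < MaxRounds && pickRecoveryBase(Candidates) != nullptr)
    ++Rounds;
  return Rounds;
}
```

Without the `if (N->Recovered) continue;` check, `pickRecoveryBase` would hand back the first candidate every time, and `countRecoveryRounds` would only stop at `MaxRounds`.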

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sammccall created this revision. (Jul 25 2022, 11:46 PM)

Herald added a project: Restricted Project. (Jul 25 2022, 11:46 PM)

sammccall requested review of this revision. (Jul 25 2022, 11:46 PM)

Herald added subscribers: cfe-commits, alextsao1999.

Harbormaster completed remote builds in B177545: Diff 447580. (Jul 26 2022, 12:06 AM)

sammccall mentioned this in D130551: [pseudo] Allow opaque nodes to represent terminals. (Jul 26 2022, 3:37 AM)

hokein added inline comments.

+1 on this change; it would make the expose-lookahead-index-to-guard change easier.

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
74	Haven't looked at it deeply: is this bug related to the eof change? It looks like a different bug in recovery.
clang-tools-extra/pseudo/lib/Forest.cpp
191	nit: in the underlying TokenStream implementation, `tokens()` has a trailing eof token, I think we can fold this into the above loop (if we expose a `token_eof()` method in TokenStream). Not sure we should do this.

sammccall added inline comments. (Jul 26 2022, 7:21 AM)

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
74	They're "related" in that they both fix repeated-recovery scenarios. This change fixes that we can hit an infinite loop when applying recovery repeatedly. The eof change fixes that recovery is (erroneously) only applied once at eof. I hoped to cover them with the same testcase, which tests repeated recovery at EOF. I can extract this change with a separate test if you like, though it will be very similar to the one I have here.
clang-tools-extra/pseudo/lib/Forest.cpp
191	I think this doesn't generalize well... at the moment we're parsing the whole stream, but in future we likely want to parse a subrange (pp-disabled regions?). In such a case we would still want the terminating EOF terminal as a device for parsing, even though there's no corresponding token.

hokein added inline comments. (Jul 29 2022, 1:27 AM)

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
74	> This change fixes that we can hit an infinite loop when applying recovery repeatedly.

I'm more worried about this bug. I think it is an important bug, worth a separate patch to fix it; right now it is bundled into the eof change.

> The eof change fixes that recovery is (erroneously) only applied once at eof.

Not sure I follow this. I think the eof change is basically to remove a technical debt (avoid the special case and repeated code after main parsing loop). Am I missing something?
clang-tools-extra/pseudo/lib/Forest.cpp
191	oh, ok, that's fair enough.
clang-tools-extra/pseudo/test/lr-build-basic.test
16	there should be a State 4 (with a `_ := expr EOF •` item)
clang-tools-extra/pseudo/unittests/GLRTest.cpp
622	remove the debugging statement.

sammccall added inline comments. (Jul 29 2022, 1:50 AM)

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
74	> Not sure I follow this. I think the eof change is basically to remove a technical debt (avoid the special case and repeated code after main parsing loop). Am I missing something?

There's a bug lurking in that tech debt: if recovery does not advance the cursor, we should go around the loop again, but if it happens at eof then (in the old code) there's no loop to go around at all.

This revision was not accepted when it landed; it landed in state Needs Review. (Aug 19 2022, 7:50 AM)

This revision was landed with ongoing or failed builds.

Closed by commit rGbd5cc6575bdb: [pseudo] Start rules are `_ := start-symbol EOF`, improve recovery. (authored by sammccall).

This revision was automatically updated to reflect the committed changes.

sammccall marked 4 inline comments as done.

sammccall added a commit: rGbd5cc6575bdb: [pseudo] Start rules are `_ := start-symbol EOF`, improve recovery.

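Revision Contents

Path                                                    Size
clang-tools-extra/pseudo/include/clang-pseudo/GLR.h     2 lines
clang-tools-extra/pseudo/lib/Forest.cpp                 8 lines
clang-tools-extra/pseudo/lib/GLR.cpp                    59 lines
clang-tools-extra/pseudo/lib/cxx/cxx.bnf                6 lines
clang-tools-extra/pseudo/lib/grammar/LRGraph.cpp        5 lines
clang-tools-extra/pseudo/test/lr-build-basic.test       12 lines
clang-tools-extra/pseudo/test/lr-build-conflicts.test   33 lines
clang-tools-extra/pseudo/unittests/ForestTest.cpp       16 lines
clang-tools-extra/pseudo/unittests/GLRTest.cpp          43 lines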

Diff 447580

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h

@@ (65 lines not shown) @@ struct GSS {
 // Nodes are not exactly pushed and popped on the stack: pushing is just
 // allocating a new head node with a parent pointer to the old head. Popping
 // is just forgetting about a node and remembering its parent instead.
 struct alignas(struct Node *) Node {
 // LR state describing how parsing should continue from this head.
 LRTable::StateID State;
 // Used internally to track reachability during garbage collection.
 bool GCParity;
+// Have we already used this node for error recovery? (prevents loops)
+mutable bool Recovered = false;
 // Number of the parents of this node.
 // The parents hold previous parsed symbols, and may resume control after
 // this node is reduced.
 unsigned ParentCount;
 // The parse node for the last parsed symbol.
 // This symbol appears on the left of the dot in the parse state's items.
 // (In the literature, the node is attached to the edge to the parent).
 const ForestNode *Payload = nullptr;
@@ (87 lines not shown) @@

clang-tools-extra/pseudo/lib/Forest.cpp

@@ (172 lines not shown) @@ std::function<void(const ForestNode *, Token::Index, llvm::Optional<SymbolID>,
 };
 LineDecoration LineDec;
 Dump(this, KEnd, llvm::None, LineDec);
 return Result;
 }

 llvm::ArrayRef<ForestNode>
 ForestArena::createTerminals(const TokenStream &Code) {
-ForestNode *Terminals = Arena.Allocate<ForestNode>(Code.tokens().size());
+ForestNode *Terminals = Arena.Allocate<ForestNode>(Code.tokens().size() + 1);
 size_t Index = 0;
 for (const auto &T : Code.tokens()) {
 new (&Terminals[Index])
 ForestNode(ForestNode::Terminal, tokenSymbol(T.Kind),
 /*Start=*/Index, /*TerminalData*/ 0);
 ++Index;
 }
+// Include an `eof` terminal.
+// This is important to drive the final shift/recover/reduce loop.
+new (&Terminals[Index])
+ForestNode(ForestNode::Terminal, tokenSymbol(tok::eof),
+/*Start=*/Index, /*TerminalData*/ 0);
+++Index;
 NodeCount = Index;
 return llvm::makeArrayRef(Terminals, Index);
 }

 } // namespace pseudo
 } // namespace clang
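The extra `+ 1` slot and the trailing `eof` terminal can be sketched with standard containers. This is a hypothetical simplification: `ToyTerminal` and the `[Begin, End)` parameters are not part of `ForestArena`. The subrange parameters illustrate sammccall's point above that parsing only part of a stream would still need a terminating EOF terminal, even though no token corresponds to it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified terminal: the token's symbol name and its index
// within the terminal array.
struct ToyTerminal {
  std::string Symbol;
  std::size_t Start;
};

// One terminal per token in [Begin, End), followed by a synthetic "eof"
// terminal that has no corresponding token. Indices are relative to Begin.
std::vector<ToyTerminal> createTerminals(const std::vector<std::string> &Tokens,
                                         std::size_t Begin, std::size_t End) {
  assert(Begin <= End && End <= Tokens.size());
  std::vector<ToyTerminal> Terminals;
  Terminals.reserve(End - Begin + 1); // +1 for the eof terminal.
  for (std::size_t I = Begin; I < End; ++I)
    Terminals.push_back({Tokens[I], I - Begin});
  Terminals.push_back({"eof", End - Begin});
  return Terminals;
}
```

For the whole of `a + b` this yields four terminals, which matches the `ASSERT_EQ(T.size(), 4u)` update in ForestTest.cpp.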

clang-tools-extra/pseudo/lib/GLR.cpp

@@ (89 lines not shown) @@ void glrRecover(llvm::ArrayRef<const GSS::Node *> OldHeads,
 // (e.g. in the expr recovery above, stay inside the parentheses).
 // FIXME: find a more satisfying way to avoid such false recovery.
 // FIXME: Add a test for spurious recovery once tests can define strategies.
 std::vector<const ForestNode *> Path;
 llvm::DenseSet<const GSS::Node *> Seen;
 auto WalkUp = [&](const GSS::Node *N, Token::Index NextTok, auto &WalkUp) {
 if (!Seen.insert(N).second)
 return;
+if (!N->Recovered) { // Don't recover the same way twice!
 for (auto Strategy : Lang.Table.getRecovery(N->State)) {
 Options.push_back(PlaceholderRecovery{
 NextTok,
 Strategy.Result,
 Strategy.Strategy,
 N,
 Path,
 });
 LLVM_DEBUG(llvm::dbgs()
 << "Option: recover " << Lang.G.symbolName(Strategy.Result)
 << " at token " << NextTok << "\n");
 }
+}
 Path.push_back(N->Payload);
 for (const GSS::Node *Parent : N->parents())
 WalkUp(Parent, N->Payload->startTokenIndex(), WalkUp);
 Path.pop_back();
 };
 for (auto *N : OldHeads)
 WalkUp(N, TokenIndex, WalkUp);

@@ (57 lines not shown) @@ void glrRecover(llvm::ArrayRef<const GSS::Node *> OldHeads,
 });
 // FIXME: in general, we might have the same Option->Symbol multiple times,
 // and we risk creating redundant Forest and GSS nodes.
 // We also may inadvertently set up the next glrReduce to create a sequence
 // node duplicating an opaque node that we're creating here.
 // There are various options, including simply breaking ties between options.
 // For now it's obscure enough to ignore.
 for (const PlaceholderRecovery *Option : BestOptions) {
+Option->RecoveryNode->Recovered = true;
 const ForestNode &Placeholder =
 Params.Forest.createOpaque(Option->Symbol, RecoveryRange->Begin);
 const GSS::Node *NewHead = Params.GSStack.addNode(
 *Lang.Table.getGoToState(Option->RecoveryNode->State, Option->Symbol),
 &Placeholder, {Option->RecoveryNode});
 NewHeads.push_back(NewHead);
 }
 TokenIndex = RecoveryRange->End;
@@ (444 lines not shown) @@ for (unsigned I = 0; I < Terminals.size();) {
 Reduce(NextHeads, Lookahead);
 // Prepare for the next token.
 std::swap(Heads, NextHeads);
 NextHeads.clear();
 MaybeGC();
 }
 LLVM_DEBUG(llvm::dbgs() << llvm::formatv("Reached eof\n"));

-// The parse was successful if we're in state `_ := start-symbol .`
-auto AcceptState = Lang.Table.getGoToState(StartState, StartSymbol);
-assert(AcceptState.has_value() && "goto must succeed after start symbol!");
+// The parse was successful if in state `_ := start-symbol EOF .`
+// The GSS parent has `_ := start-symbol . EOF`; its payload is the parse.
+auto AfterStart = Lang.Table.getGoToState(StartState, StartSymbol);
+assert(AfterStart.has_value() && "goto must succeed after start symbol!");
+auto Accept = Lang.Table.getShiftState(*AfterStart, tokenSymbol(tok::eof));
+assert(Accept.has_value() && "shift EOF must succeed!");
 auto SearchForAccept = [&](llvm::ArrayRef<const GSS::Node *> Heads) {
 const ForestNode *Result = nullptr;
 for (const auto *Head : Heads) {
-if (Head->State == *AcceptState) {
-assert(Head->Payload->symbol() == StartSymbol);
+if (Head->State == *Accept) {
+assert(Head->Payload->symbol() == tokenSymbol(tok::eof));
 assert(Result == nullptr && "multiple results!");
-Result = Head->Payload;
+Result = Head->parents().front()->Payload;
+assert(Result->symbol() == StartSymbol);
 }
 }
 return Result;
 };
 if (auto *Result = SearchForAccept(Heads))
 return *Result;
-// Failed to parse the input, attempt to run recovery.
-// FIXME: this awkwardly repeats the recovery in the loop, when shift fails.
-// More elegant is to include EOF in the token stream, and make the
-// augmented rule: `_ := translation-unit EOF`. In this way recovery at EOF
-// would not be a special case: it show up as a failure to shift the EOF
-// token.
-unsigned I = Terminals.size();
-glrRecover(Heads, I, Params, Lang, NextHeads);
-Reduce(NextHeads, tokenSymbol(tok::eof));
-if (auto *Result = SearchForAccept(NextHeads))
-return *Result;

 // We failed to parse the input, returning an opaque forest node for recovery.
 // FIXME: as above, we can add fallback error handling so this is impossible.
 return Params.Forest.createOpaque(StartSymbol, /*Token::Index=*/0);
 }

 void glrReduce(std::vector<const GSS::Node *> &Heads, SymbolID Lookahead,
 const ParseParams &Params, const Language &Lang) {
 // Create a new GLRReduce each time for tests, performance doesn't matter.
 GLRReduce{Params, Lang}(Heads, Lookahead);
 }

 const GSS::Node *GSS::addNode(LRTable::StateID State, const ForestNode *Symbol,
 llvm::ArrayRef<const Node *> Parents) {
-Node *Result = new (allocate(Parents.size()))
-Node({State, GCParity, static_cast<uint16_t>(Parents.size())});
+Node *Result = new (allocate(Parents.size())) Node();
+Result->State = State;
+Result->GCParity = GCParity;
+Result->ParentCount = Parents.size();
 Alive.push_back(Result);
 ++NodesCreated;
 Result->Payload = Symbol;
 if (!Parents.empty())
 llvm::copy(Parents, reinterpret_cast<const Node **>(Result + 1));
 return Result;
 }

@@ (57 lines not shown) @@
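How the new accept check finds the parse result can be sketched with a hypothetical `ToyGSSNode` (a state number, a payload symbol name, and parent pointers) standing in for the real `GSS::Node`/`ForestNode` pair. The accepting head is the one in the state reached by shifting EOF. Its payload is the `eof` terminal, so the result is taken from its parent, which sits in state `_ := start-symbol . EOF`. The state numbers in the test follow the lr-build-basic.test table (state 0, go to 1 on `expr`, shift to 4 on `EOF`).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified GSS node: the LR state, the symbol of the payload
// that led to this node, and the node's parents.
struct ToyGSSNode {
  int State;
  std::string PayloadSymbol;
  std::vector<const ToyGSSNode *> Parents;
};

// With start rules `_ := start-symbol EOF`, a head in AcceptState has just
// shifted EOF. The parse result is carried by its parent, the node that
// holds the start symbol.
const ToyGSSNode *searchForAccept(const std::vector<const ToyGSSNode *> &Heads,
                                  int AcceptState) {
  const ToyGSSNode *Result = nullptr;
  for (const ToyGSSNode *Head : Heads) {
    if (Head->State != AcceptState)
      continue;
    assert(Head->PayloadSymbol == "eof");
    assert(Result == nullptr && "multiple results!");
    Result = Head->Parents.front();
  }
  return Result;
}
```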

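The `addNode` change replaces positional brace initialization with default construction followed by named assignments. One plausible motivation (an inference, not stated in the review) is that `Recovered` is declared between `GCParity` and `ParentCount`, so the positional `Node({State, GCParity, ...})` form would have matched the parent count against `Recovered` instead of `ParentCount`. The sketch mimics the layout with hypothetical `ToyStackNode`/`ToyStack` types whose parent pointers are stored directly after the node in the same allocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical simplified GSS node. As in GSS::Node, the parent pointers are
// stored in the same allocation, immediately after the node itself.
struct alignas(void *) ToyStackNode {
  unsigned State;
  bool GCParity;
  mutable bool Recovered = false; // Already used as a recovery base?
  unsigned ParentCount;

  const ToyStackNode *const *parents() const {
    return reinterpret_cast<const ToyStackNode *const *>(this + 1);
  }
};

class ToyStack {
public:
  ToyStack() = default;
  ToyStack(const ToyStack &) = delete;
  ToyStack &operator=(const ToyStack &) = delete;
  ~ToyStack() {
    for (void *Block : Blocks)
      ::operator delete(Block);
  }

  const ToyStackNode *addNode(unsigned State,
                              const std::vector<const ToyStackNode *> &Parents) {
    void *Mem = ::operator new(sizeof(ToyStackNode) +
                               Parents.size() * sizeof(ToyStackNode *));
    Blocks.push_back(Mem);
    // Value-initialize, then assign fields by name. Positional brace-init
    // pairs values with members in declaration order, so inserting
    // `Recovered` mid-struct would misalign the values.
    ToyStackNode *Result = new (Mem) ToyStackNode();
    Result->State = State;
    Result->GCParity = GCParity;
    Result->ParentCount = static_cast<unsigned>(Parents.size());
    // Copy the parent pointers into the trailing storage.
    std::copy(Parents.begin(), Parents.end(),
              reinterpret_cast<const ToyStackNode **>(Result + 1));
    return Result;
  }

private:
  bool GCParity = false;
  std::vector<void *> Blocks; // Every allocation, freed by the destructor.
};
```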
clang-tools-extra/pseudo/lib/cxx/cxx.bnf

@@ (23 lines not shown) @@
 #
 # [1] https://isocpp.org/files/papers/N4860.pdf

 # _ lists all the start-symbols which we support parsing.
 #
 # We list important nonterminals as start symbols, rather than doing it for all
 # nonterminals by default, this reduces the number of states by 30% and LRTable
 # actions by 16%.
-_ := translation-unit
-_ := statement-seq
-_ := declaration-seq
+_ := translation-unit EOF
+_ := statement-seq EOF
+_ := declaration-seq EOF

 # gram.key
 typedef-name := IDENTIFIER
 typedef-name := simple-template-id
 namespace-name := IDENTIFIER
 namespace-name := namespace-alias
 namespace-alias := IDENTIFIER
 class-name := IDENTIFIER
@@ (734 lines not shown) @@

clang-tools-extra/pseudo/lib/grammar/LRGraph.cpp

@@ (234 lines not shown) @@ LRGraph LRGraph::buildLR0(const Grammar &G) {
 auto RRange = G.table().Nonterminals[G.underscore()].RuleRange;
 for (RuleID RID = RRange.Start; RID < RRange.End; ++RID) {
 auto StartState = std::vector<Item>{Item::start(RID, G)};
 auto Result = Builder.insert(std::move(StartState));
 assert(Result.second && "State must be new");
 PendingStates.push_back(Result.first);

 const Rule &StartRule = G.lookupRule(RID);
-assert(StartRule.Size == 1 &&
-"Start rule must have exactly one symbol in its body!");
+assert(StartRule.Size == 2 &&
+StartRule.seq().back() == tokenSymbol(tok::eof) &&
+"Start rule must be of the form `_ := start-symbol EOF`!");
 Builder.addStartState(StartRule.seq().front(), Result.first);
 }

 while (!PendingStates.empty()) {
 auto StateID = PendingStates.back();
 PendingStates.pop_back();
 for (auto Next : nextAvailableKernelItems(Builder.find(StateID), G)) {
 auto Insert = Builder.insert(Next.second);
@@ (12 lines not shown) @@
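The new assertion in `buildLR0` requires every start rule to have exactly two body symbols, the last being EOF, and registers the first body symbol as the start symbol. A hypothetical string-based check of the same shape is sketched below. `ToyRule`, `isValidStartRule`, and `startSymbolOf` are illustrative names only; the real code compares `SymbolID`s against `tokenSymbol(tok::eof)` and only visits rules whose target is already `_`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical rule representation: target nonterminal and body symbols.
struct ToyRule {
  std::string Target;
  std::vector<std::string> Seq;
};

// True if the rule has the form `_ := start-symbol EOF`.
bool isValidStartRule(const ToyRule &R) {
  return R.Target == "_" && R.Seq.size() == 2 && R.Seq.back() == "EOF";
}

// The symbol registered for the start state is the first body symbol,
// mirroring `StartRule.seq().front()`.
std::string startSymbolOf(const ToyRule &R) {
  assert(isValidStartRule(R) && "start rule must be `_ := start-symbol EOF`");
  return R.Seq.front();
}
```

Under this check, the old one-symbol form `_ := translation-unit` is rejected and the updated cxx.bnf rule `_ := translation-unit EOF` is accepted.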

clang-tools-extra/pseudo/test/lr-build-basic.test

-_ := expr
+_ := expr EOF
 expr := id
 id := IDENTIFIER

 # RUN: clang-pseudo -grammar %s -print-graph | FileCheck %s --check-prefix=GRAPH
 # GRAPH: States:
 # GRAPH-NEXT: State 0
-# GRAPH-NEXT: _ := • expr
+# GRAPH-NEXT: _ := • expr EOF
 # GRAPH-NEXT: expr := • id
 # GRAPH-NEXT: id := • IDENTIFIER
 # GRAPH-NEXT: State 1
-# GRAPH-NEXT: _ := expr •
+# GRAPH-NEXT: _ := expr • EOF
 # GRAPH-NEXT: State 2
 # GRAPH-NEXT: expr := id •
 # GRAPH-NEXT: State 3
 # GRAPH-NEXT: id := IDENTIFIER •

 # RUN: clang-pseudo -grammar %s -print-table | FileCheck %s --check-prefix=TABLE
 # TABLE: LRTable:
 # TABLE-NEXT: State 0
 # TABLE-NEXT: IDENTIFIER: shift state 3
 # TABLE-NEXT: expr: go to state 1
 # TABLE-NEXT: id: go to state 2
 # TABLE-NEXT: State 1
+# TABLE-NEXT: EOF: shift state 4
 # TABLE-NEXT: State 2
-# TABLE-NEXT: EOF: reduce by rule 1 'expr := id'
+# TABLE-NEXT: EOF: reduce by rule 2 'expr := id'
 # TABLE-NEXT: State 3
-# TABLE-NEXT: EOF: reduce by rule 0 'id := IDENTIFIER'
+# TABLE-NEXT: EOF: reduce by rule 1 'id := IDENTIFIER'
+# TABLE-NEXT: State 4

clang-tools-extra/pseudo/test/lr-build-conflicts.test

-_ := expr
+_ := expr EOF
 expr := expr - expr # S/R conflict at state 4 on '-' token
 expr := IDENTIFIER

 # RUN: clang-pseudo -grammar %s -print-graph | FileCheck %s --check-prefix=GRAPH
 # GRAPH: States
 # GRAPH-NEXT: State 0
+# GRAPH-NEXT: _ := • expr EOF
 # GRAPH-NEXT: expr := • expr - expr
-# GRAPH-NEXT: _ := • expr
 # GRAPH-NEXT: expr := • IDENTIFIER
 # GRAPH-NEXT: State 1
-# GRAPH-NEXT: _ := expr •
+# GRAPH-NEXT: _ := expr • EOF
 # GRAPH-NEXT: expr := expr • - expr
 # GRAPH-NEXT: State 2
 # GRAPH-NEXT: expr := IDENTIFIER •
 # GRAPH-NEXT: State 3
+# GRAPH-NEXT: _ := expr EOF •
+# GRAPH-NEXT: State 4
 # GRAPH-NEXT: expr := • expr - expr
 # GRAPH-NEXT: expr := expr - • expr
 # GRAPH-NEXT: expr := • IDENTIFIER
-# GRAPH-NEXT: State 4
+# GRAPH-NEXT: State 5
 # GRAPH-NEXT: expr := expr - expr •
 # GRAPH-NEXT: expr := expr • - expr
 # GRAPH-NEXT: 0 ->[expr] 1
 # GRAPH-NEXT: 0 ->[IDENTIFIER] 2
-# GRAPH-NEXT: 1 ->[-] 3
-# GRAPH-NEXT: 3 ->[expr] 4
-# GRAPH-NEXT: 3 ->[IDENTIFIER] 2
-# GRAPH-NEXT: 4 ->[-] 3
+# GRAPH-NEXT: 1 ->[EOF] 3
+# GRAPH-NEXT: 1 ->[-] 4
+# GRAPH-NEXT: 4 ->[expr] 5
+# GRAPH-NEXT: 4 ->[IDENTIFIER] 2
+# GRAPH-NEXT: 5 ->[-] 4

 # RUN: clang-pseudo -grammar %s -print-table | FileCheck %s --check-prefix=TABLE
 # TABLE: LRTable:
 # TABLE-NEXT: State 0
 # TABLE-NEXT: IDENTIFIER: shift state 2
 # TABLE-NEXT: expr: go to state 1
 # TABLE-NEXT: State 1
-# TABLE-NEXT: -: shift state 3
+# TABLE-NEXT: EOF: shift state 3
+# TABLE-NEXT: -: shift state 4
 # TABLE-NEXT: State 2
-# TABLE-NEXT: EOF -: reduce by rule 1 'expr := IDENTIFIER'
+# TABLE-NEXT: EOF -: reduce by rule 2 'expr := IDENTIFIER'
 # TABLE-NEXT: State 3
-# TABLE-NEXT: IDENTIFIER: shift state 2
-# TABLE-NEXT: expr: go to state 4
 # TABLE-NEXT: State 4
-# TABLE-NEXT: -: shift state 3
-# TABLE-NEXT: EOF -: reduce by rule 0 'expr := expr - expr'
+# TABLE-NEXT: IDENTIFIER: shift state 2
+# TABLE-NEXT: expr: go to state 5
+# TABLE-NEXT: State 5
+# TABLE-NEXT: -: shift state 4
+# TABLE-NEXT: EOF -: reduce by rule 1 'expr := expr - expr'

clang-tools-extra/pseudo/unittests/ForestTest.cpp

	Show First 20 Lines • Show All 48 Lines • ▼ Show 20 Lines

	protected:			protected:
	Grammar G;			Grammar G;
	std::vector<std::string> Diags;			std::vector<std::string> Diags;
	};			};

	TEST_F(ForestTest, DumpBasic) {			TEST_F(ForestTest, DumpBasic) {
	build(R"cpp(			build(R"cpp(
	_ := add-expression			_ := add-expression EOF
	add-expression := id-expression + id-expression			add-expression := id-expression + id-expression
	id-expression := IDENTIFIER			id-expression := IDENTIFIER
	)cpp");			)cpp");
	ASSERT_TRUE(Diags.empty());			ASSERT_TRUE(Diags.empty());
	ForestArena Arena;			ForestArena Arena;
	const auto &TS =			const auto &TS =
	cook(lex("a + b", clang::LangOptions()), clang::LangOptions());			cook(lex("a + b", clang::LangOptions()), clang::LangOptions());

	auto T = Arena.createTerminals(TS);			auto T = Arena.createTerminals(TS);
	ASSERT_EQ(T.size(), 3u);			ASSERT_EQ(T.size(), 4u);
	const auto *Left = &Arena.createSequence(			const auto *Left = &Arena.createSequence(
	symbol("id-expression"), ruleFor("id-expression"), {&T.front()});			symbol("id-expression"), ruleFor("id-expression"), {&T.front()});
	const auto *Right = &Arena.createSequence(symbol("id-expression"),			const auto *Right = &Arena.createSequence(symbol("id-expression"),
	ruleFor("id-expression"), {&T[2]});			ruleFor("id-expression"), {&T[2]});

	const auto *Add =			const auto *Add =
	&Arena.createSequence(symbol("add-expression"), ruleFor("add-expression"),			&Arena.createSequence(symbol("add-expression"), ruleFor("add-expression"),
	{Left, &T[1], Right});			{Left, &T[1], Right});
	EXPECT_EQ(Add->dumpRecursive(G, true),			EXPECT_EQ(Add->dumpRecursive(G, true),
	"[ 0, end) add-expression := id-expression + id-expression\n"			"[ 0, end) add-expression := id-expression + id-expression\n"
	"[ 0, 1) ├─id-expression~IDENTIFIER := tok[0]\n"			"[ 0, 1) ├─id-expression~IDENTIFIER := tok[0]\n"
	"[ 1, 2) ├─+ := tok[1]\n"			"[ 1, 2) ├─+ := tok[1]\n"
	"[ 2, end) └─id-expression~IDENTIFIER := tok[2]\n");			"[ 2, end) └─id-expression~IDENTIFIER := tok[2]\n");
	EXPECT_EQ(Add->dumpRecursive(G, false),			EXPECT_EQ(Add->dumpRecursive(G, false),
	"[ 0, end) add-expression := id-expression + id-expression\n"			"[ 0, end) add-expression := id-expression + id-expression\n"
	"[ 0, 1) ├─id-expression := IDENTIFIER\n"			"[ 0, 1) ├─id-expression := IDENTIFIER\n"
	"[ 0, 1) │ └─IDENTIFIER := tok[0]\n"			"[ 0, 1) │ └─IDENTIFIER := tok[0]\n"
	"[ 1, 2) ├─+ := tok[1]\n"			"[ 1, 2) ├─+ := tok[1]\n"
	"[ 2, end) └─id-expression := IDENTIFIER\n"			"[ 2, end) └─id-expression := IDENTIFIER\n"
	"[ 2, end) └─IDENTIFIER := tok[2]\n");			"[ 2, end) └─IDENTIFIER := tok[2]\n");
	}			}

	TEST_F(ForestTest, DumpAmbiguousAndRefs) {			TEST_F(ForestTest, DumpAmbiguousAndRefs) {
	build(R"cpp(			build(R"cpp(
	_ := type			_ := type EOF
	type := class-type # rule 3			type := class-type # rule 4
	type := enum-type # rule 4			type := enum-type # rule 5
	class-type := shared-type			class-type := shared-type
	enum-type := shared-type			enum-type := shared-type
	shared-type := IDENTIFIER)cpp");			shared-type := IDENTIFIER)cpp");
	ASSERT_TRUE(Diags.empty());			ASSERT_TRUE(Diags.empty());
	ForestArena Arena;			ForestArena Arena;
	const auto &TS = cook(lex("abc", clang::LangOptions()), clang::LangOptions());			const auto &TS = cook(lex("abc", clang::LangOptions()), clang::LangOptions());

	auto Terminals = Arena.createTerminals(TS);			auto Terminals = Arena.createTerminals(TS);
	ASSERT_EQ(Terminals.size(), 1u);			ASSERT_EQ(Terminals.size(), 2u);

	const auto *SharedType = &Arena.createSequence(			const auto *SharedType = &Arena.createSequence(
	symbol("shared-type"), ruleFor("shared-type"), {Terminals.begin()});			symbol("shared-type"), ruleFor("shared-type"), {Terminals.begin()});
	const auto *ClassType = &Arena.createSequence(			const auto *ClassType = &Arena.createSequence(
	symbol("class-type"), ruleFor("class-type"), {SharedType});			symbol("class-type"), ruleFor("class-type"), {SharedType});
	const auto *EnumType = &Arena.createSequence(			const auto *EnumType = &Arena.createSequence(
	symbol("enum-type"), ruleFor("enum-type"), {SharedType});			symbol("enum-type"), ruleFor("enum-type"), {SharedType});
	const auto *Alternative1 =			const auto *Alternative1 =
	&Arena.createSequence(symbol("type"), /*RuleID=*/3, {ClassType});			&Arena.createSequence(symbol("type"), /*RuleID=*/4, {ClassType});
	const auto *Alternative2 =			const auto *Alternative2 =
	&Arena.createSequence(symbol("type"), /*RuleID=*/4, {EnumType});			&Arena.createSequence(symbol("type"), /*RuleID=*/5, {EnumType});
	const auto *Type =			const auto *Type =
	&Arena.createAmbiguous(symbol("type"), {Alternative1, Alternative2});			&Arena.createAmbiguous(symbol("type"), {Alternative1, Alternative2});
	EXPECT_EQ(Type->dumpRecursive(G),			EXPECT_EQ(Type->dumpRecursive(G),
	"[ 0, end) type := <ambiguous>\n"			"[ 0, end) type := <ambiguous>\n"
	"[ 0, end) ├─type := class-type\n"			"[ 0, end) ├─type := class-type\n"
	"[ 0, end) │ └─class-type := shared-type\n"			"[ 0, end) │ └─class-type := shared-type\n"
	"[ 0, end) │ └─shared-type := IDENTIFIER #1\n"			"[ 0, end) │ └─shared-type := IDENTIFIER #1\n"
	"[ 0, end) │ └─IDENTIFIER := tok[0]\n"			"[ 0, end) │ └─IDENTIFIER := tok[0]\n"
... (58 lines not shown) ...
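The `Terminals.size()` expectation changes from `1u` to `2u`. With start rules of the form `_ := start-symbol EOF`, the single token `abc` is presumably followed by an extra EOF terminal that the parser can shift like any other token. The sketch below illustrates the idea with a hypothetical `Terminal` struct and `makeTerminals` function. It is not `ForestArena::createTerminals` itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in; not clang-pseudo's ForestNode.
struct Terminal {
  std::string Kind; // token kind, e.g. "identifier" or "eof"
  unsigned Start;   // index of the token in the stream
};

// Hypothetical helper: one terminal per token, plus a trailing EOF terminal
// so a rule like `_ := start-symbol EOF` has a real token to shift.
std::vector<Terminal> makeTerminals(const std::vector<std::string> &Kinds) {
  std::vector<Terminal> Out;
  unsigned Index = 0;
  for (const auto &K : Kinds)
    Out.push_back({K, Index++});
  Out.push_back({"eof", Index});
  return Out;
}
```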

clang-tools-extra/pseudo/unittests/GLRTest.cpp

... (503 lines not shown) ...
TEST_F(GLRTest, PerfectForestNodeSharing) {
// Run the GLR on a simple grammar and test that we build exactly one forest		// Run the GLR on a simple grammar and test that we build exactly one forest
// node per (SymbolID, token range).		// node per (SymbolID, token range).

// This is a grammar where the original parsing-stack-based forest node		// This is a grammar where the original parsing-stack-based forest node
// sharing approach will fail. In its LR0 graph, it has two states containing		// sharing approach will fail. In its LR0 graph, it has two states containing
// item `expr := • IDENTIFIER`, and both have different goto states on the		// item `expr := • IDENTIFIER`, and both have different goto states on the
// nonterminal `expr`.		// nonterminal `expr`.
build(R"bnf(		build(R"bnf(
_ := test		_ := test EOF

test := { expr		test := { expr
test := { IDENTIFIER		test := { IDENTIFIER
test := left-paren expr		test := left-paren expr
left-paren := {		left-paren := {
expr := IDENTIFIER		expr := IDENTIFIER
)bnf");		)bnf");
TestLang.Table = LRTable::buildSLR(TestLang.G);		TestLang.Table = LRTable::buildSLR(TestLang.G);
... (22 lines not shown) ...
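This test checks one invariant: exactly one forest node per (SymbolID, token range). That invariant can be modeled with a cache keyed by the triple. This is a sketch that assumes nodes are identified only by symbol and range. `Node` and `NodeCache` are hypothetical, and this is not necessarily how clang-pseudo's GLR achieves sharing.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical simplified forest node; not clang-pseudo's ForestNode.
struct Node {
  int Symbol;
  unsigned Start, End;
};

// Hypothetical cache: hands out at most one node per (symbol, start, end),
// so two parse paths that recognize the same symbol over the same tokens
// end up pointing at the same node.
class NodeCache {
  std::map<std::tuple<int, unsigned, unsigned>, std::unique_ptr<Node>> Nodes;

public:
  Node *get(int Symbol, unsigned Start, unsigned End) {
    auto &Slot = Nodes[std::make_tuple(Symbol, Start, End)];
    if (!Slot)
      Slot = std::make_unique<Node>(Node{Symbol, Start, End});
    return Slot.get();
  }
  std::size_t size() const { return Nodes.size(); }
};
```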
TEST_F(GLRTest, GLRReduceOrder) {		TEST_F(GLRTest, GLRReduceOrder) {
// Given the following grammar, and the input `IDENTIFIER`, reductions should		// Given the following grammar, and the input `IDENTIFIER`, reductions should
// be performed in the following order:		// be performed in the following order:
// 1. foo := IDENTIFIER		// 1. foo := IDENTIFIER
// 2. { test := IDENTIFIER, test := foo }		// 2. { test := IDENTIFIER, test := foo }
// foo should be reduced first, so that in step 2 we have completed reduces		// foo should be reduced first, so that in step 2 we have completed reduces
// for test, and form an ambiguous forest node.		// for test, and form an ambiguous forest node.
build(R"bnf(		build(R"bnf(
_ := test		_ := test EOF

test := IDENTIFIER		test := IDENTIFIER
test := foo		test := foo
foo := IDENTIFIER		foo := IDENTIFIER
)bnf");		)bnf");
clang::LangOptions LOptions;		clang::LangOptions LOptions;
const TokenStream &Tokens = cook(lex("IDENTIFIER", LOptions), LOptions);		const TokenStream &Tokens = cook(lex("IDENTIFIER", LOptions), LOptions);
TestLang.Table = LRTable::buildSLR(TestLang.G);		TestLang.Table = LRTable::buildSLR(TestLang.G);
... (10 lines not shown) ...
}		}
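Reduction order matters because an ambiguous node can only be built once every alternative for the same symbol and range is complete. The sketch below shows just that final combining step. `Reduction`, `ForestNodeSketch` and `combine` are hypothetical names, and the sketch assumes all inputs already share one symbol and token range.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: a completed reduction of `Symbol` over [Start, End) by `Rule`.
struct Reduction {
  std::string Symbol;
  unsigned Start, End;
  std::string Rule;
};

// Hypothetical simplified node: ambiguous when more than one rule applies.
struct ForestNodeSketch {
  std::string Symbol;
  bool Ambiguous;
  std::vector<std::string> Alternatives;
};

// Combine completed reductions that (by assumption) share symbol and range.
// If `foo := IDENTIFIER` had not been reduced yet, `test := foo` would be
// missing here and the result would wrongly be unambiguous.
ForestNodeSketch combine(const std::vector<Reduction> &Done) {
  assert(!Done.empty());
  ForestNodeSketch N{Done.front().Symbol, Done.size() > 1, {}};
  for (const auto &R : Done)
    N.Alternatives.push_back(R.Rule);
  return N;
}
```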

TEST_F(GLRTest, RecoveryEndToEnd) {		TEST_F(GLRTest, RecoveryEndToEnd) {
// Simple example of brace-based recovery showing:		// Simple example of brace-based recovery showing:
// - recovered region includes tokens both ahead of and behind the cursor		// - recovered region includes tokens both ahead of and behind the cursor
// - multiple possible recovery rules		// - multiple possible recovery rules
// - recovery from outer scopes is rejected		// - recovery from outer scopes is rejected
build(R"bnf(		build(R"bnf(
_ := block		_ := block EOF

block := { block [recover=Braces] }		block := { block [recover=Braces] }
block := { numbers [recover=Braces] }		block := { numbers [recover=Braces] }
numbers := NUMERIC_CONSTANT NUMERIC_CONSTANT		numbers := NUMERIC_CONSTANT NUMERIC_CONSTANT
)bnf");		)bnf");
TestLang.Table = LRTable::buildSLR(TestLang.G);		TestLang.Table = LRTable::buildSLR(TestLang.G);
TestLang.RecoveryStrategies.try_emplace(extensionID("Braces"), recoverBraces);		TestLang.RecoveryStrategies.try_emplace(extensionID("Braces"), recoverBraces);
clang::LangOptions LOptions;		clang::LangOptions LOptions;
... (12 lines not shown) ...
EXPECT_EQ(Parsed.dumpRecursive(TestLang.G),
"[ 4, 5) │ │ └─} := tok[4]\n"		"[ 4, 5) │ │ └─} := tok[4]\n"
"[ 1, 5) │ └─block := { numbers [recover=Braces] }\n"		"[ 1, 5) │ └─block := { numbers [recover=Braces] }\n"
"[ 1, 2) │ ├─{ := tok[1]\n"		"[ 1, 2) │ ├─{ := tok[1]\n"
"[ 2, 4) │ ├─numbers := <opaque>\n"		"[ 2, 4) │ ├─numbers := <opaque>\n"
"[ 4, 5) │ └─} := tok[4]\n"		"[ 4, 5) │ └─} := tok[4]\n"
"[ 5, end) └─} := tok[5]\n");		"[ 5, end) └─} := tok[5]\n");
}		}
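Brace-based recovery treats the tokens up to the matching `}` as an opaque region, as in the `[ 2, 4) numbers := <opaque>` entry above. The sketch below finds that end point by counting brace depth over token spellings. `recoverToMatchingBrace` is hypothetical. Unlike the strategy exercised by this test, it only scans forward from a given start and does not cover tokens behind the cursor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not clang-pseudo's recoverBraces. Begin is the index
// of the first token after an opening `{`. Returns the index of its matching
// `}` (the exclusive end of the opaque region), or Tokens.size() if unmatched.
std::size_t recoverToMatchingBrace(const std::vector<std::string> &Tokens,
                                   std::size_t Begin) {
  int Depth = 0;
  for (std::size_t I = Begin; I < Tokens.size(); ++I) {
    if (Tokens[I] == "{") {
      ++Depth;
    } else if (Tokens[I] == "}") {
      if (Depth == 0)
        return I;
      --Depth;
    }
  }
  return Tokens.size();
}
```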

		TEST_F(GLRTest, RepeatedRecovery) {
		// We require multiple steps of recovery at eof and then a reduction in order
		// to successfully parse.
		build(R"bnf(
		_ := function EOF
		# FIXME: this forces EOF to be in follow(signature).
		# Remove it once we use unconstrained reduction for recovery.
		_ := signature EOF

		function := signature body [recover=Skip]
		signature := IDENTIFIER params [recover=Skip]
		params := ( )
		body := { }
		)bnf");
		TestLang.Table = LRTable::buildSLR(TestLang.G);
		llvm::errs() << TestLang.Table.dumpForTests(TestLang.G);
		hokein: remove the debugging statement.
		TestLang.RecoveryStrategies.try_emplace(
		extensionID("Skip"),
		[](Token::Index Start, const TokenStream &) { return Start; });
		clang::LangOptions LOptions;
		TokenStream Tokens = cook(lex("main", LOptions), LOptions);

		const ForestNode &Parsed =
		glrParse({Tokens, Arena, GSStack}, id("function"), TestLang);
		EXPECT_EQ(Parsed.dumpRecursive(TestLang.G),
		"[ 0, end) function := signature body [recover=Skip]\n"
		"[ 0, 1) ├─signature := IDENTIFIER params [recover=Skip]\n"
		"[ 0, 1) │ ├─IDENTIFIER := tok[0]\n"
		"[ 1, 1) │ └─params := <opaque>\n"
		"[ 1, end) └─body := <opaque>\n");
		}
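Repeated recovery at EOF is only safe if it cannot cycle. According to this revision's summary, each node may be used as a recovery base once. The toy sketch below illustrates that rule. Recovery bases are hypothetical integer ids, and `RecoveryLoop` is not a clang-pseudo type.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy, not clang-pseudo's glrRecover. Each step picks the first
// candidate base that has not been used before. Without the Used set, a
// strategy that keeps proposing the same base could recover forever.
struct RecoveryLoop {
  std::set<int> Used;
  std::vector<int> Log; // bases actually recovered from, in order

  // Returns false when every candidate was already used (recovery fails).
  bool step(const std::vector<int> &Candidates) {
    for (int Base : Candidates) {
      if (Used.insert(Base).second) {
        Log.push_back(Base);
        return true;
      }
    }
    return false;
  }
};
```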


TEST_F(GLRTest, NoExplicitAccept) {		TEST_F(GLRTest, NoExplicitAccept) {
build(R"bnf(		build(R"bnf(
_ := test		_ := test EOF

test := IDENTIFIER test		test := IDENTIFIER test
test := IDENTIFIER		test := IDENTIFIER
)bnf");		)bnf");
clang::LangOptions LOptions;		clang::LangOptions LOptions;
// Given the following input, and the grammar above, we perform two reductions		// Given the following input, and the grammar above, we perform two reductions
// of the nonterminal `test` when the next token is `eof`, verify that the		// of the nonterminal `test` when the next token is `eof`, verify that the
// parser stops at the right state.		// parser stops at the right state.
const TokenStream &Tokens = cook(lex("id id", LOptions), LOptions);		const TokenStream &Tokens = cook(lex("id id", LOptions), LOptions);
TestLang.Table = LRTable::buildSLR(TestLang.G);		TestLang.Table = LRTable::buildSLR(TestLang.G);

const ForestNode &Parsed =		const ForestNode &Parsed =
glrParse({Tokens, Arena, GSStack}, id("test"), TestLang);		glrParse({Tokens, Arena, GSStack}, id("test"), TestLang);
EXPECT_EQ(Parsed.dumpRecursive(TestLang.G),		EXPECT_EQ(Parsed.dumpRecursive(TestLang.G),
"[ 0, end) test := IDENTIFIER test\n"		"[ 0, end) test := IDENTIFIER test\n"
"[ 0, 1) ├─IDENTIFIER := tok[0]\n"		"[ 0, 1) ├─IDENTIFIER := tok[0]\n"
"[ 1, end) └─test := IDENTIFIER\n"		"[ 1, end) └─test := IDENTIFIER\n"
"[ 1, end) └─IDENTIFIER := tok[1]\n");		"[ 1, end) └─IDENTIFIER := tok[1]\n");
}		}
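With `_ := test EOF`, EOF is an ordinary terminal, so the parse presumably ends by shifting it rather than through a dedicated accept action. The hand-written recognizer below is a sketch for this one grammar only. It uses no LR table and no GLR, and `recognize` is hypothetical. On `IDENTIFIER IDENTIFIER EOF` it performs the two reductions of `test` described in the test comment, then shifts EOF.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy recognizer for exactly this grammar:
//   _ := test EOF
//   test := IDENTIFIER test
//   test := IDENTIFIER
// Success means the final stack is exactly [test, EOF]; there is no
// separate accept step.
bool recognize(const std::vector<std::string> &Tokens) {
  std::vector<std::string> Stack;
  for (const auto &Tok : Tokens) {
    if (Tok == "IDENTIFIER") {
      Stack.push_back(Tok); // shift
      continue;
    }
    if (Tok != "EOF")
      return false;
    // Lookahead is EOF: only valid directly after an IDENTIFIER.
    if (Stack.empty() || Stack.back() != "IDENTIFIER")
      return false;
    Stack.back() = "test"; // reduce: test := IDENTIFIER
    while (Stack.size() >= 2 && Stack[Stack.size() - 2] == "IDENTIFIER") {
      Stack.pop_back();
      Stack.back() = "test"; // reduce: test := IDENTIFIER test
    }
    Stack.push_back("EOF"); // shift EOF
  }
  return Stack == std::vector<std::string>{"test", "EOF"};
}
```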

TEST_F(GLRTest, GuardExtension) {		TEST_F(GLRTest, GuardExtension) {
build(R"bnf(		build(R"bnf(
_ := start		_ := start EOF

start := IDENTIFIER [guard]		start := IDENTIFIER [guard]
)bnf");		)bnf");
TestLang.Guards.try_emplace(		TestLang.Guards.try_emplace(
ruleFor("start"), [&](const GuardParams &P) {		ruleFor("start"), [&](const GuardParams &P) {
assert(P.RHS.size() == 1 &&		assert(P.RHS.size() == 1 &&
P.RHS.front()->symbol() ==		P.RHS.front()->symbol() ==
tokenSymbol(clang::tok::identifier));		tokenSymbol(clang::tok::identifier));
... (45 lines not shown) ...
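The `GuardExtension` test registers a predicate that inspects a rule's RHS, presumably to decide whether a reduction by that rule may proceed. The sketch below shows that lookup. It assumes guards are keyed by an integer rule id and see the RHS as token spellings. `Guard` and `canReduce` are hypothetical names and are not the `GuardParams` API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical guard signature: inspects the RHS that would be reduced.
using Guard = std::function<bool(const std::vector<std::string> &RHS)>;

// Hypothetical check: a reduction by `Rule` is allowed if the rule has no
// guard, or its guard accepts the RHS.
bool canReduce(const std::map<int, Guard> &Guards, int Rule,
               const std::vector<std::string> &RHS) {
  auto It = Guards.find(Rule);
  return It == Guards.end() || It->second(RHS);
}
```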