This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

clang-tools-extra/pseudo/
  include/clang-pseudo/
    GLR.h (3/3)
    grammar/
      Grammar.h (2/2)
      LRGraph.h (1/2)
      LRTable.h (3/4)
  lib/
    GLR.cpp (24/34)
    grammar/
      Grammar.cpp
      GrammarBNF.cpp
      LRGraph.cpp
      LRTableBuild.cpp
  test/cxx/
    empty-member-spec.cpp
    recovery-init-list.cpp
  unittests/
    GLRTest.cpp (5/7)

Differential D128486

[pseudo] Add error-recovery framework & brace-based recovery
Closed · Public

Authored by sammccall on Jun 23 2022, 7:31 PM.

Details

Reviewers

hokein

Commits

rG312116748890: [pseudo] Add error-recovery framework & brace-based recovery
rGa0f4c10ae227: [pseudo] Add error-recovery framework & brace-based recovery

Summary

The idea is:

a parse failure is detected when all heads die when trying to shift the next token
we can recover by choosing a nonterminal we're partway through parsing, and determining where it ends through nonlocal means (e.g. matching brackets)
we can find candidates by walking up the stack from the (ex-)heads
the token range is defined using heuristics attached to grammar rules
the unparsed region is represented in the forest by an Opaque node

This patch has the core GLR functionality.
It does not allow recovery heuristics to be attached as extensions to
the grammar, but rather infers a brace-based heuristic.
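To make the brace-based heuristic more concrete, here is a minimal sketch of the kind of bracket matching it relies on. It is not the code from this patch: the `Tok` enum and the `findMatchingBrace` helper are hypothetical names invented for illustration, and the real parser works on a `TokenStream` of clang tokens rather than a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified token kinds (the real code uses clang's
// tok::TokenKind).
enum class Tok { LBrace, RBrace, Other };

// Given the index of an opening brace, returns the index of the matching
// closing brace, or std::nullopt if the braces are unbalanced.
std::optional<std::size_t> findMatchingBrace(const std::vector<Tok> &Tokens,
                                             std::size_t Open) {
  assert(Open < Tokens.size() && Tokens[Open] == Tok::LBrace);
  std::size_t Depth = 0;
  for (std::size_t I = Open; I < Tokens.size(); ++I) {
    if (Tokens[I] == Tok::LBrace)
      ++Depth; // entering a nested (or the outermost) brace pair
    else if (Tokens[I] == Tok::RBrace && --Depth == 0)
      return I; // closed the brace we started from
  }
  return std::nullopt; // ran out of tokens before the brace was closed
}
```

Under this sketch, the tokens strictly between `Open` and the returned index would be the region that a recovery could hand to the nonterminal being recovered, represented as an opaque node.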

Expected followups:

make recovery heuristics grammar extensions (depends on D127448)
add recover to our grammar for bracketed constructs and sequence nodes
change the structure of our augmented _ := start rules to eliminate some special-cases in glrParse.
(if I can work out how): avoid some spurious recovery cases described in comments
grammar changes to eliminate the hard distinction between init-list and designated-init-list shown in the recovery-init-list.cpp testcase

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sammccall created this revision. Jun 23 2022, 7:31 PM

Herald added a project: Restricted Project. Jun 23 2022, 7:31 PM

Herald added a subscriber: mgrang.

Harbormaster completed remote builds in B171752: Diff 439597. Jun 23 2022, 7:47 PM

sammccall updated this revision to Diff 440376. Jun 27 2022, 1:53 PM

add tests, clean up

Harbormaster completed remote builds in B172309: Diff 440376. Jun 27 2022, 1:54 PM

reduce after final recovery
comments

Harbormaster completed remote builds in B172313: Diff 440379. Jun 27 2022, 2:00 PM

sammccall published this revision for review. Jun 27 2022, 2:04 PM

sammccall added a reviewer: hokein.

Herald added a project: Restricted Project. Jun 27 2022, 2:04 PM

Herald added subscribers: cfe-commits, alextsao1999.

revert format changes

Harbormaster completed remote builds in B172315: Diff 440383. Jun 27 2022, 2:05 PM

update testcase

Harbormaster completed remote builds in B172325: Diff 440393. Jun 27 2022, 2:14 PM

rebase

Harbormaster completed remote builds in B172503: Diff 440646. Jun 28 2022, 8:59 AM

This revision was not accepted when it landed; it landed in state Needs Review. Jun 28 2022, 12:08 PM

This revision was landed with ongoing or failed builds.

Closed by commit rGa0f4c10ae227: [pseudo] Add error-recovery framework & brace-based recovery (authored by sammccall).

This revision was automatically updated to reflect the committed changes.

sammccall added a commit: rGa0f4c10ae227: [pseudo] Add error-recovery framework & brace-based recovery.

Sorry, I committed this by mistake when working on another change.
Reverted and this is ready for review.

sammccall added a reverting change: rG743971faaf84: Revert "[pseudo] Add error-recovery framework & brace-based recovery". Jun 28 2022, 12:11 PM

The patch looks like a good start to me; some initial comments (mostly around the recovery part).

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
150	If I read it correctly, consuming zero tokens is considered a failure of the function, right?
clang-tools-extra/pseudo/include/clang-pseudo/grammar/Grammar.h
134	So each rule will have at most one recovery strategy (that is fine for the initial version), but I think in the future we will want more (we probably need to change the Sequence to an array of `{SymbolID, RecoveryStrategy}`). For example:
selection-statement := IF CONSTEXPR_opt ( init-statement_opt condition ) statement ELSE statement
We might want different recoveries at `( . init-statement_opt condition )`, `( init-statement_opt condition ) . statement`, and `ELSE . statement`.
clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h
248	I see the motivation for the `OffsetTable` structure now; this would come as a follow-up to simplify `ReduceOffset` and `RecoveryOffset`, right?
clang-tools-extra/pseudo/lib/GLR.cpp
29	nit: unsigned => `Token::Index`
44	The `GLR.cpp` file is growing significantly. I think the recovery part is large enough to live in a separate file, `GLRRecovery.cpp`, but the declaration can still be in `GLR.h`.
64	this is not implemented, right? Add a FIXME?
68	nit: maybe name it `Parses` or `PartialParses`. `Path` makes me think this is a path of GSS nodes.
83	I think you're right -- I thought the first GSS node with a recovery state we encounter during the Walkup state is the node we want. This example is a little confusing (it still matches our previous mental model); I didn't get it until we discussed offline. I think the following example is clearer: parsing the text `if (true) else ?`
IfStmt := if (stmt) else . stmt - which we're currently parsing
IfStmt := if (.stmt) else stmt - (left) the first recovery GSS node, should not recover from this
IfStmt := . if (stmt) else stmt - (up), we should recover from this
I also think it is worth adding this to the test.
88	I can't think of a better solution other than a search-based one (would like to think more about it). Currently, we find all recovery options by traversing the whole GSS, and do a post-filter (based on the Start, and End). I might do both in the DFS (which will save some cost of traversing), the DFS will give us the best recovery options, and then we build the GSS node, and forest node. But up to you.
91	nit: Walkup seems a bit clearer than DFS.
111	any particular reason why we iterate the OldHeads in a reversed order?
130	nit: might be clearer to move the assertion to the beginning of the function.
133	`further right` should be `further left`, right?
149	If we find a better recovery option, we throw away all the newly built heads and forest nodes, which seems wasteful. I think we can avoid this by first finding the best solutions and then creating new heads and GSS nodes for them.
150	The Line135-Line150 code looks like a good candidate for an `evaluateOption` function.
174	Advancing the TokenIndex here seems a little surprising and doesn't match what the comment says ( `On failure, NewHeads is empty and TokenIndex is unchanged.`). I think this should be done in caller side.
600	currently, it is fine for bracket-based recovery. In the future, we will need to run a force reduction for `Heads` regardless of the lookahead token before running the glrRecover, add a FIXME?
601	If the glrRecover returns a bool indicating whether we're able to recover, then the code is simpler: if (NextHeads.empty() && !glrRecover(...)) return Params.Forest.createOpaque(...);
clang-tools-extra/pseudo/unittests/GLRTest.cpp
382	nit: for GSS nodes, I would name them like `GSSNode<StateID>`. It is less descriptive, but it can be easily distinguished from other forest nodes and matches the GSS described in the comment, which I think is clearer.
425	I suppose there is an extra `?` in the string literal, the index of the brackets (`4`, `5`) used below doesn't match the string literal here.
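The traversal discussed in the GLR.cpp comments above (lines 83 to 91) can be pictured with a small sketch. It only illustrates the idea of walking up from the dead heads and collecting every node whose state offers a recovery, instead of stopping at the first one. The `Node` struct, its `HasRecovery` flag, and `collectCandidates` are hypothetical simplifications, not the GSS types or functions from the patch.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for a GSS node.
struct Node {
  unsigned State;                    // LR state this node represents
  bool HasRecovery;                  // assumption: this state offers a recovery
  std::vector<const Node *> Parents; // edges toward the bottom of the stack
};

// Walks up from N, appending every not-yet-seen node that offers a recovery.
// Nodes shared between several stacks are visited only once.
void collectCandidates(const Node *N, std::set<const Node *> &Seen,
                       std::vector<const Node *> &Out) {
  if (!Seen.insert(N).second)
    return; // already visited through another head
  if (N->HasRecovery)
    Out.push_back(N);
  for (const Node *P : N->Parents)
    collectCandidates(P, Seen, Out);
}
```

Calling `collectCandidates` once per former head, with a shared `Seen` set, yields the full candidate list; choosing among the candidates is then a separate step, which keeps discovery and evaluation apart as argued later in the thread.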

address comments

Harbormaster completed remote builds in B172845: Diff 441125.Jun 29 2022, 12:23 PM

address comments

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
150	Consuming zero tokens + producing heads is success, consuming zero tokens + not producing heads is failure. Tweaked this comment a bit. (Also fixed a bit of logic that ignored recovery that didn't advance TokenIndex!)
clang-tools-extra/pseudo/include/clang-pseudo/grammar/Grammar.h
134	Added a comment to call out this limitation. There are several ways to deal with this, e.g. we could split up the grammar into multiple rules, each with one recovery. (But the approach you describe makes sense too)
clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h
248	Yes. Though I'm on the fence about whether it's worth it with one case (it's a bit awkward to generalize the building IIRC).
clang-tools-extra/pseudo/lib/GLR.cpp
44	This is interesting, recover/shift/reduce/parse are (vertically) self-contained enough that it didn't seem like a big problem yet... If the concern is file length, maybe we'd rather start with `reduce`; if it's relatedness, `GSS`? My line count is: recover: 156, shift: 47, reduce: 319, parse: 87, GSS: 66.
68	Renamed to DiscardedParse, does that work for you?
83	Fair enough, a couple of problems with that example though:
dropping the "then" clause from the grammar of an if statement is confusing (but adding it back in makes the example complicated)
using a non-kernel item for the last fails to show the "up" edge clearly (but there's not enough context to show a kernel item)
Came up with another one that hopefully works. I didn't manage to write a reasonable test showing it - it basically can't happen with bracket recovery. In order to demonstrate it we need `{` in the "wrong" recovery construct to be paired with a `}` after the cursor... which makes it look very much like a correct recovery. Added a FIXME to add a testcase once we can define new recovery strategies instead.
88	I'm not sure what you mean by a search-based one, can you elaborate?
"Currently, we find all recovery options by traversing the whole GSS, and do a post-filter (based on the Start, and End). I might do both in the DFS (which will save some cost of traversing)"
This replaces a relatively simple traversal of GSS nodes (which there aren't that many of) with running the recovery strategy more times - this is (usually) a more complicated algorithm running over tokens, which is (usually) a larger data set. It seems likely to be a bad trade performance-wise. In any case, performance isn't terribly critical here, and mixing the discovery & evaluation of options seems harder to read & debug (we lose the nice documented data structures we can debug, predictable -debug dumping of not-taken recovery options, etc).
111	Not that I can remember, and the unit tests don't seem to care, so removed
130	The purpose of the assertion is to make it obvious that NewHeads.clear() only discards items added during the loop below. An assertion at the top of the function would be less useful for this, both because it's not on the screen when reading NewHeads.clear() and because it's much less locally obvious that nothing has been pushed onto NewHeads at some prior point in the function.
150	What would the signature of such a function look like? There are many different possible outcomes:
replace the set with this option (BestOptions.clear)
add this option to the set (BestOptions.push_back)
discard this option (continue)
discard this option and all future options (break)
update recovery range or not
Access to control flow (break/continue) and read/write access to RecoveryRange and BestOptions seem like the natural way to express this, so it's not clear what a function could abstract away.
174	Oops, this shouldn't be done at all - originally I had a different contract for glrRecover(). (In fact, if NewHeads is empty after glrRecover() we never look at TokenIndex again, but let's not rely on that)
600	Done. This impacts the internal structure of the recovered node much more than it does which recoveries are available, so I think we can defer this for a fair while.
601	I had this initially, but: a) it's redundant, and makes it less locally obvious that we've repopulated NewHeads b) soon recovery will always succeed, so !NewHeads.empty() becomes an assertion
clang-tools-extra/pseudo/unittests/GLRTest.cpp
382	Yeah, I found those testcases difficult to read :-( I did work out that the numbers might be state numbers rather than node numbers, but I found it hard to a) remember that, since it's often GSSNode0, GSSNode1 etc and b) remember which state is which. It doesn't generalize, e.g. the RecoverRightmost case has 3 nodes with the same state.
425	Oops, yes! Originally this testcase was meant to test more things, and apparently I didn't finish simplifying it...
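The outcomes listed in the reply on GLR.cpp line 150 can be seen in a short loop. The sketch below is not the selection logic from the patch: the `Option` struct and the ranking rule (prefer the range that begins furthest right, then the one that ends earliest) are assumptions chosen only to show how `clear`, `push_back`, and `continue` interact with a shared `Best` set.

```cpp
#include <cassert>
#include <vector>

// Hypothetical recovery option: a token range and the symbol it would produce.
struct Option {
  unsigned Begin;
  unsigned End;
  int Symbol;
};

// Assumed ranking, for illustration only: a later Begin wins, then an
// earlier End.
static bool better(const Option &A, const Option &B) {
  if (A.Begin != B.Begin)
    return A.Begin > B.Begin;
  return A.End < B.End;
}

// Returns all options tied for best under the assumed ranking.
std::vector<Option> pickBest(const std::vector<Option> &Options) {
  std::vector<Option> Best;
  for (const Option &O : Options) {
    if (!Best.empty()) {
      if (better(Best.front(), O))
        continue; // discard this option
      if (better(O, Best.front()))
        Best.clear(); // replace the set with this option
    }
    Best.push_back(O); // add this option to the set (ties accumulate)
  }
  return Best;
}
```

With the options in sorted order, the `continue` could become a `break`, which is the "discard this option and all future options" outcome from the list. The loop body needs both control flow and write access to `Best`, which matches the argument that a helper function would have little left to abstract.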

Harbormaster completed remote builds in B172864: Diff 441154.Jun 29 2022, 1:41 PM

Looks great, let's ship it! Feel free to land it in any form you think is suitable.

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
150	nit: remove the duplicated `consumes`.
clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRGraph.h
148	nit: mention the `Result` must be a nonterminal.
clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h
131	nit: unrelated method?
248	A motivating bit is that it is tricky to implement a correct `get` method (we both made the same out-of-bound issue). I'm fine with the current form, we can revisit it afterwards.
clang-tools-extra/pseudo/lib/GLR.cpp
44	Yeah, indeed these pieces can be split. My main concern is not file length -- I would prefer keeping shift/reduce/parse in a single file, as they form a complete GLR algorithm (splitting them would make it harder to read and follow). Recovery seems like a different thing; I would imagine this part of the code will grow more in the future:
the GLR algorithm has the core recovery mechanism framework with some fallback recovery strategies (eof, brackets)
we have different recovery strategies implemented (some of them might be cxx-specific, probably part of the pseudoCXX library)
68	It is better than the original name. The `DiscardedParse` is a bit weird when we start to put it under the opaque node, in that sense, they are not discarded, IMO
83	oops, your example makes more sense -- I didn't notice that I missed the if-body stmt.
88	"I'm not sure what you mean by a search-based one, can you elaborate?"
The search-based one refers to the current one -- basically we perform a brute-force search for all available recoveries and get the best one.
"In any case, performance isn't terribly critical here, and mixing the discovery & evaluation of options seems harder to read & debug (we lose the nice documented data structures we can debug, predictable -debug dumping of not-taken recovery options, etc)."
That's fair enough.
130	Yeah, but I'd treat it as a contract of the API (the `NewHeads` argument passed to `glrRecover` must be empty). btw, the empty assertion is missing in the latest version.
178	should we worry about the case where we create a duplicated forest node? e.g. we have two best options and both recover to the same nonterminal.
clang-tools-extra/pseudo/unittests/GLRTest.cpp
487	nit: I'd probably move this to the comment mentioned in glrRecovery(), which is more discoverable.
558	nit: I'm not sure about the intention of having `RecoveryEndToEnd` separated from the recovery-related tests above; why not group them together?
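The LRTable.h comments above mention that a correct `get` over an offset table is easy to get wrong at the bounds. The sketch below shows one common layout: a flat array of entries plus an offset array with one extra sentinel slot, so that reading `Offset[State + 1]` stays in range for the last state. `RecoveryTable` and its members are hypothetical; this is not the `OffsetTable` or `LRTable` code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified recovery entry.
struct Recovery {
  int Strategy;
  int Result;
};

class RecoveryTable {
public:
  // Entries are (state, recovery) pairs in any order.
  RecoveryTable(const std::vector<std::pair<unsigned, Recovery>> &Entries,
                unsigned NumStates)
      : Offset(NumStates + 1, 0), Data(Entries.size()) {
    // Count entries per state, then prefix-sum the counts into offsets.
    for (const auto &E : Entries) {
      assert(E.first < NumStates);
      ++Offset[E.first + 1];
    }
    for (unsigned S = 0; S < NumStates; ++S)
      Offset[S + 1] += Offset[S];
    // Scatter the entries into their state's slice, keeping input order.
    std::vector<std::size_t> Next(Offset.begin(), Offset.end() - 1);
    for (const auto &E : Entries)
      Data[Next[E.first]++] = E.second;
  }

  // Returns the half-open range [begin, end) of recoveries for State.
  std::pair<const Recovery *, const Recovery *> get(unsigned State) const {
    assert(State + 1 < Offset.size() && "State out of range");
    return {Data.data() + Offset[State], Data.data() + Offset[State + 1]};
  }

private:
  // NumStates + 1 entries; the last one is a sentinel equal to Data.size().
  std::vector<std::size_t> Offset;
  std::vector<Recovery> Data;
};
```

The sentinel is the design choice worth noting: without it, a lookup for the highest-numbered state would read one past the end of the offset array.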

This revision is now accepted and ready to land. Jul 5 2022, 7:08 AM

Thanks in particular for flagging the issue with duplicate forest nodes, you've found a hole in the model.
That said, I've left a big FIXME and I think we should patch it later.

clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRGraph.h
148	Hmm, I don't think it has to be? (To deduce a brace recovery rule we require it, but that doesn't mean it needs to be true in general)
clang-tools-extra/pseudo/lib/GLR.cpp
44	It's hard to tell yet, but I guess I don't really see why it's going to be easier to reason about shift/recover in isolation vs shift/reduce. Let's see how this goes past the initial patch (in particular once we move stuff into extensions)
130	Yes, I think this is obsolete - the latest version of glrRecover no longer requires NewHeads to be empty at all, as it doesn't use it as scratch space.
178	Oh no, you're right... we also have all the options for the GSS node being either the same/different in that case. And it gets worse, what if we have one recovery -> X, and another recovery -> Y, and a rule `X := Y`. The following `glrReduce` will produce a duplicate `X`, instead of an `AmbiguousX{ OpaqueX, SequenceX{ OpaqueY } }`. As much as I hate this, I think we should slap a FIXME on it and move on. I don't think multiple tied recoveries is common, and the solution here is just as likely to break the tie as it is to fix this with fancy algorithms. However I don't think we have enough data to make a decision on exactly what to do yet.
clang-tools-extra/pseudo/unittests/GLRTest.cpp
558	this tests glrParse, and it's grouped with the other glrParse tests consistent with this file (e.g. GLRReduceOrder is with the glrParse tests, not with the glrReduce tests)

This revision was landed with ongoing or failed builds. Jul 5 2022, 11:50 AM

Closed by commit rG312116748890: [pseudo] Add error-recovery framework & brace-based recovery (authored by sammccall).

This revision was automatically updated to reflect the committed changes.

sammccall marked an inline comment as done.

sammccall added a commit: rG312116748890: [pseudo] Add error-recovery framework & brace-based recovery.

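Revision Contents

Path	Size
clang-tools-extra/pseudo/include/clang-pseudo/GLR.h	20 lines
clang-tools-extra/pseudo/include/clang-pseudo/grammar/Grammar.h	15 lines
clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRGraph.h	17 lines
clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h	31 lines
clang-tools-extra/pseudo/lib/GLR.cpp	224 lines
clang-tools-extra/pseudo/lib/grammar/ (4 files)	7 lines, 11 lines, 37 lines, 28 lines
clang-tools-extra/pseudo/test/cxx/empty-member-spec.cpp	2 lines
clang-tools-extra/pseudo/test/cxx/recovery-init-list.cpp	13 lines
clang-tools-extra/pseudo/unittests/GLRTest.cpp	164 lines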

Diff 441154

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h

[...]
  void glrShift(llvm::ArrayRef<const GSS::Node *> OldHeads,
                const ForestNode &NextTok, const ParseParams &Params,
                std::vector<const GSS::Node *> &NewHeads);
  // Applies available reductions on Heads, appending resulting heads to the list.
  //
  // Exposed for testing only.
  void glrReduce(std::vector<const GSS::Node *> &Heads, SymbolID Lookahead,
                 const ParseParams &Params);

+ // Heuristically recover from a state where no further parsing is possible.
+ //
+ // OldHeads is the parse state at TokenIndex.
+ // This function consumes consumes zero or more tokens by advancing TokenIndex,
+ // and places any recovery states created in NewHeads.
+ //
+ // On failure, NewHeads is empty and TokenIndex is unchanged.
+ //
+ // WARNING: glrRecover acts as a "fallback shift". If it consumes no tokens,
+ // there is a risk of the parser falling into an infinite loop, creating an
+ // endless sequence of recovery nodes.
+ // Generally it is safe for recovery to match 0 tokens against sequence symbols
+ // like `statement-seq`, as the grammar won't permit another statement-seq
+ // immediately afterwards. However recovery strategies for `statement` should
+ // consume at least one token, as statements may be adjacent in the input.
+ void glrRecover(llvm::ArrayRef<const GSS::Node *> OldHeads,
+                 unsigned &TokenIndex, const TokenStream &Tokens,
+                 const ParseParams &Params,
+                 std::vector<const GSS::Node *> &NewHeads);
+
  } // namespace pseudo
  } // namespace clang

  #endif // CLANG_PSEUDO_GLR_H

clang-tools-extra/pseudo/include/clang-pseudo/grammar/Grammar.h

[...]
  inline bool isNonterminal(SymbolID ID) { return !isToken(ID); }
  // The terminals are always the clang tok::TokenKind (not all are used).
  inline tok::TokenKind symbolToToken(SymbolID SID) {
    assert(isToken(SID));
    SID &= ~TokenFlag;
    assert(SID < NumTerminals);
    return static_cast<tok::TokenKind>(SID);
  }
- inline SymbolID tokenSymbol(tok::TokenKind TK) {
+ inline constexpr SymbolID tokenSymbol(tok::TokenKind TK) {
    return TokenFlag | static_cast<SymbolID>(TK);
  }
+ // Error recovery strategies.
+ // FIXME: these should be provided as extensions instead.
+ enum class RecoveryStrategy : uint8_t { None, Braces };

  // An extension is a piece of native code specific to a grammar that modifies
  // the behavior of annotated rules. One ExtensionID is assigned for each unique
  // attribute value (all attributes share a namespace).
  using ExtensionID = uint16_t;

  // A RuleID uniquely identifies a production rule in a grammar.
  // It is an index into a table of rules.
  using RuleID = uint16_t;
  // There are maximum 2^12 rules.
  static constexpr unsigned RuleBits = 12;

  // Represent a production rule in the grammar, e.g.
  //   expression := a b c
  //   ^Target       ^Sequence
  struct Rule {
    Rule(SymbolID Target, llvm::ArrayRef<SymbolID> Seq);

    // We occupy 4 bits for the sequence, in theory, it can be at most 2^4 tokens
    // long, however, we're stricter in order to reduce the size, we limit the max
    // length to 9 (this is the longest sequence in cxx grammar).
    static constexpr unsigned SizeBits = 4;
    static constexpr unsigned MaxElements = 9;
-   static_assert(MaxElements <= (1 << SizeBits), "Exceeds the maximum limit");
+   static_assert(MaxElements < (1 << SizeBits), "Exceeds the maximum limit");
    static_assert(SizeBits + SymbolBits <= 16,
                  "Must be able to store symbol ID + size efficiently");

    // 16 bits for target symbol and size of sequence:
    // SymbolID : 12 | Size : 4
    SymbolID Target : SymbolBits;
    uint8_t Size : SizeBits; // Size of the Sequence
    SymbolID Sequence[MaxElements];

    // A guard extension controls whether a reduction of a rule will be conducted
    // by the GLR parser.
    // 0 is sentinel unset extension ID, indicating there is no guard extension
    // being set for this rule.
    ExtensionID Guard = 0;

+   // Specifies the index within Sequence eligible for error recovery.
+   // Given stmt := { stmt-seq_opt }, if we fail to parse the stmt-seq then we
+   // should recover by finding the matching brace, and forcing stmt-seq to match
+   // everything between braces.
+   // For now, only a single strategy at a single point is possible.
+   uint8_t RecoveryIndex = -1;
+   RecoveryStrategy Recovery = RecoveryStrategy::None;
+
    llvm::ArrayRef<SymbolID> seq() const {
      return llvm::ArrayRef<SymbolID>(Sequence, Size);
    }
    friend bool operator==(const Rule &L, const Rule &R) {
      return L.Target == R.Target && L.seq() == R.seq() && L.Guard == R.Guard;
    }
  };

[...]
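The `static_assert` in `Rule` was tightened from `<=` to `<`. A plausible reading, sketched below, is that a 4-bit field can only hold values 0 through 15, so a maximum of exactly 16 would pass the old check but not fit in `Size`. The constants are copied from the diff above; `PackedHeader` and `roundTripSize` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the diff above.
constexpr unsigned SymbolBits = 12;
constexpr unsigned SizeBits = 4;
constexpr unsigned MaxElements = 9;
// Strict comparison: the largest storable size is (1 << SizeBits) - 1.
static_assert(MaxElements < (1u << SizeBits), "Exceeds the maximum limit");

// Hypothetical stand-in for the first two fields of Rule.
struct PackedHeader {
  std::uint16_t Target : SymbolBits;
  std::uint16_t Size : SizeBits;
};

// Stores N in the 4-bit Size field and reads it back.
unsigned roundTripSize(unsigned N) {
  PackedHeader H{};
  H.Size = static_cast<std::uint16_t>(N); // only the low SizeBits bits are kept
  return H.Size;
}
```

In this sketch `roundTripSize(15)` survives the round trip while `roundTripSize(16)` wraps to 0, which is the case the stricter assertion rules out.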

clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRGraph.h

[...]
  public:
    // An edge in the LR graph, it represents a transition in the LR automaton.
    // If the parser is at state Src, with a lookahead Label, then it
    // transits to state Dst.
    struct Edge {
      StateID Src, Dst;
      SymbolID Label;
    };

+   // A possible error recovery: choose to match some tokens against a symbol.
+   //
+   // e.g. a state that contains
+   // stmt := { . stmt-seq [recover=braces] }
+   // has a Recovery { Src = S, Strategy=braces, Result=stmt-seq }.
+   struct Recovery {
+     StateID Src; // The state we are in when encountering the error.
+     RecoveryStrategy Strategy; // Heuristic choosing the tokens to match.
+     SymbolID Result; // The symbol that is produced.
+   };
+
    llvm::ArrayRef<State> states() const { return States; }
    llvm::ArrayRef<Edge> edges() const { return Edges; }
+   llvm::ArrayRef<Recovery> recoveries() const { return Recoveries; }
    llvm::ArrayRef<std::pair<SymbolID, StateID>> startStates() const {
      return StartStates;
    }

    std::string dumpForTests(const Grammar &) const;

  private:
    LRGraph(std::vector<State> States, std::vector<Edge> Edges,
+           std::vector<Recovery> Recoveries,
            std::vector<std::pair<SymbolID, StateID>> StartStates)
        : States(std::move(States)), Edges(std::move(Edges)),
-         StartStates(std::move(StartStates)) {}
+         Recoveries(std::move(Recoveries)), StartStates(std::move(StartStates)) {
+   }

    std::vector<State> States;
    std::vector<Edge> Edges;
+   std::vector<Recovery> Recoveries;
    std::vector<std::pair<SymbolID, StateID>> StartStates;
  };

  } // namespace pseudo
  } // namespace clang

  namespace llvm {
  // Support clang::pseudo::Item as DenseMap keys.
[...]

clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h

[...]
    private:
      static_assert(ValueBits >= RuleBits, "Value must be able to store RuleID");
      static_assert(KindBits + ValueBits <= 16,
                    "Must be able to store kind and value efficiently");
      uint16_t K : KindBits;
      // Either StateID or RuleID, depending on the Kind.
      uint16_t Value : ValueBits;
    };

+   struct Recovery {
+     RecoveryStrategy Strategy;
+     SymbolID Result;
+   };
+
+   // Returns all available actions for the given state on a terminal.
+   // Expected to be called by LR parsers.
+   llvm::ArrayRef<Action> getActions(StateID State, SymbolID Terminal) const;
    // Returns the state after we reduce a nonterminal.
    // Expected to be called by LR parsers.
    // REQUIRES: Nonterminal is valid here.
    StateID getGoToState(StateID State, SymbolID Nonterminal) const;
    // Returns the state after we shift a terminal.
    // Expected to be called by LR parsers.
    // If the terminal is invalid here, returns None.
    llvm::Optional<StateID> getShiftState(StateID State, SymbolID Terminal) const;
[...]
  public:
    // Returns whether Terminal can follow Nonterminal in a valid source file.
    bool canFollow(SymbolID Nonterminal, SymbolID Terminal) const {
      assert(isToken(Terminal));
      assert(isNonterminal(Nonterminal));
      return FollowSets.test(tok::NUM_TOKENS * Nonterminal +
                             symbolToToken(Terminal));
    }

+   // Looks up available recovery actions if we stopped parsing in this state.
+   llvm::ArrayRef<Recovery> getRecovery(StateID State) const {
+     return llvm::makeArrayRef(Recoveries.data() + RecoveryOffset[State],
+                               Recoveries.data() + RecoveryOffset[State + 1]);
+   }
+
    // Returns the state from which the LR parser should start to parse the input
    // tokens as the given StartSymbol.
    //
    // In LR parsing, the start state of `translation-unit` corresponds to
    // `_ := • translation-unit`.
    //
    // Each start state responds to a single grammar rule like `_ := start`.
    // REQUIRE: The given StartSymbol must exist in the grammar (in a form of
[...]
    struct Entry {
      StateID State;
      SymbolID Symbol;
      Action Act;
    };
    struct ReduceEntry {
      StateID State;
      RuleID Rule;
    };
+   struct RecoveryEntry {
+     StateID State;
+     RecoveryStrategy Strategy;
+     SymbolID Result;
+   };
-   // Build a specifid table for testing purposes.
-   static LRTable buildForTests(const Grammar &G, llvm::ArrayRef<Entry>,
-                                llvm::ArrayRef<ReduceEntry>);
+   // Build a specified table for testing purposes.
+   static LRTable buildForTests(const Grammar &, llvm::ArrayRef<Entry>,
+                                llvm::ArrayRef<ReduceEntry>,
+                                llvm::ArrayRef<RecoveryEntry> = {});

  private:
    // Looks up actions stored in the generic table.
llvm::ArrayRef<Action> find(StateID State, SymbolID Symbol) const;		llvm::ArrayRef<Action> find(StateID State, SymbolID Symbol) const;

// Conceptually the LR table is a multimap from (State, SymbolID) => Action.		// Conceptually the LR table is a multimap from (State, SymbolID) => Action.
// Our physical representation is quite different for compactness.		// Our physical representation is quite different for compactness.

Show All 15 Lines	private:
std::vector<uint32_t> ReduceOffset;		std::vector<uint32_t> ReduceOffset;
std::vector<RuleID> Reduces;		std::vector<RuleID> Reduces;
// Conceptually this is a bool[SymbolID][Token], each entry describing whether		// Conceptually this is a bool[SymbolID][Token], each entry describing whether
// the grammar allows the (nonterminal) symbol to be followed by the token.		// the grammar allows the (nonterminal) symbol to be followed by the token.
//		//
// This is flattened by encoding the (SymbolID Nonterminal, tok::Kind Token)		// This is flattened by encoding the (SymbolID Nonterminal, tok::Kind Token)
// as an index: Nonterminal * NUM_TOKENS + Token.		// as an index: Nonterminal * NUM_TOKENS + Token.
llvm::BitVector FollowSets;		llvm::BitVector FollowSets;

		// Recovery stores all recovery actions from all states.
		// A given state has [RecoveryOffset[S], RecoveryOffset[S+1]).
		std::vector<uint32_t> RecoveryOffset;
		hokein: I see the motivation of the `OffsetTable` structure now, this would come as a follow-up to simplify the `ReduceOffset` and `RecoveryOffset`, right?
		sammccall: Yes. Though I'm on the fence about whether it's worth it with one case (it's a bit awkward to generalize the building IIRC).
		hokein: A motivating bit is that it is tricky to implement a correct `get` method (we both made the same out-of-bound issue). I'm fine with the current form, we can revisit it afterwards.
		std::vector<Recovery> Recoveries;
};		};
llvm::raw_ostream &operator<<(llvm::raw_ostream &, const LRTable::Action &);		llvm::raw_ostream &operator<<(llvm::raw_ostream &, const LRTable::Action &);

} // namespace pseudo		} // namespace pseudo
} // namespace clang		} // namespace clang

#endif // CLANG_PSEUDO_GRAMMAR_LRTABLE_H		#endif // CLANG_PSEUDO_GRAMMAR_LRTABLE_H
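
The `RecoveryOffset`/`Recoveries` pair above stores one contiguous range of entries per state, and the review thread notes that a correct range lookup is easy to get wrong at the upper bound. A minimal, standalone sketch of that layout follows. The `OffsetTable` name is borrowed from the discussion, but this class, its constructor, and its `get` signature are assumptions for illustration only, not the clang-pseudo implementation. The point it shows is that the offsets vector needs `NumKeys + 1` entries so that `Offsets[Key + 1]` is valid for the last key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of clang-pseudo): groups values by a dense
// integer key and stores each group contiguously.
template <typename T> class OffsetTable {
public:
  // Entries are (key, value) pairs in any order; every key must be < NumKeys.
  OffsetTable(std::size_t NumKeys,
              const std::vector<std::pair<std::size_t, T>> &Entries)
      : Offsets(NumKeys + 1, 0), Values(Entries.size()) {
    // Count the entries of each key, shifted by one slot.
    for (const auto &E : Entries) {
      assert(E.first < NumKeys && "key out of range");
      ++Offsets[E.first + 1];
    }
    // Prefix sum: Offsets[K] becomes the start of key K's range, and the
    // extra final slot holds the total size (the end of the last range).
    for (std::size_t K = 0; K < NumKeys; ++K)
      Offsets[K + 1] += Offsets[K];
    // Scatter the values into their ranges, keeping input order per key.
    std::vector<uint32_t> Next(Offsets.begin(), Offsets.end() - 1);
    for (const auto &E : Entries)
      Values[Next[E.first]++] = E.second;
  }

  // Returns the half-open range [first, second) of values stored for Key.
  std::pair<const T *, const T *> get(std::size_t Key) const {
    assert(Key + 1 < Offsets.size() && "key out of range");
    return {Values.data() + Offsets[Key], Values.data() + Offsets[Key + 1]};
  }

private:
  std::vector<uint32_t> Offsets; // NumKeys + 1 entries.
  std::vector<T> Values;
};
```

With this shape, a key without entries yields an empty range (`first == second`), which matches how `getRecovery` is expected to behave for states that have no recovery actions.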

clang-tools-extra/pseudo/lib/GLR.cpp

Show All 18 Lines
#include <algorithm>		#include <algorithm>
#include <memory>		#include <memory>
#include <queue>		#include <queue>

#define DEBUG_TYPE "GLR.cpp"		#define DEBUG_TYPE "GLR.cpp"

namespace clang {		namespace clang {
namespace pseudo {		namespace pseudo {
		namespace {

		Token::Index findRecoveryEndpoint(RecoveryStrategy Strategy,
		hokein: nit: unsigned => `Token::Index`
		const GSS::Node *RecoveryNode,
		const TokenStream &Tokens) {
		assert(Strategy == RecoveryStrategy::Braces);
		const ForestNode *LBrace = RecoveryNode->Payload;
		assert(LBrace->kind() == ForestNode::Terminal &&
		LBrace->symbol() == tokenSymbol(tok::l_brace));
		if (const Token *RBrace = Tokens.tokens()[LBrace->startTokenIndex()].pair())
		return Tokens.index(*RBrace);
		return Token::Invalid;
		}

		} // namespace

		void glrRecover(llvm::ArrayRef<const GSS::Node *> OldHeads,
		unsigned &TokenIndex, const TokenStream &Tokens,
		hokein: The `GLR.cpp` file is growing significantly, I think the recovery part is large enough to live in a separate file `GLRRecovery.cpp`, but the declaration can still be in the `GLR.h`.
		sammccall: This is interesting, recover/shift/reduce/parse are (vertically) self-contained enough that it didn't seem like a big problem yet... If the concern is file length, maybe we'd rather start with `reduce`; if it's relatedness, `GSS`? My line count is: recover: 156, shift: 47, reduce: 319, parse: 87, GSS: 66.
		hokein: Yeah, indeed these pieces can be split, my main concern is not file length -- I would prefer keeping shift/reduce/parse in a single file, as they form a complete GLR algorithm (splitting them would make it harder to read and follow). Recovery seems like a different thing, I would imagine this part of the code will grow more in the future:
		- the GLR algorithm has the core recovery mechanism framework with some fallback recovery strategies (eof, brackets)
		- we have different recovery strategies implemented (some of them might be cxx-specific, probably part of the pseudoCXX library)
		sammccall: It's hard to tell yet, but I guess I don't really see why it's going to be easier to reason about shift/recover in isolation vs shift/reduce. Let's see how this goes past the initial patch (in particular once we move stuff into extensions)
		const ParseParams &Params,
		std::vector<const GSS::Node *> &NewHeads) {
		LLVM_DEBUG(llvm::dbgs() << "Recovery at token " << TokenIndex << "...\n");
		// Describes a possibility to recover by forcibly interpreting a range of
		// tokens around the cursor as a nonterminal that we expected to see.
		struct PlaceholderRecovery {
		// The token prior to the nonterminal which is being recovered.
		// This starts the region we're skipping, so higher Position is better.
		Token::Index Position;
		// The nonterminal which will be created in order to recover.
		SymbolID Symbol;
		// The heuristic used to choose the bounds of the nonterminal to recover.
		RecoveryStrategy Strategy;

		// The GSS head where we are expecting the recovered nonterminal.
		const GSS::Node *RecoveryNode;
		// Payload of nodes on the way back from the OldHead to the recovery node.
		// These represent the partial parse that is being discarded.
		// They should become the children of the opaque recovery node.
		// FIXME: internal structure of opaque nodes is not implemented.
		hokein: this is not implemented, right? Add a FIXME?
		//
		// There may be multiple paths leading to the same recovery node, we choose
		// one arbitrarily.
		std::vector<const ForestNode *> DiscardedParse;
		hokein: nit: maybe name it `Parses` or `PartialParses`. Path makes me think this is a path of GSS nodes.
		sammccall: Renamed to DiscardedParse, does that work for you?
		hokein: It is better than the original name. The `DiscardedParse` is a bit weird when we start to put it under the opaque node, in that sense, they are not discarded, IMO
		};
		std::vector<PlaceholderRecovery> Options;

		// Find recovery options by walking up the stack.
		//
		// This is similar to exception handling: we walk up the "frames" of nested
		// rules being parsed until we find one that has a "handler" which allows us
		// to determine the node bounds without parsing it.
		//
		// Unfortunately there's a significant difference: the stack contains both
		// "upward" nodes (ancestor parses) and "leftward" ones.
		// e.g. when parsing `{ if (1) ? }` as compound-stmt, the stack contains:
		// stmt := IF ( expr ) . stmt - current state, we should recover here!
		// stmt := IF ( expr . ) stmt - (left, no recovery here)
		// stmt := IF ( . expr ) stmt - left, we should NOT recover here!
		hokein: I think you're right -- I thought the first GSS node with a recovery state we encounter during the Walkup state is the node we want. This example is a little confusing (it still matches our previous mental model), I didn't get it until we discussed offline. I think the following example is clearer:
		parsing the text `if (true) else ?`
		IfStmt := if (stmt) else . stmt - which we're currently parsing
		IfStmt := if (.stmt) else stmt - (left) the first recovery GSS node, should not recover from this
		IfStmt := . if (stmt) else stmt - (up), we should recover from this
		I also think it is worth adding it to the test.
		sammccall: Fair enough, a couple of problems with that example though:
		- dropping the "then" clause from the grammar of an if statement is confusing (but adding it back in makes the example complicated)
		- using a non-kernel item for the last fails to show the "up" edge clearly (but there's not enough context to show a kernel item)
		Came up with another one that hopefully works. I didn't manage to write a reasonable test showing it - it basically can't happen with bracket recovery. In order to demonstrate it we need `{` in the "wrong" recovery construct to be paired with a `}` after the cursor... which makes it look very much like a correct recovery. Added a FIXME to add a testcase once we can define new recovery strategies instead.
		hokein: oops, your example makes more sense -- I didn't notice that I missed the if-body stmt.
		// stmt := IF . ( expr ) stmt - (left, no recovery here)
		// stmt-seq := . stmt - up, we might recover here
		// compound-stmt := { . stmt-seq } - up, we should recover here!
		//
		// It's not obvious how to avoid collecting "leftward" recovery options.
		hokein: I can't think of a better solution other than a search-based one (would like to think more about it). Currently, we find all recovery options by traversing the whole GSS, and do a post-filter (based on the Start, and End). I might do both in the DFS (which will save some cost of traversing), the DFS will give us the best recovery options, and then we build the GSS node, and forest node. But up to you.
		sammccall: I'm not sure what you mean by a search-based one, can you elaborate?
		> Currently, we find all recovery options by traversing the whole GSS, and do a post-filter (based on the Start, and End). I might do both in the DFS (which will save some cost of traversing)
		This replaces a relatively simple traversal of GSS nodes (which there aren't that many of) with running the recovery strategy more times - this is (usually) a more complicated algorithm running over tokens, which is (usually) a larger data set. It seems likely to be a bad trade performance-wise. In any case, performance isn't terribly critical here, and mixing the discovery & evaluation of options seems harder to read & debug (we lose the nice documented data structures we can debug, predictable -debug dumping of not-taken recovery options, etc).
		hokein:
		> I'm not sure what you mean by a search-based one, can you elaborate?
		The search-based one refers to the current one -- basically we perform a brute-force search for all available recoveries and get the best one.
		> In any case, performance isn't terribly critical here, and mixing the discovery & evaluation of options seems harder to read & debug (we lose the nice documented data structures we can debug, predictable -debug dumping of not-taken recovery options, etc).
		That's fair enough.
		// I think the distinction is ill-defined after merging items into states.
		// For now, we have to take this into account when defining recovery rules.
		// (e.g. in the expr recovery above, stay inside the parentheses).
		hokein: nit: Walkup seems a bit clearer than DFS.
		// FIXME: find a more satisfying way to avoid such false recovery.
		std::vector<const ForestNode *> Path;
		llvm::DenseSet<const GSS::Node *> Seen;
		auto WalkUp = [&](const GSS::Node *N, Token::Index NextTok, auto &WalkUp) {
		if (!Seen.insert(N).second)
		return;
		for (auto Strategy : Params.Table.getRecovery(N->State)) {
		Options.push_back(PlaceholderRecovery{
		NextTok,
		Strategy.Result,
		Strategy.Strategy,
		N,
		Path,
		});
		LLVM_DEBUG(llvm::dbgs()
		<< "Option: recover " << Params.G.symbolName(Strategy.Result)
		<< " at token " << NextTok << "\n");
		}
		Path.push_back(N->Payload);
		for (const GSS::Node *Parent : N->parents())
		hokein: any particular reason why we iterate the OldHeads in a reversed order?
		sammccall: Not that I can remember, and the unit tests don't seem to care, so removed
		WalkUp(Parent, N->Payload->startTokenIndex(), WalkUp);
		Path.pop_back();
		};
		for (auto *N : OldHeads)
		WalkUp(N, TokenIndex, WalkUp);

		// Now we select the option(s) we will use to recover.
		//
		// We prefer options starting further right, as these discard less code
		// (e.g. we prefer to recover inner scopes rather than outer ones).
		// The options also need to agree on an endpoint, so the parser has a
		// consistent position afterwards.
		//
		// So conceptually we're sorting by the tuple (start, end), though we avoid
		// computing `end` for options that can't be winners.

		// Consider options starting further right first.
		// Don't drop the others yet though, we may still use them if preferred fails.
		llvm::stable_sort(Options, [&](const auto &L, const auto &R) {
		hokein: nit: might be clearer to move the assertion to the beginning of the function.
		sammccall: The purpose of the assertion is to make it obvious that NewHeads.clear() only discards items added during the loop below. An assertion at the top of the function would be less useful for this, both because it's not on the screen when reading NewHeads.clear() and because it's much less locally obvious that nothing has been pushed onto NewHeads at some prior point in the function.
		hokein: Yeah, but I'd treat it as a contract of the API (the `NewHeads` argument passed to `glrRecover` must be empty). btw, the empty assertion is missing in the latest version.
		sammccall: Yes, I think this is obsolete - the latest version of glrRecover no longer requires NewHeads to be empty at all, as it doesn't use it as scratch space.
		return L.Position > R.Position;
		});

		hokein: `further right` should be `further left`, right?
		// We may find multiple winners, but they will have the same range.
		llvm::Optional<Token::Range> RecoveryRange;
		std::vector<const PlaceholderRecovery *> BestOptions;
		for (const PlaceholderRecovery &Option : Options) {
		// If this starts further left than options we've already found, then
		// we'll never find anything better. Skip computing End for the rest.
		if (RecoveryRange && Option.Position < RecoveryRange->Begin)
		break;

		auto End =
		findRecoveryEndpoint(Option.Strategy, Option.RecoveryNode, Tokens);
		// Recovery may not take the parse backwards.
		if (End == Token::Invalid || End < TokenIndex)
		continue;
		if (RecoveryRange) {
		// If this is worse than our previous options, ignore it.
		hokein: If we find a better recovery option, we're throwing all the newly-built heads and forest nodes, this seems wasteful. I think we can avoid it by first finding the best solutions and creating new heads and gss nodes for them.
		if (RecoveryRange->End < End)
		hokein: The Line135-Line150 code looks like a good candidate for an `evaluateOption` function.
		sammccall: What would the signature of such a function look like? There are many different possible outcomes:
		- replace the set with this option (BestOptions.clear)
		- add this option to the set (BestOptions.push_back)
		- discard this option (continue)
		- discard this option and all future options (break)
		- update recovery range or not
		Access to control flow (break/continue) and read/write access to RecoveryRange and BestOptions seem like the natural way to express this, so it's not clear what a function could abstract away.
		continue;
		// If this is an improvement over our previous options, then drop them.
		if (RecoveryRange->End > End)
		BestOptions.clear();
		}
		// Create recovery nodes and heads for them in the GSS. These may be
		// discarded if a better recovery is later found, but this path isn't hot.
		RecoveryRange = {Option.Position, End};
		BestOptions.push_back(&Option);
		}

		if (BestOptions.empty()) {
		LLVM_DEBUG(llvm::dbgs() << "Recovery failed after trying " << Options.size()
		<< " strategies\n");
		return;
		}

		// We've settled on a set of recovery options, so create their nodes and
		// advance the cursor.
		LLVM_DEBUG({
		llvm::dbgs() << "Recovered range=" << *RecoveryRange << ":";
		for (const auto *Option : BestOptions)
		llvm::dbgs() << " " << Params.G.symbolName(Option->Symbol);
		llvm::dbgs() << "\n";
		hokein: Advancing the TokenIndex here seems a little surprising and doesn't match what the comment says (`On failure, NewHeads is empty and TokenIndex is unchanged.`). I think this should be done in caller side.
		sammccall: Oops, this shouldn't be done at all - originally I had a different contract for glrRecover(). (In fact, if NewHeads is empty after glrRecover() we never look at TokenIndex again, but let's not rely on that)
		});
		for (const PlaceholderRecovery *Option : BestOptions) {
		const ForestNode &Placeholder =
		Params.Forest.createOpaque(Option->Symbol, Option->Position);
		hokein: should we worry about the case where we create a duplicated forest node? e.g. we have two best options and both recover to the same nonterminal.
		sammccall: Oh no, you're right... we also have all the options for the GSS node being either the same/different in that case. And it gets worse, what if we have one recovery -> X, and another recovery -> Y, and a rule `X := Y`. The following `glrReduce` will produce a duplicate `X`, instead of an `AmbiguousX{ OpaqueX, SequenceX{ OpaqueY } }`. As much as I hate this, I think we should slap a FIXME on it and move on. I don't think multiple tied recoveries is common, and the solution here is just as likely to break the tie as it is to fix this with fancy algorithms. However I don't think we have enough data to make a decision on exactly what to do yet.
		const GSS::Node *NewHead = Params.GSStack.addNode(
		Params.Table.getGoToState(Option->RecoveryNode->State, Option->Symbol),
		&Placeholder, {Option->RecoveryNode});
		NewHeads.push_back(NewHead);
		}
		TokenIndex = RecoveryRange->End;
		}

using StateID = LRTable::StateID;		using StateID = LRTable::StateID;

llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const GSS::Node &N) {		llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const GSS::Node &N) {
std::vector<std::string> ParentStates;		std::vector<std::string> ParentStates;
for (const auto *Parent : N.parents())		for (const auto *Parent : N.parents())
ParentStates.push_back(llvm::formatv("{0}", Parent->State));		ParentStates.push_back(llvm::formatv("{0}", Parent->State));
OS << llvm::formatv("state {0}, parsed symbol {1}, parents {2}", N.State,		OS << llvm::formatv("state {0}, parsed symbol {1}, parents {2}", N.State,
N.Payload->symbol(), llvm::join(ParentStates, ", "));		N.Payload ? N.Payload->symbol() : 0,
		llvm::join(ParentStates, ", "));
return OS;		return OS;
}		}

// Apply all pending shift actions.		// Apply all pending shift actions.
// In theory, LR parsing doesn't have shift/shift conflicts on a single head.		// In theory, LR parsing doesn't have shift/shift conflicts on a single head.
// But we may have multiple active heads, and each head has a shift action.		// But we may have multiple active heads, and each head has a shift action.
//		//
// We merge the stack -- if multiple heads will reach the same state after		// We merge the stack -- if multiple heads will reach the same state after
▲ Show 20 Lines • Show All 378 Lines • ▼ Show 20 Lines	if (++I != 20) // Run periodically to balance CPU and memory usage.
return;		return;
I = 0;		I = 0;

// We need to copy the list: Roots is consumed by the GC.		// We need to copy the list: Roots is consumed by the GC.
Roots = Heads;		Roots = Heads;
GSS.gc(std::move(Roots));		GSS.gc(std::move(Roots));
};		};
// Each iteration fully processes a single token.		// Each iteration fully processes a single token.
for (unsigned I = 0; I < Terminals.size(); ++I) {		for (unsigned I = 0; I < Terminals.size();) {
LLVM_DEBUG(llvm::dbgs() << llvm::formatv(		LLVM_DEBUG(llvm::dbgs() << llvm::formatv(
"Next token {0} (id={1})\n",		"Next token {0} (id={1})\n",
G.symbolName(Terminals[I].symbol()), Terminals[I].symbol()));		G.symbolName(Terminals[I].symbol()), Terminals[I].symbol()));
// Consume the token.		// Consume the token.
glrShift(Heads, Terminals[I], Params, NextHeads);		glrShift(Heads, Terminals[I], Params, NextHeads);

		// If we weren't able to consume the token, try to skip over some tokens
		// so we can keep parsing.
		if (NextHeads.empty()) {
		// FIXME: Heads may not be fully reduced, because our reductions were
		hokein: currently, it is fine for bracket-based recovery. In the future, we will need to run a force reduction for `Heads` regardless of the lookahead token before running the glrRecover, add a FIXME?
		sammccall: Done. This impacts the internal structure of the recovered node much more than it does which recoveries are available, so I think we can defer this for a fair while.
		// constrained by lookahead (but lookahead is meaningless to recovery).
		hokein: If the glrRecover returns a bool indicating whether we're able to recover, then the code is simpler: `if (NextHeads.empty() && !glrRecover(...)) return Params.Forest.createOpaque(...);`
		sammccall: I had this initially, but: a) it's redundant, and makes it less locally obvious that we've repopulated NewHeads b) soon recovery will always succeed, so !NewHeads.empty() becomes an assertion
		glrRecover(Heads, I, Tokens, Params, NextHeads);
		if (NextHeads.empty())
		// FIXME: Ensure the `_ := start-symbol` rules have a fallback
		// error-recovery strategy attached. Then this condition can't happen.
		return Params.Forest.createOpaque(StartSymbol, /*Token::Index=*/0);
		} else
		++I;

// Form nonterminals containing the token we just consumed.		// Form nonterminals containing the token we just consumed.
SymbolID Lookahead = I + 1 == Terminals.size() ? tokenSymbol(tok::eof)		SymbolID Lookahead =
: Terminals[I + 1].symbol();		I == Terminals.size() ? tokenSymbol(tok::eof) : Terminals[I].symbol();
Reduce(NextHeads, Lookahead);		Reduce(NextHeads, Lookahead);
// Prepare for the next token.		// Prepare for the next token.
std::swap(Heads, NextHeads);		std::swap(Heads, NextHeads);
NextHeads.clear();		NextHeads.clear();
MaybeGC();		MaybeGC();
}		}
LLVM_DEBUG(llvm::dbgs() << llvm::formatv("Reached eof\n"));		LLVM_DEBUG(llvm::dbgs() << llvm::formatv("Reached eof\n"));

		// The parse was successful if we're in state `_ := start-symbol .`
StateID AcceptState = Params.Table.getGoToState(StartState, StartSymbol);		StateID AcceptState = Params.Table.getGoToState(StartState, StartSymbol);
		auto SearchForAccept = [&](llvm::ArrayRef<const GSS::Node *> Heads) {
const ForestNode *Result = nullptr;		const ForestNode *Result = nullptr;
for (const auto *Head : Heads) {		for (const auto *Head : Heads) {
if (Head->State == AcceptState) {		if (Head->State == AcceptState) {
assert(Head->Payload->symbol() == StartSymbol);		assert(Head->Payload->symbol() == StartSymbol);
assert(Result == nullptr && "multiple results!");		assert(Result == nullptr && "multiple results!");
Result = Head->Payload;		Result = Head->Payload;
}		}
}		}
if (Result)		return Result;
		};
		if (auto *Result = SearchForAccept(Heads))
		return *Result;
		// Failed to parse the input, attempt to run recovery.
		// FIXME: this awkwardly repeats the recovery in the loop, when shift fails.
		// More elegant is to include EOF in the token stream, and make the
		// augmented rule: `_ := translation-unit EOF`. In this way recovery at EOF
		// would not be a special case: it would show up as a failure to shift the EOF
		// token.
		unsigned I = Terminals.size();
		glrRecover(Heads, I, Tokens, Params, NextHeads);
		Reduce(NextHeads, tokenSymbol(tok::eof));
		if (auto *Result = SearchForAccept(NextHeads))
return *Result;		return *Result;

// We failed to parse the input, returning an opaque forest node for recovery.		// We failed to parse the input, returning an opaque forest node for recovery.
//		// FIXME: as above, we can add fallback error handling so this is impossible.
// FIXME: We will need to invoke our generic error-recovery handlers when we
// reach EOF without reaching accept state, and involving the eof
// token in the above main for-loop may be the best way to reuse the code.
return Params.Forest.createOpaque(StartSymbol, /*Token::Index=*/0);		return Params.Forest.createOpaque(StartSymbol, /*Token::Index=*/0);
}		}

void glrReduce(std::vector<const GSS::Node *> &Heads, SymbolID Lookahead,		void glrReduce(std::vector<const GSS::Node *> &Heads, SymbolID Lookahead,
const ParseParams &Params) {		const ParseParams &Params) {
// Create a new GLRReduce each time for tests, performance doesn't matter.		// Create a new GLRReduce each time for tests, performance doesn't matter.
GLRReduce{Params}(Heads, Lookahead);		GLRReduce{Params}(Heads, Lookahead);
}		}

const GSS::Node *GSS::addNode(LRTable::StateID State, const ForestNode *Symbol,		const GSS::Node *GSS::addNode(LRTable::StateID State, const ForestNode *Symbol,

llvm::ArrayRef<const Node *> Parents) {		llvm::ArrayRef<const Node *> Parents) {
Node *Result = new (allocate(Parents.size()))		Node *Result = new (allocate(Parents.size()))
Node({State, GCParity, static_cast<unsigned>(Parents.size())});		Node({State, GCParity, static_cast<uint16_t>(Parents.size())});
Alive.push_back(Result);		Alive.push_back(Result);
++NodesCreated;		++NodesCreated;
Result->Payload = Symbol;		Result->Payload = Symbol;
if (!Parents.empty())		if (!Parents.empty())
llvm::copy(Parents, reinterpret_cast<const Node **>(Result + 1));		llvm::copy(Parents, reinterpret_cast<const Node **>(Result + 1));
return Result;		return Result;
}		}

▲ Show 20 Lines • Show All 57 Lines • Show Last 20 Lines
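
The option-selection step of `glrRecover` above (prefer options that start furthest right, then require the tied winners to agree on the nearest endpoint) can be read in isolation. The following standalone sketch restates that logic with simplified stand-in types. `Option`, `Choice`, `pickRecovery`, and `InvalidToken` are hypothetical names introduced here, and each option's endpoint is precomputed, whereas the patch computes it lazily through `findRecoveryEndpoint`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using TokenIndex = uint32_t;
constexpr TokenIndex InvalidToken = UINT32_MAX; // Stand-in for Token::Invalid.

// Simplified recovery option: the skipped region would be [Position, End).
struct Option {
  TokenIndex Position; // Where the recovered nonterminal starts.
  TokenIndex End;      // Endpoint found by the strategy, or InvalidToken.
  int Symbol;          // Stand-in for the SymbolID to recover.
};

struct Choice {
  TokenIndex Begin = 0, End = 0;
  std::vector<int> Symbols; // All tied winners share [Begin, End).
};

std::optional<Choice> pickRecovery(std::vector<Option> Options,
                                   TokenIndex Cursor) {
  // Consider options starting further right first: they discard less code.
  std::stable_sort(Options.begin(), Options.end(),
                   [](const Option &L, const Option &R) {
                     return L.Position > R.Position;
                   });
  std::optional<Choice> Best;
  for (const Option &O : Options) {
    // Anything starting left of an accepted option cannot win.
    if (Best && O.Position < Best->Begin)
      break;
    // Recovery may not take the parse backwards.
    if (O.End == InvalidToken || O.End < Cursor)
      continue;
    if (Best) {
      if (Best->End < O.End)
        continue; // Worse than what we have: skips more tokens.
      if (Best->End > O.End)
        Best->Symbols.clear(); // Strictly better: drop earlier winners.
    } else {
      Best.emplace();
    }
    Best->Begin = O.Position;
    Best->End = O.End;
    Best->Symbols.push_back(O.Symbol);
  }
  return Best;
}
```

Returning an empty optional corresponds to the "Recovery failed" path, where the caller falls back to an opaque node for the start symbol.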

clang-tools-extra/pseudo/lib/grammar/Grammar.cpp

[53 lines not shown]	llvm::Optional<SymbolID> Grammar::findNonterminal(llvm::StringRef Name) const {
return llvm::None;		return llvm::None;
}		}

std::string Grammar::dumpRule(RuleID RID) const {		std::string Grammar::dumpRule(RuleID RID) const {
std::string Result;		std::string Result;
llvm::raw_string_ostream OS(Result);		llvm::raw_string_ostream OS(Result);
const Rule &R = T->Rules[RID];		const Rule &R = T->Rules[RID];
OS << symbolName(R.Target) << " :=";		OS << symbolName(R.Target) << " :=";
for (SymbolID SID : R.seq())		for (unsigned I = 0; I < R.Size; ++I) {
OS << " " << symbolName(SID);		OS << " " << symbolName(R.Sequence[I]);
		if (R.RecoveryIndex == I)
		OS << " [recover=" << static_cast<unsigned>(R.Recovery) << "]";
		}
if (R.Guard)		if (R.Guard)
OS << " [guard=" << T->AttributeValues[R.Guard] << "]";		OS << " [guard=" << T->AttributeValues[R.Guard] << "]";
return Result;		return Result;
}		}

std::string Grammar::dumpRules(SymbolID SID) const {		std::string Grammar::dumpRules(SymbolID SID) const {
assert(isNonterminal(SID));		assert(isNonterminal(SID));
std::string Result;		std::string Result;
[115 lines not shown]

clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp

[100 lines not shown]	for (const auto &Spec : Specs) {
for (const RuleSpec::Element &Elt : Spec.Sequence)		for (const RuleSpec::Element &Elt : Spec.Sequence)
Symbols.push_back(Lookup(Elt.Symbol));		Symbols.push_back(Lookup(Elt.Symbol));
T->Rules.push_back(Rule(Lookup(Spec.Target), Symbols));		T->Rules.push_back(Rule(Lookup(Spec.Target), Symbols));
applyAttributes(Spec, *T, T->Rules.back());		applyAttributes(Spec, *T, T->Rules.back());
}		}

assert(T->Rules.size() < (1 << RuleBits) &&		assert(T->Rules.size() < (1 << RuleBits) &&
"Too many rules to fit in RuleID bits!");		"Too many rules to fit in RuleID bits!");
		// Wherever RHS contains { foo }, mark foo for brace-recovery.
		// FIXME: this should be grammar annotations instead.
		for (auto &Rule : T->Rules) {
		for (unsigned I = 2; I < Rule.Size; ++I)
		if (Rule.Sequence[I] == tokenSymbol(tok::r_brace) &&
		Rule.Sequence[I - 2] == tokenSymbol(tok::l_brace) &&
		!isToken(Rule.Sequence[I - 1])) {
		Rule.Recovery = RecoveryStrategy::Braces;
		Rule.RecoveryIndex = I - 1;
		}
		}
const auto &SymbolOrder = getTopologicalOrder(T.get());		const auto &SymbolOrder = getTopologicalOrder(T.get());
llvm::stable_sort(		llvm::stable_sort(
T->Rules, [&SymbolOrder](const Rule &Left, const Rule &Right) {		T->Rules, [&SymbolOrder](const Rule &Left, const Rule &Right) {
// Sorted by the topological order of the nonterminal Target.		// Sorted by the topological order of the nonterminal Target.
return SymbolOrder[Left.Target] < SymbolOrder[Right.Target];		return SymbolOrder[Left.Target] < SymbolOrder[Right.Target];
});		});
for (SymbolID SID = 0; SID < T->Nonterminals.size(); ++SID) {		for (SymbolID SID = 0; SID < T->Nonterminals.size(); ++SID) {
auto StartIt = llvm::partition_point(T->Rules, [&](const Rule &R) {		auto StartIt = llvm::partition_point(T->Rules, [&](const Rule &R) {
[235 lines not shown]
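
The brace-recovery inference in the GrammarBNF.cpp hunk above can be illustrated in isolation. The following is a minimal sketch, not the patch's code: rules are modelled as plain lists of symbol names, and `isTokenName` and `braceRecoveryIndex` are hypothetical helpers invented for this illustration (the patch itself works on `SymbolID`s and uses `isToken`).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch-only token test: names without lowercase letters ("{", "}",
// "IDENTIFIER") count as tokens. The real code uses isToken(SymbolID).
inline bool isTokenName(const std::string &Name) {
  for (char C : Name)
    if (C >= 'a' && C <= 'z')
      return false;
  return true;
}

// Returns the index of the symbol to mark for brace recovery, or -1 if the
// rule has no `{ nonterminal }` triple. Like the loop in the patch, a later
// match overwrites an earlier one.
inline int braceRecoveryIndex(const std::vector<std::string> &Sequence) {
  int Index = -1;
  for (unsigned I = 2; I < Sequence.size(); ++I)
    if (Sequence[I] == "}" && Sequence[I - 2] == "{" &&
        !isTokenName(Sequence[I - 1]))
      Index = static_cast<int>(I - 1);
  return Index;
}
```

Under these assumptions, a rule such as `block := { numbers }` yields index 1, while `{ IDENTIFIER }` yields -1 because the symbol between the braces is a token.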

clang-tools-extra/pseudo/lib/grammar/LRGraph.cpp

[114 lines not shown]	for (const Item &I : Batch)
Next.push_back(I.advance());		Next.push_back(I.advance());
// sort the set to keep order determinism for hash computation.		// sort the set to keep order determinism for hash computation.
llvm::sort(Next);		llvm::sort(Next);
Results.push_back({AdvancedSymbol, std::move(Next)});		Results.push_back({AdvancedSymbol, std::move(Next)});
}		}
return Results;		return Results;
}		}

		std::vector<std::pair<RecoveryStrategy, SymbolID>>
		availableRecovery(const State &S, const Grammar &G) {
		std::vector<std::pair<RecoveryStrategy, SymbolID>> Result;
		for (const Item &I : S.Items) {
		const auto &Rule = G.lookupRule(I.rule());
		if (I.dot() != Rule.RecoveryIndex)
		continue;
		Result.push_back({Rule.Recovery, Rule.seq()[Rule.RecoveryIndex]});
		}
		llvm::sort(Result);
		Result.erase(std::unique(Result.begin(), Result.end()), Result.end());
		return Result;
		}

} // namespace		} // namespace

std::string Item::dump(const Grammar &G) const {		std::string Item::dump(const Grammar &G) const {
const auto &Rule = G.lookupRule(RID);		const auto &Rule = G.lookupRule(RID);
auto ToNames = [&](llvm::ArrayRef<SymbolID> Syms) {		auto ToNames = [&](llvm::ArrayRef<SymbolID> Syms) {
std::vector<llvm::StringRef> Results;		std::vector<llvm::StringRef> Results;
for (auto SID : Syms)		for (auto SID : Syms)
Results.push_back(G.symbolName(SID));		Results.push_back(G.symbolName(SID));
return Results;		return Results;
};		};
return llvm::formatv("{0} := {1} • {2}", G.symbolName(Rule.Target),		return llvm::formatv("{0} := {1} • {2}{3}", G.symbolName(Rule.Target),
llvm::join(ToNames(Rule.seq().take_front(DotPos)), " "),		llvm::join(ToNames(Rule.seq().take_front(DotPos)), " "),
llvm::join(ToNames(Rule.seq().drop_front(DotPos)), " "))		llvm::join(ToNames(Rule.seq().drop_front(DotPos)), " "),
		Rule.RecoveryIndex == DotPos ? " [recovery]" : "")
.str();		.str();
}		}

std::string State::dump(const Grammar &G, unsigned Indent) const {		std::string State::dump(const Grammar &G, unsigned Indent) const {
std::string Result;		std::string Result;
llvm::raw_string_ostream OS(Result);		llvm::raw_string_ostream OS(Result);
for (const auto &Item : Items)		for (const auto &Item : Items)
OS.indent(Indent) << llvm::formatv("{0}\n", Item.dump(G));		OS.indent(Indent) << llvm::formatv("{0}\n", Item.dump(G));
[32 lines not shown]	std::pair<StateID, /*inserted*/ bool> insert(ItemSet KernelItems) {
StatesIndex.insert({std::move(KernelItems), NextStateID});		StatesIndex.insert({std::move(KernelItems), NextStateID});
return {NextStateID, true};		return {NextStateID, true};
}		}

void insertEdge(StateID Src, StateID Dst, SymbolID Label) {		void insertEdge(StateID Src, StateID Dst, SymbolID Label) {
Edges.push_back({Src, Dst, Label});		Edges.push_back({Src, Dst, Label});
}		}

		void insertRecovery(StateID Src, RecoveryStrategy Strategy,
		SymbolID Result) {
		Recoveries.push_back({Src, Strategy, Result});
		}

// Returns a state with the given id.		// Returns a state with the given id.
const State &find(StateID ID) const {		const State &find(StateID ID) const {
assert(ID < States.size());		assert(ID < States.size());
return States[ID];		return States[ID];
}		}

void addStartState(SymbolID Sym, StateID State) {		void addStartState(SymbolID Sym, StateID State) {
StartStates.push_back({Sym, State});		StartStates.push_back({Sym, State});
}		}

LRGraph build() && {		LRGraph build() && {
States.shrink_to_fit();		States.shrink_to_fit();
Edges.shrink_to_fit();		Edges.shrink_to_fit();
		Recoveries.shrink_to_fit();
llvm::sort(StartStates);		llvm::sort(StartStates);
StartStates.shrink_to_fit();		StartStates.shrink_to_fit();
return LRGraph(std::move(States), std::move(Edges),		return LRGraph(std::move(States), std::move(Edges), std::move(Recoveries),
std::move(StartStates));		std::move(StartStates));
}		}

private:		private:
// Key is the kernel item sets.		// Key is the kernel item sets.
llvm::DenseMap<ItemSet, /*index of States*/ size_t> StatesIndex;		llvm::DenseMap<ItemSet, /*index of States*/ size_t> StatesIndex;
std::vector<State> States;		std::vector<State> States;
std::vector<Edge> Edges;		std::vector<Edge> Edges;
		std::vector<Recovery> Recoveries;
const Grammar &G;		const Grammar &G;
std::vector<std::pair<SymbolID, StateID>> StartStates;		std::vector<std::pair<SymbolID, StateID>> StartStates;
} Builder(G);		} Builder(G);

std::vector<StateID> PendingStates;		std::vector<StateID> PendingStates;
// Initialize states with the start symbol.		// Initialize states with the start symbol.
auto RRange = G.table().Nonterminals[G.underscore()].RuleRange;		auto RRange = G.table().Nonterminals[G.underscore()].RuleRange;
for (RuleID RID = RRange.Start; RID < RRange.End; ++RID) {		for (RuleID RID = RRange.Start; RID < RRange.End; ++RID) {
auto StartState = std::vector<Item>{Item::start(RID, G)};		auto StartState = std::vector<Item>{Item::start(RID, G)};
auto Result = Builder.insert(std::move(StartState));		auto Result = Builder.insert(std::move(StartState));
assert(Result.second && "State must be new");		assert(Result.second && "State must be new");
PendingStates.push_back(Result.first);		PendingStates.push_back(Result.first);

const Rule &StartRule = G.lookupRule(RID);		const Rule &StartRule = G.lookupRule(RID);
assert(StartRule.Size == 1 &&		assert(StartRule.Size == 1 &&
"Start rule must have exactly one symbol in its body!");		"Start rule must have exactly one symbol in its body!");
Builder.addStartState(StartRule.seq().front(), Result.first);		Builder.addStartState(StartRule.seq().front(), Result.first);
}		}

while (!PendingStates.empty()) {		while (!PendingStates.empty()) {
auto CurrentStateID = PendingStates.back();		auto StateID = PendingStates.back();
PendingStates.pop_back();		PendingStates.pop_back();
for (auto Next :		for (auto Next : nextAvailableKernelItems(Builder.find(StateID), G)) {
nextAvailableKernelItems(Builder.find(CurrentStateID), G)) {
auto Insert = Builder.insert(Next.second);		auto Insert = Builder.insert(Next.second);
if (Insert.second) // new state, insert to the pending queue.		if (Insert.second) // new state, insert to the pending queue.
PendingStates.push_back(Insert.first);		PendingStates.push_back(Insert.first);
Builder.insertEdge(CurrentStateID, Insert.first, Next.first);		Builder.insertEdge(StateID, Insert.first, Next.first);
}		}
		for (auto Recovery : availableRecovery(Builder.find(StateID), G))
		Builder.insertRecovery(StateID, Recovery.first, Recovery.second);
}		}
return std::move(Builder).build();		return std::move(Builder).build();
}		}

} // namespace pseudo		} // namespace pseudo
} // namespace clang		} // namespace clang

clang-tools-extra/pseudo/lib/grammar/LRTableBuild.cpp

//===--- LRTableBuild.cpp - Build a LRTable from LRGraph ---------- C++--===//		//===--- LRTableBuild.cpp - Build a LRTable from LRGraph ---------- C++--===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang-pseudo/grammar/Grammar.h"		#include "clang-pseudo/grammar/Grammar.h"
#include "clang-pseudo/grammar/LRGraph.h"		#include "clang-pseudo/grammar/LRGraph.h"
#include "clang-pseudo/grammar/LRTable.h"		#include "clang-pseudo/grammar/LRTable.h"
#include "clang/Basic/TokenKinds.h"		#include "clang/Basic/TokenKinds.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
		#include "llvm/Support/raw_ostream.h"
#include <cstdint>		#include <cstdint>

namespace llvm {		namespace llvm {
template <> struct DenseMapInfo<clang::pseudo::LRTable::Entry> {		template <> struct DenseMapInfo<clang::pseudo::LRTable::Entry> {
using Entry = clang::pseudo::LRTable::Entry;		using Entry = clang::pseudo::LRTable::Entry;
static inline Entry getEmptyKey() {		static inline Entry getEmptyKey() {
static Entry E{static_cast<clang::pseudo::SymbolID>(-1), 0,		static Entry E{static_cast<clang::pseudo::SymbolID>(-1), 0,
clang::pseudo::LRTable::Action::sentinel()};		clang::pseudo::LRTable::Action::sentinel()};
[17 lines not shown]
namespace clang {		namespace clang {
namespace pseudo {		namespace pseudo {

struct LRTable::Builder {		struct LRTable::Builder {
std::vector<std::pair<SymbolID, StateID>> StartStates;		std::vector<std::pair<SymbolID, StateID>> StartStates;
llvm::DenseSet<Entry> Entries;		llvm::DenseSet<Entry> Entries;
llvm::DenseMap<StateID, llvm::SmallSet<RuleID, 4>> Reduces;		llvm::DenseMap<StateID, llvm::SmallSet<RuleID, 4>> Reduces;
std::vector<llvm::DenseSet<SymbolID>> FollowSets;		std::vector<llvm::DenseSet<SymbolID>> FollowSets;
		std::vector<LRGraph::Recovery> Recoveries;

LRTable build(unsigned NumStates) && {		LRTable build(unsigned NumStates) && {
// E.g. given the following parsing table with 3 states and 3 terminals:		// E.g. given the following parsing table with 3 states and 3 terminals:
//		//
//            a     b    c		//            a     b    c
// +-------+----+-------+-+		// +-------+----+-------+-+
// |state0 |    | s0,r0 | |		// |state0 |    | s0,r0 | |
// |state1 | acc|       | |		// |state1 | acc|       | |
[28 lines not shown]	LRTable build(unsigned NumStates) && {
size_t SortedIndex = 0;		size_t SortedIndex = 0;
for (StateID State = 0; State < Table.StateOffset.size(); ++State) {		for (StateID State = 0; State < Table.StateOffset.size(); ++State) {
Table.StateOffset[State] = SortedIndex;		Table.StateOffset[State] = SortedIndex;
while (SortedIndex < Sorted.size() && Sorted[SortedIndex].State == State)		while (SortedIndex < Sorted.size() && Sorted[SortedIndex].State == State)
++SortedIndex;		++SortedIndex;
}		}
Table.StartStates = std::move(StartStates);		Table.StartStates = std::move(StartStates);

		// Error recovery entries: sort (no dups already), and build offset lookup.
		llvm::sort(Recoveries,
		[&](const LRGraph::Recovery &L, const LRGraph::Recovery &R) {
		return std::tie(L.Src, L.Result, L.Strategy) <
		std::tie(R.Src, R.Result, R.Strategy);
		});
		Table.Recoveries.reserve(Recoveries.size());
		for (const auto &R : Recoveries)
		Table.Recoveries.push_back({R.Strategy, R.Result});
		Table.RecoveryOffset = std::vector<uint32_t>(NumStates + 1, 0);
		SortedIndex = 0;
		for (StateID State = 0; State < NumStates; ++State) {
		Table.RecoveryOffset[State] = SortedIndex;
		while (SortedIndex < Recoveries.size() &&
		Recoveries[SortedIndex].Src == State)
		SortedIndex++;
		}
		Table.RecoveryOffset[NumStates] = SortedIndex;
		assert(SortedIndex == Recoveries.size());

// Compile the follow sets into a bitmap.		// Compile the follow sets into a bitmap.
Table.FollowSets.resize(tok::NUM_TOKENS * FollowSets.size());		Table.FollowSets.resize(tok::NUM_TOKENS * FollowSets.size());
for (SymbolID NT = 0; NT < FollowSets.size(); ++NT)		for (SymbolID NT = 0; NT < FollowSets.size(); ++NT)
for (SymbolID Follow : FollowSets[NT])		for (SymbolID Follow : FollowSets[NT])
Table.FollowSets.set(NT * tok::NUM_TOKENS + symbolToToken(Follow));		Table.FollowSets.set(NT * tok::NUM_TOKENS + symbolToToken(Follow));

// Store the reduce actions in a vector partitioned by state.		// Store the reduce actions in a vector partitioned by state.
Table.ReduceOffset.reserve(NumStates + 1);		Table.ReduceOffset.reserve(NumStates + 1);
[10 lines not shown]	LRTable build(unsigned NumStates) && {
}		}
Table.ReduceOffset.push_back(Table.Reduces.size());		Table.ReduceOffset.push_back(Table.Reduces.size());

return Table;		return Table;
}		}
};		};

LRTable LRTable::buildForTests(const Grammar &G, llvm::ArrayRef<Entry> Entries,		LRTable LRTable::buildForTests(const Grammar &G, llvm::ArrayRef<Entry> Entries,
llvm::ArrayRef<ReduceEntry> Reduces) {		llvm::ArrayRef<ReduceEntry> Reduces,
		llvm::ArrayRef<RecoveryEntry> Recoveries) {
StateID MaxState = 0;		StateID MaxState = 0;
for (const auto &Entry : Entries) {		for (const auto &Entry : Entries) {
MaxState = std::max(MaxState, Entry.State);		MaxState = std::max(MaxState, Entry.State);
if (Entry.Act.kind() == LRTable::Action::Shift)		if (Entry.Act.kind() == LRTable::Action::Shift)
MaxState = std::max(MaxState, Entry.Act.getShiftState());		MaxState = std::max(MaxState, Entry.Act.getShiftState());
if (Entry.Act.kind() == LRTable::Action::GoTo)		if (Entry.Act.kind() == LRTable::Action::GoTo)
MaxState = std::max(MaxState, Entry.Act.getGoToState());		MaxState = std::max(MaxState, Entry.Act.getGoToState());
}		}
Builder Build;		Builder Build;
Build.Entries.insert(Entries.begin(), Entries.end());		Build.Entries.insert(Entries.begin(), Entries.end());
for (const ReduceEntry &E : Reduces)		for (const ReduceEntry &E : Reduces)
Build.Reduces[E.State].insert(E.Rule);		Build.Reduces[E.State].insert(E.Rule);
Build.FollowSets = followSets(G);		Build.FollowSets = followSets(G);
		for (const auto &R : Recoveries)
		Build.Recoveries.push_back({R.State, R.Strategy, R.Result});
return std::move(Build).build(/*NumStates=*/MaxState + 1);		return std::move(Build).build(/*NumStates=*/MaxState + 1);
}		}

LRTable LRTable::buildSLR(const Grammar &G) {		LRTable LRTable::buildSLR(const Grammar &G) {
auto Graph = LRGraph::buildLR0(G);		auto Graph = LRGraph::buildLR0(G);
Builder Build;		Builder Build;
Build.StartStates = Graph.startStates();		Build.StartStates = Graph.startStates();
		Build.Recoveries = Graph.recoveries();
for (const auto &T : Graph.edges()) {		for (const auto &T : Graph.edges()) {
Action Act = isToken(T.Label) ? Action::shift(T.Dst) : Action::goTo(T.Dst);		Action Act = isToken(T.Label) ? Action::shift(T.Dst) : Action::goTo(T.Dst);
Build.Entries.insert({T.Src, T.Label, Act});		Build.Entries.insert({T.Src, T.Label, Act});
}		}
Build.FollowSets = followSets(G);		Build.FollowSets = followSets(G);
assert(Graph.states().size() <= (1 << StateBits) &&		assert(Graph.states().size() <= (1 << StateBits) &&
"Graph states exceeds the maximum limit!");		"Graph states exceeds the maximum limit!");
// Add reduce actions.		// Add reduce actions.
[18 lines not shown]
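
The `RecoveryOffset` construction in `LRTable::Builder::build` above follows a sort-then-offset pattern: entries are sorted by source state, and an offset vector with `NumStates + 1` slots marks where each state's entries begin. The sketch below reproduces that pattern with simplified stand-in types; `SketchRecovery`, `SketchRecoveryTable`, `buildRecoveryTable` and `range` are hypothetical names for this illustration and are not part of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical simplified recovery entry: source state, strategy, result.
struct SketchRecovery {
  unsigned Src;
  unsigned Strategy;
  unsigned Result;
};

struct SketchRecoveryTable {
  std::vector<std::pair<unsigned, unsigned>> Entries; // (Strategy, Result)
  std::vector<uint32_t> Offset;                       // NumStates + 1 slots

  // Half-open index range [first, second) into Entries for the given state.
  std::pair<uint32_t, uint32_t> range(unsigned State) const {
    return {Offset[State], Offset[State + 1]};
  }
};

inline SketchRecoveryTable buildRecoveryTable(std::vector<SketchRecovery> Rs,
                                              unsigned NumStates) {
  // Sort by source state first so each state's entries are contiguous.
  std::sort(Rs.begin(), Rs.end(),
            [](const SketchRecovery &L, const SketchRecovery &R) {
              return std::tie(L.Src, L.Result, L.Strategy) <
                     std::tie(R.Src, R.Result, R.Strategy);
            });
  SketchRecoveryTable T;
  T.Offset.assign(NumStates + 1, 0);
  std::size_t Next = 0;
  for (unsigned State = 0; State < NumStates; ++State) {
    T.Offset[State] = static_cast<uint32_t>(Next);
    while (Next < Rs.size() && Rs[Next].Src == State) {
      T.Entries.push_back({Rs[Next].Strategy, Rs[Next].Result});
      ++Next;
    }
  }
  // The extra final slot lets range() work for the last state too.
  T.Offset[NumStates] = static_cast<uint32_t>(Next);
  return T;
}
```

A state with no recovery entries simply gets an empty range, so a lookup needs no separate "has recovery" flag.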

clang-tools-extra/pseudo/test/cxx/empty-member-spec.cpp

	// RUN: clang-pseudo -grammar=%cxx-bnf-file -source=%s --print-forest | FileCheck %s			// RUN: clang-pseudo -grammar=%cxx-bnf-file -source=%s --print-forest | FileCheck %s
	class Foo {			class Foo {
	public:			public:
	};			};
	// CHECK: decl-specifier-seq~class-specifier := class-head { member-specification }			// CHECK: decl-specifier-seq~class-specifier := class-head { member-specification [recover=1] }
	// CHECK-NEXT: ├─class-head := class-key class-head-name			// CHECK-NEXT: ├─class-head := class-key class-head-name
	// CHECK-NEXT: │ ├─class-key~CLASS := tok[0]			// CHECK-NEXT: │ ├─class-key~CLASS := tok[0]
	// CHECK-NEXT: │ └─class-head-name~IDENTIFIER := tok[1]			// CHECK-NEXT: │ └─class-head-name~IDENTIFIER := tok[1]
	// CHECK-NEXT: ├─{ := tok[2]			// CHECK-NEXT: ├─{ := tok[2]
	// CHECK-NEXT: ├─member-specification := access-specifier :			// CHECK-NEXT: ├─member-specification := access-specifier :
	// CHECK-NEXT: │ ├─access-specifier~PUBLIC := tok[3]			// CHECK-NEXT: │ ├─access-specifier~PUBLIC := tok[3]
	// CHECK-NEXT: │ └─: := tok[4]			// CHECK-NEXT: │ └─: := tok[4]
	// CHECK-NEXT: └─} := tok[5]			// CHECK-NEXT: └─} := tok[5]

clang-tools-extra/pseudo/test/cxx/recovery-init-list.cpp

This file was added.

				// RUN: clang-pseudo -grammar=%cxx-bnf-file -source=%s --print-forest | FileCheck %s
				auto x = { complete garbage };
				// CHECK: translation-unit~simple-declaration
				// CHECK-NEXT: ├─decl-specifier-seq~AUTO := tok[0]
				// CHECK-NEXT: ├─init-declarator-list~init-declarator
				// CHECK-NEXT: │ ├─declarator~IDENTIFIER := tok[1]
				// CHECK-NEXT: │ └─initializer~brace-or-equal-initializer
				// CHECK-NEXT: │ ├─= := tok[2]
				// CHECK-NEXT: │ └─initializer-clause~braced-init-list
				// CHECK-NEXT: │ ├─{ := tok[3]
				// CHECK-NEXT: │ ├─initializer-list := <opaque>
				// CHECK-NEXT: │ └─} := tok[6]
				// CHECK-NEXT: └─; := tok[7]
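
The forest above depends on knowing where the brace opened at tok[3] closes. A minimal sketch of that lookup follows, assuming a naive stack-based pairing; `SketchToken`, `makeTokens`, `pairBraces` and `braceRecoveryEnd` are hypothetical names for this illustration, and the real `pairBrackets` used in the unit tests may behave differently on mismatched input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sentinel meaning "this token has no matching bracket".
constexpr std::size_t NoPair = static_cast<std::size_t>(-1);

// Hypothetical minimal token: spelling plus index of its matching brace.
struct SketchToken {
  std::string Text;
  std::size_t Pair;
  explicit SketchToken(std::string T) : Text(std::move(T)), Pair(NoPair) {}
};

inline std::vector<SketchToken>
makeTokens(const std::vector<std::string> &Spellings) {
  std::vector<SketchToken> Tokens;
  for (const std::string &S : Spellings)
    Tokens.push_back(SketchToken(S));
  return Tokens;
}

// Naive pairing: each `}` matches the most recent unmatched `{`.
inline void pairBraces(std::vector<SketchToken> &Tokens) {
  std::vector<std::size_t> Open;
  for (std::size_t I = 0; I < Tokens.size(); ++I) {
    if (Tokens[I].Text == "{") {
      Open.push_back(I);
    } else if (Tokens[I].Text == "}" && !Open.empty()) {
      Tokens[I].Pair = Open.back();
      Tokens[Open.back()].Pair = I;
      Open.pop_back();
    }
  }
}

// Returns the index of the `}` matching the `{` at LBrace, or NoPair if the
// token is not an opening brace or is unmatched (recovery would then fail).
inline std::size_t braceRecoveryEnd(const std::vector<SketchToken> &Tokens,
                                    std::size_t LBrace) {
  if (LBrace >= Tokens.size() || Tokens[LBrace].Text != "{")
    return NoPair;
  return Tokens[LBrace].Pair;
}
```

For `auto x = { complete garbage };` this sketch reports the closing brace at index 6, which agrees with the opaque `initializer-list` node being followed by `} := tok[6]` in the CHECK lines.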

clang-tools-extra/pseudo/unittests/GLRTest.cpp

//===--- GLRTest.cpp - Test the GLR parser ----------------------- C++ --===//		//===--- GLRTest.cpp - Test the GLR parser ----------------------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang-pseudo/GLR.h"		#include "clang-pseudo/GLR.h"
		#include "clang-pseudo/Bracket.h"
#include "clang-pseudo/Token.h"		#include "clang-pseudo/Token.h"
#include "clang-pseudo/grammar/Grammar.h"		#include "clang-pseudo/grammar/Grammar.h"
#include "clang/Basic/LangOptions.h"		#include "clang/Basic/LangOptions.h"
#include "clang/Basic/TokenKinds.h"		#include "clang/Basic/TokenKinds.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/Support/FormatVariadic.h"		#include "llvm/Support/FormatVariadic.h"
#include "gmock/gmock.h"		#include "gmock/gmock.h"
#include "gtest/gtest.h"		#include "gtest/gtest.h"
[9 lines not shown]	llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,
return OS;		return OS;
}		}

namespace {		namespace {

using Action = LRTable::Action;		using Action = LRTable::Action;
using testing::AllOf;		using testing::AllOf;
using testing::ElementsAre;		using testing::ElementsAre;
		using testing::IsEmpty;
using testing::UnorderedElementsAre;		using testing::UnorderedElementsAre;

MATCHER_P(state, StateID, "") { return arg->State == StateID; }		MATCHER_P(state, StateID, "") { return arg->State == StateID; }
MATCHER_P(parsedSymbol, FNode, "") { return arg->Payload == FNode; }		MATCHER_P(parsedSymbol, FNode, "") { return arg->Payload == FNode; }
MATCHER_P(parsedSymbolID, SID, "") { return arg->Payload->symbol() == SID; }		MATCHER_P(parsedSymbolID, SID, "") { return arg->Payload->symbol() == SID; }
		MATCHER_P(start, Start, "") { return arg->Payload->startTokenIndex() == Start; }

testing::Matcher<const GSS::Node *>		testing::Matcher<const GSS::Node *>
parents(llvm::ArrayRef<const GSS::Node *> Parents) {		parents(llvm::ArrayRef<const GSS::Node *> Parents) {
return testing::Property(&GSS::Node::parents,		return testing::Property(&GSS::Node::parents,
testing::UnorderedElementsAreArray(Parents));		testing::UnorderedElementsAreArray(Parents));
}		}

class GLRTest : public ::testing::Test {		class GLRTest : public ::testing::Test {
[185 lines not shown]	TEST_F(GLRTest, ReduceJoiningWithMultipleBases) {
auto EnumNameNode = &Arena.createOpaque(id("enum-name"), /*TokenIndex=*/1);		auto EnumNameNode = &Arena.createOpaque(id("enum-name"), /*TokenIndex=*/1);

const auto *GSSNode0 =		const auto *GSSNode0 =
GSStack.addNode(/*State=*/0, /*ForestNode=*/nullptr, /*Parents=*/{});		GSStack.addNode(/*State=*/0, /*ForestNode=*/nullptr, /*Parents=*/{});
const auto *GSSNode1 = GSStack.addNode(		const auto *GSSNode1 = GSStack.addNode(
/*State=*/1, /*ForestNode=*/CVQualifierNode, /*Parents=*/{GSSNode0});		/*State=*/1, /*ForestNode=*/CVQualifierNode, /*Parents=*/{GSSNode0});
const auto *GSSNode2 = GSStack.addNode(		const auto *GSSNode2 = GSStack.addNode(
/*State=*/2, /*ForestNode=*/CVQualifierNode, /*Parents=*/{GSSNode0});		/*State=*/2, /*ForestNode=*/CVQualifierNode, /*Parents=*/{GSSNode0});
const auto *GSSNode3 =		const auto *GSSNode3 = GSStack.addNode(
GSStack.addNode(/*State=*/3, /*ForestNode=*/ClassNameNode,		/*State=*/3, /*ForestNode=*/ClassNameNode,
/*Parents=*/{GSSNode1});		/*Parents=*/{GSSNode1});
const auto *GSSNode4 =		const auto *GSSNode4 =
GSStack.addNode(/*State=*/4, /*ForestNode=*/EnumNameNode,		GSStack.addNode(/*State=*/4, /*ForestNode=*/EnumNameNode,
/*Parents=*/{GSSNode2});		/*Parents=*/{GSSNode2});

// FIXME: figure out a way to get rid of the hard-coded reduce RuleID!		// FIXME: figure out a way to get rid of the hard-coded reduce RuleID!
LRTable Table = LRTable::buildForTests(		LRTable Table = LRTable::buildForTests(
G,		G,
{		{
[106 lines not shown]	EXPECT_THAT(Heads,
parents(Root))));		parents(Root))));

// When the lookahead is -, reduce is not performed.		// When the lookahead is -, reduce is not performed.
Heads = {GSSNode1};		Heads = {GSSNode1};
glrReduce(Heads, tokenSymbol(tok::minus), {G, Table, Arena, GSStack});		glrReduce(Heads, tokenSymbol(tok::minus), {G, Table, Arena, GSStack});
EXPECT_THAT(Heads, ElementsAre(GSSNode1));		EXPECT_THAT(Heads, ElementsAre(GSSNode1));
}		}

		TEST_F(GLRTest, Recover) {
		// Recovery while parsing "word" inside braces.
		// Before:
		// 0--1({)--2(?)
		// After recovering a `word` at state 1:
		// 0--1({)--3(word) // 3 is goto(1, word)
		buildGrammar({"word"}, {});
		LRTable Table = LRTable::buildForTests(
		G, {{/*State=*/1, id("word"), Action::goTo(3)}}, /*Reduce=*/{},
		/*Recovery=*/{{/*State=*/1, RecoveryStrategy::Braces, id("word")}});

		auto *LBrace = &Arena.createTerminal(tok::l_brace, 0);
		auto *Question1 = &Arena.createTerminal(tok::question, 1);
		const auto *Root = GSStack.addNode(0, nullptr, {});
		[Inline comment, hokein (Not Done)] nit: for GSS nodes, I would name them like `GSSNode<StateID>`; it is less descriptive, but it is easily distinguished from forest nodes and matches the GSS described in the comment, which I think is clearer.
		[Inline comment, sammccall (author, Done)] Yeah, I found those testcases difficult to read :-( I did work out that the numbers might be state numbers rather than node numbers, but I found it hard to a) remember that, since it's often GSSNode0, GSSNode1 etc. and b) remember which state is which. It also doesn't generalize, e.g. the RecoverRightmost case has 3 nodes with the same state.
		const auto *OpenedBraces = GSStack.addNode(1, LBrace, {Root});
		const auto *AfterQuestion1 = GSStack.addNode(2, Question1, {OpenedBraces});

		// Need a token stream with paired braces so the strategy works.
		clang::LangOptions LOptions;
		TokenStream Tokens = cook(lex("{ ? ? ? }", LOptions), LOptions);
		pairBrackets(Tokens);
		std::vector<const GSS::Node *> NewHeads;

		unsigned TokenIndex = 2;
		glrRecover({AfterQuestion1}, TokenIndex, Tokens, {G, Table, Arena, GSStack},
		NewHeads);
		EXPECT_EQ(TokenIndex, 4u) << "should skip ahead to matching brace";
		EXPECT_THAT(NewHeads, ElementsAre(
		AllOf(state(3), parsedSymbolID(id("word")),
		parents({OpenedBraces}), start(1u))));
		EXPECT_EQ(NewHeads.front()->Payload->kind(), ForestNode::Opaque);

		// Test recovery failure: omit closing brace so strategy fails
		TokenStream NoRBrace = cook(lex("{ ? ? ? ?", LOptions), LOptions);
		pairBrackets(NoRBrace);
		NewHeads.clear();
		TokenIndex = 2;
		glrRecover({AfterQuestion1}, TokenIndex, NoRBrace,
		{G, Table, Arena, GSStack}, NewHeads);
		EXPECT_EQ(TokenIndex, 2u) << "should not advance on failure";
		EXPECT_THAT(NewHeads, IsEmpty());
		}

		TEST_F(GLRTest, RecoverRightmost) {
		// In a nested block structure, we recover at the innermost possible block.
		// Before:
		// 0--1({)--1({)--1({)
		// After recovering a `body` inside the second braces:
		// 0--1({)--1({)--2(body) // 2 is goto(1, body)
		buildGrammar({"body"}, {});
		LRTable Table = LRTable::buildForTests(
		G, {{/*State=*/1, id("body"), Action::goTo(2)}}, /*Reduce=*/{},
		/*Recovery=*/{{/*State=*/1, RecoveryStrategy::Braces, id("body")}});

		clang::LangOptions LOptions;
		// Innermost brace is unmatched, to test fallback to next brace.
		TokenStream Tokens = cook(lex("{ { { ? } }", LOptions), LOptions);
		[Inline comment, hokein (Done)] I suppose there is an extra `?` in the string literal: the bracket indices (`4`, `5`) used below don't match the string literal here.
		[Inline comment, sammccall (author, Done)] Oops, yes! Originally this testcase was meant to test more things, and apparently I didn't finish simplifying it...
		Tokens.tokens()[0].Pair = 5;
		Tokens.tokens()[1].Pair = 4;
		Tokens.tokens()[4].Pair = 1;
		Tokens.tokens()[5].Pair = 0;

		auto *Brace1 = &Arena.createTerminal(tok::l_brace, 0);
		auto *Brace2 = &Arena.createTerminal(tok::l_brace, 1);
		auto *Brace3 = &Arena.createTerminal(tok::l_brace, 2);
		const auto *Root = GSStack.addNode(0, nullptr, {});
		const auto *In1 = GSStack.addNode(1, Brace1, {Root});
		const auto *In2 = GSStack.addNode(1, Brace2, {In1});
		const auto *In3 = GSStack.addNode(1, Brace3, {In2});

		unsigned TokenIndex = 3;
		std::vector<const GSS::Node *> NewHeads;
		glrRecover({In3}, TokenIndex, Tokens, {G, Table, Arena, GSStack}, NewHeads);
		EXPECT_EQ(TokenIndex, 5u);
		EXPECT_THAT(NewHeads, ElementsAre(AllOf(state(2), parsedSymbolID(id("body")),
		parents({In2}), start(2u))));
		}

		TEST_F(GLRTest, RecoverAlternatives) {
		// Recovery inside braces with multiple equally good options
		// Before:
		// 0--1({)
		// After recovering either `word` or `number` inside the braces:
		// 0--1({)--2(number) // 2 is goto(1, number)
		// └--3(word) // 3 is goto(1, word)
		buildGrammar({"number", "word"}, {});
		LRTable Table = LRTable::buildForTests(
		G,
		{
		{/*State=*/1, id("number"), Action::goTo(2)},
		{/*State=*/1, id("word"), Action::goTo(3)},
		},
		/*Reduce=*/{},
		/*Recovery=*/
		{
		{/*State=*/1, RecoveryStrategy::Braces, id("number")},
		{/*State=*/1, RecoveryStrategy::Braces, id("word")},
		});
		auto *LBrace = &Arena.createTerminal(tok::l_brace, 0);
		const auto *Root = GSStack.addNode(0, nullptr, {});
		const auto *OpenedBraces = GSStack.addNode(1, LBrace, {Root});

		clang::LangOptions LOptions;
		TokenStream Tokens = cook(lex("{ ? }", LOptions), LOptions);
		pairBrackets(Tokens);
		std::vector<const GSS::Node *> NewHeads;
		unsigned TokenIndex = 1;

		glrRecover({OpenedBraces}, TokenIndex, Tokens, {G, Table, Arena, GSStack},
		NewHeads);
		EXPECT_EQ(TokenIndex, 2u);
		EXPECT_THAT(NewHeads,
		UnorderedElementsAre(AllOf(state(2), parsedSymbolID(id("number")),
		parents({OpenedBraces}), start(1u)),
		AllOf(state(3), parsedSymbolID(id("word")),
		parents({OpenedBraces}), start(1u))));
		}

		// FIXME: Add a test for the spurious recovery mentioned in glrRecover()
		// once we can extend the recovery strategies to do so.
		[Inline comment, hokein (Done)] nit: I'd probably move this to the comment mentioned in glrRecover(), which is more discoverable.

TEST_F(GLRTest, PerfectForestNodeSharing) {		TEST_F(GLRTest, PerfectForestNodeSharing) {
// Run the GLR on a simple grammar and test that we build exactly one forest		// Run the GLR on a simple grammar and test that we build exactly one forest
// node per (SymbolID, token range).		// node per (SymbolID, token range).

// This is a grammar where the original parsing-stack-based forest node		// This is a grammar where the original parsing-stack-based forest node
// sharing approach will fail. In its LR0 graph, it has two states containing		// sharing approach will fail. In its LR0 graph, it has two states containing
// item `expr := • IDENTIFIER`, and both have different goto states on the		// item `expr := • IDENTIFIER`, and both have different goto states on the
// nonterminal `expr`.		// nonterminal `expr`.
[52 lines not shown]	TEST_F(GLRTest, GLRReduceOrder) {
EXPECT_EQ(Parsed.dumpRecursive(G), "[ 0, end) test := <ambiguous>\n"		EXPECT_EQ(Parsed.dumpRecursive(G), "[ 0, end) test := <ambiguous>\n"
"[ 0, end) ├─test := IDENTIFIER\n"		"[ 0, end) ├─test := IDENTIFIER\n"
"[ 0, end) │ └─IDENTIFIER := tok[0]\n"		"[ 0, end) │ └─IDENTIFIER := tok[0]\n"
"[ 0, end) └─test := foo\n"		"[ 0, end) └─test := foo\n"
"[ 0, end) └─foo := IDENTIFIER\n"		"[ 0, end) └─foo := IDENTIFIER\n"
"[ 0, end) └─IDENTIFIER := tok[0]\n");		"[ 0, end) └─IDENTIFIER := tok[0]\n");
}		}

		TEST_F(GLRTest, RecoveryEndToEnd) {
		hokein (Not Done): nit: not sure the intention having the `RecoveryEndToEnd` separated from the above recover-related tests, why not grouping them together?
		sammccall (Author, Done): this tests glrParse, and it's grouped with the other glrParse tests consistent with this file (e.g. GLRReduceOrder is with the glrParse tests, not with the glrReduce tests)
		// Simple example of brace-based recovery showing:
		// - recovered region includes tokens both ahead of and behind the cursor
		// - multiple possible recovery rules
		// - recovery from outer scopes is rejected
		build(R"bnf(
		_ := block

		block := { block }
		block := { numbers }
		numbers := NUMERIC_CONSTANT NUMERIC_CONSTANT
		)bnf");
		auto LRTable = LRTable::buildSLR(G);
		clang::LangOptions LOptions;
		TokenStream Tokens = cook(lex("{ { 42 ? } }", LOptions), LOptions);
		pairBrackets(Tokens);

		const ForestNode &Parsed =
		glrParse(Tokens, {G, LRTable, Arena, GSStack}, id("block"));
		EXPECT_EQ(Parsed.dumpRecursive(G),
		"[ 0, end) block := { block [recover=1] }\n"
		"[ 0, 1) ├─{ := tok[0]\n"
		"[ 1, 5) ├─block := <ambiguous>\n"
		"[ 1, 5) │ ├─block := { block [recover=1] }\n"
		"[ 1, 2) │ │ ├─{ := tok[1]\n"
		"[ 2, 4) │ │ ├─block := <opaque>\n"
		"[ 4, 5) │ │ └─} := tok[4]\n"
		"[ 1, 5) │ └─block := { numbers [recover=1] }\n"
		"[ 1, 2) │ ├─{ := tok[1]\n"
		"[ 2, 4) │ ├─numbers := <opaque>\n"
		"[ 4, 5) │ └─} := tok[4]\n"
		"[ 5, end) └─} := tok[5]\n");
		}
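
A note for readers of the expected dump in RecoveryEndToEnd: the opaque nodes cover tokens [2, 4) because parsing fails at `?` (token 3) and the enclosing paired braces sit at tokens 1 and 4. The following is a minimal standalone sketch of that range computation, not the patch's implementation. `pairBraces` and `recoveryRange` are hypothetical helpers invented for illustration; the real code works on `TokenStream` after `pairBrackets(Tokens)` and walks the GSS from the dead heads rather than scanning tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of clang-pseudo): for each token, the index
// of its matching brace, or nullopt if the token is not a paired brace.
std::vector<std::optional<std::size_t>>
pairBraces(const std::vector<std::string> &Tokens) {
  std::vector<std::optional<std::size_t>> Pair(Tokens.size());
  std::vector<std::size_t> Open; // indices of '{' that are not yet closed
  for (std::size_t I = 0; I < Tokens.size(); ++I) {
    if (Tokens[I] == "{") {
      Open.push_back(I);
    } else if (Tokens[I] == "}" && !Open.empty()) {
      Pair[I] = Open.back();
      Pair[Open.back()] = I;
      Open.pop_back();
    }
  }
  return Pair;
}

// Hypothetical helper: Cursor is the index of the token that could not be
// shifted. Find the innermost paired '{' before Cursor whose '}' lies at or
// after Cursor, and return the half-open token range strictly between them.
std::optional<std::pair<std::size_t, std::size_t>>
recoveryRange(const std::vector<std::string> &Tokens, std::size_t Cursor) {
  assert(Cursor <= Tokens.size() && "cursor must be within the token stream");
  auto Pair = pairBraces(Tokens);
  // Scan backwards so the innermost enclosing brace is found first; outer
  // scopes are never considered once an inner one matches.
  for (std::size_t I = Cursor; I-- > 0;) {
    if (Tokens[I] == "{" && Pair[I] && *Pair[I] >= Cursor)
      return std::make_pair(I + 1, *Pair[I]);
  }
  return std::nullopt;
}
```

With the token sequence `{ { 42 ? } }` and a cursor of 3, `recoveryRange` yields the range [2, 4): it includes `42` behind the cursor and `?` ahead of it, and the outer brace pair at tokens 0 and 5 is not used.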

TEST_F(GLRTest, NoExplicitAccept) {
build(R"bnf(
_ := test

test := IDENTIFIER test
test := IDENTIFIER
)bnf");
clang::LangOptions LOptions;
(39 unchanged lines collapsed)