This is an archive of the discontinued LLVM Phabricator instance.


Differential D128297

[pseudo] Track heads as GSS nodes, rather than as "pending actions".
Closed · Public

Authored by sammccall on Jun 21 2022, 12:23 PM.


Details

Reviewers

hokein

Commits

rGe3ec054dfdf4: [pseudo] Track heads as GSS nodes, rather than as "pending actions".

Summary

IMO this model is simpler to understand (borrowed from the LR0 patch D127357).
It also makes error recovery easier to implement, as we have a simple list of
head nodes lying around to recover from when needed.
(It's not quite as nice as LR0 in this respect, though.)

It's slightly slower (2.24 -> 2.12 MB/s on my machine, about 5%) but nowhere
near as bad as LR0.
However:

- I think we'd have to eat a little performance loss otherwise to implement error recovery.
- This frees up some complexity budget for optimizations like fastpath push/pop (this + fastpath is already faster than HEAD).
- I haven't changed the data structure here and it's now pretty dumb; we can make it faster.
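To make the difference between the two models concrete, here is a minimal sketch with hypothetical, heavily simplified types (`Node`, `Action`, `Table`, `pendingSteps`, and `deadHeads` are illustrative only, not clang-pseudo API). Expanding heads eagerly into (head, action) steps silently drops any head that has no action for the next token, while keeping the heads themselves leaves those dead heads available to an error-recovery pass.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real GSS / LRTable types.
using StateID = unsigned;
using SymbolID = unsigned;

struct Node { // a GSS node: an LR state plus links to its parents
  StateID State;
  std::vector<const Node *> Parents;
};

enum class Kind { Shift, Reduce };
struct Action {
  Kind K;
  unsigned Value; // target state (shift) or rule id (reduce)
};

// Toy action table keyed by (state, lookahead terminal).
using Table = std::map<std::pair<StateID, SymbolID>, std::vector<Action>>;

// Old model: eagerly expand every head into (head, action) pending steps.
struct ParseStep {
  const Node *Head;
  Action A;
};
std::vector<ParseStep> pendingSteps(const std::vector<const Node *> &Heads,
                                    const Table &T, SymbolID Tok) {
  std::vector<ParseStep> Steps;
  for (const Node *H : Heads) {
    auto It = T.find({H->State, Tok});
    if (It == T.end())
      continue; // a head with no actions leaves no trace in Steps
    for (const Action &A : It->second)
      Steps.push_back({H, A});
  }
  return Steps;
}

// New model: keep the heads and look actions up when needed. Heads with no
// table entry for Tok are still in the list, so recovery can find them.
std::vector<const Node *> deadHeads(const std::vector<const Node *> &Heads,
                                    const Table &T, SymbolID Tok) {
  std::vector<const Node *> Dead;
  for (const Node *H : Heads)
    if (!T.count({H->State, Tok}))
      Dead.push_back(H);
  return Dead;
}
```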

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sammccall created this revision. Jun 21 2022, 12:23 PM

Herald added a project: Restricted Project. Jun 21 2022, 12:23 PM

sammccall requested review of this revision. Jun 21 2022, 12:23 PM

Herald added a project: Restricted Project. Jun 21 2022, 12:23 PM

Herald added subscribers: cfe-commits, alextsao1999.

Harbormaster completed remote builds in B171163: Diff 438798. Jun 21 2022, 1:01 PM

Thanks, this looks like a reasonable change to me.

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
140	nit: change NewHeads to a pointer? It seems clearer from the caller side that NewHeads is the output of the function: `glrShift(OldHeads, ..., &NewHeads)`. I think it would be clearer still if glrShift just returned NewHeads, but I understand we want to avoid a temporary object for performance reasons.
clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h
133	The function signatures (return types) of `getGoToState` and `getShiftState` are not symmetric. That is fine and by design: we never call getGoToState on an invalid Nonterminal (the LR parser guarantees this), but we might call getShiftState on a dead head. It is probably worth a comment explicitly stating the difference.
clang-tools-extra/pseudo/lib/GLR.cpp
67	I can see the benefit of keeping them tight in the loop (and doing a shift before reduce), but I found the code a bit confusing; it took me a while to understand. The concepts `Heads` and `NextHeads` are different for glrShift and glrReduce: the NextHeads returned by glrShift should be the Heads used in glrReduce, so I was confused when reading `glrReduce(Heads)` at first glance, until I realized that there is a `swap(Heads, NextHeads)` on L68... It also seems a little weird that at the end of the loop we do a cleanup on NextHeads directly (I would imagine a `swap(Head, NextHead)` and then a cleanup). Instead of naming the two lists `Heads` and `NewHeads`, how about naming them `HeadsForShift` (or `ShiftHeads`) and `HeadsForReduce`? The code would look like `for (...) { glrShift(HeadsForShift, ..., &HeadsForReduce); ... glrReduce(HeadsForReduce, ..., &HeadsForShift); }`. It looks clearer to me.
69	I'd probably leave a blank line deliberately after glrShift, because glrReduce works on the next token (I+1).
114	nit: add a `NewHeads.empty()` assertion?
203–205	IIRC, glrReduce only appends newly created GSS nodes to Heads and never removes any, so Heads will end up with lots of unnecessary heads (heads created for reduces), and we will call `getShiftState` on them. We know that after glrReduce the active heads are the heads with available shift actions, so we could optimize further by keeping a vector<GSS::Node *> that contains only the active heads; this could be done in PopPending. I think this will improve performance by reducing the number of unnecessary heads.
267	Maybe name it PoppedHeadIndex?
361–362	The comment is stale now.
clang-tools-extra/pseudo/lib/grammar/LRTable.cpp
78	Instead of using find directly, just use `getActions()`.
79	nit: it is worth a comment saying that if there is a shift action, there must be exactly one; this is guaranteed by the LR parser (no shift-shift conflict).
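The comment on lines 203–205 suggests pruning Heads after glrReduce so that glrShift only visits heads that can actually shift (sammccall's reply explains why this was deferred). A minimal sketch of that filtering step, assuming a hypothetical `CanShift` predicate in place of a real shift-table lookup; `Node` and `keepShiftableHeads` are illustrative, not part of the patch:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

using StateID = unsigned;
struct Node { StateID State; }; // hypothetical, minimal GSS node

// Drop heads that have no shift on the next token, preserving the relative
// order of the survivors (erase/remove_if idiom).
void keepShiftableHeads(std::vector<const Node *> &Heads,
                        const std::function<bool(StateID)> &CanShift) {
  Heads.erase(std::remove_if(Heads.begin(), Heads.end(),
                             [&](const Node *H) { return !CanShift(H->State); }),
              Heads.end());
}
```

The trade-off raised in the reply is that `CanShift` itself costs a lookup, so the pruning only pays off if that test is cheaper than the lookup glrShift already performs.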

As discussed offline, I'm going to land this patch as other approved optimizations are based on it, and there are no big objections.

Please feel free to keep discussing details here and I'll land followups.

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
140	I think that's only clearer if there's a convention that references are immutable and not used for output, and I don't think that convention exists in LLVM. Maybe there's some association, but using a pointer can also communicate other things (like nullability) that we don't intend here.
clang-tools-extra/pseudo/lib/GLR.cpp
69	I think we have different mental models here. With loop index I:
	- glrShift consumes token I
	- glrReduce consumes no tokens, it just rearranges things
	I think of glrReduce as:
	a) not operating on a token, at least in the sense that glrShift does
	b) being part of the work for token I, not token I+1 (it's consuming stack items corresponding to token I)
	It happens to look ahead at token I+1, but I see that as mostly incidental: with LR(0) we wouldn't do that, and with SLR(2) we'd look at both tokens I+1 and I+2. Token I+1 isn't fundamentally important.
	(I'm happy to add the blank line, but wanted to clarify because this is one reason I see this formulation as simpler.)
114	Why does this function care about that? It just appends to NewHeads.
203–205	Thanks for the offline discussion, I understand this better now! You're right that at the moment we're doing a lookup that happens to yield this information. This is because shift and reduce info are stored in the same table. This would make glrShift cheaper, which could be worth up to 9% now and up to 15% later (after glrReduce optimizations). However, D128318 splits these data structures apart to exploit different patterns in them (shift actions are sparser; reduces are denser but have patterns that allow them to be compressed). I don't want to implement this now, as that change is a bigger speedup (24%). Fundamentally the idea is to avoid a single hashtable lookup. Storing one bit per (state, terminal) to see whether a shift is possible takes only 74kB; maybe a big std::bitset would work? Added a FIXME to LRTable::getShiftState.
267	No, it points to the first head that isn't popped. Renamed to NextPopHead.
clang-tools-extra/pseudo/lib/grammar/LRTable.cpp
78	OK, but I think we should get rid of getActions soon.
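The FIXME floats a one-bit-per-(state, terminal) table as a cheap pre-check before the hashtable lookup. Note that `std::bitset` needs its size at compile time, so a table sized from the grammar would more naturally use a runtime bit vector. A minimal sketch with a hypothetical `ShiftBits` class that packs the bits into 64-bit words (not clang-pseudo code; populating it from the LR table is left out):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per (state, terminal): "does this state have a shift on this
// terminal?" A miss can skip the more expensive action lookup entirely.
class ShiftBits {
public:
  ShiftBits(std::size_t NumStates, std::size_t NumTerminals)
      : NumTerminals(NumTerminals),
        Words((NumStates * NumTerminals + 63) / 64, 0) {}

  void set(unsigned State, unsigned Terminal) {
    std::size_t I = index(State, Terminal);
    Words[I / 64] |= std::uint64_t{1} << (I % 64);
  }
  bool test(unsigned State, unsigned Terminal) const {
    std::size_t I = index(State, Terminal);
    return (Words[I / 64] >> (I % 64)) & 1;
  }
  std::size_t bytes() const { return Words.size() * sizeof(std::uint64_t); }

private:
  std::size_t index(unsigned State, unsigned Terminal) const {
    return std::size_t(State) * NumTerminals + Terminal; // row-major layout
  }
  std::size_t NumTerminals;
  std::vector<std::uint64_t> Words;
};
```

At 8 bits per byte, the 74kB figure corresponds to roughly 600,000 (state, terminal) pairs.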

address review comments

This revision was not accepted when it landed; it landed in state Needs Review. Jun 23 2022, 8:27 AM

This revision was landed with ongoing or failed builds.

Closed by commit rGe3ec054dfdf4: [pseudo] Track heads as GSS nodes, rather than as "pending actions". (authored by sammccall).

This revision was automatically updated to reflect the committed changes.

sammccall marked an inline comment as done.

sammccall added a commit: rGe3ec054dfdf4: [pseudo] Track heads as GSS nodes, rather than as "pending actions".

Harbormaster completed remote builds in B171619: Diff 439407. Jun 23 2022, 8:37 AM

sammccall added a reverting change: rG2c80b5319870: Revert "[pseudo] Track heads as GSS nodes, rather than as "pending actions".". Jun 23 2022, 9:16 AM

Revision Contents

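Path                                                              Size
clang-tools-extra/pseudo/include/clang-pseudo/GLR.h               31 lines
clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h   5 lines
clang-tools-extra/pseudo/lib/GLR.cpp                              128 lines
clang-tools-extra/pseudo/lib/grammar/LRTable.cpp                  11 lines
clang-tools-extra/pseudo/unittests/GLRTest.cpp                    174 lines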

Diff 439407

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h

	... 126 unchanged lines ...
	// and returns a forest node of the start symbol.			// and returns a forest node of the start symbol.
	//			//
	// A rule `_ := StartSymbol` must exit for the chosen start symbol.			// A rule `_ := StartSymbol` must exit for the chosen start symbol.
	//			//
	// If the parsing fails, we model it as an opaque node in the forest.			// If the parsing fails, we model it as an opaque node in the forest.
	const ForestNode &glrParse(const TokenStream &Code, const ParseParams &Params,			const ForestNode &glrParse(const TokenStream &Code, const ParseParams &Params,
	SymbolID StartSymbol);			SymbolID StartSymbol);

	// An active stack head can have multiple available actions (reduce/reduce			// Shift a token onto all OldHeads, placing the results into NewHeads.
	// actions, reduce/shift actions).
	// A step is any one action applied to any one stack head.
	struct ParseStep {
	// A specific stack head.
	const GSS::Node *Head = nullptr;
	// An action associated with the head.
	LRTable::Action Action = LRTable::Action::sentinel();
	};
	// A callback is invoked whenever a new GSS head is created during the GLR
	// parsing process (glrShift, or glrReduce).
	using NewHeadCallback = std::function<void(const GSS::Node *)>;
	// Apply all PendingShift actions on a given GSS state, newly-created heads are
	// passed to the callback.
	//
	// When this function returns, PendingShift is empty.
	//			//
	// Exposed for testing only.			// Exposed for testing only.
	void glrShift(std::vector<ParseStep> &PendingShift, const ForestNode &NextTok,			void glrShift(llvm::ArrayRef<const GSS::Node *> OldHeads,
	const ParseParams &Params, NewHeadCallback NewHeadCB);			const ForestNode &NextTok, const ParseParams &Params,
	// Applies PendingReduce actions, until no more reduce actions are available.			std::vector<const GSS::Node *> &NewHeads);
	//			// Applies available reductions on Heads, appending resulting heads to the list.
	// When this function returns, PendingReduce is empty. Calls to NewHeadCB may
	// add elements to PendingReduce
	//			//
	// Exposed for testing only.			// Exposed for testing only.
	void glrReduce(std::vector<ParseStep> &PendingReduce, const ParseParams &Params,			void glrReduce(std::vector<const GSS::Node *> &Heads, SymbolID Lookahead,
	NewHeadCallback NewHeadCB);			const ParseParams &Params);

	} // namespace pseudo			} // namespace pseudo
	} // namespace clang			} // namespace clang

	#endif // CLANG_PSEUDO_GLR_H			#endif // CLANG_PSEUDO_GLR_H

clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h

... 122 unchanged lines ...	private:
uint16_t Value : ValueBits;		uint16_t Value : ValueBits;
};		};

// Returns all available actions for the given state on a terminal.		// Returns all available actions for the given state on a terminal.
// Expected to be called by LR parsers.		// Expected to be called by LR parsers.
llvm::ArrayRef<Action> getActions(StateID State, SymbolID Terminal) const;		llvm::ArrayRef<Action> getActions(StateID State, SymbolID Terminal) const;
// Returns the state after we reduce a nonterminal.		// Returns the state after we reduce a nonterminal.
// Expected to be called by LR parsers.		// Expected to be called by LR parsers.
		// REQUIRES: Nonterminal is valid here.
StateID getGoToState(StateID State, SymbolID Nonterminal) const;		StateID getGoToState(StateID State, SymbolID Nonterminal) const;
		// Returns the state after we shift a terminal.
		// Expected to be called by LR parsers.
		// If the terminal is invalid here, returns None.
		llvm::Optional<StateID> getShiftState(StateID State, SymbolID Terminal) const;

// Looks up available actions.		// Looks up available actions.
// Returns empty if no available actions in the table.		// Returns empty if no available actions in the table.
llvm::ArrayRef<Action> find(StateID State, SymbolID Symbol) const;		llvm::ArrayRef<Action> find(StateID State, SymbolID Symbol) const;

// Returns the state from which the LR parser should start to parse the input		// Returns the state from which the LR parser should start to parse the input
// tokens as the given StartSymbol.		// tokens as the given StartSymbol.
//		//
... 53 unchanged lines ...
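The new `getShiftState` returns an optional because a head may have no shift on the next terminal, whereas `getGoToState` is only called where the LR parser guarantees a valid goto. A minimal sketch of extracting the shift target from one table cell, using `std::optional` in place of `llvm::Optional` and a hypothetical `Action` struct (the real `LRTable::Action` is encoded differently):

```cpp
#include <cassert>
#include <optional>
#include <vector>

using StateID = unsigned;

enum class Kind { Shift, Reduce };
struct Action {  // hypothetical encoding, for illustration only
  Kind K;
  unsigned Value; // target state for Shift, rule id for Reduce
};

// Returns the shift target among the actions for one (state, terminal) cell,
// or nullopt for a dead head. The LR construction yields a single successor
// state per (state, terminal), so there is never a shift/shift conflict and
// at most one Shift can appear.
std::optional<StateID> shiftTarget(const std::vector<Action> &Cell) {
  std::optional<StateID> Result;
  for (const Action &A : Cell) {
    if (A.K != Kind::Shift)
      continue;
    assert(!Result && "shift/shift conflict is impossible in an LR table");
    Result = A.Value;
  }
  return Result;
}
```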

clang-tools-extra/pseudo/lib/GLR.cpp

... 39 unchanged lines ...
const ForestNode &glrParse(const TokenStream &Tokens, const ParseParams &Params,		const ForestNode &glrParse(const TokenStream &Tokens, const ParseParams &Params,
SymbolID StartSymbol) {		SymbolID StartSymbol) {
assert(isNonterminal(StartSymbol) && "Start symbol must be a nonterminal");		assert(isNonterminal(StartSymbol) && "Start symbol must be a nonterminal");
llvm::ArrayRef<ForestNode> Terminals = Params.Forest.createTerminals(Tokens);		llvm::ArrayRef<ForestNode> Terminals = Params.Forest.createTerminals(Tokens);
auto &G = Params.G;		auto &G = Params.G;
(void)G;		(void)G;
auto &GSS = Params.GSStack;		auto &GSS = Params.GSStack;

// Lists of active shift, reduce actions.
std::vector<ParseStep> PendingShift, PendingReduce;
auto AddSteps = [&](const GSS::Node *Head, SymbolID NextTok) {
for (const auto &Action : Params.Table.getActions(Head->State, NextTok)) {
switch (Action.kind()) {
case LRTable::Action::Shift:
PendingShift.push_back({Head, Action});
break;
case LRTable::Action::Reduce:
PendingReduce.push_back({Head, Action});
break;
default:
llvm_unreachable("unexpected action kind!");
}
}
};
StateID StartState = Params.Table.getStartState(StartSymbol);		StateID StartState = Params.Table.getStartState(StartSymbol);
std::vector<const GSS::Node *> NewHeads = {		std::vector<const GSS::Node *> Heads = {GSS.addNode(/*State=*/StartState,
GSS.addNode(/*State=*/StartState,		/*ForestNode=*/nullptr,
/*ForestNode=*/nullptr, {})};		{})};
		std::vector<const GSS::Node *> NextHeads;
auto MaybeGC = [&, Roots(std::vector<const GSS::Node *>{}), I(0u)]() mutable {		auto MaybeGC = [&, Roots(std::vector<const GSS::Node *>{}), I(0u)]() mutable {
assert(PendingShift.empty() && PendingReduce.empty() &&		assert(NextHeads.empty() && "Running GC at the wrong time!");
"Running GC at the wrong time!");

if (++I != 20) // Run periodically to balance CPU and memory usage.		if (++I != 20) // Run periodically to balance CPU and memory usage.
return;		return;
I = 0;		I = 0;

// We need to copy the list: Roots is consumed by the GC.		// We need to copy the list: Roots is consumed by the GC.
Roots = NewHeads;		Roots = Heads;
GSS.gc(std::move(Roots));		GSS.gc(std::move(Roots));
};		};
for (const ForestNode &Terminal : Terminals) {		for (unsigned I = 0; I < Terminals.size(); ++I) {
LLVM_DEBUG(llvm::dbgs() << llvm::formatv("Next token {0} (id={1})\n",		LLVM_DEBUG(llvm::dbgs() << llvm::formatv(
G.symbolName(Terminal.symbol()),		"Next token {0} (id={1})\n",
Terminal.symbol()));		G.symbolName(Terminals[I].symbol()), Terminals[I].symbol()));
for (const auto *Head : NewHeads)
AddSteps(Head, Terminal.symbol());		glrShift(Heads, Terminals[I], Params, NextHeads);
NewHeads.clear();
glrReduce(PendingReduce, Params,		SymbolID Lookahead = I + 1 == Terminals.size() ? tokenSymbol(tok::eof)
[&](const GSS::Node * NewHead) {		: Terminals[I + 1].symbol();
// A reduce will enable more steps.		glrReduce(NextHeads, Lookahead, Params);
AddSteps(NewHead, Terminal.symbol());
});

glrShift(PendingShift, Terminal, Params,		std::swap(Heads, NextHeads);
[&](const GSS::Node *NewHead) { NewHeads.push_back(NewHead); });		NextHeads.clear();
MaybeGC();		MaybeGC();
}		}
LLVM_DEBUG(llvm::dbgs() << llvm::formatv("Next is eof\n"));		LLVM_DEBUG(llvm::dbgs() << llvm::formatv("Reached eof\n"));
for (const auto *Heads : NewHeads)
AddSteps(Heads, tokenSymbol(tok::eof));

StateID AcceptState = Params.Table.getGoToState(StartState, StartSymbol);		StateID AcceptState = Params.Table.getGoToState(StartState, StartSymbol);
// Collect new heads created from the final reduce.
std::vector<const GSS::Node*> Heads;
glrReduce(PendingReduce, Params, [&](const GSS::Node *NewHead) {
Heads.push_back(NewHead);
// A reduce will enable more steps.
AddSteps(NewHead, tokenSymbol(tok::eof));
});

const ForestNode *Result = nullptr;		const ForestNode *Result = nullptr;
for (const auto *Head : Heads) {		for (const auto *Head : Heads) {
if (Head->State == AcceptState) {		if (Head->State == AcceptState) {
assert(Head->Payload->symbol() == StartSymbol);		assert(Head->Payload->symbol() == StartSymbol);
assert(Result == nullptr && "multiple results!");		assert(Result == nullptr && "multiple results!");
Result = Head->Payload;		Result = Head->Payload;
}		}
}		}
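The new glrParse loop shifts token I into NextHeads, reduces those heads in place using token I+1 (or eof after the last token) as lookahead, then swaps the two buffers. A minimal sketch of just that control flow, with the shift and reduce steps passed in as callbacks; `drive`, `lookaheadAfter`, and the `Eof` constant are hypothetical stand-ins (the real code uses `tokenSymbol(tok::eof)` and ForestNode terminals):

```cpp
#include <cassert>
#include <utility>
#include <vector>

using SymbolID = int;
constexpr SymbolID Eof = -1; // hypothetical stand-in for tokenSymbol(tok::eof)

struct Node { unsigned State; };

// Lookahead for the reduce step after shifting token I: the next token,
// or eof once the last token has been shifted.
SymbolID lookaheadAfter(const std::vector<SymbolID> &Terminals, unsigned I) {
  return I + 1 == Terminals.size() ? Eof : Terminals[I + 1];
}

// Double-buffered driver loop: Heads feed the shift, NextHeads collect its
// results, are extended by the reduce, then become the next Heads.
template <typename ShiftFn, typename ReduceFn>
std::vector<const Node *> drive(const std::vector<SymbolID> &Terminals,
                                std::vector<const Node *> Heads,
                                ShiftFn Shift, ReduceFn Reduce) {
  std::vector<const Node *> NextHeads;
  for (unsigned I = 0; I < Terminals.size(); ++I) {
    Shift(Heads, Terminals[I], NextHeads);           // consumes token I
    Reduce(NextHeads, lookaheadAfter(Terminals, I)); // appends in place
    std::swap(Heads, NextHeads);
    NextHeads.clear(); // keeps the buffer's capacity for the next token
  }
  return Heads;
}
```

Reusing the two vectors this way is one reason the landed signatures take an output parameter rather than returning a fresh vector per token.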
... 15 unchanged lines ...
// shifting a token, we shift only once by combining these heads.		// shifting a token, we shift only once by combining these heads.
//		//
// E.g. we have two heads (2, 3) in the GSS, and will shift both to reach 4:		// E.g. we have two heads (2, 3) in the GSS, and will shift both to reach 4:
// 0---1---2		// 0---1---2
// └---3		// └---3
// After the shift action, the GSS is:		// After the shift action, the GSS is:
// 0---1---2---4		// 0---1---2---4
// └---3---┘		// └---3---┘
void glrShift(std::vector<ParseStep> &PendingShift, const ForestNode &NewTok,		void glrShift(llvm::ArrayRef<const GSS::Node *> OldHeads,
const ParseParams &Params, NewHeadCallback NewHeadCB) {		const ForestNode &NewTok, const ParseParams &Params,
		std::vector<const GSS::Node *> &NewHeads) {
assert(NewTok.kind() == ForestNode::Terminal);		assert(NewTok.kind() == ForestNode::Terminal);
assert(llvm::all_of(PendingShift,
[](const ParseStep &Step) {
return Step.Action.kind() == LRTable::Action::Shift;
}) &&
"Pending shift actions must be shift actions");
LLVM_DEBUG(llvm::dbgs() << llvm::formatv(" Shift {0} ({1} active heads):\n",		LLVM_DEBUG(llvm::dbgs() << llvm::formatv(" Shift {0} ({1} active heads):\n",
Params.G.symbolName(NewTok.symbol()),		Params.G.symbolName(NewTok.symbol()),
PendingShift.size()));		OldHeads.size()));

// We group pending shifts by their target state so we can merge them.		// We group pending shifts by their target state so we can merge them.
llvm::stable_sort(PendingShift, [](const ParseStep &L, const ParseStep &R) {		llvm::SmallVector<std::pair<StateID, const GSS::Node *>, 8> Shifts;
return L.Action.getShiftState() < R.Action.getShiftState();		for (const auto *H : OldHeads)
});		if (auto S = Params.Table.getShiftState(H->State, NewTok.symbol()))
auto Rest = llvm::makeArrayRef(PendingShift);		Shifts.push_back({*S, H});
		llvm::stable_sort(Shifts, llvm::less_first{});

		auto Rest = llvm::makeArrayRef(Shifts);
llvm::SmallVector<const GSS::Node *> Parents;		llvm::SmallVector<const GSS::Node *> Parents;
while (!Rest.empty()) {		while (!Rest.empty()) {
// Collect the batch of PendingShift that have compatible shift states.		// Collect the batch of PendingShift that have compatible shift states.
// Their heads become TempParents, the parents of the new GSS node.		// Their heads become TempParents, the parents of the new GSS node.
StateID NextState = Rest.front().Action.getShiftState();		StateID NextState = Rest.front().first;

Parents.clear();		Parents.clear();
for (const auto &Base : Rest) {		for (const auto &Base : Rest) {
if (Base.Action.getShiftState() != NextState)		if (Base.first != NextState)
break;		break;
Parents.push_back(Base.Head);		Parents.push_back(Base.second);
}		}
Rest = Rest.drop_front(Parents.size());		Rest = Rest.drop_front(Parents.size());

LLVM_DEBUG(llvm::dbgs() << llvm::formatv(" --> S{0} ({1} heads)\n",		LLVM_DEBUG(llvm::dbgs() << llvm::formatv(" --> S{0} ({1} heads)\n",
NextState, Parents.size()));		NextState, Parents.size()));
NewHeadCB(Params.GSStack.addNode(NextState, &NewTok, Parents));		NewHeads.push_back(Params.GSStack.addNode(NextState, &NewTok, Parents));
}		}
PendingShift.clear();
}		}

namespace {		namespace {
// A KeyedQueue yields pairs of keys and values in order of the keys.		// A KeyedQueue yields pairs of keys and values in order of the keys.
template <typename Key, typename Value>		template <typename Key, typename Value>
using KeyedQueue =		using KeyedQueue =
std::priority_queue<std::pair<Key, Value>,		std::priority_queue<std::pair<Key, Value>,
std::vector<std::pair<Key, Value>>, llvm::less_first>;		std::vector<std::pair<Key, Value>>, llvm::less_first>;
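A self-contained sketch of the grouping performed by the new glrShift, matching the (2, 3) to 4 diagram above: look up each old head's shift target, stable-sort by target state, and create one new node per distinct target whose parents are all the heads that shift there. `Node`, `shiftAll`, and the `std::deque` arena are hypothetical stand-ins for the GSS; the `ShiftState` callback plays the role of `LRTable::getShiftState`, and `std::optional` replaces `llvm::Optional`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>
#include <vector>

using StateID = unsigned;

struct Node {
  StateID State;
  std::vector<const Node *> Parents;
};

// Arena is a deque so pointers to existing nodes stay valid on push_back.
template <typename ShiftStateFn>
void shiftAll(const std::vector<const Node *> &OldHeads, ShiftStateFn ShiftState,
              std::deque<Node> &Arena, std::vector<const Node *> &NewHeads) {
  std::vector<std::pair<StateID, const Node *>> Shifts;
  for (const Node *H : OldHeads)
    if (std::optional<StateID> S = ShiftState(H->State)) // skip dead heads
      Shifts.push_back({*S, H});
  // Sort by target state only; stable, so parents keep their original order.
  std::stable_sort(Shifts.begin(), Shifts.end(),
                   [](const auto &L, const auto &R) { return L.first < R.first; });

  for (std::size_t I = 0; I < Shifts.size();) {
    StateID Next = Shifts[I].first;
    std::vector<const Node *> Parents;
    for (; I < Shifts.size() && Shifts[I].first == Next; ++I)
      Parents.push_back(Shifts[I].second);
    Arena.push_back(Node{Next, std::move(Parents)}); // one node per target
    NewHeads.push_back(&Arena.back());
  }
}
```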
... 41 unchanged lines ...
// └--2(cv-qualifier)--┘ // goto(2, type-name)		// └--2(cv-qualifier)--┘ // goto(2, type-name)
//		//
// Before (joining due to same goto state, the same base):		// Before (joining due to same goto state, the same base):
// 0--1(class-name)--3(STAR)		// 0--1(class-name)--3(STAR)
// └--2(enum-name)--4(STAR)		// └--2(enum-name)--4(STAR)
// After reducing 3 by `pointer := class-name STAR` and		// After reducing 3 by `pointer := class-name STAR` and
// 2 by`enum-name := class-name STAR`:		// 2 by`enum-name := class-name STAR`:
// 0--5(pointer) // 5 is goto(0, pointer)		// 0--5(pointer) // 5 is goto(0, pointer)
void glrReduce(std::vector<ParseStep> &PendingReduce, const ParseParams &Params,		void glrReduce(std::vector<const GSS::Node *> &Heads, SymbolID Lookahead,
NewHeadCallback NewHeadCB) {		const ParseParams &Params) {
		assert(isToken(Lookahead));
// There are two interacting complications:		// There are two interacting complications:
// 1. Performing one reduce can unlock new reduces on the newly-created head.		// 1. Performing one reduce can unlock new reduces on the newly-created head.
// 2a. The ambiguous ForestNodes must be complete (have all sequence nodes).		// 2a. The ambiguous ForestNodes must be complete (have all sequence nodes).
// This means we must have unlocked all the reduces that contribute to it.		// This means we must have unlocked all the reduces that contribute to it.
// 2b. Similarly, the new GSS nodes must be complete (have all parents).		// 2b. Similarly, the new GSS nodes must be complete (have all parents).
//		//
// We define a "family" of reduces as those that produce the same symbol and		// We define a "family" of reduces as those that produce the same symbol and
// cover the same range of tokens. These are exactly the set of reductions		// cover the same range of tokens. These are exactly the set of reductions
... 42 unchanged lines ...	void glrReduce(std::vector<const GSS::Node *> &Heads, SymbolID Lookahead,
struct PushSpec {		struct PushSpec {
// A base node is the head after popping the GSS nodes we are reducing.		// A base node is the head after popping the GSS nodes we are reducing.
const GSS::Node* Base = nullptr;		const GSS::Node* Base = nullptr;
Sequence Seq;		Sequence Seq;
};		};
KeyedQueue<Family, PushSpec> Sequences;		KeyedQueue<Family, PushSpec> Sequences;

Sequence TempSequence;		Sequence TempSequence;

		// We treat Heads as a queue of Pop operations still to be performed.
		// NextPopHead is our position within it.
		unsigned NextPopHead = 0;
// Pop walks up the parent chain(s) for a reduction from Head by to Rule.		// Pop walks up the parent chain(s) for a reduction from Head by to Rule.
// Once we reach the end, record the bases and sequences.		// Once we reach the end, record the bases and sequences.
auto Pop = [&](const GSS::Node *Head, RuleID RID) {		auto Pop = [&](const GSS::Node *Head, RuleID RID) {
LLVM_DEBUG(llvm::dbgs() << " Pop " << Params.G.dumpRule(RID) << "\n");		LLVM_DEBUG(llvm::dbgs() << " Pop " << Params.G.dumpRule(RID) << "\n");
const auto &Rule = Params.G.lookupRule(RID);		const auto &Rule = Params.G.lookupRule(RID);
Family F{/*Start=*/0, /*Symbol=*/Rule.Target, /*Rule=*/RID};		Family F{/*Start=*/0, /*Symbol=*/Rule.Target, /*Rule=*/RID};
TempSequence.resize_for_overwrite(Rule.Size);		TempSequence.resize_for_overwrite(Rule.Size);
auto DFS = [&](const GSS::Node *N, unsigned I, auto &DFS) {		auto DFS = [&](const GSS::Node *N, unsigned I, auto &DFS) {
if (I == Rule.Size) {		if (I == Rule.Size) {
F.Start = TempSequence.front()->startTokenIndex();		F.Start = TempSequence.front()->startTokenIndex();
LLVM_DEBUG(llvm::dbgs() << " --> base at S" << N->State << "\n");		LLVM_DEBUG(llvm::dbgs() << " --> base at S" << N->State << "\n");
Sequences.emplace(F, PushSpec{N, TempSequence});		Sequences.emplace(F, PushSpec{N, TempSequence});
return;		return;
}		}
TempSequence[Rule.Size - 1 - I] = N->Payload;		TempSequence[Rule.Size - 1 - I] = N->Payload;
for (const GSS::Node *Parent : N->parents())		for (const GSS::Node *Parent : N->parents())
DFS(Parent, I + 1, DFS);		DFS(Parent, I + 1, DFS);
};		};
DFS(Head, 0, DFS);		DFS(Head, 0, DFS);
};		};
auto PopPending = [&] {		auto PopPending = [&] {
for (const ParseStep &Pending : PendingReduce)		for (; NextPopHead < Heads.size(); ++NextPopHead) {
Pop(Pending.Head, Pending.Action.getReduceRule());		// FIXME: if there's exactly one head in the queue, and the pop stage
PendingReduce.clear();		// is trivial, we could pop + push without touching the expensive queues.
		for (const auto &A :
		Params.Table.getActions(Heads[NextPopHead]->State, Lookahead)) {
		if (A.kind() != LRTable::Action::Reduce)
		continue;
		Pop(Heads[NextPopHead], A.getReduceRule());
		}
		}
};		};

std::vector<std::pair</*Goto*/ StateID, const GSS::Node *>> FamilyBases;		std::vector<std::pair</*Goto*/ StateID, const GSS::Node *>> FamilyBases;
std::vector<std::pair<RuleID, Sequence>> FamilySequences;		std::vector<std::pair<RuleID, Sequence>> FamilySequences;

std::vector<const GSS::Node *> TempGSSNodes;		std::vector<const GSS::Node *> TempGSSNodes;
std::vector<const ForestNode *> TempForestNodes;		std::vector<const ForestNode *> TempForestNodes;

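In the new glrReduce, `Heads` doubles as the queue of pending pops: PopPending resumes from `NextPopHead`, and the push stage appends newly created heads to the same vector, so they are popped on a later PopPending call. A generic sketch of that append-only worklist pattern, with a hypothetical `drainAppendOnly` helper (not clang-pseudo code). It indexes rather than using iterators or a range-for, because `push_back` may reallocate the vector; the sketch lets `Process` append both during and between calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Process every item at or after NextPop, including items appended while
// processing. NextPop persists across calls, so a later call only sees items
// added since the previous one.
template <typename T, typename Fn>
void drainAppendOnly(std::vector<T> &Items, std::size_t &NextPop, Fn Process) {
  for (; NextPop < Items.size(); ++NextPop) {
    T Item = Items[NextPop]; // copy: Process may reallocate Items
    Process(Item, Items);
  }
}
```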
... 46 unchanged lines ...	while (!BasesLeft.empty()) {
StateID NextState = BasesLeft.front().first;		StateID NextState = BasesLeft.front().first;
auto &Parents = TempGSSNodes;		auto &Parents = TempGSSNodes;
Parents.clear();		Parents.clear();
for (const auto &Base : BasesLeft) {		for (const auto &Base : BasesLeft) {
if (Base.first != NextState)		if (Base.first != NextState)
break;		break;
Parents.push_back(Base.second);		Parents.push_back(Base.second);
}		}
BasesLeft = BasesLeft.drop_front(Parents.size());		BasesLeft = BasesLeft.drop_front(Parents.size());
		Heads.push_back(Params.GSStack.addNode(NextState, Parsed, Parents));
		hokein: the comment is stale now.
// Invoking the callback for new heads, a real GLR parser may add new
// reduces to the PendingReduce queue!
NewHeadCB(Params.GSStack.addNode(NextState, Parsed, Parents));
}		}
PopPending();		PopPending();
}		}
assert(Sequences.empty());		assert(Sequences.empty());
}		}
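The new `PopPending` keeps a cursor, `NextPopHead`, into `Heads` instead of draining a separate `PendingReduce` queue: heads appended later by the reduce loop are picked up by the next `PopPending` call, and heads already popped are never revisited. Below is a minimal standalone sketch of that cursor-over-a-growing-vector pattern, using plain `int` states in place of GSS nodes; `drainHeads` and the `Reduce` callback are hypothetical names, not clang-pseudo APIs. Iterating by index also stays valid if the callback itself appends, since `push_back` may reallocate and invalidate iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in: a "head" is just an LR state number here.
using Head = int;

// Visits every head at or after the cursor `NextHead`, in order, and leaves the
// cursor at the end. Heads appended later (by the caller, or by `Reduce`
// itself) are picked up on the next call; earlier heads are never revisited.
void drainHeads(std::vector<Head> &Heads, std::size_t &NextHead,
                const std::function<void(Head, std::vector<Head> &)> &Reduce) {
  assert(NextHead <= Heads.size() && "cursor past the end of the worklist");
  // Index-based on purpose: push_back may reallocate, which would invalidate
  // iterators or references held by a range-for.
  for (; NextHead < Heads.size(); ++NextHead) {
    Head Current = Heads[NextHead]; // copy before Reduce may reallocate Heads
    Reduce(Current, Heads);
  }
}
```

Keeping old heads in the same vector is also why the updated tests below expect `Heads` to contain the original nodes plus the new ones.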

const GSS::Node *GSS::addNode(LRTable::StateID State, const ForestNode *Symbol,		const GSS::Node *GSS::addNode(LRTable::StateID State, const ForestNode *Symbol,
llvm::ArrayRef<const Node *> Parents) {		llvm::ArrayRef<const Node *> Parents) {
[... 67 lines not shown ...]
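In the `BasesLeft` loop, all bases that reach the same goto state become parents of a single new head; this is how split stacks rejoin (as in the `ReduceJoiningWithMultipleBases` test, where both bases go to state 5). A simplified sketch of the grouping step, assuming the bases are first sorted by state (the listing above only shows the scan over runs of equal states); `Node`, `NewHead` and `groupByGotoState` are hypothetical stand-ins, not clang-pseudo types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using StateID = unsigned;
struct Node { int Id; }; // hypothetical stand-in for GSS::Node

// A new head: its LR state plus every base that reached it via goto.
struct NewHead {
  StateID State;
  std::vector<const Node *> Parents;
};

// Creates one head per distinct goto state; its parents are all bases that
// share that state. stable_sort keeps parents in their original order.
std::vector<NewHead>
groupByGotoState(std::vector<std::pair<StateID, const Node *>> Bases) {
  std::stable_sort(Bases.begin(), Bases.end(),
                   [](const auto &A, const auto &B) { return A.first < B.first; });
  std::vector<NewHead> Result;
  std::size_t I = 0;
  while (I < Bases.size()) {
    NewHead H{Bases[I].first, {}};
    // Consume the run of bases with the same goto state.
    while (I < Bases.size() && Bases[I].first == H.State) {
      assert(Bases[I].second && "bases must be real GSS nodes");
      H.Parents.push_back(Bases[I++].second);
    }
    Result.push_back(std::move(H));
  }
  return Result;
}
```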

clang-tools-extra/pseudo/lib/grammar/LRTable.cpp

[... 66 lines not shown ...]	for (SymbolID NontermID = 0; NontermID < G.table().Nonterminals.size();
OS.indent(4) << llvm::formatv("'{0}': go to state {1}\n",		OS.indent(4) << llvm::formatv("'{0}': go to state {1}\n",
G.symbolName(NontermID),		G.symbolName(NontermID),
getGoToState(S, NontermID));		getGoToState(S, NontermID));
}		}
}		}
return OS.str();		return OS.str();
}		}

		llvm::Optional<LRTable::StateID>
		LRTable::getShiftState(StateID State, SymbolID Terminal) const {
		// FIXME: we spend a significant amount of time on misses here.
		// We could consider storing a std::bitset for a cheaper test?
		hokein: instead of using find directly, just use `getActions()`.
		sammccall: OK, but I think we should get rid of getActions soon.
		assert(pseudo::isToken(Terminal) && "expected terminal symbol!");
		hokein: nit: it is worth a comment saying that if there is a shift action, there must be exactly one; this is guaranteed by the LR parser (no shift-shift conflict).
		for (const auto &Result : getActions(State, Terminal))
		if (Result.kind() == Action::Shift)
		return Result.getShiftState(); // unique: no shift/shift conflicts.
		return llvm::None;
		}

llvm::ArrayRef<LRTable::Action> LRTable::getActions(StateID State,		llvm::ArrayRef<LRTable::Action> LRTable::getActions(StateID State,
SymbolID Terminal) const {		SymbolID Terminal) const {
assert(pseudo::isToken(Terminal) && "expect terminal symbol!");		assert(pseudo::isToken(Terminal) && "expect terminal symbol!");
return find(State, Terminal);		return find(State, Terminal);
}		}

LRTable::StateID LRTable::getGoToState(StateID State,		LRTable::StateID LRTable::getGoToState(StateID State,
SymbolID Nonterminal) const {		SymbolID Nonterminal) const {
[... 39 lines not shown ...]
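The `getShiftState` FIXME suggests a `std::bitset` to make misses cheaper, and hokein's note asks for the invariant that a (state, terminal) pair has at most one shift. A sketch combining both on a hypothetical toy table (`ToyTable`, `Kind` and `MaxTerminals` are illustrative assumptions, not the `LRTable` API): a per-state bitset answers misses without scanning the action list, and `add` asserts the no-shift/shift-conflict invariant that makes returning the first shift correct.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical toy action table; NOT clang-pseudo's LRTable API.
enum class Kind { Shift, Reduce };
struct Action {
  Kind K;
  unsigned Value; // target state for Shift, rule ID for Reduce
};

// Assumption: terminals are numbered below this bound.
constexpr std::size_t MaxTerminals = 64;

class ToyTable {
public:
  explicit ToyTable(std::size_t NumStates)
      : Actions(NumStates * MaxTerminals), HasShift(NumStates) {}

  void add(std::size_t State, std::size_t Terminal, Action A) {
    if (A.K == Kind::Shift) {
      // An LR table never has two shifts for the same (state, terminal).
      assert(!HasShift[State].test(Terminal) && "shift/shift conflict");
      HasShift[State].set(Terminal);
    }
    Actions[State * MaxTerminals + Terminal].push_back(A);
  }

  std::optional<unsigned> getShiftState(std::size_t State,
                                        std::size_t Terminal) const {
    // Cheap negative test: misses are answered from the bitset alone.
    if (!HasShift[State].test(Terminal))
      return std::nullopt;
    for (const Action &A : Actions[State * MaxTerminals + Terminal])
      if (A.K == Kind::Shift)
        return A.Value; // unique: no shift/shift conflicts
    assert(false && "bitset says a shift exists but none was found");
    return std::nullopt;
  }

private:
  std::vector<std::vector<Action>> Actions;        // [State*MaxTerminals+Terminal]
  std::vector<std::bitset<MaxTerminals>> HasShift; // one bit per (state, terminal)
};
```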

clang-tools-extra/pseudo/unittests/GLRTest.cpp

//===--- GLRTest.cpp - Test the GLR parser ----------------------- C++ --===//		//===--- GLRTest.cpp - Test the GLR parser ----------------------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang-pseudo/GLR.h"		#include "clang-pseudo/GLR.h"
#include "clang-pseudo/grammar/Grammar.h"
#include "clang-pseudo/Token.h"		#include "clang-pseudo/Token.h"
		#include "clang-pseudo/grammar/Grammar.h"
#include "clang/Basic/LangOptions.h"		#include "clang/Basic/LangOptions.h"
#include "clang/Basic/TokenKinds.h"		#include "clang/Basic/TokenKinds.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/Support/FormatVariadic.h"		#include "llvm/Support/FormatVariadic.h"
#include "gmock/gmock.h"		#include "gmock/gmock.h"
#include "gtest/gtest.h"		#include "gtest/gtest.h"
#include <memory>		#include <memory>

namespace clang {		namespace clang {
namespace pseudo {		namespace pseudo {

llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,		llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,
const std::vector<const GSS::Node *> &Heads) {		const std::vector<const GSS::Node *> &Heads) {
for (const auto *Head : Heads)		for (const auto *Head : Heads)
OS << *Head << "\n";		OS << *Head << "\n";
return OS;		return OS;
}		}

namespace {		namespace {

using Action = LRTable::Action;		using Action = LRTable::Action;
using testing::AllOf;		using testing::AllOf;
		using testing::UnorderedElementsAre;

MATCHER_P(state, StateID, "") { return arg->State == StateID; }		MATCHER_P(state, StateID, "") { return arg->State == StateID; }
MATCHER_P(parsedSymbol, FNode, "") { return arg->Payload == FNode; }		MATCHER_P(parsedSymbol, FNode, "") { return arg->Payload == FNode; }
MATCHER_P(parsedSymbolID, SID, "") { return arg->Payload->symbol() == SID; }		MATCHER_P(parsedSymbolID, SID, "") { return arg->Payload->symbol() == SID; }

testing::Matcher<const GSS::Node *>		testing::Matcher<const GSS::Node *>
parents(llvm::ArrayRef<const GSS::Node *> Parents) {		parents(llvm::ArrayRef<const GSS::Node *> Parents) {
return testing::Property(&GSS::Node::parents,		return testing::Property(&GSS::Node::parents,
[... 36 lines not shown ...]	RuleID ruleFor(llvm::StringRef NonterminalName) const {
if (RuleRange.End - RuleRange.Start == 1)		if (RuleRange.End - RuleRange.Start == 1)
return G->table().Nonterminals[id(NonterminalName)].RuleRange.Start;		return G->table().Nonterminals[id(NonterminalName)].RuleRange.Start;
ADD_FAILURE() << "Expected a single rule for " << NonterminalName		ADD_FAILURE() << "Expected a single rule for " << NonterminalName
<< ", but it has " << RuleRange.End - RuleRange.Start		<< ", but it has " << RuleRange.End - RuleRange.Start
<< " rule!\n";		<< " rule!\n";
return 0;		return 0;
}		}

NewHeadCallback captureNewHeads() {
return [this](const GSS::Node *NewHead) {
NewHeadResults.push_back(NewHead);
};
};

protected:		protected:
std::unique_ptr<Grammar> G;		std::unique_ptr<Grammar> G;
ForestArena Arena;		ForestArena Arena;
GSS GSStack;		GSS GSStack;
std::vector<const GSS::Node*> NewHeadResults;
};		};

TEST_F(GLRTest, ShiftMergingHeads) {		TEST_F(GLRTest, ShiftMergingHeads) {
// Given a test case where we have three heads 1, 2, 3 in the GSS, the heads 1,		// Given a test case where we have three heads 1, 2, 3 in the GSS, the heads 1,
// 2 have shift actions to reach state 4, and the head 3 has a shift action to		// 2 have shift actions to reach state 4, and the head 3 has a shift action to
// reach state 5:		// reach state 5:
// 0--1		// 0--1
// └--2		// └--2
// └--3		// └--3
// After the shift action, the GSS (with new heads 4, 5) is:		// After the shift action, the GSS (with new heads 4, 5) is:
// 0---1---4		// 0---1---4
// └---2---┘		// └---2---┘
// └---3---5		// └---3---5
auto *GSSNode0 =		auto *GSSNode0 =
GSStack.addNode(/*State=*/0, /*ForestNode=*/nullptr, /*Parents=*/{});		GSStack.addNode(/*State=*/0, /*ForestNode=*/nullptr, /*Parents=*/{});
auto *GSSNode1 = GSStack.addNode(/*State=*/0, /*ForestNode=*/nullptr,		auto *GSSNode1 = GSStack.addNode(/*State=*/1, /*ForestNode=*/nullptr,
/*Parents=*/{GSSNode0});		/*Parents=*/{GSSNode0});
auto *GSSNode2 = GSStack.addNode(/*State=*/0, /*ForestNode=*/nullptr,		auto *GSSNode2 = GSStack.addNode(/*State=*/2, /*ForestNode=*/nullptr,
/*Parents=*/{GSSNode0});		/*Parents=*/{GSSNode0});
auto *GSSNode3 = GSStack.addNode(/*State=*/0, /*ForestNode=*/nullptr,		auto *GSSNode3 = GSStack.addNode(/*State=*/3, /*ForestNode=*/nullptr,
/*Parents=*/{GSSNode0});		/*Parents=*/{GSSNode0});

buildGrammar({}, {}); // Create a fake empty grammar.		buildGrammar({}, {}); // Create a fake empty grammar.
LRTable T = LRTable::buildForTests(G->table(), /*Entries=*/{});		LRTable T =
		LRTable::buildForTests(G->table(), /*Entries=*/{
		{1, tokenSymbol(tok::semi), Action::shift(4)},
		{2, tokenSymbol(tok::semi), Action::shift(4)},
		{3, tokenSymbol(tok::semi), Action::shift(5)},
		});

ForestNode &SemiTerminal = Arena.createTerminal(tok::semi, 0);		ForestNode &SemiTerminal = Arena.createTerminal(tok::semi, 0);
std::vector<ParseStep> PendingShift = {		std::vector<const GSS::Node *> NewHeads;
{GSSNode1, Action::shift(4)},		glrShift({GSSNode1, GSSNode2, GSSNode3}, SemiTerminal,
{GSSNode3, Action::shift(5)},		{*G, T, Arena, GSStack}, NewHeads);
{GSSNode2, Action::shift(4)},
};
glrShift(PendingShift, SemiTerminal, {*G, T, Arena, GSStack},
captureNewHeads());

EXPECT_THAT(NewHeadResults, testing::UnorderedElementsAre(		EXPECT_THAT(NewHeads,
AllOf(state(4), parsedSymbol(&SemiTerminal),		UnorderedElementsAre(AllOf(state(4), parsedSymbol(&SemiTerminal),
parents({GSSNode1, GSSNode2})),		parents({GSSNode1, GSSNode2})),
AllOf(state(5), parsedSymbol(&SemiTerminal),		AllOf(state(5), parsedSymbol(&SemiTerminal),
parents({GSSNode3}))))		parents({GSSNode3}))))
<< NewHeadResults;		<< NewHeads;
}		}

TEST_F(GLRTest, ReduceConflictsSplitting) {		TEST_F(GLRTest, ReduceConflictsSplitting) {
// Before (splitting due to R/R conflict):		// Before (splitting due to R/R conflict):
// 0--1(IDENTIFIER)		// 0--1(IDENTIFIER)
// After reducing 1 by `class-name := IDENTIFIER` and		// After reducing 1 by `class-name := IDENTIFIER` and
// `enum-name := IDENTIFIER`:		// `enum-name := IDENTIFIER`:
// 0--2(class-name) // 2 is goto(0, class-name)		// 0--2(class-name) // 2 is goto(0, class-name)
// └--3(enum-name) // 3 is goto(0, enum-name)		// └--3(enum-name) // 3 is goto(0, enum-name)
buildGrammar({"class-name", "enum-name"},		buildGrammar({"class-name", "enum-name"},
{"class-name := IDENTIFIER", "enum-name := IDENTIFIER"});		{"class-name := IDENTIFIER", "enum-name := IDENTIFIER"});

LRTable Table = LRTable::buildForTests(		LRTable Table = LRTable::buildForTests(
G->table(), {{/*State=*/0, id("class-name"), Action::goTo(2)},		G->table(), {
{/*State=*/0, id("enum-name"), Action::goTo(3)}});		{/*State=*/0, id("class-name"), Action::goTo(2)},
		{/*State=*/0, id("enum-name"), Action::goTo(3)},
		{/*State=*/1, tokenSymbol(tok::l_brace),
		Action::reduce(ruleFor("class-name"))},
		{/*State=*/1, tokenSymbol(tok::l_brace),
		Action::reduce(ruleFor("enum-name"))},
		});

const auto *GSSNode0 =		const auto *GSSNode0 =
GSStack.addNode(/*State=*/0, /*ForestNode=*/nullptr, /*Parents=*/{});		GSStack.addNode(/*State=*/0, /*ForestNode=*/nullptr, /*Parents=*/{});
const auto *GSSNode1 =		const auto *GSSNode1 =
GSStack.addNode(3, &Arena.createTerminal(tok::identifier, 0), {GSSNode0});		GSStack.addNode(1, &Arena.createTerminal(tok::identifier, 0), {GSSNode0});

std::vector<ParseStep> PendingReduce = {		std::vector<const GSS::Node *> Heads = {GSSNode1};
{GSSNode1, Action::reduce(ruleFor("class-name"))},		glrReduce(Heads, tokenSymbol(tok::l_brace), {*G, Table, Arena, GSStack});
{GSSNode1, Action::reduce(ruleFor("enum-name"))}};		EXPECT_THAT(Heads, UnorderedElementsAre(
glrReduce(PendingReduce, {*G, Table, Arena, GSStack},		GSSNode1,
captureNewHeads());
EXPECT_THAT(NewHeadResults,
testing::UnorderedElementsAre(
AllOf(state(2), parsedSymbolID(id("class-name")),		AllOf(state(2), parsedSymbolID(id("class-name")),
parents({GSSNode0})),		parents({GSSNode0})),
AllOf(state(3), parsedSymbolID(id("enum-name")),		AllOf(state(3), parsedSymbolID(id("enum-name")),
parents({GSSNode0})))) << NewHeadResults;		parents({GSSNode0}))))
		<< Heads;
}		}

TEST_F(GLRTest, ReduceSplittingDueToMultipleBases) {		TEST_F(GLRTest, ReduceSplittingDueToMultipleBases) {
// Before (splitting due to multiple bases):		// Before (splitting due to multiple bases):
// 2(class-name)--4(*)		// 2(class-name)--4(*)
// 3(enum-name)---┘		// 3(enum-name)---┘
// After reducing 4 by `ptr-operator := *`:		// After reducing 4 by `ptr-operator := *`:
// 2(class-name)--5(ptr-operator) // 5 is goto(2, ptr-operator)		// 2(class-name)--5(ptr-operator) // 5 is goto(2, ptr-operator)
[... 9 lines not shown ...]	TEST_F(GLRTest, ReduceSplittingDueToMultipleBases) {
const auto *GSSNode3 =		const auto *GSSNode3 =
GSStack.addNode(/*State=*/3, /*ForestNode=*/EnumNameNode, /*Parents=*/{});		GSStack.addNode(/*State=*/3, /*ForestNode=*/EnumNameNode, /*Parents=*/{});
const auto *GSSNode4 = GSStack.addNode(		const auto *GSSNode4 = GSStack.addNode(
/*State=*/4, &Arena.createTerminal(tok::star, /*TokenIndex=*/1),		/*State=*/4, &Arena.createTerminal(tok::star, /*TokenIndex=*/1),
/*Parents=*/{GSSNode2, GSSNode3});		/*Parents=*/{GSSNode2, GSSNode3});

LRTable Table = LRTable::buildForTests(		LRTable Table = LRTable::buildForTests(
G->table(),		G->table(),
{{/*State=*/2, id("ptr-operator"), Action::goTo(/*NextState=*/5)},		{
{/*State=*/3, id("ptr-operator"), Action::goTo(/*NextState=*/6)}});		{/*State=*/2, id("ptr-operator"), Action::goTo(/*NextState=*/5)},
std::vector<ParseStep> PendingReduce = {		{/*State=*/3, id("ptr-operator"), Action::goTo(/*NextState=*/6)},
{GSSNode4, Action::reduce(ruleFor("ptr-operator"))}};		{/*State=*/4, tokenSymbol(tok::identifier),
glrReduce(PendingReduce, {*G, Table, Arena, GSStack},		Action::reduce(ruleFor("ptr-operator"))},
captureNewHeads());		});
		std::vector<const GSS::Node *> Heads = {GSSNode4};
		glrReduce(Heads, tokenSymbol(tok::identifier), {*G, Table, Arena, GSStack});

EXPECT_THAT(NewHeadResults,		EXPECT_THAT(Heads, UnorderedElementsAre(
testing::UnorderedElementsAre(		GSSNode4,
AllOf(state(5), parsedSymbolID(id("ptr-operator")),		AllOf(state(5), parsedSymbolID(id("ptr-operator")),
parents({GSSNode2})),		parents({GSSNode2})),
AllOf(state(6), parsedSymbolID(id("ptr-operator")),		AllOf(state(6), parsedSymbolID(id("ptr-operator")),
parents({GSSNode3})))) << NewHeadResults;		parents({GSSNode3}))))
		<< Heads;
// Verify that the payload of the two new heads is shared, only a single		// Verify that the payload of the two new heads is shared, only a single
// ptr-operator node is created in the forest.		// ptr-operator node is created in the forest.
EXPECT_EQ(NewHeadResults[0]->Payload, NewHeadResults[1]->Payload);		EXPECT_EQ(Heads[1]->Payload, Heads[2]->Payload);
}		}

TEST_F(GLRTest, ReduceJoiningWithMultipleBases) {		TEST_F(GLRTest, ReduceJoiningWithMultipleBases) {
// Before (joining due to same goto state, multiple bases):		// Before (joining due to same goto state, multiple bases):
// 0--1(cv-qualifier)--3(class-name)		// 0--1(cv-qualifier)--3(class-name)
// └--2(cv-qualifier)--4(enum-name)		// └--2(cv-qualifier)--4(enum-name)
// After reducing 3 by `type-name := class-name` and		// After reducing 3 by `type-name := class-name` and
// 4 by `type-name := enum-name`:		// 4 by `type-name := enum-name`:
[... 15 lines not shown ...]	const auto *GSSNode2 = GSStack.addNode(
/*State=*/2, /*ForestNode=*/CVQualifierNode, /*Parents=*/{GSSNode0});		/*State=*/2, /*ForestNode=*/CVQualifierNode, /*Parents=*/{GSSNode0});
const auto *GSSNode3 =		const auto *GSSNode3 =
GSStack.addNode(/*State=*/3, /*ForestNode=*/ClassNameNode,		GSStack.addNode(/*State=*/3, /*ForestNode=*/ClassNameNode,
/*Parents=*/{GSSNode1});		/*Parents=*/{GSSNode1});
const auto *GSSNode4 =		const auto *GSSNode4 =
GSStack.addNode(/*State=*/4, /*ForestNode=*/EnumNameNode,		GSStack.addNode(/*State=*/4, /*ForestNode=*/EnumNameNode,
/*Parents=*/{GSSNode2});		/*Parents=*/{GSSNode2});

		// FIXME: figure out a way to get rid of the hard-coded reduce RuleID!
LRTable Table = LRTable::buildForTests(		LRTable Table = LRTable::buildForTests(
G->table(),		G->table(),
{{/*State=*/1, id("type-name"), Action::goTo(/*NextState=*/5)},
{/*State=*/2, id("type-name"), Action::goTo(/*NextState=*/5)}});
// FIXME: figure out a way to get rid of the hard-coded reduce RuleID!
std::vector<ParseStep> PendingReduce = {
{
GSSNode3, Action::reduce(/*RuleID=*/0) // type-name := class-name
},
{		{
GSSNode4, Action::reduce(/*RuleID=*/1) // type-name := enum-name		{/*State=*/1, id("type-name"), Action::goTo(/*NextState=*/5)},
}};		{/*State=*/2, id("type-name"), Action::goTo(/*NextState=*/5)},
glrReduce(PendingReduce, {*G, Table, Arena, GSStack},		{/*State=*/3, tokenSymbol(tok::l_paren),
captureNewHeads());		Action::reduce(/* type-name := class-name */ 0)},
		{/*State=*/4, tokenSymbol(tok::l_paren),
		Action::reduce(/* type-name := enum-name */ 1)},
		});
		std::vector<const GSS::Node *> Heads = {GSSNode3, GSSNode4};
		glrReduce(Heads, tokenSymbol(tok::l_paren), {*G, Table, Arena, GSStack});

// Verify that the stack heads are joined at state 5 after reduces.		// Verify that the stack heads are joined at state 5 after reduces.
EXPECT_THAT(NewHeadResults, testing::UnorderedElementsAre(AllOf(		EXPECT_THAT(Heads, UnorderedElementsAre(GSSNode3, GSSNode4,
state(5), parsedSymbolID(id("type-name")),		AllOf(state(5),
		parsedSymbolID(id("type-name")),
parents({GSSNode1, GSSNode2}))))		parents({GSSNode1, GSSNode2}))))
<< NewHeadResults;		<< Heads;
// Verify that we create an ambiguous ForestNode of two parses of `type-name`.		// Verify that we create an ambiguous ForestNode of two parses of `type-name`.
EXPECT_EQ(NewHeadResults.front()->Payload->dumpRecursive(*G),		EXPECT_EQ(Heads.back()->Payload->dumpRecursive(*G),
"[ 1, end) type-name := <ambiguous>\n"		"[ 1, end) type-name := <ambiguous>\n"
"[ 1, end) ├─type-name := class-name\n"		"[ 1, end) ├─type-name := class-name\n"
"[ 1, end) │ └─class-name := <opaque>\n"		"[ 1, end) │ └─class-name := <opaque>\n"
"[ 1, end) └─type-name := enum-name\n"		"[ 1, end) └─type-name := enum-name\n"
"[ 1, end) └─enum-name := <opaque>\n");		"[ 1, end) └─enum-name := <opaque>\n");
}		}

TEST_F(GLRTest, ReduceJoiningWithSameBase) {		TEST_F(GLRTest, ReduceJoiningWithSameBase) {
[... 20 lines not shown ...]	const auto *GSSNode2 =
/*Parents=*/{GSSNode0});		/*Parents=*/{GSSNode0});
const auto *GSSNode3 =		const auto *GSSNode3 =
GSStack.addNode(/*State=*/3, /*ForestNode=*/StartTerminal,		GSStack.addNode(/*State=*/3, /*ForestNode=*/StartTerminal,
/*Parents=*/{GSSNode1});		/*Parents=*/{GSSNode1});
const auto *GSSNode4 =		const auto *GSSNode4 =
GSStack.addNode(/*State=*/4, /*ForestNode=*/StartTerminal,		GSStack.addNode(/*State=*/4, /*ForestNode=*/StartTerminal,
/*Parents=*/{GSSNode2});		/*Parents=*/{GSSNode2});

LRTable Table = LRTable::buildForTests(
G->table(), {{/*State=*/0, id("pointer"), Action::goTo(5)}});
// FIXME: figure out a way to get rid of the hard-coded reduce RuleID!		// FIXME: figure out a way to get rid of the hard-coded reduce RuleID!
std::vector<ParseStep> PendingReduce = {		LRTable Table = LRTable::buildForTests(
{		G->table(), {
GSSNode3, Action::reduce(/*RuleID=*/0) // pointer := class-name *		{/*State=*/0, id("pointer"), Action::goTo(5)},
},		{3, tokenSymbol(tok::l_paren),
{		Action::reduce(/* pointer := class-name */ 0)},
GSSNode4, Action::reduce(/*RuleID=*/1) // pointer := enum-name *		{4, tokenSymbol(tok::l_paren),
}};		Action::reduce(/* pointer := enum-name */ 1)},
glrReduce(PendingReduce, {*G, Table, Arena, GSStack},		});
captureNewHeads());		std::vector<const GSS::Node *> Heads = {GSSNode3, GSSNode4};
		glrReduce(Heads, tokenSymbol(tok::l_paren), {*G, Table, Arena, GSStack});

EXPECT_THAT(NewHeadResults, testing::UnorderedElementsAre(		EXPECT_THAT(
		Heads, UnorderedElementsAre(GSSNode3, GSSNode4,
AllOf(state(5), parsedSymbolID(id("pointer")),		AllOf(state(5), parsedSymbolID(id("pointer")),
parents({GSSNode0}))))		parents({GSSNode0}))))
<< NewHeadResults;		<< Heads;
EXPECT_EQ(NewHeadResults.front()->Payload->dumpRecursive(*G),		EXPECT_EQ(Heads.back()->Payload->dumpRecursive(*G),
"[ 0, end) pointer := <ambiguous>\n"		"[ 0, end) pointer := <ambiguous>\n"
"[ 0, end) ├─pointer := class-name *\n"		"[ 0, end) ├─pointer := class-name *\n"
"[ 0, 1) │ ├─class-name := <opaque>\n"		"[ 0, 1) │ ├─class-name := <opaque>\n"
"[ 1, end) │ └─* := tok[1]\n"		"[ 1, end) │ └─* := tok[1]\n"
"[ 0, end) └─pointer := enum-name *\n"		"[ 0, end) └─pointer := enum-name *\n"
"[ 0, 1) ├─enum-name := <opaque>\n"		"[ 0, 1) ├─enum-name := <opaque>\n"
"[ 1, end) └─* := tok[1]\n");		"[ 1, end) └─* := tok[1]\n");
}		}
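The two `ReduceJoining*` tests expect a single `<ambiguous>` forest node when one nonterminal is derived over the same token range by more than one rule. A simplified sketch of that merging step, assuming nodes are grouped by (symbol, start token): within one reduce step every new node ends at the current position, so the start token identifies the range. The `Forest` struct and `mergeAmbiguous` are hypothetical and much simpler than clang-pseudo's `ForestNode`/`ForestArena`; this is not how `glrReduce` is implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified forest node.
struct Forest {
  std::string Symbol;                       // nonterminal name
  unsigned Start;                           // first token covered
  std::string Rule;                         // rule text, or "<ambiguous>"
  std::vector<const Forest *> Alternatives; // non-empty only if ambiguous
};

// Groups nodes by (symbol, start). A group with one member is kept as-is;
// a larger group becomes one ambiguous node pointing at each alternative.
// The returned pointers refer into `Reduced`, which must outlive the result.
std::vector<Forest> mergeAmbiguous(const std::vector<Forest> &Reduced) {
  std::map<std::pair<std::string, unsigned>, std::vector<const Forest *>> Groups;
  for (const Forest &F : Reduced)
    Groups[{F.Symbol, F.Start}].push_back(&F);
  std::vector<Forest> Result;
  for (const auto &[Key, Alts] : Groups) {
    assert(!Alts.empty() && "every group has at least one member");
    if (Alts.size() == 1)
      Result.push_back(*Alts.front());
    else
      Result.push_back(Forest{Key.first, Key.second, "<ambiguous>", Alts});
  }
  return Result;
}
```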
[... 119 lines not shown ...]