This is an archive of the discontinued LLVM Phabricator instance.

[syntax][pseudo] Implement LR parsing table.
ClosedPublic

Authored by hokein on Jan 25 2022, 2:46 PM.

Details

Summary

This patch introduces a dense implementation of the LR parsing table, which is
used by LR parsers. We implement a SLR(1) parsing table from an LR(0) automaton.

Statistics of the LR parsing table on the C++ spec grammar:

number of states: 1449
number of actions: 83069
number of index entries: 72612
size of the table (bytes): 334928

Diff Detail

Event Timeline

hokein created this revision.Jan 25 2022, 2:46 PM
hokein requested review of this revision.Jan 25 2022, 2:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 25 2022, 2:46 PM
hokein updated this revision to Diff 404442.Jan 31 2022, 1:41 AM

Polish the LRTable implementation, using two flat vectors

memory-size (bytes): 1380908 => 664600

hokein edited the summary of this revision. (Show Details)Jan 31 2022, 1:41 AM

A few initial comments just on the "easy parts" - changes to grammar and the first/follow set computation.

I need to study the rest. Generally I have a feeling some of the impl is very inefficient but efficiency may not matter much in this context, and it's unclear how much better it can be made without adding lots of complexity. Going to look more closely :-)

clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h
129

This function seems like a bit of an attractive nuisance: it could be a cheap accessor, but isn't. There are just two callers - both are calling it in a loop and both seem dubious to be iterating over all symbols at all. I wonder if we can avoid it entirely.

131

this "augmented symbol" name doesn't seem widely used. (search for lr "augmented symbol" produces few results, mostly unrelated). It's also a confusing name, because it's IIUC it's not the symbol that's augmented, but rather the grammar has been augmented with the symbol...

I'd suggest just calling it the start symbol, unless the distinction is critical (I don't yet understand why it's important - I added it in the prototype just to avoid having the language-specific name "translation-unit" hardcoded).

Otherwise it's hard to suggest a good name without understanding what it's for, but it should be descriptive...

134

In about the same number of words you could explain what this actually is :-)

// Computes the set of terminals that could appear at the start of each non-terminal.

134

is there some reason these functions need to be part of the grammar class?

They seem like part of the LR building to me, it could be local to LRBuilder.cpp, or exposed from LRTable.h or so for testing

clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h
66 ↗(On Diff #404442)

why is std::set used here? (generally a data structure to avoid)

75 ↗(On Diff #404442)

I have a feeling we've discussed this before but...

I find this alias/abstraction unhelpful when reading the code, because operations on states are e.g. state.insert or state.contains, which don't match the abstraction.

I'd probably prefer just using ItemSet, but alternatively struct State { ItemSet Items; } seems like it would work well: State.Items.contains(...) reads very naturally.

clang/lib/Tooling/Syntax/Pseudo/Grammar.cpp
49

this function could use some comments.

A rule S := T ... implies elements in first(S):
 - if T is a terminal, first(S) contains T
 - if T is a nonterminal, first(S) contains first(T)
Since first(T) may not have been fully computed yet,
first(S) itself may end up being incomplete.
We iterate until a fixed point.
(This isn't particularly efficient, but table building performance isn't critical).
61

bool Changed = true; while(Changed) {? or maybe for (bool Changed = true; Changed;) {?

(do-while is rare enough that it always takes me a bit to grasp the control flow)

66

comment that we only need to consider the first element because symbols are non-nullable

98

nit: ... Y Z, with consistent spaces/capitalization
Unless case is meant to imply something here?

105

copying the set here seems gratuitous, especially since we're doing this in a loop where !changed is the usual thing.

I think inlining ExpandFollowSet for each case would make sense here. It turns into 1 line for the NT case plus two lines for the T case, so it's actually shorter overall

111

I think if (isNonterminal(Z)) ...expand... would be clearer here, since the condition is really part of rule 3 (but in a not-totally obvious way, it's nice to see them side-by-side)

clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp
80 ↗(On Diff #404442)

nit: please be consistent about spacing & capitalization for rules

I think trying to use a convention of uppercase/lowercase for term/nonterm is difficult because there's no clear way to indicate that a symbol could be either, which is common.

80 ↗(On Diff #404442)

I can't really understand this description, it'd be helpful if it didn't reuse the name "advance" (particularly as a noun)

153 ↗(On Diff #404442)

this loop looks like it's iterating over every possible symbol that might appear next, rather than just the ones that do (and creating a temporary array of 1000 items each outer iteration to do it).

Seems like we could do better by instead advancing every rule, and spitting out a list of {symbol, ItemSet} pairs?

157 ↗(On Diff #404442)

Hmm, and advance() is computing closures from scratch every time, probably just to find "oh, the state already existed". Seems like we could save work by using the kernel set to represent the full set until we actually need to iterate over its items.

I'd like to think about this some more.

OK, a bunch of random notes but I'm getting a higher level picture.

I'm not sure about the split between LRAutomaton and LRTable. I suppose doing it in two stages simplifies the implementation, but I don't think the first stage particularly needs to be public.
Also I think the names are confusing: Automaton vs Table isn't a clear contrast as one describes behavior and one a data structure. And the "LRTable" seems more automaton-like to me!

I think my suggestion would be:

  • call the final structure LRAutomaton
  • call the LRAutomaton::build()::Builder structure LRGraph or something
  • I'm not sure the LRAutomaton structure needs to exist exactly, seems like the current LRTable::build() could operate on LRGraph directly

but I'm going to ponder this more.

clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h
32 ↗(On Diff #404442)

accepting the grammar as a parameter rather than the rule length would let us ensure the length is correct locally rather than relying on the caller.
Essentially all the callers need to do a lookup anyway.

45 ↗(On Diff #404442)

nit: redundant pseudo::qualifier

47 ↗(On Diff #404442)

nit: unsigned.

just dot()?
and similarly ruleID()->rule()?
Not sure these would really be unclear

79 ↗(On Diff #404442)

Hmm, isn't the point of GLR that it's a nondeterministic automaton?

We've done the LR-style NFA->DFA conversion, but it's not complete (when we hit a conflict we continue rather than giving up)

clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp
144 ↗(On Diff #404442)

nit: too many braces - please spell out some of the types
(and Auto should probably be State here?)

215 ↗(On Diff #404442)

These comments are echoing the code - saying what, not why.

"If we've just parsed the start symbol, we can accept the input".

220 ↗(On Diff #404442)

Seems like we should maybe have the accepted symbol as a payload on the Accept action?

Actually why do we need an Accept action, as opposed to just a reduce action and recognize it because it's the symbol we want and the stack is empty?

224 ↗(On Diff #404442)

"If we've reached the end of a rule A := ..., then we can reduce if the next token is in the follow set of A"

OK, I think I understand it now! Sorry for so many comments, happy to chat offline about which ones to address/where to start.

clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h
40

API polish: this class is a mix of accessors and direct member state - it can be hard to know how to best interact with such a thing.

I'd suggest:

  • public factories static Action Action::reduce(RuleID) etc, private data & constructor
  • expose Kind kind(), but not the individual is() methods: they don't buy much and this pattern encourages switch where appropriate
  • make Kind enum rather than enum class - Action::Shift is sufficiently scoped already
  • shift() & goTo() -> targetState() which asserts either condition. Apart from anything else, this makes naming easier: "shift" is a verb so shift() is a confusing name.
40

mention that this combines the Action and Go-to tables from LR literature?

42

I think it's worth documenting what each of these actions means.

Referring to enum values as "action table" and "goto table" seems confusing - they're part of the same table! The main reference point when reading the code needs to be the code, rather than the textbook.

68

if you make the data members private the union could just be a uint16 and save yourself some trouble here :-) Up to you

85

This action struct can be squeezed to 16 bits.

Kind only needs 3 bits (I suspect it can be 2: Error and Accept aren't heavily used).
RuleID is 12 bits, StateID would fit within 11 and 12 should be safe.

Combined with the optimization suggested below, this should get the whole table down to 335KB.

103

nit: please be consistent with Nonterminal (and "nonterminal") vs NonTerminal (and "non-terminal"). I think Nonterminal is used elsewhere in the code.

106

Why is this assertion valid? Is it a precondition of the function that some item in From can advance over NonTerminal?

122

I think there's an even more compact representation:
Basically sort the index keys as symbol-major, but then don't store the actual symbol there but rather store the range for the symbol in an external array:

// index is lookahead tok::TokenKind. Values are indices into Index: the range for tok is StateIdx[TokenIdx[tok]...TokenIdx[tok+1]]
vector<unsigned> TokenIdx;

// index is nonterminal SymbolID. Values are  indices into Index: the range for sym is StateIdx[NontermIdx[sym]...NontermIdx[sym+1]].
vector<unsigned> NontermIdx;

// parallel to actions, value is the from state. Only subranges are sorted.
vector<StateID> StateIdx;

vector<Action> Actions;

This should be 4 * (392 + 245) + (2 + 4) * 83k ~= 500k. And the one extra indirection should save 5-10 steps of binary search.

This is equivalent to 2 * vector<unsigned> + vector<pair<StateID, Action>> but keeping the searched index compact is probably a good thing.

clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp
74 ↗(On Diff #404442)

As far as I can tell, this is the only place where we need to deduplicate/check for membership in the set.

WDYT about using a local DenseSet<Item> in closure() itself for this, and making the representation of ItemSet a (small)vector?

(IIUC the reason for std::set over DenseMap is determinism. Using vector should preserve that: if you really want sorting you can sort after computing closure, but if we just want a well-defined order to ensure stable state numbers then insertion order should work fine)

109 ↗(On Diff #404442)

using hashcode instead of the key itself should be explained.
(If it's just an issue with hash of std::set, I think using std::vector fixes it)

157 ↗(On Diff #404442)

Yeah, I think this should work. Builder would maintain a map from (sorted kernel items) => stateID, and Builder.insert() would call closure() only if actually inserting.

clang/tools/clang-pseudo/ClangPseudo.cpp
52

I think that we should also have flags to:

  • dump the grammar
  • dump the LR table
hokein marked 7 inline comments as done.Feb 4 2022, 5:59 AM
hokein added inline comments.
clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h
131

yeah, it is about augmented grammar. Strictly speaking, it is a start symbol of augmented symbol. Maybe we should not speak augmented stuff in the code, it is not a well-known term, and causes confusion. rename it to start symbol instead.

It looks like using augmented grammar is a "standard" LR technique, it introduces a converged start state and accepting state in the LR graph, which makes it easier for the implementation. And it is a good fit for support multiple start symbols.

134

as discussed, made them as free functions in Grammar.h. Separate the first and follow set changes to https://reviews.llvm.org/D118990, comments around them should be addressed there.

clang/lib/Tooling/Syntax/Pseudo/Grammar.cpp
98

Y was for nonterminal while z was for nonterminal or terminals. They are not super important, changed all to Uppercase.

hokein marked 11 inline comments as done.Feb 7 2022, 11:39 AM

comments around LRAutomaton should be addressed in the separate patch, https://reviews.llvm.org/D119172.

clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h
32 ↗(On Diff #404442)

right, but we still need this one, for constructing an empty/Tombstone items for DenseMapInfo.

66 ↗(On Diff #404442)

the main reason to we want a deterministic order for computing the hash value. Changed to a sorted vector.

79 ↗(On Diff #404442)

yes and no. There are a few conceptually difference here.

GLR is indeed a nondeterministic parser, but I think the term deterministic/nondeterministic in parser is different than the ones used in automaton context.
A deterministic parser (typical LR) has a property that there will always be only one possible action to choose from, thus it is fast (linear), and it produces only one syntax tree;
A deterministic automaton is described here.

We've done the LR-style NFA->DFA conversion, but it's not complete (when we hit a conflict we continue rather than giving up)

Not exactly, there is no NFA->DFA conversion here. What this patch does:

  1. we constructs a deterministic LR(0) automaton (DFA) directly, this is what LRAutomaton represents;
  2. based on the DFA, we construct an LR table (called Action and GoTo Table in standard LR). In particular, we use the FollowSet to determine reduce actions, it is called SLR(1) table which is powerful the the LR(0) table.

A conflict in the LRTable doesn't imply an incomplete DFA. For an arbitrary grammar (no matter it is ambiguous or not), we can always construct a DFA. For example, based on the same LR(0) automaton, we could build a SLR(1) table (without conflicts) or LR(0) table (with conflicts).
The real problem lies in the interpretation of states of the automaton, considering a node in the automaton with state 0

E := T. + E
E := T.

and the node as a "+" outer edge, if the next symbol is +, we have two options "reduce E := T" or shift +. SLR can tell reduction is not available + is not in the FOLLOW(E) (thus no conflict) while LR(0) will accept both (thus conflicts).

I added some bits in the file comment of the LRGraph.h, hope it can clarify these concepts.

clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp
109 ↗(On Diff #404442)

using the ItemSet as key seems heavy (for cxx.grammar, 65KB for llvm::DenseMap<itemset, ..> vs 32KB for llvm::DenseMap<hash_code>) and unnecessary (we don't need to access the Itemset).

Added comments.

144 ↗(On Diff #404442)

too many braces - please spell out some of the types

this is one of cons of using inlay-hints :(

hokein updated this revision to Diff 407862.Feb 11 2022, 6:25 AM
hokein marked 10 inline comments as done.
  • rebase, rescope the patch to LRTable
  • refine the Action class interfaces, and nameing;
  • use a more compact Index, reduce LRTable from 660KB => 335KB;
  • address review comments;
hokein added inline comments.Feb 11 2022, 6:27 AM
clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h
85

squeezed to 16 bits (3 bits for Kind, and 13 bits for the Value).

106

This is guaranteed by a DFA (a node can not have two out edges with a same label), the same reason why we don't have shift/shift conflicts.

122

This looks like a nice improvement, though the implementation is a little tricky -- we reduced the binary-search space from 2 dimension (State, Symbol) to 1 dimension Symbol.

hokein edited the summary of this revision. (Show Details)Feb 11 2022, 6:27 AM

This is really nice, mostly nits.

clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h
179

why is this exposed/required rather than being initialized by the GrammarTable constructor?
Since this is essentially static (must always correspond to tok::TokenKind) it seems that GrammarTable could just have an ArrayRef and it could be initialized by a lazy-init singleton:

// in Grammar.cpp
static ArrayRef<std::string> getTerminalNames() {
  static std::vector<std::string> *TerminalNames = []{

  };
  return *TerminalNames;
}

(I know eventually we'd like GrammarTable to be generated and have very minimal dynamic init, but there are lots of other details needed e.g. we can't statically initialize vector<Rule> either, so we should cross that bridge when we come to it)

clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h
29

space -> sparse

59

Maybe mention relation to GLR: GLR can execute these in parallel?

68

nit: I think you want this to be a comment on the section rather than on Error, leave a blank line?

69

Error seems like an action we'd dynamically handle, rather than an unreachable sentinel.

I'd prefer to call this sentinel and llvm_unreachable() on it in the relevant places. Even if we do plan dynamic error actions, we have enough spare bits to keep these two cases separate.

81

again blank line between "nonterminal actions" and "go to state n"

81

pust -> push?

I thought goto replaces the top of the stack rather than being pushed onto it.

e.g.
stack is { stmt := . expr ; }, lookahead is IDENT, action is shift "expr := IDENT ."
stack is { stmt := . expr ; | expr := IDENT . }, lookahead is semi, action is reduce
stack is { stmt := . expr ; }, reduced symbol is expr, goto is state "stmt := expr . ;"
stack is { stmt := expr . ;}, lookahead is semi...

Line 3=>4 doesn't seem like a push

107

maybe value()=>opaque() or asInteger()?

Partly to give a hint a bit more at the purpose, partly to avoid confusion of Value != value()

115

static assert RuleBits <= ValueBits, and I think we want a StateBits above plus some assertions on the number of states when building

115

FWIW I've mostly seen just unsigned as the field type for bitfields. It seems a little confusing to explicitly specify the size both as 16 and 13 bits.

Up to you, but if you change this one also consider the Kind enum.

137

This is quite a lot of surface (together with the DenseMapInfo stuff) to expose.
Currently it looks like it needs to be in the header so that the LRTableTest can construct a table directly rather than going through LRGraph.

I think you could expose a narrower interface:

struct Entry { ;... };
// Specify exact table contents for testing.
static LRTable buildForTests(ArrayRef<Entry>);

Then the builder/densemapinfo can be moved to the cpp file, and both buildForTests and buildSLR can use the builder internally. WDYT?

155

maybe "value is the offset into States/Actions where the entries for this symbol begin".

164

Reading the building/lookup code, "index" and "idx" are used so often they're hard to follow.

I think calling this "States", parallel to "Actions", would be clearer.

Similarly maybe the NontermIdx -> NontermOffset (Offset into States/Actions arrays where we find entries for this nonterminal).
Then we can just use the word "index" for actual subscripts.

166

concepetually -> conceptually.

I think mixing this comment in with the concrete data is confusing.
Maybe lift it to the top of the private section like:

// Conceptually this is a multimap from (State, SymbolID) => action.
// In the literature this is often a table (and multiple entries, i.e. conflicts are forbidden).
// Our physical representation is quite different for compactness.
clang/lib/Tooling/Syntax/Pseudo/LRTable.cpp
26

as things currently stand this should be unreachable - these values should never escape the DenseMap. assert or llvm_unreachable?

98

(moving the comment thread to the new code location)

The DFA input guarantees Result.size() <= 1, but why can't it be zero?
If this is a requirement on the caller, mention it in the header?

163

nit: "assign" seems clearer than "resize"

166

We end up with holes because we're looping over the values rather than the keys. This also feels quite indirect.

Seems clear enough to reverse this, something like:

unsigned Pos = 0;
for (unsigned NT = 0; TK < GT.Nonterminals.size(); ++I) {
  NontermIdx[NT] = Pos;
  while (Pos < Sorted.size() && Sorted[Pos].Action == NT)
    ++Pos;
}
NontermIdx.back() = Pos;
// and the same for terminals
clang/lib/Tooling/Syntax/Pseudo/LRTableBuild.cpp
20

(I think it would be worth moving this into LRTable in order to hide the Builder type from the header...)

32

(rephrasing a comment that got lost earlier in the review)

I'm not totally clear on the scope/purpose of the accept action:

  • if it's meant to be sufficient to know whether the parse succeeded, I think it needs the symbol we accepted as a payload. The grammar will accept them all, but the caller will place restrictions.
  • if we're OK inspecting the stack and seeing our last reduction was a Decl or so, why doens't the parser just do that at the end of the stream and do away with Accept altogether? (This would avoid the need to splice eof tokens into token streams to parse subranges of them).
clang/unittests/Tooling/Syntax/Pseudo/LRGraphTest.cpp
0

Seems sensible to combine the table-building tests in here, but it does make the organization of the tests hard.
(Particularly since LRTableTests.cpp exists but the most important tests are in this file instead).

Since these cross module boundaries, and they are exactly "bundle of text in, bundle of text out"...
how do you feel about making them lit tests instead?

# RUN: clang-pseudo -grammar %s -print-graph | FileCheck --check-prefix=GRAPH
# RUN: clang-pseudo -grammar %s -print-table | FileCheck --check-prefix=TABLE
_ := expr
...
#      GRAPH: States:
# GRAPH-NEXT: ...
#      TABLE: LRTable:
# TABLE-NEXT: ...
hokein updated this revision to Diff 409892.Feb 18 2022, 3:33 AM
hokein marked 14 inline comments as done.
  • address comments
  • more api polishments
  • LRGraph unittest => lit tests
  • hide LRTable::Builder, move to LRTableBuild.cpp
hokein updated this revision to Diff 409893.Feb 18 2022, 3:35 AM

update

clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h
179

fair enough, let's hide it in a GrammarTable contructor.

clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h
69

changed to Sentinel, and remove Error action for now.

81

yeah, this is mostly right -- stack in step 2 has two frames, rather than a combined one.

The truth is:

  1. stack is [{ _ := . stmt | stmt := . expr ; }], lookahead is IDENT, action is shift "expr := IDENT ." (push a new state to the stack)
  2. stack is [{ _ := . stmt | stmt := . expr ; }, { expr := IDENT . }], lookahead is semi, action is reduce
  3. stack is [{ _ := . stmt | stmt := . expr ; }], reduced symbol is expr, goto is state "stmt := expr . ;"
  4. stack is [{ _ := . stmt | stmt := . expr ;}, { stmt := expr . ;}], lookahead is semi, action is shift "stmt := expr ; ."
  5. stack is [{ _ := . stmt | stmt := . expr ;}, { stmt := expr . ;}, { stmt := expr ; .}], lookahead is eof, action is reduce "stmt := expr ;"
  6. stack is [{ _ := . stmt | stmt := . expr ;}], reduced symbol is stmt, goto state is "_ := stmt ."
  7. stack is [{ _ := . stmt | stmt := . expr ;}, { _ := stmt .}], lookahead is eof, action is accept, the parsing is done

Step 3 => 4 is a push.

107

renamed to opaque.

137

Yeah, I exposed the Builder mainly for testing purposes. The new interface looks good to me.

clang/lib/Tooling/Syntax/Pseudo/LRTable.cpp
98

yeah, getGoToState and getActions are expected to be called by the GLR parser, the parser itself should guarantee it during parsing. Added comments.

166

yeah, this is much better!

clang/lib/Tooling/Syntax/Pseudo/LRTableBuild.cpp
20

I tend to move all build bits (Builder, BuildSLRTable, BuildFor tests) to this file rather than LRTable.cpp.

32

The accept action is following what is described in the standard LR literature.

Yeah, the main purpose of the accept action is to tell us whether a parse succeeded.
We could add a ruleID as a payload for accept action, but I'm not sure whether it would be useful, the associated rule is about _ (e.g. _ := translation_unit) -- we are interested in translation_unit, this can be retrieved from the forest.

I think the both options are mostly equivalent (the 1st option is a standard LR implementation, the 2rd one seems more add-hoc)-- we can treat accept action as a "special" reduce action of the rule "_ := translation_unit" except that we don't do the reduce, we just stop parsing.

I think we probably will revisit later -- things become tricker, if we start supporting snippets (the start symbol _ will have multiple rules), e.g. for the follow grammar,

_ := stmt
_ := expr
stmt := expr
expr := ID

the input "ID" can be parsed as a "stmt" or an "expr", if we use a unified state { _ := . stmt | _ := . expr } to start, we will have a new-type conflict (accept/reduce). I don't think we want to handle this kind of new conflicts. There are some options:

  • careful manage the _ rules, to make sure no such conflicts happened;
  • introduce a new augmented grammar __ := _ (the accept/reduce would become reduce/reduce conflict);
  • use separate start state per _ rule, and callers need to pass a targeting non-terminal to the parser;

Since there are some uncertain bits here, I'd prefer not doing further changes in this patch (but happy to add a ruleID payload for accept action). WDYT?

clang/unittests/Tooling/Syntax/Pseudo/LRGraphTest.cpp
0

I don't have strong opinion about these, both of them work. I think having options in the clang-pseudo tool to dump graph/table is a helpful feature for debugging as well, using lit tests might make more sense.

sammccall accepted this revision.Feb 21 2022, 1:24 AM

Thanks, I really like the way the tests look now!

clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h
79

// Pops

81

parsd -> parsed

clang/lib/Tooling/Syntax/Pseudo/LRTable.cpp
29

nit: goTo -> go to, like dumpForTests?

This revision is now accepted and ready to land.Feb 21 2022, 1:24 AM
hokein updated this revision to Diff 410725.Feb 23 2022, 12:12 AM
hokein marked 3 inline comments as done.

address comments

This revision was landed with ongoing or failed builds.Feb 23 2022, 12:21 AM
This revision was automatically updated to reflect the committed changes.

Hi, this commit is causing runtime failures on Windows in debug builds. Can you please correct or revert? Thanks!

clang/lib/Tooling/Syntax/Pseudo/LRTable.cpp
119

This is causing an assertion with debug builds on Windows because Actions[End] is out of bounds (so the MSVC STL debug assertions catch the issue) for the test cases in this patch.

Hi, this commit is causing runtime failures on Windows in debug builds. Can you please correct or revert? Thanks!

sorry for the trouble. I'm mostly running out of time today, I will revert this patch, and fix it tomorrow.

Hi, this commit is causing runtime failures on Windows in debug builds. Can you please correct or revert? Thanks!

sorry for the trouble. I'm mostly running out of time today, I will revert this patch, and fix it tomorrow.

Thanks!

hokein added inline comments.Feb 23 2022, 12:36 PM
clang/lib/Tooling/Syntax/Pseudo/LRTable.cpp
119

this should be llvm::makeArrayRef(&Actions[Start], End - Start). Fixed in 302ca279cb83043ef7d60115eb5ba58f12064a4a.

aaron.ballman added inline comments.Feb 23 2022, 12:45 PM
clang/lib/Tooling/Syntax/Pseudo/LRTable.cpp
119

I can confirm that the issue is now fixed for me, thank you!