- Extend the GLR parser to allow conditional reduction based on the guard functions;
- Implement two simple guards (contextual-override/final) for cxx.bnf;
- layering: clangPseudoCXX depends on clangPseudo (as the guard function need to access the TokenStream);
Details
- Reviewers
sammccall - Commits
- rG9ab67cc8bfe7: [pseudo] Implement guard extension.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The ParseLang + CLI library seem clearly separate from everything else, I think this patch should be split.
Comments on that part first...
clang-tools-extra/pseudo/benchmarks/Benchmark.cpp | ||
---|---|---|
68 | getLanguage() should not be inside the loop. This is happening in a few places Maybe it should stay in setupGrammarAndSource? | |
clang-tools-extra/pseudo/include/clang-pseudo/Language.h | ||
36 | why are these all references with ParseLang (presumably) passed by value, instead of values with ParseLang passed by reference? | |
clang-tools-extra/pseudo/include/clang-pseudo/cli/CLI.h | ||
25 | nit: I don't think a namespace is needed here, compared to just getLanguageFromFlags() | |
27 | either "language" is a clear enough name or it isn't - given naming elsewhere this should be getParseLang I think | |
clang-tools-extra/pseudo/lib/GLR.cpp | ||
298–299 | I'm not sure exactly what the best thing to do about this is, but Params.Lang.G is hard to read - it's a long complicated name that doesn't even name the thing. I think probably we should split ParseParams up instead of nesting ParseLang in it, e.g. struct GLRStorage { ForestArena&, GSS } glrReduce(vector<ParseStep>, const ParseLang&, GLStorage&, NewHeadCallback) Or even put a forest pointer in the GSS just for convenience. | |
clang-tools-extra/pseudo/lib/cli/CLI.cpp | ||
53 | huh? diagnostics aren't fatal (at least haven't been considered so before) | |
54 | "missing guard function" is probably a grammar diagnostic, maybe we should add some logic to validate extensions? | |
58–59 | I don't think the CLI library should be in charge of knowing the fallback logic for each type of extension. | |
clang-tools-extra/pseudo/tool/ClangPseudo.cpp | ||
94–95 | drop if(true)? | |
clang-tools-extra/pseudo/unittests/GLRTest.cpp | ||
51–59 | either do it or don't - this is too small to check in a fixme IMO | |
57 | can this just be a member? |
(comments for guard part)
clang-tools-extra/pseudo/include/clang-pseudo/Language.h | ||
---|---|---|
23 | nit: I think the first sentence should try to describe what this *is* in as simple terms as possible, rather than how it interacts with the rest of the system. // A guard restricts when a grammar rule can be used | |
26 | I think it's worth including an example here. "e.g. a guard may allow virt-specifier := IDENTIFIER only if the identifier's text is 'override'". | |
28 | I think this sentence adds more confusion than it helps. (Not sure it's true depending on what you mean by "parsing branch" - the guard runs before the reduce not after, so the branch never exists) | |
clang-tools-extra/pseudo/lib/GLR.cpp | ||
306 | I think diagnosing missing guards but then treating them as always-passing is less surprising and more ergonomic while modifying a grammar |
clang-tools-extra/pseudo/lib/GLR.cpp | ||
---|---|---|
298–299 | Changed the signature of glrShift/Reduce/Parse. | |
306 | I wanted to make this as a contract (guard in each rule must refer to a valid extension function) in GLR parser. I think loosing the restriction is also fine -- the only downside I can see is that if we have a typo guard in the grammar file, which we don't have a corresponding function, the parser will run and emit confusing results. We can fix it by adding "missing guard function" diagnostics. |
Great!
clang-tools-extra/pseudo/include/clang-pseudo/GLR.h | ||
---|---|---|
116–117 | delete "being". | |
clang-tools-extra/pseudo/lib/GLR.cpp | ||
295 | nit: hadn't really intended SequenceRef to be a general vocabulary type, you might just want ArrayRef to match the guard signature, up to you | |
299 | this should be guarded by LLVM_DEBUG this whole thing might be clearer with lookup if (auto Guard = Lang.Guards.lookup(GuardID)) return Guard(RHS.S, Params.Code); LLVM_DEBUG(...); return true; | |
clang-tools-extra/pseudo/unittests/GLRTest.cpp | ||
435 | insert(make_pair(...)) => emplace(...)? | |
438 | remove |
getLanguage() should not be inside the loop. This is happening in a few places
Maybe it should stay in setupGrammarAndSource?