- declaration recovery strategy: search for likely declaration boundaries
- change decl-sequence to right-recursive to simplify this
- also use for class members (already right-recursive)
- refactor recovery strategy to use struct parameter and expose cursor position
Details
- Reviewers
sammccall
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 188377 Build 284495: arc lint + arc unit
Event Timeline
clang-tools-extra/pseudo/lib/cxx/CXX.cpp | ||
---|---|---|
426 | I think this excludes the parameter-declaration (from the grammar spec, it is a different nonterminal). | |
426 | The logic of implementations looks very solid. I think it would be good to have some unittests for it. | |
441 | if I read the code correctly, the following case should work like void foo() {} LLVM_DISCARD int bar() {} // this is an opaque `declaration` node? void foo2() {} // it won't work if we change void -> Foo. | |
450 | the list looks reasonable -- I think we can lookup the Follow(declaration) set to find more. | |
460 | maybe consider tok::identifier (if the identifier is a type specifier, it is a good indicator of a declaration starting) as well? And yeah, we're less certain about it, but since we use the indentation information, I guess it might probably work well? | |
511 | I think we miss to check the return index >= Cursor here. | |
clang-tools-extra/pseudo/lib/cxx/cxx.bnf | ||
308 | Just want to spell out, this causes a performance hit :( left-cursive grammar is more friendly to LR-like parsers, because the parser can do the reduction earlier (after a parsed a declaration, we can perform a declaration-seq reduce); while right-cursive grammar, the reduction starts from the last declaration (this means our GSS is more deeper).
I remembered we discussed this bit before, but I don't really remember the details now :( | |
clang-tools-extra/pseudo/test/cxx/recovery-declarations.cpp | ||
7 | nit: add some tests to cover the initializer, e.g. auto lambda = [] xxxx () xxx {}; |
- rebase, fixed a few conflicts
- address comments
- fix some bugs
- restructure the code, moving the implementation to a separate file
- add unittests for recoveryNextDeclaration
- keep the left-recursive declaration-seq grammar rule to void performance regression
I think the current version should be good to review, some bits need to take care of:
- left-recursive vs right-recursive of declaration-seq rule, see my other comments about it. Currently I keep it as-is to avoid the performance regression on large files (I still don't see your point how right-recursive can simplify the error recovery, it seems to me either can work at least for this case)
- recoveryNextDeclaration will consume at least 1 token (the return token index > Cursor) to avoid the risk of running in an infinite loop
- I keep the original implementation as-is (except fixing some bugs), but IMO T->Indent == OrigIndent && LikelyStartsDeclaration(T->Kind) is too strict (see my FIXME testcases), and we should probably include the IDENTIFIER (I believe it is a common case), LikelyStartsDeclaration(T->Kind) || (T->Kind = IDENTIFIER && T->Indent == OrigIndent) might be better.
clang-tools-extra/pseudo/lib/cxx/CXX.cpp | ||
---|---|---|
450 | added more: kw_friend, kw_inline, kw_explicit, kw_virtual. | |
clang-tools-extra/pseudo/lib/cxx/cxx.bnf | ||
308 | Got more data, it does hurt the performance on large files: SemaExpr.cpp: 9.54MB/s -> 7.98MB/s small/medium files doesn't seem to be affected (hypothesis: these files has less declarations than large files, thus being less affected by the right-recursive declaration-seq grammar rule) Hover.cpp: 8MB/s -> 7.92MB/s |
I think this excludes the parameter-declaration (from the grammar spec, it is a different nonterminal).