This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] Eliminate the type-name identifier ambiguities in the grammar.
ClosedPublic

Authored by hokein on Jul 29 2022, 12:39 AM.

Details

Summary

See https://reviews.llvm.org/D130626 for motivation.

Identifier in the grammar has different categories (type-name, template-name, namespace-name), they requires semantic information to resolve.
This patch is to eliminate the "local" ambiguities in type-name, and namespace-name, which gives us a performance boost of the parser:

  • eliminate all different type rules (class-name, enum-name, typedef-name), fold them into a unified type-name, this removes the #1 type-name ambiguity, and gives us a big performance boost;
  • remove the namespace-alis rules, as they're hard and uninteresting;

Note that we could eliminate more and gain more performance (like fold template-name, type-name, namespace together), but at current stage, we'd like keep all existing categories of the identifier (as they might assist in correlated disambiguation & keep the representation of important concepts uniform).

Diff Detail

Event Timeline

hokein created this revision.Jul 29 2022, 12:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 29 2022, 12:39 AM
hokein requested review of this revision.Jul 29 2022, 12:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 29 2022, 12:39 AM
hokein updated this revision to Diff 453098.Aug 16 2022, 12:35 PM

update based on the offline discussion -- we'd like to keep all categories of
IDENTIFIER (type-name, namespace-name, template-name, template-name) as they
are useful in disambiguation, but we eliminate the ambiguities per each category

  • eliminate all different type rules (class-name, enum-name, typedef-name), fold them into a unified type-name, this removes the #1 type-name ambiguity, and gives us a big performance boost;
  • remove the namespace-alis rules, as they're less interesting and marginal useful;

New data on the patch:

fileambiguous nodesforest sizeglrParse performance
SemaCodeComplete.cpp11k -> 5.7K10.4MB -> 7.9MB7.1MB/s -> 9.98MB/s
AST.cpp1.3k -> 0.73K0.99MB -> 0.77MB6.7MB/s -> 8.4MB/s
clang-tools-extra/pseudo/lib/cxx/cxx.bnf
39–40

this can be removed as well, but the class-head-name rule uses it.

sammccall accepted this revision.Aug 16 2022, 3:04 PM

LG, though i think we should eliminate class-name altogether.

This gives most of the perf, which is nice!

The reduction in ambiguous nodes is less impressive than before :-( As discussed offline, i think the remaining ambiguous nodes serve a purpose as they will assist in correlated disambiguation & keep the representation of important concepts uniform. I may be wrong about that, in which case we can still drop them later.

The forest size reduction is nice to have, but least critical imo.

clang-tools-extra/pseudo/lib/cxx/cxx.bnf
37–38

Maybe "we don't distinguish between namespaces and namespace aliases, as it's hard and uninteresting"?

This revision is now accepted and ready to land.Aug 16 2022, 3:04 PM
sammccall added inline comments.Aug 16 2022, 3:07 PM
clang-tools-extra/pseudo/lib/cxx/cxx.bnf
39–40

Can't we use class-head-name := nested-name-specifier_opt type-name as class-name and type-name now match?

(Oops, not sure how i lost this comment)

hokein updated this revision to Diff 453259.Aug 17 2022, 4:35 AM
hokein marked 2 inline comments as done.

address comments.

LG, though i think we should eliminate class-name altogether.

This gives most of the perf, which is nice!

The reduction in ambiguous nodes is less impressive than before :-( As discussed offline, i think the remaining ambiguous nodes serve a purpose as they will assist in correlated disambiguation & keep the representation of important concepts uniform. I may be wrong about that, in which case we can still drop them later.

Yeah, this is because we still keep two critical ambiguous nodes:

  • simple-type-specifier (type-name vs template-name), e.g. for the CTAD case;
  • nested-name-specifier (type-name vs namespace-name), e.g. for a::b::c case;

These two ambiguous nodes are the top1 and top2 now.

The forest size reduction is nice to have, but least critical imo.

clang-tools-extra/pseudo/lib/cxx/cxx.bnf
39–40

ah, this is clever, I missed that the fact that class-name and type-name are identical now.

hokein retitled this revision from [pseudo] wip/prototype: eliminate identifier ambiguities in the grammar. to [pseudo] Eliminate the type-name identifier ambiguities in the grammar..Aug 17 2022, 4:46 AM
hokein edited the summary of this revision. (Show Details)
This revision was landed with ongoing or failed builds.Aug 17 2022, 5:31 AM
This revision was automatically updated to reflect the committed changes.