This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] Remove unnecessary user-defined-string-literal rule.
Needs Review · Public

Authored by hokein on May 17 2022, 7:53 AM.

Details

Reviewers
sammccall
Summary

We accidentally defined two rules for user-defined-string-literal that
match the same input (one left-recursive, the other right-recursive). This
makes the parser states explode when parsing a long run of adjacent string
literals such as "" "" "".

Tested on a huge file containing ~26,000 "" string literals:

before this patch: more than minutes
after this patch: < 1s
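
A rough way to see why two overlapping recursive rules hurt is to count parse trees for a toy grammar. The sketch below is only an illustration under assumptions: the three rule shapes in the docstring are hypothetical stand-ins, not the exact rules in cxx.bnf, and a GLR parser shares work between parses, so the count shows the degree of ambiguity rather than the real parser cost.

```python
def count_derivations(n, left=True, right=True):
    """Count parse trees for n adjacent chunks under hypothetical toy rules:

        lit := chunk
        lit := lit chunk    (left-recursive, enabled by `left`)
        lit := chunk lit    (right-recursive, enabled by `right`)

    Every chunk is assumed to match any string literal token (no guards).
    """
    if n < 1:
        raise ValueError("need at least one chunk")
    # Each extra chunk can be attached by every enabled recursive rule,
    # so the count is multiplied by the number of enabled rules.
    choices = int(left) + int(right)
    ways = 1
    for _ in range(n - 1):
        ways *= choices
    return ways


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        both = count_derivations(n)
        only_right = count_derivations(n, left=False)
        print(f"{n:2d} chunks: both rules -> {both}, right-recursive only -> {only_right}")
```

With both rules enabled the count doubles with every additional chunk, while a single recursive rule leaves exactly one parse.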

Event Timeline

hokein created this revision. May 17 2022, 7:53 AM
Herald added a project: Restricted Project. May 17 2022, 7:53 AM
hokein requested review of this revision. May 17 2022, 7:53 AM
hokein added inline comments. May 17 2022, 7:56 AM
clang-tools-extra/pseudo/lib/cxx.bnf, line 716

I also rewrote this rule as right-recursive. It significantly reduces the number of states in the GSS (about a 33% saving by the figures below!)

before:

Forest bytes: 1898128 nodes: 105456
GSS bytes: 3801192 nodes: 158029

after:

Forest bytes: 1898128 nodes: 105456
GSS bytes: 2539624 nodes: 105433
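
The size of the GSS saving can be checked directly from the figures reported above. This is just the arithmetic on those four numbers; the helper name `saving_percent` is made up for the illustration.

```python
# GSS figures copied from the before/after measurements in this comment.
before = {"bytes": 3801192, "nodes": 158029}
after = {"bytes": 2539624, "nodes": 105433}


def saving_percent(old, new):
    """Relative reduction from old to new, as a percentage of old."""
    return 100.0 * (old - new) / old


if __name__ == "__main__":
    for key in ("bytes", "nodes"):
        pct = saving_percent(before[key], after[key])
        print(f"GSS {key}: {before[key]} -> {after[key]} ({pct:.1f}% smaller)")
```

Both the byte count and the node count come out at roughly one third smaller, while the forest figures are unchanged.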

The intent here is that user-defined-string-literal-chunk := STRING_LITERAL only matches when there is a ud-suffix, and string-literal-chunk := STRING_LITERAL only matches when there isn't (e.g. via a rule guard, which we haven't implemented yet).
If that were happening, would we still have the explosion?

I believe (once the guards are added) this patch will lead to rejecting "foo" "bar"_ud "baz".
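
To make the concern concrete, here is a small recognizer for a guarded toy grammar. It is a sketch under assumptions: the rule shapes are hypothetical (they are not copied from cxx.bnf), the guards are modelled by tagging each token as "plain" or "ud", and the patch is assumed to keep only the right-recursive rule.

```python
from functools import lru_cache


def accepts(tokens, allow_left_recursive):
    """Recognize a user-defined string literal over tagged chunk tokens.

    Hypothetical rules, with guards modelled by the token tags:
        ud-literal := ud-chunk
        ud-literal := plain-chunk ud-literal    (right-recursive)
        ud-literal := ud-literal plain-chunk    (left-recursive, optional)
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def ud_literal(i, j):
        # Does tokens[i:j] derive ud-literal? Spans shrink on every call,
        # so the left-recursive rule cannot loop forever here.
        if j - i == 1:
            return tokens[i] == "ud"
        if tokens[i] == "plain" and ud_literal(i + 1, j):
            return True
        if allow_left_recursive and tokens[j - 1] == "plain" and ud_literal(i, j - 1):
            return True
        return False

    return len(tokens) > 0 and ud_literal(0, len(tokens))


if __name__ == "__main__":
    # "foo" "bar"_ud "baz" as tagged chunks.
    example = ("plain", "ud", "plain")
    print("both rules:", accepts(example, allow_left_recursive=True))
    print("right-recursive only:", accepts(example, allow_left_recursive=False))
```

Under these assumptions the sequence is accepted only while the left-recursive rule is present, because nothing else lets a plain chunk follow the chunk carrying the ud-suffix.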