This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] Eliminate the false `::` nested-name-specifier ambiguity
ClosedPublic

Authored by hokein on Jul 25 2022, 1:05 PM.

Details

Summary

The solution is to favor the longest possible nest-name-specifier, and
drop other alternatives by using the guard, per C++ [basic.lookup.qual.general].

Motivated cases:

Foo::Foo() {};
// the constructor can be parsed as:
//  - Foo ::Foo(); // where the first Foo is return-type, and ::Foo is the function declarator
//  + Foo::Foo(); // where Foo::Foo is the function declarator
void test() {

// a very slow parsing case when there are many qualifiers!
X::Y::Z;
// The statement can be parsed as:
//  - X ::Y::Z; // ::Y::Z is the declarator
//  - X::Y ::Z; // ::Z is the declarator
//  + X::Y::Z;  // a declaration without declarator (X::Y::Z is decl-specifier-seq)
//  + X::Y::Z;  // a qualifed-id expression
}

Diff Detail

Event Timeline

hokein created this revision.Jul 25 2022, 1:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2022, 1:05 PM
hokein requested review of this revision.Jul 25 2022, 1:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2022, 1:05 PM

My main concern here is that this might reject valid code (and if it doesn't, it's not obvious why).

It does look like C++ forbids the cases I can come up with (e.g. trying to provide a definition for ::Foo is rejected by clang with "definition or redeclaration of Foo cannot name the global scope).
But I'd be way more comfortable if we could connect the specific guard rules here with spec language.

If we can't do this rigorously and merely are trying to encourage the *common* parse, then we should do it in soft disambig.

I'd like to think about this some more, do you know what part of the spec says that x ::y is invalid if x is e.g. a typedef for int?

My main concern here is that this might reject valid code (and if it doesn't, it's not obvious why).
It does look like C++ forbids the cases I can come up with (e.g. trying to provide a definition for ::Foo is rejected by clang with "definition or redeclaration of Foo cannot name the global scope).
But I'd be way more comfortable if we could connect the specific guard rules here with spec language.

The qualified declarator is tricky (it was forbidden until https://cplusplus.github.io/CWG/issues/482.html).

The closest thing I can find in the standard is basic.lookup.qual.general:

If a name, template-id, or decltype-specifier is followed by a ​::​, it shall designate a namespace, class, enumeration, or dependent type, and the ​::​ is never interpreted as a complete nested-name-specifier.

IIUC, this rule forbids the parses X::Y ::Z; and X ::Y::Z; (because in the declarator we have a complete :: nested-name-specifier).

If we can't do this rigorously and merely are trying to encourage the *common* parse, then we should do it in soft disambig.

I'd like to think about this some more, do you know what part of the spec says that x ::y is invalid if x is e.g. a typedef for int?

I didn't try to find it harder, I guess it is probably in the qualified-name-look-up section.

TL;DR: I suspect this is valid but guarding nested-name-specifier := :: [guard=PrevNotIdentifier] may be equivalent and clearer.

My main concern here is that this might reject valid code (and if it doesn't, it's not obvious why).
It does look like C++ forbids the cases I can come up with (e.g. trying to provide a definition for ::Foo is rejected by clang with "definition or redeclaration of Foo cannot name the global scope).
But I'd be way more comfortable if we could connect the specific guard rules here with spec language.

The qualified declarator is tricky (it was forbidden until https://cplusplus.github.io/CWG/issues/482.html).

The closest thing I can find in the standard is basic.lookup.qual.general:

If a name, template-id, or decltype-specifier is followed by a ​::​, it shall designate a namespace, class, enumeration, or dependent type, and the ​::​ is never interpreted as a complete nested-name-specifier.

OK, so just spelling this out so I don't get lost...

A name is an identifier or operator name.
From the grammar nested-name-specifier can have components preceding :::

  • nothing (global scope)
  • type-name, namespace-name: these are identifiers = names
  • decltype-specifier,
  • (simple)-template-id

So the rule is saying that if the :: can bind to the thing on its left, it must (*all* the non-null components are either names, template IDs, or decltype-specifiers).

I think this is probably what you've implemented here, though it's not totally obvious to me.

It seems most natural/more obviously-correct to express this as a guard on namespace-specifier := ::, as that's what the rule directly restricts. It's not simple to decide in general (in c < d > :: a, c<d> can be a template-id or not), but we can rule out the cases where the preceding token is an identifier (which is a limitation of your patch, right?).

Does nested-name-specifier := :: [guard=PrevTokenNotIdentifier] work for your testcases?
I think this is easier to tie to the [basic.language.qual.general] language you quoted.
(If you *don't* find this clearer, please do push back)

hokein updated this revision to Diff 447815.Jul 26 2022, 1:29 PM

refine the patch: guard the "::" nested-name-specifier rule instead.

TL;DR: I suspect this is valid but guarding nested-name-specifier := :: [guard=PrevNotIdentifier] may be equivalent and clearer.

Yeah, that's better!

sammccall accepted this revision.Jul 26 2022, 2:54 PM

Nice!

clang-tools-extra/pseudo/lib/cxx/CXX.cpp
166

Is LookaheadIndex from another patch?
I can't find it at head.

It seems a bit gratuitous here vs P.RHS.front()->startTokenIndex()... In general getting the info from RHS seems cleaner than jumping across by reasoning how many tokens it has

321

You could write this as TOKEN_GUARD(coloncolon, Tok.prev().Kind != tok::identifier)

If it's that short i like having the guard logic inline to avoid the indirection for the reader

clang-tools-extra/pseudo/test/cxx/nested-name-specifier.cpp
18 ↗(On Diff #447815)

Maybe check:
a<b>::c is ambiguous 3 ways, with a fixme to eliminate the one that is invalid. (I.e. the declaration of ::c)

This revision is now accepted and ready to land.Jul 26 2022, 2:54 PM
hokein updated this revision to Diff 448265.Jul 28 2022, 1:53 AM
hokein marked an inline comment as done.

address review comments.

hokein added inline comments.Jul 28 2022, 1:54 AM
clang-tools-extra/pseudo/lib/cxx/CXX.cpp
166

Yeah, the lookaheadIndex is from my other patch: https://reviews.llvm.org/D130591

I think using Tok.prev() is much better.

321

Good point!

hokein retitled this revision from [pseudo][wip] Eliminate simple-type-specifier ambiguities. to [pseudo] Eliminate the false `::` nested-name-specifier ambiguity.Jul 28 2022, 1:56 AM
hokein edited the summary of this revision. (Show Details)
This revision was landed with ongoing or failed builds.Jul 28 2022, 2:01 AM
This revision was automatically updated to reflect the committed changes.