This is an archive of the discontinued LLVM Phabricator instance.


Differential D130337

[pseudo] Eliminate multiple-specified-types ambiguities using guards
Closed, Public

Authored by sammccall on Jul 22 2022, 2:29 AM.


Details

Reviewers

hokein
usaxena95

Commits

rGb2b993a6ae67: [pseudo] Eliminate multiple-specified-types ambiguities using guards

Summary

Motivating case: `foo bar;` is not a declaration of nothing, with `foo` and `bar`
both types.

This is a common and critical ambiguity: clangd/AST.cpp has 20% fewer
ambiguous nodes (1674 -> 1332) after this change.

This could benefit from caching, but there's no caching infra in this patch.
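A minimal sketch of the guard idea, under heavy simplifying assumptions: the specifier sequence is modeled as a flat list of hypothetical `SpecKind` tags rather than a parse forest, and `acceptSeq`, `guardSeq` and `tailHasExclusiveType` are illustrative names, not clang-pseudo APIs. The check mirrors the one applied to each `seq := element seq` reduction: reject when both the head element and the rest of the sequence contain an "exclusive" type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model: each specifier either names an
// "exclusive" type (a class/typedef name like `foo`, or a builtin like `int`)
// or is something that can combine with one (`unsigned`, `const`, `static`).
enum class SpecKind { ExclusiveType, NonExclusive };

// Does any specifier in Seq[From..] name an exclusive type?
bool tailHasExclusiveType(const std::vector<SpecKind> &Seq, std::size_t From) {
  for (std::size_t I = From; I < Seq.size(); ++I)
    if (Seq[I] == SpecKind::ExclusiveType)
      return true;
  return false;
}

// Guard for one `seq := element seq` reduction whose head is Seq[Head]:
// reject if the head and the remaining tail both have an exclusive type.
bool guardSeq(const std::vector<SpecKind> &Seq, std::size_t Head) {
  bool HeadExclusive = Seq[Head] == SpecKind::ExclusiveType;
  return !HeadExclusive || !tailHasExclusiveType(Seq, Head + 1);
}

// A right-recursive sequence is accepted only if every reduction passes.
bool acceptSeq(const std::vector<SpecKind> &Seq) {
  for (std::size_t I = 0; I + 1 < Seq.size(); ++I)
    if (!guardSeq(Seq, I))
      return false;
  return true;
}
```

For example, `foo bar;` read as two type names corresponds to `{ExclusiveType, ExclusiveType}` and is rejected, while `unsigned int x;` corresponds to `{NonExclusive, ExclusiveType}` and is accepted. In the real parser only the bad reading of `foo bar;` is dropped; the `decl-specifier-seq init-declarator-list ;` reading survives.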

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sammccall created this revision. Jul 22 2022, 2:29 AM

Herald added a project: Restricted Project. Jul 22 2022, 2:29 AM

Herald added subscribers: usaxena95, kadircet.

sammccall requested review of this revision. Jul 22 2022, 2:29 AM

Herald added a project: Restricted Project. Jul 22 2022, 2:29 AM

Herald added subscribers: cfe-commits, alextsao1999, ilya-biryukov.

sammccall edited the summary of this revision. Jul 22 2022, 2:29 AM

sammccall added a reviewer: usaxena95.

Implementation choices here:

- We walk the whole subtree every time we see a decl-specifier-seq. This is dumb, but to reuse work we either need to add extra rules to the grammar (D130150) or an explicit cache. Caching is likely a good idea but is not added in this patch.
- We handle *all* rules for the interesting node types explicitly, rather than using `default: return false`. This lets us assert that all cases are handled, so things don't "fall through the cracks" after grammar changes. Alternative suggestions welcome. (I have a feeling this "exhaustive pattern-matching" idea will come up again...)
- The implementation mixes iteration and recursion. I suspect this doesn't matter much, as we'll rework it when adding caching.

use all_of

FWIW, as-is with no caching, this is a ~2% slowdown on my machine (5.82 -> 5.72 MB/s on SemaCodeComplete.cpp).
Whereas D130150, using the grammar, gives a 7% speedup (5.82 -> 6.22), so there is roughly a 9% performance difference between the approaches.
My guess is we'll get some but not all of this back through caching, as hashing isn't free and we'll increase the size of our working set.
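For reference, the percentages above follow from the measured throughputs (MB/s on SemaCodeComplete.cpp); the last ratio compares the guard version against the grammar version directly:

$$
\frac{5.72}{5.82} \approx 0.983 \;(\approx 2\%\ \text{slower}), \qquad
\frac{6.22}{5.82} \approx 1.069 \;(\approx 7\%\ \text{faster}), \qquad
\frac{6.22}{5.72} \approx 1.087 \;(\approx 9\%\ \text{gap})
$$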

Harbormaster completed remote builds in B176953: Diff 446753. Jul 22 2022, 2:53 AM

In D130337#3671159, @sammccall wrote:

FWIW, as-is with no caching, this is a ~2% slowdown on my machine (5.82 -> 5.72 MB/s on SemaCodeComplete.cpp).
Whereas D130150, using the grammar, gives a 7% speedup (5.82 -> 6.22), so there is roughly a 9% performance difference between the approaches.
My guess is we'll get some but not all of this back through caching, as hashing isn't free and we'll increase the size of our working set.

And indeed, I see about 6.0MB/s with a simple llvm::DenseMap<ForestNode*, bool>, so I expect we can get ~half the performance back.
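A sketch of per-node memoization along these lines. It assumes a toy `Node` type in place of `ForestNode`, and uses `std::unordered_map` in place of the `llvm::DenseMap<ForestNode*, bool>` mentioned above so that the example compiles on its own; `ExclusiveTypeCache` and `Walks` are hypothetical names. It also ignores ambiguous and opaque nodes, which the patch's `hasExclusiveType` handles conservatively.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for ForestNode: a leaf either is or is not an
// "exclusive" type; an inner node has one if any child does.
struct Node {
  bool IsExclusiveLeaf = false;
  std::vector<const Node *> Children;
};

// Memoizes the answer per node, so a subtree reached by many guard checks
// is walked once rather than once per check.
class ExclusiveTypeCache {
public:
  bool hasExclusiveType(const Node *N) {
    auto It = Cache.find(N);
    if (It != Cache.end())
      return It->second;
    ++Walks; // counts cache misses, i.e. nodes actually computed
    bool Result = N->IsExclusiveLeaf;
    for (const Node *Child : N->Children) {
      if (hasExclusiveType(Child)) {
        Result = true;
        break;
      }
    }
    // Recursive calls may have rehashed the map, so `It` is not reused.
    Cache[N] = Result;
    return Result;
  }

  unsigned Walks = 0;

private:
  std::unordered_map<const Node *, bool> Cache;
};
```

A repeated query on the same node is answered from the map without visiting its children again. The trade-off matches the one noted above: each lookup costs a hash, and the map grows the working set.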

Thanks, the change looks good in general.

we handle *all* rules for the interesting node types explicitly, rather than default: return false. This allows us to assert that all cases are handled, so things don't "fall through the cracks" after grammar changes. Alternative suggestions welcome. (I have a feeling this "exhaustive pattern-matching" idea will come up again...)

The dangerous bit is that we will now crash at runtime if the parsing code triggers a missing case.

Yeah, I hit this problem in the previous function-declarator patch. One approach would be to define an enum per nonterminal (enum Nonterminal { rhs0_rhs1 }) rather than putting all rules into a single enum. It is easier to enumerate all values for a single nonterminal (I think this is the common case):

namespace cxx {
namespace rule {

enum simple_type_specifier {
  builtin_type0,
  nested_name_specifier0_type_name1,
  // ...
};

} // namespace rule
} // namespace cxx
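A sketch of why a per-nonterminal enum helps: a `switch` over an `enum class` with no `default:` label lets the compiler check exhaustiveness (`-Wswitch`, on by default in Clang and enabled by `-Wall` in GCC), instead of relying on a runtime `llvm_unreachable`. The enumerator names (including `short0`) and the `isExclusive` helper are hypothetical, not the generated clang-pseudo identifiers.

```cpp
#include <cassert>

namespace cxx {
namespace rule {
// Hypothetical per-nonterminal rule enum.
enum class simple_type_specifier {
  builtin_type0,
  nested_name_specifier0_type_name1,
  short0,
};
} // namespace rule
} // namespace cxx

// No `default:` label: if a rule is added to the enum but not handled here,
// -Wswitch reports it at compile time.
bool isExclusive(cxx::rule::simple_type_specifier R) {
  using cxx::rule::simple_type_specifier;
  switch (R) {
  case simple_type_specifier::builtin_type0:
  case simple_type_specifier::nested_name_specifier0_type_name1:
    return true;
  case simple_type_specifier::short0:
    return false;
  }
  // Only reachable if R holds a value outside the enumerators.
  assert(false && "unhandled simple_type_specifier rule");
  return false;
}
```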

In D130337#3671159, @sammccall wrote:

FWIW, as-is with no caching, this is a ~2% slowdown on my machine (5.82 -> 5.72 MB/s on SemaCodeComplete.cpp).

Actually, the number looks quite good to me (I expected a significant slowdown without caching). I think we can check in the current version and add caching afterwards.

Whereas D130150, using the grammar, gives a 7% speedup (5.82 -> 6.22), so there is roughly a 9% performance difference between the approaches.

Yeah, because the grammar-based approach reduces the number of ambiguous nodes in the forest.

clang-tools-extra/pseudo/lib/cxx/CXX.cpp
285	nit: I would suggest using the index explicitly (`P.RHS[0]`, `P.RHS[1]`); it increases readability (the rule name encodes the index, so it's easier to spot the corresponding element).
clang-tools-extra/pseudo/lib/cxx/cxx.bnf
353	offtopic comment: The sad bit of the `RuleID` approach (vs `guard=SingleType`) is that we can't tell what kind of guard applies just by reading the grammar file. I think this is critical information, worth keeping in the grammar file. (Ideas: add comments, or bring `guard=SingleType` back into the grammar but ignore the `guard` value when parsing the grammar.)
385	I think the reason to leave SHORT/LONG/SIGNED/UNSIGNED as-is is that they can be combined with other types (e.g. short int). Can we group them together and add a comment?

sammccall marked 2 inline comments as done. Jul 22 2022, 6:07 AM

sammccall added inline comments.

clang-tools-extra/pseudo/lib/cxx/cxx.bnf
353	Yeah, the current balance doesn't feel obviously right. I'd like to leave this for the time being: of the various options (remove the annotations, replace them with comments, bring back values), I'm not sure there's a clear winner. I suspect that while it's appealing now to at least reference here all the restrictions that may apply, once we add "soft" disambiguation based on scoring it may be less appealing, as we won't be documenting the things that affect the parse in practice.
385	Grouped them. I don't think the idea that SHORT is a specifier but not actually a type needs to be spelled out, but added a comment about `builtin-type` (which is nonstandard) which hints at this.

In D130337#3671559, @hokein wrote:

Thanks, the change looks good in general.

we handle *all* rules for the interesting node types explicitly, rather than default: return false. This allows us to assert that all cases are handled, so things don't "fall through the cracks" after grammar changes. Alternative suggestions welcome. (I have a feeling this "exhaustive pattern-matching" idea will come up again...)

The dangerous bit is that we will now crash at runtime if the parsing code triggers a missing case.

Yeah, grammar changes make this brittle.
Still, hopefully we canary our releases...

Yeah, I hit this problem in the previous function-declarator patch. One approach would be to define an enum per nonterminal (enum Nonterminal { rhs0_rhs1 }) rather than putting all rules into a single enum. It is easier to enumerate all values for a single nonterminal (I think this is the common case).

Well, I don't think it's the most common (vs just targeting a rule or two) but certainly we never enumerate *all* the rules!
Interesting idea.

I'm not sure it'd be worth adding control flow here to split up the switches by rule type, though; switches are pretty heavyweight syntactically.

In D130337#3671159, @sammccall wrote:

FWIW, as-is with no caching, this is a ~2% slowdown on my machine (5.82 -> 5.72 MB/s on SemaCodeComplete.cpp).

Actually, the number looks quite good to me (I expected a significant slowdown without caching). I think we can check in the current version and add caching afterwards.

I'm not so happy...

The relevant baseline here IMO is the +7% version, as we're clearly currently paying ~8% cost to build all the silly incorrect parses, and that cost is artificial.
So 9% overall slowdown now, and still 5% with the cache, feels significant. But we can change this later.

Whereas D130150, using the grammar, gives a 7% speedup (5.82 -> 6.22), so there is roughly a 9% performance difference between the approaches.

Yeah, because the grammar-based approach reduces the number of ambiguous nodes in the forest.

*Both* approaches do that, which shows how slow the guard version is :-(

LMK if anything else is blocking here. I want to take a stab at changing the enums (cool idea), but I don't think there's much point blocking this patch on it.
Better to use it as a testbed for the change.

The change looks good to me.

In D130337#3671575, @sammccall wrote:

LMK if anything else is blocking here.

I don't want to block you, but I'd suggest postponing it a little bit until we collect some metrics in our internal pipeline (I think usaxena95@ is working on it, hopefully we will get it next week).

I want to take a stab at changing the enums (cool idea), but I don't think there's much point blocking this patch on it.

Agree.

Well, I don't think it's the most common (vs just targeting a rule or two) but certainly we never enumerate *all* the rules!
Interesting idea.

If we look at the existing guard implementations, we have a few of these usages:

- in isFunctionDeclarator, we enumerate all rules of noptr_declarator, ptr_declarator, declarator;
- in hasExclusiveType, we enumerate all rules of decl_specifier, simple_type_specifier, type_specifier, type_specifier_seq, etc.

hokein accepted this revision. Jul 22 2022, 6:33 AM

This revision is now accepted and ready to land. Jul 22 2022, 6:33 AM

In D130337#3671614, @hokein wrote:

If we look at the existing guard implementations, we have a few of these usages:

- in isFunctionDeclarator, we enumerate all rules of noptr_declarator, ptr_declarator, declarator;
- in hasExclusiveType, we enumerate all rules of decl_specifier, simple_type_specifier, type_specifier, type_specifier_seq, etc.

In both cases, we're enumerating all the rules of *multiple* nonterminals.
To get the compiler to verify exhaustiveness, we'd have to first switch over target symbol, then switch over rule ID. I expect this to be both a readability and performance hit, so I'm not sure we'll actually do it.
Still, rule::simple_declaration::decl_specifier_seq__declarator-seq__SEMI is a step up in readability and code-completion IMO. Sent https://reviews.llvm.org/D130414
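A sketch of the two-level dispatch described above: an outer switch on the symbol, then an exhaustive inner switch on that nonterminal's rule enum. The `Node` layout, the `LocalRule` encoding (a rule index local to its nonterminal) and all enumerator names are assumptions for illustration only; the point is the extra nesting that makes this shape syntactically heavy.

```cpp
#include <cassert>

// Hypothetical encoding: a node records its nonterminal plus a rule index
// that is local to that nonterminal, so it can be cast to a per-symbol enum.
enum class Symbol { decl_specifier, type_specifier };

namespace rule {
enum class decl_specifier { defining_type_specifier0, inline0, typedef0 };
enum class type_specifier { simple_type_specifier0, cv_qualifier0 };
} // namespace rule

struct Node {
  Symbol Sym;
  unsigned LocalRule; // index into the rule enum for Sym
};

// Illustrative predicate: can this specifier contribute a type?
// Each inner switch has no default, so -Wswitch can flag unhandled rules.
bool mayBeType(const Node &N) {
  switch (N.Sym) {
  case Symbol::decl_specifier:
    switch (static_cast<rule::decl_specifier>(N.LocalRule)) {
    case rule::decl_specifier::defining_type_specifier0:
      return true;
    case rule::decl_specifier::inline0:
    case rule::decl_specifier::typedef0:
      return false;
    }
    break; // only reached for an out-of-range LocalRule
  case Symbol::type_specifier:
    switch (static_cast<rule::type_specifier>(N.LocalRule)) {
    case rule::type_specifier::simple_type_specifier0:
      return true;
    case rule::type_specifier::cv_qualifier0:
      return false;
    }
    break;
  }
  assert(false && "rule index out of range for symbol");
  return false;
}
```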

Feel free to land it. We have some numbers now.

This revision was landed with ongoing or failed builds. Jul 25 2022, 3:57 AM

Closed by commit rGb2b993a6ae67: [pseudo] Eliminate multiple-specified-types ambiguities using guards (authored by sammccall).

This revision was automatically updated to reflect the committed changes.

sammccall added a commit: rGb2b993a6ae67: [pseudo] Eliminate multiple-specified-types ambiguities using guards.

sammccall mentioned this in rG0b90e136eee9: [pseudo] Style tweaks forgotten in D130337. NFC. Aug 16 2022, 1:26 AM

Revision Contents

Path                                                       Size
clang-tools-extra/pseudo/lib/cxx/CXX.cpp                   121 lines
clang-tools-extra/pseudo/lib/cxx/cxx.bnf                   27 lines
clang-tools-extra/pseudo/test/cxx/decl-specfier-seq.cpp    27 lines
clang-tools-extra/pseudo/test/fuzzer.cpp                   2 lines

Diff 447270

clang-tools-extra/pseudo/lib/cxx/CXX.cpp

 //===--- CXX.cpp - Define public interfaces for C++ grammar ---------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #include "clang-pseudo/cxx/CXX.h"
 #include "clang-pseudo/Forest.h"
 #include "clang-pseudo/Language.h"
 #include "clang-pseudo/grammar/Grammar.h"
 #include "clang-pseudo/grammar/LRTable.h"
 #include "clang/Basic/CharInfo.h"
 #include "clang/Basic/TokenKinds.h"
 #include "llvm/ADT/StringSwitch.h"
+#include "llvm/Support/Debug.h"
 #include <utility>
+#define DEBUG_TYPE "CXX.cpp"

 namespace clang {
 namespace pseudo {
 namespace cxx {
 namespace {
 static const char *CXXBNF =
 #include "CXXBNF.inc"
     ;
@@ (129 unchanged lines not shown) @@ bool isFunctionDeclarator(const ForestNode *Declarator) {
   }
   llvm_unreachable("unreachable");
 }

 bool guardNextTokenNotElse(const GuardParams &P) {
   return symbolToToken(P.Lookahead) != tok::kw_else;
 }

+// Whether this e.g. decl-specifier contains an "exclusive" type such as a class
+// name, and thus can't combine with a second exclusive type.
+//
+// Returns false for
+// - non-types
+// - "unsigned" etc that may suffice as types but may modify others
+// - cases of uncertainty (e.g. due to ambiguity)
+bool hasExclusiveType(const ForestNode *N) {
+  // FIXME: every time we apply this check, we walk the whole subtree.
+  // Add per-node caching instead.
+  while (true) {
+    assert(N->symbol() == (SymbolID)Symbol::decl_specifier_seq ||
+           N->symbol() == (SymbolID)Symbol::type_specifier_seq ||
+           N->symbol() == (SymbolID)Symbol::defining_type_specifier_seq ||
+           N->symbol() == (SymbolID)Symbol::decl_specifier ||
+           N->symbol() == (SymbolID)Symbol::type_specifier ||
+           N->symbol() == (SymbolID)Symbol::defining_type_specifier ||
+           N->symbol() == (SymbolID)Symbol::simple_type_specifier);
+    if (N->kind() == ForestNode::Opaque)
+      return false; // conservative
+    if (N->kind() == ForestNode::Ambiguous)
+      return llvm::all_of(N->alternatives(), hasExclusiveType); // conservative
+    // All supported symbols are nonterminals.
+    assert(N->kind() == ForestNode::Sequence);
+    switch (N->rule()) {
+      // seq := element seq: check element then continue into seq
+    case (RuleID)Rule::decl_specifier_seq_0decl_specifier_1decl_specifier_seq:
+    case (RuleID)Rule::defining_type_specifier_seq_0defining_type_specifier_1defining_type_specifier_seq:
+    case (RuleID)Rule::type_specifier_seq_0type_specifier_1type_specifier_seq:
+      if (hasExclusiveType(N->children()[0]))
+        return true;
+      N = N->children()[1];
+      continue;
+      // seq := element: continue into element
+    case (RuleID)Rule::decl_specifier_seq_0decl_specifier:
+    case (RuleID)Rule::type_specifier_seq_0type_specifier:
+    case (RuleID)Rule::defining_type_specifier_seq_0defining_type_specifier:
+      N = N->children()[0];
+      continue;
+
+      // defining-type-specifier
+    case (RuleID)Rule::defining_type_specifier_0type_specifier:
+      N = N->children()[0];
+      continue;
+    case (RuleID)Rule::defining_type_specifier_0class_specifier:
+    case (RuleID)Rule::defining_type_specifier_0enum_specifier:
+      return true;
+
+      // decl-specifier
+    case (RuleID)Rule::decl_specifier_0defining_type_specifier:
+      N = N->children()[0];
+      continue;
+    case (RuleID)Rule::decl_specifier_0consteval:
+    case (RuleID)Rule::decl_specifier_0constexpr:
+    case (RuleID)Rule::decl_specifier_0constinit:
+    case (RuleID)Rule::decl_specifier_0inline:
+    case (RuleID)Rule::decl_specifier_0friend:
+    case (RuleID)Rule::decl_specifier_0storage_class_specifier:
+    case (RuleID)Rule::decl_specifier_0typedef:
+    case (RuleID)Rule::decl_specifier_0function_specifier:
+      return false;
+
+      // type-specifier
+    case (RuleID)Rule::type_specifier_0elaborated_type_specifier:
+    case (RuleID)Rule::type_specifier_0typename_specifier:
+      return true;
+    case (RuleID)Rule::type_specifier_0simple_type_specifier:
+      N = N->children()[0];
+      continue;
+    case (RuleID)Rule::type_specifier_0cv_qualifier:
+      return false;
+
+      // simple-type-specifier
+    case (RuleID)Rule::simple_type_specifier_0type_name:
+    case (RuleID)Rule::simple_type_specifier_0template_name:
+    case (RuleID)Rule::simple_type_specifier_0builtin_type:
+    case (RuleID)Rule::simple_type_specifier_0nested_name_specifier_1template_2simple_template_id:
+    case (RuleID)Rule::simple_type_specifier_0nested_name_specifier_1template_name:
+    case (RuleID)Rule::simple_type_specifier_0nested_name_specifier_1type_name:
+    case (RuleID)Rule::simple_type_specifier_0decltype_specifier:
+    case (RuleID)Rule::simple_type_specifier_0placeholder_type_specifier:
+      return true;
+    case (RuleID)Rule::simple_type_specifier_0long:
+    case (RuleID)Rule::simple_type_specifier_0short:
+    case (RuleID)Rule::simple_type_specifier_0signed:
+    case (RuleID)Rule::simple_type_specifier_0unsigned:
+      return false;
+
+    default:
+      LLVM_DEBUG(llvm::errs() << "Unhandled rule " << N->rule() << "\n");
+      llvm_unreachable("hasExclusiveType be exhaustive!");
+    }
+  }
+}

 llvm::DenseMap<ExtensionID, RuleGuard> buildGuards() {
+#define GUARD(cond) \
+  { \
+    [](const GuardParams &P) { return cond; } \
+  }
 #define TOKEN_GUARD(kind, cond) \
   [](const GuardParams& P) { \
     const Token &Tok = onlyToken(tok::kind, P.RHS, P.Tokens); \
     return cond; \
   }
 #define SYMBOL_GUARD(kind, cond) \
   [](const GuardParams& P) { \
     const ForestNode &N = onlySymbol((SymbolID)Symbol::kind, P.RHS, P.Tokens); \
     return cond; \
   }
   return {
       {(RuleID)Rule::function_declarator_0declarator,
        SYMBOL_GUARD(declarator, isFunctionDeclarator(&N))},
       {(RuleID)Rule::non_function_declarator_0declarator,
        SYMBOL_GUARD(declarator, !isFunctionDeclarator(&N))},

+      // A {decl,type,defining-type}-specifier-sequence cannot have multiple
+      // "exclusive" types (like class names): a value has only one type.
+      {(RuleID)Rule::
+           defining_type_specifier_seq_0defining_type_specifier_1defining_type_specifier_seq,
+       GUARD(!hasExclusiveType(P.RHS[0]) || !hasExclusiveType(P.RHS[1]))},
+      {(RuleID)Rule::type_specifier_seq_0type_specifier_1type_specifier_seq,
+       GUARD(!hasExclusiveType(P.RHS[0]) || !hasExclusiveType(P.RHS[1]))},
+      {(RuleID)Rule::decl_specifier_seq_0decl_specifier_1decl_specifier_seq,
+       GUARD(!hasExclusiveType(P.RHS[0]) || !hasExclusiveType(P.RHS[1]))},
+
       {(RuleID)Rule::contextual_override_0identifier,
        TOKEN_GUARD(identifier, Tok.text() == "override")},
       {(RuleID)Rule::contextual_final_0identifier,
        TOKEN_GUARD(identifier, Tok.text() == "final")},
       {(RuleID)Rule::import_keyword_0identifier,
        TOKEN_GUARD(identifier, Tok.text() == "import")},
       {(RuleID)Rule::export_keyword_0identifier,
        TOKEN_GUARD(identifier, Tok.text() == "export")},
       {(RuleID)Rule::module_keyword_0identifier,
        TOKEN_GUARD(identifier, Tok.text() == "module")},
       {(RuleID)Rule::contextual_zero_0numeric_constant,
        TOKEN_GUARD(numeric_constant, Tok.text() == "0")},

-      {(RuleID)Rule::selection_statement_0if_1l_paren_2condition_3r_paren_4statement,
+      {(RuleID)Rule::
+           selection_statement_0if_1l_paren_2condition_3r_paren_4statement,
        guardNextTokenNotElse},
-      {(RuleID)Rule::selection_statement_0if_1constexpr_2l_paren_3condition_4r_paren_5statement,
+      {(RuleID)Rule::
+           selection_statement_0if_1constexpr_2l_paren_3condition_4r_paren_5statement,
        guardNextTokenNotElse},

       // The grammar distinguishes (only) user-defined vs plain string literals,
       // where the clang lexer distinguishes (only) encoding types.
       {(RuleID)Rule::user_defined_string_literal_chunk_0string_literal,
        TOKEN_GUARD(string_literal, isStringUserDefined(Tok))},
       {(RuleID)Rule::user_defined_string_literal_chunk_0utf8_string_literal,
        TOKEN_GUARD(utf8_string_literal, isStringUserDefined(Tok))},
       {(RuleID)Rule::user_defined_string_literal_chunk_0utf16_string_literal,
@@ (90 unchanged lines not shown) @@

clang-tools-extra/pseudo/lib/cxx/cxx.bnf

@@ (344 unchanged lines not shown) @@
 decl-specifier := function-specifier
 decl-specifier := FRIEND
 decl-specifier := TYPEDEF
 decl-specifier := CONSTEXPR
 decl-specifier := CONSTEVAL
 decl-specifier := CONSTINIT
 decl-specifier := INLINE
 decl-specifier-seq := decl-specifier
-decl-specifier-seq := decl-specifier decl-specifier-seq
+decl-specifier-seq := decl-specifier decl-specifier-seq [guard]
 storage-class-specifier := STATIC
 storage-class-specifier := THREAD_LOCAL
 storage-class-specifier := EXTERN
 storage-class-specifier := MUTABLE
 function-specifier := VIRTUAL
 function-specifier := explicit-specifier
 explicit-specifier := EXPLICIT ( constant-expression )
 explicit-specifier := EXPLICIT
 type-specifier := simple-type-specifier
 type-specifier := elaborated-type-specifier
 type-specifier := typename-specifier
 type-specifier := cv-qualifier
 type-specifier-seq := type-specifier
-type-specifier-seq := type-specifier type-specifier-seq
+type-specifier-seq := type-specifier type-specifier-seq [guard]
 defining-type-specifier := type-specifier
 defining-type-specifier := class-specifier
 defining-type-specifier := enum-specifier
 defining-type-specifier-seq := defining-type-specifier
-defining-type-specifier-seq := defining-type-specifier defining-type-specifier-seq
+defining-type-specifier-seq := defining-type-specifier defining-type-specifier-seq [guard]
 simple-type-specifier := nested-name-specifier_opt type-name
 simple-type-specifier := nested-name-specifier TEMPLATE simple-template-id
 simple-type-specifier := decltype-specifier
 simple-type-specifier := placeholder-type-specifier
 simple-type-specifier := nested-name-specifier_opt template-name
-simple-type-specifier := CHAR
-simple-type-specifier := CHAR8_T
-simple-type-specifier := CHAR16_T
-simple-type-specifier := CHAR32_T
-simple-type-specifier := WCHAR_T
-simple-type-specifier := BOOL
+simple-type-specifier := builtin-type
+builtin-type := CHAR
+builtin-type := CHAR8_T
+builtin-type := CHAR16_T
+builtin-type := CHAR32_T
+builtin-type := WCHAR_T
+builtin-type := BOOL
 simple-type-specifier := SHORT
-simple-type-specifier := INT
+builtin-type := INT
 simple-type-specifier := LONG
 simple-type-specifier := SIGNED
 simple-type-specifier := UNSIGNED
-simple-type-specifier := FLOAT
-simple-type-specifier := DOUBLE
-simple-type-specifier := VOID
+builtin-type := FLOAT
+builtin-type := DOUBLE
+builtin-type := VOID
 type-name := class-name
 type-name := enum-name
 type-name := typedef-name
 elaborated-type-specifier := class-key nested-name-specifier_opt IDENTIFIER
 elaborated-type-specifier := class-key simple-template-id
 elaborated-type-specifier := class-key nested-name-specifier TEMPLATE_opt simple-template-id
 elaborated-type-specifier := elaborated-enum-specifier
 elaborated-enum-specifier := ENUM nested-name-specifier_opt IDENTIFIER
@@ (376 unchanged lines not shown) @@

clang-tools-extra/pseudo/test/cxx/decl-specfier-seq.cpp

This file was added.

// RUN: clang-pseudo -grammar=cxx -source=%s --print-forest | FileCheck %s

// not parsed as Type{foo} Type{bar}
foo bar;
// CHECK-NOT: simple-declaration := decl-specifier-seq ;
// CHECK: simple-declaration := decl-specifier-seq init-declarator-list ;
// CHECK: ├─decl-specifier-seq~simple-type-specifier
// CHECK: ├─init-declarator-list~IDENTIFIER
// CHECK: └─;
// CHECK-NOT: simple-declaration := decl-specifier-seq ;

// not parsed as Type{std} Type{::string} Declarator{s};
std::string s;
// CHECK-NOT: nested-name-specifier := ::
// CHECK: simple-declaration := decl-specifier-seq init-declarator-list ;
// CHECK: ├─decl-specifier-seq~simple-type-specifier := <ambiguous>
// CHECK: │ ├─simple-type-specifier := nested-name-specifier type-name
// CHECK: │ │ ├─nested-name-specifier := <ambiguous> #1
// CHECK: │ │ │ ├─nested-name-specifier := type-name ::
// CHECK: │ │ │ └─nested-name-specifier := namespace-name ::
// CHECK: │ │ └─type-name
// CHECK: │ └─simple-type-specifier := nested-name-specifier template-name
// CHECK: │ ├─nested-name-specifier =#1
// CHECK: │ └─template-name~IDENTIFIER
// CHECK: ├─init-declarator-list~IDENTIFIER
// CHECK: └─;
// CHECK-NOT: nested-name-specifier := ::

clang-tools-extra/pseudo/test/fuzzer.cpp

 // RUN: clang-pseudo-fuzzer -grammar=%cxx-bnf-file -print %s | FileCheck %s
 int x;
 // CHECK: translation-unit := declaration-seq
-// CHECK: simple-type-specifier := INT
+// CHECK: builtin-type := INT