- Add annotation handling ([key=value]) in the BNF grammar parser;
- Define and setup the API in the grammar for attributes;
- Implement a builtin guard for two simple c++ contexual-override/final use cases;
Details
- Reviewers
sammccall - Commits
- rGf1ac00c9b0d1: [pseudo] Add grammar annotations support.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This patch might cover too many things (at least we could split the guide/glr implementation bit), but want to give you an overview of the picture first.
clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h | ||
---|---|---|
88 | new names are welcome. Attribute is the name I came up with (I think it is clearer than the original Hook), | |
95 | I'm not quite happy with using the value as the ID, I think we can encode the Key into the ID as well (ID := Key | Value). Similar to the generated enum name, currently we just use the name of Value (Override), it will be more confusing when we add more keys/values, one idea is to add key as well (GuardOverride etc?). |
Nice!
This has tests for the parsing-the-attribute bits, but I think we're missing tests for the actual guards added.
clang-tools-extra/pseudo/include/clang-pseudo/GLR.h | ||
---|---|---|
114 ↗ | (On Diff #432538) | this signature seems a little off to me. Guard logic is identified by a guard ID and we look it up, but we're not passing in the guard ID for some reason. Instead we pass in the rule ID, which this function uses to look up the guard ID again. Why not just pass the guard ID? That said, it's a bit surprising that we have the rules declare separate guard rules, but then they're all implemented by one function. A map of functions seems more natural. (but not a performance advantage I guess) Naming: it's confusing that this umbrella function is called "guard", but a specific rule like "guard=Override" is also called "guard". I'd either call this a GuardTester or bypass the issue by making this a DenseMap whose values are Guards. Altogether I might write this as: using Guard = llvm::function_ref<bool(llvm::ArrayRef<const ForestNode *> RHS, const TokenStream &, const Grammar &)>; ... const DenseMap<AttributeID, Guard> &Guards; |
129 ↗ | (On Diff #432538) | either a reference, or a lightweight reference type like function_ref, for consistency with other fields? |
clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h | ||
22 | nit: first "guard" should be "key"? | |
88 | I don't understand why we need AttributeKey, and to parse things into it, rather than just using StringRef. (Also here you've put it in the header, but it's not used in any interfaces) | |
95 | Packing all the possible values into a single unstructured enum is weird, but the problem (if it's a problem) is _redundancy_ and packing the attribute name in there as well just makes that worse. If we want to fix this, I think a sounder fix is something like: // Grammar.h using AttributeEnum = uint8_t; struct Rule { ...; AttributeEnum Guard = 0; } // CXX.h enum class GuardID : AttributeEnum { Override = 1; }; enum class RecoveryID : AttributeEnum { Parens = 1; }; i.e. we keep track of the unique values per attribute name and emit an enum for each attribute. This probably means emitting the actual enum definitions in tablegen, rather than a generic .inc file. (But I'm also fine with not fixing it at this point, and keeping one flat enum for the values) | |
96 | I agree HookID is an unfortunately vague name, but I did consider and reject AttributeID which I think is actively confusing. The difficulty of naming this is that we can't refer to the semantics (e.g. "Guard") as we want a single namespace for all of them, and all they have in common is syntax. However referring to "attribute values" makes for confusing internal data structures, as we don't model the whole attribute generically. So we're left with names that suggest "this is some kind of anchor to attach custom code to". Maybe ExtensionID is a little clearer than HookID? | |
98 | If we're going to use implicit bool conversions, we should not pretend this is some arbitrary value and drop the constant I think. | |
102 | if this optional and rarely set, I think it's should be a constructor parameter - this constructor could become unwieldy. It also forces you to process the attributes before creating the Rule, and reading the code doing it afterwards seems more natural. | |
clang-tools-extra/pseudo/include/clang-pseudo/cxx/CXX.h | ||
39 ↗ | (On Diff #432538) | this pattern is fragile. (and we're going to end up adding rules, too) I'd suggest either:
|
63 ↗ | (On Diff #432538) | "guard" looks like a verb here. If we're going to use it as both noun and verb, the relationship between the two needs to match the metaphor (in this case, it needs to be the guard who is doing the guarding) and it's not clear that it is. I think safest to stick to const Guard &getGuard() or so. |
clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp | ||
47 | This patch takes this function from 80->120 lines, and it's starting to feel unwieldy. I think moving building the UniqueNonterminals/SymbolIDs + the new maps into a function (which returns a struct of SymbolIds and AttrValueIDs) would make this more manageable. | |
51 | unused | |
52 | why ""? | |
60 | AttrValueIDs never seems to get populated, so I'm not sure this works. | |
104 | This doesn't seem worth diagnosing to me - it seems stylistically weird but not really a problem to put a guard wherever. | |
104 | maybe another function to pull out here? | |
227 | Again, I don't think we should be aiming to provide nice diagnostics for each possible way we could get this wrong - the grammar here is part of the pseudo-parser, not user input. | |
258 | (This seems like a condition we can diagnose when applying the attribute, instead of needing to do it eagerly here) | |
clang-tools-extra/pseudo/unittests/GLRTest.cpp | ||
161 ↗ | (On Diff #432538) | These empty streams aren't valid (in multiple ways: they're not finalized, and they don't contain the tokens that the nodes refer to). Maybe add a tokenStreamForTest(StringLiteral) function somewhere? Safe because guaranteed to be null-terminated and have sufficient lifetime... |
address the grammar part comments:
- Rename to ExtensionID;
- Simplify the BNF parsing bit;
I addressed comments on the grammar part, (the remaining GLR, pseudo_gen parts are not covered yet), I think it would be better them into different patches, so that we can land the grammar bit first, then start doing the error recovery and guard implementation in parallel.
clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h | ||
---|---|---|
102 | I suppose you mean we should drop the Guard parameter in the constructor, yeah, that sounds good to me, and simplifies the BNF parsing bit. | |
clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp | ||
52 | This was for the sentinel default 0 attribute value. | |
60 | oops, this part of code wasn't fully cleaned up. AttrValueIDs is not needed indeed. | |
104 | sounds good, move the extension bits to a dedicate applyExtension. | |
104 |
It is valid in syntax level, but not in semantic level, I think this is confusing -- the bnf parser doesn't emit any warning on this usage, just silently ignore them (grammar writer might think the grammar is correct, guard should work even putting it in the middle of the rules). On the other hand, as you mentioned, grammar file is not user-faced, and we're the current authors, it seems ok to not do it (in favour of simplicity) right now (we might need to reconsider improving it there are new grammar contributors). | |
227 | fair enough, but we should keep minimal critical diagnostics at least to avoid shooting ourself in the foot. |
Thanks, this looks better.
Sorry about confusion with the names - I think annotation is a great name, and we should only use "extension" for the narrower "opaque thing that an annotation value refers to".
Happy to chat about this offline if you like
clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h | ||
---|---|---|
22 | Hmm, I think we misunderstood each other... I think annotations is a good name for the syntactic [foo=bar] bit. I think the comment here is now harder to follow, and would prefer the old version. It's specifically "bar" which indicates some identifier whose semantics will be provided externally, so "bar" is a reference to an extension. If you want to mention the concept of an extension, then maybe at the end... | |
86 | "an extension uniquely identifies an extension" is a tautology. An extension is a piece of native code specific to a grammar that modifies the behavior of annotated rules. One ExtensionID is assigned for each unique attribute value (all attributes share a namespace). | |
88 | I don't think this last sentence is useful as-is, it looks important but it's not clear who it constrains. | |
213 | attribute values, or extension names (attributes are syntactic and have string values, extensions are semantic and can be referred to by names). | |
clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp | ||
105 | doing this outside the loop above means this function has to handle all specs/rules and work out how they correspond, which is fragile. Instead can we do it inside the loop, and just handle a single attribute at a time? for (const auto &Spec : Specs) { // ... create rule for (const auto &Attribute : Spec.Attributes) applyAttribute(Attribute, T->Rules.back()); } | |
184 | rather attributes :-( | |
243 | For simplicity we could also drop this loop. If we ever need to specify two attributes on the same token, a := b [foo] [bar] works fine | |
246 | If we drop this line, then [foo] will be equivalent to [foo=], i.e. attribute is present and points at the empty string. This seems reasonable to me, and is a good syntax for boolean attributes (where [foo] denotes set and no attribute denotes unset) | |
clang-tools-extra/pseudo/unittests/GrammarTest.cpp | ||
104 | if you like you could add 3 more rules, with no annotation, guard=override, guard=somethingelse |
- bring back the attribute concept, narrow down the ExtensionID scope (only used for semantic);
- loose and simplify the BNF annotations parsing; ([] only allows single attribute, attribute without value are acceptable);
- address other comments;
clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h | ||
---|---|---|
22 | oops, I misunderstood your "extensionID" comment completely (sorry). Bring it back now. | |
213 | renamed to attribute values. | |
clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp | ||
246 | This syntax for bool-type attributes seems good to me. |
clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h | ||
---|---|---|
89 | nit: I'd drop this sentence, as using it to index into a table is only ever used to produce debug representations I think - usually we use ExtensionID as an index into a map | |
215 | looking at this again, ExtensionNames seems clearer as ExtensionNames[ExtID] seems more obvious that the kinds agree. But up to you |
clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h | ||
---|---|---|
215 | I considered extensionNames, but decided to use AttributeValues. AttributeValues corresponds to the syntactic grammar [key=value] where we call value as the attribute value, while using ExtensionNames is less clearer. And yeah AttributeValues[ExtID] is somehow confusing, but I think it is fine, as it is just print code. |
clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp | ||
---|---|---|
234 | I get a warning/error on this line with this commit: 13:31:17 ../../clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp:235:36: error: missing field 'Attributes' initializer [-Werror,-Wmissing-field-initializers] 13:31:17 Out.Sequence.push_back({Chunk}); 13:31:17 ^ 13:31:17 1 error generated. I see the warning when compiling with clang 8.0. |
clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp | ||
---|---|---|
234 | sorry (I don't this warning enabled), should be fixed in 9ce232fba99c47c3246f06fcbe37c24b9d90585f. |
clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp | ||
---|---|---|
234 | Thanks! |
nit: first "guard" should be "key"?