This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Add `KeywordOrString` handling to AsmParser
ClosedPublic

Authored by youngar on Oct 12 2021, 3:06 PM.

Details

Summary

This adds a new parser and printer for text which may be a keyword or a
string. When printing, it will attempt to print the text as a keyword,
but if it has any special or non-printable characters, it will be
printed as an escaped string. When parsing, it will parse either a
valid keyword or a potentially escaped string. The printer allows for an
empty string, in which case it prints "".

This new function is used for printing the name in NamedAttributes, and
for printing the symbol name after the @. In CIRCT we are using this
to print module port names, which are conceptually similar to named
function arguments.

Diff Detail

Event Timeline

youngar created this revision.Oct 12 2021, 3:06 PM
youngar requested review of this revision.Oct 12 2021, 3:06 PM
youngar updated this revision to Diff 379202.Oct 12 2021, 3:16 PM

Fix description comment

youngar updated this revision to Diff 379203.Oct 12 2021, 3:19 PM

fix description comment again

This was tested using CIRCT with this patch.
Any suggestions on how to add tests for this?

I think this is something that was intentionally excluded from the parser/printer to not have arbitrary form (I remember trying to add this a while back...).

Can you elaborate a bit on your use case? The CIRCT patch you're linking does not have tests or examples I believe.

Also you wrote that "In CIRCT we are using this to print module port names, which are conceptually similar to named function arguments", but if you want to print named SSA value you can already do it without this patch.

This was tested using CIRCT with this patch.
Any suggestions on how to add tests for this?

Yes, we have a TestDialect where you can implement custom printer/parser for ops.

youngar added a comment.EditedOct 12 2021, 4:55 PM

Hi Mehdi

I think this is something that was intentionally excluded from the parser/printer to not have arbitrary form (I remember trying to add this a while back...).

Can you elaborate a bit on your use case?

Yeah. It makes more sense when you see an external module, where the port names are a part of the signature of the module:

firrtl.extmodule @Foo(in arg0: !firrtl.uint<2>)

In this example we are printing arg0 using the new printIdentifier function. I don't think it makes much sense to print it as %arg0 when there is no block or value. We could print it as "arg0" but I thought the extra quotes were not helpful. For regular modules we are printing port names as SSA values:

firrtl.module @Foo(in %arg0: !firrtl.uint<2>) {
  ...
}

The CIRCT patch you're linking does not have tests or examples I believe.

Since i'm upstreaming some code from CIRCT to MLIR, the examples are actually in an previous commit which I should have probably linked. You can see some examples in this open PR.

This patch LGTM, it is extending an existing pattern in the AsmParser and it causes boilerplate in other places. I'd love River's thoughts.

think this is something that was intentionally excluded from the parser/printer to not have arbitrary form (I remember trying to add this a while back...).

I don't think this exact one was excluded because of that, but I agree we decided we didn't want to have arbitrary form. That said, I think we should re-evaluate the premise there: LOTS of things (e.g. the eliding of dialect prefixes etc) have pushed back against this, and I think it makes sense to just concede to reality that dialect authors want to do weird things, because they spend LOTS of time looking at debug dumps etc. I think that we could have a much better design, e.g. have an enum for symbols instead of parseQuestion/parseOptionalQuestion pairs for everything etc. In any case, that is a different discussion.

Also you wrote that "In CIRCT we are using this to print module port names, which are conceptually similar to named function arguments", but if you want to print named SSA value you can already do it without this patch.

Yeah, as Andrew says, these aren't SSA names. The similar thing in MLIR is in printNamedAttribute which escapes the attribute name with quotes in the (really rare) case when it isn't a valid identifier according to the MLIR lexer.

mlir/include/mlir/IR/OpImplementation.h
697

typo "dentifier"

mlir/lib/IR/AsmPrinter.cpp
1563

Please add a doxygen comment, and please change printSymbolReference and AsmPrinter::Impl::printNamedAttribute to call this.

youngar updated this revision to Diff 379241.Oct 12 2021, 6:13 PM

printSymbolReference and printNamedAttribute now call printIdentifier. isBareIdentifier no longer asserts on empty string and will return false.

youngar marked 2 inline comments as done.Oct 12 2021, 6:14 PM
rriddle requested changes to this revision.Oct 13 2021, 12:26 AM

I think the concept is fine (as it is effectively just a combination of components we already support; i.e. keywords and strings), but that being said the name is very confusing. This isn't an identifier (in the parsing sense), as it could be a string (when not an identifier?). Also, how does this interact with the Keyword methods?

This revision now requires changes to proceed.Oct 13 2021, 12:26 AM

I agree, and I don't think that terseness is important here. How about "PotentiallyQuotedKeyword" or something explicit like that?

parsePotentiallyQuotedKeyword has the same sort of problem, where a keyword can't (by the tokenizer's definition) be quoted, but I like still like it better than the first suggestion. The most explicit thing I can think of is parseKeywordOrString, but the symmetrical API printKeywordOrString is a little weird (e.g. is it obvious that it prints a string when the text is not a valid keyword?).

I think "KeywordOrString" is close enough, despite the ambiguity on the print side of things.

youngar updated this revision to Diff 379802.Oct 14 2021, 12:21 PM
youngar retitled this revision from [MLIR] Add generic identifier parser to AsmParser to [MLIR] Add `KeywordOrString` handling to AsmParser.
youngar edited the summary of this revision. (Show Details)

identifier -> keywordOrString

rriddle accepted this revision.Oct 14 2021, 3:54 PM
rriddle added inline comments.
mlir/include/mlir/IR/OpImplementation.h
469–478

I would have the default methods take a std::string * parameter, and potentially add additional overloads that construct a StringAttr. This is the base AsmParser class, which is also used by attributes/types and those generally don't want to go through StringAttr.

This revision is now accepted and ready to land.Oct 14 2021, 3:54 PM
youngar updated this revision to Diff 379922.Oct 14 2021, 11:20 PM

Parse std::strings instead of StringAttrs. No overloads for parsing
attributes were added.

youngar marked an inline comment as done.Oct 14 2021, 11:40 PM
This revision was automatically updated to reflect the committed changes.

nice, thanks Andrew