This is an archive of the discontinued LLVM Phabricator instance.

[TableGen] Add new bang operator !format
Needs Review · Public

Authored by wangpc on Jul 27 2023, 6:44 AM.

Details

Summary

We can use the # operator to concatenate strings, but it can be
tedious, and it can't be used to concatenate code strings.

This patch adds a new bang operator, !format, for formatting strings
(its functionality is very limited for now).

!format("{0} {1} ...", arg0, arg1, ...)

Its first operand is a format string, followed by zero or more
operands to format. Currently, the only supported placeholder is {i},
which is replaced by the i-th format argument (counting from 0).
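To make the placeholder rule concrete, here is a minimal C++ sketch of positional substitution. formatPositional is a hypothetical helper, not the code from this patch. It assumes 0-based indices (as in the {0} {1} example above) and that anything which is not a valid, in-range {i} is copied through literally; the "{}" case matches what wangpc reports later in the thread, but the out-of-range behavior is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not the patch's code: expands "{i}" placeholders,
// where i is a 0-based index into Args. Anything that is not "{", one or
// more digits, "}" with an in-range index is copied through unchanged.
std::string formatPositional(const std::string &Fmt,
                             const std::vector<std::string> &Args) {
  std::string Out;
  std::size_t Pos = 0;
  while (Pos < Fmt.size()) {
    if (Fmt[Pos] == '{') {
      // Parse the digits after '{'. Stop growing Index once it is already
      // out of range, so very long digit runs cannot overflow.
      std::size_t End = Pos + 1;
      std::size_t Index = 0;
      while (End < Fmt.size() &&
             std::isdigit(static_cast<unsigned char>(Fmt[End]))) {
        if (Index <= Args.size())
          Index = Index * 10 + static_cast<std::size_t>(Fmt[End] - '0');
        ++End;
      }
      // A placeholder needs at least one digit and a closing '}'.
      if (End > Pos + 1 && End < Fmt.size() && Fmt[End] == '}' &&
          Index < Args.size()) {
        Out += Args[Index];
        Pos = End + 1;
        continue;
      }
    }
    // Not a valid placeholder: copy this character verbatim.
    Out += Fmt[Pos];
    ++Pos;
  }
  return Out;
}
```

With this sketch, formatPositional("{1} {} {0} {}", {"1", "2"}) yields "2 {} 1 {}", the same result wangpc describes for the patch.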

This bang operator can be used to format code strings, which can
eliminate a lot of duplicated code.

See D156432 for usages.

Diff Detail

Unit Tests: Failed

Event Timeline

wangpc created this revision. Jul 27 2023, 6:44 AM
Herald added a project: Restricted Project. Jul 27 2023, 6:44 AM
Herald added a subscriber: hiraditya.
wangpc requested review of this revision. Jul 27 2023, 6:44 AM
michaelmaitland added inline comments.
llvm/test/TableGen/format.td
24

Emit a warning? Probably an unintended usage of !format.

25

Emit a warning? Probably an unintended usage of !format.

67

Could be nice to add:

  • Format string but not enough arguments
  • Format string but too many arguments
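The two suggested test cases (not enough arguments, too many arguments) both come down to comparing the placeholder indices against the argument count. Here is a hypothetical C++ sketch of such a check, assuming the {i}-only syntax from the summary; checkFormatArgs and FormatArgReport are illustrative names, not part of the patch, and whether the patch should warn or error is left open in the thread.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical diagnostic result for a format string and its arguments.
struct FormatArgReport {
  std::vector<std::string> Missing; // placeholders with no argument, e.g. "{2}"
  std::vector<std::size_t> Unused;  // argument indices never referenced
};

// Scans Fmt for "{digits}" placeholders and reports indices that have no
// matching argument ("not enough arguments") and arguments that no
// placeholder refers to ("too many arguments").
FormatArgReport checkFormatArgs(const std::string &Fmt, std::size_t NumArgs) {
  FormatArgReport Report;
  std::vector<bool> Used(NumArgs, false);
  std::size_t Pos = 0;
  while (Pos < Fmt.size()) {
    if (Fmt[Pos] != '{') {
      ++Pos;
      continue;
    }
    std::size_t End = Pos + 1;
    std::size_t Index = 0;
    while (End < Fmt.size() &&
           std::isdigit(static_cast<unsigned char>(Fmt[End]))) {
      // Cap growth: any value above NumArgs is out of range anyway.
      if (Index <= NumArgs)
        Index = Index * 10 + static_cast<std::size_t>(Fmt[End] - '0');
      ++End;
    }
    if (End > Pos + 1 && End < Fmt.size() && Fmt[End] == '}') {
      if (Index < NumArgs)
        Used[Index] = true;
      else
        Report.Missing.push_back(Fmt.substr(Pos, End - Pos + 1));
      Pos = End + 1;
    } else {
      ++Pos; // not a "{digits}" placeholder; skip the '{'
    }
  }
  for (std::size_t I = 0; I < NumArgs; ++I)
    if (!Used[I])
      Report.Unused.push_back(I);
  return Report;
}
```

For example, checkFormatArgs("{0} {2}", 2) reports "{2}" as missing and argument 1 as unused.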
simon_tatham added inline comments. Jul 27 2023, 7:54 AM
llvm/lib/TableGen/Record.cpp
2091–2093

I think it would at least be a good idea to add an escaping system, so that there's no literal text that can't be expressed at all.

Surely the right escaping system is the one that (as far as I can see) Python and std::format already agree on: {{ means a literal {, and }} means a literal }. In particular, {{{1}}} ought to expand to a literal {, then parameter 1, then a literal }.

If you do that, then the syntax has room for future enhancements (by putting more elaborate format specifiers inside the braces), but any format string that starts off valid will continue to be valid and do the same thing.
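A minimal C++ sketch of the proposed escaping rule, assuming the same 0-based {i} placeholders as the summary. formatWithEscapes is a hypothetical name, not code from the patch. Python's str.format and std::format reject an unmatched single brace; for simplicity this sketch just copies one through, which a real implementation would probably want to diagnose instead.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of the proposed escaping rule (the convention Python's
// str.format and std::format share): "{{" emits '{', "}}" emits '}', and
// "{N}" emits the N-th (0-based) argument. Any other brace is copied through.
std::string formatWithEscapes(const std::string &Fmt,
                              const std::vector<std::string> &Args) {
  std::string Out;
  std::size_t Pos = 0;
  while (Pos < Fmt.size()) {
    char C = Fmt[Pos];
    // A doubled brace is an escape for a single literal brace.
    if ((C == '{' || C == '}') && Pos + 1 < Fmt.size() && Fmt[Pos + 1] == C) {
      Out += C;
      Pos += 2;
      continue;
    }
    if (C == '{') {
      std::size_t End = Pos + 1;
      std::size_t Index = 0;
      while (End < Fmt.size() &&
             std::isdigit(static_cast<unsigned char>(Fmt[End]))) {
        // Stop growing once out of range, so long digit runs cannot overflow.
        if (Index <= Args.size())
          Index = Index * 10 + static_cast<std::size_t>(Fmt[End] - '0');
        ++End;
      }
      if (End > Pos + 1 && End < Fmt.size() && Fmt[End] == '}' &&
          Index < Args.size()) {
        Out += Args[Index];
        Pos = End + 1;
        continue;
      }
    }
    Out += C;
    ++Pos;
  }
  return Out;
}
```

Scanning left to right and consuming escapes greedily is what makes {{{1}}} come out as a literal {, then parameter 1, then a literal }, while {{1}} produces the literal text {1}.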

Thank you for implementing this. I have added just a couple of comments.

Francesco

llvm/docs/TableGen/ProgRef.rst
1731

This is really nitpicking. You allow (and I think it is correct) invocations without *args*:

string formatNoArg       = !format("");

... so maybe replace this with the suggestion? This is how it is done for printf-like variadic functions: https://en.cppreference.com/w/c/io/fprintf

llvm/include/llvm/TableGen/Record.h
1275–1276

Do we need this API? It doesn't seem to be used anywhere... (am I missing something?)

llvm/lib/TableGen/Record.cpp
1618

Found sounds like a boolean to me. How about Pos?

2090

Avoid auto here; the type isn't obvious from the RHS.

2135

Same here: the type isn't obvious from the call to args(), so avoid auto.

llvm/test/TableGen/format.td
28

Nit: can we test these with values that differ from the placeholder numbers? Maybe with replacements like the one suggested.

The documentation needs to be expanded to describe how each argument type is handled.

Would it be better to use a specific character as the escape character, rather than doubling the braces? For example, `{ and `}

That's probably how I'd do it if I were designing a formatting/interpolation syntax from scratch. But in the case where two major languages already agree on a syntax looking very like the one we're adopting here, I'd say that a more important concern is to avoid being confusingly similar-but-incompatible.

But in the case of Python, there is also a new string prefix. I don't know about other languages. Anyhoo, as long as we're convinced we won't eventually be doubling a third and fourth character (incompatibly), then I suppose it's okay. I like the extensible solution of a quoting character.

"new string prefix": if you're talking about Python f-strings, then the brace-based formatting syntax predates those by a long way. You could write "{0} {1}".format(a, b) long before you could abbreviate it to f"{a} {b}". But both of those syntaxes agree on writing a literal { or } by doubling it.

(And yes, I think the idea is that only the braces will ever need to be escaped in literal text, because everything else that's interesting in one of these format strings happens inside the braces, where different rules apply.)

Yes, of course, the formatting syntax was invented before the f-strings. {{ and }} it is, then.

michaelmaitland added inline comments. Aug 1 2023, 8:13 AM
llvm/test/TableGen/format.td
23

What happens for something like:

!format("{}") or !format("{1} {} {0} {}", 1, 2);

wangpc marked 5 inline comments as done. Aug 1 2023, 8:42 AM
wangpc added inline comments.
llvm/test/TableGen/format.td
23

!format("{}") -> "{}"
!format("{1} {} {0} {}", 1, 2) -> "2 {} 1 {}"

Currently, I haven't implemented the {} format specifier; the only supported one is {i}.
The implementation in this patch is very simple. I hope we can delegate formatting to std::vformat someday, but std::vformat requires C++20.

michaelmaitland added inline comments. Aug 1 2023, 8:51 AM
llvm/test/TableGen/format.td
23

Can we add a test case for now so it is clear how this behaves? I am fine with delaying the change of how these act to be closer to std::vformat in the future.

tra added a comment. Aug 7 2023, 12:33 PM

Do we really need it at all?

Are there other use cases for it besides !dump() in D156420 ?

Can we just implement some sort of !repr(value) to give us a textual representation of the value, and then just do "some string" # !repr(some_value) # "some other string"? That would be sufficient for debugging purposes.

Here is a proposed use: https://reviews.llvm.org/D156432

tra added a comment. Aug 7 2023, 1:12 PM

I think we're overcomplicating things a bit too much here.
Specifically, this part: "fmt can be a code fragment." If you want to eval some code, that should probably be done with a lambda in D148915, though I'm not sure whether we can define and invoke it the way we could in C++ with auto value = [](){ return "foo"; }();
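For reference, the C++ idiom being alluded to is an immediately invoked lambda: the lambda is defined and called in a single expression, and its result initializes the variable. This is plain C++ only; whether the TableGen lambdas proposed in D148915 would allow the same pattern is the open question raised here.

```cpp
#include <string>

// Immediately invoked lambda: the trailing "()" calls the lambda right away,
// so "value" is initialized with its return value. Returning std::string
// (rather than the bare literal, which would deduce const char *) keeps the
// result an owned string.
const std::string value = []() { return std::string("foo"); }();
```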

When functions/lambdas are available, they would provide a better way to implement D156432 examples, IMO. I suspect it would obviate the need for !format.

The purpose of this patch, as @wangpc states, is to make it possible to concatenate code strings. IMO we should go with the simplest and most straightforward approach. Maybe we should prefer lambdas/functions over !format, since format-string semantics are particularly complex and non-standard when we look across languages (e.g. escaping, named args, positional args, formatters), while lambdas and functions are relatively more universal.