This is an archive of the discontinued LLVM Phabricator instance.

[NFC] Make format() more amenable to format attributes

Authored by fcloutier on Aug 22 2022, 2:43 PM.



llvm/Support/Format.h provides format(), a variadic template function that accepts a printf-style format string and the corresponding fixed format arguments. While this function does forward its arguments to snprintf, which is type-checked thanks to the gnu::format attribute on platforms where that attribute is supported, it cannot itself have that attribute because GCC does not allow the format attribute on non-variadic functions (variadic template functions not excepted).

Regardless, we can change the implementation of format() so that it would be more amenable to this attribute. Recently, clang did gain support for gnu::format on non-variadic functions. We still can't use it because GCC would reject the code. However, this does allow vendors who never build LLVM with GCC themselves to maintain a patch which has that attribute enabled, catching a few errors in the process. The changes to the format_object class allow us to attribute its constructor with gnu::format (which, in turns, allows us to attribute format() with gnu::format as well).

In addition, we create a new function, format_chk(Expected, Fmt, T...), for the case where Fmt isn't known at compile-time. At runtime, Fmt is verified to have specifiers which are compatible with Expected's.

There will be another PR fixing any eventual format specifier mistakes found. In warning flags parlance, this would allow LLVM to build with -Wformat -Wformat-nonliteral.

Diff Detail

Event Timeline

fcloutier created this revision.Aug 22 2022, 2:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2022, 2:43 PM
fcloutier requested review of this revision.Aug 22 2022, 2:43 PM

If you write format_chk("<fallback: %u>", Format.c_str(), (unsigned char)(From)); in your code and at runtime the format string passed to the function is "%hu" (which means an argument of type short is expected), EnsureCompatible will just accept it. Is that correct?

Since clang emits a warning when compiling printf("%hu", (char)c); with -Wformat-type-confusion, it seems that you may want EnsureCompatible to reject it at runtime if you want to be strict and follow what clang is doing.

What are the rules for accepting or rejecting a format string at runtime?

Clang is being more zealous than the standard by warning you about this because integer-type variadic arguments undergo integer promotion ( in n2310 numbering). It is effectively impossible to pass an integer smaller than int to a variadic function. This means %hu and %hhu are constraints that the format function needs to read an unsigned short or unsigned char from the va_list, but anything smaller than unsigned will be converted to unsigned as it is being passed to the variadic function, and va_arg(ap, unsigned char) is necessarily equivalent to (unsigned char)va_arg(ap, unsigned). This is reiterated and explicitly spelled out in, which describes format strings:

hh: Specifies that a following d, I, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument.

This is the rationale for which EnsureCompatible lets you "confuse" any integer size up to int. The bar chosen is whether it would be undefined behavior to proceed, and in this case it isn't.

ahatanak added inline comments.Sep 8 2022, 6:45 PM

I don't think it's clear to the users of this function what they should pass for Expected. Can you leave a comment that explains what this function expects or in which cases Expected and Fmt would be incompatible?


Can you leave comments that explain what these enums mean?


I think this rejects the combination of length modifier l and floating point conversions (e.g., %lf), but the standard doesn't say that's undefined behavior. It just says it has no effect.


Can you use StringRef::find or some other functions that is available instead of writing a loop here?


Flags are completely ignored here, but isn't the behavior undefined for some combinations of flags and conversions according to the standard (e.g., "%0p")? If it is, can you leave a comment that notes that this code ignores undefined behavior? Or can you leave a FIXME note if you plan to fix this in the future?


Don't you have to check that the lengths of both lists are equal? What happens if Expected is %d%d and Fmt is %d?


I think it's better to immediately fail here if the format string passed at runtime is invalid and inform the user of the location of the invalid string by calling a function in Error.h (e.g., PrintFatalError).

fcloutier updated this revision to Diff 459230.Sep 9 2022, 4:48 PM
fcloutier marked 7 inline comments as done.

Address @ahatanak's comments.


Since there was just one use of this function and it was preferable to check a different way, I removed it entirely.


I started off using StringRef and found it really cumbersome compared to using a raw const char *. In retrospective, it's not surprising given that printf was meant for C first. I removed next and replaced its only use with strchr to avoid the loop.


I implemented the missing checks.


It's not UB to call va_end before you've read each argument of a va_list, so the one thing that matters is that Fmt doesn't have more specifiers than Expected. This is ensured by breaking out of the loop when Fmt runs out of specifiers, regardless of whether Expected still has more. If we run out of Expected specifiers while Fmt still has specifiers, we error out.

ahatanak added inline comments.Oct 24 2022, 12:18 PM

Can you use char as the underlying type for enums?


It's not UB, but why would a user want to pass a different number of specifiers? Isn't that an indication of some kind of error?

fcloutier updated this revision to Diff 470300.Oct 24 2022, 3:04 PM
fcloutier marked an inline comment as done.
ahatanak accepted this revision.Oct 24 2022, 9:40 PM

LGTM with minor comments.


Function names should start with a lower case letter.

Please fix this and other functions in this patch that start with an upper case letter.


The coding standard recommends using preincrement. A few other places are using postincrement too.

This revision is now accepted and ready to land.Oct 24 2022, 9:40 PM
fcloutier updated this revision to Diff 470625.Oct 25 2022, 3:21 PM
fcloutier marked an inline comment as done.

Addressing final comments, running diff again for automated testing.

Failing tests are from clangd and WebAssembly backend, which are not touched by this change.


There's just one use of it, so I'll change it to require the two sequences to have the same length.

This revision was landed with ongoing or failed builds.Oct 26 2022, 12:12 PM
This revision was automatically updated to reflect the committed changes.
beanz added a subscriber: beanz.Oct 26 2022, 12:26 PM

Failing tests are from clangd and WebAssembly backend, which are not touched by this change.

You sure? I'm seeing failures on main now in the Support format tests on ubuntu.

That's what it complained about:

But I received the mail for Ubuntu and am on it. Sorry about that.

thakis added a subscriber: thakis.Oct 26 2022, 12:51 PM

This is breaking check-llvm:

Please take a look and revert for now if it takes a while to fix.

I will revert now and continue the investigation since it has been about 15 minutes.