This is an archive of the discontinued LLVM Phabricator instance.

Allow non-variadic functions to be attributed with `__attribute__((format))`
ClosedPublic

Authored by fcloutier on Oct 26 2021, 3:35 PM.

Details

Summary

Clang only allows you to use __attribute__((format)) on variadic functions. There are legit use cases for __attribute__((format)) on non-variadic functions, such as:

(1) variadic templates

template<typename… Args>
void print(const char *fmt, Args… &&args) __attribute__((format(1, 2))); // error: format attribute requires variadic function

(2) functions which take fixed arguments and a custom format:

void print_number_string(const char *fmt, unsigned number, const char *string) __attribute__((format(1, 2)));
// ^error: format attribute requires variadic function

void foo(void) {
    print_number_string(“%08x %s\n”, 0xdeadbeef, “hello”);
    print_number_string(“%d %s”, 0xcafebabe, “bar”);
}

This change allows Clang users to attach __attribute__((format)) to non-variadic functions, including functions with C++ variadic templates. It replaces the error with a GCC compatibility warning and improves the type checker to ensure that received arrays are treated like pointers (this is a possibility in C++ since references to template types can bind to arrays).

Diff Detail

Event Timeline

fcloutier created this revision.Oct 26 2021, 3:35 PM
fcloutier requested review of this revision.Oct 26 2021, 3:35 PM

Thank you for this! I have some concerns that I'd like to talk out to hopefully make myself feel more comfortable with this.

My understanding of the utility from -Wformat warnings comes from the fact that the function receiving the format string has no way to validate that the variadic arguments passed have the correct types when compared to the format string. However, in both of the new cases you propose to support, that type information is present in the function signature (you may have to go through a bunch of template instantiations in the variadic template cases, but you get there eventually). Having the types in the signatures changes something I think might be pretty fundamental to the way the format string checker works -- it tries to figure out types *after default argument promotions* because those promotions are a lossy operation. However, when the types are specified in the signature, those default argument promotions no longer happen. The type passed for a %f may actually be a float rather than promoted to a double, the type for a char may actually be char rather than int, etc. I'm not convinced this is as simple as "now we allow non-vararg functions" for the implementation. I think we need some more extensive testing to prove that the diagnostics actually make sense or not.

In terms of the specific cases allowed, I think I am happier about variadic templates than I am about fixed-signature functions. For the variadic template, I think supporting -Wformat would mean users don't have to reimplement similar logic to what that check already handles even though they can do it themselves. But for a fixed-signature function, I'm not certain I see what the value add is -- the only valid format strings such a function could accept would have to be fixed to the function signature anyway. Can you explain what use cases you have for that situation?

Thanks for looking, Aaron. You're right that the main utility of the aggregation of format warnings is to extend C's type checking because there is no other good way, or good place, to do it. I have built hundreds of millions of shipping lines of C, C++ and Objective-C, and this change seems like it would be an effective fix in several places where we don't currently have anywhere else to go.

For variadic templates, you're right that at some final instantiation, the compiler will have all the format argument types in hand. What gets lost in piping and substitution is actually the format string that Clang must type-check against. You can see this in action in the LLVM code base, actually: if we turn on -Wformat-nonliteral for files including llvm/include/Support/Format.h, the warning will trigger for users of the llvm::format function. This is because the format string is stored in format_object_base::Fmt instead of being directly forwarded, and sprinkling constexpr in strategic places won't resolve this issue because SemaChecking has its own custom expression evaluator. Swapping it out for the more common ExprConstant stuff is probably not impossible, but it's difficult because SemaChecking supports its own kind of symbolism. For instance, it's OK to use a format string parameter as the format string argument of another function that wants one, and other attributes like format_arg can tell you to assume the same specifiers as you get from another expression. So, we could fix this, but it would be more work, and the same purpose is served by allowing fixed arguments to participate in the format attribute checking. Verifying that users of the LLVM llvm::format function are doing the right thing is a better experience than letting this bubble up from however many levels of template instantiation there is.

Type checking may well actually need to be tested better for the case where variadic argument promotions don't happen, but there's a fairly finite number of ways it could go wrong, and I still think that's the easier way to go.

I don't think that we can easily distinguish between parameters created by variadic templates and fixed parameters from a function without variadic templates. I also think that most people who write functions like llvm::format aggressively do not want to be in charge of type-checking the format string themselves, and would much rather defer to Clang's existing type checker. If Clang allows the format attribute to type-check parameters created by variadic templates, it follows that the path of least resistance is to allow it on functions with fixed arguments.

With that said, there are still use cases for allowing format strings in functions with fixed arguments. (Interestingly, it would not be possible to mix fixed format arguments with variadic format arguments in a C function, so maybe we can prevent that altogether, but it does not remove from the general usefulness of the feature). You'll have to take my word for this, but it's one of the handful of blockers that we have for very broad adoption of -Wformat-nonliteral. The story usually goes like this: there's some function NSString *foo(NSString *fmt, NSMyFoo *obj1, NSMyBar *obj2) that needs to do something with obj1 and obj2 before it decides whether it actually wants to print them, return something unrelated altogether, throw an exception, etc. General limitations of the C language make it difficult to untangle the print parts from the logic parts for reasons which, to me, align rather closely to compiler capriciousness given how close it is to supporting the feature. This is more common in Objective-C code bases as %@ is a universal format specifier for all object types.

Another way we could approach this problem is to have a new attribute to say that a format argument must have a string that conforms to another constant format string. For instance, I think that NSString *foo(NSString *fmt, NSMyFoo *obj1, NSMyBar *obj2) __attribute__((format_like(1, @"hello %@ %@!"))), by the compiler checks that fmt has specifiers equivalent to %@ %@, would give us roughly the same safety improvements. It would allow for even more fun things, like passing a format string to a function that will then itself supply arguments to a formatting function.

What do you think?

[...] Having the types in the signatures changes something I think might be pretty fundamental to the way the format string checker works -- it tries to figure out types *after default argument promotions* because those promotions are a lossy operation. However, when the types are specified in the signature, those default argument promotions no longer happen. The type passed for a %f may actually be a float rather than promoted to a double, the type for a char may actually be char rather than int, etc.

True, and I think you're right that this change needs some really exhaustive tests. First, notice that the point of format(printf) is still ultimately to pass the arguments to some printf-like varargs function that will do the default argument promotions. So if our argument type is float, well, in the non-pathological cases, eventually it should become a double (when it is finally passed to printf, possibly several levels deeper in the function call stack). But second, consider situations like

void myprintf(const char *fmt, int n) __attribute__((format(printf, 1, 2)));  // N.B.: int, not unsigned long
int main() {
    myprintf("%lu", 3uL);  // this should error
    myprintf("%d", 3uL);  // this should not error
}

In terms of the specific cases allowed, I think I am happier about variadic templates than I am about fixed-signature functions. [...] For a fixed-signature function, I'm not certain I see what the value add is -- the only valid format strings such a function could accept would have to be fixed to the function signature anyway.

Or, in some situations (like logging), the format string could be some weird amalgam of the signature and something else, right?

void mydebug(const char *fmt, int line) { printf(fmt, "test.cpp", line); }
int main() {
    mydebug("%s:%d", 42);  // printf("%s:%d", 42) would not be correct; but the way mydebug ultimately uses printf, this is actually safe and intentional
}

When we receive a va_list or a varargs ..., this scenario doesn't arise (AFAIK), because there is no way to manipulate or insert-more-arguments-into a va_list after the fact.

Initially I thought this PR was a slam-dunk, but thinking about all the pathological-but-possible corner cases here does make me more skeptical now.

Thanks Arthur for your feedback.

void myprintf(const char *fmt, int n) __attribute__((format(printf, 1, 2)));  // N.B.: int, not unsigned long
int main() {
    myprintf("%lu", 3uL);  // this should error
    myprintf("%d", 3uL);  // this should not error
}

This is handled naturally by the current implementation. The integer literal undergoes an implicit cast to int because that's the type of the n parameter, and it causes the %lu case to fail and the %d case to succeed.

Your second example is the scenario of concern for adding an attribute like the format_like attribute that I described in my response to Aaron. I think that these two features don't need to be tied together.

Thanks for looking, Aaron.

Thank you for the detailed response!

Type checking may well actually need to be tested better for the case where variadic argument promotions don't happen, but there's a fairly finite number of ways it could go wrong, and I still think that's the easier way to go.

I think this is the current thing for us to test and try out. It shouldn't be impossible to do somewhat exhaustive testing (there's only so many format specifiers and length modifiers to worry about), but it's those edge cases that have me worried.

I don't think that we can easily distinguish between parameters created by variadic templates and fixed parameters from a function without variadic templates.

Agreed. This should be checking the instantiations, so by that point, the variadic template is really more like a fixed parameter list anyway.

I also think that most people who write functions like llvm::format aggressively do not want to be in charge of type-checking the format string themselves, and would much rather defer to Clang's existing type checker. If Clang allows the format attribute to type-check parameters created by variadic templates, it follows that the path of least resistance is to allow it on functions with fixed arguments.

Also agreed.

With that said, there are still use cases for allowing format strings in functions with fixed arguments. (Interestingly, it would not be possible to mix fixed format arguments with variadic format arguments in a C function, so maybe we can prevent that altogether, but it does not remove from the general usefulness of the feature). You'll have to take my word for this, but it's one of the handful of blockers that we have for very broad adoption of -Wformat-nonliteral. The story usually goes like this: there's some function NSString *foo(NSString *fmt, NSMyFoo *obj1, NSMyBar *obj2) that needs to do something with obj1 and obj2 before it decides whether it actually wants to print them, return something unrelated altogether, throw an exception, etc. General limitations of the C language make it difficult to untangle the print parts from the logic parts for reasons which, to me, align rather closely to compiler capriciousness given how close it is to supporting the feature. This is more common in Objective-C code bases as %@ is a universal format specifier for all object types.

Another way we could approach this problem is to have a new attribute to say that a format argument must have a string that conforms to another constant format string. For instance, I think that NSString *foo(NSString *fmt, NSMyFoo *obj1, NSMyBar *obj2) __attribute__((format_like(1, @"hello %@ %@!"))), by the compiler checks that fmt has specifiers equivalent to %@ %@, would give us roughly the same safety improvements. It would allow for even more fun things, like passing a format string to a function that will then itself supply arguments to a formatting function.

What do you think?

I think the second option may be an idea worth exploring, but if we can avoid adding another attribute, that's usually preferred. I think that if we add sufficient test coverage, it'd make sense to use the existing attribute. As Arthur pointed out, ultimately this stuff is expected to be passed to printf (et al) and so long as the attribute continues to honor pointing out issues with that goal, I think it's a reasonable one to use in these situations. I think we only need to consider a secondary attribute if we find that the semantics we need are sufficiently different to warrant it.

Agreed. This should be checking the instantiations, so by that point, the variadic template is really more like a fixed parameter list anyway.

FWIW, in my own mental model, there's a big semantic difference between (varargs functions, variadic templates) on the one hand and (non-template functions) on the other. In my experience, there's nothing you can do with a varargs ellipsis except forward it along as a va_list; and it's uncommon to do anything with a variadic parameter pack except forward it along via std::forward; but messing with fixed arguments is quite common. Technically, you can mess with a parameter pack too:

void apple(const char *fmt, int x, int y) __attribute__((format(printf, 1, 2))) {
    printf(fmt, double(x), double(y));
}
void banana(const char *fmt, auto... args) __attribute__((format(printf, 1, 2))) {
    printf(fmt, double(args)...);
}
int main() {
    apple("%g %g", 17, 42);  // well-defined; shall we warn anyway? (My gut feeling is that this is relatively common)
    banana("%g %g", 17, 42);  // well-defined; shall we warn anyway? (My gut feeling is that this is extremely rare)
}

This morning I am leaning in favor of this PR as written. If a programmer wants apple/banana to be callable like that, without any diagnostics, then their appropriate course of action is simply not to apply the __attribute__((format(printf, 1, 2))) annotation.

clang/include/clang/Basic/DiagnosticSemaKinds.td
4129

I'd say with the %0 attribute (add "the")

clang/lib/AST/FormatString.cpp
324–328

Also, function references will decay to function pointers. I have no idea if you need to do anything special here to get the "right behavior" for function references. But please add a (compile-only?) test case for the function-pointer codepath, just to prove it doesn't crash or anything.

clang/test/Sema/attr-format.cpp
5–7 ↗(On Diff #382473)

Can we also do an example of a member function variadic template?

struct S {
  template<class... Args>
  void format(const char *fmt, Args&&... args)
    __attribute__((format(printf, 2, 3)));
};

Also, I believe that this entire file should be removed from Sema/ and combined into SemaCXX/attr-format.cpp.

I also notice that we have literally zero test coverage for cases where the format string is not the first argument to the function; but that can-and-should be addressed in a separate PR.

14 ↗(On Diff #382473)

Basically, add , do_format here.
(Aside: I'm surprised that Clang quietly lets you print a function pointer with %p. It'll work on sane architectures, but in general C and C++ don't require that function pointers even be the same size as void* — technically this should be UB or at least impl-defined behavior.)

fcloutier updated this revision to Diff 435302.Jun 8 2022, 12:47 PM

Apologies the long delay: things happened and I was pulled away. I have some time to finish this change now. I recommend re-reading the discussion up to now since it's not _that_ long and it provides a lot of very useful context.

The new change addresses requests from the previous round. The most substantial changes are around how Clang detects that a format string is being forwarded to another format function. This is now expressed in terms of transitions from format argument passing styles, such that given the following 3 function archetypes:

c
void fixed(const char *, int) __attribute__((format(printf, 1, 2)));
void variadic(const char *, ...) __attribute__((format(printf, 1, 2)));
void valist(const char *, va_list) __attribute__((format(printf, 1, 0)));

there are no warnings for:

  • a variadic function forwarding its format to a valist function
  • a valist function forwarding its format to another valist function
  • a fixed function forwarding its format to another fixed function (new)
  • a fixed function forwarding its format to a variadic function (new)

In other words, for instance, fixed can call variadic in its implementation without a warning. Anything else, like forwarding the format of a valist function to a fixed function, is a diagnostic.

fixed to fixed/variadic transitions don't check that arguments have compatible types, but it conceivably could. This is a limitation of the current implementation. However, at this point, we don't think that this is a very worthwhile effort; this could change in the future if adoption of the format attribute on functions with a fixed signature ramps up.

I also added a number of tests to make sure that we still have reasonable warnings. One interesting edge case when using __attribute__((format)) on functions with fixed parameters is that it's possible to come up with combinations that are impossible, for instance:

c
struct nontrivial { nontrivial(); ~nontrivial(); };
void foo(const char *, nontrivial) __attribute__((format(printf, 1, 2)));

It's not a diagnostic to declare this function, however it is always a diagnostic to call it because no printf format specifier can format a nontrivial object. Ideally there would be a diagnostic on the declaration, but I think that it's sufficient as it is.

Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2022, 12:47 PM

Would it be better if I asked a colleague to finish the review?

Would it be better if I asked a colleague to finish the review?

Typically, you should try to get a LG from the reviewers who have been active on the review in the past (assuming they're still active in the community now). So no -- It just takes a while because there's a lot of review work to be done and only so many hours in the day; sorry for the delays!

I think there are some missing changes to AttrDocs.td for the new functionality, and this should have a release note as well.

clang/include/clang/Basic/DiagnosticSemaKinds.td
4126

Slight tweaks: GCC requires a function with the 'format' attribute to be variadic

clang/lib/AST/FormatString.cpp
327

I think this should be:

if (argTy->canDecayToPointerType())
  argTy = C.getDecayedType(argTy);
clang/lib/Sema/SemaChecking.cpp
5434–5440

Elide braces here (coding style rule).

8622–8623

Can use const auto * in these cases.

8626

Same here.

8630–8631

A better way to write this would be:

if (const auto *FnTy = D->getType()->getAs<FunctionProtoType>())
  IsVariadic = FnTy->isVariadic();
...
10027

Same suggestion here as above to use canDecayToPointerType() instead.

clang/lib/Sema/SemaDeclAttr.cpp
3881–3885

There's some braces you can elide here now.

fcloutier updated this revision to Diff 441703.Jul 1 2022, 8:47 AM
fcloutier set the repository for this revision to rG LLVM Github Monorepo.

Thanks, Aaron. I wasn't sure how to follow up given how long it had been since the review started. I understand that we're all busy (which explains the week delay on my part here as well).

I've addressed all of your comments except the one on this bit:

if (const FunctionType *FnTy = D->getFunctionType())
  IsVariadic = cast<FunctionProtoType>(FnTy)->isVariadic();

The proposed change isn't identical because D->getFunctionType() can return nullptr (for instance, if D is a BlockDecl). However, in the case FnTy isn't nullptr, then it is guaranteed to be a FunctionProtoType as the attribute is rejected on functions without a prototype.

Thanks, Aaron. I wasn't sure how to follow up given how long it had been since the review started. I understand that we're all busy (which explains the week delay on my part here as well).

No worries, pinging the review like you did is a good way to try to get it more attention, though it sometimes takes a few tries depending on the review.

I've addressed all of your comments except the one on this bit:

if (const FunctionType *FnTy = D->getFunctionType())
  IsVariadic = cast<FunctionProtoType>(FnTy)->isVariadic();

The proposed change isn't identical because D->getFunctionType() can return nullptr (for instance, if D is a BlockDecl). However, in the case FnTy isn't nullptr, then it is guaranteed to be a FunctionProtoType as the attribute is rejected on functions without a prototype.

The suggestion I had was slightly different:

if (const auto *FnTy = D->getType()->getAs<FunctionProtoType>())
  IsVariadic = FnTy->isVariadic();

It's getting as a prototyped function, and only if that succeeds do we check whether it's variadic. I think that is equivalent to what you have now, but is more clearly expressed. WDYT?

I'm afraid that's also not possible: D is a Decl, so it doesn't have getType(). Decl is the tightest-fitting superclass of BlockDecl, FunctionDecl and ObjCMethodDecl (because BlockDecl is a direct subclass of it).

One option could be to cast the Decl to a FunctionDecl and then use FDecl->isVariadic(), similarly to how it goes for BlockDecl and ObjCMethodDecl. I'm not sure that it's equivalent, but if you believe it is and like it better, I can do that.

I'm afraid that's also not possible: D is a Decl, so it doesn't have getType(). Decl is the tightest-fitting superclass of BlockDecl, FunctionDecl and ObjCMethodDecl (because BlockDecl is a direct subclass of it).

One option could be to cast the Decl to a FunctionDecl and then use FDecl->isVariadic(), similarly to how it goes for BlockDecl and ObjCMethodDecl. I'm not sure that it's equivalent, but if you believe it is and like it better, I can do that.

Ahhhhhh, I had forgotten about BlockDecl not being a ValueDecl. In that case, I think the code is fine as-is, sorry for the noise!

I think this generally LG; I found a few minor nits in the documentation and some questions on the tests. The test stuff can be handled in a follow-up if you think it's worthwhile.

clang/include/clang/Basic/AttrDocs.td
3147
3165
clang/test/SemaCXX/attr-format.cpp
76

This pointed out an interesting test case. What should the behavior be for:

format("%p", 0);

Because that sure feels like a more reasonable thing for someone to write expecting it to be treated as a null pointer constant.

77–78

This likely isn't specific to your changes, but the %p in these examples should be warning the user (a function or function pointer is not a pointer to void or a pointer to a character type, so that call is UB).

fcloutier updated this revision to Diff 442373.Jul 5 2022, 12:31 PM
fcloutier marked 2 inline comments as done.

Address documentation comments.

clang/test/SemaCXX/attr-format.cpp
76

I think that the current behavior is the right one:

test.c:4:17: warning: format specifies type 'void *' but the argument has type 'int' [-Wformat]
        printf("%p\n", 0);
                ~~     ^
                %d

The warning goes away if you use (void *)0, as expected. __attribute__((format)) has no semantic meaning, so we can't (and shouldn't) infer that 0 is a pointer based on the usage of %p.

77–78

This is already a -Wformat-pedantic warning, which IMO is the right warning group for it:

test.c:4:17: warning: format specifies type 'void *' but the argument has type 'int (*)()' [-Wformat-pedantic]
        printf("%p\n", main);
                ~~     ^~~~
1 warning generated.

The relevant bit is clang/lib/AST/FormatString.cpp:

case CPointerTy:
  if (argTy->isVoidPointerType()) {
    return Match;
  } if (argTy->isPointerType() || argTy->isObjCObjectPointerType() ||
        argTy->isBlockPointerType() || argTy->isNullPtrType()) {
    return NoMatchPedantic;
  } else {
    return NoMatch;
  }
aaron.ballman accepted this revision.Jul 5 2022, 1:01 PM

LGTM!

clang/test/SemaCXX/attr-format.cpp
76

Ah, you know what, I've convinced myself I was wrong and you're right. C2x 7.22.6.1p9 gives the latest conversion rules here, and I think passing 0, despite being the null pointer constant, is UB when the format specifier is %p. On targets where int and void * are the same width, this diagnostic feels rather pedantic. But on systems where those differ, it seems more important to issue the warning... so I think you're correct that we should leave this behavior alone.

Thanks for thinking it through with me. :-)

77–78

Ah, good that we have it in a pedantic diagnostic. I agree, it is a pedantic one, I thought we were missing it entirely.

This revision is now accepted and ready to land.Jul 5 2022, 1:01 PM
fcloutier updated this revision to Diff 442396.Jul 5 2022, 2:23 PM

There was a merge conflict on the release notes, updating the differential to get a CI build.

Unknown Object (User) added a subscriber: Unknown Object (User).Oct 19 2022, 6:49 PM

Thank you for your contribution! It is quite beneficial to me!
fireboy and watergirl