This is an archive of the discontinued LLVM Phabricator instance.

[clang][Sema] check default argument promotions for printf
ClosedPublic

Authored by inclyc on Aug 24 2022, 8:35 AM.

Details

Summary

The main focus of this patch is to make ArgType::matchesType check for
possible default parameter promotions when the argType is not a pointer.
If so, no warning will be given for int, unsigned int types as
corresponding arguments to %hhd and %hd. However, the usage of %hhd
corresponding to short is relatively rare, and it is more likely to be a
misuse. This patch keeps the original behavior of clang like this as
much as possible, while making it more convenient to consider the
default arguments promotion.

Fixes https://github.com/llvm/llvm-project/issues/57102

Diff Detail

Event Timeline

inclyc created this revision.Aug 24 2022, 8:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2022, 8:35 AM
Herald added a subscriber: emaste. · View Herald Transcript
inclyc published this revision for review.Aug 24 2022, 8:51 AM

https://reviews.llvm.org/D132266

This patch seems to ignore this scenario:

printf("%hhf", [int])

Where we have length modifier h, and the argument type is int. Causes clang to suppress warnings about type mismatches between floats and integers.

clang/test/FixIt/format.mm
10 ↗(On Diff #455235)

The language built-in wchar_t will be properly diagnosed.

clang/test/SemaObjC/format-strings-objc.m
194 ↗(On Diff #455235)

This wchar_t is defined as typedef __WCHAR_TYPE__ wchar_t;. Actually it is int on my machine, I'm not sure if we should still report this warning as before, because it is just int, not a real "wchar_t".

Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2022, 8:51 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Thank you for the patch, but it'd be really helpful to me as a reviewer if you and @nickdesaulniers could coordinate so there's only one patch trying to address #57102 instead of two competing patches (I'm happy to review whichever patch comes out). From a quick look over the test case changes, this patch looks like it's more in line with how I'd expect to resolve that issue compared to D132266.

Thank you for the patch, but it'd be really helpful to me as a reviewer if you and @nickdesaulniers could coordinate so there's only one patch trying to address #57102 instead of two competing patches (I'm happy to review whichever patch comes out). From a quick look over the test case changes, this patch looks like it's more in line with how I'd expect to resolve that issue compared to D132266.

The addition to clang/test/Sema/format-strings.c looks like it will resolve Linus' complaints in https://github.com/llvm/llvm-project/issues/57102; I'm happy to abandon D132266 in preference of this.

However, the usage of %hhd corresponding to short is relatively rare, and it is more likely to be a misuse.

Do we want to encode that in test_promotion in clang/test/Sema/format-strings.c? Seems like tests on shorts are missing.

clang/lib/AST/FormatString.cpp
359–360

might as well make this auto while you're here.

411–413

{} aren't necessary

417–420

{} aren't necessary

429–431

{} aren't necessary

clang/lib/Sema/SemaChecking.cpp
10084–10087

I think it would be slightly more readable to make these two assignments two dedicated statements.

clang/test/Sema/format-strings-scanf.c
271–272

Hmm...

274

delete empty newline

Do we want to encode that in test_promotion in clang/test/Sema/format-strings.c? Seems like tests on shorts are missing.

Tests for short and char "incompatibility" could be found elsewhere in this file.

format-strings.c

void should_understand_small_integers(void) {
  printf("%hhu", (short) 10); // expected-warning{{format specifies type 'unsigned char' but the argument has type 'short'}}
  printf("%hu\n", (unsigned char)1); // warning with -Wformat-pedantic only
  printf("%hu\n", (uint8_t)1);       // warning with -Wformat-pedantic only
}
/* ... */
void test13(short x) {
  char bel = 007;
  printf("bel: '0%hhd'\n", bel); // no-warning
  printf("x: '0%hhd'\n", x); // expected-warning {{format specifies type 'char' but the argument has type 'short'}}
}

Do I need to explicitly test again in the test_promotion?

inclyc updated this revision to Diff 455458.Aug 24 2022, 7:02 PM

Address comments & more tests & docs

inclyc updated this revision to Diff 455474.Aug 24 2022, 10:42 PM

Add tests for character literals

I've noticed that Linus mentioned the following code triggered warning by
clang. (With a little modifications shown below)

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=21f9c8a13bb2a0c24d9c6b86bc0896542a28c197

printf("%hhd", 'a');
printf("%hd", 'a');

character literals are int in C. So clang-14 generates warnings like this:

local/format.c:9:20: warning: format specifies type 'char' but the argument has type 'int' [-Wformat]
    printf("%hhd", 'a');
            ~~~~   ^~~
            %d
local/format.c:10:19: warning: format specifies type 'short' but the argument has type 'char' [-Wformat]
    printf("%hd", 'a');
            ~~~   ^~~
            %hhd
2 warnings generated.

Ironically, we advise our users to change their source code, which leads to
another warning. I've added tests for this case, after this patch, only (%hd,
"char") will cause warnings. And clang will generate warning like this:

local/format.c:10:19: warning: format specifies type 'short' but the argument has type 'char' [-Wformat]
    printf("%hd", 'a');
            ~~~   ^~~
            %hhd
1 warning generated.

Use %hd with char is not reasonable. (Probabaly another misuse) So I kept
this warning as-is.

inclyc updated this revision to Diff 455485.Aug 25 2022, 12:02 AM

Fix CI issue

Thank you for the patch, but it'd be really helpful to me as a reviewer if you and @nickdesaulniers could coordinate so there's only one patch trying to address #57102 instead of two competing patches (I'm happy to review whichever patch comes out). From a quick look over the test case changes, this patch looks like it's more in line with how I'd expect to resolve that issue compared to D132266.

The addition to clang/test/Sema/format-strings.c looks like it will resolve Linus' complaints in https://github.com/llvm/llvm-project/issues/57102; I'm happy to abandon D132266 in preference of this.

Thanks, Nick!

I asked some questions about wchar_t behavior in the thread, but I sort of wonder if we should handle those concerns separately (if changes are needed) as it looks more like missing functionality than promotion-related behavior. We have other changes we'll need in this area anyway for C23 because of https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2630.pdf, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2680.pdf, and https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2763.pdf so there's no need to cram it all into one patch.

clang/include/clang/AST/FormatString.h
263–264
clang/lib/AST/FormatString.cpp
368

Isn't this a match promotion as well? e.g. printf("%hhd", (_Bool)0); where the _Bool is promoted to int but then cast back down to a char type (which seems rather unlikely to be a type confusion issue).

380

In C, wchar_t is an integer type and that underlying type might be int or it might be promotable to int. Should we handle it here as a type confused promotion match?

404

It's a bit strange that we have two switches over the same BT->getKind() and the only difference is !Ptr; would it be easier to read if we combined the two switches into one and had logic in the individual cases for Ptr vs not Ptr?

411

wchar_t?

416–417

I agree that printf("%hhd", (short)0); is potentially type confused, but why printf("%d", (short)0);?

FWIW, I think that printf("%lc", (short)0)); is potentially type confused as well.

425

The comment is talking about passing a char (promoted to int) then printed as a short but the check is looking for the type to be printed as an int; that doesn't seem like a type confusion situation so much as a match due to promotion. Should the test below be looking for a short type instead of an int type?

432

Why is this automatically type confusion regardless of the type specified by the format string?

clang/lib/Sema/SemaChecking.cpp
10114

Do we need a similar special case for L'a' in C, which has type wchar_t (which is a typedef to an integer type)?

clang/test/FixIt/format.m
40 ↗(On Diff #455485)

It looks like some whitespace only changes snuck in.

173–176 ↗(On Diff #455485)

Should we leave the test but remove the expected warnings and fix-it comments instead?

195–208 ↗(On Diff #455485)

Hmmm, did we intend to change the behavior here? %C is an extension, so it's not clear to me if we wanted to modify this behavior or not. CC @rjmccall and @dexonsmith for some ObjC opinions.

clang/test/FixIt/format.mm
14–25 ↗(On Diff #455485)

Same questions here as above.

clang/test/Sema/format-strings-scanf.c
15

Looks like more whitespace-only changes snuck in.

265–266

WG14 hasn't made an official determination, but one test we should add is:

// FIXME: does this match what the C committee allows or should it be pedantically warned on?
char c;
void *vp;
scanf("%hhd", &c); // Pedantic warning?
scanf("%hhd", vp); // Pedantic warning?

I asked on the reflectors whether folks wanted me to file an NB comment about this and the answers have been mixed so far. I'll most likely still file an NB comment to get the committee officially on record though.

271–272

@nickdesaulniers -- are you thinking this might be a pedantic warning because it's always valid to cast to a pointer to char and write into it? e.g.,

short s;
char *cp = (char *)&s;
*cp = 0; // Not UB
289–290

Is what a clang bug?

Also, what is with the "or no effect" in that wording? "This could cause major issues or nothing bad at all might happen" is a very odd way to word a diagnostic. :-D (Maybe something to look into improving in a later patch.)

inclyc added inline comments.Aug 25 2022, 8:16 AM
clang/test/FixIt/format.m
40 ↗(On Diff #455485)

It looks like some whitespace only changes snuck in.

I'm sorry for this, do I need to discard these whitespace-only changes? Since this patch changes this file, is it my responsibility to keep other parts that don't belong to this patch unchanged, or am I correcting them by the way?

clang/test/Sema/format-strings-scanf.c
289–290

Is what a clang bug?

Look at sc which is a signed char, but scanf need a pointer float *, I add this comment because I think there should be a warning like

format specifies type 'float *' but the argument has type 'signed short *'

See printf tests below

aaron.ballman added inline comments.Aug 25 2022, 8:36 AM
clang/test/FixIt/format.m
40 ↗(On Diff #455485)

I'm sorry for this, do I need to discard these whitespace-only changes? Since this patch changes this file, is it my responsibility to keep other parts that don't belong to this patch unchanged, or am I correcting them by the way?

You should discard the whitespace only changes; you are welcome to land an NFC commit that fixes just the whitespace issues though!

We typically ask that changes unrelated to the patch summary should be separated out so that each patch is self-contained for just one thing. Not only is that easier to review, it also makes it less likely that we end up reverting good changes if we need to revert a patch.

clang/test/Sema/format-strings-scanf.c
289–290

Oh I see what you're saying! There are two issues with the code: one is that the format specifier itself is UB and the other is that the argument passed to this confused format specifier does not agree.

To me, the priority is the invalid format specifier; unless the user corrects that, we're not really certain what they were aiming to scan in. And I think it's *probably* more likely that the user made a typo in the format specifier instead of trying to scan a half float (or whatever they thought they were scanning) into a signed char, so it seems defensible to silence the mismatched specifier diagnostic.

WDYT?

inclyc added inline comments.Aug 25 2022, 8:57 AM
clang/lib/AST/FormatString.cpp
425

The comment is talking about passing a char (promoted to int) then printed as a short but the check is looking for the type to be printed as an int; that doesn't seem like a type confusion situation so much as a match due to promotion. Should the test below be looking for a short type instead of an int type?

Sorry for this (sometimes shit happens). There is a lot of redundant logic in this place, I will upload the corrected code immediately.

clang/test/Sema/format-strings-scanf.c
289–290
printf("%hf", // expected-warning{{length modifier 'h' results in undefined behavior or no effect with 'f' conversion specifier}}
  sc); // expected-warning{{format specifies type 'double' but the argument has type 'signed char'}}

Then clang seems to have two different behaviors to printf and scanf, and for printf it will give the kind of diagnosis I said. I don't think this problem is particularly serious, because the key point is that using %hf UB. So maybe we can just keep clang as-is? Or we should consider implement the same warning in scanf ?

inclyc marked 2 inline comments as done.Aug 25 2022, 9:23 AM
inclyc added inline comments.
clang/lib/AST/FormatString.cpp
368

Hmm? _Bool just matches AnyCharTy perfectly. MatchPromotion means the types having different length and they can be considered as "partially matched" because of promotions).

Isn't _Bool and AnyChar here just have the same length?

aaron.ballman added inline comments.Aug 25 2022, 9:29 AM
clang/lib/AST/FormatString.cpp
425

No worries!

clang/test/Sema/format-strings-scanf.c
289–290

So maybe we can just keep clang as-is?

I think let's keep clang as-is for the moment so we can keep the concerns separate.

aaron.ballman added inline comments.Aug 25 2022, 9:37 AM
clang/lib/AST/FormatString.cpp
368

Hmm? _Bool just matches AnyCharTy perfectly. MatchPromotion means the types having different length and they can be considered as "partially matched" because of promotions).

Isn't _Bool and AnyChar here just have the same length?

_Bool is not a character type, so it doesn't match AnyCharTy perfectly. The size can be the same but they're still different types.

inclyc updated this revision to Diff 455626.Aug 25 2022, 10:01 AM

Address comments

inclyc marked 20 inline comments as done.Aug 25 2022, 10:08 AM

Do we want to encode that in test_promotion in clang/test/Sema/format-strings.c? Seems like tests on shorts are missing.

Tests for short and char "incompatibility" could be found elsewhere in this file.

format-strings.c

void should_understand_small_integers(void) {
  printf("%hhu", (short) 10); // expected-warning{{format specifies type 'unsigned char' but the argument has type 'short'}}
  printf("%hu\n", (unsigned char)1); // warning with -Wformat-pedantic only
  printf("%hu\n", (uint8_t)1);       // warning with -Wformat-pedantic only
}
/* ... */
void test13(short x) {
  char bel = 007;
  printf("bel: '0%hhd'\n", bel); // no-warning
  printf("x: '0%hhd'\n", x); // expected-warning {{format specifies type 'char' but the argument has type 'short'}}
}

Do I need to explicitly test again in the test_promotion?

Feel free to delete or move existing test cases if they make more sense to remain together in your added block, rather than more isolated in other test functions.

clang/lib/AST/FormatString.cpp
404

I almost made the same recommendation myself. For the below switch pair, and the pair above.

411–413

make sure to mark these done in phabricator UI

clang/test/Sema/format-strings-scanf.c
271–272

No, was just curious about the comment Is this a clang bug ?. Looks like my comment has moved from the source location, and that you have a similar thread below. This one can be marked done.

inclyc added inline comments.Aug 25 2022, 10:22 AM
clang/lib/AST/FormatString.cpp
404

It's a bit strange that we have two switches over the same BT->getKind() and the only difference is !Ptr; would it be easier to read if we combined the two switches into one and had logic in the individual cases for Ptr vs not Ptr?

These two switch pairs have different functions. The lower one is only responsible for checking whether there is a signed or unsigned integer, and the upper one is checking whether there is a promotion (or type confusing). Will they be more difficult to understand if they are written together?

inclyc marked 3 inline comments as done.Aug 25 2022, 10:38 AM
inclyc updated this revision to Diff 455645.Aug 25 2022, 10:46 AM

Remove wchar tests.

These tests may need to be re-added after we have implemented wchar_t checks in C

clang/lib/AST/FormatString.cpp
404

Perhaps. I think the comments you added to all switches are helpful!

clang/lib/Sema/SemaChecking.cpp
10128–10130

There's something about this conditional...I can't quite put my finger on.

Isn't ImplicitMatch only updated in the const ImplicitCastExpr *ICE = dyn_cast<ImplicitCastExpr>(E) branch above, where IsCharacterLiteralInt cannot be?

Simultaneously, isn't IsCharacterLiteralInt only updated in the const CharacterLiteral *CL = dyn_cast<CharacterLiteral>(E) branch above, where ImplicitMatch cannot be?

I wonder if we should instead hoist:

if (Match == ArgType::MatchPromotion) {
  if (ImplicitMatch != ArgType::NoMatchPromotionTypeConfusion &&
      ImplicitMatch != ArgType::NoMatchTypeConfusion)
    return true;
  Match = ArgType::NoMatch;
}

into the const ImplicitCastExpr *ICE = dyn_cast<ImplicitCastExpr>(E) branch after ImplicitMatch is updated.

Then hoist

if (Match == ArgType::MatchPromotion)
  return true;

into the const CharacterLiteral *CL = dyn_cast<CharacterLiteral>(E) branch after IsCharacterLiteralInt is updated? Then we could delete the IsCharacterLiteralInt variable perhaps?

clang/lib/Sema/SemaChecking.cpp
10134–10136

Similarly, if ImplicitMatch can only be updated to one of these two values within the above const ImplicitCastExpr *ICE = dyn_cast<ImplicitCastExpr>(E) branch, does it make sense to additionally hoist this conditional check into that branch as well?

10134–10136

If we can do both suggestions, then ImplicitMatch can remain scoped to that branch, as it previously was.

inclyc updated this revision to Diff 455671.Aug 25 2022, 11:37 AM

Address comments

inclyc added inline comments.Aug 25 2022, 11:50 AM
clang/lib/Sema/SemaChecking.cpp
10128–10130

There's something about this conditional...I can't quite put my finger on.

Isn't ImplicitMatch only updated in the const ImplicitCastExpr *ICE = dyn_cast<ImplicitCastExpr>(E) branch above, where IsCharacterLiteralInt cannot be?

Simultaneously, isn't IsCharacterLiteralInt only updated in the const CharacterLiteral *CL = dyn_cast<CharacterLiteral>(E) branch above, where ImplicitMatch cannot be?

I wonder if we should instead hoist:

if (Match == ArgType::MatchPromotion) {
  if (ImplicitMatch != ArgType::NoMatchPromotionTypeConfusion &&
      ImplicitMatch != ArgType::NoMatchTypeConfusion)
    return true;
  Match = ArgType::NoMatch;
}

into the const ImplicitCastExpr *ICE = dyn_cast<ImplicitCastExpr>(E) branch after ImplicitMatch is updated.

The control flow may not entering any of these branches at all, and if we have Match == MatchPromotion we depend on the origin value of ImplicitMatch to determine whether or not to discard the MatchPromotion property. For example,

printf("%hd", 0);

We have MatchPromotion because the argument is an int and the format string specifies signed short, and we will not entering ImplicitCastExpr branch because the argument is an int already (so we don't need an "implicit cast")

Then hoist

if (Match == ArgType::MatchPromotion)
  return true;

into the const CharacterLiteral *CL = dyn_cast<CharacterLiteral>(E) branch after IsCharacterLiteralInt is updated? Then we could delete the IsCharacterLiteralInt variable perhaps?

Thank you for this! I've hoisted this into CharacterLiteralExpr branch. And now IsCharacterLiteralInt is unnecessary.

dexonsmith added inline comments.
clang/test/FixIt/format.m
195–208 ↗(On Diff #455485)

I’m not sure, but I agree it’s suspicious. Besides John, Mike Ash might be another person to check with (@-completion isn’t working for me on my phone right now, but perhaps someone else will have more luck?). Or, @arphaman, any other suggestions?

inclyc updated this revision to Diff 455793.Aug 25 2022, 10:01 PM

Trim whitespace in format.mm

inclyc updated this revision to Diff 455871.Aug 26 2022, 4:28 AM

Add LangOpts so we keep the Obj-C behavior of clang just as-is.

inclyc added inline comments.Aug 26 2022, 4:31 AM
clang/lib/AST/FormatString.cpp
368

I've add case BuiltinType::Bool: back to preserve https://reviews.llvm.org/D66856

inclyc updated this revision to Diff 455898.Aug 26 2022, 7:01 AM

Update comments

inclyc updated this revision to Diff 456111.Aug 27 2022, 6:06 AM

Delete extra newline.

Currently this patch has not fully implemented wchar_t related support, this
type seems to be even platform dependent in C language, if possible, maybe we
can consider support in subsequent patches?

inclyc added a reviewer: Restricted Project.Aug 31 2022, 6:58 AM
nickdesaulniers accepted this revision.Aug 31 2022, 9:38 AM

@aaron.ballman should review+accept this before you land it, but I'm satisfied with the result. Thank you @inclyc for working on this!

This revision is now accepted and ready to land.Aug 31 2022, 9:38 AM
aaron.ballman accepted this revision.Aug 31 2022, 9:53 AM

Basically LGTM as well, just some nits about comments and a small fix to the c status page.

Currently this patch has not fully implemented wchar_t related support, this
type seems to be even platform dependent in C language, if possible, maybe we
can consider support in subsequent patches?

I think that's reasonable.

clang/lib/AST/FormatString.cpp
360
371
404

Yeah, let's leave the structure be for now, we can always clarify it further in an NFC follow-up if it turns out to be a reasonable suggestion.

404
452
clang/lib/Sema/SemaChecking.cpp
10123–10125
10128

We should probably have a comment here about why ObjC is special..

clang/www/c_status.html
822
efriedma added inline comments.
clang/docs/ReleaseNotes.rst
141

It seems a little weird to say we "implemented" this. The standard doesn't require any warnings for misuse of printf etc., and clang doesn't actually implement printf, so we don't need to do anything to be standard-compliant. I'd just say that we adjusted -Wformat warnings to avoid false positives.

clang/lib/Sema/SemaChecking.cpp
10124

*likely

inclyc updated this revision to Diff 457153.Aug 31 2022, 6:57 PM

Update comments according to review feedback

This revision was landed with ongoing or failed builds.Aug 31 2022, 7:11 PM
This revision was automatically updated to reflect the committed changes.