The main focus of this patch is to make ArgType::matchesType check for
possible default parameter promotions when the argType is not a pointer.
If so, no warning will be given for int, unsigned int types as
corresponding arguments to %hhd and %hd. However, the usage of %hhd
corresponding to short is relatively rare, and it is more likely to be a
misuse. This patch keeps the original behavior of clang like this as
much as possible, while making it more convenient to consider the
default arguments promotion.
Details
- Reviewers
aaron.ballman nickdesaulniers - Group Reviewers
Restricted Project - Commits
- rGe3bd67eddf65: [clang][Sema] check default argument promotions for printf
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
https://reviews.llvm.org/D132266
This patch seems to ignore this scenario:
printf("%hhf", [int])
Where we have length modifier h, and the argument type is int. Causes clang to suppress warnings about type mismatches between floats and integers.
clang/test/FixIt/format.mm | ||
---|---|---|
10 | The language built-in wchar_t will be properly diagnosed. | |
clang/test/SemaObjC/format-strings-objc.m | ||
194 | This wchar_t is defined as typedef __WCHAR_TYPE__ wchar_t;. Actually it is int on my machine, I'm not sure if we should still report this warning as before, because it is just int, not a real "wchar_t". |
Thank you for the patch, but it'd be really helpful to me as a reviewer if you and @nickdesaulniers could coordinate so there's only one patch trying to address #57102 instead of two competing patches (I'm happy to review whichever patch comes out). From a quick look over the test case changes, this patch looks like it's more in line with how I'd expect to resolve that issue compared to D132266.
The addition to clang/test/Sema/format-strings.c looks like it will resolve Linus' complaints in https://github.com/llvm/llvm-project/issues/57102; I'm happy to abandon D132266 in preference of this.
However, the usage of %hhd corresponding to short is relatively rare, and it is more likely to be a misuse.
Do we want to encode that in test_promotion in clang/test/Sema/format-strings.c? Seems like tests on shorts are missing.
clang/lib/AST/FormatString.cpp | ||
---|---|---|
359 | might as well make this auto while you're here. | |
408–410 | {} aren't necessary | |
414–417 | {} aren't necessary | |
426–428 | {} aren't necessary | |
clang/lib/Sema/SemaChecking.cpp | ||
10084–10090 | I think it would be slightly more readable to make these two assignments two dedicated statements. | |
clang/test/Sema/format-strings-scanf.c | ||
271–272 | Hmm... | |
274 | delete empty newline |
Do we want to encode that in test_promotion in clang/test/Sema/format-strings.c? Seems like tests on shorts are missing.
Tests for short and char "incompatibility" could be found elsewhere in this file.
format-strings.c
void should_understand_small_integers(void) { printf("%hhu", (short) 10); // expected-warning{{format specifies type 'unsigned char' but the argument has type 'short'}} printf("%hu\n", (unsigned char)1); // warning with -Wformat-pedantic only printf("%hu\n", (uint8_t)1); // warning with -Wformat-pedantic only } /* ... */ void test13(short x) { char bel = 007; printf("bel: '0%hhd'\n", bel); // no-warning printf("x: '0%hhd'\n", x); // expected-warning {{format specifies type 'char' but the argument has type 'short'}} }
Do I need to explicitly test again in the test_promotion?
Add tests for character literals
I've noticed that Linus mentioned the following code triggered warning by
clang. (With a little modifications shown below)
printf("%hhd", 'a'); printf("%hd", 'a');
character literals are int in C. So clang-14 generates warnings like this:
local/format.c:9:20: warning: format specifies type 'char' but the argument has type 'int' [-Wformat] printf("%hhd", 'a'); ~~~~ ^~~ %d local/format.c:10:19: warning: format specifies type 'short' but the argument has type 'char' [-Wformat] printf("%hd", 'a'); ~~~ ^~~ %hhd 2 warnings generated.
Ironically, we advise our users to change their source code, which leads to
another warning. I've added tests for this case, after this patch, only (%hd,
"char") will cause warnings. And clang will generate warning like this:
local/format.c:10:19: warning: format specifies type 'short' but the argument has type 'char' [-Wformat] printf("%hd", 'a'); ~~~ ^~~ %hhd 1 warning generated.
Use %hd with char is not reasonable. (Probabaly another misuse) So I kept
this warning as-is.
Thanks, Nick!
I asked some questions about wchar_t behavior in the thread, but I sort of wonder if we should handle those concerns separately (if changes are needed) as it looks more like missing functionality than promotion-related behavior. We have other changes we'll need in this area anyway for C23 because of https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2630.pdf, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2680.pdf, and https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2763.pdf so there's no need to cram it all into one patch.
clang/include/clang/AST/FormatString.h | ||
---|---|---|
263–264 | ||
clang/lib/AST/FormatString.cpp | ||
367 | Isn't this a match promotion as well? e.g. printf("%hhd", (_Bool)0); where the _Bool is promoted to int but then cast back down to a char type (which seems rather unlikely to be a type confusion issue). | |
378 | In C, wchar_t is an integer type and that underlying type might be int or it might be promotable to int. Should we handle it here as a type confused promotion match? | |
401 | It's a bit strange that we have two switches over the same BT->getKind() and the only difference is !Ptr; would it be easier to read if we combined the two switches into one and had logic in the individual cases for Ptr vs not Ptr? | |
408 | wchar_t? | |
413–414 | I agree that printf("%hhd", (short)0); is potentially type confused, but why printf("%d", (short)0);? FWIW, I think that printf("%lc", (short)0)); is potentially type confused as well. | |
422 | The comment is talking about passing a char (promoted to int) then printed as a short but the check is looking for the type to be printed as an int; that doesn't seem like a type confusion situation so much as a match due to promotion. Should the test below be looking for a short type instead of an int type? | |
429 | Why is this automatically type confusion regardless of the type specified by the format string? | |
clang/lib/Sema/SemaChecking.cpp | ||
10114 | Do we need a similar special case for L'a' in C, which has type wchar_t (which is a typedef to an integer type)? | |
clang/test/FixIt/format.m | ||
40 | It looks like some whitespace only changes snuck in. | |
173–176 | Should we leave the test but remove the expected warnings and fix-it comments instead? | |
195–208 | Hmmm, did we intend to change the behavior here? %C is an extension, so it's not clear to me if we wanted to modify this behavior or not. CC @rjmccall and @dexonsmith for some ObjC opinions. | |
clang/test/FixIt/format.mm | ||
14–25 | Same questions here as above. | |
clang/test/Sema/format-strings-scanf.c | ||
15 | Looks like more whitespace-only changes snuck in. | |
265–266 | WG14 hasn't made an official determination, but one test we should add is: // FIXME: does this match what the C committee allows or should it be pedantically warned on? char c; void *vp; scanf("%hhd", &c); // Pedantic warning? scanf("%hhd", vp); // Pedantic warning? I asked on the reflectors whether folks wanted me to file an NB comment about this and the answers have been mixed so far. I'll most likely still file an NB comment to get the committee officially on record though. | |
271–272 | @nickdesaulniers -- are you thinking this might be a pedantic warning because it's always valid to cast to a pointer to char and write into it? e.g., short s; char *cp = (char *)&s; *cp = 0; // Not UB | |
289–290 | Is what a clang bug? Also, what is with the "or no effect" in that wording? "This could cause major issues or nothing bad at all might happen" is a very odd way to word a diagnostic. :-D (Maybe something to look into improving in a later patch.) |
clang/test/FixIt/format.m | ||
---|---|---|
40 |
I'm sorry for this, do I need to discard these whitespace-only changes? Since this patch changes this file, is it my responsibility to keep other parts that don't belong to this patch unchanged, or am I correcting them by the way? | |
clang/test/Sema/format-strings-scanf.c | ||
289–290 |
Look at sc which is a signed char, but scanf need a pointer float *, I add this comment because I think there should be a warning like format specifies type 'float *' but the argument has type 'signed short *' See printf tests below |
clang/test/FixIt/format.m | ||
---|---|---|
40 |
You should discard the whitespace only changes; you are welcome to land an NFC commit that fixes just the whitespace issues though! We typically ask that changes unrelated to the patch summary should be separated out so that each patch is self-contained for just one thing. Not only is that easier to review, it also makes it less likely that we end up reverting good changes if we need to revert a patch. | |
clang/test/Sema/format-strings-scanf.c | ||
289–290 | Oh I see what you're saying! There are two issues with the code: one is that the format specifier itself is UB and the other is that the argument passed to this confused format specifier does not agree. To me, the priority is the invalid format specifier; unless the user corrects that, we're not really certain what they were aiming to scan in. And I think it's *probably* more likely that the user made a typo in the format specifier instead of trying to scan a half float (or whatever they thought they were scanning) into a signed char, so it seems defensible to silence the mismatched specifier diagnostic. WDYT? |
clang/lib/AST/FormatString.cpp | ||
---|---|---|
422 |
Sorry for this (sometimes shit happens). There is a lot of redundant logic in this place, I will upload the corrected code immediately. | |
clang/test/Sema/format-strings-scanf.c | ||
289–290 | printf("%hf", // expected-warning{{length modifier 'h' results in undefined behavior or no effect with 'f' conversion specifier}} sc); // expected-warning{{format specifies type 'double' but the argument has type 'signed char'}} Then clang seems to have two different behaviors to printf and scanf, and for printf it will give the kind of diagnosis I said. I don't think this problem is particularly serious, because the key point is that using %hf UB. So maybe we can just keep clang as-is? Or we should consider implement the same warning in scanf ? |
clang/lib/AST/FormatString.cpp | ||
---|---|---|
367 | Hmm? _Bool just matches AnyCharTy perfectly. MatchPromotion means the types having different length and they can be considered as "partially matched" because of promotions). Isn't _Bool and AnyChar here just have the same length? |
clang/lib/AST/FormatString.cpp | ||
---|---|---|
367 |
_Bool is not a character type, so it doesn't match AnyCharTy perfectly. The size can be the same but they're still different types. |
Feel free to delete or move existing test cases if they make more sense to remain together in your added block, rather than more isolated in other test functions.
clang/lib/AST/FormatString.cpp | ||
---|---|---|
401 | I almost made the same recommendation myself. For the below switch pair, and the pair above. | |
408–410 | make sure to mark these done in phabricator UI | |
clang/test/Sema/format-strings-scanf.c | ||
271–272 | No, was just curious about the comment Is this a clang bug ?. Looks like my comment has moved from the source location, and that you have a similar thread below. This one can be marked done. |
clang/lib/AST/FormatString.cpp | ||
---|---|---|
401 |
These two switch pairs have different functions. The lower one is only responsible for checking whether there is a signed or unsigned integer, and the upper one is checking whether there is a promotion (or type confusing). Will they be more difficult to understand if they are written together? |
Remove wchar tests.
These tests may need to be re-added after we have implemented wchar_t checks in C
clang/lib/AST/FormatString.cpp | ||
---|---|---|
401 | Perhaps. I think the comments you added to all switches are helpful! |
clang/lib/Sema/SemaChecking.cpp | ||
---|---|---|
10127–10129 | There's something about this conditional...I can't quite put my finger on. Isn't ImplicitMatch only updated in the const ImplicitCastExpr *ICE = dyn_cast<ImplicitCastExpr>(E) branch above, where IsCharacterLiteralInt cannot be? Simultaneously, isn't IsCharacterLiteralInt only updated in the const CharacterLiteral *CL = dyn_cast<CharacterLiteral>(E) branch above, where ImplicitMatch cannot be? I wonder if we should instead hoist: if (Match == ArgType::MatchPromotion) { if (ImplicitMatch != ArgType::NoMatchPromotionTypeConfusion && ImplicitMatch != ArgType::NoMatchTypeConfusion) return true; Match = ArgType::NoMatch; } into the const ImplicitCastExpr *ICE = dyn_cast<ImplicitCastExpr>(E) branch after ImplicitMatch is updated. Then hoist if (Match == ArgType::MatchPromotion) return true; into the const CharacterLiteral *CL = dyn_cast<CharacterLiteral>(E) branch after IsCharacterLiteralInt is updated? Then we could delete the IsCharacterLiteralInt variable perhaps? |
clang/lib/Sema/SemaChecking.cpp | ||
---|---|---|
10133–10135 | Similarly, if ImplicitMatch can only be updated to one of these two values within the above const ImplicitCastExpr *ICE = dyn_cast<ImplicitCastExpr>(E) branch, does it make sense to additionally hoist this conditional check into that branch as well? | |
10133–10135 | If we can do both suggestions, then ImplicitMatch can remain scoped to that branch, as it previously was. |
clang/lib/Sema/SemaChecking.cpp | ||
---|---|---|
10127–10129 |
The control flow may not entering any of these branches at all, and if we have Match == MatchPromotion we depend on the origin value of ImplicitMatch to determine whether or not to discard the MatchPromotion property. For example, printf("%hd", 0); We have MatchPromotion because the argument is an int and the format string specifies signed short, and we will not entering ImplicitCastExpr branch because the argument is an int already (so we don't need an "implicit cast")
Thank you for this! I've hoisted this into CharacterLiteralExpr branch. And now IsCharacterLiteralInt is unnecessary. |
clang/lib/AST/FormatString.cpp | ||
---|---|---|
367 | I've add case BuiltinType::Bool: back to preserve https://reviews.llvm.org/D66856 |
Delete extra newline.
Currently this patch has not fully implemented wchar_t related support, this
type seems to be even platform dependent in C language, if possible, maybe we
can consider support in subsequent patches?
@aaron.ballman should review+accept this before you land it, but I'm satisfied with the result. Thank you @inclyc for working on this!
Basically LGTM as well, just some nits about comments and a small fix to the c status page.
I think that's reasonable.
clang/lib/AST/FormatString.cpp | ||
---|---|---|
360 | ||
370 | ||
401 | Yeah, let's leave the structure be for now, we can always clarify it further in an NFC follow-up if it turns out to be a reasonable suggestion. | |
401 | ||
464 | ||
clang/lib/Sema/SemaChecking.cpp | ||
10123–10125 | ||
10139 | We should probably have a comment here about why ObjC is special.. | |
clang/www/c_status.html | ||
822 |
clang/docs/ReleaseNotes.rst | ||
---|---|---|
141 | It seems a little weird to say we "implemented" this. The standard doesn't require any warnings for misuse of printf etc., and clang doesn't actually implement printf, so we don't need to do anything to be standard-compliant. I'd just say that we adjusted -Wformat warnings to avoid false positives. | |
clang/lib/Sema/SemaChecking.cpp | ||
10124 | *likely |
It seems a little weird to say we "implemented" this. The standard doesn't require any warnings for misuse of printf etc., and clang doesn't actually implement printf, so we don't need to do anything to be standard-compliant. I'd just say that we adjusted -Wformat warnings to avoid false positives.