This is an archive of the discontinued LLVM Phabricator instance.

[Clang][SemaChecking] move %hh and %h -Wformat warnings to -Wformat-pedantic
AbandonedPublic

Authored by nickdesaulniers on Aug 19 2022, 4:15 PM.

Details

Reviewers
aaron.ballman
Summary

Similar to
commit cc01d6421f4a ("[Sema] Don't warn on printf('%hd', [char]) (PR41467)")

warning on %hhd -> int or %hd -> int is a major source of pain wrt.
compatibility with GCC; it's making enabling -Wformat for the Linux
kernel excessively painful.

Fixes: #57102

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 4:15 PM
Herald added a subscriber: emaste. · View Herald Transcript
nickdesaulniers requested review of this revision.Aug 19 2022, 4:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 4:15 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
inclyc added a subscriber: inclyc.Aug 21 2022, 6:31 AM

Thanks for working on this @nickdesaulniers! I think we actually want to go a slightly different direction than this and disable the diagnostics entirely. Basically, we should be make sure the format specifier diagnostics are accounting for the clarifications in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2562.pdf. So the h and hh modifiers would not even be pedantic warnings in this case.

This should also have a release note associated with it and if you think you've completed support for N2562, the clang/www/c_status.html page should update the Partial markings.

Thanks for working on this @nickdesaulniers! I think we actually want to go a slightly different direction than this and disable the diagnostics entirely. Basically, we should be make sure the format specifier diagnostics are accounting for the clarifications in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2562.pdf. So the h and hh modifiers would not even be pedantic warnings in this case.

This should also have a release note associated with it and if you think you've completed support for N2562, the clang/www/c_status.html page should update the Partial markings.

Sorry, I have some questions about this clarification agreement (currently working on N2562). The length modifier hh says in the standard that it should point to signed char or unsigned char, and if an int parameter is passed, why shouldn't we give such a warning? (even if it's pedantic somehow)

N2562.pdf:

Modify 7.21.6.2p12:
...
Unless a length modifier is specified, tThe corresponding argument shall be a pointer to int signed integer.

Does this clarification statement mean that hhd should not be considered to correspond to int, but a signed char? If so, could this exactly imply that we have unmatched argument type? (i.e. int <--x--> hhd (signed char) )

Thanks for working on this @nickdesaulniers! I think we actually want to go a slightly different direction than this and disable the diagnostics entirely. Basically, we should be make sure the format specifier diagnostics are accounting for the clarifications in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2562.pdf. So the h and hh modifiers would not even be pedantic warnings in this case.

This should also have a release note associated with it and if you think you've completed support for N2562, the clang/www/c_status.html page should update the Partial markings.

Sorry, I have some questions about this clarification agreement (currently working on N2562). The length modifier hh says in the standard that it should point to signed char or unsigned char, and if an int parameter is passed, why shouldn't we give such a warning? (even if it's pedantic somehow)

That's for the fscanf functionality, which is different than the fprintf functionality that's being addressed here. We should be diagnosing the fscanf type mismatch cases, per 7.23.6.2p12: "In the following, the type of the corresponding argument for a conversion specifier shall be a pointer to a type determined by the length modifiers, if any, or specified by the conversion specifier."

N2562.pdf:

Modify 7.21.6.2p12:
...
Unless a length modifier is specified, tThe corresponding argument shall be a pointer to int signed integer.

Does this clarification statement mean that hhd should not be considered to correspond to int, but a signed char?

Correct.

If so, could this exactly imply that we have unmatched argument type? (i.e. int <--x--> hhd (signed char) )

Here's how that's intended to be read:

int i;
signed char sc;
unsigned char uc;
char c;

scanf("%hhd, %hhd, %hhd, %hhd", &i, &sc, &uc, &c);

The first format specifier has a type mismatch: expected signed char * or unsigned char *, got int *.
The second format specifier is correct: expected signed char * or unsigned char *, got signed char *.
The third format specifier is correct: expected signed char * or unsigned char *, got unsigned char *.
The fourth format specifier is questionable (and I'll file an NB comment to clarify this) but I think intended to be correct: expected signed char * or unsigned char *, got char *. Pedantically, this is incorrect per 6.2.5 making char special.

What I'm confusing is
Which of the following two explanations is the exact meaning of hhd?

  1. consumes a 32-bit signed integer, then truncates it *inside* printf
  2. consumes an 8-bit signed integer

If it is the case 1, we should not emit this warning, but N2562 said that it still seems to be 2 ?

What I'm confusing is
Which of the following two explanations is the exact meaning of hhd?

  1. consumes a 32-bit signed integer, then truncates it *inside* printf
  2. consumes an 8-bit signed integer

If it is the case 1, we should not emit this warning, but N2562 said that it still seems to be 2 ?

There's some confusion happening that we should clarify first. The sections you've been quoting so far are for the behavior of fscanf not fprintf and the requirements are different.

For fprintf, hhd means that fprintf consumes an int-sized object by calling effectively signed char value = (signed char)va_arg(list, int);.
For fscanf, hhd means that fscanf consumes a pointer in which to store the converted data by effectively calling signed char *ptr = (signed char *)va_arg(list, signed char *); *ptr = value;

This patch is currently handling only the fprintf cases and is not addressing the fscanf ones.

What I'm confusing is
Which of the following two explanations is the exact meaning of hhd?

  1. consumes a 32-bit signed integer, then truncates it *inside* printf
  2. consumes an 8-bit signed integer

If it is the case 1, we should not emit this warning, but N2562 said that it still seems to be 2 ?

There's some confusion happening that we should clarify first. The sections you've been quoting so far are for the behavior of fscanf not fprintf and the requirements are different.

For fprintf, hhd means that fprintf consumes an int-sized object by calling effectively signed char value = (signed char)va_arg(list, int);.
For fscanf, hhd means that fscanf consumes a pointer in which to store the converted data by effectively calling signed char *ptr = (signed char *)va_arg(list, signed char *); *ptr = value;

This patch is currently handling only the fprintf cases and is not addressing the fscanf ones.

Thanks! This completely solved my doubts. I mistakenly thought that hhd also corresponds to signed char in printf now. :)

If some one use %hhd with an unmatched type argument. Should we emit diagnose like

format specifies type 'int' but the argument has type 'WhateverType'

instead of

format specifies type 'char' but the argument has type 'WhateverType'

?

It's seems that there are many regression tests about this. Should we keep the legacy behaviour?

If some one use %hhd with an unmatched type argument. Should we emit diagnose like

format specifies type 'int' but the argument has type 'WhateverType'

instead of

format specifies type 'char' but the argument has type 'WhateverType'

?

It's seems that there are many regression tests about this. Should we keep the legacy behaviour?

I think we want the diagnostic to identify the target type from the format specifier along with the actual and promoted type (if relevant) of the argument that mismatches. e.g.,

float f = 1.0f;
printf("%hd", f);

I would expect this to say something like format specifies type 'short' but the argument has type 'double' (promoted from 'float').

This way users know 1) what the type of the source argument is without hunting for it, 2) what the target type of the format specifier is without hunting down what the specifiers mean, 3) what the actual mismatching type is.

format specifies type 'short' but the argument has type 'double' (promoted from 'float')

I'm not sure about this. I'm curious about we just consider any integer with width less than or equal to int may be specified by %hhd ? (Because of default argument promotions). For example, we may expect %hhd could accept int short and char. And give diagnostics like

format specifies integers with width less than or equal to 'int', but the argument has type 'double' (promoted from 'float')

note: {info about why %hhd consumes argument like this}

fwprintf shall behave as if it uses va_arg with a type argument naming the type resulting from
applying the default argument promotions to the type corresponding to the conversion specification and
then converting the result of the va_arg expansion to the type corresponding to the conversion
specification.


OpenCL seems to defined a series of length modifiers for vectors, Is it more reasonable to consider hh as fixed length (char) only? Because vectors are passed without argument promotions.

format specifies type 'short' but the argument has type 'double' (promoted from 'float')

I'm not sure about this. I'm curious about we just consider any integer with width less than or equal to int may be specified by %hhd ? (Because of default argument promotions). For example, we may expect %hhd could accept int short and char.

I would expect those to *not* be diagnosed now. The goal of N2562 was to clarify that the default argument promotions happen and what causes UB is when there's a type mismatch after that promotion has taken place which would cause the value of the argument to be misinterpreted. e.g., if you say it's a char and you pass an int or any integer type of less width than int, that's fine because char promotes to int anyway. But if you say it's a char and you pass it a float, that's UB.

And give diagnostics like

format specifies integers with width less than or equal to 'int', but the argument has type 'double' (promoted from 'float')

note: {info about why %hhd consumes argument like this}

I guess I'm not really picturing what you have in mind here.

fwprintf shall behave as if it uses va_arg with a type argument naming the type resulting from
applying the default argument promotions to the type corresponding to the conversion specification and
then converting the result of the va_arg expansion to the type corresponding to the conversion
specification.


OpenCL seems to defined a series of length modifiers for vectors, Is it more reasonable to consider hh as fixed length (char) only? Because vectors are passed without argument promotions.

I have not put a lot of thought into what should happen when passing a vector type. The standard is obviously silent on this situation since it doesn't admit vector types are a thing, what really matters is what common C standard library implementations do when given a vector type. If they do bad things (e.g., glibc works fine but MSVC CRT prints garbage) then we'll likely want to diagnose depending on the situation and the language options enabled.