This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
lib/Sema/
-
Sema/
-
SemaChecking.cpp
-
test/
-
FixIt/
-
format.m
-
Sema/
-
format-strings-freebsd.c
-
format-strings-pedantic.c
-
format-strings.c

Differential D132266

[Clang][SemaChecking] move %hh and %h -Wformat warnings to -Wformat-pedantic
AbandonedPublic

Authored by nickdesaulniers on Aug 19 2022, 4:15 PM.

Download Raw Diff

Details

Reviewers

aaron.ballman

Summary

Similar to
commit cc01d6421f4a ("[Sema] Don't warn on printf('%hd', [char]) (PR41467)")

warning on %hhd -> int or %hd -> int is a major source of pain wrt.
compatibility with GCC; it's making enabling -Wformat for the Linux
kernel excessively painful.

Fixes: #57102

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nickdesaulniers created this revision.Aug 19 2022, 4:15 PM

Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 4:15 PM

Herald added a subscriber: emaste. · View Herald Transcript

nickdesaulniers requested review of this revision.Aug 19 2022, 4:15 PM

Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 4:15 PM

Herald added a subscriber: cfe-commits. · View Herald Transcript

Harbormaster completed remote builds in B182313: Diff 454131.Aug 19 2022, 4:50 PM

dblaikie added a subscriber: dblaikie.Aug 20 2022, 1:20 AM

inclyc added a subscriber: inclyc.Aug 21 2022, 6:31 AM

Thanks for working on this @nickdesaulniers! I think we actually want to go a slightly different direction than this and disable the diagnostics entirely. Basically, we should be make sure the format specifier diagnostics are accounting for the clarifications in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2562.pdf. So the h and hh modifiers would not even be pedantic warnings in this case.

This should also have a release note associated with it and if you think you've completed support for N2562, the clang/www/c_status.html page should update the Partial markings.

In D132266#3739513, @aaron.ballman wrote:

Thanks for working on this @nickdesaulniers! I think we actually want to go a slightly different direction than this and disable the diagnostics entirely. Basically, we should be make sure the format specifier diagnostics are accounting for the clarifications in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2562.pdf. So the h and hh modifiers would not even be pedantic warnings in this case.

This should also have a release note associated with it and if you think you've completed support for N2562, the clang/www/c_status.html page should update the Partial markings.

Sorry, I have some questions about this clarification agreement (currently working on N2562). The length modifier hh says in the standard that it should point to signed char or unsigned char, and if an int parameter is passed, why shouldn't we give such a warning? (even if it's pedantic somehow)

N2562.pdf:

Modify 7.21.6.2p12:
...
Unless a length modifier is specified, tThe corresponding argument shall be a pointer to int ~~signed integer~~.

Does this clarification statement mean that hhd should not be considered to correspond to int, but a signed char? If so, could this exactly imply that we have unmatched argument type? (i.e. int <--x--> hhd (signed char) )

In D132266#3739542, @inclyc wrote:

In D132266#3739513, @aaron.ballman wrote:

Thanks for working on this @nickdesaulniers! I think we actually want to go a slightly different direction than this and disable the diagnostics entirely. Basically, we should be make sure the format specifier diagnostics are accounting for the clarifications in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2562.pdf. So the h and hh modifiers would not even be pedantic warnings in this case.

This should also have a release note associated with it and if you think you've completed support for N2562, the clang/www/c_status.html page should update the Partial markings.

Sorry, I have some questions about this clarification agreement (currently working on N2562). The length modifier hh says in the standard that it should point to signed char or unsigned char, and if an int parameter is passed, why shouldn't we give such a warning? (even if it's pedantic somehow)

That's for the fscanf functionality, which is different than the fprintf functionality that's being addressed here. We should be diagnosing the fscanf type mismatch cases, per 7.23.6.2p12: "In the following, the type of the corresponding argument for a conversion specifier shall be a pointer to a type determined by the length modifiers, if any, or specified by the conversion specifier."

In D132266#3739567, @inclyc wrote:

N2562.pdf:

Modify 7.21.6.2p12:
...
Unless a length modifier is specified, tThe corresponding argument shall be a pointer to int ~~signed integer~~.

Does this clarification statement mean that hhd should not be considered to correspond to int, but a signed char?

Correct.

If so, could this exactly imply that we have unmatched argument type? (i.e. int <--x--> hhd (signed char) )

Here's how that's intended to be read:

int i;
signed char sc;
unsigned char uc;
char c;

scanf("%hhd, %hhd, %hhd, %hhd", &i, &sc, &uc, &c);

The first format specifier has a type mismatch: expected signed char * or unsigned char *, got int *.
The second format specifier is correct: expected signed char * or unsigned char *, got signed char *.
The third format specifier is correct: expected signed char * or unsigned char *, got unsigned char *.
The fourth format specifier is questionable (and I'll file an NB comment to clarify this) but I think intended to be correct: expected signed char * or unsigned char *, got char *. Pedantically, this is incorrect per 6.2.5 making char special.

What I'm confusing is
Which of the following two explanations is the exact meaning of hhd?

consumes a 32-bit signed integer, then truncates it *inside* printf
consumes an 8-bit signed integer

If it is the case 1, we should not emit this warning, but N2562 said that it still seems to be 2 ?

In D132266#3739600, @inclyc wrote:

What I'm confusing is
Which of the following two explanations is the exact meaning of hhd?

consumes a 32-bit signed integer, then truncates it *inside* printf

consumes an 8-bit signed integer

If it is the case 1, we should not emit this warning, but N2562 said that it still seems to be 2 ?

There's some confusion happening that we should clarify first. The sections you've been quoting so far are for the behavior of fscanf not fprintf and the requirements are different.

For fprintf, hhd means that fprintf consumes an int-sized object by calling effectively signed char value = (signed char)va_arg(list, int);.
For fscanf, hhd means that fscanf consumes a pointer in which to store the converted data by effectively calling signed char *ptr = (signed char *)va_arg(list, signed char *); *ptr = value;

This patch is currently handling only the fprintf cases and is not addressing the fscanf ones.

In D132266#3739618, @aaron.ballman wrote:

In D132266#3739600, @inclyc wrote:

What I'm confusing is
Which of the following two explanations is the exact meaning of hhd?

consumes a 32-bit signed integer, then truncates it *inside* printf

consumes an 8-bit signed integer

If it is the case 1, we should not emit this warning, but N2562 said that it still seems to be 2 ?

There's some confusion happening that we should clarify first. The sections you've been quoting so far are for the behavior of fscanf not fprintf and the requirements are different.

For fprintf, hhd means that fprintf consumes an int-sized object by calling effectively signed char value = (signed char)va_arg(list, int);.
For fscanf, hhd means that fscanf consumes a pointer in which to store the converted data by effectively calling signed char *ptr = (signed char *)va_arg(list, signed char *); *ptr = value;

This patch is currently handling only the fprintf cases and is not addressing the fscanf ones.

Thanks! This completely solved my doubts. I mistakenly thought that hhd also corresponds to signed char in printf now. :)

If some one use %hhd with an unmatched type argument. Should we emit diagnose like

format specifies type 'int' but the argument has type 'WhateverType'

instead of

format specifies type 'char' but the argument has type 'WhateverType'

It's seems that there are many regression tests about this. Should we keep the legacy behaviour?

In D132266#3740013, @inclyc wrote:
If some one use %hhd with an unmatched type argument. Should we emit diagnose like
format specifies type 'int' but the argument has type 'WhateverType'
instead of
format specifies type 'char' but the argument has type 'WhateverType'
?

It's seems that there are many regression tests about this. Should we keep the legacy behaviour?

I think we want the diagnostic to identify the target type from the format specifier along with the actual and promoted type (if relevant) of the argument that mismatches. e.g.,

float f = 1.0f;
printf("%hd", f);

I would expect this to say something like format specifies type 'short' but the argument has type 'double' (promoted from 'float').

This way users know 1) what the type of the source argument is without hunting for it, 2) what the target type of the format specifier is without hunting down what the specifiers mean, 3) what the actual mismatching type is.

format specifies type 'short' but the argument has type 'double' (promoted from 'float')

I'm not sure about this. I'm curious about we just consider any integer with width less than or equal to int may be specified by %hhd ? (Because of default argument promotions). For example, we may expect %hhd could accept int short and char. And give diagnostics like

format specifies integers with width less than or equal to 'int', but the argument has type 'double' (promoted from 'float')

note: {info about why %hhd consumes argument like this}

fwprintf shall behave as if it uses va_arg with a type argument naming the type resulting from
applying the default argument promotions to the type corresponding to the conversion specification and
then converting the result of the va_arg expansion to the type corresponding to the conversion
specification.

OpenCL seems to defined a series of length modifiers for vectors, Is it more reasonable to consider hh as fixed length (char) only? Because vectors are passed without argument promotions.

In D132266#3740168, @inclyc wrote:

format specifies type 'short' but the argument has type 'double' (promoted from 'float')

I'm not sure about this. I'm curious about we just consider any integer with width less than or equal to int may be specified by %hhd ? (Because of default argument promotions). For example, we may expect %hhd could accept int short and char.

I would expect those to *not* be diagnosed now. The goal of N2562 was to clarify that the default argument promotions happen and what causes UB is when there's a type mismatch after that promotion has taken place which would cause the value of the argument to be misinterpreted. e.g., if you say it's a char and you pass an int or any integer type of less width than int, that's fine because char promotes to int anyway. But if you say it's a char and you pass it a float, that's UB.

And give diagnostics like

format specifies integers with width less than or equal to 'int', but the argument has type 'double' (promoted from 'float')

note: {info about why %hhd consumes argument like this}

I guess I'm not really picturing what you have in mind here.

fwprintf shall behave as if it uses va_arg with a type argument naming the type resulting from
applying the default argument promotions to the type corresponding to the conversion specification and
then converting the result of the va_arg expansion to the type corresponding to the conversion
specification.

OpenCL seems to defined a series of length modifiers for vectors, Is it more reasonable to consider hh as fixed length (char) only? Because vectors are passed without argument promotions.

I have not put a lot of thought into what should happen when passing a vector type. The standard is obviously silent on this situation since it doesn't admit vector types are a thing, what really matters is what common C standard library implementations do when given a vector type. If they do bad things (e.g., glibc works fine but MSVC CRT prints garbage) then we'll likely want to diagnose depending on the situation and the language options enabled.

inclyc mentioned this in D132568: [clang][Sema] check default argument promotions for printf.Aug 24 2022, 8:51 AM

Prefer D132568.

Revision Contents

Path

Size

clang/

lib/

Sema/

SemaChecking.cpp

6 lines

test/

FixIt/

format.m

4 lines

Sema/

format-strings-freebsd.c

8 lines

format-strings-pedantic.c

13 lines

format-strings.c

15 lines

Diff 454131

clang/lib/Sema/SemaChecking.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 10,123 Lines • ▼ Show 20 Lines	CheckPrintfHandler::checkFormatExpr(const analyze_printf::PrintfSpecifier &FS,

// Look through enums to their underlying type.		// Look through enums to their underlying type.
bool IsEnum = false;		bool IsEnum = false;
if (auto EnumTy = ExprTy->getAs<EnumType>()) {		if (auto EnumTy = ExprTy->getAs<EnumType>()) {
ExprTy = EnumTy->getDecl()->getIntegerType();		ExprTy = EnumTy->getDecl()->getIntegerType();
IsEnum = true;		IsEnum = true;
}		}

		// Consider %hh or %h length modifiers for ints/unsigned ints -Wpedantic.
		LengthModifier::Kind LMK = FS.getLengthModifier().getKind();
		if ((LMK == LengthModifier::AsChar \|\| LMK == LengthModifier::AsShort) &&
		(ExprTy == S.Context.IntTy \|\| ExprTy == S.Context.UnsignedIntTy))
		Match = ArgType::NoMatchPedantic;

// %C in an Objective-C context prints a unichar, not a wchar_t.		// %C in an Objective-C context prints a unichar, not a wchar_t.
// If the argument is an integer of some kind, believe the %C and suggest		// If the argument is an integer of some kind, believe the %C and suggest
// a cast instead of changing the conversion specifier.		// a cast instead of changing the conversion specifier.
QualType IntendedTy = ExprTy;		QualType IntendedTy = ExprTy;
if (isObjCContext() &&		if (isObjCContext() &&
FS.getConversionSpecifier().getKind() == ConversionSpecifier::CArg) {		FS.getConversionSpecifier().getKind() == ConversionSpecifier::CArg) {
if (ExprTy->isIntegralOrUnscopedEnumerationType() &&		if (ExprTy->isIntegralOrUnscopedEnumerationType() &&
!ExprTy->isCharType()) {		!ExprTy->isCharType()) {
▲ Show 20 Lines • Show All 7,763 Lines • Show Last 20 Lines

clang/test/FixIt/format.m

	// RUN: %clang_cc1 -triple %itanium_abi_triple -fsyntax-only -fblocks -verify %s			// RUN: %clang_cc1 -triple %itanium_abi_triple -fsyntax-only -fblocks -verify %s -Wformat-pedantic
	// RUN: %clang_cc1 -triple %itanium_abi_triple -fdiagnostics-parseable-fixits -fblocks %s 2>&1 \| FileCheck %s			// RUN: %clang_cc1 -triple %itanium_abi_triple -fdiagnostics-parseable-fixits -fblocks %s 2>&1 -Wformat-pedantic \| FileCheck %s

	@class NSString;			@class NSString;
	extern void NSLog(NSString *, ...);			extern void NSLog(NSString *, ...);
	int printf(const char * restrict, ...) ;			int printf(const char * restrict, ...) ;

	void test_integer_correction (int x) {			void test_integer_correction (int x) {
	printf("%d", x); // no-warning			printf("%d", x); // no-warning
	printf("%s", x); // expected-warning{{format specifies type 'char *' but the argument has type 'int'}}			printf("%s", x); // expected-warning{{format specifies type 'char *' but the argument has type 'int'}}
	▲ Show 20 Lines • Show All 308 Lines • Show Last 20 Lines

clang/test/Sema/format-strings-freebsd.c

	// RUN: %clang_cc1 -fsyntax-only -verify -triple i386-unknown-freebsd %s			// RUN: %clang_cc1 -fsyntax-only -verify -triple i386-unknown-freebsd %s -Wformat-pedantic
	// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-unknown-freebsd %s			// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-unknown-freebsd %s -Wformat-pedantic
	// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-scei-ps4 %s			// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-scei-ps4 %s -Wformat-pedantic
	// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-sie-ps5 %s			// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-sie-ps5 %s -Wformat-pedantic

	// Test FreeBSD kernel printf extensions.			// Test FreeBSD kernel printf extensions.
	int freebsd_kernel_printf(const char *, ...) __attribute__((__format__(__freebsd_kprintf__, 1, 2)));			int freebsd_kernel_printf(const char *, ...) __attribute__((__format__(__freebsd_kprintf__, 1, 2)));

	void check_freebsd_kernel_extensions(int i, long l, char *s, short h)			void check_freebsd_kernel_extensions(int i, long l, char *s, short h)
	{			{
	// %b expects an int and a char *			// %b expects an int and a char *
	freebsd_kernel_printf("reg=%b\n", i, "\10\2BITTWO\1BITONE\n"); // no-warning			freebsd_kernel_printf("reg=%b\n", i, "\10\2BITTWO\1BITONE\n"); // no-warning
	Show All 36 Lines

clang/test/Sema/format-strings-pedantic.c

Show All 12 Lines	#ifdef __OBJC__
printf("%p", ^{}); // expected-warning {{format specifies type 'void *' but the argument has type 'void (^)(void)'}}		printf("%p", ^{}); // expected-warning {{format specifies type 'void *' but the argument has type 'void (^)(void)'}}
printf("%p", (id)0); // expected-warning {{format specifies type 'void *' but the argument has type 'id'}}		printf("%p", (id)0); // expected-warning {{format specifies type 'void *' but the argument has type 'id'}}
#endif		#endif

#ifdef __cplusplus		#ifdef __cplusplus
printf("%p", nullptr); // expected-warning {{format specifies type 'void *' but the argument has type 'std::nullptr_t'}}		printf("%p", nullptr); // expected-warning {{format specifies type 'void *' but the argument has type 'std::nullptr_t'}}
#endif		#endif
}		}

		void test_intentional_truncation(void) {
		int x = 42;
		printf("%hhd", x); // expected-warning {{format specifies type 'char' but the argument has type 'int'}}
		printf("%hhu", x); // expected-warning {{format specifies type 'unsigned char' but the argument has type 'int'}}
		printf("%hd", x); // expected-warning {{format specifies type 'short' but the argument has type 'int'}}
		printf("%hu", x); // expected-warning {{format specifies type 'unsigned short' but the argument has type 'int'}}
		unsigned y = 42;
		printf("%hhd", y); // expected-warning {{format specifies type 'char' but the argument has type 'unsigned int'}}
		printf("%hhu", y); // expected-warning {{format specifies type 'unsigned char' but the argument has type 'unsigned int'}}
		printf("%hd", y); // expected-warning {{format specifies type 'short' but the argument has type 'unsigned int'}}
		printf("%hu", y); // expected-warning {{format specifies type 'unsigned short' but the argument has type 'unsigned int'}}
		}

clang/test/Sema/format-strings.c

Show First 20 Lines • Show All 824 Lines • ▼ Show 20 Lines	void __attribute__((__format__(__printf__, 2, 3))) (^printf_arg2)(
vprintf(fmt, ap);		vprintf(fmt, ap);
vprintf(not_fmt, ap); // expected-warning{{format string is not a string literal}}		vprintf(not_fmt, ap); // expected-warning{{format string is not a string literal}}
va_end(ap);		va_end(ap);
};		};

printf_arg2("foo", "%s string %i\n", "aaa", 123);		printf_arg2("foo", "%s string %i\n", "aaa", 123);
printf_arg2("%s string\n", "foo", "bar"); // expected-warning{{data argument not used by format string}}		printf_arg2("%s string\n", "foo", "bar"); // expected-warning{{data argument not used by format string}}
}		}

		// These are handled by -Wformat-pedantic. See also
		// clang/test/Sema/format-strings-pedantic.c.
		void test_intentional_truncation(void) {
		int x = 42;
		printf("%hhd", x); // no-warning
		printf("%hhu", x); // no-warning
		printf("%hd", x); // no-warning
		printf("%hu", x); // no-warning
		unsigned y = 42;
		printf("%hhd", y); // no-warning
		printf("%hhu", y); // no-warning
		printf("%hd", y); // no-warning
		printf("%hu", y); // no-warning
		}