This is an archive of the discontinued LLVM Phabricator instance.

Differential D126960

[clang][sema] Unary not boolean to int conversion
Needs Review · Public

Authored by AshleyRoll on Jun 3 2022, 5:41 AM.

Details

Reviewers

rsmith
aaron.ballman
erichkeane

Summary

I have modified GetExprRange() to capture the promotion
of boolean values to ints when applying unary not, so that
-Wsign-compare identifies unsigned/signed comparisons, as
reported in:

https://github.com/llvm/llvm-project/issues/18878
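
To illustrate the kind of comparison this patch targets, here is a minimal sketch. The helper names are hypothetical, a two's complement `int` is assumed, and the conversions are written out explicitly instead of relying on the implicit ones. `!b` yields 0 or 1, `~` applied to that value after promotion to `int` gives -1 or -2, and comparing against an `unsigned` operand converts the negative result to a very large unsigned value.

```cpp
// Returns the right-hand side of `a > ~(!b)` as the comparison sees it.
// Hypothetical helper; each implicit conversion is spelled out.
unsigned rhs_as_unsigned(int b) {
  int lnot = !b ? 1 : 0;   // value of !b after promotion to int: 0 or 1
  int promoted = ~lnot;    // -1 or -2 on a two's complement int
  // The usual arithmetic conversions turn the int into unsigned.
  return static_cast<unsigned>(promoted);
}

// Same shape as the comparison `a > ~(!b)` in the new test.
bool unsigned_exceeds_not_of_lnot(unsigned a, int b) {
  return a > rhs_as_unsigned(b);
}
```

With `a == 1` and `b == 0`, for instance, the comparison is false: the right-hand side is -2 converted to `unsigned`, which is one below the maximum `unsigned` value.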

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

AshleyRoll created this revision. Jun 3 2022, 5:41 AM

Herald added a project: Restricted Project. Jun 3 2022, 5:41 AM

AshleyRoll requested review of this revision. Jun 3 2022, 5:41 AM

Herald added a project: Restricted Project. Jun 3 2022, 5:41 AM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B167701: Diff 434004. Jun 3 2022, 6:17 AM

Rebased to remove clangd test failure

Harbormaster completed remote builds in B167883: Diff 434250. Jun 4 2022, 12:39 AM

junaire added reviewers: aaron.ballman, erichkeane. Jun 10 2022, 6:13 AM

erichkeane added inline comments. Jun 10 2022, 6:27 AM

clang/lib/Sema/SemaChecking.cpp
12328–12335	Richard mentions UO_PreInc, UO_PreDec, UO_Minus, and UO_Not. What about the other 3?
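12330	Suggested edit: change the comment `// unary not promotes boolean to integer` to `// Unary bitwise not promotes boolean to integer.`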
12332	Can you trace what MaxWidth ends up being every time? Are we sure this will always just be int-width and not long long or something?

aaron.ballman added inline comments. Jun 10 2022, 7:52 AM

clang/lib/Sema/SemaChecking.cpp
12328–12335	`UO_Plus` also.

Thank you for the patch! This is certainly an improvement but I think there are still some cases where we compute the wrong range for ~ with this patch applied.

clang/lib/Sema/SemaChecking.cpp (inline comments by rsmith)
12328–12335	Taking those in turn:

- `UO_PreInc` and `UO_PreDec` can only produce values in the range of the lvalue operand; we'll never have a more precise range for an lvalue than the range from its type unless it's a bit-field, in which case we do want to take the bit-field's width into account. So I think the current code is actually correct for those two -- looking at the range of the operand gives the right result even for bit-fields, whereas looking at the type of the operand would give a less precise result in that case.
- `UO_Minus` is not giving the right answer. I think we should model `-expr` in exactly the same way we model `0 - expr` -- see the code for `BO_Sub` above.
- `UO_Plus` seems OK with the current code: the range of values of `+expr` is the same as the range of values of `expr`, even if a promotion happens.

In any case, I don't think we should have a `default` here that returns the width of the operand. Instead, I think the `default` should return the range for the type of `E`.
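
The `UO_Minus` point can be checked at the value level. The sketch below is a toy model, not Clang's `IntRange` or its `BO_Sub` handling: `Bounds`, `negate`, and `signed_width` are hypothetical names, and a two's complement representation is assumed. It computes the bounds of `-x` as `0 - x` and shows that the result can fall outside the operand's own range.

```cpp
// Inclusive [lo, hi] bounds; a toy stand-in, not Clang's IntRange.
struct Bounds { int lo; int hi; };

// Bounds of `-x` for x in [lo, hi], computed the same way as `0 - x`.
constexpr Bounds negate(Bounds b) { return Bounds{0 - b.hi, 0 - b.lo}; }

// Smallest signed two's complement width that holds every value in b.
constexpr unsigned signed_width(Bounds b) {
  unsigned w = 1;
  // Widen until [-2^(w-1), 2^(w-1) - 1] covers [b.lo, b.hi].
  while (b.lo < -(1LL << (w - 1)) || b.hi > (1LL << (w - 1)) - 1)
    ++w;
  return w;
}
```

In this model a signed 8-bit operand, [-128, 127], negates to [-127, 128] and needs 9 bits, while a non-negative operand such as [0, 255] negates to [-255, 0], so reusing the operand's non-negative range for `-expr` would be wrong.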
12334	I don't think falling through and picking up the expression range of the operand is ever correct for `~`. For example:

  bool f(char c) {
    return ~c > 0x10000;
  }

... produces a bogus tautological comparison warning. (This is not tautological: the `~` operator will map negative `char`s to `int`s in the range [2^31 - 2^7, 2^31) and non-negative `char`s to `int`s in the range [-2^31, -2^31+2^7), so this is equivalent to `c < 0`.)

Also, using `MaxWidth` here is conservatively correct but isn't precise; for example, we should still warn on:

  bool f(bool b) {
    return ~b > 0x1'0000'0000LL;
  }

... because `~b` always fits in an `int`, but I think `MaxWidth` here will be 64 so we won't warn. (It'd be nice to warn even on `~b > 0` but I don't think we can do that without some major changes to how `IntRange` is represented.)

The true result range of `~n` will be something like [-2^N, -2^N + 2^M) u [2^N - 2^M, 2^N) if the input is signed. `IntRange` can't represent a range with a hole in the middle like that, but it can represent [-2^N, 2^N), which seems like the least wrong answer that we can give -- that is notably also the entire range of the type of the `~` expression. We get a contiguous range [-2^N, -2^N + 2^M) if the input is known non-negative, but `IntRange` still can't represent that, and so that doesn't help us make our result any more precise than using the entire range of the type of `E`, unfortunately.

So I think `UO_Not` should never look at its subexpression -- the best result we can give is `IntRange::forValueOfType(C, GetExprType(E))`, and that's what we should use.
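
The ranges discussed in this comment can be explored at the value level. The sketch below is a toy model, not Clang's `IntRange`: the `Bounds` struct and the `bitwise_not` helper are hypothetical names, and it assumes a two's complement `int`, for which `~x` equals `-x - 1`. Under that assumption `~b` for a `bool` is always -1 or -2, so both `~b > 0x1'0000'0000LL` and `~b > 0` are always false, and an unsigned 8-bit operand yields values in [-256, -1], which the operand's own non-negative range cannot describe. For a signed 8-bit operand the same identity maps [-128, 127] onto itself, so the ranges quoted for the `char` example may be worth re-deriving.

```cpp
// Inclusive [lo, hi] bounds; a toy stand-in, not Clang's IntRange.
struct Bounds { int lo; int hi; };

// Bounds of `~x` for x in [lo, hi] after promotion to int.
// Relies on ~x == -x - 1 (two's complement), which reverses the ordering,
// so the new lower bound comes from the old upper bound and vice versa.
constexpr Bounds bitwise_not(Bounds b) { return Bounds{~b.hi, ~b.lo}; }
```

For example, `bitwise_not(Bounds{0, 1})` models `~b` for a `bool` operand, and `bitwise_not(Bounds{0, 255})` models `~` applied to an unsigned 8-bit operand.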

@AshleyRoll are you still working on this patch?

I've not been able to get the time for a while, I hope to be able to spend some more time on it, but I'd probably do that through a GitHub PR now. If someone else wants to take it on, I'd be fine with that.

In D126960#4643979, @AshleyRoll wrote:

If someone else wants to take it on, I'd be fine with that.

You can resume working with this patch, if you want. It's just a reminder. )

Revision Contents

Path	Size
clang/lib/Sema/SemaChecking.cpp	7 lines
clang/test/Sema/compare.c	22 lines

Diff 434004

clang/lib/Sema/SemaChecking.cpp

if (const auto *UO = dyn_cast<UnaryOperator>(E)) {
  ...

  // Boolean-valued operations are white-listed.
  case UO_LNot:
    return IntRange::forBoolType();

  // Operations with opaque sources are black-listed.
  case UO_Deref:
  case UO_AddrOf: // should be impossible
    return IntRange::forValueOfType(C, GetExprType(E));

+ case UO_Not:
+   // unary not promotes boolean to integer

erichkeane (suggested edit):
  - // unary not promotes boolean to integer
  + // Unary bitwise not promotes boolean to integer.

+   if (UO->getSubExpr()->isKnownToHaveBooleanValue())

+     return IntRange(MaxWidth, false);

erichkeane: Can you trace what MaxWidth ends up being every time? Are we sure this will always just be int-width and not long long or something?

+   LLVM_FALLTHROUGH;

rsmith: I don't think falling through and picking up the expression range of the operand is ever correct for `~`; `UO_Not` should never look at its subexpression and should use `IntRange::forValueOfType(C, GetExprType(E))`. (Full comment: Event Timeline, line 12334.)

erichkeane: Richard mentions UO_PreInc, UO_PreDec, UO_Minus, and UO_Not. What about the other 3?

aaron.ballman: `UO_Plus` also.

rsmith: `UO_PreInc`, `UO_PreDec`, and `UO_Plus` seem OK with the current code; `UO_Minus` should be modeled in the same way as `0 - expr`; the `default` should return the range for the type of `E`. (Full comment: Event Timeline, lines 12328–12335.)

  default:
    return GetExprRange(C, UO->getSubExpr(), MaxWidth, InConstantContext,
                        Approximate);
  }

if (const auto *OVE = dyn_cast<OpaqueValueExpr>(E))
  return GetExprRange(C, OVE->getSourceExpr(), MaxWidth, InConstantContext,

clang/test/Sema/compare.c

  };

  void pr36008(enum PR36008EnumTest lhs) {
    __typeof__(lhs) x = lhs;
    __typeof__(kPR36008Value) y = (kPR36008Value);
    if (x == y) x = y; // no warning
    if (y == x) y = x; // no warning
  }
+
+ int warn_on_different_sign_after_unary_operator(unsigned a, int b) {
+   return
+     // unary not promotes boolean to int
+     (a > ~(!b)) // expected-warning {{comparison of integers of different signs: 'unsigned int' and 'int'}}
+     &&
+     (a > -(b)) // expected-warning {{comparison of integers of different signs: 'unsigned int' and 'int'}}
+     &&
+     (a > ++b) // expected-warning {{comparison of integers of different signs: 'unsigned int' and 'int'}}
+     &&
+     (a > --b) // expected-warning {{comparison of integers of different signs: 'unsigned int' and 'int'}}
+     &&
+     // unary not promotes boolean to int
+     (b > ~(!a)) // no warning
+     &&
+     (b > -(a)) // expected-warning {{comparison of integers of different signs: 'int' and 'unsigned int'}}
+     &&
+     (b > ++a) // expected-warning {{comparison of integers of different signs: 'int' and 'unsigned int'}}
+     &&
+     (b > --a) // expected-warning {{comparison of integers of different signs: 'int' and 'unsigned int'}}
+     ;
+ }
