This is an archive of the discontinued LLVM Phabricator instance.

Fix Wbitfield-constant-conversion on 1-bit signed bitfield
ClosedPublic

Authored by ShawnZhong on Aug 5 2022, 5:28 AM.

Diff Detail

Event Timeline

ShawnZhong created this revision.Aug 5 2022, 5:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2022, 5:28 AM
ShawnZhong requested review of this revision.Aug 5 2022, 5:28 AM
ShawnZhong updated this revision to Diff 450274.Aug 5 2022, 5:43 AM
aaron.ballman added reviewers: aaron.ballman, Restricted Project.Aug 5 2022, 6:00 AM
aaron.ballman added a subscriber: aaron.ballman.

Thank you for working on this! I think this is going in the right direction, but I think there's some more work needed here because the rules depend on the type of the bit-field (at least in C, I've not checked C++ yet). See C2x 6.7.2.1p12 (and it's footnote): "A bit-field is interpreted as having a signed or unsigned integer type consisting of the specified number of bits. ..." (and the footnote goes on to say "As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int, then it is implementation-defined whether the bit-field is signed or unsigned. ...").

Assuming that we actually do treat plain int bit-fields as being signed (which I'm pretty sure we do), then I agree that these changes are fine. But you should add some CodeGen test coverage (or find existing coverage) that proves how we behave in that case and an int test in Sema showing we match the codegen behavior.

Also, can you add a release note for the fix?

clang/test/Sema/constant-conversion.c
145

Thanks for the quick reply and the reference on the C standard!

On the C++ side, Section C.1.8 specified that int bit-fields are signed:

Change: Bit-fields of type plain int are signed.
Rationale: Leaving the choice of signedness to implementations could lead to inconsistent definitions of template specializations. For consistency, the implementation freedom was eliminated for non-dependent types, too.
Effect on original feature: The choise is implementation-defined in C, but not so in C++.

Implementation-wise, I'll see what I can find in CodeGen on whether int bit-fields are signed for C

Thanks for the quick reply and the reference on the C standard!

On the C++ side, Section C.1.8 specified that int bit-fields are signed:

Change: Bit-fields of type plain int are signed.
Rationale: Leaving the choice of signedness to implementations could lead to inconsistent definitions of template specializations. For consistency, the implementation freedom was eliminated for non-dependent types, too.
Effect on original feature: The choise is implementation-defined in C, but not so in C++.

Agreed; I think that's falling out from https://eel.is/c++draft/class.bit#4.sentence-1 and https://eel.is/c++draft/basic.fundamental#1.sentence-1.

Implementation-wise, I'll see what I can find in CodeGen on whether int bit-fields are signed for C

Thanks!

ShawnZhong updated this revision to Diff 450468.Aug 5 2022, 9:36 PM
  • Added CodeGen tests.
  • Fixed failed tests that assigned 1 to int:1.
  • Modified the Sema test to use int: instead of signed char:1.
  • Added release note.

I did a pass through, this seems like it is the right thing. I looked over the codegen tests, and I'm convinced it does what we want from that perspective. I'll leave it to @aaron.ballman to take another look for +1'ing this though.

aaron.ballman accepted this revision.Aug 8 2022, 8:39 AM

LGTM! Do you need someone to commit on your behalf? If so, what name and email address would you like used for patch attribution?

This revision is now accepted and ready to land.Aug 8 2022, 8:39 AM
ShawnZhong marked an inline comment as done.Aug 9 2022, 1:11 AM

LGTM! Do you need someone to commit on your behalf? If so, what name and email address would you like used for patch attribution?

I'm new to Phabricator. I don't need someone else to commit on my behalf, but I guess I don't have permission to merge a commit on my own.

If the patch submitted misses author information for some reason, feel free to use Shawn Zhong <github@shawnzhong.com>.

thakis added a subscriber: thakis.Aug 11 2022, 3:04 PM
thakis added inline comments.
clang/lib/Sema/SemaChecking.cpp
13069

This was to suppress false positives. All instances we've seen of this are in fact false positives.

Was there any analysis on true / false positives for this change?

At least for our code, this seems to make the warning strictly worse.

aaron.ballman added inline comments.Aug 12 2022, 5:14 AM
clang/lib/Sema/SemaChecking.cpp
13069

I've not performed any such analysis, but it would be good to know. FWIW, this is the kind of situation I was thinking this diagnostic would help make far more clear: https://godbolt.org/z/336n9xex3 (not that I expect people to write that static assert, but I very much expect people who assign the value 1 into a bit-field and then check for the value 1 to come back out are going to be surprised).

That said, another approach to that specific scenario, which might have a better true positive rate would be:

struct S {
  int bit : 1;
};

int main() {
  constexpr S s{1}; // No warning
  if (s.bit == 1) { // Warn about explicit comparison of a 1-bit bit-field
    ...
  } else if (s.bit == 0) { // Warn about explicit comparison of a 1-bit bit-field?
    ...
  } else if (s.bit) { // No warning
    ...
  } else if (!s.bit) { // No warning
    ...
  }
}
ShawnZhong added inline comments.Aug 12 2022, 5:56 AM
clang/lib/Sema/SemaChecking.cpp
13069

Do you have something in mind that counts as false positives?

13069

BTW, I realized that no warning is reported when a bool is assigned to a single-bit signed bit-field:

https://godbolt.org/z/M5ex48T8s

int main() {
  struct S { int b : 1; } s;
  s.b = true;
  if (s.b == true) { puts("T"); } else { puts("F"); }
}
ShawnZhong added inline comments.Aug 12 2022, 6:34 AM
clang/lib/Sema/SemaChecking.cpp
13069

For reference, I found this issue on chromium:

https://bugs.chromium.org/p/chromium/issues/detail?id=1352183

aaron.ballman added inline comments.Aug 12 2022, 7:30 AM
clang/lib/Sema/SemaChecking.cpp
13069

Do you have something in mind that counts as false positives?

In my example above, I consider the use of s.bit and !s.bit to count as false positives. The value in the bit-field being 1 or -1 has no bearing on the semantics of the program, the only thing that matters is zero vs nonzero because of the boolean conversion.

BTW, I realized that no warning is reported when a bool is assigned to a single-bit signed bit-field:

Good catch, the conversion from bool to integer does give the value 1 explicitly. At the same time, I would consider the assignment to the bit-field from a boolean to be a non-issue, the problem only stems from attempting to use inspect the value.

aaron.ballman added inline comments.Aug 13 2022, 6:45 AM
clang/lib/Sema/SemaChecking.cpp
13069

BTW, I realized that no warning is reported when a bool is assigned to a single-bit signed bit-field:

Good catch, the conversion from bool to integer does give the value 1 explicitly. At the same time, I would consider the assignment to the bit-field from a boolean to be a non-issue, the problem only stems from attempting to use inspect the value.

Actually, that's a more interesting case than I had originally realized. See C2x 6.7.2.1p12. "If the value 0 or 1 is stored into a nonzero-width bit-field of type bool, the value of the bit-field shall compare equal to the value stored; a bool bit-field has the semantics of a bool."

So if you store a 1 into a bool bit-field, you have to get a 1 back out of it. The bit-field is implicitly unsigned, effectively.

ShawnZhong added inline comments.Aug 13 2022, 9:38 AM
clang/lib/Sema/SemaChecking.cpp
13069

In my example above, I consider the use of s.bit and !s.bit to count as false positives. The value in the bit-field being 1 or -1 has no bearing on the semantics of the program, the only thing that matters is zero vs nonzero because of the boolean conversion.

I agree with you that if s.bit is only used to be converted to bool, then the value stored being 1 or -1 does not matter that much. But hypothetically, one could use s.bit for arithmetic (e.g., index into an array), assignment (say into another int or enum), or, in your example, comparison against another value. If a warning is not generated during conversion, handling them in other places would be harder.

Actually, that's a more interesting case than I had originally realized. See C2x 6.7.2.1p12. "If the value 0 or 1 is stored into a nonzero-width bit-field of type bool, the value of the bit-field shall compare equal to the value stored; a bool bit-field has the semantics of a bool."
So if you store a 1 into a bool bit-field, you have to get a 1 back out of it. The bit-field is implicitly unsigned, effectively.

That makes sense to me. I'm not particularly concerned with assigning 1 to bool:1 or any uint*_t:1 in general. People may want to use them to compactly store bools in a struct.

I realized that no warning is reported when bool is assigned to a single-bit signed bit-field

The issue I referred to is the following:

int main() {
  struct S { int b : 1; } s;
  s.b = 1;     // warning generated after this patch
  s.b = true;  // still no warning, but there should be
}
aaron.ballman added inline comments.Aug 15 2022, 4:35 AM
clang/lib/Sema/SemaChecking.cpp
13069

If a warning is not generated during conversion, handling them in other places would be harder.

Harder, but with less false positives. We try to keep the false positive rate as low as we can for diagnostics in clang so that users are less tempted to entirely disable the diagnostic as being low-value.

The issue I referred to is the following:

int main() {
  struct S { int b : 1; } s;
  s.b = 1;     // warning generated after this patch
  s.b = true;  // still no warning, but there should be
}

On the one hand, yes -- pedantically, that is true. On the other hand, does use of a boolean indicate programmer intent that this field be treated as a boolean, and thus the actual value stored doesn't matter?

At this point, I think we need some evidence that this is finding true positives instead of only false positives. We've already heard about Chromium, but I think you should try against some other larger C and C++ code bases to see what the true positive rate is. If there's evidence that this it catching actual bugs in practice, then it strengthens the case for the current design. If we don't find many true positives but find it increases the false positive rate, we should either rework this patch or back it out entirely.

(I don't have a particular corpus in mind, but I'm thinking things like OpenSSL or sqlite3 for C code, and LLVM or Qt for C++ code in terms of size.)

bjope added a subscriber: bjope.Aug 16 2022, 5:30 AM

FWIW, just some things noticed when I examined some of the new warning that popped up after this patch:

https://github.com/llvm/llvm-project/issues/53253 mentioned that for example gcc complained about this. Although, as shown here https://godbolt.org/z/bq34Kexac there are some other differences that clang now complains already with -Wall, but that is not the case with gcc (I think one need to enable -Wpedantic in gcc to get a warning about signed overflow).

A similar warning (when assigning -1 to an unsigned bitfield) can be given by both gcc and clang by using -Wsign-conversion but that is not part of -Wall either. But maybe that is a totally different thing.

After some more thought and some offline discussions, I think I have a reasonable way forward: let's add -Wsingle-bit-bitfield-constant-conversion as a new warning group under -Wbitfield-constant-conversion that controls the diagnostic for one-bit bitfields. The diagnostic behavior here is pedantically correct and will help catch some bugs for folks, but there's enough existing code using 1 and not misbehaving (probably because they're not inspecting the value except in a boolean context) that having separate control seems useful.

What do others think?

A similar warning (when assigning -1 to an unsigned bitfield) can be given by both gcc and clang by using -Wsign-conversion but that is not part of -Wall either. But maybe that is a totally different thing.

It's a different thing. When the bit-field is unsigned, the conversion from signed is well-defined and so we only issue a -Wconversion warning, but it doesn't technically truncate the bit-field. That's also why we don't diagnose: https://godbolt.org/z/WW9477jfG

After some more thought and some offline discussions, I think I have a reasonable way forward: let's add -Wsingle-bit-bitfield-constant-conversion as a new warning group under -Wbitfield-constant-conversion that controls the diagnostic for one-bit bitfields. The diagnostic behavior here is pedantically correct and will help catch some bugs for folks, but there's enough existing code using 1 and not misbehaving (probably because they're not inspecting the value except in a boolean context) that having separate control seems useful.

What do others think?

I went ahead and posted https://reviews.llvm.org/D132851.