This is an archive of the discontinued LLVM Phabricator instance.

[clang] Add a warning (à la gcc) for too small enum bitfields
Needs ReviewPublic

Authored by faisalv on Nov 17 2020, 11:22 AM.

Details

Summary

Currently clang warns on 'assigning' to an enum bit-field that can not accommodate all its enumerators - but not when such a bit-field is defined.

GCC warns at definition time (https://godbolt.org/z/sKescn) - which seems like a useful thing to do, given certain programmer's propensities for memcpying.

This patch warns when such enum bit-fields are defined - and warns on unnamed bit-fields too (is that the right thing to do here?) building on work done by @rnk here https://github.com/llvm/llvm-project/commit/329f24d6f6e733fcadfd1be7cd3b430d63047c2e

Implementation Strategy:

  • add the check, modeled after Reid's check in his patch, into VerifyBitField (checks if the width expression is an ICE and it's a worthy type etc.) that gets called from CheckFieldDecl

Questions for reviewers:

  • Should we be emitting such a warning for unnamed bitfields?
  • When comparing an APSInt to an unsigned value - should i prefer using the overloaded operator (i.e. Value < BitsNeeded) or stick with BitsNeeded > Value.getZExtValue().

Thank you!

Diff Detail

Event Timeline

faisalv requested review of this revision.Nov 17 2020, 11:22 AM
faisalv created this revision.
rsmith added a subscriber: rsmith.Nov 17 2020, 2:40 PM
rsmith added inline comments.
clang/lib/Sema/SemaDecl.cpp
16443–16446

Please add a member function to EnumDecl to compute this. SemaChecking.cpp does the same calculation twice, and CGExpr.cpp does it once too, so it seems past time to factor out this calculation.

16448

When comparing an APSInt to an unsigned value - should i prefer using the overloaded operator (i.e. Value < BitsNeeded) or stick with BitsNeeded > Value.getZExtValue().

Never use get*ExtValue unless you've already checked that the integer fits in 64 bits (and even then, it's better to avoid it when you can). For example, due to the incorrect pre-existing use of getZExtValue in this function, we asserted on:

enum E {};
struct X { E e : (__int128)0x7fff'ffff'ffff'ffff * (__int128)0x7fff'ffff'ffff'ffff; };

I've fixed that in rG8e923ec2a803d54154aaa0079c1cfcf146b7a22f, but we should avoid reintroducing uses of getZExtValue here.

16454–16459

Should we be emitting such a warning for unnamed bitfields?

The warning is not useful for unnamed bit-fields, as they represent only padding and not values. We should suppress it in that case.

(Maybe we should have a warning for unnamed bit-fields of "unusual" types, since that probably indicates that a name was forgotten, but there might also be valid reasons to do that, such as to avoid the problematic corners of the MS ABI bit-field layout rule. So it seems better to suppress it entirely.)

clang/test/SemaCXX/warn-bitfield-enum-conversion.cpp
62

Please remove the FV_ prefix here -- authorship information is in the version control history if we need it.

It would also be better to write out the test twice (once with named and once with unnamed bit-fields) instead of using a macro and running the entire test file an additional time. Each clang invocation adds to our total test time much more substantially than a few more lines of testcase do, and non-macro-expanded testcases are easier to understand and maintain.

faisalv updated this revision to Diff 305970.Nov 17 2020, 9:27 PM

Based on Richards Feedback, this update includes the following changes:

  • avoids calling the fragile getZextValue() for comparing against bit-field width, and uses APSInt's comparison operator overload
  • suppresses/avoids the warning for unnamed bit-fields
  • simplifies the test case and avoids preprocessor cleverness (and thus an extra pass).

Thank you for taking the time to review this Richard!

faisalv planned changes to this revision.Nov 17 2020, 9:34 PM
faisalv marked 3 inline comments as done.
faisalv added inline comments.
clang/lib/Sema/SemaDecl.cpp
16443–16446

Done. Will try and commit a separate patch that replaces those uses, once this one has been approved.

16448

Aah! I'm glad I asked. Thanks!

clang/test/SemaCXX/warn-bitfield-enum-conversion.cpp
62

Please remove the FV_ prefix here -- authorship information is in the version control history if we need it.

Hah - that wasn't a signifier of authorship but rather employment of a well known acronym in military circles (Field Verification : https://www.acronymfinder.com/Field-Verification-(Census)-(FV).html ) Lol - can't slip anything by you Richard ;)

It would also be better to write out the test twice (once with named and once with unnamed bit-fields) instead of using a macro and running the entire test file an additional time. Each clang invocation adds to our total test time much more substantially than a few more lines of testcase do, and non-macro-expanded testcases are easier to understand and maintain.

Good to know. Thanks. Suppression of the warning for unnamed bitfields made that test case even simpler.

faisalv requested review of this revision.Nov 23 2020, 7:35 PM
faisalv marked 4 inline comments as done.

*ping*

Do you have any numbers on false positives / true positives uncovered by this tweak? In general, warning at use time instead of at declaration time tends to be much better for this rate, and we do this differently than gcc in several instances, because the gcc way has so many false positives that it makes warnings unusable, while warning on use makes the warning useful. So I'd like to see a better justification than "gcc does it" :)

Do you have any numbers on false positives / true positives uncovered by this tweak?

That's a great question - and unfortunately not only do I have no hard data to support or discourage the addition of such a warning - I don't even know how some of those incredible folks (who make data based claims about programming smells and needs) gather such data (leave alone making sure the data is a representative sample for the "right" kind of programmers) - and would love to be shown how to run such studies :)

In general, warning at use time instead of at declaration time tends to be much better for this rate, and we do this differently than gcc in several instances, because the gcc way has so many false positives that it makes warnings unusable, while warning on use makes the warning useful. So I'd like to see a better justification than "gcc does it" :)

Aah - i did not realize that it was a deliberate decision not to implement such a warning at definition time. My justification (asides from gcc being the light for us in dark places, when all other lights go out ;) for implementing a warning at definition time probably just stems from an instinctual preference for early diagnostics - i suppose there is no one size that fits all here - for e.g. some folks prefer run-time (duck-typing) operation checking vs compile-time checking, and others who prefer diagnosing fundamental issues with template-code when they are first parsed, as opposed to waiting for some instantiation - and then there is the entire C++0X concepts vs concepts-lite discussion ...

So, since I feel I lack the authority to justify such a warning (asides from my ideological propensities) in the face of your valid concerns (either anecdotal or fueled by sufficient data) - I would prefer to defer to you and withdraw the patch :) (unless someone else feels they are able to provide justification that might resonate with you).

Thanks for chiming in!