Page MenuHomePhabricator

[clang-tidy] Fix cppcoreguidelines-init-variables by removing the enum FixIt, and add support for initialization check of scoped enum.
ClosedPublic

Authored by Sockke on Jul 21 2021, 4:49 AM.

Details

Summary

In C++, the enumeration is never Integer, and the enumeration condition judgment is added to avoid compiling errors when it is initialized to an integer.
Add support for initialization check of scope enum.
As the following case show, clang-tidy will give a wrong automatic fix:

enum Color {Red, Green, Blue};
enum class Gender {Male, Female};
void func() {
  Color color; // Color color = 0; <--- fix bug
  Gender gender; // <--- no warning
}

Diff Detail

Event Timeline

Sockke created this revision.Jul 21 2021, 4:49 AM
Sockke requested review of this revision.Jul 21 2021, 4:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2021, 4:49 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Is this the right decision to make, conceptually? It will leave the variable uninitialised still, and reading such an uninit variable is still an issue, even if it is an enum. Could we consider the alternative of warning the user about the uninitialized variable, just not offering an automatic (and potentially bad / incorrect) fix?

steven.zhang added inline comments.Jul 21 2021, 4:58 AM
clang-tools-extra/test/clang-tidy/checkers/cppcoreguidelines-init-variables.cpp
103

Technical speaking, we should warn about such case, ad the color is undefined now. Can we initialize it to the lowerest value of the enum type ?

MTC added a comment.EditedJul 21 2021, 4:59 AM

Is this the right decision to make, conceptually? It will leave the variable uninitialised still, and reading such an uninit variable is still an issue, even if it is an enum.

Yeah, that's right. However, it's much more difficult to give enum an initial value than an integer.

Could we consider the alternative of warning the user about the uninitialized variable, just not offering an automatic (and potentially bad / incorrect) fix?

Make sense, we (ByteDance) are also hesitating whether we should provide automatic fix for uninitialized variables, because automatic fix may change the program semantics.

MTC added inline comments.Jul 21 2021, 5:02 AM
clang-tools-extra/test/clang-tidy/checkers/cppcoreguidelines-init-variables.cpp
103

Good point! However, which value should be chosen as automatic fix is a problem worth discussing.

Is this the right decision to make, conceptually? It will leave the variable uninitialised still, and reading such an uninit variable is still an issue, even if it is an enum.

Yeah, that's right. However, it's much more difficult to give enum an initial value than an integer.

Could we consider the alternative of warning the user about the uninitialized variable, just not offering an automatic (and potentially bad / incorrect) fix?

Make sense, we (ByteDance) are also hesitating whether we should provide automatic repair for uninitialized variables, because automatic fix may change the program semantics.

This is the same as what we did for integer in essential. We are changing an uninitialized value to the default one, which makes sense as uninitialized means it could be anything.

Sockke edited the summary of this revision. (Show Details)Jul 21 2021, 8:34 PM
Sockke edited the summary of this revision. (Show Details)
Sockke edited the summary of this revision. (Show Details)Jul 21 2021, 8:38 PM
Sockke edited the summary of this revision. (Show Details)

At present, Our views on preventing UB are basically the same, so a warning still needs to be reported (BTW, the original version will not report warnings for enum class types). The final question is what is the recommended value for initialization and whether to provide the automatic fix. Hope you can give us a direction, thanks!@aaron.ballman

The problem with enums is that translating zero (0, 0.0, nullptr, etc...) to the enum case is not always apparent. A warning should always be given. And if you can find a zero member in the enum, we can report an automated suggestion for that.

To give an example

enum class BaseColour { BLACK, RED, GREEN, BLUE, WHITE };

BLACK is the 0, so we can fix-it. If the 0 is the first member (usually in enums that have a well-defined "default" or "None" case...), then, it is alright to use that as the automated fix.

However, if there isn't a "zero case" in the enum, for example:

enum class Something {
  FOO = -2,
  BAR, // This will be -1
  BAZ = 84,
  QUX  // This will be 85
};

then simply using the first member might not be the best fitting action, so in such case, I would opt for no FixIt. But, obviously, yes for giving a warning.

The problem with enums is that translating zero (0, 0.0, nullptr, etc...) to the enum case is not always apparent. A warning should always be given. And if you can find a zero member in the enum, we can report an automated suggestion for that.

I think we shouldn't give any fix-it for enumerations. Zero doesn't always mean "the right default value" -- for example, another common idiom is for the *last* member of the enumeration to be a sentinel value.

clang-tools-extra/test/clang-tidy/checkers/cppcoreguidelines-init-variables.cpp
103

FWIW, I'd expect a warning but no fix-it for these.

The problem with enums is that translating zero (0, 0.0, nullptr, etc...) to the enum case is not always apparent. A warning should always be given. And if you can find a zero member in the enum, we can report an automated suggestion for that.

I think we shouldn't give any fix-it for enumerations. Zero doesn't always mean "the right default value" -- for example, another common idiom is for the *last* member of the enumeration to be a sentinel value.

I agree with you, but we need to consider that the checker works in a way that it gives the "zero" for integers. If we are here, was that the right decision? I mean... I wonder how much consistency we should shoot for. (nullptr for the pointers as default is at least somewhat sensible.)

But definitely, the warning must be given, that is true.

The problem with enums is that translating zero (0, 0.0, nullptr, etc...) to the enum case is not always apparent. A warning should always be given. And if you can find a zero member in the enum, we can report an automated suggestion for that.

I think we shouldn't give any fix-it for enumerations. Zero doesn't always mean "the right default value" -- for example, another common idiom is for the *last* member of the enumeration to be a sentinel value.

I agree with you, but we need to consider that the checker works in a way that it gives the "zero" for integers. If we are here, was that the right decision?

If we're here, I think the right decision would have been to not support the C++ Core Guidelines in clang-tidy in the first place because of the lack of attention that's been paid to enforcement for the rules by the authors. ;-) But snark aside, I think = 0 is a somewhat defensible naïve default value for integers, but today I would prefer it give its fix-it on a note rather than on the warning. When there's only one fix-it to be applied to a warning, it's possible to apply the fix-it without any per-case user intervention and that default value isn't statically known to be safe. With nullptr for pointers, it's more defensible because that's a well-understood sentinel value that code *should* be checking for (and there's good tooling to catch null pointer dereferences for the other situations). With NAN for floats, it's more defensible because that (usually!) is a "sticky" value that poisons further uses of the type. There's nothing quite like that for integers.

I mean... I wonder how much consistency we should shoot for. (nullptr for the pointers as default is at least somewhat sensible.)

There are at least two ways to chase consistency here -- one is being consistent with other fix-its in the check, the other is being consistent with other fix-its across the project. My feeling is we should aim for consistency with the rest of the toolchain, which is to only issue a fix-it on a warning when the fix is known to be correct. https://clang.llvm.org/docs/InternalsManual.html#fix-it-hints

Perhaps one way forward would be to issue the integer fix-it on a note rather than a warning, and for enumerations, we could issue up to two fix-its on a note, one for the first and one for the last enumerator in an enumeration (and if the enum only contains one enumerator, there's only one fix-it to generate which suggests it could be on the warning rather than a note, but that seems like a lot of trouble for an unlikely scenario). However, I don't recall how clang-tidy interacts with fix-its on notes off the top of my head, so I'm making an assumption that clang-tidy's automatic fixit applying mode handles notes the same way as clang and we should double-check that assumption.

Another way forward would be to not issue a fix-it for integers or enumerations.

However, I don't recall how clang-tidy interacts with fix-its on notes off the top of my head, so I'm making an assumption that clang-tidy's automatic fixit applying mode handles notes the same way as clang and we should double-check that assumption.

I have one information from January that if you're viewing the diagnostic output as a sequence of [warning, note, note, ...] elements (so you "group by" warning), Clang-Tidy will apply the first fix (be it on the warning or the note) in the order of diag() calls. (There was a (never-upstreamed) check in which I had to abuse this fact.) This behaviour could've changed, though...

Another way forward would be to not issue a fix-it for integers or enumerations.

This might be the best course of action, and could be fixed in the same patch (this one)...

However, I don't recall how clang-tidy interacts with fix-its on notes off the top of my head, so I'm making an assumption that clang-tidy's automatic fixit applying mode handles notes the same way as clang and we should double-check that assumption.

I have one information from January that if you're viewing the diagnostic output as a sequence of [warning, note, note, ...] elements (so you "group by" warning), Clang-Tidy will apply the first fix (be it on the warning or the note) in the order of diag() calls. (There was a (never-upstreamed) check in which I had to abuse this fact.) This behaviour could've changed, though...

Another way forward would be to not issue a fix-it for integers or enumerations.

This might be the best course of action, and could be fixed in the same patch (this one)...

I think that's a reasonable way forward, though I don't insist on changing the integer behavior if others have strong opinions that it is correct. I do have strong opinions on fixing the enumeration behavior because that fix-it is wrong for C++ code.

However, I don't recall how clang-tidy interacts with fix-its on notes off the top of my head, so I'm making an assumption that clang-tidy's automatic fixit applying mode handles notes the same way as clang and we should double-check that assumption.

I have one information from January that if you're viewing the diagnostic output as a sequence of [warning, note, note, ...] elements (so you "group by" warning), Clang-Tidy will apply the first fix (be it on the warning or the note) in the order of diag() calls. (There was a (never-upstreamed) check in which I had to abuse this fact.) This behaviour could've changed, though...

Another way forward would be to not issue a fix-it for integers or enumerations.

This might be the best course of action, and could be fixed in the same patch (this one)...

I may be more supportive of the first view. Based on clang -Wuninitialized, we hope that clang-tidy can do more. In addition, if there are two recommended values for enum initialization, we can put them in one note.

Any thoughts? : )

Any thoughts? : )

First, let's first fix that we should still warn for the uninitialised enum case, without a FixIt. That's the issue at hand, right now, Clang-Tidy generates, as you identified, broken output. We can discuss the later steps after this is fixed. Please implement this logic, and update the patch, so we have a snapshot of how that would look like and the thing working.

Afterwards, as Aaron suggested:

for enumerations, we could issue up to two fix-its on a note, one for the first and one for the last enumerator in an enumeration (and if the enum only contains one enumerator, there's only one fix-it to generate which suggests it could be on the warning rather than a note, but that seems like a lot of trouble for an unlikely scenario). However, I don't recall how clang-tidy interacts with fix-its on notes off the top of my head, so I'm making an assumption that clang-tidy's automatic fixit applying mode handles notes the same way as clang and we should double-check that assumption.

This might require nontrivial changes to the check's code, and investigating the potential problem with automated fix-it application when multiple conflicting fix-its are given on a note (not a warning line).

I think we all would be fine with only doing the first step (reintroduce the warning, without a fixit) in this patch, so it can be merged and hit the the mainline code quickly. The next step can be its own patch. 🙂


In addition, I wager that adding a line about this fix to the release notes at clang-tools-extra/docs/ReleaseNotes.rst is a good option, I can imagine people having turned this check off due to it being broken on enums.

Any thoughts? : )

First, let's first fix that we should still warn for the uninitialised enum case, without a FixIt. That's the issue at hand, right now, Clang-Tidy generates, as you identified, broken output. We can discuss the later steps after this is fixed. Please implement this logic, and update the patch, so we have a snapshot of how that would look like and the thing working.

+1

Afterwards, as Aaron suggested:

for enumerations, we could issue up to two fix-its on a note, one for the first and one for the last enumerator in an enumeration (and if the enum only contains one enumerator, there's only one fix-it to generate which suggests it could be on the warning rather than a note, but that seems like a lot of trouble for an unlikely scenario). However, I don't recall how clang-tidy interacts with fix-its on notes off the top of my head, so I'm making an assumption that clang-tidy's automatic fixit applying mode handles notes the same way as clang and we should double-check that assumption.

This might require nontrivial changes to the check's code, and investigating the potential problem with automated fix-it application when multiple conflicting fix-its are given on a note (not a warning line).

I think we all would be fine with only doing the first step (reintroduce the warning, without a fixit) in this patch, so it can be merged and hit the the mainline code quickly. The next step can be its own patch. 🙂


In addition, I wager that adding a line about this fix to the release notes at clang-tools-extra/docs/ReleaseNotes.rst is a good option, I can imagine people having turned this check off due to it being broken on enums.

I think, the intention of the auto fix for such case is to make the program's behavior well-defined, not with some uninitialize value that cannot be predicted. It is ok as far as we select any enumerator to initialize it from my understanding. We should separate it into anther patch of cause.

Sockke updated this revision to Diff 362291.Jul 28 2021, 1:06 AM
Sockke retitled this revision from [clang-tidy] Fix cppcoreguidelines-init-variables with enum judgement to [clang-tidy] Fix cppcoreguidelines-init-variables by removing the enum FixIt, and add support for initialization check of scoped enum..
Sockke edited the summary of this revision. (Show Details)

Update!

whisperity added inline comments.Jul 28 2021, 2:32 AM
clang-tools-extra/docs/ReleaseNotes.rst
156 ↗(On Diff #362291)

Unterminated sequence. Also, where is this feature implemented?

whisperity added inline comments.Jul 28 2021, 5:01 AM
clang-tools-extra/docs/ReleaseNotes.rst
156 ↗(On Diff #362291)

@Sockke Sorry if I'm ultra dense here, but I still do not see what sort of support was added, or where. Do you mean that scoped enums were never reported before by this check, and that 5 lines of new code you added fixes that too? If that is the case, I'd rather rephrase the sentence here. I think FixIt is also an internal term (at least when written as such), clang-tidy --help only says:

--fix - Apply suggested fixes.

Suggestion:

Removed generating fixes for enums because the code generated was broken, trying to initialise the enum from an integer.

And if my understanding of what "support was added" is true, then also:

The check now also warns for uninitialised scoped enums.

Sockke updated this revision to Diff 362345.Jul 28 2021, 5:49 AM

Thanks for your reply @whisperity. I have updated!

Alright, thanks! So it does work that way, I didn't know. I think this is okay from my end. You can mark the inline discussion done, by the way. 🙂 Let's see if @aaron.ballman has any comments.

aaron.ballman accepted this revision.Jul 28 2021, 9:47 AM

LGTM -- it's not the cleanest solution (we're duplicating the call to diag()), but it'll suffice. If we wanted to clean it up a bit, we could use an llvm::Optional<const char *> Diagnose = llvm::None; where a non-None value means "diagnose" and if the stored pointer value is not null, it additionally adds the fixit information.

This revision is now accepted and ready to land.Jul 28 2021, 9:47 AM
Sockke updated this revision to Diff 362651.Jul 28 2021, 11:36 PM
Sockke marked 2 inline comments as done.

Thank you for your reply! I generally understand what you mean, but there are three cases, namely, no diagnosis, providing diagnosis, and providing diagnosis & FixIt. Only by judging whether "diagnosis" is null may fail to achieve the goal. I have revised it a little bit, do you think that's alright? @aaron.ballman

If we are tinkering with the code post-acceptance, could you please also change const char * to StringRef? They are the same thing essentially, but accessing the length is (AFIAK) O(1) and there isn't a concerning strlen call in the code.

whisperity added inline comments.Jul 29 2021, 12:30 AM
clang-tools-extra/clang-tidy/cppcoreguidelines/InitVariablesCheck.cpp
87

Also, if we're using Optional, this could also be = nullptr instead, and then we have the state when the Optional hasValue() but the value itself is null. In all other cases, it both hasValue() and the value is non-null.
This would also eliminate the need for the strlen call.

whisperity added inline comments.Jul 29 2021, 12:36 AM
clang-tools-extra/clang-tidy/cppcoreguidelines/InitVariablesCheck.cpp
87

In the StringRef case, you'd need to emplace a default-constructed StringRef (which has the semantics of not pointing anywhere and having length 0) into the Optional.

Sockke updated this revision to Diff 362663.Jul 29 2021, 12:59 AM

You made it very clear, thanks a lot! @whisperity

Thank you for your reply! I generally understand what you mean, but there are three cases, namely, no diagnosis, providing diagnosis, and providing diagnosis & FixIt. Only by judging whether "diagnosis" is null may fail to achieve the goal. I have revised it a little bit, do you think that's alright? @aaron.ballman

By how the code looks, you made the exact changes I was hoping to see. Thank you!

whisperity accepted this revision.Jul 29 2021, 5:40 AM
Sockke marked 2 inline comments as done.Jul 30 2021, 7:36 AM

Thank you for your reply! I generally understand what you mean, but there are three cases, namely, no diagnosis, providing diagnosis, and providing diagnosis & FixIt. Only by judging whether "diagnosis" is null may fail to achieve the goal. I have revised it a little bit, do you think that's alright? @aaron.ballman

By how the code looks, you made the exact changes I was hoping to see. Thank you!

Cheers! Can it be merged? @aaron.ballman

clang-tools-extra/docs/ReleaseNotes.rst
156 ↗(On Diff #362291)

Before that, isIntegerType() would not treat scoped enum as an integer, so scoped enum was ignored. But isEnumeralType() will not.

whisperity added a comment.EditedJul 30 2021, 7:39 AM

Cheers! Can it be merged? @aaron.ballman

Yes, but I think the patch needs to be rebased first as the pre-merge checks CI system said "patch application failed". Do you have commit access? If not, could you please specify the username and e-mail (the values of Git config variable user.name and user.email) you would like to have associated with the commit?

Cheers! Can it be merged? @aaron.ballman

Yes, but I think the patch needs to be rebased first as the pre-merge checks CI system said "patch application failed". Do you have commit access? If not, could you please specify the username and e-mail (the values of Git config variable user.name and user.email) you would like to have associated with the commit?

Thanks a lot!

Name: liuke
Email: liuke.gehry@bytedance.com

This revision was landed with ongoing or failed builds.Jul 30 2021, 9:25 AM
This revision was automatically updated to reflect the committed changes.

@Sockke Sorry to circle back here, but it seems to me that Clang now has a flag -Wuninitialized. Could you check how it behaves, compared to this check? If there are overlaps, what should we do, @aaron.ballman? Should parts of the check be deprecated?