This is an archive of the discontinued LLVM Phabricator instance.

[clang][Sema] Warn on uninitialized array elments
Needs ReviewPublic

Authored by beanz on Sep 28 2021, 2:16 PM.

Details

Summary

When defining a statically sized array with explicit array
initializers, having a warning for uninitialized members is nice.

This warning will only fire on missing elements in an explicitly sized
array initialized with an array initializer. It will not fire on char
arrays that are initialized using a string literal sequence.

This warning is added under a new warning group
-Wmissing-array-initializers.

Diff Detail

Event Timeline

beanz requested review of this revision.Sep 28 2021, 2:16 PM
beanz created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2021, 2:16 PM
beanz updated this revision to Diff 375711.Sep 28 2021, 3:00 PM

Updating so that the warning doesn't fire on an empty initializer list.

We typically do not introduce new off-by-default warnings into Clang because experience has shown that users typically do not enable them (so they tend not to be worth the maintenance burden). Instead, we try to make warnings that can be on-by-default with a very low false positive rate. Have you run over a large corpus of C and C++ code to see what false positives arise? Do you have ideas on what it would take to make this on by default? The one big one I can think of is that it's not at all uncommon in C to only initialize one element of the array to zero and rely on zero initialization of the rest (because an empty initializer list is technically not valid C code; it's a GNU extension we support): https://codesearch.isocpp.org/cgi-bin/cgi_ppsearch?q=%3D+%7B0%7D%3B&search=Search.

beanz added a comment.Sep 29 2021, 8:54 AM

I would greatly prefer to enable this by default, so let me build a toolchain and see how it holds up.

The empty initializer list extension is pretty widely supported, so I single zero-initialization is way less common these days, but I'll look.

I would greatly prefer to enable this by default, so let me build a toolchain and see how it holds up.

Awesome, SGTM!

The empty initializer list extension is pretty widely supported, so I single zero-initialization is way less common these days, but I'll look.

It may be widely supported, but there's plenty of single-zero-inits I was finding on that code search:

static guint notification_signals[LAST_SIGNAL] = { 0 };
char    port_str[MAX_PORT_STR_LEN+1] = {0};
char aBuf[64] = {0};
unsigned char strong_checksum[SHA256_DIGEST_LENGTH] = {0};
unsigned long zones_size[MAX_NR_ZONES] = {0};
unsigned long zholes_size[MAX_NR_ZONES] = {0};
// and so on

Even just building Clang with this enabled revealed... a lot... Some of them are easy to address (having tablegen disable the warning in generated code), some of them are exactly the kind of thing I wanted to warn on, and some of them are more complicated.

Based on that alone, I don't think this would be reasonable to enable by default :(

Thoughts on how to proceed?

xbolva00 added a subscriber: xbolva00.EditedSep 30 2021, 10:34 AM

Why just no special case "= {0};" pattern and do not warn in that case?

If there are more patterns, take it from other side - pick patterns where you are sure that they are wrong and developer needs to fix them and emit warnings for them.

Why just no special case "= {0};" pattern and do not warn in that case?

This was what I was thinking. I was basing that on the idea that = { 0 } to zero init the entire array is far more common than = { 1 } to init the first element to one and the rest to zero (which is a case I think this diagnostic shines on because some users think that will fill all 1 instead of only the first element).

If there are more patterns, take it from other side - pick patterns where you are sure that they are wrong and developer needs to fix them and emit warnings for them.

If there are other patterns, I'd love to know what they are.

xbolva00 added a subscriber: nathanchance.EditedSep 30 2021, 10:45 AM

If there are other patterns, I'd love to know what they are.

Well, I dont know :) but something may arise so small testing would be useful.

If possible, @beanz could share list of new warnings from LLVM build here? Also maybe you can try this with linux kernel? Or with cooperation with @nathanchance..

I can test this on the Linux kernel tonight and report the results tomorrow morning.

beanz added a comment.Sep 30 2021, 2:29 PM

Why just no special case "= {0};" pattern and do not warn in that case?

This case did show up, but was not the majority of issues for LLVM (although might be the majority we care about). The related = {nullptr}; pattern also shows a few times.

If there are more patterns, take it from other side - pick patterns where you are sure that they are wrong and developer needs to fix them and emit warnings for them.

There are a bunch of places (particularly in LLVM instruction matching), where LLVM generates data structures with statically sized arrays that hold the maximum possible values, and rely on zero-initialized values to denote the end of the used space.

For the tablegen-genreated code wrapping in pragmas to disable the warning works, but there isn't really a good way that I can think of to not emit a warning for an array of 10 elements where we provide 6 in a general case while also warning on the actual potential bug.

If possible, @beanz could share list of new warnings from LLVM build here?

The big offenders were the tablegen-generated asm and register matchers and the target files under clang/lib/Basic/Targets. Those accounted for filling my scroll back buffer a few times over...

I'm running a new build now disabling the warning int he clang target files to see how that shapes up. Will share what I find.

I can test this on the Linux kernel tonight and report the results tomorrow morning.

My apologies, I did not have time to get to this last night or today and I am going to be on vacation starting today for a week and a half. I can report back once I am back.

Why just no special case "= {0};" pattern and do not warn in that case?

This case did show up, but was not the majority of issues for LLVM (although might be the majority we care about). The related = {nullptr}; pattern also shows a few times.

If there are more patterns, take it from other side - pick patterns where you are sure that they are wrong and developer needs to fix them and emit warnings for them.

There are a bunch of places (particularly in LLVM instruction matching), where LLVM generates data structures with statically sized arrays that hold the maximum possible values, and rely on zero-initialized values to denote the end of the used space.

For the tablegen-genreated code wrapping in pragmas to disable the warning works, but there isn't really a good way that I can think of to not emit a warning for an array of 10 elements where we provide 6 in a general case while also warning on the actual potential bug.

This was the kind of thing I was worried about. I expect that this pattern also shows up (perhaps less frequently) in non-generated code as well.

If possible, @beanz could share list of new warnings from LLVM build here?

The big offenders were the tablegen-generated asm and register matchers and the target files under clang/lib/Basic/Targets. Those accounted for filling my scroll back buffer a few times over...

I'm running a new build now disabling the warning int he clang target files to see how that shapes up. Will share what I find.

Thanks, I'm curious to hear what you find!

Another alternative to surfacing this as a frontend warning is to make a bugprone check for clang-tidy as the false positive rate seems likely to be more suited to that tool.

RKSimon resigned from this revision.Oct 13 2021, 3:50 AM
RKSimon added a subscriber: RKSimon.