Page MenuHomePhabricator

[Sema] Implement -Wmemset-transposed-args

Authored by erik.pilkington on Jul 9 2018, 5:26 PM.



This warning tries to catch programs that incorrectly call memset with the second and third arguments transposed, ie memset(ary, sizeof(ary), 0) instead of memset(ary, 0, sizeof(ary)). This is done by looking at two factors: 1) if the last argument is 0, then this is likely a bug (looks this is what GCC's implementation does) and 2) if the last argument isn't 0, but the second argument is a sizeofexpression. This catches cases like memset(ary, sizeof(ary), 0xff). I also grouped a couple of related diagnostics that deal with these c functions into a new group, -Wsuspicious-memaccess.

Thanks for taking a look!

Diff Detail

rC Clang

Event Timeline

erik.pilkington created this revision.Jul 9 2018, 5:26 PM
Quuxplusone added inline comments.
662 ↗(On Diff #154727)

If it were my codebase, I'd rather see a cast to (unsigned char) than a cast to (int). (The second argument to memset is supposed to be a single byte.) Why did you pick (int) specifically?

7962 ↗(On Diff #154727)

Honestly, if the second argument to memset involves sizeof *at all*, it's probably a bug, isn't it?

memset(&buf, sizeof foo + 10, 0xff);
memset(&buf, sizeof foo * sizeof bar, 0xff);

Should Clang really go out of its way to silence the warning in these cases?

Address @Quuxplusone comments.

662 ↗(On Diff #154727)

I chose int because that's the actual type of the second parameter to memset, it just gets casted down to unsigned char internally. FWIW, either type will suppress the warning. I'm fine with recommending unsigned char if you have a strong preference for it.

7962 ↗(On Diff #154727)

Sure, that seems reasonable. The new patch makes this check less conservative.

thakis accepted this revision.Jul 11 2018, 8:27 AM
thakis added a subscriber: thakis.

lgtm assuming you ran this on some large code base to make sure it doesn't have false positives.

659 ↗(On Diff #154833)

nit: stay below 80 columns

This revision is now accepted and ready to land.Jul 11 2018, 8:27 AM
aaron.ballman added inline comments.Jul 13 2018, 5:02 AM
662 ↗(On Diff #154727)

My preference is for int as well.

7975–7976 ↗(On Diff #154833)

This functionality should apply equally to wmemset() as well, should it not? The only difference I can think of would be that the type should be cast to wchar_t instead of int to silence the warning.

7997 ↗(On Diff #154833)

Spurious whitespace.

Quuxplusone added inline comments.Jul 13 2018, 9:47 AM
662 ↗(On Diff #154727)

Personally I have a strong preference for "any 8-bit type" — the important thing is that it needs to indicate to the reader that it's intended to be the single-byte value that memset expects. I even want my compiler to diagnose memset(x, 0x1234, z) as a bug!
But I'm not inclined to fight hard, because this is all about the mechanism of *suppressing* of the warning, and I don't imagine we'll see too many people trying to suppress it.

erik.pilkington planned changes to this revision.Jul 17 2018, 10:16 AM

lgtm assuming you ran this on some large code base to make sure it doesn't have false positives.

Sorry for the delay here, I was running this on some very large internal code bases. This is overwhelmingly true-positive, but there are 2 improvements I want to make here before landing this:

  • Suppress the diagnostic in cases like memset(ptr, 0xff, PADDING), when PADDING is a macro defined to be 0.
  • Some platforms #define bzero to __builtin_memset, so we should recognize when we're in a macro expansion to bzero and emit a more accurate diagnostic for bzero(ary, 0);. We might as well add support for __builtin_bzero too.
7975–7976 ↗(On Diff #154833)

Looks like clang doesn't have a builtin for wmemset, its just defined as a normal function. I suppose that we could still try to diagnose calls to top-level functions named wmemset, but thats unprecedented and doesn't really seem worth the trouble.

Quuxplusone added inline comments.Jul 17 2018, 10:37 AM
7984 ↗(On Diff #154833)

Suppress the diagnostic in cases like memset(ptr, 0xff, PADDING), when PADDING is a macro defined to be 0.

Would it suffice to treat a literal 0xFF in the exact same way you treat a literal 0? That is,

if (SecondArg is integer literal 0 or 255 || ThirdArg involves sizeof) {
    do nothing
} else if (ThirdArg is integer literal 0 or 255 || SecondArg involves sizeof) {
    diagnose it

You should add a test case proving that memset(x, 0, 0) doesn't get diagnosed (assuming the two 0s come from two different macro expansions, for example).
My sketch above would fail to diagnose memset(x, 0, 255). I don't know if this is a feature or a bug. :) You probably ought to have a test case for that, either way, just to demonstrate the expected behavior.

aaron.ballman added inline comments.Jul 18 2018, 6:20 AM
7975–7976 ↗(On Diff #154833)

I'm fine leaving it off then; I agree it's not worth the trouble.

erik.pilkington marked 5 inline comments as done.

In this new patch:

  • Add support for builtin_bzero (-Wsuspicious-bzero), improve diagnostics for platforms that define bzero to builtin_memset
  • Suppress the diagnostic when the 0 in memset(ptr, something, 0) came from a macro expansion.
  • Address review comments


This revision is now accepted and ready to land.Jul 18 2018, 4:51 PM
7984 ↗(On Diff #154833)

I definitely don't think that ThirdArg being 255 should lead to a diagnostic, regardless of what SecondArg is. I'm fine with omitting a diagnostic in memset(ptr, 255, 0), but if the user explicitly spelled out 0 then I would probably prefer to diagnose. FWIW, for the case I found where we emitted a false-positive the second argument was 0xd9 or something.

aaron.ballman accepted this revision.Jul 19 2018, 5:44 AM

LGTM aside from a minor nit.

8751 ↗(On Diff #156178)

const auto *

This revision was automatically updated to reflect the committed changes.