This is an archive of the discontinued LLVM Phabricator instance.

[Sema] Implement -Wmemset-transposed-args
ClosedPublic

Authored by erik.pilkington on Jul 9 2018, 5:26 PM.

Details

Summary

This warning tries to catch programs that incorrectly call memset with the second and third arguments transposed, ie memset(ary, sizeof(ary), 0) instead of memset(ary, 0, sizeof(ary)). This is done by looking at two factors: 1) if the last argument is 0, then this is likely a bug (looks this is what GCC's implementation does) and 2) if the last argument isn't 0, but the second argument is a sizeofexpression. This catches cases like memset(ary, sizeof(ary), 0xff). I also grouped a couple of related diagnostics that deal with these c functions into a new group, -Wsuspicious-memaccess.

llvm.org/PR36242

Thanks for taking a look!
Erik

Diff Detail

Repository
rC Clang

Event Timeline

erik.pilkington created this revision.Jul 9 2018, 5:26 PM
Quuxplusone added inline comments.
clang/include/clang/Basic/DiagnosticSemaKinds.td
662 ↗(On Diff #154727)

If it were my codebase, I'd rather see a cast to (unsigned char) than a cast to (int). (The second argument to memset is supposed to be a single byte.) Why did you pick (int) specifically?

clang/lib/Sema/SemaChecking.cpp
7962 ↗(On Diff #154727)

Honestly, if the second argument to memset involves sizeof *at all*, it's probably a bug, isn't it?

memset(&buf, sizeof foo + 10, 0xff);
memset(&buf, sizeof foo * sizeof bar, 0xff);

Should Clang really go out of its way to silence the warning in these cases?

Address @Quuxplusone comments.

clang/include/clang/Basic/DiagnosticSemaKinds.td
662 ↗(On Diff #154727)

I chose int because that's the actual type of the second parameter to memset, it just gets casted down to unsigned char internally. FWIW, either type will suppress the warning. I'm fine with recommending unsigned char if you have a strong preference for it.

clang/lib/Sema/SemaChecking.cpp
7962 ↗(On Diff #154727)

Sure, that seems reasonable. The new patch makes this check less conservative.

thakis accepted this revision.Jul 11 2018, 8:27 AM
thakis added a subscriber: thakis.

lgtm assuming you ran this on some large code base to make sure it doesn't have false positives.

clang/include/clang/Basic/DiagnosticSemaKinds.td
659 ↗(On Diff #154833)

nit: stay below 80 columns

This revision is now accepted and ready to land.Jul 11 2018, 8:27 AM
aaron.ballman added inline comments.Jul 13 2018, 5:02 AM
clang/include/clang/Basic/DiagnosticSemaKinds.td
662 ↗(On Diff #154727)

My preference is for int as well.

clang/lib/Sema/SemaChecking.cpp
7975–7976 ↗(On Diff #154833)

This functionality should apply equally to wmemset() as well, should it not? The only difference I can think of would be that the type should be cast to wchar_t instead of int to silence the warning.

7997 ↗(On Diff #154833)

Spurious whitespace.

clang/include/clang/Basic/DiagnosticSemaKinds.td
662 ↗(On Diff #154727)

Personally I have a strong preference for "any 8-bit type" — the important thing is that it needs to indicate to the reader that it's intended to be the single-byte value that memset expects. I even want my compiler to diagnose memset(x, 0x1234, z) as a bug!
But I'm not inclined to fight hard, because this is all about the mechanism of *suppressing* of the warning, and I don't imagine we'll see too many people trying to suppress it.

erik.pilkington planned changes to this revision.Jul 17 2018, 10:16 AM

lgtm assuming you ran this on some large code base to make sure it doesn't have false positives.

Sorry for the delay here, I was running this on some very large internal code bases. This is overwhelmingly true-positive, but there are 2 improvements I want to make here before landing this:

  • Suppress the diagnostic in cases like memset(ptr, 0xff, PADDING), when PADDING is a macro defined to be 0.
  • Some platforms #define bzero to __builtin_memset, so we should recognize when we're in a macro expansion to bzero and emit a more accurate diagnostic for bzero(ary, 0);. We might as well add support for __builtin_bzero too.
clang/lib/Sema/SemaChecking.cpp
7975–7976 ↗(On Diff #154833)

Looks like clang doesn't have a builtin for wmemset, its just defined as a normal function. I suppose that we could still try to diagnose calls to top-level functions named wmemset, but thats unprecedented and doesn't really seem worth the trouble.

clang/lib/Sema/SemaChecking.cpp
7984 ↗(On Diff #154833)

Suppress the diagnostic in cases like memset(ptr, 0xff, PADDING), when PADDING is a macro defined to be 0.

Would it suffice to treat a literal 0xFF in the exact same way you treat a literal 0? That is,

if (SecondArg is integer literal 0 or 255 || ThirdArg involves sizeof) {
    do nothing
} else if (ThirdArg is integer literal 0 or 255 || SecondArg involves sizeof) {
    diagnose it
}

You should add a test case proving that memset(x, 0, 0) doesn't get diagnosed (assuming the two 0s come from two different macro expansions, for example).
My sketch above would fail to diagnose memset(x, 0, 255). I don't know if this is a feature or a bug. :) You probably ought to have a test case for that, either way, just to demonstrate the expected behavior.

aaron.ballman added inline comments.Jul 18 2018, 6:20 AM
clang/lib/Sema/SemaChecking.cpp
7975–7976 ↗(On Diff #154833)

I'm fine leaving it off then; I agree it's not worth the trouble.

erik.pilkington marked 5 inline comments as done.

In this new patch:

  • Add support for builtin_bzero (-Wsuspicious-bzero), improve diagnostics for platforms that define bzero to builtin_memset
  • Suppress the diagnostic when the 0 in memset(ptr, something, 0) came from a macro expansion.
  • Address review comments

Thanks!

This revision is now accepted and ready to land.Jul 18 2018, 4:51 PM
clang/lib/Sema/SemaChecking.cpp
7984 ↗(On Diff #154833)

I definitely don't think that ThirdArg being 255 should lead to a diagnostic, regardless of what SecondArg is. I'm fine with omitting a diagnostic in memset(ptr, 255, 0), but if the user explicitly spelled out 0 then I would probably prefer to diagnose. FWIW, for the case I found where we emitted a false-positive the second argument was 0xd9 or something.

aaron.ballman accepted this revision.Jul 19 2018, 5:44 AM

LGTM aside from a minor nit.

clang/lib/Sema/SemaChecking.cpp
8751 ↗(On Diff #156178)

const auto *

This revision was automatically updated to reflect the committed changes.