This is an archive of the discontinued LLVM Phabricator instance.

[Sema] Adds the pointer-to-int-cast diagnostic
ClosedPublic

Authored by Mordante on Jan 5 2020, 11:41 AM.

Details

Summary

Converting a pointer to an integer whose result cannot represented in the integer type is undefined behavior is C and prohibited in C++. C++ already has a diagnostic when casting. This adds a diagnostic for C.

The diagnostic is not enabled by default due to the number of diagnostics it triggered while running the tests.

Since this diagnostic uses the range of the conversion it also modifies int-to-pointer-cast diagnostic to use a range.

Fixes PR8718: No warning on casting between pointer and non-pointer-sized int

Diff Detail

Event Timeline

Mordante created this revision.Jan 5 2020, 11:41 AM

The diagnostic is not enabled by default

But GCC enables it for C even without "-Wall or -Wextra". Clang should follow it..

Unit tests: pass. 61248 tests passed, 0 failed and 736 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

The diagnostic is not enabled by default

But GCC enables it for C even without "-Wall or -Wextra". Clang should follow it..

I'll have a look what the impact is. When I tested it about 20 tests failed.

Mordante planned changes to this revision.Jan 7 2020, 12:59 PM

While looking at the test failures I noticed the tests pointer_to_integral_type_conv in clang/test/Sema/MicrosoftExtensions.c were not tested. Will look into it later.

rjmccall added a subscriber: rjmccall.

It's not unusual for new warnings to require changes to other tests. I agree with enabling this by default.

Mordante updated this revision to Diff 242393.Feb 4 2020, 11:51 AM
  • Enabled the warning by default
  • Added an Microsoft extension, the unit tests already expected this behaviour
rsmith accepted this revision.Feb 4 2020, 12:06 PM

Looks fine.

Would it make sense to put the MS extension warning into the -Wpointer-to-int-cast group so that we can control this warning consistently across platforms? You could get that effect by introducing a new warning group (eg, "-Wmicrosoft-pointer-to-int-cast") containing just the ExtWarn warning, and putting that group in both MicrosoftCast and PointerToIntCast.

clang/test/Sema/cast.c
1

Don't specify the warning flag here, so that we have some test coverage that this is enabled by default.

This revision is now accepted and ready to land.Feb 4 2020, 12:06 PM
Mordante marked 2 inline comments as done.Feb 16 2020, 6:40 AM

Thanks for the review.

I'll have a look at your suggestion for a follow-up patch.

clang/test/Sema/cast.c
1

Good catch! This was needed before the diagnostic was enabled by default.

This revision was automatically updated to reflect the committed changes.
Mordante marked an inline comment as done.

This revision caused issue for the MSVC build bot. I created a followup patch D74694.

There appear to a be semantic difference between GCC and clang with the current version of this patch which results in a lot of additional warnings in the Linux kernel: https://godbolt.org/z/eHFJd8

There appear to a be semantic difference between GCC and clang with the current version of this patch which results in a lot of additional warnings in the Linux kernel: https://godbolt.org/z/eHFJd8

Warning about casting to an enum seems clearly correct and in scope for this warning. Warning about casting to _Bool seems clearly incorrect and should not be warned about at all.

There appear to a be semantic difference between GCC and clang with the current version of this patch which results in a lot of additional warnings in the Linux kernel: https://godbolt.org/z/eHFJd8

Warning about casting to an enum seems clearly correct and in scope for this warning. Warning about casting to _Bool seems clearly incorrect and should not be warned about at all.

Agreed. I'll look at a followup patch to remove the warning for _Bool.

nathanchance added a comment.EditedFeb 18 2020, 2:51 PM

Warning about casting to an enum seems clearly correct and in scope for this warning. Warning about casting to _Bool seems clearly incorrect and should not be warned about at all.

Fair enough. There are 97 unique instances of the enum variant of the warning across the various configs that I test so I will start sending patches to add uintptr_t casts to silence it.

https://gist.github.com/nathanchance/2c2892e9e4b411fa78770ed3624812b4

There appear to a be semantic difference between GCC and clang with the current version of this patch which results in a lot of additional warnings in the Linux kernel: https://godbolt.org/z/eHFJd8

Warning about casting to an enum seems clearly correct and in scope for this warning. Warning about casting to _Bool seems clearly incorrect and should not be warned about at all.

Maybe we should only warn if the size of the void* is smaller than the size of the enum? (32b void*, 64b enum)? https://godbolt.org/z/oAts-u

Otherwise this warning creates a massive mess for us to clean up, and I suspect Linux kernel developers will just end up disabling the warning.

There appear to a be semantic difference between GCC and clang with the current version of this patch which results in a lot of additional warnings in the Linux kernel: https://godbolt.org/z/eHFJd8

Warning about casting to an enum seems clearly correct and in scope for this warning. Warning about casting to _Bool seems clearly incorrect and should not be warned about at all.

Maybe we should only warn if the size of the void* is smaller than the size of the enum? (32b void*, 64b enum)? https://godbolt.org/z/oAts-u

Otherwise this warning creates a massive mess for us to clean up, and I suspect Linux kernel developers will just end up disabling the warning.

If deployment is easier if we split out a subgroup that we can turn off, that seems fine. But I don't see any good abstract justification for warning about a cast to int and not a cast to an int-sized enum. What would the reasoning be, just that the latter "couldn't possibly" be intended to preserve the original pointer value, so it must be an opaque value being represented as a void*? That seems pretty weak to me.

There appear to a be semantic difference between GCC and clang with the current version of this patch which results in a lot of additional warnings in the Linux kernel: https://godbolt.org/z/eHFJd8

Warning about casting to an enum seems clearly correct and in scope for this warning. Warning about casting to _Bool seems clearly incorrect and should not be warned about at all.

Maybe we should only warn if the size of the void* is smaller than the size of the enum? (32b void*, 64b enum)? https://godbolt.org/z/oAts-u

Otherwise this warning creates a massive mess for us to clean up, and I suspect Linux kernel developers will just end up disabling the warning.

If deployment is easier if we split out a subgroup that we can turn off, that seems fine. But I don't see any good abstract justification for warning about a cast to int and not a cast to an int-sized enum. What would the reasoning be, just that the latter "couldn't possibly" be intended to preserve the original pointer value, so it must be an opaque value being represented as a void*? That seems pretty weak to me.

Less about enums, more about casts to/from void*, since you might use that in place of a union that would be too large to describe. Specifically, this struct is used throughout the kernel for most drivers: https://elixir.bootlin.com/linux/v5.5.4/source/include/linux/mod_devicetable.h#L260 It is exceedingly common to stuff whatever data in there: https://elixir.bootlin.com/linux/v5.5.4/source/drivers/ata/ahci_brcm.c#L428 so long as the driver is careful not to reinterpret the data as the incorrect type. Describing such a union for ever possible enum packed in there would not be fun.

There appear to a be semantic difference between GCC and clang with the current version of this patch which results in a lot of additional warnings in the Linux kernel: https://godbolt.org/z/eHFJd8

Warning about casting to an enum seems clearly correct and in scope for this warning. Warning about casting to _Bool seems clearly incorrect and should not be warned about at all.

Maybe we should only warn if the size of the void* is smaller than the size of the enum? (32b void*, 64b enum)? https://godbolt.org/z/oAts-u

Otherwise this warning creates a massive mess for us to clean up, and I suspect Linux kernel developers will just end up disabling the warning.

If deployment is easier if we split out a subgroup that we can turn off, that seems fine. But I don't see any good abstract justification for warning about a cast to int and not a cast to an int-sized enum. What would the reasoning be, just that the latter "couldn't possibly" be intended to preserve the original pointer value, so it must be an opaque value being represented as a void*? That seems pretty weak to me.

Less about enums, more about casts to/from void*, since you might use that in place of a union that would be too large to describe. Specifically, this struct is used throughout the kernel for most drivers: https://elixir.bootlin.com/linux/v5.5.4/source/include/linux/mod_devicetable.h#L260 It is exceedingly common to stuff whatever data in there: https://elixir.bootlin.com/linux/v5.5.4/source/drivers/ata/ahci_brcm.c#L428 so long as the driver is careful not to reinterpret the data as the incorrect type. Describing such a union for ever possible enum packed in there would not be fun.

No, I understand the pattern, but they must have already done some sort of pass over the code to make it warning-clean when they're working with a smaller integer type. Or do they just in practice never store smaller integers in there, whereas it's hard to control size with an enum?

No, I understand the pattern, but they must have already done some sort of pass over the code to make it warning-clean when they're working with a smaller integer type. Or do they just in practice never store smaller integers in there, whereas it's hard to control size with an enum?

Unsure.

There appear to a be semantic difference between GCC and clang with the current version of this patch which results in a lot of additional warnings in the Linux kernel: https://godbolt.org/z/eHFJd8

Warning about casting to an enum seems clearly correct and in scope for this warning. Warning about casting to _Bool seems clearly incorrect and should not be warned about at all.

Maybe we should only warn if the size of the void* is smaller than the size of the enum? (32b void*, 64b enum)? https://godbolt.org/z/oAts-u

Otherwise this warning creates a massive mess for us to clean up, and I suspect Linux kernel developers will just end up disabling the warning.

If deployment is easier if we split out a subgroup that we can turn off, that seems fine. But I don't see any good abstract justification for warning about a cast to int and not a cast to an int-sized enum. What would the reasoning be, just that the latter "couldn't possibly" be intended to preserve the original pointer value, so it must be an opaque value being represented as a void*? That seems pretty weak to me.

Less about enums, more about casts to/from void*, since you might use that in place of a union that would be too large to describe. Specifically, this struct is used throughout the kernel for most drivers: https://elixir.bootlin.com/linux/v5.5.4/source/include/linux/mod_devicetable.h#L260 It is exceedingly common to stuff whatever data in there: https://elixir.bootlin.com/linux/v5.5.4/source/drivers/ata/ahci_brcm.c#L428 so long as the driver is careful not to reinterpret the data as the incorrect type. Describing such a union for ever possible enum packed in there would not be fun.

No, I understand the pattern, but they must have already done some sort of pass over the code to make it warning-clean when they're working with a smaller integer type. Or do they just in practice never store smaller integers in there, whereas it's hard to control size with an enum?

Yes, if the data is a regular int, rather than an enum, all of the callsites either cast to long or uintptr_t (which is typedef'd in the kernel to unsigned long). There are a lot fewer of those spots in the kernel (at least from my super quick rg search), most of the spots use an enum, maybe to purposefully avoid this warning? Most, if not all the sites, only store a number that is less than 5 because they use that number to determine exactly which device is present from the match data so the driver can handle different quirks with things like case statements.

martong removed a subscriber: martong.Feb 19 2020, 2:39 AM

Okay. Can we raise this with the kernel folks instead of just assuming they'll be opposed? An obvious patch to fix a few dozen places where they're hit by a warning they intentionally enabled really doesn't seem like a burden. If they push back, fine, we can enable the warning without covering enums.

Okay. Can we raise this with the kernel folks instead of just assuming they'll be opposed? An obvious patch to fix a few dozen places where they're hit by a warning they intentionally enabled really doesn't seem like a burden. If they push back, fine, we can enable the warning without covering enums.

In the original diff the warning was opt in, in the final version it's enabled by default. So they didn't enable the warning intentionally. I agree; if the kernel folks rather not change their code I can create an option to disable the warning for enums, just like it can already be disabled for void*s.